Invalid JSON Characters in Python Prevent a JSON String from Being Parsed


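When processing JSON strings collected from the Twitter Streaming API, you may find that a string cannot be parsed correctly. For example, the following string fails to parse: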

{
  "created_at": "Mon Mar 11 20:15:36 +0000 2013",
  "id": 311208808837951488,
  "id_str": "311208808837951488",
  "text": "ALIENS ENTRATE E' IMPORTANTE!!!\n\n\n\nMTV's Musical March Madness ritorna il 18 marzo...Siete pronti A http://t.co/ABXEfquTJw via @Hopee_dream",
  "source": "\u003ca href="http://twitter.com/download/android" rel="nofollow"\u003eTwitter for Android\u003c/a\u003e",
  "truncated": false,
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": null,
  "in_reply_to_user_id_str": null,
  "in_reply_to_screen_name": null,
  "user": {
    "id": 1025970793,
    "id_str": "1025970793",
    "name": "Tom's Perfection\u2665",
    "screen_name": "_MyGreenEyes_",
    "location": "",
    "url": null,
    "description": "Angel,don't you cry,i'll meet you on the other side.\u2661",
    "protected": false,
    "followers_count": 387,
    "friends_count": 520,
    "listed_count": 1,
    "created_at": "Fri Dec 21 08:39:17 +0000 2012",
    "favourites_count": 174,
    "utc_offset": null,
    "time_zone": null,
    "geo_enabled": true,
    "verified": false,
    "statuses_count": 772,
    "lang": "it",
    "contributors_enabled": false,
    "is_translator": false,
    "profile_background_color": "C0DEED",
    "profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png",
    "profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme1/bg.png",
    "profile_background_tile": false,
    "profile_image_url": "http://a0.twimg.com/profile_images/3363059730/3d791e51eefa800150cd99917abc1d2c_normal.jpeg",
    "profile_image_url_https": "https://si0.twimg.com/profile_images/3363059730/3d791e51eefa800150cd99917abc1d2c_normal.jpeg",
    "profile_banner_url": "https://si0.twimg.com/profile_banners/1025970793/1362500832",
    "profile_link_color": "0084B4",
    "profile_sidebar_border_color": "C0DEED",
    "profile_sidebar_fill_color": "DDEEF6",
    "profile_text_color": "333333",
    "profile_use_background_image": true,
    "default_profile": true,
    "default_profile_image": false,
    "following": null,
    "follow_request_sent": null,
    "notifications": null
  },
  "geo": null,
  "coordinates": null,
  "place": null,
  "contributors": null,
  "retweet_count": 0,
  "entities": {
    "hashtags": [],
    "urls": [
      {
        "url": "http://t.co/ABXEfquTJw",
        "expanded_url": "http://tl.gd/l9f5j7",
        "display_url": "tl.gd/l9f5j7",
        "indices": [
          101,
          123
        ]
      }
    ],
    "user_mentions": []
  },
  "favorited": false,
  "retweeted": false,
  "possibly_sensitive": false,
  "filter_level": "medium"
}

Attempting to parse this string with json.loads() may raise an error like the following:

json.decoder.JSONDecodeError: Expecting value: line 5 column 1 (char 107)

2. Solution

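The problem is caused by characters in the string that are not valid JSON, such as line breaks (\n). Strictly speaking, the two-character escape sequence \n is valid inside a JSON string; what the parser rejects is a raw, unescaped newline character inside a string value. That is what you get when the text is pasted into an ordinary (non-raw) Python string literal, because Python turns \n into a real line break. To work around this, the string must be processed to remove the offending characters.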
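The difference between an escaped and a raw newline can be seen in a minimal sketch. The payload below is a hypothetical miniature, not the tweet shown earlier, and the variable names are only for illustration:

```python
import json

# Hypothetical miniature payloads, not the tweet from the article.
escaped = '{"text": "line one\\nline two"}'  # backslash + n: a valid JSON escape
raw = '{"text": "line one\nline two"}'       # a real newline inside the value

# The escaped form parses, and the value contains a real line break afterwards.
parsed = json.loads(escaped)
print(repr(parsed["text"]))

# The raw form is rejected by the default (strict) decoder.
try:
    json.loads(raw)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)
    print("rejected:", error_message)
```
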

Any of the following methods can be used to remove the offending characters:

1. Using the str.strip() method

json_string = json_string.strip()

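str.strip() removes whitespace (spaces, tabs, line breaks) from the beginning and end of the string only. It does not touch line breaks inside the string, so on its own it helps only when the unwanted characters surround the JSON document.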
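As a rough illustration of that limitation, the sketch below uses a hypothetical payload with whitespace around the document and a real newline inside a value. After strip(), the inner newline is still there and parsing still fails:

```python
import json

# Hypothetical sample: whitespace around the document,
# plus a real newline inside a string value.
payload = '\n  {"text": "first\nsecond"}  \n'

stripped = payload.strip()
# Only the outer whitespace is gone; the newline inside the value remains.
still_has_newline = "\n" in stripped

try:
    json.loads(stripped)
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True

print(still_has_newline, parse_failed)
```
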

2. Using a regular expression

import re
json_string = re.sub(r'\n', '', json_string)

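re.sub(r'\n', '', json_string) removes every newline character from the string. Note that this also deletes line breaks that were part of a text value, so the words on either side are joined together.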
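A small sketch of the regular-expression approach, assuming a hypothetical pretty-printed payload with one real newline inside a value. It shows both that the cleaned string parses and that the line break in the text is lost:

```python
import json
import re

# Hypothetical pretty-printed payload; "first" and "second" are
# separated by a real newline inside the string value.
payload = '{\n  "text": "first\nsecond",\n  "lang": "it"\n}'

# Delete every newline, both between tokens and inside values.
cleaned = re.sub(r'\n', '', payload)
data = json.loads(cleaned)

print(data)
```
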

3. Using the str.replace() method

json_string = json_string.replace('\n', '')

json_string.replace('\n', '') likewise removes every newline character from the string.
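If the line breaks in the text should survive, a variation is to escape the newlines instead of deleting them. This is a sketch under one assumption: the whole JSON document sits on a single line, so every newline in it belongs to a string value. On a pretty-printed document the same replacement would also rewrite the newlines between tokens and break parsing. The payload is hypothetical:

```python
import json

# Hypothetical single-line payload with a real newline inside the value.
payload = '{"text": "first\nsecond"}'

# Turn each real newline into the two-character JSON escape \n.
escaped = payload.replace('\n', '\\n')
data = json.loads(escaped)

print(repr(data["text"]))
```
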

Once the invalid characters have been removed, the string can be parsed with json.loads().

import json
json_dict = json.loads(json_string)
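As an alternative to editing the string, json.loads() accepts a strict parameter; with strict=False the decoder allows control characters such as newlines inside string values. The sketch below uses a hypothetical payload and is not part of the original approach:

```python
import json

# Hypothetical pretty-printed payload with a real newline inside a value.
payload = '{\n  "text": "first\nsecond",\n  "lang": "it"\n}'

# strict=False lets the decoder accept control characters inside strings.
data = json.loads(payload, strict=False)

print(repr(data["text"]))
```

Unlike deleting the newlines, this keeps the line break in the parsed text value.
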

Code example

import json

# JSON string collected from the Twitter Streaming API
json_string = """{
  "created_at": "Mon Mar 11 20:15:36 +0000 2013",
  "id": 311208808837951488,
  "id_str": "311208808837951488",
  "text": "ALIENS ENTRATE E' IMPORTANTE!!!\n\n\n\nMTV's Musical March Madness ritorna il 18 marzo...Siete pronti A http://t.co/ABXEfquTJw via @Hopee_dream",
  "source": "\u003ca href="http://twitter.com/download/android" rel="nofollow"\u003eTwitter for Android\u003c/a\u003e",
  "truncated": false,
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": null,
  "in_reply_to_user_id_str": null,
  "in_reply_to_screen_name": null,
  "user": {
    "id": 1025970793,
    "id_str": "1025970793",
    "name": "Tom's Perfection\u2665",
    "screen_name": "_MyGreenEyes_",
    "location": "",
    "url": null,
    "description": "Angel,don't you cry,i'll meet you on the other side.\u2661",
    "protected": false,
    "followers_count": 387,
    "friends_count": 520,
    "listed_count": 1,
    "created_at": "Fri Dec 21 08:39:17 +0000 2012",
    "favourites_count": 174,
    "utc_offset": null,
    "time_zone": null,
    "geo_enabled": true,
    "verified": false,
    "statuses_count": 772,
    "lang": "it",