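When processing JSON strings collected from the Twitter Streaming API, you may run into strings that fail to parse. For example, the following string cannot be parsed correctly: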
{
"created_at": "Mon Mar 11 20:15:36 +0000 2013",
"id": 311208808837951488,
"id_str": "311208808837951488",
"text": "ALIENS ENTRATE E' IMPORTANTE!!!\n\n\n\nMTV's Musical March Madness ritorna il 18 marzo...Siete pronti A http://t.co/ABXEfquTJw via @Hopee_dream",
"source": "\u003ca href="http://twitter.com/download/android" rel="nofollow"\u003eTwitter for Android\u003c/a\u003e",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 1025970793,
"id_str": "1025970793",
"name": "Tom's Perfection\u2665",
"screen_name": "_MyGreenEyes_",
"location": "",
"url": null,
"description": "Angel,don't you cry,i'll meet you on the other side.\u2661",
"protected": false,
"followers_count": 387,
"friends_count": 520,
"listed_count": 1,
"created_at": "Fri Dec 21 08:39:17 +0000 2012",
"favourites_count": 174,
"utc_offset": null,
"time_zone": null,
"geo_enabled": true,
"verified": false,
"statuses_count": 772,
"lang": "it",
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "C0DEED",
"profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png",
"profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme1/bg.png",
"profile_background_tile": false,
"profile_image_url": "http://a0.twimg.com/profile_images/3363059730/3d791e51eefa800150cd99917abc1d2c_normal.jpeg",
"profile_image_url_https": "https://si0.twimg.com/profile_images/3363059730/3d791e51eefa800150cd99917abc1d2c_normal.jpeg",
"profile_banner_url": "https://si0.twimg.com/profile_banners/1025970793/1362500832",
"profile_link_color": "0084B4",
"profile_sidebar_border_color": "C0DEED",
"profile_sidebar_fill_color": "DDEEF6",
"profile_text_color": "333333",
"profile_use_background_image": true,
"default_profile": true,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": null,
"contributors": null,
"retweet_count": 0,
"entities": {
"hashtags": [],
"urls": [
{
"url": "http://t.co/ABXEfquTJw",
"expanded_url": "http://tl.gd/l9f5j7",
"display_url": "tl.gd/l9f5j7",
"indices": [
101,
123
]
}
],
"user_mentions": []
},
"favorited": false,
"retweeted": false,
"possibly_sensitive": false,
"filter_level": "medium"
}
Attempting to parse this string with json.loads() may raise an error like the following:
json.decoder.JSONDecodeError: Expecting value: line 5 column 1 (char 107)
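To see which character the parser actually stopped on, it can help to print the text around the reported position. The following is a minimal sketch, not the original author's code: the helper name `describe_json_error` and the sample string are hypothetical, while `msg`, `lineno`, `colno`, and `pos` are standard attributes of `json.JSONDecodeError`.

```python
import json

def describe_json_error(raw, context=20):
    """Return None if raw parses; otherwise a short report of where parsing failed."""
    try:
        json.loads(raw)
        return None
    except json.JSONDecodeError as err:
        # err.pos is the character index at which the decoder gave up
        start = max(err.pos - context, 0)
        snippet = raw[start:err.pos + context]
        return f"{err.msg} at line {err.lineno}, column {err.colno}: {snippet!r}"

# Hypothetical sample: the inner double quotes are not escaped
bad = '{"source": "<a href="http://example.com">x</a>"}'
print(describe_json_error(bad))
```

The printed snippet shows the characters on both sides of the failure point, which is usually enough to tell an unescaped quote from a stray control character.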
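2. Solution
The problem is caused by characters that are not legal inside a JSON string, such as raw newline characters. The two-character escape \n is itself valid JSON; the failure appears once it has been turned into an actual line break, for example when the text is pasted into a non-raw Python string literal. To work around this, preprocess the string to remove the offending characters.
Any of the following methods can be used to remove them:
1. Using the str.strip() method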
json_string = json_string.strip()
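str.strip() removes whitespace from the beginning and end of the string. It does not touch characters inside the string, so it only helps when the stray characters sit at the ends.
2. Using a regular expression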
import re
json_string = re.sub(r'\n', '', json_string)
re.sub(r'\n', '', json_string) removes every newline character in the string.
3. Using the str.replace() method
json_string = json_string.replace('\n', '')
json_string.replace('\n', '') likewise removes every newline character in the string.
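The difference between the three methods is easiest to see on a small input. This is a sketch on a hypothetical sample string (not the tweet above) that has a real line break inside the value and whitespace at both ends.

```python
import re

# Hypothetical sample: a real newline inside the value, whitespace at both ends
sample = '  {"text": "line one\nline two"}\n'

stripped = sample.strip()               # trims only the two ends
via_regex = re.sub(r'\n', '', sample)   # removes every newline
via_replace = sample.replace('\n', '')  # same result without the re module

print('\n' in stripped)          # True: the inner newline is still there
print('\n' in via_regex)         # False
print(via_regex == via_replace)  # True
```

Note that deleting the newline also joins "line one" and "line two" with nothing in between. If the line break matters, replacing it with a space or with the escaped form is an alternative to deleting it.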
Once the illegal characters are removed, the string can be parsed with json.loads().
import json
json_dict = json.loads(json_string)
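If the line breaks in the text should be kept rather than deleted, two alternatives are worth knowing. The sketch below uses a hypothetical single-line payload: option A rewrites each raw newline as the JSON escape, and option B passes `strict=False`, a documented decoder option that allows control characters inside strings. Option A assumes there are no newlines between JSON tokens, since an escape outside a string would itself be invalid.

```python
import json

# Hypothetical payload: the string value contains a real line break
json_string = '{"text": "first line\nsecond line"}'

try:
    json.loads(json_string)
except json.JSONDecodeError as err:
    print("strict parse failed:", err.msg)

# Option A: turn each raw newline into the two-character JSON escape
escaped = json_string.replace('\n', '\\n')
parsed_a = json.loads(escaped)

# Option B: let the decoder accept control characters inside strings
parsed_b = json.loads(json_string, strict=False)

print(parsed_a == parsed_b)  # True
```

Both options keep the line break in the parsed text, whereas deleting the newline loses it.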
Code example
import json
# JSON string collected from the Twitter Streaming API
json_string = """{
"created_at": "Mon Mar 11 20:15:36 +0000 2013",
"id": 311208808837951488,
"id_str": "311208808837951488",
"text": "ALIENS ENTRATE E' IMPORTANTE!!!\n\n\n\nMTV's Musical March Madness ritorna il 18 marzo...Siete pronti A http://t.co/ABXEfquTJw via @Hopee_dream",
"source": "\u003ca href="http://twitter.com/download/android" rel="nofollow"\u003eTwitter for Android\u003c/a\u003e",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 1025970793,
"id_str": "1025970793",
"name": "Tom's Perfection\u2665",
"screen_name": "_MyGreenEyes_",
"location": "",
"url": null,
"description": "Angel,don't you cry,i'll meet you on the other side.\u2661",
"protected": false,
"followers_count": 387,
"friends_count": 520,
"listed_count": 1,
"created_at": "Fri Dec 21 08:39:17 +0000 2012",
"favourites_count": 174,
"utc_offset": null,
"time_zone": null,
"geo_enabled": true,
"verified": false,
"statuses_count": 772,
"lang": "it",