The following error occurs when using Python to download images from a Google search:
Traceback (most recent call last):
  File "C:\Python27\img_google3.py", line 37, in <module>
    dataInfo = data['results']
TypeError: 'NoneType' object has no attribute '__getitem__'
The goal is to download images with Python as part of training a neural network for image classification.
Solution
- Check the results

  The error message indicates that results['responseData'] is None. Print results to see what is actually being returned, so you can work out how to access the data you want.
- Check the search-result range

  When the error occurs, the request returns the following:

    {u'responseData': None,                       # hence the error
     u'responseDetails': u'out of range start',   # what went wrong
     u'responseStatus': 400}                      # HTTP response code for "Bad request"

  The error is caused by requesting a URL whose start index lies beyond the range of the search results. With a lower start value, results contains sensible content.
- Check the response status

  Check the value of results["responseStatus"]: if it is 200 (the response is OK), it is safe to go ahead and download the images. A short sketch combining these response checks appears after the full code example below.
- Simplify the URL-building code

  Python's str.format() method can simplify the URL-building code and make it easier to read and maintain. For example, this:

    url = ('https://ajax.googleapis.com/ajax/services/search/images?' +
           'v=1.0&q=' + searchTerm + '&start=' + str(i * 10) + '&userip=MyIP')

  can be replaced with:

    template = 'https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q={}&start={}&userip=MyIP'
    url = template.format(searchTerm, str(i * 10))

  An alternative that also handles escaping, using urllib.urlencode, is sketched after the full code example below.
- Code example

  Below is a complete Python 2 example that downloads images from a Google search:

    from __future__ import print_function  # makes the final multi-argument print() behave as expected in Python 2

    import os
    import sys
    import time
    from urllib import FancyURLopener
    import urllib2
    import simplejson

    # Define search term
    searchTerm = "parrot"

    # Replace spaces ' ' in the search term with '%20' in order to comply with the request
    searchTerm = searchTerm.replace(' ', '%20')

    # Start FancyURLopener with a defined version string
    class MyOpener(FancyURLopener):
        version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

    myopener = MyOpener()

    # Set count to 0
    count = 0

    for i in range(0, 10):
        # Notice that 'start' changes on each iteration in order to request a new set of images
        url = ('https://ajax.googleapis.com/ajax/services/search/images?' +
               'v=1.0&q=' + searchTerm + '&start=' + str(i * 10) + '&userip=MyIP')
        print(url)
        request = urllib2.Request(url, None, {'Referer': 'testing'})

        # Get results using JSON
        results = simplejson.load(urllib2.urlopen(request))

        # Check response status
        if results["responseStatus"] == 200:
            data = results['responseData']
            dataInfo = data['results']

            # Iterate over each result and get the unescaped url
            for myUrl in dataInfo:
                count = count + 1
                my_url = myUrl['unescapedUrl']
                myopener.retrieve(myUrl['unescapedUrl'], str(count) + '.jpg')

    print(count, "images downloaded.")
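The sketch below is not part of the original answer; it ties the first three checks together in one minimal script, assuming the same Python 2 environment, endpoint, and response keys (responseStatus, responseDetails, responseData) as the code above. It fetches a single page, prints the parsed response so you can see its structure, and only reads responseData after verifying responseStatus.

    # Minimal sketch: inspect and validate one API response before using it.
    # Assumes Python 2 and the same endpoint and response keys as the example above.
    import pprint
    import urllib2
    import simplejson

    searchTerm = 'parrot'
    start = 0  # keep this small; large values trigger the "out of range start" error

    url = ('https://ajax.googleapis.com/ajax/services/search/images?' +
           'v=1.0&q=' + searchTerm + '&start=' + str(start) + '&userip=MyIP')
    request = urllib2.Request(url, None, {'Referer': 'testing'})
    results = simplejson.load(urllib2.urlopen(request))

    # Print the whole parsed response to see exactly what came back.
    pprint.pprint(results)

    if results['responseStatus'] == 200:
        # responseData is only a dict on success; otherwise it is None.
        for item in results['responseData']['results']:
            print(item['unescapedUrl'])
    else:
        # e.g. responseStatus 400 with responseDetails u'out of range start'
        print('Request failed: ' + results['responseDetails'])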
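As a variation not used in the original answer, urllib.urlencode can build the query string instead of manual concatenation or str.format(). It escapes spaces and other special characters automatically, so the manual searchTerm.replace(' ', '%20') step becomes unnecessary. The parameter names below are taken from the URL above; the search term is an arbitrary example and 'MyIP' remains a placeholder.

    # Alternative sketch: build the query string with urllib.urlencode (Python 2).
    import urllib

    searchTerm = 'african grey parrot'   # spaces and special characters are escaped automatically
    i = 3                                # page index, as in the loop above

    params = urllib.urlencode({
        'v': '1.0',
        'q': searchTerm,
        'start': i * 10,
        'userip': 'MyIP',                # placeholder, as in the original code
    })
    url = 'https://ajax.googleapis.com/ajax/services/search/images?' + params
    print(url)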