Common Web Crawler Frameworks

Top 50 Open-Source Web Crawlers

| Project | Language | Platform |
| --- | --- | --- |
| Heritrix | Java | Linux |
| Nutch | Java | Cross-platform |
| Scrapy | Python | Cross-platform |
| DataparkSearch | C++ | Cross-platform |
| GNU Wget | C | Linux |
| GRUB | C#, C, Python, Perl | Cross-platform |
| ht://Dig | C++ | Unix |
| HTTrack | C/C++ | Cross-platform |
| ICDL Crawler | C++ | Cross-platform |
| mnoGoSearch | C | Windows |
| Norconex HTTP Collector | Java | Cross-platform |
| Open Source Server | C/C++, Java, PHP | Cross-platform |
| PHP-Crawler | PHP | Cross-platform |
| YaCy | Java | Cross-platform |
| WebSPHINX | Java | Cross-platform |
| WebLech | Java | Cross-platform |
| Arale | Java | Cross-platform |
| JSpider | Java | Cross-platform |
| HyperSpider | Java | Cross-platform |
| Arachnid | Java | Cross-platform |
| Spindle | Java | Cross-platform |
| Spider | Java | Cross-platform |
| LARM | Java | Cross-platform |
| Metis | Java | Cross-platform |
| SimpleSpider | Java | Cross-platform |
| Grunk | Java | Cross-platform |
| CAPEK | Java | Cross-platform |
| Aperture | Java | Cross-platform |
| Smart and Simple Web Crawler | Java | Cross-platform |
| Web Harvest | Java | Cross-platform |
| Aspseek | C++ | Linux |
| Bixo | Java | Cross-platform |
| crawler4j | Java | Cross-platform |
| Ebot | Erlang | Linux |
| Hounder | Java | Cross-platform |
| Hyper Estraier | C/C++ | Cross-platform |
| OpenWebSpider | C#, PHP | Cross-platform |
| Pavuk | C | Linux |
| Sphider | PHP | Cross-platform |
| Xapian | C++ | Cross-platform |
| Arachnode.net | C# | Windows |
| Crawwwler | C++ | Java |
| Distributed Web Crawler | C, Java, Python | Cross-platform |
| iCrawler | Java | Cross-platform |
| pycreep | Java | Cross-platform |
| Opese | C++ | Linux |
| Andjing | Java | |
| Ccrawler | C# | Windows |
| WebEater | Java | Cross-platform |
| JoBo | Java | Cross-platform |