Chapter 1: Kali Penetration Testing - 9. Information Gathering Tools - Command-Line Search Tools

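Tools of this kind call search engines to run a large number of searches in bulk. With enough care and practice you will notice that search engines such as Google still apply limits to web search: if you issue too many queries, or issue them too frequently, they throttle or block you to stop you from crawling their site.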

Tool 1: theharvester

theharvester -d sina.com -l 300 -b google

Its options:

Usage: theharvester options 

       -d: Domain to search or company name
       -b: data source: baidu, bing, bingapi, censys, crtsh, dogpile,
                        google, google-certificates, googleCSE, googleplus, google-profiles,
                        hunter, linkedin, netcraft, pgp, threatcrowd,
                        twitter, vhost, virustotal, yahoo, all
       -g: use Google dorking instead of normal Google search
       -s: start in result number X (default: 0)
       -v: verify host name via DNS resolution and search for virtual hosts
       -f: save the results into an HTML and XML file (both)
       -n: perform a DNS reverse query on all ranges discovered
       -c: perform a DNS brute force for the domain name
       -t: perform a DNS TLD expansion discovery
       -e: use this DNS server
       -p: port scan the detected hosts and check for Takeovers (80,443,22,21,8080)
       -l: limit the number of results to work with(Bing goes from 50 to 50 results,
            Google 100 to 100, and PGP doesn't use this option)
       -h: use SHODAN database to query discovered hosts

Examples:
        theharvester -d microsoft.com -l 500 -b google -f myresults.html
        theharvester -d microsoft.com -b pgp, virustotal
        theharvester -d microsoft -l 200 -b linkedin
        theharvester -d microsoft.com -l 200 -g -b google
        theharvester -d apple.com -b googleCSE -l 500 -s 300
        theharvester -d cornell.edu -l 100 -b bing -h

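The most important option is -d, because it specifies the domain to search.

Before searching we also need to choose a data source with -b. Give only the source name (for example google, not google.com). Besides search engines, the list includes social platforms such as twitter and linkedin; organizations often publish information there, and it will be picked up as well.

We should also cap the number of results with -l. Note that -l limits the number of results; it is not a thread count. Without a limit the tool issues many more queries, and once a search engine detects this it restricts you. Google is not alone here: Baidu, Twitter, Yandex and others do the same. Results are fetched page by page. According to the help text, Bing advances 50 results at a time and Google 100, so with a limit above 50 a Bing search fetches 50 results, then the next 50, and so on until the limit is reached.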
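The paging behaviour described above can be sketched in a few lines of shell. This is only an illustration of the arithmetic, not theharvester's own code; the function name page_offsets is hypothetical, and the page size of 50 is the Bing value quoted in the help text.

```shell
# Print the cumulative result counts a run with "-l LIMIT" would step through,
# assuming a fixed page size (50 for Bing according to the help text).
page_offsets() {
    limit=$1
    step=${2:-50}
    offset=$step
    while [ "$offset" -le "$limit" ]; do
        echo "Searching $offset results..."
        offset=$((offset + step))
    done
}

page_offsets 300 50
```

With a limit of 300 and a step of 50 this prints six lines, from 50 up to 300.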

To query the SHODAN database for information about the discovered hosts, use -h.

The tool ships with Kali. The command name is long, so use Tab completion. A sample run:

root@kali:~# theharvester -d sina.com -l 300 -b bing

Warning: Pycurl is not compiled against Openssl. Wfuzz might not work correctly when fuzzing SSL sites. Check Wfuzz's documentation for more information.


*******************************************************************
*                                                                 *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __| '_ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* theHarvester Ver. 3.0.6                                         *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* cmartorella@edge-security.com                                   *
*******************************************************************


found supported engines
[-] Starting harvesting process for domain: sina.com

[-] Searching in Bing:
	Searching 50 results...
	Searching 100 results...
	Searching 150 results...
	Searching 200 results...
	Searching 250 results...
	Searching 300 results...

Harvesting results
No IP addresses found


[+] Emails found:
------------------
No emails found
 
[+] Hosts found in search engines:
------------------------------------

Total hosts: 6

[-] Resolving hostnames IPs... 
 
finance.sina.com:10.10.10.10
mail.sina.com:123.126.45.14
mil.news.sina.com:empty
news.sina.com:10.10.10.10
sports.sina.com:10.10.10.10
www.sina.com:61.158.251.244

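Against Bing it searched in pages of 50 results. The "No IP addresses found" line means no raw IP addresses turned up in the results, and no published email addresses were found either. It did find six sina.com hosts and resolved each hostname to an IP address through DNS; a host that fails to resolve is shown as empty, as mil.news.sina.com is here.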
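For follow-up work it is often handy to keep only the hosts that actually resolved. The sketch below assumes the output has been saved to a file or is piped in, and that resolved hosts appear as host:ip lines exactly as in the sample run; the function name resolved_hosts is hypothetical.

```shell
# Read theharvester output on stdin and print only the hosts that resolved
# to an address, one per line as "ip<TAB>host", sorted.
resolved_hosts() {
    # Keep lines with exactly two colon-separated fields whose second field
    # looks like a dotted IPv4 address; this drops "empty" and banner lines.
    awk -F: 'NF == 2 && $2 ~ /^[0-9.]+$/ { print $2 "\t" $1 }' | sort
}

# Example with the host lines from the sample run:
resolved_hosts <<'EOF'
finance.sina.com:10.10.10.10
mail.sina.com:123.126.45.14
mil.news.sina.com:empty
news.sina.com:10.10.10.10
sports.sina.com:10.10.10.10
www.sina.com:61.158.251.244
EOF
```

The unresolved mil.news.sina.com entry is filtered out, leaving five lines.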

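Using these tools requires getting past the firewall; see my other articles for how to do that.

theharvester itself has no proxy support. If you are not behind a router-level global proxy, you need another tool, proxychains, to route its traffic through a proxy. A PPTP VPN also works as a global proxy.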
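A minimal sketch of the proxychains approach follows. The proxy address 127.0.0.1:1080 and the helper name proxied are assumptions for illustration; the config file is typically /etc/proxychains.conf or /etc/proxychains4.conf depending on the installed version, so check which one your system uses.

```shell
# Assumed entry at the end of the proxychains config file
# (hypothetical local SOCKS5 proxy; replace with your own):
#   [ProxyList]
#   socks5 127.0.0.1 1080

# Hypothetical helper: prefix any command with proxychains so that its
# TCP connections go through the proxy configured above.
proxied() {
    proxychains "$@"
}

# Example (not executed here):
#   proxied theharvester -d sina.com -l 100 -b google
```

Wrapping the call in a function keeps the proxied and direct invocations identical apart from the prefix, which makes it easy to switch between them.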

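Using theharvester:

As mentioned earlier, -l sets how many results to fetch. Do not run this command too often: when -l is too large, Google treats you as a crawler scraping its index and blocks you. The tool supports many data sources, so when one blocks us we can switch to another. Here we pick a different one, linkedin: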

theharvester  -d youku.com -l 50 -b linkedin

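This mode works by calling the Google search engine to search within LinkedIn.

That is how theharvester works: it drives bulk searches from the command line (do not use it too often, and watch the result limit, or you will be blocked). Its advantage is efficiency; it is far faster than searching by hand.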
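Since the advice is to switch data sources when one starts blocking, it can help to lay out the commands in advance. The sketch below only prints the commands (a dry run) rather than executing them; the function name harvest_plan is hypothetical, and the engine names must come from the -b list in the help text.

```shell
# Print one theharvester command per data source, without running anything.
harvest_plan() {
    domain=$1
    limit=$2
    shift 2
    for engine in "$@"; do
        echo "theharvester -d $domain -l $limit -b $engine"
    done
}

harvest_plan youku.com 50 bing linkedin
```

Running the printed commands one at a time, with a pause between them, keeps the query rate low.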

Tool 2: metagoofil

This tool has been removed from the default Kali install and has to be installed:

apt-get install metagoofil

Its options:

optional arguments:
  -h, --help            show this help message and exit
  -d DOMAIN             Domain to search.
  -e DELAY              Delay (in seconds) between searches. If it's too small
                        Google may block your IP, too big and your search may
                        take a while. DEFAULT: 30.0
  -f                    Save the html links to html_links_.txt
                        file.
  -i URL_TIMEOUT        Number of seconds to wait before timeout for
                        unreachable/stale pages. DEFAULT: 15
  -l SEARCH_MAX         Maximum results to search. DEFAULT: 100
  -n DOWNLOAD_FILE_LIMIT
                        Maximum number of files to download per filetype.
                        DEFAULT: 100
  -o SAVE_DIRECTORY     Directory to save downloaded files. DEFAULT is cwd,
                        "."
  -r NUMBER_OF_THREADS  Number of search threads. DEFAULT: 8
  -t FILE_TYPES         file_types to download
                        (pdf,doc,xls,ppt,odp,ods,docx,xlsx,pptx). To search
                        all 17,576 three-letter file extensions, type "ALL"
  -u [USER_AGENT]       User-Agent for file retrieval against -d domain.
                        no -u = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
                        -u = Randomize User-Agent
                        -u "My custom user agent 2.0" = Your customized User-Agent
  -w                    Download the files, instead of just viewing search
                        results.

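It has relatively few options. The commonly used ones are:

-d  the domain to search.

-t  the file types to search for and download (pdf, doc, xls, ppt, odp, ods, docx, xlsx, pptx and so on). Enter ALL to search every possible three-letter file extension, 17,576 in total.

-e  the delay between searches, in seconds. If it is too small Google may block your IP; if it is too large the search takes a long time. Defaults to 30 seconds.

-f  save the HTML links that were found to the file html_links_.txt. Note: this option does not take a file name.

-i  the timeout, in seconds, for unreachable or stale pages. Defaults to 15 seconds.

-l  the maximum number of search results. Defaults to 100.

-n  the maximum number of files to download per file type. Defaults to 100.

-o  the directory in which downloaded files are saved. Defaults to the current working directory (cwd).

-r  the number of search threads. Defaults to 8.

-w  download the files that were found instead of only listing the search results.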
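The figure of 17,576 quoted for -t ALL is simply the number of three-letter combinations of the 26 letters a to z, which a quick shell calculation confirms:

```shell
# 26 choices for each of the three letters gives 26^3 candidate extensions.
letters=26
total=$((letters * letters * letters))
echo "$total"
```

So ALL does not mean every file type in existence, only every three-letter extension, and extensions such as docx or xlsx still have to be listed explicitly.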

Now let's run a search:

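metagoofil -d baidu.com -t txt,pdf -l 50 -n 50 -o test -f -w

This searches the baidu.com domain for txt and pdf files, limits the search to 50 results, downloads at most 50 files of each type, and saves the downloaded files in the test directory. The -w option is what makes it actually download the files; without it only the search results are listed. Finally, -f saves the links that were found to a text file whose name starts with html and ends in .txt.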
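After a run it is useful to check what actually landed in the output directory. The following is a small sketch using only standard tools; the function name count_by_ext is hypothetical, and test is the directory passed to -o in the command above.

```shell
# Count the files under a directory by extension, printed as "ext: count".
count_by_ext() {
    # List regular files that have an extension, strip everything up to the
    # last dot, then tally the remaining extensions.
    find "$1" -type f -name '*.*' \
        | sed 's/.*\.//' \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}

# Example (run after the download has finished):
#   count_by_ext test
```

Comparing these counts with the -n limit shows whether the search hit the per-type cap or simply found fewer files.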

Note that this command searches through Google, so it cannot be used without a proxy. As before, do not search in bulk, or you will be blocked.

Please credit the source when reposting!