Working with Elasticsearch in Python

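Elasticsearch is a distributed search server built on Lucene. Lucene is an excellent full-text search engine and a top-level Apache project, and all kinds of full-text search applications can be built on top of it.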

1. Install the elasticsearch package

Running pip install elasticsearch is all that is needed.

2. Establish a connection

For example, if Elasticsearch is deployed on the two servers below, the connection is created by passing both hosts to the client.

from elasticsearch import Elasticsearch
es = Elasticsearch(['192.168.110', '192.168.120'])
es.count()

The output is: {'_shards': {'failed': 0, 'successful': 380, 'total': 380}, 'count': 326714007}
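A minimal sketch of the connection step, assuming the nodes listen on the default HTTP port 9200 without TLS or authentication. Recent versions of the Python client expect each host as a full URL, so the hypothetical helper to_url normalizes bare host names. Both helper names and the host names in the usage comment are placeholders, not part of the original setup.

```python
def to_url(host, scheme="http", port=9200):
    """Turn a bare host name (no port) into a URL; leave full URLs untouched."""
    if "://" in host:
        return host
    return f"{scheme}://{host}:{port}"


def connect(hosts):
    """Create a client for the given nodes (hypothetical helper)."""
    # Imported lazily so to_url can be used without the package installed.
    from elasticsearch import Elasticsearch
    return Elasticsearch([to_url(h) for h in hosts])


# Usage, assuming a reachable cluster (host names are placeholders):
# es = connect(["es-node-1.example", "es-node-2.example"])
# print(es.ping())   # True if a node answered
# print(es.count())  # document count across all indices
```

Calling es.ping() right after connecting is a cheap way to find out whether the client can reach any node before running real queries.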

3. Query records

count

(1) Query the full data set and count the records

es.count(index='cleaned', body={"query":{"match_all":{}}})

Output:

{'_shards': {'failed': 0, 'successful': 5, 'total': 5}, 'count': 103040179}

(2) Count the records that satisfy a condition

es.count(index='cleaned', body={"query":{"match_phrase":{"ABS":"machine learning"}}})

Output:

{'_shards': {'failed': 0, 'successful': 5, 'total': 5}, 'count': 2752}
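The counting step can be wrapped so that only the integer is returned. This is a sketch, assuming es is a connected client and that the body= keyword is accepted as in the calls above (newer client versions may prefer passing the query through a query= keyword). The helper names phrase_query and count_matching are made up for illustration.

```python
def phrase_query(field, phrase):
    """Build a match_phrase query body for a single field."""
    return {"query": {"match_phrase": {field: phrase}}}


def count_matching(es, index, field, phrase):
    """Return only the integer count for a phrase query.

    `es` is an Elasticsearch client, or any object with a compatible
    count(index=..., body=...) method.
    """
    response = es.count(index=index, body=phrase_query(field, phrase))
    return response["count"]


# Usage, assuming `es` is connected as in section 2:
# n = count_matching(es, "cleaned", "ABS", "machine learning")
```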

match

(1) Query all documents with match_all, whose value is an empty object

es.search(index='cleaned', body={"query":{"match_all":{}}})

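(2) Search for the whole phrase with match_phrase

es.search(index='cleaned', body={"query":{"match_phrase":{"ABS":"machine learning"}}})

Matching is case-insensitive here (the default analyzer lowercases text), so the results may contain machine learning as well as Machine Learning.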

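(3) Search with automatic tokenization, using match

es.search(index='cleaned', body={"query":{"match":{"ABS":"machine learning"}}})

Because match splits the query into separate terms and by default needs only one of them to match, a hit may contain only machine or only learning.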
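To make the difference between the three query types concrete, here is a small sketch of builders for their request bodies. The function names are made up for illustration. The operator option of match is part of the Elasticsearch query DSL: setting it to "and" requires every term to appear, though unlike match_phrase it does not require the terms to be adjacent or in order.

```python
def match_all():
    """Body that matches every document."""
    return {"query": {"match_all": {}}}


def match_phrase(field, text):
    """Body that matches the terms as one phrase, in order."""
    return {"query": {"match_phrase": {field: text}}}


def match(field, text, operator="or"):
    """Body for an analyzed match query.

    operator="or":  a document hits if any term matches (the default).
    operator="and": a document hits only if all terms match.
    """
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# Usage, assuming `es` is connected and the ABS field exists:
# es.search(index="cleaned", body=match("ABS", "machine learning", operator="and"))
```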

filter_path

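This parameter filters the response. If it is omitted, the response contains every field, much like select * in a SQL query. Setting it selects which fields appear in the output, similar to the way MongoDB limits the returned fields by setting them to 1.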

es.search(index='cleaned',body={"query":{"match_phrase":{"content":"machine learning"}}},filter_path=['hits.hits._source.PAO'])

This restricts the output to the PAO field under hits -> hits -> _source.
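A filtered response keeps its nesting, so the values still have to be pulled out of hits -> hits -> _source. The sketch below assumes the response can be read like a plain dict (as with the 7.x client) and that hits lacking the field, or the whole hits section, may be absent from a filtered response. The helper name source_values is hypothetical.

```python
def source_values(response, field):
    """Collect one _source field from every hit of a search response."""
    values = []
    # .get() with defaults tolerates a response from which empty
    # sections were filtered out.
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        if field in source:
            values.append(source[field])
    return values


# Usage, assuming `es` is connected:
# resp = es.search(
#     index="cleaned",
#     body={"query": {"match_phrase": {"content": "machine learning"}}},
#     filter_path=["hits.hits._source.PAO"],
# )
# print(source_values(resp, "PAO"))
```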

query_string

es.search(index='wipoclean',body={"query":{'query_string':{'query':'CN1524236A','fields':['UID']}}})

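Here fields specifies which fields to search. If fields is omitted, Elasticsearch falls back to the automatically generated special field named "_all", which matches against all fields of every document, i.e. 'fields': '_all'. Note that the _all field exists only in older Elasticsearch versions; it was removed in 7.0, where a query_string without fields searches all eligible fields by default. The search scope can also be set by hand, for example 'fields': ['UID', '_id'] searches the two named fields.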

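These two query statements are equivalent. To summarize: match requires a search field, and only one field can be given. With query_string, fields is optional; if it is left out, all fields are searched, and the results may differ somewhat from what was expected.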

To search several fields with match-style semantics, use multi_match; these two statements are equivalent. To search all fields, use query_string.
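The two ways of searching more than one field can be sketched as request-body builders. The function names are made up for illustration, and the second field name in the usage comment (ABS) is only a placeholder that may not exist in the wipoclean index.

```python
def query_string(text, fields=None):
    """Body for a query_string query; with fields=None the search is
    not restricted to particular fields."""
    inner = {"query": text}
    if fields:
        inner["fields"] = list(fields)
    return {"query": {"query_string": inner}}


def multi_match(text, fields):
    """Body for a match-style query over several named fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


# Usage, assuming `es` is connected:
# es.search(index="wipoclean", body=query_string("CN1524236A", ["UID"]))
# es.search(index="wipoclean", body=query_string("CN1524236A"))
# es.search(index="wipoclean", body=multi_match("CN1524236A", ["UID", "ABS"]))
```

One difference worth keeping in mind: query_string parses its input as query syntax (operators such as AND, OR, and wildcards are interpreted), whereas multi_match treats the input as plain text.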
