Elasticsearch Fundamentals

This is day 15 of my participation in the August Writing Challenge. For event details, see: August Writing Challenge

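Introduction

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache License, and it is a popular enterprise search engine. It is widely used in cloud environments and offers near-real-time search that is stable, reliable, fast, and easy to install and use. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.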

ES Core Concepts

Elasticsearch is a document-oriented database: each record is a document. For example:

{
    "name" :     "John",
    "sex" :      "Male",
    "age" :      25,
    "birthDate": "1990/05/01",
    "about" :    "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

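In MySQL, storing data like this would naturally suggest creating a User table with a set of columns. In ES, the same record is a document; the document belongs to a User type, and types of various kinds are stored in an index. The table below maps relational database concepts to their ES counterparts: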

Relational database	Elasticsearch
Database	Index
Table	Type
Row	Document
Column	Field

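ES can contain multiple indices (databases); each index can contain multiple types (tables); each type contains multiple documents (rows); and each document contains multiple fields (columns). Note that mapping types were deprecated in Elasticsearch 7.x and removed in 8.0, so this hierarchy describes the older model.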
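To make the mapping in the table concrete, here is a minimal sketch that turns one relational row into an ES-style document. The helper name row_to_document and the sample columns are hypothetical; they only illustrate that columns become fields and a row becomes a document.

```python
def row_to_document(columns, row):
    """Pair each column name with its value: columns become fields, the row becomes a document."""
    if len(columns) != len(row):
        raise ValueError("columns and row must have the same length")
    return dict(zip(columns, row))


# Hypothetical columns of a User table and one of its rows
columns = ["name", "sex", "age"]
row = ("John", "Male", 25)

document = row_to_document(columns, row)
print(document)
```

The resulting dict has the same shape as the JSON document shown earlier, just with fewer fields.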

Physical design:

Behind the scenes, ES splits each index into multiple shards, and each shard can be moved between the servers in the cluster.
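As a rough illustration of what splitting an index into shards means, the sketch below assigns documents to shards by hashing a routing key and taking the remainder. This is a simplification under stated assumptions: pick_shard is a hypothetical helper, zlib.crc32 is only a stand-in hash (Elasticsearch uses its own hash function internally), and the shard count of 5 is arbitrary.

```python
import zlib


def pick_shard(routing_key, number_of_primary_shards):
    """Map a routing key (by default the document ID) to a shard number."""
    # zlib.crc32 is only a stand-in hash for this sketch
    hashed = zlib.crc32(routing_key.encode("utf-8"))
    return hashed % number_of_primary_shards


# Spread ten hypothetical document IDs over 5 primary shards
for doc_id in range(1, 11):
    print(doc_id, "-> shard", pick_shard(str(doc_id), 5))
```

Because the shard number depends only on the key and the shard count, the same document always lands on the same shard, no matter which server currently hosts that shard.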

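Logical design:

An index type contains multiple documents. When we index a document, it can be located in this order: index -> type -> document ID (the ID is actually a string). This combination identifies one specific document.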
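The lookup order above can be sketched as a small helper that joins the three parts into one address. The name document_path is hypothetical; the index news and the type _doc are borrowed from the examples later in this article.

```python
def document_path(index, doc_type, doc_id):
    """Build the logical address index/type/id of one document."""
    # The ID is always treated as a string, even when a number is passed in
    return "/{}/{}/{}".format(index, doc_type, str(doc_id))


print(document_path("news", "_doc", 1))  # prints /news/_doc/1
```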

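Creating an Index

Create an index named news:

from elasticsearch import Elasticsearch

# Connect with the client defaults (localhost:9200 in the 7.x client)
es = Elasticsearch()
# ignore=400 suppresses the exception raised when the index already exists
result = es.indices.create(index='news', ignore=400)
print(result)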

If creation succeeds, the following result is returned:

{
    "acknowledged":true,
    "shards_acknowledged":true,
    "index":"news"
}

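The result is in JSON format, and the acknowledged field indicates that the create operation succeeded.

However, if we run the code a second time, it returns the following: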

{
    "error":{
        "root_cause":[
            {
                "type":"resource_already_exists_exception",
                "reason":"index [news/QM6yz2W8QE-bflKhc5oThw] already exists",
                "index_uuid":"QM6yz2W8QE-bflKhc5oThw",
                "index":"news"
            }
        ],
        "type":"resource_already_exists_exception",
        "reason":"index [news/QM6yz2W8QE-bflKhc5oThw] already exists",
        "index_uuid":"QM6yz2W8QE-bflKhc5oThw",
        "index":"news"
    },
    "status":400
}

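It reports that creation failed: the status code is 400, and the cause is that the index already exists.

Note that the code passes ignore=400. This means that when the response status is 400, the error is ignored and the program does not raise an exception.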
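Because ignore=400 turns the failure into an ordinary return value, the calling code has to inspect the response itself to know what happened. A minimal sketch, assuming the response dicts have the shapes shown above; index_was_created is a hypothetical helper name.

```python
def index_was_created(result):
    """Return True only when the response acknowledges the create request."""
    return bool(result.get("acknowledged")) and "error" not in result


# Abbreviated copies of the two responses shown above
success = {"acknowledged": True, "shards_acknowledged": True, "index": "news"}
failure = {"error": {"type": "resource_already_exists_exception"}, "status": 400}

print(index_was_created(success))  # True
print(index_was_created(failure))  # False
```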

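If we omit the ignore parameter:

from elasticsearch import Elasticsearch

es = Elasticsearch()
# No ignore argument: a 400 response raises RequestError
result = es.indices.create(index='news')
print(result)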

Running it again raises an error:

raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'resource_already_exists_exception', 'index [news/QM6yz2W8QE-bflKhc5oThw] already exists')

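This interrupts the program. Used carefully, the ignore parameter lets us rule out expected failure cases like this one, so that the program keeps running instead of aborting.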
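An alternative to ignoring the 400 response is to check whether the index exists before creating it. The sketch below is one possible approach, not a replacement for ignore: ensure_index is a hypothetical helper, and it assumes es is a client object exposing indices.exists() and indices.create() as the elasticsearch Python client does.

```python
def ensure_index(es, name):
    """Create the index only if it is missing; return True when it was created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name)
    return True


# Usage with a running cluster (not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# ensure_index(es, 'news')
```

The check and the create are two separate requests, so another client could still create the index in between. For that reason, passing ignore=400 remains the simpler option when the only goal is to tolerate an existing index.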

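Deleting an Index

Deleting an index works similarly. The code is as follows:

from elasticsearch import Elasticsearch

es = Elasticsearch()
# ignore=[404] suppresses the exception raised when the index does not exist
result = es.indices.delete(index='news', ignore=[404])
print(result)

Here the ignore parameter is used again, so that a failed delete caused by a missing index does not interrupt the program.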

If deletion succeeds, the output is:

{
    "acknowledged":true
}

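Inserting Data

Like MongoDB, Elasticsearch lets you insert structured dictionary data directly.

from elasticsearch import Elasticsearch

es = Elasticsearch()
# Make sure the index exists; ignore=400 tolerates "already exists"
es.indices.create(index='news', ignore=400)

# A document is a plain dict
data = {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm'}
# create() stores the dict as the document with ID 1
result = es.create(index='news', id=1, body=data)
print(result)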

When calling create(), the index parameter is the index name, body is the content of the document, and id is the unique identifier of the record.

The result is as follows:

{
    "_index":"news",
    "_type":"_doc",
    "_id":"1",
    "_version":1,
    "result":"created",
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "_seq_no":0,
    "_primary_term":1
}

The result field is created, which means the data was inserted successfully.
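The same check can be done in code rather than by eye. A minimal sketch, assuming the response has the shape shown above; insert_succeeded is a hypothetical helper that looks at both the result field and the failed counter under _shards.

```python
def insert_succeeded(result):
    """Check both the result field and the shard counters of a write response."""
    shards = result.get("_shards", {})
    return result.get("result") == "created" and shards.get("failed", 0) == 0


# The response shown above
response = {
    "_index": "news", "_type": "_doc", "_id": "1", "_version": 1,
    "result": "created",
    "_shards": {"total": 2, "successful": 1, "failed": 0},
    "_seq_no": 0, "_primary_term": 1,
}
print(insert_succeeded(response))  # True
```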