I. CRUD
Most development work revolves around CRUD operations on data, namely:
- C – Create
- R – Retrieve or Read
- U – Update
- D – Delete
The table below maps each CRUD operation to its corresponding Elasticsearch HTTP/REST verb.
| CRUD command | HTTP/REST command |
|---|---|
| Create | PUT or POST |
| Read | GET |
| Update | PUT or POST |
| Delete | DELETE |
1. Create an index:
PUT /index_name
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
2. View an index
GET /index_name
3. Delete an index
DELETE /index_name
4. Create a document
POST /index_name/_doc
{
"field1": "value1",
"field2": "value2"
}
5. Get a document
GET /index_name/_doc/doc_id
6. Update a document
POST /index_name/_doc/doc_id/_update
{
"doc": {
"field1": "new_value1"
}
}
7. Delete a document:
DELETE /index_name/_doc/doc_id
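The commands above use the Kibana Dev Tools console syntax. As a rough sketch, the same requests can also be sent with curl, assuming Elasticsearch is listening on localhost:9200 (index_name and doc_id are placeholders, as above):

# Create an index with one shard and no replicas
curl -X PUT "http://localhost:9200/index_name" -H 'Content-Type: application/json' -d '{"settings":{"number_of_shards":1,"number_of_replicas":0}}'
# Fetch a document by ID
curl -X GET "http://localhost:9200/index_name/_doc/doc_id"
# Delete the index
curl -X DELETE "http://localhost:9200/index_name"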
II. Detailed CRUD Examples
1. Adding documents
1.1 Specifying the document ID
PUT blog/_doc/1
{
"title":"1、VMware Workstation虚拟机软件安装图解",
"author":"chengyuqiang",
"content":"1、VMware Workstation虚拟机软件安装图解...",
"url":"http://x.co/6nc81"
}
Elasticsearch returns a JSON response:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 2
}
Response fields:
- _index: the index the document was written to.
- _type: the document type.
- _id: the document ID.
- _version: the document version.
- result: created means the document was newly created.
- _shards: information about the replication of the index operation.
- total: the number of shard copies (primary and replicas) the index operation should be executed on.
- successful: the number of shard copies the index operation succeeded on.
- failed: contains replication-related errors when the operation failed on a replica shard.
1.2 Without specifying a document ID
A document can also be added without specifying an ID; Elasticsearch then generates a string ID automatically. Note that this requires the POST method rather than PUT.
POST blog/_doc
{
"title":"2、Linux服务器安装图解",
"author":"chengyuqiang",
"content":"2、Linux服务器安装图解解...",
"url":"http://x.co/6nc82"
}
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "5P2-O2gBNSQY7o-KMw2P",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
2. Getting documents
2.1 Get a document by ID
GET blog/_doc/1
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件安装图解",
"author" : "chengyuqiang",
"content" : "1、VMware Workstation虚拟机软件安装图解...",
"url" : "http://x.co/6nc81"
}
}
Response notes:
found is true, meaning the document was found; the _source field holds the document's content.
2.2 When the document does not exist
GET blog/_doc/2
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
A found value of false indicates that the requested document does not exist.
2.3 Check whether a document exists
HEAD blog/_doc/1
Output:
200 - OK
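The same existence check can also be done from the command line; a minimal sketch, assuming Elasticsearch on localhost:9200:

# Prints 200 if the document exists, 404 otherwise
curl -s -o /dev/null -w "%{http_code}\n" -I http://localhost:9200/blog/_doc/1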
3. Updating documents
3.1 A PUT to an existing ID replaces the entire document. Here document 1 is rewritten with the author field removed and the content field changed.
PUT blog/_doc/1
{
"title":"1、VMware Workstation虚拟机软件安装图解",
"content":"下载得到VMware-workstation-full-15.0.2-10952284.exe可执行文件...",
"url":"http://x.co/6nc81"
}
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
_version is now 2. Fetching the document with GET blog/_doc/1 returns:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件安装图解",
"content" : "下载得到VMware-workstation-full-15.0.2-10952284.exe可执行文件...",
"url" : "http://x.co/6nc81"
}
}
3.2 To avoid overwriting an existing document when adding one, use the _create endpoint:
PUT blog/_doc/1/_create
{
"title":"1、VMware Workstation虚拟机软件安装图解",
"content":"下载得到VMware-workstation-full-15.0.2-10952284.exe可执行文件...",
"url":"http://x.co/6nc81"
}
The document already exists, so the request fails:
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[_doc][1]: version conflict, document already exists (current version [2])",
"index_uuid": "GqC2fSqPS06GRfTLmh1TLg",
"shard": "1",
"index": "blog"
}
],
"type": "version_conflict_engine_exception",
"reason": "[_doc][1]: version conflict, document already exists (current version [2])",
"index_uuid": "GqC2fSqPS06GRfTLmh1TLg",
"shard": "1",
"index": "blog"
},
"status": 409
}
3.3 Updating document fields
A script can update a specific field. ctx is the context object exposed to the scripting language; it provides access to _source, whose content field is modified here:
POST blog/_doc/1/_update
{
  "script": {
    "source": "ctx._source.content=\"从官网下载VMware-workstation,双击可执行文件进行安装...\""
  }
}
The response:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
Fetching the document again with GET blog/_doc/1 returns:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件安装图解",
"content" : "从官网下载VMware-workstation,双击可执行文件进行安装...",
"url" : "http://x.co/6nc81"
}
}
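As a side note, the same update can be written with script params, which keeps the value out of the script source (Elasticsearch caches compiled scripts, so parameterized scripts avoid recompilation when only the value changes). A minimal sketch using the same content value; the parameter name new_content is arbitrary:

POST blog/_doc/1/_update
{
  "script": {
    "source": "ctx._source.content = params.new_content",
    "params": {
      "new_content": "从官网下载VMware-workstation,双击可执行文件进行安装..."
    }
  }
}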
3.4 Adding a field
POST blog/_doc/1/_update
{
  "script": {
    "source": "ctx._source.author=\"chengyuqiang\""
  }
}
Fetching the document again with GET blog/_doc/1 returns:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 4,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件安装图解",
"content" : "从官网下载VMware-workstation,双击可执行文件进行安装...",
"url" : "http://x.co/6nc81",
"author" : "chengyuqiang"
}
}
3.5 Removing a field
POST blog/_doc/1/_update
{
  "script": {
    "source": "ctx._source.remove(\"url\")"
  }
}
Fetching the document again with GET blog/_doc/1 returns:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件安装图解",
"content" : "从官网下载VMware-workstation,双击可执行文件进行安装...",
"author" : "chengyuqiang"
}
}
4. Deleting documents
DELETE blog/_doc/1
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 6,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1
}
Checking for the document again with HEAD blog/_doc/1 now returns 404 - Not Found.
5. Bulk operations
When the number of documents is very large, Elasticsearch provides bulk mechanisms: mget retrieves multiple documents in one request, and the Bulk API performs bulk indexing, deletes, and updates. In other words, the Bulk API allows multiple create, index, update, or delete requests to be made in a single step.
The bulk request body differs slightly from other request bodies; its format is:
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...
This format is essentially a stream of valid single-line JSON documents joined together by newline characters (\n). Two points to note:
- Every line must end with a newline (\n), including the last one. These newlines act as markers that separate the individual lines.
- The lines must not contain unescaped newline characters, since they would interfere with parsing. This means the JSON must not be pretty-printed.
The action/metadata line specifies which operation to perform on which document; the metadata gives the _index, _type, and _id of the document to be indexed, created, updated, or deleted. The request body line consists of the document's _source itself, i.e. its fields and values; it is required for index and create operations.
5.1 Bulk import
POST /_bulk
{ "create": { "_index": "blog", "_type": "_doc", "_id": "1" }}
{ "title": "1、VMware Workstation虚拟机软件安装图解" ,"author":"chengyuqiang","content":"官网下载VMware-workstation,双击可执行文件进行安装" , "url":"http://x.co/6nc81" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "2" }}
{ "title": "2、Linux服务器安装图解" ,"author": "chengyuqiang" ,"content": "VMware模拟Linux服务器安装图解" , "url": "http://x.co/6nc82" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "3" }}
{ "title": "3、Xshell 6 个人版安装与远程操作连接服务器" , "author": "chengyuqiang" ,"content": "Xshell 6 个人版安装与远程操作连接服务器..." , "url": "http://x.co/6nc84" }
The Elasticsearch response contains an items array that lists the result of each request in the order in which they were submitted:
{
"took" : 132,
"errors" : false,
"items" : [
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 7,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
]
}
5.2 A mixed bulk request: delete, update, index, and create
POST /_bulk
{ "delete": { "_index": "blog", "_type": "_doc", "_id": "1" }}
{ "update": { "_index": "blog", "_type": "_doc", "_id": "3", "retry_on_conflict" : 3} }
{ "doc" : {"title" : "Xshell教程"} }
{ "index": { "_index": "blog", "_type": "_doc", "_id": "4" }}
{ "title": "4、CentOS 7.x基本设置" ,"author":"chengyuqiang","content":"CentOS 7.x基本设置","url":"http://x.co/6nc85" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "5" }}
{ "title": "5、图解Linux下JDK安装与环境变量配置","author":"chengyuqiang" ,"content": "图解JDK安装配置" , "url": "http://x.co/6nc86" }
In version 7.0, the retry_on_conflict parameter replaces the earlier _retry_on_conflict. The response:
{
"took" : 125,
"errors" : false,
"items" : [
{
"delete" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1,
"status" : 200
}
},
{
"update" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1,
"status" : 200
}
},
{
"index" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "5",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 1,
"status" : 201
}
}
]
}
6. Multi-get (mget)
GET blog/_doc/_mget
{
"ids" : ["1", "2","3"]
}
Document 1 was deleted earlier, so it is not found:
{
"docs" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"found" : false
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "2、Linux服务器安装图解",
"author" : "chengyuqiang",
"content" : "VMware模拟Linux服务器安装图解",
"url" : "http://x.co/6nc82"
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_version" : 2,
"found" : true,
"_source" : {
"title" : "Xshell教程",
"author" : "chengyuqiang",
"content" : "Xshell 6 个人版安装与远程操作连接服务器...",
"url" : "http://x.co/6nc84"
}
}
]
}
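mget also accepts a docs array, which is useful when the documents come from different indices; a minimal sketch equivalent to the request above for IDs 2 and 3, assuming Elasticsearch 7.x (where the type can be omitted):

GET _mget
{
  "docs": [
    { "_index": "blog", "_id": "2" },
    { "_index": "blog", "_id": "3" }
  ]
}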
7. Basic searches
7.1 Term query
Example 1:
GET blog/_search
{
"query": {
"term": {
"title": "centos"
}
}
}
Output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.71023846,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.71023846,
"_source" : {
"title" : "4、CentOS 7.x基本设置",
"author" : "chengyuqiang",
"content" : "CentOS 7.x基本设置",
"url" : "http://x.co/6nc85"
}
}
]
}
}
Example 2:
GET blog/_search
{
"query": {
"term": {
"title": "远程"
}
}
}
Output:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Example 3:
GET blog/_search
{
"query": {
"term": {
"title": "程"
}
}
}
Output:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.3486402,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.3486402,
"_source" : {
"title" : "Xshell教程",
"author" : "chengyuqiang",
"content" : "Xshell 6 个人版安装与远程操作连接服务器...",
"url" : "http://x.co/6nc84"
}
}
]
}
}
7.2 Match query
Unlike the exact term query, a match query analyzes the query text first; a document is returned if any of the resulting terms matches the queried field. (The default standard analyzer splits Chinese text into single-character tokens, which is why the term query for 远程 in Example 2 found nothing while the single character 程 in Example 3 matched.)
GET blog/_search
{
"query": {
"match": {
"title": {
"query": "远程"
}
}
}
}
Output:
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.3486402,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.3486402,
"_source" : {
"title" : "Xshell教程",
"author" : "chengyuqiang",
"content" : "Xshell 6 个人版安装与远程操作连接服务器...",
"url" : "http://x.co/6nc84"
}
}
]
}
}
7.3 Range query
Finds documents whose values in a given field satisfy range conditions (greater than, less than, greater than or equal to, less than or equal to). The remaining examples show only the query body, which is sent as the body of a GET <index>/_search request as above:
{
"query": {
"range": {
"price": {
"gte": 100,
"lte": 500
}
}
}
}
7.4 Bool query
Combines multiple query clauses with boolean logic (AND via must, OR via should, NOT via must_not) to express complex conditions; a should example is sketched after the query body below:
{
"query": {
"bool": {
"must": {
"match": {
"title": "Elasticsearch"
}
},
"must_not": {
"match": {
"category": "deprecated"
}
}
}
}
}
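The OR part of the logic maps to should clauses. A minimal sketch against the blog index from the earlier examples (the title values are only illustrative):

{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Elasticsearch" } },
        { "match": { "title": "Xshell" } }
      ],
      "minimum_should_match": 1
    }
  }
}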
7.5 Full-text query
Performs full-text search over text fields, handling analysis (tokenization) and relevance scoring of matches:
{
"query": {
"match": {
"content": "full text search"
}
}
}
7.6 Fuzzy query
Finds documents containing terms similar to the given term; the fuzziness parameter controls how much variation is allowed:
{
"query": {
"fuzzy": {
"title": {
"value": "elasticserch",
"fuzziness": "AUTO"
}
}
}
}
7.7 Exists query
Finds documents that contain the specified field:
{
"query": {
"exists": {
"field": "category"
}
}
}
7.8 Regexp query
Finds documents whose terms match the given regular expression:
{
"query": {
"regexp": {
"title": ".*search.*"
}
}
}
7.9 Geo queries
Finds documents within a given geographic area; variants include distance queries, bounding-box queries, and more:
{
"query": {
"geo_distance": {
"distance": "10km",
"location": {
"lat": 40,
"lon": -70
}
}
}
}
7.10 Aggregations
Groups and computes statistics over matching documents, such as sums, averages, maximums, and minimums:
{
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
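Applied to the blog index from the earlier examples, a terms aggregation could count posts per author; a minimal sketch, assuming the default dynamic mapping (which creates an author.keyword sub-field) and using size: 0 to skip returning hits:

GET blog/_search
{
  "size": 0,
  "aggs": {
    "posts_per_author": {
      "terms": { "field": "author.keyword" }
    }
  }
}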
7.11 Script query
Uses a script to implement custom filtering logic:
{
"query": {
"script": {
"script": {
"source": "doc['price'].value > 100"
}
}
}
}
8. refresh
8.1 Refresh immediately so the document is visible
These requests create a document and immediately refresh the index, making the document visible to search:
DELETE test
PUT test/_doc/1?refresh
{"message": "测试文档1"}
PUT test/_doc/2?refresh=true
{"message": "测试文档2"}
8.2 No refresh
These requests create a document without doing anything to make it searchable right away:
PUT test/_doc/3
{"message": "测试文档3"}
PUT test/_doc/4?refresh=false
{"message": "测试文档4"}
8.3 Wait for a refresh
With refresh=wait_for, the request does not return until a refresh has made the document visible to search:
PUT test/_doc/5?refresh=wait_for
{"message": "测试文档5"}
III. Data Import
3.1 Data conversion
There are two ways to import the data:
- Convert the .json data files into the format required by Elasticsearch's Bulk API.
- Parse the .json data files with a JSON library (for example Gson) to extract their values, then import the data through Elasticsearch's REST API.
Elasticsearch's Bulk API requires the data to be in a specific format:
{"index":{"_id":4800770}}
{"Rcvr":1,"HasSig":false,"Icao":"494102","Bad":false,"Reg":"CS-PHB", ...}
...
This means every downloaded JSON file has to be converted to the format above. Two changes are required:
- Insert a line beginning with index before each document.
- Change "Id":<value> into {"_id":<value>} on that index line.
A simple Java program can quickly convert the JSON files into this format:
package com.jgc;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

import static java.util.stream.Collectors.toList;

/**
 * Converts a flight data json file to a format that can be imported to
 * ElasticSearch using its bulk API.
 */
public class JsonFlightFileConverter {

    private static final Path flightDataJsonFile =
            Paths.get("src/main/resources/flightdata/2016-07-01-1300Z.json");

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        try (Stream<String> stream = Files.lines(flightDataJsonFile.toAbsolutePath())) {
            list = stream
                    .map(line -> line.split("\\{"))   // split each line at every opening brace
                    .flatMap(Arrays::stream)
                    .collect(toList());
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(list);
    }
}
Then, with some simple string concatenation, we can produce the output we are after:
final String result = list.stream().skip(3)
.map(s -> "{" + s + "\n")
.collect(Collectors.joining());
System.out.println(result);
The output is now very close to the desired result:
{"Id":4800770,"Rcvr":1,"HasSig":false,"Icao":"494102", ...
In fact, we can fold that last snippet into the original stream pipeline, like this:
String result = "";
try (Stream<String> stream = Files.lines(flightDataJsonFile.toAbsolutePath())) {
    result = stream
            .map(line -> line.split("\\{"))
            .flatMap(Arrays::stream)
            .skip(3)
            .map(s -> "{" + s + "\n")
            .collect(Collectors.joining());
} catch (IOException e) {
    e.printStackTrace();
}
Next, we need to insert a new line above each document line that contains the bulk index action for that document, like this:
{"index":{"_id":4800770}}
A small helper function keeps this tidy:
private static String insertIndex(String s) {
    final String[] keyValues = s.split(",");
    final String[] idKeyValue = keyValues[0].split(":");
    return "{\"index\":{\"_id\":" + idKeyValue[1] + "}}\n";
}
This converts each input fragment into the action line we need. One more detail remains: the trailing comma has to be removed from each document.
private static String removeLastComma(String s) {
return s.charAt(s.length() - 1) == ',' ? s.substring(0, s.length() - 1) : s;
}
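The complete program below calls a createResult helper that is not shown explicitly; a plausible sketch, assuming it simply chains the two helpers above and re-adds the opening brace stripped by the split:

private static String createResult(String s) {
    // Action line with the document id, then the document itself on its own line
    return insertIndex(s) + "{" + removeLastComma(s) + "\n";
}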
At this point, the conversion program looks like this:
public class JsonFlightFileConverter {

    public static void main(String[] args) {
        if (args.length == 1) {
            Path inDirectoryPath = Paths.get(args[0]);
            Path outDirectoryPath = Paths.get(inDirectoryPath.toString(), "out");
            try {
                // Recreate the output directory on each run
                if (Files.exists(outDirectoryPath)) {
                    Files.walk(outDirectoryPath)
                            .sorted(Comparator.reverseOrder())
                            .map(Path::toFile)
                            .forEach(File::delete);
                }
                Files.createDirectory(outDirectoryPath);
            } catch (IOException e) {
                e.printStackTrace();
            }
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(inDirectoryPath, "*.json")) {
                for (Path inFlightDataJsonFile : ds) {
                    String result = "";
                    try (Stream<String> stream =
                                 Files.lines(inFlightDataJsonFile.toAbsolutePath())) {
                        result = stream
                                .parallel()
                                .map(line -> line.split("\\{"))
                                .flatMap(Arrays::stream)
                                .skip(3)
                                .map(s -> createResult(s))
                                .collect(Collectors.joining());
                        Path outFlightDataJsonFile =
                                Paths.get(outDirectoryPath.toString(),
                                        inFlightDataJsonFile.getFileName().toString());
                        Files.createFile(outFlightDataJsonFile);
                        Files.writeString(outFlightDataJsonFile, result); // requires Java 11+
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            System.out.println("Usage: java JsonFlightFileConverter <input-directory>");
        }
...
3.2 Importing the data with Elasticsearch's Bulk API
The file must end with a newline; if it does not, add one (the program above already appends a newline after the last line). From the directory that contains the newly generated .json files (the out directory), run:
curl -H "Content-Type: application/x-ndjson" -XPOST http://localhost:9200/flight/_bulk --data-binary "@2016-07-01-1300Z.json"
Note that the content type is application/x-ndjson, not application/x-json. Also note that the data is passed as binary (--data-binary) so that the newlines are preserved. The file name is 2016-07-01-1300Z.json. Any existing document in Elasticsearch with the same ID will be replaced by the document from the .json file.
Finally, a search against the flight index shows that 7679 documents were imported:
"hits" : {
"total" : {
"value" : 7679,
"relation" : "eq"
},
GET /_cat/shards?v
The result:
index shard prirep state docs store ip node
flight 0 p STARTED 7679 71mb 127.0.0.1 MacBook-Pro.local
flight 0 r UNASSIGNED
Another way to import these documents is to parse the JSON data files in memory and push them into Elasticsearch through its REST API.
There are many libraries available for parsing JSON in Java:
- GSon
- Jackson
- mJson
- JSON-Simple
- JSON-P
We will use Google's Gson library, but any other JSON library can do the job. Gson offers several ways to represent JSON data; which one to use depends on the next step, i.e. how the data will be imported into Elasticsearch. The approach used here expects the data as a Map<String, Object>, which is what we parse the JSON into.
First, add the following dependency to pom.xml:
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.6</version>
</dependency>
The following code parses the JSON data:
package com.jcg;
import com.google.gson.Gson;
import com.google.gson.internal.LinkedTreeMap;
import com.google.gson.reflect.TypeToken;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
public class JsonFlightFileReader {
private static final String flightDataJsonFile = "src/main/resources/flightdata/2016-07-01-1300Z.json";
private static final Gson gson = new Gson();
public static void main(String[] args) {
parseJsonFile(flightDataJsonFile);
}
private static void parseJsonFile(String file) {
try (BufferedReader reader = Files.newBufferedReader(Paths.get(file))) {
Map<String, Object> map = gson.fromJson(reader,
new TypeToken<Map<String, Object>>() { }.getType());
List<Object> acList = (List<Object>) (map.get("acList"));
for (Object item : acList) {
LinkedTreeMap<String, Object> flight =
(LinkedTreeMap<String, Object>) item;
for (Map.Entry<String, Object> entry : flight.entrySet()) {
String key = entry.getKey();
Object value = entry.getValue();
String outEntry = (key.equals("Id") ? "{" + key : key) + " : " + value + ", ";
System.out.print(outEntry);
}
System.out.println("}");
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
The parsed data can then be accessed as follows:
Map<String, Object> map = gson.fromJson(reader, new TypeToken<Map<String, Object>>() {}.getType());
List<Object> acList = (List<Object>) (map.get("acList"));
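To hand each parsed record to the REST client in the next section, it can be serialized back to a JSON string with Gson; a minimal sketch (the field name Id comes from the data file, and gson is the Gson instance created above):

for (Object item : acList) {
    Map<String, Object> flight = (Map<String, Object>) item;
    Object id = flight.get("Id");              // document ID for /flight/_doc/{id}
    String jsonDoc = gson.toJson(flight);      // JSON body for one document
    // jsonDoc and id can now be used with the Request shown in the next section
}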
3.3 Importing the data with the Elasticsearch REST client
First, add the following dependency to pom.xml:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>7.10.0</version>
</dependency>
We interact with Elasticsearch through a RestClient:
RestClient restClient = RestClient.builder(
        new HttpHost("localhost", 9200, "http"))
.setDefaultHeaders(new Header[]{
new BasicHeader("accept", "application/json"),
new BasicHeader("content-type", "application/json")})
.setFailureListener(new RestClient.FailureListener() {
public void onFailure(Node node) {
System.err.println("Low level Rest Client Failure on node " +
node.getName());
}
}).build();
Once the RestClient is built, the next step is to create a Request and pass it the JSON document:
Request request = new Request("POST", "/flight/_doc/4800770");
String jsonDoc = "{\"Rcvr\":1,\"HasSig\":false,\"Icao\":\"494102\",...]}";
request.setJsonEntity(jsonDoc);
Finally, we send the request. There are two ways to do this.
Synchronously:
Response response = restClient.performRequest(request);
if (response.getStatusLine().getStatusCode() != 200) {
System.err.println("Could not add document with Id: " + id + " to index /flight");
}
Asynchronously:
Cancellable cancellable = restClient.performRequestAsync(request,
new ResponseListener() {
@Override
public void onSuccess(Response response) {
System.out.println("Document with Id: " + id + " was successfully added to index /flight");
}
@Override
public void onFailure(Exception exception) {
System.err.println("Could not add document with Id: " + id + " to index /flight");
}
});
Finally, do not forget to close the restClient connection:
} finally {
try {
restClient.close();
} catch (IOException e) {
e.printStackTrace();
}
}
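Since RestClient implements Closeable, a try-with-resources block is an alternative that closes the client automatically; a minimal sketch, assuming the same local cluster and the flight index created earlier (imports as in the snippets above, plus java.io.IOException):

try (RestClient restClient = RestClient.builder(
        new HttpHost("localhost", 9200, "http")).build()) {
    Request request = new Request("GET", "/flight/_count");
    Response response = restClient.performRequest(request);
    System.out.println(response.getStatusLine());   // e.g. HTTP/1.1 200 OK
} catch (IOException e) {
    e.printStackTrace();
}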