Elasticsearch: Creating an index and searching with the new Elasticsearch Java client 8.0


In this article, I describe in detail how to use the latest Elasticsearch Java client 8.0 to create an index and run searches. The new Java client API differs from its predecessors: in earlier tutorials we used the High Level REST Client, which the official documentation now marks as deprecated.

Prerequisites

  • Java 8 or later
  • A JSON object-mapping library that lets your application classes integrate seamlessly with the Elasticsearch API. The Java client supports Jackson, or a JSON-B library such as Eclipse Yasson.

Releases are hosted on Maven Central. If you are looking for a SNAPSHOT version, the Elastic Maven snapshot repository is available at snapshots.elastic.co/maven/.
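As a sketch (my addition, not from the original article): to consume those SNAPSHOT builds, you would declare the snapshot repository in your pom.xml, using the URL given above:

```xml
<!-- Hypothetical fragment: enables resolution of Elastic SNAPSHOT artifacts -->
<repositories>
    <repository>
        <id>elastic-snapshots</id>
        <name>Elastic Maven Snapshot Repository</name>
        <url>https://snapshots.elastic.co/maven/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
```

For stable releases on Maven Central, no extra repository configuration is needed.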

Why a new Java client?

Many developers may wonder why a new client is needed at all; wasn't the old High Level REST Client working fine? That API had the following problems:

  • It shared a lot of code with the Elasticsearch server
    • It pulled in a large dependency tree (30+ MB), much of which was never used
    • It was easy to misuse: the API exposed many Elasticsearch server internals
  • The API was written by hand
    • It was sometimes inconsistent across versions
    • It required a lot of maintenance (400+ endpoints)
  • There was no JSON/object mapping integration
    • You had to do the mapping yourself with byte buffers

The new Java client API has the following advantages:

  • The API is generated from code
    • Based on the official Elasticsearch API specification
    • The Java client is the first of the new generation of Elasticsearch clients; clients for other languages will follow
    • 99% of the code is auto-generated
  • An opportunity to provide a more modern API
    • Fluent functional builders
    • A layered DSL that stays close to the Elasticsearch JSON format
    • Automatic mapping to and from application classes
    • Keeps Java 8 compatibility

Installation

If you do not yet have Elasticsearch and Kibana installed, please refer to my earlier articles:

If you want to try this on Elastic Stack 8.0, see the article "Elastic Stack 8.0 installation - securing your Elastic Stack is now easier than ever". In this article we do not enable HTTPS access, so you will want the section "How to configure Elasticsearch with basic security only" of that article; we configure Elasticsearch with basic security.

Demonstration

For today's demonstration I use a Maven project, although Gradle works as well. To make it easier to follow along, I have uploaded the project to GitHub: GitHub - liu-xiao-guo/ElasticsearchJava-search8

First, our pom.xml file looks like this:

pom.xml



<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>ElasticsearchJava-search8</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <elastic.version>8.0.1</elastic.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>${elastic.version}</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.3</version>
        </dependency>

        <!-- Needed only if you use the spring-boot Maven plugin -->
        <dependency>
            <groupId>jakarta.json</groupId>
            <artifactId>jakarta.json-api</artifactId>
            <version>2.0.1</version>
        </dependency>
    </dependencies>
</project>


As shown above, we use version 8.0.1. You can also use the latest version, 8.1.1, found via Maven Central Repository Search.

Next, we create a file called Product.java:

Product.java



public class Product {
    private String id;
    private String name;
    private int price;

    public Product() {
    }

    public Product(String id, String name, int price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }

    public String getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public int getPrice() {
        return price;
    }

    public void setId(String id) {
        this.id = id;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPrice(int price) {
        this.price = price;
    }

    @Override
    public String toString() {
        return "Product{" +
                "id='" + id + '\'' +
                ", name='" + name + '\'' +
                ", price=" + price +
                '}';
    }
}


Next, we create the ElasticsearchJava.java file:



import co.elastic.clients.elasticsearch.ElasticsearchAsyncClient;
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.query_dsl.QueryBuilders;
import co.elastic.clients.elasticsearch._types.query_dsl.TermQuery;
import co.elastic.clients.elasticsearch.core.*;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

import java.io.IOException;

public class ElasticsearchJava {

    private static ElasticsearchClient client = null;
    private static ElasticsearchAsyncClient asyncClient = null;

    private static synchronized void makeConnection() {
        // Create the low-level client
        final CredentialsProvider credentialsProvider =
                new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("elastic", "password"));

        RestClientBuilder builder = RestClient.builder(
                        new HttpHost("localhost", 9200))
                .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                    @Override
                    public HttpAsyncClientBuilder customizeHttpClient(
                            HttpAsyncClientBuilder httpClientBuilder) {
                        return httpClientBuilder
                                .setDefaultCredentialsProvider(credentialsProvider);
                    }
                });

        RestClient restClient = builder.build();

        // Create the transport with a Jackson mapper
        ElasticsearchTransport transport = new RestClientTransport(
                restClient, new JacksonJsonpMapper());

        // And create the API client
        client = new ElasticsearchClient(transport);
        asyncClient = new ElasticsearchAsyncClient(transport);
    }

    public static void main(String[] args) throws IOException {
        makeConnection();

        // Index data to an index products
        Product product = new Product("abc", "Bag", 42);

        IndexRequest<Object> indexRequest = new IndexRequest.Builder<>()
                .index("products")
                .id("abc")
                .document(product)
                .build();

        client.index(indexRequest);

        Product product1 = new Product("efg", "Bag", 42);

        client.index(builder -> builder
                .index("products")
                .id(product1.getId())
                .document(product1)
        );

        // Search for data
        TermQuery query = QueryBuilders.term()
                .field("name")
                .value("bag")
                .build();

        SearchRequest request = new SearchRequest.Builder()
                .index("products")
                .query(query._toQuery())
                .build();

        SearchResponse<Product> search =
                client.search(
                        request,
                        Product.class
                );

        for (Hit<Product> hit : search.hits().hits()) {
            Product pd = hit.source();
            System.out.println(pd);
        }

        SearchResponse<Product> search1 = client.search(s -> s
                        .index("products")
                        .query(q -> q
                                .term(t -> t
                                        .field("name")
                                        .value(v -> v.stringValue("bag"))
                                )),
                Product.class);

        for (Hit<Product> hit : search1.hits().hits()) {
            Product pd = hit.source();
            System.out.println(pd);
        }

        // Splitting complex DSL
        TermQuery termQuery = TermQuery.of(t -> t.field("name").value("bag"));

        SearchResponse<Product> search2 = client.search(s -> s
                        .index("products")
                        .query(termQuery._toQuery()),
                Product.class
        );

        for (Hit<Product> hit : search2.hits().hits()) {
            Product pd = hit.source();
            System.out.println(pd);
        }

        // Creating aggregations
        SearchResponse<Void> search3 = client.search(b -> b
                        .index("products")
                        .size(0)
                        .aggregations("price-histo", a -> a
                                .histogram(h -> h
                                        .field("price")
                                        .interval(20.0)
                                )
                        ),
                Void.class
        );

        long firstBucketCount = search3.aggregations()
                .get("price-histo")
                .histogram()
                .buckets().array()
                .get(0)
                .docCount();

        System.out.println("doc count: " + firstBucketCount);
    }
}


The code above is fairly straightforward. We connect to Elasticsearch with the following code:

    private static synchronized void makeConnection() {
        // Create the low-level client
        final CredentialsProvider credentialsProvider =
                new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("elastic", "password"));

        RestClientBuilder builder = RestClient.builder(
                        new HttpHost("localhost", 9200))
                .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                    @Override
                    public HttpAsyncClientBuilder customizeHttpClient(
                            HttpAsyncClientBuilder httpClientBuilder) {
                        return httpClientBuilder
                                .setDefaultCredentialsProvider(credentialsProvider);
                    }
                });

        RestClient restClient = builder.build();

        // Create the transport with a Jackson mapper
        ElasticsearchTransport transport = new RestClientTransport(
                restClient, new JacksonJsonpMapper());

        // And create the API client
        client = new ElasticsearchClient(transport);
        asyncClient = new ElasticsearchAsyncClient(transport);
    }

Above, we access the cluster as the elastic superuser, whose password here is password. In real use, set these credentials according to your own deployment.
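One way to avoid hard-coding credentials is to read them from the environment. A minimal sketch (my addition; the variable names ES_USER and ES_PASSWORD are hypothetical, with the article's values as fallback):

```java
import java.util.Map;

public class EsCredentials {
    // Look up a key in the given environment map, falling back to a default.
    // Hypothetical variable names; adjust to your own deployment.
    static String resolve(Map<String, String> env, String key, String fallback) {
        String value = env.get(key);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        String user = resolve(System.getenv(), "ES_USER", "elastic");
        String password = resolve(System.getenv(), "ES_PASSWORD", "password");
        // Feed these into UsernamePasswordCredentials instead of string literals
        System.out.println("user: " + user);
    }
}
```

The resolved values would then replace the literals passed to UsernamePasswordCredentials in makeConnection().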

Below, we write data into the products index in the following two styles:

        // Index data to an index products
        Product product = new Product("abc", "Bag", 42);

        IndexRequest<Object> indexRequest = new IndexRequest.Builder<>()
                .index("products")
                .id("abc")
                .document(product)
                .build();

        client.index(indexRequest);

        Product product1 = new Product("efg", "Bag", 42);

        client.index(builder -> builder
                .index("products")
                .id(product1.getId())
                .document(product1)
        );
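Both styles also return an IndexResponse that is worth inspecting rather than discarding. A small sketch (my addition, assuming the running cluster and the indexRequest built above):

```java
// The response reports whether the document was created or updated,
// along with the new document version.
IndexResponse response = client.index(indexRequest);
System.out.println("result: " + response.result());
System.out.println("version: " + response.version());
```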

We can inspect the result in Kibana:

GET products/_search

The command above returns:



{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "abc",
        "_score" : 1.0,
        "_source" : {
          "id" : "abc",
          "name" : "Bag",
          "price" : 42
        }
      },
      {
        "_index" : "products",
        "_id" : "efg",
        "_score" : 1.0,
        "_source" : {
          "id" : "efg",
          "name" : "Bag",
          "price" : 42
        }
      }
    ]
  }
}


Clearly, both documents were written successfully.

Next, I search using the following two styles:

        // Search for data
        TermQuery query = QueryBuilders.term()
                .field("name")
                .value("bag")
                .build();

        SearchRequest request = new SearchRequest.Builder()
                .index("products")
                .query(query._toQuery())
                .build();

        SearchResponse<Product> search =
                client.search(
                        request,
                        Product.class
                );

        for (Hit<Product> hit : search.hits().hits()) {
            Product pd = hit.source();
            System.out.println(pd);
        }

        SearchResponse<Product> search1 = client.search(s -> s
                        .index("products")
                        .query(q -> q
                                .term(t -> t
                                        .field("name")
                                        .value(v -> v.stringValue("bag"))
                                )),
                Product.class);

        for (Hit<Product> hit : search1.hits().hits()) {
            Product pd = hit.source();
            System.out.println(pd);
        }

This search is equivalent to:



GET products/_search
{
  "query": {
    "term": {
      "name": {
        "value": "bag"
      }
    }
  }
}


The search above returns:



{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "abc",
        "_score" : 0.18232156,
        "_source" : {
          "id" : "abc",
          "name" : "Bag",
          "price" : 42
        }
      },
      {
        "_index" : "products",
        "_id" : "efg",
        "_score" : 0.18232156,
        "_source" : {
          "id" : "efg",
          "name" : "Bag",
          "price" : 42
        }
      }
    ]
  }
}


The Java code prints:



Product{id='abc', name='Bag', price=42}
Product{id='efg', name='Bag', price=42}
Product{id='abc', name='Bag', price=42}
Product{id='efg', name='Bag', price=42}


We can use the following code to break a complex DSL into reusable pieces:

        // Splitting complex DSL
        TermQuery termQuery = TermQuery.of(t -> t.field("name").value("bag"));

        SearchResponse<Product> search2 = client.search(s -> s
                        .index("products")
                        .query(termQuery._toQuery()),
                Product.class
        );

        for (Hit<Product> hit : search2.hits().hits()) {
            Product pd = hit.source();
            System.out.println(pd);
        }

The output is again:



Product{id='abc', name='Bag', price=42}
Product{id='efg', name='Bag', price=42}


Finally, we run an aggregation:

        // Creating aggregations
        SearchResponse<Void> search3 = client.search(b -> b
                        .index("products")
                        .size(0)
                        .aggregations("price-histo", a -> a
                                .histogram(h -> h
                                        .field("price")
                                        .interval(20.0)
                                )
                        ),
                Void.class
        );

        long firstBucketCount = search3.aggregations()
                .get("price-histo")
                .histogram()
                .buckets().array()
                .get(0)
                .docCount();

        System.out.println("doc count: " + firstBucketCount);
    }

The aggregation above is equivalent to the following request:



GET products/_search
{
  "size": 0,
  "aggs": {
    "price-histo": {
      "histogram": {
        "field": "price",
        "interval": 20
      }
    }
  }
}


Its response is:



{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price-histo" : {
      "buckets" : [
        {
          "key" : 40.0,
          "doc_count" : 2
        }
      ]
    }
  }
}


Our Java code prints:

doc count: 2
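Notice that makeConnection() also built an ElasticsearchAsyncClient that this demo never exercised. The async client mirrors the sync API but returns a CompletableFuture instead of blocking. As a sketch (my addition, assuming the same running cluster and products index), the term search could also be issued like this:

```java
// Issue the same term search without blocking the calling thread
asyncClient.search(s -> s
                .index("products")
                .query(q -> q
                        .term(t -> t
                                .field("name")
                                .value(v -> v.stringValue("bag")))),
        Product.class
).whenComplete((resp, exception) -> {
    if (exception != null) {
        exception.printStackTrace();
    } else {
        resp.hits().hits().forEach(hit -> System.out.println(hit.source()));
    }
});
```

In a short-lived demo, remember that the future completes on a transport thread; call join() on the returned future (or otherwise keep the JVM alive) so the program does not exit before the response arrives.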