A Simple Guide to Using Elasticsearch

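Elasticsearch is a highly scalable open-source full-text search and analytics engine. It lets you store, search, and analyze large volumes of data quickly and in near real time.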

Here are some of Elasticsearch's advanced features and what they are used for:

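Aggregations:

  • Aggregations let you run complex data analysis over the set of documents matched by a search. For example, you can compute averages, sums, and minimum/maximum values, as well as more involved statistics such as distributions and percentiles.
  • Bucket aggregations group data, for example by date, by geographic location, or by any categorical field.
  • Metric aggregations compute metrics over a set of documents, such as counts, averages, minimums, and maximums.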
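To make the bucket/metric distinction concrete, here is a minimal in-memory sketch in plain Java. It is only an analogy and does not call Elasticsearch: the Product class and its type and price fields are assumptions made for illustration. Grouping by type plays the role of a bucket aggregation, and averaging the price inside each group plays the role of a metric aggregation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AggregationSketch {

    // Hypothetical stand-in for an indexed document.
    static class Product {
        final String type;
        final double price;

        Product(String type, double price) {
            this.type = type;
            this.price = price;
        }
    }

    // Bucket step: group by type. Metric step: average price within each group.
    // A TreeMap is used only to keep the printed order deterministic.
    static Map<String, Double> averagePriceByType(List<Product> products) {
        return products.stream()
                .collect(Collectors.groupingBy(
                        p -> p.type,
                        TreeMap::new,
                        Collectors.averagingDouble(p -> p.price)));
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
                new Product("book", 10.0),
                new Product("book", 20.0),
                new Product("pen", 2.0));
        System.out.println(averagePriceByType(products)); // prints {book=15.0, pen=2.0}
    }
}
```

In Elasticsearch the same two steps run inside the cluster, so only the per-bucket results travel back to the client rather than every document.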

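Below is an example of running an aggregation query against Elasticsearch from Spring Boot. It shows how to combine a bucket aggregation with a metric aggregation to analyze data.

The following simple service class runs a bucket aggregation that groups documents by a field (type here) and computes the average price of each group: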

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Service
public class ElasticsearchService {

    @Autowired
    private RestHighLevelClient client;

    public void aggregateData() {
        try {
            SearchRequest searchRequest = new SearchRequest("your_index_name"); // Replace with your index name
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

            // Define the bucket aggregation
            TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_type")
                    .field("type.keyword")
                    .order(BucketOrder.aggregation("average_price", true)); // Sort buckets by average price, ascending

            // Define the metric aggregation (computes the average)
            AvgAggregationBuilder avgPrice = AggregationBuilders.avg("average_price")
                    .field("price");

            aggregation.subAggregation(avgPrice);
            searchSourceBuilder.aggregation(aggregation);
            searchRequest.source(searchSourceBuilder);

            SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

            // Process the response here as needed
            System.out.println(response);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

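Full-text Search:

  • Supports many kinds of full-text search, including match queries, multi-field queries, and language-aware queries that rely on language analyzers.
  • The Query DSL (Domain-Specific Language) provides a powerful, flexible query language for executing and tuning searches.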
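As a rough illustration of what a Query DSL request looks like, the sketch below builds the JSON body of a multi_match query by hand in plain Java. The class and method names are hypothetical, and the escaping is deliberately minimal (backslashes and double quotes only), so treat it as a way to read the DSL rather than as production code.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class MultiMatchBodySketch {

    // Minimal JSON string escaping: handles backslashes and double quotes only,
    // not control characters.
    static String escapeJson(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds {"query":{"multi_match":{"query":...,"fields":[...]}}}.
    static String multiMatchBody(String queryText, String... fields) {
        String fieldList = Arrays.stream(fields)
                .map(f -> "\"" + escapeJson(f) + "\"")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"multi_match\":{\"query\":\"" + escapeJson(queryText)
                + "\",\"fields\":[" + fieldList + "]}}}";
    }

    public static void main(String[] args) {
        // The field names "title" and "description" are example values.
        System.out.println(multiMatchBody("spring boot", "title", "description"));
    }
}
```

A body like this is what gets posted to an index's _search endpoint; client libraries generate it for you from builder calls.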

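The following example sets up a simple full-text search with Spring Data Elasticsearch, using a multi-match query that searches several fields at once.

Create a service that executes the full-text query in a Spring Boot application: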

import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class ElasticsearchSearchService {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    public List<String> performFullTextSearch(String queryText) {
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.multiMatchQuery(queryText, "title", "description")) // Fields to search
                .build();

        SearchHits<MyDocument> searchHits = elasticsearchTemplate.search(searchQuery, MyDocument.class);
        return searchHits.stream()
                .map(SearchHit::getContent)
                .map(MyDocument::getTitle) // Assume we only want to return each document's title
                .collect(Collectors.toList());
    }

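    // A simple document class assumed for this example. In a real application it would
    // typically also carry @Document(indexName = "...") so Spring Data can resolve the index.
    static class MyDocument {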
        private String title;
        private String description;

        public String getTitle() {
            return title;
        }

        public void setTitle(String title) {
            this.title = title;
        }

        public String getDescription() {
            return description;
        }

        public void setDescription(String description) {
            this.description = description;
        }
    }
}

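Geospatial Search:

  • Elasticsearch supports indexing and querying location-based data, for example finding nearby places by geographic coordinates or calculating the distance between two places.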
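To show what "distance between two places" means numerically, here is a small plain-Java sketch of the haversine great-circle formula. This is a common approximation and an assumption on my part: the article does not say which formula Elasticsearch uses internally, and the Earth radius constant and method names below are illustrative.

```java
class HaversineSketch {

    // Mean Earth radius in kilometres (a commonly used approximation).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two (lat, lon) points given in degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // The kind of check a "within N km" filter performs for each candidate point.
    static boolean withinKm(double lat1, double lon1, double lat2, double lon2, double maxKm) {
        return distanceKm(lat1, lon1, lat2, lon2) <= maxKm;
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator is roughly 111 km.
        System.out.println(distanceKm(0.0, 0.0, 0.0, 1.0));
    }
}
```

A geo-distance query applies this kind of test on the server side, using the indexed geo-point field instead of looping over documents in application code.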

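The following basic example of geospatial search covers how to define geo-point data, how to run a geo-distance query, and how to do both with the Spring Data Elasticsearch framework.

In a Spring Boot application, a simple entity and a service class are enough to demonstrate geospatial search:

Entity class definition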

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.GeoPointField;

@Document(indexName = "locations")
public class Location {

    @Id
    private String id;

    @Field(type = FieldType.Text)
    private String name;

    // Geo-points are mapped with @GeoPointField; FieldType has no geo-point constant
    @GeoPointField
    private String geoPoint; // "lat,lon" format

    // Getters and Setters
    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getGeoPoint() {
        return geoPoint;
    }

    public void setGeoPoint(String geoPoint) {
        this.geoPoint = geoPoint;
    }
}

Service class implementation

import org.elasticsearch.index.query.GeoDistanceQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.geo.GeoPoint;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class GeoSearchService {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchTemplate;

    public List<Location> searchNearby(double lat, double lon, String distance) {
        GeoDistanceQueryBuilder geoDistanceQueryBuilder = QueryBuilders.geoDistanceQuery("geoPoint")
                .point(lat, lon)
                .distance(distance); // Distance can be "10km", "200m", etc.

        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withFilter(geoDistanceQueryBuilder)
                .build();

        SearchHits<Location> searchHits = elasticsearchTemplate.search(searchQuery, Location.class);
        return searchHits.getSearchHits().stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());
    }
}

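Custom Analytics:

  • The Painless scripting language lets you implement custom logic in queries, aggregations, and update operations.

The following example shows a concrete use case: using Painless scripts in an aggregation query to compute a custom metric.

Create a service class that uses Painless scripts to run the custom aggregation: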

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.script.Script;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.ScriptedMetricAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Service
public class CustomAnalyticsService {

    @Autowired
    private RestHighLevelClient client;

    public String performCustomAnalysis() {
        try {
            SearchRequest searchRequest = new SearchRequest("your_index");
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

            // Aggregation that uses Painless scripts to compute a custom metric.
            // The aggregation name "total_transactions" is a placeholder chosen for this example.
            ScriptedMetricAggregationBuilder aggregation = AggregationBuilders.scriptedMetric("total_transactions")
                .initScript(new Script("state.transactions = []")) // Init script: runs once per shard
                .mapScript(new Script("state.transactions.add(doc['transaction_amount'].value)")) // Map script: runs once per document
                .combineScript(new Script("double total = 0; for (t in state.transactions) { total += t } return total;")) // Combine script: one result per shard
                .reduceScript(new Script("double grandTotal = 0; for (a in states) { grandTotal += a } return grandTotal;")); // Reduce script: merges the shard results

            searchSourceBuilder.query(QueryBuilders.matchAllQuery())
                               .aggregation(aggregation);
            searchRequest.source(searchSourceBuilder);

            SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
            return "Aggregation Results: " + response.toString();
        } catch (IOException e) {
            e.printStackTrace();
            return "Error in executing search query: " + e.getMessage();
        }
    }
}
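The four scripts in the service above run in phases: init and map execute on each shard, combine collapses each shard's state into one value, and reduce merges the per-shard values on the coordinating node. The plain-Java sketch below imitates that flow in memory, with each inner list standing in for one shard's documents. The class and method names are hypothetical and no Elasticsearch call is made.

```java
import java.util.ArrayList;
import java.util.List;

class ScriptedMetricSketch {

    // Imitates init, map, and combine for the documents of a single shard.
    static double runShard(List<Double> transactionAmounts) {
        List<Double> transactions = new ArrayList<>();   // init: state.transactions = []
        for (double amount : transactionAmounts) {
            transactions.add(amount);                    // map: called once per document
        }
        double total = 0;                                // combine: sum this shard's state
        for (double t : transactions) {
            total += t;
        }
        return total;
    }

    // Imitates reduce: merges the values returned by every shard.
    static double reduce(List<Double> states) {
        double grandTotal = 0;
        for (double a : states) {
            grandTotal += a;
        }
        return grandTotal;
    }

    static double totalAcrossShards(List<List<Double>> shards) {
        List<Double> states = new ArrayList<>();
        for (List<Double> shard : shards) {
            states.add(runShard(shard));
        }
        return reduce(states);
    }

    public static void main(String[] args) {
        // Two made-up shards holding three transaction amounts in total.
        System.out.println(totalAcrossShards(List.of(List.of(10.0, 20.0), List.of(5.0)))); // prints 35.0
    }
}
```

Splitting the work this way means only one number per shard crosses the network, which is why the combine step exists at all.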

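Snapshot and Restore:

  • Supports taking regular snapshots of data and restoring from them, for data safety and disaster recovery.

The following example shows how to create a snapshot and restore data from Spring Boot using Elasticsearch's high-level REST client. Both operations assume that the snapshot repository has already been registered with the cluster.

Create a service class in the Spring Boot application with one method that creates a snapshot and one that restores from it: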

import org.elasticsearch.action.admin.cluster.snapshots.create.CreateSnapshotRequest;
import org.elasticsearch.action.admin.cluster.snapshots.create.CreateSnapshotResponse;
import org.elasticsearch.action.admin.cluster.snapshots.restore.RestoreSnapshotRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class SnapshotService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public String createSnapshot(String repositoryName, String snapshotName) {
        try {
            CreateSnapshotRequest request = new CreateSnapshotRequest(repositoryName, snapshotName);
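            request.indices("your_index_name"); // Indices to include in the snapshot
            request.includeGlobalState(true); // Whether to include the cluster's global state
            request.waitForCompletion(true); // Block until the snapshot finishes; by default the call returns once it is accepted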

            CreateSnapshotResponse response = restHighLevelClient.snapshot().create(request, RequestOptions.DEFAULT);
            return "Snapshot was created successfully, status: " + response.status();
        } catch (Exception e) {
            e.printStackTrace();
            return "Failed to create snapshot: " + e.getMessage();
        }
    }

    public String restoreSnapshot(String repositoryName, String snapshotName) {
        try {
            RestoreSnapshotRequest request = new RestoreSnapshotRequest(repositoryName, snapshotName);
            request.includeGlobalState(true);
            request.includeAliases(true);

            restHighLevelClient.snapshot().restore(request, RequestOptions.DEFAULT);
            return "Snapshot was restored successfully";
        } catch (Exception e) {
            e.printStackTrace();
            return "Failed to restore snapshot: " + e.getMessage();
        }
    }
}
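For orientation, the client calls in the service above correspond to two REST endpoints under /_snapshot. The plain-Java sketch below only assembles the paths and a create-snapshot body as strings. The helper names are hypothetical, and it assumes repository, snapshot, and index names that need no URL or JSON escaping.

```java
class SnapshotRequestSketch {

    // Path for PUT: creates a snapshot and waits until it has finished.
    static String createSnapshotPath(String repository, String snapshot) {
        return "/_snapshot/" + repository + "/" + snapshot + "?wait_for_completion=true";
    }

    // JSON body for the create request; indices is a comma-separated list.
    static String createSnapshotBody(String indices, boolean includeGlobalState) {
        return "{\"indices\":\"" + indices + "\",\"include_global_state\":" + includeGlobalState + "}";
    }

    // Path for POST: restores the given snapshot.
    static String restoreSnapshotPath(String repository, String snapshot) {
        return "/_snapshot/" + repository + "/" + snapshot + "/_restore";
    }

    public static void main(String[] args) {
        // "my_repository" and "snapshot_1" are example names.
        System.out.println("PUT " + createSnapshotPath("my_repository", "snapshot_1"));
        System.out.println(createSnapshotBody("your_index_name", true));
        System.out.println("POST " + restoreSnapshotPath("my_repository", "snapshot_1"));
    }
}
```

Seeing the raw endpoints can help when checking snapshot behavior with curl or Kibana before wiring it into a service.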

Monitoring and Management:

  • Provides real-time monitoring of cluster health, performance, and logs.

The following service class retrieves cluster health information:

import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.Response;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Service
public class ClusterMonitoringService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public String getClusterHealth() {
        RestClient lowLevelClient = restHighLevelClient.getLowLevelClient();
        Request request = new Request("GET", "/_cluster/health");
        try {
            Response response = lowLevelClient.performRequest(request);
            return "Cluster Health: " + response.getStatusLine() + " - " + EntityUtils.toString(response.getEntity());
        } catch (IOException e) {
            e.printStackTrace();
            return "Failed to retrieve cluster health: " + e.getMessage();
        }
    }
}

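The ClusterMonitoringService class provides a getClusterHealth method that uses Elasticsearch's low-level client to send a GET request to /_cluster/health, the API for retrieving cluster health information. The response reports the cluster's health status, for example whether the cluster is running normally and whether any nodes have dropped out.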
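The health response is JSON whose status field is green, yellow, or red. As a rough sketch of how a caller might pull that value out of the returned string, the plain-Java helper below uses a regular expression. The class name, method names, and sample payload are made up for illustration, and a real application would more likely use a JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClusterHealthStatusSketch {

    // Matches "status":"green" (or yellow/red), allowing whitespace around the colon.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"(green|yellow|red)\"");

    // Returns the status value, or "unknown" if none is found in the text.
    static String extractStatus(String healthResponse) {
        Matcher matcher = STATUS.matcher(healthResponse);
        return matcher.find() ? matcher.group(1) : "unknown";
    }

    // Only green is treated as fully healthy in this sketch.
    static boolean isGreen(String healthResponse) {
        return "green".equals(extractStatus(healthResponse));
    }

    public static void main(String[] args) {
        // Made-up sample payload, shortened to a few fields.
        String sample = "{\"cluster_name\":\"demo\",\"status\":\"yellow\",\"number_of_nodes\":1}";
        System.out.println(extractStatus(sample)); // prints yellow
    }
}
```

Because it searches the whole string, the helper also works on the prefixed text that getClusterHealth returns, not just on the raw JSON.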

You can call getClusterHealth from a controller or another service to obtain the cluster status. For example:

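import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController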
public class MonitoringController {

    @Autowired
    private ClusterMonitoringService clusterMonitoringService;

    @GetMapping("/cluster/health")
    public ResponseEntity<String> getClusterHealth() {
        return ResponseEntity.ok(clusterMonitoringService.getClusterHealth());
    }
}

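The MonitoringController exposes the endpoint /cluster/health. When you request it, the controller calls ClusterMonitoringService to fetch the cluster's health status and returns it to the client.

These advanced features make Elasticsearch a powerful tool for processing and analyzing large data sets. Choose and configure them according to your business requirements to get the best performance and efficiency.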