Learning Elasticsearch (1): An Online Data Migration Plan

            indexNames.add( indexPrefix + dateFormat.format(tempCalendar.getTime()));
        }
    }


    return indexNames;
}

### 2. Data access



#### 1. Create next month's index during the last days of each month


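Notes:  
 1. The last day of the month varies, so the job is scheduled for every day from the 28th to the 31st and checks at run time whether today really is the last day.  
 2. Write the index-creation settings (`indexConfiguration`) in a JSON file of your own and load it into the Spring container at startup.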



    @Scheduled(cron = "0 0 10 28-31 * ?")
    public void createIndex() throws IOException {
        // Only proceed on the last day of the month
        if (!DateUtil.isLastDayOfMonth()) {
            log.warn("index init|today is not the last day of the month|skipping");
            return;
        }

        // On the last day of the month, create next month's index
        Calendar instance = Calendar.getInstance();
        instance.add(Calendar.MONTH, 1);
        instance.set(Calendar.DAY_OF_MONTH, 1);

        String indexName = ElasticsearchUtil.getIndexNameByTimeStamp(newIndexPrefix, instance.getTimeInMillis());
        GetIndexRequest getIndexRequest = new GetIndexRequest(indexName);
        boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        if (exists) {
            log.warn("index init|index already exists|index:{}", indexName);
            return;
        }
        if (StringUtils.isEmpty(indexConfiguration)) {
            log.error("index init|index settings are empty|setting:{}", indexConfiguration);
            return;
        }
        CreateIndexRequest request = new CreateIndexRequest(indexName);
        // Apply the settings and mappings loaded from the JSON file
        request.source(indexConfiguration, XContentType.JSON);
        CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
        boolean acknowledged = response.isAcknowledged();
        log.info("index init|done|acknowledged:{}", acknowledged);
    }
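
The scheduled job relies on `DateUtil.isLastDayOfMonth()`, which the article does not show. A minimal sketch of what such a check could look like with `java.time` follows; the class name `MonthEndCheck` and both method signatures are assumptions for illustration, not the author's code.

```java
import java.time.LocalDate;
import java.time.ZoneId;

/** Hypothetical stand-in for the article's DateUtil.isLastDayOfMonth(). */
final class MonthEndCheck {

    private MonthEndCheck() {}

    /** Returns true when the given date is the last day of its month. */
    static boolean isLastDayOfMonth(LocalDate date) {
        return date.getDayOfMonth() == date.lengthOfMonth();
    }

    /** Convenience overload: checks "today" in the given time zone. */
    static boolean isLastDayOfMonth(ZoneId zone) {
        return isLastDayOfMonth(LocalDate.now(zone));
    }

    public static void main(String[] args) {
        System.out.println(isLastDayOfMonth(LocalDate.of(2024, 2, 28))); // false (2024 is a leap year)
        System.out.println(isLastDayOfMonth(LocalDate.of(2024, 2, 29))); // true
        System.out.println(isLastDayOfMonth(LocalDate.of(2023, 2, 28))); // true
    }
}
```

If the project runs on Spring Framework 5.3 or later, the cron day-of-month field also accepts `L` for "last day of the month", which may make the 28-31 schedule plus run-time check unnecessary. Whether that applies depends on the Spring version in use.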

**Index initialization settings**  
 Read the settings file from the `resources` directory.



    @Bean("indexConfiguration")
    public String initIndexConfiguration() throws IOException {
        String indexConfiguration = initEsIndexSetting("indexInitialization.json");
        log.info("index init|loaded index settings file:{}", indexConfiguration);
        return indexConfiguration;
    }

    // Reads a classpath resource into a UTF-8 string; the stream is closed by try-with-resources
    private String initEsIndexSetting(String resource) throws IOException {
        ClassPathResource classPathResource = new ClassPathResource(resource);
        try (InputStream in = classPathResource.getInputStream()) {
            return StreamUtils.copyToString(in, StandardCharsets.UTF_8);
        }
    }

#### 2. Querying data


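The query side is straightforward: derive the index names from the requested time range, then search across those indices.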



    public List<FeedbackReport> queryLisNewt(int from, int pageSize, BoolQueryBuilder queryBuilder, FeedbackReport entity) {

        List<FeedbackReport> results = new ArrayList<>();

        if (StringUtils.isEmpty(entity.getStartDate()) || StringUtils.isEmpty(entity.getEndDate())) {
            log.error("queryLisNewt|query error|start or end date is empty|startDate:{}|endDate:{}", entity.getStartDate(), entity.getEndDate());
            return results;
        }
        long start = DateUtil.parseDateString(entity.getStartDate()).getTime();
        long end = DateUtil.parseDateString(entity.getEndDate()).getTime();
        if (start <= 0L || end <= 0L) {
            log.error("queryLisNewt|query error|failed to parse dates:param:{}", entity);
            return results;
        }
        // One index per month: collect every index the time range touches
        List<String> indexNames = ElasticsearchUtil.getAllIndexNameByTimeStampRange(newIndexPrefix, start, end);
        if (CollectionUtils.isEmpty(indexNames)) {
            log.error("queryLisNewt|query error|no index names resolved:{}", entity);
            return results;
        }
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder
                .query(queryBuilder)
                .from(from)
                .size(pageSize)
                .sort(SortBuilders.fieldSort("createTime").order(SortOrder.DESC));

        SearchRequest searchRequest = new SearchRequest(indexNames.toArray(new String[]{}), searchSourceBuilder);
        // Raise the client-side response buffer limit to 200 MB
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        builder.setHttpAsyncResponseConsumerFactory(
                new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(200 * 1024 * 1024));
        try {
            SearchResponse response = restHighLevelClient.search(searchRequest, builder.build());
            SearchHits hits = response.getHits();
            for (SearchHit hit : hits) {
                Map<String, Object> map = hit.getSourceAsMap();
                FeedbackReport obj = new FeedbackReport();
                org.apache.commons.beanutils.BeanUtils.populate(obj, map);
                results.add(obj);
            }
        } catch (Exception e) {
            log.error("queryLisNewt|query error|failed to read results:{}", e.getMessage(), e);
            return results;
        }

        return results;
    }
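
The query depends on `ElasticsearchUtil.getAllIndexNameByTimeStampRange`, of which only the tail survives at the top of this section. The sketch below shows one way such a helper could be written with `java.time`. Everything in it is an assumption: the class name `MonthlyIndexNames`, the `yyyyMM` suffix (inferred from the destination index name ending in `202402`), and the UTC+8 zone (inferred from the timestamps in the reindex request later in the article).

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a monthly index-name helper; not the author's ElasticsearchUtil. */
final class MonthlyIndexNames {

    // Assumption: month boundaries are cut in UTC+8
    static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");
    // Assumption: indices are named <prefix><yyyyMM>
    static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyyMM");

    private MonthlyIndexNames() {}

    /** Index name for the month that contains the given epoch-millisecond timestamp. */
    static String indexNameFor(String prefix, long epochMillis) {
        YearMonth month = YearMonth.from(Instant.ofEpochMilli(epochMillis).atZone(ZONE));
        return prefix + month.format(SUFFIX);
    }

    /** Index names for every month touched by [startMillis, endMillis], oldest first. */
    static List<String> indexNamesBetween(String prefix, long startMillis, long endMillis) {
        List<String> names = new ArrayList<>();
        if (startMillis > endMillis) {
            return names; // empty range: nothing to query
        }
        YearMonth current = YearMonth.from(Instant.ofEpochMilli(startMillis).atZone(ZONE));
        YearMonth last = YearMonth.from(Instant.ofEpochMilli(endMillis).atZone(ZONE));
        while (!current.isAfter(last)) {
            names.add(prefix + current.format(SUFFIX));
            current = current.plusMonths(1);
        }
        return names;
    }

    public static void main(String[] args) {
        // The "feedback-" prefix is a placeholder
        System.out.println(indexNamesBetween("feedback-", 1706716800000L, 1709222399999L));
    }
}
```

Searching an index that does not exist fails by default, so a helper like this is only safe if every month in the range already has an index. The monthly creation job above is what guarantees that going forward.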

### 3. Writing data


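The write path works the same way: when a message arrives, derive the index name from the record's timestamp and write the document into that index.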



    public boolean insertRecordNew(FeedbackReport record) throws Exception {

        if (record.getCreateTime() <= 0) {
            log.error("index write|failed|invalid create time:{}|param:{}", record.getCreateTime(), JSON.toJSONString(record));
            return false;
        }
        // Derive the index name from the timestamp
        String indexName = ElasticsearchUtil.getIndexNameByTimeStamp(newIndexPrefix, record.getCreateTime());
        log.info("index write|preparing|resolved index name|name:{}", indexName);
        // Check whether a document with this ID already exists
        GetRequest getRequest = new GetRequest(indexName);
        getRequest.id(record.getId());
        boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
        if (exists) {
            log.error("index write|failed|duplicate document|indexName:{}|time:{}|param:{}", indexName, record.getCreateTime(), JSON.toJSONString(record));
            return false;
        }
        IndexRequest indexRequest = new IndexRequest(indexName, "_doc", record.getId());
        indexRequest.source(JSON.toJSONString(record), XContentType.JSON);
        restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        alarmService.addAlarm(record);
        log.info("index write|success|createTime:{},id:{}", record.getCreateTime(), record.getId());
        return true;
    }

## II. Migrating the data


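1. Use the reindex API to migrate the data.  
 2. Run it asynchronously: I was moving a large volume of data, and a synchronous request would time out.  
 Taking one month as an example:  
 The data is copied from index xxx-xxx-xxx to xxx-xxx-xxx-xxx-202402. To check on the task, call `GET /_tasks/Ydx4P84WTrWGGjLPD5dJ6A:xxxx`, where `Ydx4P84WTrWGGjLPD5dJ6A:xxxx` is the task ID returned by the reindex request.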



    POST _reindex?wait_for_completion=false
    {
      "source": {
        "index": "xxx-xxx-xxx",
        "query": {
          "constant_score": {
            "filter": {
              "range": {
                "createTime": {
                  "gte": 1706716800000,
                  "lte": 1709222399999
                }
              }
            }
          }
        }
      },
      "dest": {
        "index": "xxx-xxx-xxx-xxx-202402"
      }
    }
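
The two timestamps in the request correspond to 2024-02-01 00:00:00.000 and 2024-02-29 23:59:59.999 in UTC+8. Since the same request has to be repeated for every month, the bounds can be computed instead of typed by hand. The sketch below is one way to do that; the class name `ReindexMonthRange`, the UTC+8 zone, and the assumption that index names need no JSON escaping are all illustrative choices, not part of the original.

```java
import java.time.YearMonth;
import java.time.ZoneId;

/** Hypothetical helper that builds the per-month reindex request body. */
final class ReindexMonthRange {

    // Assumption: month boundaries are cut in UTC+8, matching the timestamps in the article
    static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");

    private ReindexMonthRange() {}

    /** First millisecond of the month (inclusive lower bound, "gte"). */
    static long startMillis(YearMonth month) {
        return month.atDay(1).atStartOfDay(ZONE).toInstant().toEpochMilli();
    }

    /** Last millisecond of the month (inclusive upper bound, "lte"). */
    static long endMillis(YearMonth month) {
        return startMillis(month.plusMonths(1)) - 1;
    }

    /** JSON body for POST _reindex; assumes index names contain no characters that need escaping. */
    static String reindexBody(String sourceIndex, String destIndex, YearMonth month) {
        return String.format(
                "{\"source\":{\"index\":\"%s\",\"query\":{\"constant_score\":{\"filter\":"
                        + "{\"range\":{\"createTime\":{\"gte\":%d,\"lte\":%d}}}}}},"
                        + "\"dest\":{\"index\":\"%s\"}}",
                sourceIndex, startMillis(month), endMillis(month), destIndex);
    }

    public static void main(String[] args) {
        YearMonth feb = YearMonth.of(2024, 2);
        System.out.println(startMillis(feb)); // 1706716800000
        System.out.println(endMillis(feb));   // 1709222399999
        // Index names below are placeholders
        System.out.println(reindexBody("old-index", "new-index-202402", feb));
    }
}
```

Deriving the upper bound from the start of the following month avoids having to reason about month lengths and leap years.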


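Note: this migration ran inside a single cluster. In the test environment I split one large index into monthly indices, and the operations team then moved them to the new cluster.  
 The alternative is to reindex directly into the new cluster. That requires a configuration change on the new cluster (reindex from remote needs the old cluster's address in `reindex.remote.whitelist`, a static setting in `elasticsearch.yml`), so the production nodes would have to be restarted, which would affect live traffic. That is why I chose the in-cluster approach.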






