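1. Background
We want to test Elasticsearch search scenarios against large data sets. Inserting that much data by hand is impractical, and using production data for testing and learning is inappropriate, so we need mock data that resembles the real thing. I previously came across a handy tool for this, datafaker, which generates realistic-looking data. We will use it to produce the data for the experiments later in this series.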
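2. Basic datafaker usage
Let's start with a brief look at datafaker. Its official documentation is at www.datafaker.net/documentati….
Datafaker ships with more than 100 built-in data providers that we can use to generate large amounts of test data. Browse the official documentation to find the API that produces the data you need.
First, add the Maven dependency: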
```xml
<dependency>
    <groupId>net.datafaker</groupId>
    <artifactId>datafaker</artifactId>
    <version>1.6.0</version>
</dependency>
```
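Add a utility class, MockDataUtil.java, under com.daiyu.elastic.common.util to hold the methods that generate mock data. As a first demo, we generate person names with a loop that produces 10 random records.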
```java
public static void main(String[] args) {
    // One Faker instance is enough; reuse it across iterations.
    Faker faker = new Faker();
    for (int i = 0; i < 10; i++) {
        String name = faker.name().fullName();
        String firstName = faker.name().firstName();
        String lastName = faker.name().lastName();
        String streetAddress = faker.address().streetAddress();
        log.info("name:{},firstName:{},lastName:{},streetAddress:{}", name, firstName, lastName, streetAddress);
    }
}
```
The output looks like this:
```
name:Shannon Hettinger,firstName:Faviola,lastName:Satterfield,streetAddress:8468 Harold Center
name:Darius Bechtelar,firstName:Noel,lastName:Reinger,streetAddress:5019 Nikolaus Forges
name:Carley Green DVM,firstName:Nevada,lastName:Frami,streetAddress:3328 MacGyver Vista
name:Mr. Shenna Effertz,firstName:Margart,lastName:Beer,streetAddress:080 Carter Prairie
name:Valentine Prohaska,firstName:Kellye,lastName:Tillman,streetAddress:046 Greenfelder Green
name:Jessie Fisher,firstName:Russ,lastName:Skiles,streetAddress:28672 Beahan Route
name:Merrill Labadie Jr.,firstName:Billy,lastName:Heller,streetAddress:4532 Brakus Park
name:Jerald Marvin,firstName:Dia,lastName:Schiller,streetAddress:93120 Takako Turnpike
name:Oliver Schmitt,firstName:Sheryll,lastName:Bailey,streetAddress:85094 Mayert Ford
name:Rayford Smitham,firstName:Ariel,lastName:Lynch,streetAddress:139 Rutherford Pass
```
To generate Chinese content instead, pass Locale.CHINA when constructing the Faker object. This sets the locale to Chinese (China).
```java
Faker faker = new Faker(Locale.CHINA);
```
With that change in place, run the program again. The output is:
```
name:苏昊焱,firstName:雪松,lastName:卢,streetAddress:郑巷59308号
name:谭凯瑞,firstName:鸿煊,lastName:黄,streetAddress:顾桥963号
name:曾子默,firstName:昊焱,lastName:邓,streetAddress:万栋910号
name:阎越彬,firstName:瑞霖,lastName:赖,streetAddress:冯路8135号
name:毛笑愚,firstName:博文,lastName:阎,streetAddress:罗桥498号
name:高智宸,firstName:晓啸,lastName:马,streetAddress:崔桥8735号
name:韦楷瑞,firstName:修杰,lastName:丁,streetAddress:朱侬4号
name:朱志泽,firstName:弘文,lastName:范,streetAddress:丁中心900号
name:袁伟宸,firstName:鑫鹏,lastName:冯,streetAddress:朱巷01号
name:董瑞霖,firstName:绍辉,lastName:郝,streetAddress:阎旁6956号
```
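Datafaker can produce Chinese output for only some of its providers; the rest stay in English. We can also plug in our own data: datafaker provides an abstract class, AbstractProvider, that we can extend. The small example below demonstrates a custom provider.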
```java
import java.util.Locale;

import lombok.extern.slf4j.Slf4j;
import net.datafaker.AbstractProvider; // package as of datafaker 1.6.0
import net.datafaker.Faker;

@Slf4j
public class MockDataUtil {
    private static CompanyFaker companyFaker = new CompanyFaker(Locale.CHINA);

    /** Custom provider that picks a company name from a fixed list. */
    public static class MyCompany extends AbstractProvider {
        private static final String[] COMPANY_NAMES = new String[]{"阿里", "腾讯", "百度", "字节", "美团", "米哈游", "微软", "大疆", "华为", "小红书", "SHEIN"};

        public MyCompany(Faker faker) {
            super(faker);
        }

        public String nextCompanyName() {
            // Uses the faker passed to the constructor (inherited field), not a separate instance.
            return COMPANY_NAMES[faker.random().nextInt(COMPANY_NAMES.length)];
        }
    }

    /** Faker subclass that exposes the custom provider. */
    public static class CompanyFaker extends Faker {
        public CompanyFaker(Locale locale) {
            super(locale);
        }

        public MyCompany companyName() {
            return getProvider(MyCompany.class, () -> new MyCompany(this));
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(companyFaker.companyName().nextCompanyName());
        }
    }
}
```
The output looks like this:
```
SHEIN
阿里
百度
腾讯
小红书
```
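That concludes the basic datafaker demo.
3. Writing a utility class to generate mock data
3.1 The business data to mock
Here we first write a tool that generates mock product data. Let's look at the product entity and the fields it contains.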
| Field | Description | Type |
|---|---|---|
| id | ID | Long |
| category | Category | String |
| basePrice | Listed (tag) base price | BigDecimal |
| marketPrice | Actual selling price | BigDecimal |
| stockNum | Stock quantity | Integer |
| skuImgUrl | Product image URL | String |
| skuId | Product SKU ID | String |
| skuName | Product name | String |
| createTime | Creation time | Date(yyyy-MM-dd HH:mm:ss) |
| updateTime | Update time | Date(yyyy-MM-dd HH:mm:ss) |
The id can be generated with an atomic auto-increment class such as AtomicLong.
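As a rough illustration of the atomic-counter idea, here is a minimal sketch using `java.util.concurrent.atomic.AtomicLong`. The class name `IdGenerator` and the choice to start ids at 1 are assumptions made for this example, not part of the article's project.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not from the article's code base): hands out sequential ids.
class IdGenerator {
    // Starts at 0 so that the first id returned is 1 (an assumed convention).
    private static final AtomicLong COUNTER = new AtomicLong(0);

    static long nextId() {
        // incrementAndGet is atomic, so concurrent callers never receive the same id.
        return COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(nextId());
        }
    }
}
```

Because the counter lives in memory, ids restart after every JVM restart. That is usually acceptable for throwaway mock data, but not for ids that must stay unique across runs.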
For category we generate a number and map it to an enum. Here is a rough list of product category values:
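| Enum code | Enum value |
|---|---|
| 1 | 电器 (appliances) |
| 2 | 家庭五金 (household hardware) |
| 3 | 文具 (stationery) |
| 4 | 图书 (books) |
| 5 | 化妆品 (cosmetics) |
| 6 | 数码 (digital products) |
| 7 | 服装 (clothing) |
| 8 | 玩具 (toys) |
| 9 | 体育健身 (sports and fitness) |
| 10 | 食品 (food) |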
Create the category enum class CategoryEnums.java under com.daiyu.elastic.eunms.
```java
import lombok.AllArgsConstructor;
import lombok.Getter;

@Getter
@AllArgsConstructor
public enum CategoryEnums {
    ELECTRIC(1, "电器"),
    HARDWARE(2, "家庭五金"),
    STATIONERY(3, "文具"),
    BOOK(4, "图书"),
    COSMETICS(5, "化妆品"),
    DIGITAL(6, "数码"),
    CLOTHING(7, "服装"),
    TOYS(8, "玩具"),
    PHYSICAL(9, "体育健身"),
    FOOD(10, "食品");

    // Enum constants should be immutable: final fields, no setters.
    // Lombok's @Getter generates getCode() and getName().
    private final int code;
    private final String name;

    /** Looks up a category by code; unknown codes fall back to ELECTRIC. */
    public static CategoryEnums getByCode(int code) {
        for (CategoryEnums value : values()) {
            if (value.getCode() == code) {
                return value;
            }
        }
        return ELECTRIC;
    }
}
```
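This enum is used later for queries. In production, to save storage, tag and category fields are usually stored as the enum code and mapped back to the enum value at query time.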
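To make the "store the code, map to the value on read" idea concrete, here is a small sketch. It uses a trimmed, hypothetical stand-in for CategoryEnums (three constants with English labels) and builds a code-to-constant map once, so a lookup does not scan `values()` every time. The names `CategoryLookupDemo`, `Category`, and `getLabel` are invented for this example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CategoryLookupDemo {

    // Trimmed, hypothetical stand-in for CategoryEnums; labels are English here.
    enum Category {
        ELECTRIC(1, "Appliances"),
        HARDWARE(2, "Hardware"),
        STATIONERY(3, "Stationery");

        // code -> constant, built once after the constants are initialized.
        private static final Map<Integer, Category> BY_CODE =
                Arrays.stream(values())
                      .collect(Collectors.toMap(Category::getCode, Function.identity()));

        private final int code;
        private final String label;

        Category(int code, String label) {
            this.code = code;
            this.label = label;
        }

        int getCode() {
            return code;
        }

        String getLabel() {
            return label;
        }

        // Same fallback as the article's getByCode: unknown codes map to ELECTRIC.
        static Category getByCode(int code) {
            return BY_CODE.getOrDefault(code, ELECTRIC);
        }
    }

    public static void main(String[] args) {
        // Pretend these codes came back from a search hit; only the code is stored.
        int[] storedCodes = {2, 3, 1, 2};
        for (int code : storedCodes) {
            System.out.println(code + " -> " + Category.getByCode(code).getLabel());
        }
    }
}
```

Falling back silently to ELECTRIC mirrors the article's enum. A stricter variant could return an Optional or throw on unknown codes so that bad data is noticed.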
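Now let's look at the remaining fields.
basePrice, marketPrice, and stockNum are numeric fields that we generate randomly; java.util.Random is enough for this. Note that basePrice must be greater than marketPrice, and stockNum must be greater than or equal to 0.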
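The two constraints above (basePrice greater than marketPrice, stockNum not negative) can be enforced by construction. The sketch below is one possible way, not the article's implementation: it swaps in `ThreadLocalRandom` for `Random`, and the class name `PriceMocker`, the price ceiling, the stock ceiling, and the discount range are all assumed values chosen for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; all bounds below are assumptions for the example.
class PriceMocker {
    private static final double MAX_BASE_PRICE = 10_000.0;
    private static final int MAX_STOCK = 10_000;

    /** Tag price between 1.00 and 10000.00, rounded to two decimals. */
    static BigDecimal nextBasePrice() {
        double raw = ThreadLocalRandom.current().nextDouble(1.0, MAX_BASE_PRICE);
        return BigDecimal.valueOf(raw).setScale(2, RoundingMode.HALF_UP);
    }

    /**
     * Selling price derived from the base price with a discount factor in [0.50, 0.95).
     * Rounding down keeps the result strictly below basePrice.
     */
    static BigDecimal nextMarketPrice(BigDecimal basePrice) {
        double discount = ThreadLocalRandom.current().nextDouble(0.50, 0.95);
        return basePrice.multiply(BigDecimal.valueOf(discount)).setScale(2, RoundingMode.DOWN);
    }

    /** Stock quantity in [0, MAX_STOCK], so it is never negative. */
    static int nextStockNum() {
        return ThreadLocalRandom.current().nextInt(MAX_STOCK + 1);
    }

    public static void main(String[] args) {
        BigDecimal base = nextBasePrice();
        BigDecimal market = nextMarketPrice(base);
        System.out.println("basePrice=" + base + ", marketPrice=" + market + ", stockNum=" + nextStockNum());
    }
}
```

Deriving marketPrice from basePrice, rather than drawing two independent numbers and comparing them, means the ordering never has to be checked or retried.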
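skuImgUrl holds the product image link; for this we use the image URL that datafaker generates.
skuName and skuId are both product attributes and can also be generated randomly. createTime and updateTime can be generated as strings from a given date format pattern.
The code for the utility class is as follows.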
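As a side illustration of the date pattern, the sketch below formats random timestamps with `java.time` instead of datafaker. Two points are assumptions of this example rather than statements from the article: that updateTime should never be earlier than createTime, and the helper name `TimestampMocker`. Note the pattern letters: `MM` is the month, while `mm` is the minute.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, independent of datafaker.
class TimestampMocker {
    // MM = month of year, mm = minute of hour; mixing them up yields wrong dates.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /**
     * Returns {createTime, updateTime}, both formatted, both within daysBack days
     * before now, with updateTime never earlier than createTime.
     */
    static String[] nextTimestamps(LocalDateTime now, int daysBack) {
        long windowSeconds = daysBack * 24L * 3600L;
        // How far in the past the record was created.
        long createOffset = ThreadLocalRandom.current().nextLong(windowSeconds + 1);
        // The update happens somewhere between creation and now.
        long updateOffset = ThreadLocalRandom.current().nextLong(createOffset + 1);
        LocalDateTime created = now.minusSeconds(createOffset);
        LocalDateTime updated = now.minusSeconds(updateOffset);
        return new String[]{created.format(FORMAT), updated.format(FORMAT)};
    }

    public static void main(String[] args) {
        String[] t = nextTimestamps(LocalDateTime.now(), 365);
        System.out.println("createTime=" + t[0] + ", updateTime=" + t[1]);
    }
}
```

Strings in this format sort lexicographically in chronological order, which makes range checks on the generated values straightforward.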
```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Locale;
import java.util.Random;

import lombok.extern.slf4j.Slf4j;
import net.datafaker.AbstractProvider; // package as of datafaker 1.6.0
import net.datafaker.Faker;

@Slf4j
public class MockDataUtil {
    /**
     * Faker with the locale set to Chinese (China)
     */
    private static Faker faker = new Faker(Locale.CHINA);
    /**
     * Date format (MM is the month; mm would be the minute)
     */
    private static String pattern = "yyyy-MM-dd HH:mm:ss";
    /**
     * Shared, unseeded generator so that every call returns a different value
     */
    private static final Random RANDOM = new Random();
    private static CategoryFaker categoryFaker = new CategoryFaker(Locale.CHINA);
    private static SkuFaker skuFaker = new SkuFaker(Locale.CHINA);

    // GoodsEnums (not shown in this article) names the field to generate.
    public static String getMockValue(GoodsEnums goodsEnums) {
        switch (goodsEnums) {
            case CATEGORY:
                return categoryFaker.category().nextCategory();
            case SKUID:
                return faker.idNumber().peselNumber();
            case SKUNAME:
                return skuFaker.sku().nextSku();
            case SKUIMGURL:
                return faker.internet().image();
            case CREATETIME:
                return faker.date().birthday(pattern);
            case UPDATETIME:
                return faker.date().birthday(pattern);
            default:
                return "11";
        }
    }

    /** Stock quantity in [0, 10000); the upper bound is an assumed value. */
    private static int getStockNum() {
        return RANDOM.nextInt(10000);
    }

    /** Price in [0.00, 10000.00] with two decimals; the range is an assumed value. */
    private static BigDecimal getPrice() {
        return BigDecimal.valueOf(RANDOM.nextDouble() * 10000).setScale(2, RoundingMode.HALF_UP);
    }

    public static class Category extends AbstractProvider {
        private static final String[] CATEGORYS = new String[]{"1", "2", "3", "4", "5", "6", "7", "8", "9", "10"};

        public Category(Faker faker) {
            super(faker);
        }

        public String nextCategory() {
            return CATEGORYS[faker.random().nextInt(CATEGORYS.length)];
        }
    }

    public static class CategoryFaker extends Faker {
        public CategoryFaker(Locale locale) {
            super(locale);
        }

        public Category category() {
            return getProvider(Category.class, () -> new Category(this));
        }
    }

    public static class Sku extends AbstractProvider {
        private static final String[] SKUNAME = new String[]{
            "书架","毛巾","花瓶","跑步机","瓷器","电脑","吸尘器","艺术碳雕","山水壁画","微波炉","空调","枕头","床单","枕头套","被子","茶杯","饭碗","牙刷","牙膏","浴巾",
            "洗脸巾","沐浴露","洗发露","纸","垃圾桶","勺子","筷子","锅","洗脸盆","洗洁精","香皂","洗衣粉","梳子","洗衣粉","花露水","手电筒","牙签","电器插座","锁",
            "枕头","床单","枕头套","被子","茶杯","饭碗","牙刷","牙膏","浴巾","洗脸巾","沐浴露","洗发露","纸","垃圾桶","勺子","筷子","锅","洗脸盆","洗洁精","香皂",
            "洗衣粉","梳子","洗衣粉","花露水","手电筒","牙签","电器插座","锁","靠垫","抱枕","果盘","相框","花瓶","放遥控板的小篮子","纸巾盒","烟灰缸","床头柜上的小饭盒",
            "勺子","筷子","洗洁精","手表","随身听","笔记本电脑","包","常备药","维生素类","衣服架","夹 长","绳子","纸篓","创可贴","纱布","胶布&胶","双面胶","凉席",
            "竹枕","各种型号电池","闹钟","台灯"};

        public Sku(Faker faker) {
            super(faker);
        }

        public String nextSku() {
            return SKUNAME[faker.random().nextInt(SKUNAME.length)];
        }
    }

    public static class SkuFaker extends Faker {
        public SkuFaker(Locale locale) {
            super(locale);
        }

        public Sku sku() {
            return getProvider(Sku.class, () -> new Sku(this));
        }
    }
}
```
That completes our mock data tool. If you want to generate other kinds of data, feel free to extend it yourself.