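Introduction

A distributed advertising system built on Spring Cloud.

This article introduces the ad-search microservice, the core module of the advertising system, which implements ad retrieval.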

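Index design

Indexes are used to speed up ad retrieval; they are the core of the project.

Why should ad index data be kept in JVM memory, and what if there is too much ad data to fit in memory?

Ad data is kept in JVM memory for a single reason: speed.
If there is too much ad data, consider moving it to a cache system such as Redis.

An index should support the four basic operations: add, delete, update, and lookup.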

public interface IndexAware<K, V> {

    V get(K key);

    void add(K key, V value);

    void update(K key, V value);

    void delete(K key, V value);
}

plan

AdPlanObject

How ad plan data is represented in the index.

@Data
@NoArgsConstructor
@AllArgsConstructor
public class AdPlanObject {

    private Long planId;
    private Long userId;
    private Integer planStatus;
    private Date startDate;
    private Date endDate;

    public void update(AdPlanObject newObject) {

        if (null != newObject.getPlanId()) {
            this.planId = newObject.getPlanId();
        }
        if (null != newObject.getUserId()) {
            this.userId = newObject.getUserId();
        }
        if (null != newObject.getPlanStatus()) {
            this.planStatus = newObject.getPlanStatus();
        }
        if (null != newObject.getStartDate()) {
            this.startDate = newObject.getStartDate();
        }
        if (null != newObject.getEndDate()) {
            this.endDate = newObject.getEndDate();
        }
    }
}

AdPlanIndex

public class AdPlanIndex implements IndexAware<Long, AdPlanObject>

At its core is a Map like this:

private static Map<Long, AdPlanObject> objectMap;

static {
    objectMap = new ConcurrentHashMap<>();
}

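The code in a static initializer block runs when the class is initialized, and it runs only once.

Static initializer blocks, instance initializer blocks, constructors, and Java class initialization order - 张嘉烘's blog - CSDN博客

ConcurrentHashMap is a hash table supporting full concurrency of retrievals and high expected concurrency for updates.

ConcurrentHashMap - Java 11 Chinese edition - API reference (apiref.com)

A Map makes add, delete, and lookup straightforward (put, remove, get). For an update, first check whether the entry to modify exists: if it does, modify it; if it does not, add a new entry.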
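The four operations described above can be sketched as follows. This is a minimal illustration, not the project's actual AdPlanIndex: PlanObject and PlanIndexSketch are hypothetical, trimmed-down stand-ins, and only two fields are kept.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Same contract as the IndexAware interface shown earlier.
interface IndexAware<K, V> {
    V get(K key);
    void add(K key, V value);
    void update(K key, V value);
    void delete(K key, V value);
}

// Hypothetical stand-in for AdPlanObject with only two fields.
class PlanObject {
    Long planId;
    Integer planStatus;

    PlanObject(Long planId, Integer planStatus) {
        this.planId = planId;
        this.planStatus = planStatus;
    }

    // Copy over only the fields that are set on the new object.
    void update(PlanObject newObject) {
        if (newObject.planId != null) {
            this.planId = newObject.planId;
        }
        if (newObject.planStatus != null) {
            this.planStatus = newObject.planStatus;
        }
    }
}

// Hypothetical index class: one ConcurrentHashMap keyed by plan id.
class PlanIndexSketch implements IndexAware<Long, PlanObject> {
    private static final Map<Long, PlanObject> objectMap = new ConcurrentHashMap<>();

    @Override
    public PlanObject get(Long key) {
        return objectMap.get(key);
    }

    @Override
    public void add(Long key, PlanObject value) {
        objectMap.put(key, value);
    }

    @Override
    public void update(Long key, PlanObject value) {
        PlanObject old = objectMap.get(key);
        if (old == null) {
            objectMap.put(key, value); // entry missing: add a new one
        } else {
            old.update(value);         // entry present: merge the non-null fields
        }
    }

    @Override
    public void delete(Long key, PlanObject value) {
        objectMap.remove(key);
    }
}
```

Note that the get-then-put in update is not atomic; if several threads may update the same key at once, something like Map.compute would be a safer choice.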

AdUnitIndex

public class AdUnitIndex implements IndexAware<Long, AdUnitObject>

match

Returns the ids of all units that match positionType.

public Set<Long> match(Integer positionType) {
    Set<Long> adUnitIds = new HashSet<>();
    objectMap.forEach((k, v) -> {
        if (AdUnitObject.isAdSlotTypeOK(positionType, v.getPositionType())) {
            adUnitIds.add(k);
        }
    });

    return adUnitIds;
}

POSITION_TYPE: splash screen, patch ad, patch ad during video playback, patch ad on video pause, patch ad at video end.

fetch

fetch is an enhanced get: given a collection of adUnitIds, it returns a List<AdUnitObject>.

fetch: Collection<Long> adUnitIds -> List<AdUnitObject>
get: Long key -> AdUnitObject

public List<AdUnitObject> fetch(Collection<Long> adUnitIds) {
    if (CollectionUtils.isEmpty(adUnitIds)) {
        return Collections.emptyList();
    }

    List<AdUnitObject> result = new ArrayList<>();

    adUnitIds.forEach(u -> {
        AdUnitObject object = this.get(u);
        if (object == null) {
            log.error("AdUnitObject not found: {}", u);
            return;
        }
        result.add(object);
    });

    return result;
}

CreativeUnitIndex

public class CreativeUnitIndex implements IndexAware<String, CreativeUnitObject>

// <adId-unitId, CreativeUnitObject>
private static Map<String, CreativeUnitObject> objectMap;
// <adId, unitId Set>
private static Map<Long, Set<Long>> creativeUnitMap;
// <unitId, adId set>
private static Map<Long, Set<Long>> unitCreativeMap;

static {
    objectMap = new ConcurrentHashMap<>();
    creativeUnitMap = new ConcurrentHashMap<>();
    unitCreativeMap = new ConcurrentHashMap<>();
}

Lookup

Basic lookup: find a CreativeUnitObject by its adId-unitId string.

@Override
public CreativeUnitObject get(String key) {
    return objectMap.get(key);
}

Given List<AdUnitObject> unitObjects, look up the creative ids.

public List<Long> selectAds(List<AdUnitObject> unitObjects){
    if(CollectionUtils.isEmpty(unitObjects)){
        return Collections.emptyList();
    }

    List<Long> result=new ArrayList<>();

    for (AdUnitObject unitObject : unitObjects) {
        Set<Long> adIds = unitCreativeMap.get(unitObject.getUnitId());
        if(CollectionUtils.isNotEmpty(adIds)){
            result.addAll(adIds);
        }
    }

    return result;
}

Add

All three Maps must be maintained.

objectMap.put(key, value);

CommonUtil.getOrCreate(value.getAdId(), creativeUnitMap, ConcurrentSkipListSet::new)
        .add(value.getUnitId());

CommonUtil.getOrCreate(value.getUnitId(), unitCreativeMap, ConcurrentSkipListSet::new)
        .add(value.getAdId());

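get: if the key exists in the map, return the value mapped to it
create: if the key does not exist in the map, computeIfAbsent calls the function passed as its second argument to produce a value, stores it under the key, and returns that newly created value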

public static <K, V> V getOrCreate(K key, Map<K, V> map, Supplier<V> factory) {
    return map.computeIfAbsent(key, k -> factory.get());
}
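A small demo of the two branches, using the same getOrCreate body as above. The class name GetOrCreateDemo and the sample ids are only for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.function.Supplier;

class GetOrCreateDemo {
    static <K, V> V getOrCreate(K key, Map<K, V> map, Supplier<V> factory) {
        return map.computeIfAbsent(key, k -> factory.get());
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> unitCreativeMap = new ConcurrentHashMap<>();
        // "create" branch: key 10 is absent, so a new set is built and stored.
        getOrCreate(10L, unitCreativeMap, ConcurrentSkipListSet::new).add(1L);
        // "get" branch: the set stored by the first call is returned again.
        getOrCreate(10L, unitCreativeMap, ConcurrentSkipListSet::new).add(2L);
        System.out.println(unitCreativeMap);
    }
}
```

Because computeIfAbsent on a ConcurrentHashMap is atomic, two threads adding to the same key end up sharing one set rather than overwriting each other's.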

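Update

This index does not support update.

CreativeUnitIndex, UnitDistrictIndex, UnitItIndex, and UnitKeywordIndex do not support updating the index; each of them maintains multiple Maps.

To update such an index, delete the entry first and then add it again.

Delete

All three Maps must be maintained.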

@Override
public void delete(String key, CreativeUnitObject value) {
    objectMap.remove(key);

    Set<Long> unitSet = creativeUnitMap.get(value.getAdId());
    if (CollectionUtils.isNotEmpty(unitSet)) {
        unitSet.remove(value.getUnitId());
    }

    Set<Long> creativeSet = unitCreativeMap.get(value.getUnitId());
    if (CollectionUtils.isNotEmpty(creativeSet)) {
        creativeSet.remove(value.getAdId());
    }
}

UnitKeywordIndex

public class UnitKeywordIndex implements IndexAware<String, Set<Long>>

/**
 * keyword -> unitIds; inverted index
 */
private static Map<String, Set<Long>> keywordUnitMap;

/**
 * unitId -> keywords; forward index
 */
private static Map<Long, Set<String>> unitKeywordMap;

static {
    keywordUnitMap = new ConcurrentHashMap<>();
    unitKeywordMap = new ConcurrentHashMap<>();
}

Lookup

Given the keyword String key, get the set of unitIds, Set<Long>, from the inverted index keywordUnitMap.

@Override
public Set<Long> get(String key) {
    if (StringUtils.isEmpty(key)) {
        return Collections.emptySet();
    }
    Set<Long> result = keywordUnitMap.get(key);
    if (result == null) { // miss
        return Collections.emptySet();
    }

    return result;
}

Add

Both Maps must be maintained.

/**
 * @param key the keyword
 * @param value the ids of the ad units the keyword belongs to
 */
@Override
public void add(String key, Set<Long> value) {
    log.info("UnitKeywordIndex, before add: {}", unitKeywordMap);

    Set<Long> unitIdSet = CommonUtil.getOrCreate(key, keywordUnitMap, ConcurrentSkipListSet::new);
    unitIdSet.addAll(value);

    for (Long unitId : value) {
        Set<String> keywordSet = CommonUtil.getOrCreate(unitId, unitKeywordMap, ConcurrentSkipListSet::new);
        keywordSet.add(key);
    }

    log.info("UnitKeywordIndex, after add: {}", unitKeywordMap);
}

Update

Update is not supported.

Delete

Both Maps must be maintained.

/**
 * @param key the keyword
 * @param value the ids of the ad units from which the keyword should be removed
 */
@Override
public void delete(String key, Set<Long> value) {
    Set<Long> unitIds = CommonUtil.getOrCreate(key, keywordUnitMap, ConcurrentSkipListSet::new);
    unitIds.removeAll(value);

    for (Long unitId : value) {
        Set<String> keywordSet = CommonUtil.getOrCreate(unitId, unitKeywordMap, ConcurrentSkipListSet::new);
        keywordSet.remove(key);
    }
}

Match

Checks whether the keyword restrictions of unitId contain every entry of List<String> keywords.

public boolean match(Long unitId, List<String> keywords) {
    if (unitKeywordMap.containsKey(unitId) && CollectionUtils.isNotEmpty(unitKeywordMap.get(unitId))) {
        Set<String> unitAllKeywords = unitKeywordMap.get(unitId);
        return CollectionUtils.isSubCollection(keywords, unitAllKeywords); // true when keywords is a subset of unitAllKeywords
    }

    return false;
}
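To see how the forward and inverted maps work together, here is a self-contained sketch using only the JDK. KeywordIndexSketch is a hypothetical class, and it swaps CollectionUtils.isSubCollection for Set.containsAll. The two agree as long as the requested keyword list has no duplicates (isSubCollection also compares how often each element occurs).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

class KeywordIndexSketch {
    // keyword -> unitIds (inverted index)
    private final Map<String, Set<Long>> keywordUnitMap = new ConcurrentHashMap<>();
    // unitId -> keywords (forward index)
    private final Map<Long, Set<String>> unitKeywordMap = new ConcurrentHashMap<>();

    void add(String keyword, Set<Long> unitIds) {
        keywordUnitMap.computeIfAbsent(keyword, k -> new ConcurrentSkipListSet<>()).addAll(unitIds);
        for (Long unitId : unitIds) {
            unitKeywordMap.computeIfAbsent(unitId, k -> new ConcurrentSkipListSet<>()).add(keyword);
        }
    }

    // True when every requested keyword is among the unit's keywords.
    boolean match(Long unitId, List<String> keywords) {
        Set<String> unitKeywords = unitKeywordMap.get(unitId);
        if (unitKeywords == null || unitKeywords.isEmpty()) {
            return false;
        }
        return unitKeywords.containsAll(keywords);
    }
}
```

With unit 10 registered under "bmw" and "audi", a request for ["bmw"] matches, while a request for ["bmw", "vw"] does not.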

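The DataTable class

The DataTable class gives convenient access to all the index classes.

After creating this bean, the Spring container automatically calls its setApplicationContext() method.
Use the of method to obtain an index class:
- On the first request, the bean is fetched from applicationContext, stored in Map<Class, Object> dataTableMap, and then returned from dataTableMap.
- On later requests, it is returned directly from dataTableMap.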

@Component
public class DataTable implements ApplicationContextAware, PriorityOrdered {

    public static final Map<Class, Object> dataTableMap = new ConcurrentHashMap<>(); // holds all the index services
    private static ApplicationContext applicationContext;

    @SuppressWarnings("all")
    public static <T> T of(Class<T> clazz) {

        T instance = (T) dataTableMap.get(clazz);
        if (null != instance) { // cache hit
            return instance;
        }

        dataTableMap.put(clazz, bean(clazz));
        return (T) dataTableMap.get(clazz);
    }

    // look up a bean in the container by its name
    private static <T> T bean(String beanName) {
        return (T) applicationContext.getBean(beanName);
    }

    // look up a bean in the container by its Java class
    private static <T> T bean(Class clazz) {
        return (T) applicationContext.getBean(clazz);
    }

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        DataTable.applicationContext = applicationContext;
    }

    // defines the initialization priority
    @Override
    public int getOrder() {
        return PriorityOrdered.HIGHEST_PRECEDENCE;
    }

}

Usage example

Set<Long> adUnitIdSet = DataTable.of(AdUnitIndex.class).match(adSlot.getPositionType());

List<AdUnitObject> unitObjects = DataTable.of(AdUnitIndex.class).fetch(targetUnitIdSet);
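The lookup-then-cache behaviour of of can also be written with computeIfAbsent. The sketch below is a Spring-free illustration under stated assumptions: LazyRegistrySketch is a hypothetical class, and the lookup function stands in for applicationContext.getBean(clazz).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class LazyRegistrySketch {
    private final Map<Class<?>, Object> cache = new ConcurrentHashMap<>();
    // Assumption: stands in for applicationContext.getBean(clazz).
    private final Function<Class<?>, Object> lookup;

    LazyRegistrySketch(Function<Class<?>, Object> lookup) {
        this.lookup = lookup;
    }

    <T> T of(Class<T> clazz) {
        // First request: run the lookup and cache the result.
        // Later requests: return the cached instance.
        return clazz.cast(cache.computeIfAbsent(clazz, lookup));
    }
}
```

Using clazz.cast avoids the unchecked cast, and computeIfAbsent makes the "first request" path atomic.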

Index handling

@Slf4j
public class AdDataHandler

V is the type of the index data; for plan data, V is AdPlanObject.

/**
 * Applies an operation to an index: which index (index) and which kind of operation (type).
 */
private static <K, V> void handleBinlogEvent(IndexAware<K, V> index, K key, V value, OpType type) {

    switch (type) {
        case ADD:
            index.add(key, value);
            break;
        case UPDATE:
            index.update(key, value);
            break;
        case DELETE:
            index.delete(key, value);
            break;
        default:
            break;
    }
}

Taking plan as an example: handling the plan index requires an object of type AdPlanObject.

public static void handle(AdPlanTable planTable, OpType type) {

    AdPlanObject planObject = new AdPlanObject(
            planTable.getId(),
            planTable.getUserId(),
            planTable.getPlanStatus(),
            planTable.getStartDate(),
            planTable.getEndDate()
    );

    handleBinlogEvent(
            DataTable.of(AdPlanIndex.class),
            planObject.getPlanId(),
            planObject,
            type
    );
}

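The indexes depend on one another as follows:

unit: depends on plan
creativeUnit: depends on unit and creative
unitDistrict, unitIt, unitKeyword: depend on unit

So when handling unit data, first make sure the plan that the unit depends on exists.

The creativeUnit, unitDistrict, unitIt, and unitKeyword indexes do not support update.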

if (type == OpType.UPDATE) {
    log.error("district index can not support update");
    return;
}

//depend on unit
AdUnitObject unitObject = DataTable.of(AdUnitIndex.class).get(unitDistrictTable.getUnitId());
if (unitObject == null) {
    log.error("AdUnitDistrictTable index error: {}", unitDistrictTable.getUnitId());
    return;
}

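Full index

Before the retrieval system starts, the data in the database is exported to files; when the retrieval system starts, it loads the data in those files as the full index.

Why export the full data to files instead of having the service load it directly from the database?

The retrieval service runs as multiple instances, and having all of them hit the database at the same time would put heavy pressure on it. It also follows that the full data files should live on a shared file system, such as NFS or FTP.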

@Component
@DependsOn("dataTable")
public class IndexFileLoader

Data path

Database ---JPA---> AdPlan ------> AdPlanTable ---serialize---> .data file

.data file ------> String ---deserialize---> AdPlanTable ------> AdPlanObject

File loading

The ad-dumpData module exports the data in the database to custom .data files.

The contents of ad_unit_keyword.data look like this:

{"keyword":"宝马","unitId":10}
{"keyword":"奥迪","unitId":10}
{"keyword":"大众","unitId":10}

Data is read from the .data file line by line: each line is one String, so the whole file is a List<String>.

private List<String> loadDumpData(String fileName) {
    try (BufferedReader br = Files.newBufferedReader(Paths.get(fileName))) {
        return br.lines().collect(Collectors.toList());
    } catch (IOException ex) {
        // keep the original exception as the cause so the stack trace is not lost
        throw new RuntimeException(ex.getMessage(), ex);
    }
}

Initialization

@PostConstruct
public void init()

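The @PostConstruct annotation marks a method that must run after dependency injection is complete, to perform any initialization.

Loading each kind of full index goes through the following three steps:

Read the .data file to get a List<String>
Deserialize each String into the corresponding Table object
Use AdLevelDataHandler to load the index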

List<String> adPlanStrings = loadDumpData(
        String.format("%s%s", DConstant.DATA_ROOT_DIR, DConstant.AD_PLAN)); // read the file contents
for (String adPlanString : adPlanStrings) {
    AdPlanTable planTable = JSON.parseObject(adPlanString, AdPlanTable.class); // deserialize
    AdLevelDataHandler.handleLevel2(planTable, OpType.ADD); // add to the index
}

The loading order of the indexes is determined by the dependencies between them.
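One way to derive a valid loading order from the dependency list given earlier (unit depends on plan, creativeUnit on unit and creative, and so on) is a depth-first topological sort. This is only a sketch; the project may simply hard-code the order, and IndexLoadOrderSketch is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IndexLoadOrderSketch {
    // index name -> indexes that must be loaded before it
    static Map<String, List<String>> dependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("plan", Collections.emptyList());
        deps.put("creative", Collections.emptyList());
        deps.put("unit", Arrays.asList("plan"));
        deps.put("creativeUnit", Arrays.asList("unit", "creative"));
        deps.put("unitDistrict", Arrays.asList("unit"));
        deps.put("unitIt", Arrays.asList("unit"));
        deps.put("unitKeyword", Arrays.asList("unit"));
        return deps;
    }

    // Returns the index names so that every index comes after its dependencies.
    static List<String> loadOrder(Map<String, List<String>> deps) {
        Set<String> ordered = new LinkedHashSet<>();
        Set<String> visiting = new LinkedHashSet<>();
        for (String name : deps.keySet()) {
            visit(name, deps, visiting, ordered);
        }
        return new ArrayList<>(ordered);
    }

    private static void visit(String name, Map<String, List<String>> deps,
                              Set<String> visiting, Set<String> ordered) {
        if (ordered.contains(name)) {
            return; // already placed
        }
        if (!visiting.add(name)) {
            throw new IllegalStateException("cyclic dependency at " + name);
        }
        for (String dep : deps.getOrDefault(name, Collections.emptyList())) {
            visit(dep, deps, visiting, ordered); // dependencies go first
        }
        visiting.remove(name);
        ordered.add(name);
    }
}
```

For the dependencies listed here, plan and creative come out first, then unit, then the four indexes that depend on unit.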

Incremental index

See "Building the incremental index by listening to binlog | Ad system" - 掘金 (juejin.cn)

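Search service

Search request object

SearchRequest

Media-side request identifier: String mediaId
Basic request information: RequestInfo requestInfo
- Unique request id
- Ad slot information: List<AdSlot> adSlots
  - Ad slot code
  - Traffic type (patch, splash screen)
  - Width
  - Height
  - Material type (image, video, text)
  - Minimum bid
- Terminal information
- Geographic information
- Device information
Matching information: FeatureInfo featureInfo
- List<String> keywords
- List<ProvinceAndCity> districts
- List<String> its
- FeatureRelation relation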

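Notes:

A single ad request may expect several ads in return. A page being browsed may contain several ad slots, so one request has to fetch several ads, and the traffic type, material type, width, and height can differ from slot to slot.
Within one ad request, the feature information FeatureInfo is the same for every slot. Different users doing different things should produce different feature information, which is what allows more personalized ads to be recommended.
- The geographic feature DistrictFeature is set from the user's current location and usual place of residence
- The interest feature ItFeature is set from the user's interests (filled in by the user, or taken from the user's interest profile)
- KeywordFeature is set from the user's behavior (keywords the user searched for, keyword tags of videos the user watched recently)
FeatureRelation has two options, OR and AND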

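fetchAds

The key method of the retrieval system, which matches ads against a request: public SearchResponse fetchAds(SearchRequest request).

Unit-level filtering

Filter by the PositionType of the adSlot, removing units that do not qualify
Filter by the three Features, removing units that do not qualify
Filter out deleted units and units under deleted plans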
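As a rough sketch of the feature step, assume AND means a unit must pass all three feature checks and OR means passing any one is enough. That reading of FeatureRelation is an assumption, and RelationSketch and UnitFeatureFilterSketch are hypothetical names. The three predicates stand in for the keyword, district, and interest index lookups.

```java
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

enum RelationSketch { OR, AND }

class UnitFeatureFilterSketch {
    static Set<Long> filter(Set<Long> unitIds, RelationSketch relation,
                            Predicate<Long> keyword,
                            Predicate<Long> district,
                            Predicate<Long> it) {
        // AND: all three checks must pass; OR: any one of them is enough.
        Predicate<Long> combined = relation == RelationSketch.AND
                ? keyword.and(district).and(it)
                : keyword.or(district).or(it);
        return unitIds.stream().filter(combined).collect(Collectors.toSet());
    }
}
```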

Creative-level filtering

Match ads by height, width, and material type (image, video)
Pick one ad at random from the matched ads
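The two creative-level steps can be sketched like this. CreativeSketch and its field names are hypothetical, and exact size equality is an assumption about how "match" is defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical, trimmed-down creative record; field names are assumptions.
class CreativeSketch {
    final long adId;
    final int width;
    final int height;
    final int materialType;

    CreativeSketch(long adId, int width, int height, int materialType) {
        this.adId = adId;
        this.width = width;
        this.height = height;
        this.materialType = materialType;
    }
}

class CreativeFilterSketch {
    // Keep creatives whose size equals the slot size and whose material type the slot accepts.
    static List<CreativeSketch> filter(List<CreativeSketch> creatives,
                                       int slotWidth, int slotHeight,
                                       List<Integer> slotTypes) {
        List<CreativeSketch> matched = new ArrayList<>();
        for (CreativeSketch c : creatives) {
            if (c.width == slotWidth && c.height == slotHeight
                    && slotTypes.contains(c.materialType)) {
                matched.add(c);
            }
        }
        return matched;
    }

    // Pick one matched creative at random; empty when nothing matched.
    static Optional<CreativeSketch> pickOne(List<CreativeSketch> matched, Random random) {
        if (matched.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(matched.get(random.nextInt(matched.size())));
    }
}
```

Passing the Random in as a parameter keeps the random pick reproducible in tests.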
