Tallying Activity Participants: A Look at the Hacks Behind It

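This article is an entry in the 好文召集令 (call for good articles) activity: submissions in two tracks, backend and frontend, with a 20,000 yuan prize pool.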

Introduction

A while ago I built an activity leaderboard. It has plenty of rough edges, but it still received a lot of positive feedback.

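Some readers were curious about how it works. Because the code is tightly coupled, I did not know how to write this article at first, so I kept putting it off.

I have been busy lately and have not had time to fix bugs or add features, so I decided to open-source the project and let everyone chip in to round out the functionality.

Contributions are welcome. Project repository: github.com/ytwp/juejin…

This article mainly explains how the activity leaderboard is implemented.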

Down to business

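  1. This feature is all about users, so the first step is discovering them

    • So far, users are discovered through columns and through the latest articles under each tag, and their user IDs are collected

    • Inactive users are filtered out to reduce the number of requests

  2. A scheduled job queries each user's information and saves it, which amounts to taking a snapshot

  3. When querying a user, their recent articles (roughly the last month; the code uses a 45-day window) are fetched as well, as the data backing this feature

    An excerpt of the core source code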
    public void run() {
        log.info("Pulling user snapshots");
        String now = LocalDateTime.now().format(yyyyMMddHH);
        try {
            String path = "./j-" + LocalDate.now().format(yyyyMMdd) + ".json";
            FileUtil.initFile(path);
            // Open in append mode; try-with-resources closes the writers even if an exception is thrown
            try (PrintWriter pw = new PrintWriter(new FileWriter(path, true))) {
                int i = 0;
                // Iterate over all users and fetch each user's data
                // Each result is written to the file as one JSON line, for the later computation
                for (String userId : userIdSet) {
                    // Fetch the user data; retry once if the first attempt fails
                    JueJInApi.UserData userData = JueJInApi.getUser(userId);
                    if (userData == null) {
                        userData = JueJInApi.getUser(userId);
                    }
                    if (userData != null) {
                        userData.setTime(now);
                        pw.println(JSONUtil.toJsonStr(userData));
                    }
                    log.info((++i) + " user snapshot: " + userId);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        log.info("Finished pulling user snapshots: " + now);
    }
    
    public static UserData getUser(String user_id) {
        Map<String, Object> map1 = new HashMap<>();
        map1.put("cursor", "0");
        map1.put("sort_type", 2);
        map1.put("user_id", user_id);
        Map<String, Object> map2 = new HashMap<>();
        map2.put("audit_status", null);
        map2.put("cursor", "0");
        map2.put("limit", 10);
        map2.put("user_id", user_id);
        try {
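            // Every article carries its author's info, so pulling the article list is enough
            // Fetch 10 at a time until every article from the last 45 days has been pulled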
            List<ArticleData> articleDataList = new ArrayList<>();
            for (Integer cursor = 0; true; cursor += 10) {
                map1.put("cursor", cursor.toString());
                String res1 = HttpUtil.post("https://api.juejin.cn/content_api/v1/article/query_list", JSONUtil.toJsonStr(map1));
                JSONObject jsonObject = JSONUtil.parseObj(res1);
                Integer count = jsonObject.getInt("count");
                if (count == 0) {
                    break;
                }
                JSONArray data1 = jsonObject.getJSONArray("data");
                List<ArticleData> dataList = JSONUtil.toList(data1, ArticleData.class);
                // Drop articles older than 45 days
                List<ArticleData> data2 = dataList.stream().filter(data -> {
                    String ctime = data.getArticle_info().getCtime();
                    long now = System.currentTimeMillis() / 1000;
                    return Long.parseLong(ctime) > (now - (60 * 60 * 24 * 45));
                }).collect(Collectors.toList());
    
                articleDataList.addAll(data2);
    
                // If this page contained an article older than 45 days, stop
                if (dataList.size() != data2.size()) {
                    break;
                }
                // Also stop once all articles have been pulled
                if (jsonObject.getInt("cursor") >= (count)) {
                    break;
                }
            }
            // Keep the author info from one article, then null it out on every article
            // so the snapshot does not take up a lot of disk space
            AtomicReference<AuthorUserInfo> author_user_info = new AtomicReference<>();
            articleDataList.forEach(articleData -> {
                author_user_info.set(articleData.getAuthor_user_info());
                articleData.setAuthor_user_info(null);
            });
            // All of this user's columns, used for the column statistics
            List<SelfData> selfDataList = new ArrayList<>();
            for (Integer cursor = 0; true; cursor += 10) {
                map2.put("cursor", cursor.toString());
                String res2 = HttpUtil.post("https://api.juejin.cn/content_api/v1/column/self_center_list", JSONUtil.toJsonStr(map2));
                JSONObject jsonObject = JSONUtil.parseObj(res2);
                Integer count = jsonObject.getInt("count");
                if (count == 0) {
                    break;
                }
                JSONArray data2 = jsonObject.getJSONArray("data");
                selfDataList.addAll(JSONUtil.toList(data2, SelfData.class));
                if (jsonObject.getInt("cursor") >= (count)) {
                    break;
                }
            }
            // Assemble the user data
            UserData userData = new UserData();
            userData.setUser_id(user_id);
            userData.setArticle_list(articleDataList);
            userData.setSelf_center_list(selfDataList);
            userData.setAuthor_user_info(author_user_info.get());
            return userData;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
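
Steps 1 and 2 are not part of the excerpt above. As a rough illustration of the idea in step 1, here is a minimal sketch of collecting author IDs from article listings and dropping inactive ones. The class `UserDiscovery`, the `ArticleRef` type, and the rule "inactive means no article in the last 45 days" are all assumptions made for this example; the original project may define inactivity differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the original project. */
final class UserDiscovery {

    /** Minimal stand-in for one entry of a "latest articles" listing. */
    static final class ArticleRef {
        final String authorId;
        final long ctimeSeconds; // publish time in Unix seconds

        ArticleRef(String authorId, long ctimeSeconds) {
            this.authorId = authorId;
            this.ctimeSeconds = ctimeSeconds;
        }
    }

    /**
     * Returns the IDs of authors whose newest article in {@code articles}
     * is at most {@code maxAgeDays} old, measured from {@code nowSeconds}.
     */
    static Set<String> discoverActiveUsers(List<ArticleRef> articles, long nowSeconds, int maxAgeDays) {
        // Newest publish time seen per author
        Map<String, Long> newest = new HashMap<>();
        for (ArticleRef a : articles) {
            newest.merge(a.authorId, a.ctimeSeconds, Long::max);
        }
        long cutoff = nowSeconds - maxAgeDays * 24L * 60 * 60;
        // Assumed inactivity rule: nothing published since the cutoff
        Set<String> active = new TreeSet<>();
        for (Map.Entry<String, Long> e : newest.entrySet()) {
            if (e.getValue() >= cutoff) {
                active.add(e.getKey());
            }
        }
        return active;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000L;
        long day = 24L * 60 * 60;
        List<ArticleRef> articles = Arrays.asList(
            new ArticleRef("u1", now - 2 * day),
            new ArticleRef("u1", now - 90 * day),
            new ArticleRef("u2", now - 60 * day),
            new ArticleRef("u3", now - 10 * day));
        System.out.println(discoverActiveUsers(articles, now, 45)); // prints [u1, u3]
    }
}
```

The resulting set would play the role of `userIdSet` in `run()` above: the fewer inactive IDs it contains, the fewer requests each snapshot round has to make.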
  4. Analyze the data

    // I think this code already qualifies as peak confusing code; I can barely explain it myself
    def run(): Unit = {
        log.info("Computing activity articles")
        // Various timestamps
        val now = LocalDateTime.now.format(yyyyMMddHH)
        val runTime = System.currentTimeMillis() / 1000
        val yyyyMMddStr = LocalDate.now.format(yyyyMMdd)
        val yyyyMMddInt = Integer.parseInt(yyyyMMddStr)
        try {
            // Input and output paths
            val path = FilePathConstant.EXPLORE_DARA_PATH.format(yyyyMMddStr)
            // `now` is already a formatted String, so it is passed to the path template directly
            val outPath = FilePathConstant.BAK_ACTIVITY_REPORT_DARA_PATH.format(now)
            val outNowPath = FilePathConstant.ACTIVITY_REPORT_DARA_PATH
            val rulePath = FilePathConstant.ACTIVITY_RULE_PATH
            val configPath = FilePathConstant.ACTIVITY_CONFIG_PATH
    
            // Read the snapshot data and the configuration files
            val userDataList = FileUtil.readLineJsonList(path, classOf[JueJInApi.UserData]).asScala
            val activityRuleList = FileUtil.readJsonList(rulePath, classOf[ActivityRule]).asScala
            val activityConfig = FileUtil.readJson(configPath, classOf[ActivityConfig])
    
            // To avoid hammering Juejin's API, record when the last run finished and skip anything older
            val lastRunTime = activityConfig.getLastRunTime
    
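            log.info(s"Data loaded, size: ${userDataList.size}")
    
            log.info("Start computing activity articles")
            /**
             * Columns
             * 1. Group by user
             * 2. Keep the latest snapshot of the day
             * 3. Sort by follower count
             */
            val resMap = mutable.Map[Int, ListBuffer[(ActivityRule, ArticleData)]]()
            // groupBy + values + map picks each user's latest snapshot of the day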
            userDataList
                .groupBy(_.getUser_id)
                .values
                .map(userData => (userData.maxBy(_.getTime.toInt)))
                .foreach(userData => {
                    // Start counting
                    userData.getArticle_list.asScala
                        .foreach(articleData => {
                            val ctime = articleData.getArticle_info.getCtime.toLong
                            // First run, or published later than 12 hours before the last run
                            if (lastRunTime == 0 || ctime > (lastRunTime - 60 * 60 * 12)) {
                                log.info(s"Fetching article detail: ${articleData.getArticle_id}")
                                // Fetch the article detail and check whether it is an activity article
                                val detail = JueJInApi.getArticleDetail(articleData.getArticle_id)
                                if (detail != null) {
                                    activityRuleList.foreach(activityRule => {
                                        // Records the article when the content contains the activity keyword
                                        def isActivity(content: String): Unit = {
                                            if (content.contains(activityRule.getKeyword)) {
                                                log.info(s"Matched activity article: ${activityRule.getKeyword}")
                                                val listBuffer = resMap.getOrElse(activityRule.getId.toInt, ListBuffer[(ActivityRule, ArticleData)]())
                                                articleData.setAuthor_user_info(detail.getAuthor_user_info)
                                                listBuffer.append((activityRule, articleData))
                                                // Add to the final result set
                                                resMap.put(activityRule.getId.toInt, listBuffer)
                                            }
                                        }
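                                        // Stop counting once the activity has ended, so only match while it is still running
                                        if (activityRule.getEndDate > yyyyMMddInt) {
                                            // Match against either the article body or the title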
                                            if (activityRule.getType == "post") {
                                                isActivity(detail.getArticle_info.getMark_content)
                                            } else if (activityRule.getType == "title") {
                                                isActivity(detail.getArticle_info.getTitle)
                                            }
                                        }
                                    })
                                }
                            }
                        })
                })
            // Build a (user ID -> user data) map
            val userDataMap: Map[String, JueJInApi.UserData] = userDataList
                .groupBy(_.getUser_id)
                .values
                .map(userData => (userData.maxBy(_.getTime.toInt)))
                .map(userData => (userData.getUser_id, userData))
                .toMap
    
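            // Read the previous results
            var fileActivityReportList: mutable.Buffer[ActivityReport] = FileUtil.readJsonList(outNowPath, classOf[ActivityReport]).asScala
            log.info("Initializing the activity list")
            // Activities may have been added since the last run, so make sure each one exists in the result JSON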
            val ids = fileActivityReportList.map(_.getId)
            activityRuleList.foreach(rule => {
                val id = rule.getId
                if (!ids.contains(id)) {
                    val report = new ActivityReport
                    report.setId(id)
                    report.setUserActivityReportMap(new util.HashMap[String, ActivityReport.UserActivityReport]())
                    fileActivityReportList = report +: fileActivityReportList
                }
                // Activities that ended more than 7 days ago are removed outright
                val end7Date = LocalDate.parse(rule.getEndDate.toString, yyyyMMdd).plusDays(7).format(yyyyMMdd).toInt
                if (yyyyMMddInt > end7Date) {
                    fileActivityReportList = fileActivityReportList.filter(r => r.getId != id)
                }
            })
            // Turn the activity rules into an (activity ID -> activity rule) map
            val ruleMap: Map[Integer, ActivityRule] = activityRuleList.map(rule => (rule.getId, rule)).toMap
    
            log.info("Transforming data")
            val activityReportList = fileActivityReportList
                .map(activityReport => {
                    val id = activityReport.getId
                    // orNull avoids a NoSuchElementException when the rule no longer exists
                    val rule: ActivityRule = ruleMap.get(id).orNull
                    // Do not update the data once the activity has expired
                    if (rule == null || rule.getEndDate <= yyyyMMddInt) {
                        null
                    } else {
                        // Assemble the per-user data
                        val activityReportMap = activityReport.getUserActivityReportMap
                        val articleIdSet = activityReportMap.asScala.flatMap(a => a._2.getArticleIdSet.asScala).toSet
                        val option = resMap.get(id)
                        if (option.nonEmpty) {
                            option.get.foreach(t => {
                                val value = t._2
                                val article_id = value.getArticle_id
                                if (!articleIdSet.contains(article_id)) {
                                    val user_id = value.getAuthor_user_info.getUser_id
                                    val user_name = value.getAuthor_user_info.getUser_name
                                    var report = activityReportMap.get(user_id)
                                    if (report == null) {
                                        report = new ActivityReport.UserActivityReport()
                                        report.setArticleIdSet(new util.HashSet[String]())
                                        report.setUser_id(user_id)
                                        report.setUser_name(user_name)
                                        report.setCount(0)
                                        report.setSum_digg_count(0)
                                        report.setSum_view_count(0)
                                        report.setSum_collect_count(0)
                                        report.setSum_comment_count(0)
                                    }
                                    val set = report.getArticleIdSet
                                    set.add(article_id)
                                    report.setArticleIdSet(set)
                                    activityReportMap.put(user_id, report)
                                }
                            })
    
                        }
                        activityReport.setUserActivityReportMap(activityReportMap)
                        activityReport
                    }
                })
                // Filter out empty results
                .filter(a => a != null)
                // Build the final data set
                .map(activityReport => {
                    val activityReportMap = activityReport.getUserActivityReportMap.asScala.map(t => {
                        val user_id = t._1
                        val userActivityReport = t._2
                        val set = userActivityReport.getArticleIdSet
                        val maybeUserData = userDataMap.get(user_id)
                        if (maybeUserData.nonEmpty) {
                            val userData = maybeUserData.get
                            val datas = userData.getArticle_list.asScala.filter(article => set.contains(article.getArticle_id))
                            userActivityReport.setCount(datas.size)
                            userActivityReport.setSum_view_count(datas.map(_.getArticle_info.getView_count.toInt).sum)
                            userActivityReport.setSum_digg_count(datas.map(_.getArticle_info.getDigg_count.toInt).sum)
                            userActivityReport.setSum_collect_count(datas.map(_.getArticle_info.getCollect_count.toInt).sum)
                            userActivityReport.setSum_comment_count(datas.map(_.getArticle_info.getComment_count.toInt).sum)
                        }
                        (user_id, userActivityReport)
                    }).toMap.asJava
                    activityReport.setUserActivityReportMap(activityReportMap)
                    activityReport
                })
                .asJava
    
            log.info("Saving data")
            activityConfig.setLastRunTime(runTime)
            FileUtil.writeJson(outPath, activityReportList)
            FileUtil.writeJson(outNowPath, activityReportList)
            FileUtil.writeJson(configPath, activityConfig)
            reportService.updateActivityReport(activityReportList, activityRuleList)
    
        } catch {
            case e: IOException =>
                e.printStackTrace()
        }
        log.info("Finished computing activity articles: " + now)
    }
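
Since the Scala above is admittedly hard to follow, its two core moves (keep each user's latest snapshot, then total the stats of that user's activity articles) can be sketched in isolation. This is a simplified Java illustration, not the project's code: `SnapshotReducer`, `Snapshot`, and the `viewsByArticle` map are hypothetical stand-ins for `UserData` and its article list, and only view counts are summed.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

/** Hypothetical, simplified model; not the project's actual classes. */
final class SnapshotReducer {

    /** One snapshot of a user: when it was taken and the view count per article ID. */
    static final class Snapshot {
        final String userId;
        final String time; // "yyyyMMddHH": fixed width, so string order equals time order
        final Map<String, Integer> viewsByArticle;

        Snapshot(String userId, String time, Map<String, Integer> viewsByArticle) {
            this.userId = userId;
            this.time = time;
            this.viewsByArticle = viewsByArticle;
        }
    }

    private static final Comparator<Snapshot> BY_TIME = Comparator.comparing(s -> s.time);

    /** Keeps only the most recent snapshot of each user (the groupBy + maxBy step). */
    static Map<String, Snapshot> latestPerUser(List<Snapshot> snapshots) {
        return snapshots.stream()
            .collect(Collectors.toMap(s -> s.userId, s -> s, BinaryOperator.maxBy(BY_TIME)));
    }

    /** Sums the views of the articles that were matched to an activity. */
    static int sumViews(Snapshot latest, Set<String> activityArticleIds) {
        return latest.viewsByArticle.entrySet().stream()
            .filter(e -> activityArticleIds.contains(e.getKey()))
            .mapToInt(Map.Entry::getValue)
            .sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> early = new HashMap<>();
        early.put("a1", 10);
        early.put("a2", 5);
        Map<String, Integer> late = new HashMap<>();
        late.put("a1", 12);
        late.put("a2", 7);
        late.put("a3", 100); // not an activity article, so it is ignored below
        List<Snapshot> snapshots = Arrays.asList(
            new Snapshot("u1", "2021062508", early),
            new Snapshot("u1", "2021062510", late));

        Snapshot latest = latestPerUser(snapshots).get("u1");
        Set<String> activityArticles = new HashSet<>(Arrays.asList("a1", "a2"));
        System.out.println(latest.time + " " + sumViews(latest, activityArticles)); // prints 2021062510 19
    }
}
```

Digg, collect, and comment counts would follow the same pattern as `sumViews`. Recomputing the totals from the latest snapshot on every run, instead of adding increments, keeps the report correct even if a run is skipped.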
Category: Backend