Heima Toutiao Project in Java: Day 10


Day 10: Data Saving, Deduplication & Document Parsing

1 Data Saving Preparation

1.1 IP Proxy Pool

1.1.1 Requirements Analysis: Manage the IP proxy pool, including CRUD operations and marking IPs as available or unavailable.

Below are the CRUD operations for the IP proxy pool.

Interface to check whether an IP already exists:

    @Override
    public boolean checkExist(String host, int port) {
        ClIpPool clIpPool = new ClIpPool();
        clIpPool.setIp(host);
        clIpPool.setPort(port);
        List<ClIpPool> clIpPools = clIpPoolMapper.selectList(clIpPool);
        // A non-empty result means this host/port pair is already in the pool.
        return clIpPools != null && !clIpPools.isEmpty();
    }
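The existence check above is meant to guard inserts so the same proxy is not saved twice. The sketch below models that guard pattern with an in-memory `HashSet` keyed by `host:port`; it is a hypothetical stand-in for illustration, not the MyBatis-backed implementation (`IpPoolExistenceCheck` and `saveIfAbsent` are names invented here).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the proxy-pool existence check.
// The real service queries the cl_ip_pool table via MyBatis; here a
// HashSet keyed by "host:port" models the same contract.
public class IpPoolExistenceCheck {
    private final Set<String> pool = new HashSet<>();

    public boolean checkExist(String host, int port) {
        return pool.contains(host + ":" + port);
    }

    // Guard pattern: only save when the IP is not already in the pool,
    // mirroring how checkExist would be called before saveCrawlerIpPool.
    public boolean saveIfAbsent(String host, int port) {
        if (checkExist(host, port)) {
            return false; // duplicate, skip the insert
        }
        pool.add(host + ":" + port);
        return true;
    }

    public static void main(String[] args) {
        IpPoolExistenceCheck check = new IpPoolExistenceCheck();
        System.out.println(check.saveIfAbsent("101.132.25.1", 8080)); // true: first insert
        System.out.println(check.saveIfAbsent("101.132.25.1", 8080)); // false: duplicate
    }
}
```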

Test for the IP proxy pool:

    @Test
    public void testSaveCrawlerIpPool() {
        ClIpPool clIpPool = new ClIpPool();
        // Dummy test data; not a real routable IP.
        clIpPool.setIp("2222.3333.444.5555");
        clIpPool.setPort(1111);
        clIpPool.setEnable(true);
        clIpPool.setCreatedTime(new Date());
        crawlerIpPoolService.saveCrawlerIpPool(clIpPool);
    }

Test passed.

IP saved successfully.

Interface test for the queryList method:

    @Test
    public void testQueryList() {
        ClNewsAdditional clNewsAdditional = new ClNewsAdditional();
        clNewsAdditional.setUrl("https://blog.csdn.net/weixin_43976602/article/details/96971651");
        List<ClNewsAdditional> clNewsAdditionals = crawlerNewsAdditionalService.queryList(clNewsAdditional);
        System.out.println(clNewsAdditionals);
    }

 

Test passed.

 

1.3 Crawler Article Comment Information Table

 

1.3.1 Requirements Analysis: Save the comment information for articles.

    @Override
    public void saveClNewsComment(ClNewsComment clNewsComment) {
        clNewsCommentMapper.insertSelective(clNewsComment);
    }

1.4 Crawler Articles

1.4.1 Requirements Analysis: CRUD operations for articles.


    @Override
    public void saveNews(ClNews clNews) {
        clNewsMapper.insertSelective(clNews);
    }

    @Override
    public void updateNews(ClNews clNews) {
        clNewsMapper.updateByPrimaryKey(clNews);
    }

    @Override
    public void deleteByUrl(String url) {
        clNewsMapper.deleteByUrl(url);
    }

    @Override
    public List<ClNews> queryList(ClNews clNews) {
        return clNewsMapper.selectList(clNews);
    }
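The four `@Override` methods above form a simple CRUD contract over the `cl_news` table. The sketch below restates that contract with a minimal in-memory implementation so the behavior of each operation can be seen end to end; `ClNewsCrudSketch` and its trimmed-down `ClNews` (url and title only) are hypothetical stand-ins for the MyBatis-backed service, not the project's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch of the article CRUD contract fulfilled
// by the MyBatis-backed service above. ClNews is reduced to the two
// fields needed for illustration.
public class ClNewsCrudSketch {
    static class ClNews {
        String url;
        String title;
        ClNews(String url, String title) { this.url = url; this.title = title; }
    }

    private final List<ClNews> table = new ArrayList<>();

    // insertSelective equivalent: append a new row.
    public void saveNews(ClNews news) { table.add(news); }

    // updateByPrimaryKey equivalent, keyed here by URL.
    public void updateNews(ClNews news) {
        for (ClNews row : table) {
            if (row.url.equals(news.url)) { row.title = news.title; }
        }
    }

    public void deleteByUrl(String url) {
        table.removeIf(row -> row.url.equals(url));
    }

    // selectList equivalent: all rows matching the given URL.
    public List<ClNews> queryList(String url) {
        List<ClNews> result = new ArrayList<>();
        for (ClNews row : table) {
            if (row.url.equals(url)) { result.add(row); }
        }
        return result;
    }

    public static void main(String[] args) {
        ClNewsCrudSketch svc = new ClNewsCrudSketch();
        svc.saveNews(new ClNews("https://example.com/a", "draft"));
        svc.updateNews(new ClNews("https://example.com/a", "final"));
        System.out.println(svc.queryList("https://example.com/a").get(0).title);
        svc.deleteByUrl("https://example.com/a");
        System.out.println(svc.queryList("https://example.com/a").size());
    }
}
```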

2 Deduplication: Integrating Redis
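A common way to deduplicate crawled URLs with Redis is a SET: `SADD` returns 1 only when the member was not already present, so one round trip both records and tests a URL. The sketch below mimics that semantics with an in-memory map so the dedup logic runs without a Redis server; `UrlDeduplicator`, the key name `crawler:parsed:urls`, and `markIfNew` are assumptions for illustration, not the project's actual integration (which would go through a Redis client such as Jedis or Spring's RedisTemplate).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of URL deduplication with Redis-SET semantics. SADD returns 1
// only for a member not already in the set; this in-memory map mimics
// that behaviour so the logic is runnable without a Redis server.
public class UrlDeduplicator {
    private final Map<String, Set<String>> store = new HashMap<>();

    // In-memory stand-in for a Redis client's sadd(key, member):
    // 1 if the member was added, 0 if it was already present.
    private long sadd(String key, String member) {
        return store.computeIfAbsent(key, k -> new HashSet<>()).add(member) ? 1L : 0L;
    }

    /** Returns true if the URL is seen for the first time and should be processed. */
    public boolean markIfNew(String url) {
        return sadd("crawler:parsed:urls", url) == 1L;
    }

    public static void main(String[] args) {
        UrlDeduplicator dedup = new UrlDeduplicator();
        System.out.println(dedup.markIfNew("https://example.com/article/1")); // true: new URL
        System.out.println(dedup.markIfNew("https://example.com/article/1")); // false: duplicate
    }
}
```

With a real Redis backing the set, the same check also survives restarts and is shared across crawler instances, which is the point of moving dedup state out of process memory.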