This is day 12 of my participation in the note-writing activity of the 4th Youth Training Camp.

Introduction to the Tools Used in the Project

GitHub Version Control

Workflow

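Fork the project and clone your fork
Create the corresponding branch
Modify the code on that branch in the clone of your own repository
Push the code you developed to the corresponding branch on the remote
Open a pull request against the master branch, rather than committing or merging into master directly from your local machine.
Wait for the original author to act on it (decide whether to merge and accept the changes)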

Testing

Clone the code from the git server to your local machine: git clone <repository URL>

git clone git@github.com:————/Tiny-flink.git

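Create a branch: git branch dev_ds_utils

List branches: git branch

Switch branches: git checkout <branch name>; the branch currently being edited is marked with a leading *.

Code modified and committed on one branch does not affect the other branches.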

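Pushing to the remote repository
Initialize: git init (only needed for a brand-new local repository; a cloned repository is already initialized)
Stage all local changes: git add .
Commit with a description: git commit -m "Add new feature"
Push the local changes to the server: git push origin dev_ds_utils
- Pushing further changes: git push. The remote and branch names only need to be given on the first push, provided that push set the upstream branch with -u (git push -u origin dev_ds_utils).
On the remote side, wait for the author to merge the branch (authorized accounts can merge it themselves).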
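The command sequence above can also be written down as data, which is handy if the steps are ever scripted from Java. The following is only a minimal sketch: the class name GitWorkflowSketch, the method commands, and the repository URL are hypothetical and not part of the project. It uses git checkout -b to create and switch to the branch in one step, and it only builds the commands without running them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the project): assembles the git
// commands of the fork-and-branch workflow in the order they are run.
final class GitWorkflowSketch {

    // Returns one argument list per command; nothing is executed here.
    static List<List<String>> commands(String repoUrl, String branch, String message) {
        List<List<String>> steps = new ArrayList<>();
        steps.add(Arrays.asList("git", "clone", repoUrl));
        // -b creates the branch and switches to it in one step.
        steps.add(Arrays.asList("git", "checkout", "-b", branch));
        steps.add(Arrays.asList("git", "add", "."));
        steps.add(Arrays.asList("git", "commit", "-m", message));
        // -u records the upstream so that later pushes can be a plain "git push".
        steps.add(Arrays.asList("git", "push", "-u", "origin", branch));
        return steps;
    }

    public static void main(String[] args) {
        // The URL is a placeholder, not the real repository address.
        String url = "git@github.com:your-account/your-fork.git";
        for (List<String> step : commands(url, "dev_ds_utils", "Add new feature")) {
            System.out.println(String.join(" ", step));
        }
    }
}
```

To actually run a step, each argument list could be handed to new ProcessBuilder(step) with its working directory set to the cloned repository; that part is left out so the sketch has no side effects.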

Big Data Frameworks

Framework versions:

java version "1.8.0_212"
kafka_2.11-0.11.0.0
Hadoop 3.1.3
Framework installation directory: /opt

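Command-Line Operations

Starting the cluster
- Start HDFS: ./sbin/start-dfs.sh
- Start ZooKeeper: bin/zkServer.sh start
- Start Kafka: bin/kafka-server-start.sh -daemon config/server.properties
HDFS operations
- Create a directory: hadoop fs -mkdir -p /test
- -cat: display the contents of a file
- Upload from local to HDFS: hadoop fs -put 1.txt /
- Download from HDFS to local: hadoop fs -get /1.txt ./
- -rm: delete a file or directory (a directory requires -r)
- -rmdir: delete an empty directory
Kafka operations
- Start the service: bin/kafka-server-start.sh -daemon config/server.properties
- List all topics: bin/kafka-topics.sh --zookeeper hadoop1:2181 --list
- Create a topic: bin/kafka-topics.sh --zookeeper hadoop1:2181 --create --replication-factor 2 --partitions 1 --topic first
- Produce data: bin/kafka-console-producer.sh --broker-list hadoop1:9092 --topic first
- Consume data: bin/kafka-console-consumer.sh --zookeeper hadoop1:2181 --topic first (in Kafka 0.11 the --zookeeper option of the console consumer still works but is deprecated in favor of --bootstrap-server hadoop1:9092)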

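Utils Classes

The commonly used Kafka and HDFS API operations are implemented in Java as utility classes (reusable "wheels") that other classes can call, which keeps the calling code simple.

Kafka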

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

/**
 * @Description Kafka utility class that provides message sending and listening
 * @Author ws
 * @Date 2022/08/11
 */

public class KafkaUtil {

    static String bootstrapservers = "***:9092";

    public static void kafkaproducer(String topic, String msg) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        // Kafka cluster (broker list)
        props.put("bootstrap.servers", bootstrapservers);
        // Serializer classes for the key and the value
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Create the producer
        Producer<String, String> producer = new KafkaProducer<>(props);
        try {
            // Send to the given topic and block until the broker acknowledges,
            // so that a failed send surfaces as an exception
            producer.send(new ProducerRecord<String, String>(topic, msg)).get();
        } finally {
            // Release resources
            producer.close();
        }
    }

    public static void kafkaconsumer() throws ExecutionException, InterruptedException {
        // 1. Build the consumer configuration
        Properties properties = new Properties();
        // Cluster to connect to
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapservers);
        // Enable automatic offset commits
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
        // Interval between automatic commits, in milliseconds
        properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        // Deserializer classes for the key and the value
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        // Consumer group
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-group");
        // "latest" reads only new messages; "earliest" also returns historical data
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        // Create the consumer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        try {
            consumer.subscribe(Collections.singletonList("first"));
            while (true) {
                // Fetch records, waiting up to 100 ms
                ConsumerRecords<String, String> consumerRecords = consumer.poll(100);
                // Parse and print each record
                for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                    System.out.println(consumerRecord.key() + "----" + consumerRecord.value());
                }
            }
        } finally {
            // Release resources if polling fails
            consumer.close();
        }
    }
}
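Both methods in KafkaUtil build their Properties inline. One possible refinement, shown here only as a sketch, is to move the configuration into small factory methods so that the broker address, the group id and the offset-reset policy become parameters. The class name KafkaConfigSketch and its method names are hypothetical; the property keys are the standard Kafka client configuration names, written as plain strings so the sketch compiles without the Kafka client library.

```java
import java.util.Properties;

// Hypothetical helper (not part of the project): builds the client
// configuration used by the producer and the consumer.
final class KafkaConfigSketch {

    private static final String STRING_SERIALIZER =
            "org.apache.kafka.common.serialization.StringSerializer";
    private static final String STRING_DESERIALIZER =
            "org.apache.kafka.common.serialization.StringDeserializer";

    // Producer settings: broker list plus String serializers.
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", STRING_SERIALIZER);
        props.setProperty("value.serializer", STRING_SERIALIZER);
        return props;
    }

    // Consumer settings: fromBeginning selects "earliest" instead of "latest".
    static Properties consumerProps(String bootstrapServers, String groupId, boolean fromBeginning) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        props.setProperty("auto.offset.reset", fromBeginning ? "earliest" : "latest");
        props.setProperty("key.deserializer", STRING_DESERIALIZER);
        props.setProperty("value.deserializer", STRING_DESERIALIZER);
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        System.out.println(producerProps("localhost:9092"));
        System.out.println(consumerProps("localhost:9092", "test-consumer-group", false));
    }
}
```

The returned Properties can be passed directly to the KafkaProducer and KafkaConsumer constructors, which accept string values for these settings.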

HDFS

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.File;
import java.io.FileOutputStream;
import java.net.URI;

public class HdfsUtil {
    public static final String HDFS_PATH = "hdfs://39.99.245.209:9000";

    /**
     * Upload a local file to HDFS
     * @throws Exception
     */
    public void copyFromLocalFile(String localPath, String hdfsPath) throws Exception {
        System.out.println("Opening the connection to HDFS");
        Configuration configuration = new Configuration();
        // try-with-resources closes the file system even if the copy fails
        try (FileSystem fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "hadoop")) {
            fileSystem.copyFromLocalFile(new Path(localPath), new Path(hdfsPath));
        }
        System.out.println("Closed the connection to HDFS");
    }

    /**
     * Download a file from HDFS to the local file system
     * @throws Exception
     */
    public void copyToLocalFile(String hdfsPath, String localPath) throws Exception {
        Configuration configuration = new Configuration();
        // 1 Get the file system, 2 open the input stream, 3 open the output stream;
        // 5 all three are closed automatically in reverse order
        try (FileSystem fs = FileSystem.get(new URI(HDFS_PATH), configuration, "hadoop");
             FSDataInputStream fis = fs.open(new Path(hdfsPath));
             FileOutputStream fos = new FileOutputStream(new File(localPath))) {
            // 4 Copy from the input stream to the output stream without closing them here
            IOUtils.copyBytes(fis, fos, configuration, false);
        }
    }

}
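The download method relies on a stream-to-stream copy. The idea behind that step can be illustrated with plain java.io streams; the following is a minimal sketch of a buffered copy loop, not Hadoop's actual implementation, and the class name StreamCopySketch and the 4096-byte buffer size are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (not Hadoop code): copies everything from an
// input stream to an output stream through a fixed-size buffer.
final class StreamCopySketch {

    // Returns the number of bytes copied; the caller closes both streams.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 once the end of the input is reached.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        // In-memory streams stand in for the HDFS input and the local file output.
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out, 4096);
            System.out.println(copied + " bytes: " + new String(out.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

Replacing the in-memory streams with the stream returned by fs.open(...) and a FileOutputStream gives the same data flow as the download method.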
