Big Data Cluster Learning (7): Spring Boot + Hadoop, Operating HDFS


Normally, interacting with HDFS means typing hdfs commands on a Linux machine to manipulate files and directories. Those steps are a little tedious: to get a local file into HDFS, you first have to copy it to a machine where Hadoop is set up, and then push it into HDFS with another command.

By writing backend APIs that operate on the HDFS distributed file system, we can perform the basic operations and exchange files and directories between the local machine and HDFS directly.

Creating the Project

Create a new Maven project in IDEA and fill in the project coordinates (group ID, artifact ID, and so on).

Create the required directories.
A freshly created project is still missing quite a few things: create the src/main/java directory and the src/main/resources directory.


Create the package directories under java; under resources, create application.yml (the Spring Boot configuration file) and log4j.properties (the logging configuration).
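
Based on the package and file names used in the code below, the source layout looks roughly like this (a sketch of the structure, not an exact listing):

src
└── main
    ├── java
    │   └── com/example/bigData
    │       ├── Application.java
    │       └── hadoop
    │           ├── config/HadoopConfig.java
    │           ├── controller/HadoopController.java
    │           └── Service
    │               ├── HadoopCreateDirService.java
    │               ├── HadoopUploadService.java
    │               ├── HadoopDownloadFileService.java
    │               ├── HadoopRemoveDirService.java
    │               └── Impl
    │                   ├── HadoopCreateDirServiceImpl.java
    │                   ├── HadoopUploadServiceImpl.java
    │                   ├── HadoopDownloadFileServiceImpl.java
    │                   └── HadoopRemoveDirServiceImpl.java
    └── resources
        ├── application.yml
        └── log4j.properties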

Configuring the Project

Adding dependencies: add the following to pom.xml.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>big-data</artifactId>
    <version>v1.0</version>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.5.RELEASE</version>
    </parent>

    <properties>
        <commons.version>2.6</commons.version>
        <java.version>1.8</java.version>
        <hadoop.version>3.3.1</hadoop.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <!-- Spring MVC framework with auto-configuration -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!-- Jackson core: powerful, flexible APIs for converting between Java objects and JSON -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
        </dependency>
        <!-- Jackson databind: JSON <-> Java object mapping -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
        </dependency>
        <!-- Jackson Mapper ASL (legacy), version 1.9.13 -->
        <dependency>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-mapper-asl</artifactId>
            <version>1.9.13</version>
        </dependency>
        <!-- fastjson: a high-performance JSON processor/generator -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.9</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.5.1</version>
        </dependency>


        <!-- Commons IO utility classes and methods -->
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>${commons.version}</version>
        </dependency>
        <!-- Commons Lang general-purpose utility classes -->
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>${commons.version}</version>
        </dependency>

        <!-- Bean validation support for Spring Boot -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>
        <!-- JUnit: a unit-testing framework for Java, for writing and running repeatable tests -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- Hadoop dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>servlet-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>servlet-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>servlet-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

    </dependencies>

    <build>
        <plugins>

            <!-- Pin the JDK compile version -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>

            <!-- Skip tests when packaging -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <!-- Include system-scoped libs in the packaged jar -->
                    <includeSystemScope>true</includeSystemScope>
                    <mainClass>com.example.bigData.Application</mainClass>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>

        </plugins>
    </build>
</project>

After adding these, reload the project: open the Maven panel on the right and click the refresh button. IDEA will automatically download all the required dependencies according to the configuration; once the download finishes, move on to the next step.

The log4j file
Add the following to log4j.properties:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

With this configuration in place, the wall of red log output that otherwise appears when running the program goes away (mostly a cosmetic fix). Note that rootLogger above only references the stdout appender; the logfile appender is defined but stays unused unless you also add it to rootLogger.

application.yml
This configuration file is read when the application starts; add the following:

server:
  port: 8888

hadoop.name-node: hdfs://192.168.xx.xxx:9000

spring:
  datasource:
    url: jdbc:mysql://192.168.xx.xxx:3306/test?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=UTC&allowMultiQueries=true
    username: root
    password: admin
    driver-class-name: com.mysql.cj.jdbc.Driver
    type: com.zaxxer.hikari.HikariDataSource
    hikari:
      connect-test-query: SELECT 1
      connection-timeout: 30000 # max time (ms) to wait for a connection from the pool; a SQLException is thrown on timeout (default 30 s)
      minimum-idle: 1 # minimum number of connections
      maximum-pool-size: 20 # maximum number of connections
      auto-commit: true # auto-commit
      idle-timeout: 60000 # max idle time (ms) before a connection is retired (default 10 minutes)
      pool-name: DataSourceHikariPool
      max-lifetime: 1800000 # max lifetime (ms) of a connection; retired once exceeded and unused (default 30 minutes)

#mybatis-plus:
#  type-aliases-package: com.hdx.bigData.hadoop.entity
#  configuration:
#    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl
#  mapper-locations: classpath:mybatis/*.xml

port is the port the application runs on; customize it as you like.
hadoop.name-node: hdfs://xxx.xxx.xxx.xxx:9000 is the address used to communicate with HDFS; the port must match the NameNode RPC port set via fs.defaultFS in your cluster's core-site.xml.
spring.datasource adds a data-source configuration. It plays no part in the file exchange with HDFS; it is only there because the application fails to start without it (the MySQL and mybatis-plus dependencies trigger data-source auto-configuration). In url: jdbc:mysql://192.168.xx.xxx:3306/test?, test refers to the test database in MySQL.

Configuring Environment Variables
To run and test the program locally on Windows, a Hadoop environment is required, which means configuring the Hadoop environment variables and downloading a winutils package, whose job is to emulate the Linux environment Hadoop expects.
I am using Hadoop 3.3.1; winutils download: github.com/robguilarr/…
Unpack the Linux Hadoop distribution on Windows (for details on downloading the package, see Big Data Cluster Learning (5): Setting Up and Configuring Hadoop).
After unpacking, copy everything under the downloaded winutils bin directory into the bin directory of the Hadoop you just unpacked.
Edit the system environment variables and add the following to Path:

%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin

Then create a new variable named HADOOP_HOME whose value is the directory where Hadoop was unpacked.
After this step, reboot the machine (my program failed because I had not rebooted; after a reboot it ran fine).
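
As a quick sanity check before starting the application, the variables can be verified from Java (a throwaway sketch; the HadoopEnvCheck class is not part of the project):

import java.io.File;

public class HadoopEnvCheck {
    public static void main(String[] args) {
        // HADOOP_HOME should point at the unpacked Hadoop directory
        String hadoopHome = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + hadoopHome);
        if (hadoopHome != null) {
            // winutils.exe must sit in %HADOOP_HOME%\bin for the HDFS client on Windows
            File winutils = new File(hadoopHome, "bin" + File.separator + "winutils.exe");
            System.out.println("winutils.exe present: " + winutils.exists());
        }
    }
}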

Code

The application entry point:
Application

package com.example.bigData;

import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Slf4j
@SpringBootApplication
@EnableTransactionManagement
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

}

HadoopConfig

package com.example.bigData.hadoop.config;

import lombok.extern.slf4j.Slf4j;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.net.URI;

@Configuration
@Slf4j
public class HadoopConfig {

    @Value("${hadoop.name-node}")
    private String nameNode;

    /**
     * @Description: Configuration conf = new Configuration();
     *  When a Configuration object is created, its constructor loads Hadoop's default
     *  configuration files, hdfs-site.xml and core-site.xml, which contain the parameters
     *  needed to access HDFS. The key one is fs.defaultFS (formerly fs.default.name): it
     *  gives the HDFS address through which a client reaches the cluster. In short,
     *  Configuration holds Hadoop's configuration information.
     * @date: 2023/11/13 15:31
     * @author: xxx
     */
    @Bean("fileSystem")
    public FileSystem createFs() throws Exception{
        //build the Hadoop configuration
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.defaultFS",nameNode);
        conf.set("dfs.replication","1");
        FileSystem fs = null;
        //specify the client identity used to access HDFS
        //file system
        //returns the specified file system; when testing locally, the file system must be obtained this way
        try{
            URI uri = new URI(nameNode.trim());
            fs = FileSystem.get(uri, conf,"root");
        }catch (Exception e){
            log.error("",e);
        }
        System.out.println("fs.defaultFS:"+conf.get("fs.defaultFS"));
        return fs;
    }

}

fs = FileSystem.get(uri, conf,"root");
This specifies that operations execute as the root user. Only calls made through this fileSystem bean run as root; without it they would fail with a permission error!

//read the HDFS address from the configuration file
@Value("${hadoop.name-node}")
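
An alternative the Hadoop client also supports (not used in this article) is to set the HADOOP_USER_NAME system property before the FileSystem is created; the two-argument FileSystem.get then runs as that user as well. A sketch, using the conf and nameNode already defined in createFs():

// set the client user via a system property instead of the
// three-argument FileSystem.get overload
System.setProperty("HADOOP_USER_NAME", "root");
FileSystem fs = FileSystem.get(new URI(nameNode.trim()), conf);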

Creating a Directory

Write the interface under the Service package, then implement the method under the Impl package.
HadoopCreateDirService

package com.example.bigData.hadoop.Service;

public interface HadoopCreateDirService {

    Boolean createDir(String directory);

}

HadoopCreateDirServiceImpl

package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopCreateDirService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Service
public class HadoopCreateDirServiceImpl implements HadoopCreateDirService {

    @Autowired
    private FileSystem fileSystem;

//    @Value("${hadoop.name-node}")
//    private String hdfsUrl;

    @Override
    public Boolean createDir(String srcDirectory) {
        Boolean flagStatus = false;
        if (StringUtils.isEmpty(srcDirectory)){
            throw new IllegalArgumentException("The directory to create must not be empty");
        }
        try{
            Path dirPath = new Path(srcDirectory);
            if (!fileSystem.exists(dirPath)){
                fileSystem.mkdirs(dirPath);
            }else{
                System.out.println("目录已存在");
            }
            if (fileSystem.isDirectory(dirPath)){
                flagStatus = true;
            }
        } catch (IOException e) {
            System.out.println(e);
        }
        return flagStatus;
    }
}

HadoopController

/**
* @Description: Create a directory
* @date: 2023/11/14 18:10
* @author: xxx
*/
@Resource
private HadoopCreateDirServiceImpl createDir;

@RequestMapping("/createDir")
public String createDir(@RequestParam String srcDirectory){
    createDir.createDir(srcDirectory);
    return "upload";
}

srcDirectory: the directory to create
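
For example, with the port 8888 configured earlier, requesting
http://localhost:8888/hdfs/createDir?srcDirectory=/test
creates the /test directory that appears in the test section below.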

File Upload

HadoopUploadService

package com.example.bigData.hadoop.Service;

public interface HadoopUploadService {

    void uploadFile(String srcFile,String destFile);

}

HadoopUploadServiceImpl

package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopUploadService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Service
public class HadoopUploadServiceImpl implements HadoopUploadService {

    @Autowired
    private FileSystem fileSystem;

    @Value("${hadoop.name-node}")
    private String hdfsUrl;

    @Override
    public void uploadFile(String srcFile,String destFile) {
        //source file path
        Path srcPath = new Path(srcFile);
        //destination path
        if (StringUtils.isNotBlank(hdfsUrl)){
            destFile = hdfsUrl + destFile;
        }
        Path dstPath = new Path(destFile);
        //upload the file: copyFromLocalFile(delSrc=false, overwrite=true, src, dst) keeps the local file and overwrites any existing destination
        try {
            fileSystem.copyFromLocalFile(false,true,srcPath,dstPath);
        }catch (IOException e){
            System.out.println(e);
        }
    }

}

HadoopController

@Resource
private HadoopUploadServiceImpl uploadFile;

@RequestMapping("/upload")
public String upload(@RequestParam String srcFile,String dstPath){
    uploadFile.uploadFile(srcFile,dstPath);
    return "upload";
}

srcFile: the source file
dstPath: the destination path
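
For example (the local path D:/1.txt is only an illustration; use the file you actually created):
http://localhost:8888/hdfs/upload?srcFile=D:/1.txt&dstPath=/test/1.txt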

Downloading a File

HadoopDownloadFileService

package com.example.bigData.hadoop.Service;

public interface HadoopDownloadFileService {

    void getFile(String srcFile,String destPath);

}

HadoopDownloadFileServiceImpl

package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopDownloadFileService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;

@Service
public class HadoopDownloadFileServiceImpl implements HadoopDownloadFileService {

    @Resource
    private FileSystem fileSystem;

    @Value("${hadoop.name-node}")
    private String hdfsUrl;


    @Override
    public void getFile(String srcFile, String destPath) {
        if (StringUtils.isNotBlank(hdfsUrl)){
            srcFile = hdfsUrl + srcFile;
        }
        Path srcPath = new Path(srcFile);
        Path dstPath = new Path(destPath);
        try{
            fileSystem.copyToLocalFile(srcPath,dstPath);
        }catch (IOException e ){
            System.out.println(e);
        }
    }
}

HadoopController

@Resource
private HadoopDownloadFileServiceImpl downloadFile;

@RequestMapping("/downloadFile")
public void getFile(@RequestParam String srcFile,String destPath){
    downloadFile.getFile(srcFile,destPath);
}

srcFile: the file to download
destPath: the local destination path
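
For example (the local folder is again only an illustration):
http://localhost:8888/hdfs/downloadFile?srcFile=/test/1.txt&destPath=D:/download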

Deleting Directories and Files

HadoopRemoveDirService

package com.example.bigData.hadoop.Service;

public interface HadoopRemoveDirService {

    void rmDir(String delPath);

}

HadoopRemoveDirServiceImpl

package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopRemoveDirService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;

@Service
public class HadoopRemoveDirServiceImpl implements HadoopRemoveDirService {

    @Resource
    FileSystem fileSystem;

    @Value("${hadoop.name-node}")
    private String hdfsUrl;

    @Override
    public void rmDir(String delPath) {
        try{
            if (StringUtils.isNotBlank(hdfsUrl)){
                delPath = hdfsUrl + delPath;
            }
            fileSystem.delete(new Path(delPath),true);
        }catch (IllegalArgumentException|IOException e){
            System.out.println(e);
        }
    }
}

HadoopController

@Resource
private HadoopRemoveDirServiceImpl removeDir;

@RequestMapping("/removeDir")
public void remove(@RequestParam String delPath){
    removeDir.rmDir(delPath);
}

delPath: the directory/file to delete
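
For example, http://localhost:8888/hdfs/removeDir?delPath=/test/1.txt removes the uploaded file; passing /test instead would remove the whole directory, since delete is called with recursive set to true.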

The complete HadoopController

package com.example.bigData.hadoop.controller;

import com.example.bigData.hadoop.Service.Impl.HadoopCreateDirServiceImpl;
import com.example.bigData.hadoop.Service.Impl.HadoopDownloadFileServiceImpl;
import com.example.bigData.hadoop.Service.Impl.HadoopRemoveDirServiceImpl;
import com.example.bigData.hadoop.Service.Impl.HadoopUploadServiceImpl;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;

@RequestMapping("/hdfs")
@RestController
public class HadoopController {

    /**
    * @Description: Upload a file
    * @date: 2023/11/14 18:10
    * @author: xxx
    */
    @Resource
    private HadoopUploadServiceImpl uploadFile;

    @RequestMapping("/upload")
    public String upload(@RequestParam String srcFile,String dstPath){
        uploadFile.uploadFile(srcFile,dstPath);
        return "upload";
    }

    /**
    * @Description: Create a directory
    * @date: 2023/11/14 18:10
    * @author: xxx
    */
    @Resource
    private HadoopCreateDirServiceImpl createDir;

    @RequestMapping("/createDir")
    public String createDir(@RequestParam String srcDirectory){
        createDir.createDir(srcDirectory);
        return "upload";
    }

    /**
    * @Description: Delete a directory/file
    * @date: 2023/11/15 18:15
    * @author: xxx
    */
    @Resource
    private HadoopRemoveDirServiceImpl removeDir;

    @RequestMapping("/removeDir")
    public void remove(@RequestParam String delPath){
        removeDir.rmDir(delPath);
    }

    /**
    * @Description: Download a file
    * @date: 2023/11/16 9:42
    * @author: xxx
    */

    @Resource
    private HadoopDownloadFileServiceImpl downloadFile;

    @RequestMapping("/downloadFile")
    public void getFile(@RequestParam String srcFile,String destPath){
        downloadFile.getFile(srcFile,destPath);
    }
}

Testing

Run the application. Before running it, make sure Hadoop has been started; enter ip:9870 in a browser to reach the HDFS web UI.
In IDEA, the endpoints can then be called and tested directly.


Create-directory test
After the creation succeeds, refresh the HDFS page and the newly created /test directory is visible.

File-upload test
Create a txt file locally and pass its path as the parameter. The upload succeeds.

File-download test
Create a local folder to hold the file about to be downloaded. The download succeeds.

File-deletion test
Delete the 1.txt file that was just uploaded to HDFS. The deletion succeeds.

The above walkthrough is only a simple example.
If anything is wrong, corrections are welcome 😀