Normally, interacting with HDFS means typing hdfs commands on a Linux machine to work with files and directories. The steps are a bit tedious: to get a local file into HDFS, you first have to copy it onto a machine where Hadoop is installed, and then push it into HDFS with another command.
By writing a backend API that talks to the HDFS distributed file system, we can perform the basic operations and move files and directories between the local machine and HDFS directly.
Creating the project
Create a new Maven project in IDEA and fill in the project information.
Create the required directories
A freshly generated project is missing quite a bit: create the src/main/java directory and the src/main/resources directory.
Under java, create the package structure; under resources, create the configuration files: application.yml is the Spring Boot configuration file and log4j.properties is the logging configuration. The layout is sketched below.
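A rough sketch of the resulting layout, based on the package names used later in this post (the exact class placement is inferred, not taken from the original figure):

src/main/java/com/example/bigData/Application.java
src/main/java/com/example/bigData/hadoop/config/HadoopConfig.java
src/main/java/com/example/bigData/hadoop/controller/HadoopController.java
src/main/java/com/example/bigData/hadoop/Service/          (service interfaces)
src/main/java/com/example/bigData/hadoop/Service/Impl/     (service implementations)
src/main/resources/application.yml
src/main/resources/log4j.properties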
Configuring the project
Adding dependencies: add the following to the pom.xml file.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>big-data</artifactId>
    <version>v1.0</version>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.5.RELEASE</version>
    </parent>

    <properties>
        <commons.version>2.6</commons.version>
        <java.version>1.8</java.version>
        <hadoop.version>3.3.1</hadoop.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <!-- Spring MVC with auto-configuration -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!-- Powerful, flexible APIs for converting between Java objects and JSON -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
        </dependency>
        <!-- Data binding between JSON and Java objects -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
        </dependency>
        <!-- Legacy Jackson Mapper ASL library, version 1.9.13 -->
        <dependency>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-mapper-asl</artifactId>
            <version>1.9.13</version>
        </dependency>
        <!-- High-performance JSON processor/generator -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.9</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.5.1</version>
        </dependency>
        <!-- IO utility classes and methods -->
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>${commons.version}</version>
        </dependency>
        <!-- General-purpose utility classes and methods -->
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>${commons.version}</version>
        </dependency>
        <!-- Bean validation support for Spring Boot -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
        </dependency>
        <!-- JUnit, the Java unit-testing framework for writing and running repeatable tests -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- Hadoop dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>servlet-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>servlet-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>servlet-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- Pin the JDK compilation version -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <!-- Skip tests when packaging -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <!-- Include system-scoped jars when packaging -->
                    <includeSystemScope>true</includeSystemScope>
                    <!-- Main class; must match the Application class defined below -->
                    <mainClass>com.example.bigData.Application</mainClass>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
After adding these, reload the project: open the Maven panel on the right side of IDEA and click the refresh button. IDEA will then download the required dependencies automatically; once the download finishes, move on to the next step.
The log4j file
Add the following to log4j.properties:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
With this configuration in place, the red log warnings that otherwise keep appearing while the program runs are silenced (mostly because they are an eyesore).
application.yml
The application reads this configuration file when it runs. Add the following:
server:
  port: 8888

hadoop.name-node: hdfs://192.168.xx.xxx:9000

spring:
  datasource:
    url: jdbc:mysql://192.168.xx.xxx:3306/test?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=UTC&allowMultiQueries=true
    username: root
    password: admin
    driver-class-name: com.mysql.cj.jdbc.Driver
    type: com.zaxxer.hikari.HikariDataSource
    hikari:
      connect-test-query: SELECT 1
      connection-timeout: 30000 # max time (ms) to wait for a connection from the pool; a SQLException is thrown on timeout, default 30 s
      minimum-idle: 1 # minimum number of connections
      maximum-pool-size: 20 # maximum number of connections
      auto-commit: true # auto commit
      idle-timeout: 60000 # max idle time (ms) before a connection is retired, default 10 min
      pool-name: DataSourceHikariPool
      max-lifetime: 1800000 # max lifetime (ms) of a connection; retired once it expires and is no longer in use, default 30 min

#mybatis-plus:
#  type-aliases-package: com.hdx.bigData.hadoop.entity
#  configuration:
#    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl
#  mapper-locations: classpath:mybatis/*.xml
port is the port the application listens on when it runs; you can choose your own.
hadoop.name-node: hdfs://xxx.xxx.xxx.xxx:9000 is the address used to communicate with HDFS.
spring.datasource adds a data source. It is not used for exchanging files with HDFS; it is only here because the application fails to start without it. Point it at your MySQL server: in url: jdbc:mysql://192.168.xx.xxx:3306/test?..., test is the name of the database in MySQL.
Configuring environment variables
When running and testing the program locally, a Hadoop environment is required. On Windows that means configuring the Hadoop environment variables and downloading winutils, which emulates the Linux environment.
I am using Hadoop 3.3.1; winutils download: github.com/robguilarr/…
Unpack the Hadoop package (the same one used on Linux) on Windows (see 大数据集群学习(5):Hadoop的搭建与配置 for details on downloading the package).
After unpacking, copy everything in the bin directory of the downloaded winutils into the bin directory of the Hadoop folder you just unpacked.
Edit the system environment variables and add the following to Path:
%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin
Then create a new variable named HADOOP_HOME whose value is the directory where Hadoop was unpacked.
After this step, reboot the machine (I skipped the reboot and the program failed; after rebooting it was fine).
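A quick, hedged way to check whether the new variable is actually visible to a freshly started JVM (which is essentially what the reboot fixes) is a throwaway main method:

public class EnvCheck {
    public static void main(String[] args) {
        // Prints the Hadoop unpack directory; null means this JVM was started
        // before the variable was set (or it was never set at all).
        System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
    }
}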
The code
The application entry point:
Application
package com.example.bigData;

import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Slf4j
@SpringBootApplication
@EnableTransactionManagement
public class Application {
    public static void main(String[] args) {
        ConfigurableApplicationContext run = SpringApplication.run(Application.class, args);
    }
}
HadoopConfig
package com.example.bigData.hadoop.config;

import lombok.extern.slf4j.Slf4j;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.net.URI;

@Configuration
@Slf4j
public class HadoopConfig {

    @Value("${hadoop.name-node}")
    private String nameNode;

    private String userName;

    /**
     * @Description: Configuration conf = new Configuration();
     * When a Configuration object is created, its constructor loads Hadoop's two default
     * configuration files, hdfs-site.xml and core-site.xml. They contain the parameters
     * needed to access HDFS, most importantly fs.defaultFS, the HDFS address through which
     * the client reaches the cluster. In short, Configuration holds Hadoop's configuration settings.
     * @date: 2023/11/13 15:31
     * @author: xxx
     */
    @Bean("fileSystem")
    public FileSystem createFs() throws Exception {
        // Build the configuration
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.defaultFS", nameNode);
        conf.set("dfs.replication", "1");
        FileSystem fs = null;
        // Specify the client identity used to access HDFS.
        // FileSystem.get returns the requested file system; when testing locally,
        // obtain the file system this way.
        try {
            URI uri = new URI(nameNode.trim());
            fs = FileSystem.get(uri, conf, "root");
        } catch (Exception e) {
            log.error("", e);
        }
        System.out.println("fs.defaultFS:" + conf.get("fs.defaultFS"));
        return fs;
    }
}
fs = FileSystem.get(uri, conf, "root");
This line makes the operations run as the root user: anything that goes through this fileSystem bean executes as root. Without it, the calls fail with insufficient permissions!
@Value("${hadoop.name-node}")
This reads the HDFS address from the configuration file.
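If you would rather not hard-code root, a minimal sketch of making the user configurable instead; hadoop.user is a hypothetical property (not in the original post) that would go into application.yml and falls back to root when absent:

    @Value("${hadoop.user:root}")   // hypothetical property name, defaults to root
    private String userName;

    @Bean("fileSystem")
    public FileSystem createFs() throws Exception {
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.defaultFS", nameNode);
        conf.set("dfs.replication", "1");
        // Same call as before, but the user now comes from configuration instead of a literal
        return FileSystem.get(new URI(nameNode.trim()), conf, userName);
    }

These members would replace the corresponding ones in HadoopConfig above.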
Creating a directory
Write an interface in the Service package, then override its method in the Impl package.
HadoopCreateDirService
package com.example.bigData.hadoop.Service;

public interface HadoopCreateDirService {
    Boolean createDir(String directory);
}
HadoopCreateDirServiceImpl
package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopCreateDirService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Service
public class HadoopCreateDirServiceImpl implements HadoopCreateDirService {

    @Autowired
    private FileSystem fileSystem;

//    @Value("${hadoop.name-node}")
//    private String hdfsUrl;

    @Override
    public Boolean createDir(String srcDirectory) {
        Boolean flagStatus = false;
        if (StringUtils.isEmpty(srcDirectory)) {
            throw new IllegalArgumentException("The directory to create must not be empty");
        }
        try {
            Path dirPath = new Path(srcDirectory);
            if (!fileSystem.exists(dirPath)) {
                fileSystem.mkdirs(dirPath);
            } else {
                System.out.println("The directory already exists");
            }
            if (fileSystem.isDirectory(dirPath)) {
                flagStatus = true;
            }
        } catch (IOException e) {
            System.out.println(e);
        }
        return flagStatus;
    }
}
HadoopController
/**
 * @Description: Create a directory
 * @date: 2023/11/14 18:10
 * @author: xxx
 */
@Resource
private HadoopCreateDirServiceImpl createDir;

@RequestMapping("/createDir")
public String createDir(@RequestParam String srcDirectory) {
    createDir.createDir(srcDirectory);
    return "upload";
}
srcDirectory: the directory to create
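Before going through the HTTP layer, a quick sanity check that directory creation works against your cluster can be written as a plain JUnit test. This is a hedged sketch using only the hadoop-client and junit dependencies already in the pom; replace the NameNode address placeholder with your own:

package com.example.bigData.hadoop;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Assert;
import org.junit.Test;

public class HdfsCreateDirTest {

    // Assumed NameNode address; match the hadoop.name-node value in application.yml
    private static final String NAME_NODE = "hdfs://192.168.xx.xxx:9000";

    @Test
    public void mkdirsCreatesDirectory() throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", NAME_NODE);
        // Same root identity the HadoopConfig bean uses
        try (FileSystem fs = FileSystem.get(new URI(NAME_NODE), conf, "root")) {
            Path dir = new Path("/test");
            fs.mkdirs(dir);
            Assert.assertTrue(fs.isDirectory(dir));
        }
    }
}

The test needs a running HDFS NameNode, so it is a smoke test rather than a unit test.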
Uploading a file
HadoopUploadService
package com.example.bigData.hadoop.Service;

public interface HadoopUploadService {
    void uploadFile(String srcFile, String destFile);
}
HadoopUploadServiceImpl
package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopUploadService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;

@Service
public class HadoopUploadServiceImpl implements HadoopUploadService {

    @Autowired
    private FileSystem fileSystem;

    @Value("${hadoop.name-node}")
    private String hdfsUrl;

    @Override
    public void uploadFile(String srcFile, String destFile) {
        // Source file path
        Path srcPath = new Path(srcFile);
        // Destination path
        if (StringUtils.isNotBlank(hdfsUrl)) {
            destFile = hdfsUrl + destFile;
        }
        Path dstPath = new Path(destFile);
        // Upload the file
        try {
            fileSystem.copyFromLocalFile(false, true, srcPath, dstPath);
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
HadoopController
@Resource
private HadoopUploadServiceImpl uploadFile;

@RequestMapping("/upload")
public String upload(@RequestParam String srcFile, String dstPath) {
    uploadFile.uploadFile(srcFile, dstPath);
    return "upload";
}
srcFile: the source file
dstPath: the destination path
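The upload above copies a file that already exists on the machine running this service. If the caller should send the file in the HTTP request itself, here is a hedged sketch of writing the uploaded bytes straight into HDFS through the same fileSystem bean (the /uploadMultipart endpoint and its names are my own, not part of the original code):

    // Hypothetical addition to HadoopController; extra imports needed:
    // org.apache.hadoop.fs.FSDataOutputStream, org.apache.hadoop.fs.Path,
    // org.apache.hadoop.io.IOUtils, org.springframework.web.multipart.MultipartFile,
    // org.springframework.web.bind.annotation.PostMapping, java.io.InputStream, java.io.IOException
    @Resource
    private FileSystem fileSystem;

    @PostMapping("/uploadMultipart")
    public String uploadMultipart(@RequestParam("file") MultipartFile file,
                                  @RequestParam String dstPath) throws IOException {
        // Write the request body directly to HDFS, e.g. dstPath = "/test/1.txt"
        try (InputStream in = file.getInputStream();
             FSDataOutputStream out = fileSystem.create(new Path(dstPath), true)) {
            IOUtils.copyBytes(in, out, 4096);
        }
        return "upload";
    }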
Downloading a file
HadoopDownloadFileService
package com.example.bigData.hadoop.Service;

public interface HadoopDownloadFileService {
    void getFile(String srcFile, String destPath);
}
HadoopDownloadFileServiceImpl
package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopDownloadFileService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;

@Service
public class HadoopDownloadFileServiceImpl implements HadoopDownloadFileService {

    @Resource
    private FileSystem fileSystem;

    @Value("${hadoop.name-node}")
    private String hdfsUrl;

    @Override
    public void getFile(String srcFile, String destPath) {
        if (StringUtils.isNotBlank(hdfsUrl)) {
            srcFile = hdfsUrl + srcFile;
        }
        Path srcPath = new Path(srcFile);
        Path dstPath = new Path(destPath);
        try {
            fileSystem.copyToLocalFile(srcPath, dstPath);
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
HadoopController
@Resource
private HadoopDownloadFileServiceImpl downloadFile;

@RequestMapping("/downloadFile")
public void getFile(@RequestParam String srcFile, String destPath) {
    downloadFile.getFile(srcFile, destPath);
}
srcFile: the file to download
destPath: the destination path for the download
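copyToLocalFile writes to a path on the machine where the service runs. If the caller's browser should receive the file instead, here is a hedged sketch of streaming it through the HTTP response (the /streamFile endpoint is my own, not part of the original code, and it assumes the fileSystem bean is injected into the controller as in the upload sketch above):

    // Hypothetical addition to HadoopController; extra imports needed:
    // org.apache.hadoop.fs.FSDataInputStream, org.apache.hadoop.fs.Path,
    // org.apache.hadoop.io.IOUtils, org.springframework.web.bind.annotation.GetMapping,
    // javax.servlet.http.HttpServletResponse, java.io.IOException
    @GetMapping("/streamFile")
    public void streamFile(@RequestParam String srcFile, HttpServletResponse response) throws IOException {
        response.setContentType("application/octet-stream");
        response.setHeader("Content-Disposition",
                "attachment; filename=\"" + new Path(srcFile).getName() + "\"");
        // Open the file in HDFS and copy it straight into the response body
        try (FSDataInputStream in = fileSystem.open(new Path(srcFile))) {
            IOUtils.copyBytes(in, response.getOutputStream(), 4096);
        }
    }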
Deleting a directory or file
HadoopRemoveDirService
package com.example.bigData.hadoop.Service;

public interface HadoopRemoveDirService {
    void rmDir(String delPath);
}
HadoopRemoveDirServiceImpl
package com.example.bigData.hadoop.Service.Impl;

import com.example.bigData.hadoop.Service.HadoopRemoveDirService;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;

@Service
public class HadoopRemoveDirServiceImpl implements HadoopRemoveDirService {

    @Resource
    FileSystem fileSystem;

    @Value("${hadoop.name-node}")
    private String hdfsUrl;

    @Override
    public void rmDir(String delPath) {
        try {
            if (StringUtils.isNotBlank(hdfsUrl)) {
                delPath = hdfsUrl + delPath;
            }
            fileSystem.delete(new Path(delPath), true);
        } catch (IllegalArgumentException | IOException e) {
            System.out.println(e);
        }
    }
}
HadoopController
@Resource
private HadoopRemoveDirServiceImpl removeDir;

@RequestMapping("/removeDir")
public void remove(@RequestParam String delPath) {
    removeDir.rmDir(delPath);
}
delPath: the directory or file to delete
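fileSystem.delete returns a boolean that tells you whether anything was actually removed (false when the path did not exist). A hedged variant of rmDir that surfaces that result instead of discarding it (my own variation, not from the original post):

    // Hypothetical variant for HadoopRemoveDirServiceImpl
    public Boolean rmDirWithResult(String delPath) {
        try {
            if (StringUtils.isNotBlank(hdfsUrl)) {
                delPath = hdfsUrl + delPath;
            }
            // true = delete recursively; returns false when the path does not exist
            return fileSystem.delete(new Path(delPath), true);
        } catch (IllegalArgumentException | IOException e) {
            System.out.println(e);
            return false;
        }
    }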
The complete HadoopController
package com.example.bigData.hadoop.controller;

import com.example.bigData.hadoop.Service.Impl.HadoopCreateDirServiceImpl;
import com.example.bigData.hadoop.Service.Impl.HadoopDownloadFileServiceImpl;
import com.example.bigData.hadoop.Service.Impl.HadoopRemoveDirServiceImpl;
import com.example.bigData.hadoop.Service.Impl.HadoopUploadServiceImpl;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;

@RequestMapping("/hdfs")
@RestController
public class HadoopController {

    /**
     * @Description: Upload a file
     * @date: 2023/11/14 18:10
     * @author: xxx
     */
    @Resource
    private HadoopUploadServiceImpl uploadFile;

    @RequestMapping("/upload")
    public String upload(@RequestParam String srcFile, String dstPath) {
        uploadFile.uploadFile(srcFile, dstPath);
        return "upload";
    }

    /**
     * @Description: Create a directory
     * @date: 2023/11/14 18:10
     * @author: xxx
     */
    @Resource
    private HadoopCreateDirServiceImpl createDir;

    @RequestMapping("/createDir")
    public String createDir(@RequestParam String srcDirectory) {
        createDir.createDir(srcDirectory);
        return "upload";
    }

    /**
     * @Description: Delete a directory
     * @date: 2023/11/15 18:15
     * @author: xxx
     */
    @Resource
    private HadoopRemoveDirServiceImpl removeDir;

    @RequestMapping("/removeDir")
    public void remove(@RequestParam String delPath) {
        removeDir.rmDir(delPath);
    }

    /**
     * @Description: Download a file
     * @date: 2023/11/16 9:42
     * @author: xxx
     */
    @Resource
    private HadoopDownloadFileServiceImpl downloadFile;

    @RequestMapping("/downloadFile")
    public void getFile(@RequestParam String srcFile, String destPath) {
        downloadFile.getFile(srcFile, destPath);
    }
}
Testing
Run the program. Before running it, make sure Hadoop is started; open ip:9870 in a browser to reach the HDFS web UI.
In IDEA, click the run icon next to a request to call and test the endpoints.
Testing directory creation
After a successful call (for example /hdfs/createDir with srcDirectory=/test), refresh the HDFS page and the new /test directory appears.
Testing file upload
Create a txt file locally and pass its path as the srcFile parameter.
The upload succeeds.
Downloading a file
Create a new local folder to hold the downloaded file.
The download succeeds.
Deleting a file
Delete the 1.txt file that was just uploaded to HDFS.
The deletion succeeds.
The above is only a simple example.
If anything here is wrong, corrections are welcome 😀