Sqoop Installation
Sqoop Overview
Sqoop (pronounced: skup) is an open-source tool used mainly to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and it can also export data from HDFS back into a relational database.
Notes
Official site: sqoop.apache.org/
Download the matching version under Releases.
Download address: www.apache.org/dyn/closer.…
Preparation
Upload sqoop-1.99.7-bin-hadoop200.tar.gz to the Linux host and extract it to the installation directory /root/apps.
[root@linux1 apps]# ll /root/apps
total 52
drwxr-xr-x.  3 root root 4096 Apr 20 16:12 appdata
drwxr-xr-x. 10 1000 ftp  4096 Feb 21  2010 hadoop-2.7.7
drwxr-xr-x.  8 uucp 143  4096 Jun 22  2016 jdk1.8.0_101
drwxr-xr-x.  8 root root 4096 May 20 21:28 sqoop-1.99.7-bin-hadoop200
[root@linux1 apps]#
Configure Sqoop's environment variables and make them take effect
[root@linux01 sqoop-1.99.7-bin-hadoop200]# vi /etc/profile
export SQOOP_HOME=/root/apps/sqoop-1.99.7-bin-hadoop200
export PATH=$PATH:$SQOOP_HOME/bin
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs/
[root@master sqoop]# source /etc/profile
Create the required directories
[root@master sqoop]# mkdir ${SQOOP_HOME}/extra
[root@master sqoop]# mkdir ${SQOOP_HOME}/logs
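The profile edits and directory creation above can be scripted. The sketch below uses scratch paths from mktemp so it can be dry-run safely; on the real host SQOOP_HOME is /root/apps/sqoop-1.99.7-bin-hadoop200 and the profile file is /etc/profile.

```shell
# Sketch of the two steps above: append the variables to the profile
# and create the extra/ and logs/ directories. SQOOP_HOME and PROFILE
# are scratch locations here; substitute the real paths in production.
SQOOP_HOME=$(mktemp -d)
PROFILE=$(mktemp)
cat >> "${PROFILE}" <<EOF
export SQOOP_HOME=${SQOOP_HOME}
export PATH=\$PATH:\$SQOOP_HOME/bin
export SQOOP_SERVER_EXTRA_LIB=\$SQOOP_HOME/extra
export CATALINA_BASE=\$SQOOP_HOME/server
export LOGDIR=\$SQOOP_HOME/logs/
EOF
mkdir -p "${SQOOP_HOME}/extra" "${SQOOP_HOME}/logs"
ls "${SQOOP_HOME}"
```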
Edit the configuration file sqoop.properties
[root@master ~]# cd /root/apps/sqoop-1.99.7-bin-hadoop200/conf
[root@linux01 conf]#
[root@linux01 conf]# ll
total 3
-rw-r--r--. 1 root root  668 May 20 21:51 catalina.properties
-rw-r--r--. 1 root root 1584 Jul 20  2016 sqoop_bootstrap.properties
-rw-r--r--. 1 root root 9731 May 20 21:40 sqoop.properties
[root@linux01 conf]#
[root@linux01 conf]# vi sqoop.properties
Search for org.apache.sqoop.submission.engine.mapreduce.configuration.directory and point it at Hadoop's configuration directory:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/root/apps/hadoop-2.7.7/etc/hadoop
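The same edit can be made with sed instead of vi. In the sketch below, SQOOP_CONF points at a scratch copy of the file so the command can be tested safely; on the real host it is /root/apps/sqoop-1.99.7-bin-hadoop200/conf.

```shell
# Rewrite the MapReduce configuration-directory property in place.
# SQOOP_CONF is a scratch directory with a demo sqoop.properties here;
# point it at the real conf directory in production.
SQOOP_CONF=$(mktemp -d)
HADOOP_CONF=/root/apps/hadoop-2.7.7/etc/hadoop
echo 'org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/etc/hadoop/conf/' \
  > "${SQOOP_CONF}/sqoop.properties"
sed -i "s|^\(org.apache.sqoop.submission.engine.mapreduce.configuration.directory=\).*|\1${HADOOP_CONF}|" \
  "${SQOOP_CONF}/sqoop.properties"
cat "${SQOOP_CONF}/sqoop.properties"
```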
Edit the catalina.properties file
Note: in the conf directory, edit (or create) catalina.properties and add the jar paths of the local Hadoop installation, as shown below:
[root@linux01 conf]# cd /root/apps/sqoop-1.99.7-bin-hadoop200/conf
[root@linux01 conf]# ll
total 20
-rw-r--r--. 1 root root  668 May 20 21:51 catalina.properties
-rw-r--r--. 1 root root 1584 Jul 20  2016 sqoop_bootstrap.properties
-rw-r--r--. 1 root root 9731 May 20 21:40 sqoop.properties
[root@linux01 conf]# vi catalina.properties
Note: the only thing to change below is your Hadoop installation path.
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/common/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/common/lib/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/hdfs/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/hdfs/lib/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/mapreduce/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/mapreduce/lib/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/tools/lib/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/yarn/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/yarn/lib/*.jar,/root/apps/hadoop-2.7.7/share/hadoop/httpfs/tomcat/lib/*.jar
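The long common.loader line above repeats the Hadoop path nine times. As a sketch (assuming the standard Hadoop 2.x share/ layout), it can be generated from a single HADOOP_HOME variable:

```shell
# Build the common.loader value from HADOOP_HOME instead of hardcoding
# /root/apps/hadoop-2.7.7 in every entry. The ${catalina.*} tokens are
# literal Tomcat placeholders, so they are kept unexpanded.
HADOOP_HOME=/root/apps/hadoop-2.7.7
LOADER='${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar'
for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib tools/lib yarn yarn/lib httpfs/tomcat/lib; do
  LOADER="${LOADER},${HADOOP_HOME}/share/hadoop/${d}/*.jar"
done
echo "common.loader=${LOADER}"
```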
Edit Hadoop's core-site.xml
Add the following. The two hadoop.proxyuser.root.* properties allow the Sqoop server, running as root, to impersonate other users; restart HDFS after changing them so they take effect.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://linux01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/apps/appdata</value>
</property>
<!-- settings required by Sqoop -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
Sqoop commands
Start the Sqoop server
[root@linux01 ~]# ${SQOOP_HOME}/bin/sqoop.sh server start
Setting conf dir: ./../conf
Sqoop home directory: /root/apps/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
1 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
54 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
Sqoop2 server started.
[root@linux01 ~]# jps
1842 SecondaryNameNode
1653 NameNode
2278 SqoopJettyServer
2298 Jps
1998 ResourceManager
[root@linux01 ~]#
Enter the Sqoop client
[root@linux01 bin]# ./sqoop.sh client
Setting conf dir: ./../conf
Sqoop home directory: /root/apps/sqoop-1.99.7-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
Or:
[root@linux01 bin]# ./sqoop2-shell
Setting conf dir: /root/apps/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /root/apps/sqoop-1.99.7-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
Set the server (some of this command's options still need further study):
set server --host linux01 --port 12000 --webapp sqoop
Check and set the output verbosity
When verbose is true the output is more detailed, which makes it easier to trace errors and debug.
sqoop:000> show option --name verbose
Verbose = false
sqoop:000> set option --name verbose --value true
Verbose option was changed to true
sqoop:000> show option --name verbose
Verbose = true
sqoop:000>
Delete a link
sqoop:000> delete link -n linkName
List jobs
sqoop:000> show job
Delete a job
sqoop:000> delete job -n jobName
List the available connectors
sqoop:002> show connector
+------------------------+---------+----------------------+----------------------+
| Name                   | Version | Class                | Supported Directions |
+------------------------+---------+----------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | GenericJdbcConnector | FROM/TO              |
| kite-connector         | 1.99.7  | KiteConnector        | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | OracleJdbcConnector  | FROM/TO              |
| ftp-connector          | 1.99.7  | FtpConnector         | TO                   |
| hdfs-connector         | 1.99.7  | HdfsConnector        | FROM/TO              |
| kafka-connector        | 1.99.7  | KafkaConnector       | TO                   |
| sftp-connector         | 1.99.7  | SftpConnector        | TO                   |
+------------------------+---------+----------------------+----------------------+
sqoop:002>
List the links that have already been created
sqoop:002> show link
+--------+------------------------+---------+
| Name | Connector Name | Enabled |
+--------+------------------------+---------+
| mysql1 | generic-jdbc-connector | true |
| mysql2 | generic-jdbc-connector | true |
| hdfs1 | hdfs-connector | true |
| hdfs2 | hdfs-connector | true |
+--------+------------------------+---------+
sqoop:002>
Create a MySQL link
Notes:
- MySQL must be installed in advance with table data prepared (note: the table must have a primary key, otherwise the job fails), and access must be granted to all of Hadoop's worker nodes.
- A copy of the MySQL driver jar must be placed under /root/apps/sqoop-1.99.7-bin-hadoop200/server/lib.
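The driver-staging step above can be sketched as follows. The jar name is only an example (use whichever mysql-connector-java jar matches your MySQL server); SQOOP_HOME and the jar are scratch stand-ins here so the sketch can be dry-run.

```shell
# Stage the MySQL JDBC driver into Sqoop's server/lib directory.
# SQOOP_HOME is a scratch directory and the jar is an empty stand-in;
# in production use the real installation path and a real driver jar.
SQOOP_HOME=$(mktemp -d)
cd "${SQOOP_HOME}"
mkdir -p server/lib
DRIVER_JAR=mysql-connector-java-5.1.47.jar   # example name/version
touch "${DRIVER_JAR}"                        # stand-in for the real jar
cp "${DRIVER_JAR}" server/lib/
ls server/lib/
```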
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql2
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://linux01:3306/test
Username: root
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
New link was successfully created with validation status OK and name mysql2
sqoop:000>
sqoop:002> show link
+--------+------------------------+---------+
| Name | Connector Name | Enabled |
+--------+------------------------+---------+
| mysql2 | generic-jdbc-connector | true |
+--------+------------------------+---------+
sqoop:002>
Note: under SQL Dialect, at the Identifier enclose prompt, enter a single space and then press Enter.
Create an HDFS link
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs2
HDFS cluster
URI: hdfs://linux01:9000
Conf directory: /root/apps/hadoop-2.7.7/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs2
sqoop:000>
sqoop:000> show link
+--------+------------------------+---------+
| Name | Connector Name | Enabled |
+--------+------------------------+---------+
| mysql1 | generic-jdbc-connector | true |
| hdfs2 | hdfs-connector | true |
+--------+------------------------+---------+
Create a job
create job -f fromLinkName -t toLinkName
Example:
Note the annotated values below.
sqoop:000> create job -f mysql-link -t hdfs-link
Creating job for links with from name mysql-link and to name hdfs-link
Please fill following values to create new job object
Name: test1 (the job name)
Database source
Schema name: test (the database schema name)
Table name: user (the table name)
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0 (text file)
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0 (no compression)
Custom codec:
Output directory: /root/test (an HDFS directory; it must be empty)
Append mode:
Throttling resources
Extractors: 2 (value taken from the official docs)
Loaders: 2 (value taken from the official docs)
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name test1
sqoop:000>
Start the job
sqoop:002> start job -name mysql2hdfs2
Submission details
Job Name: mysql2hdfs2
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2020-05-21 20:37:41 CST
Lastly updated by: root
External ID: job_1590052132270_0001
http://linux01:8088/proxy/application_1590052132270_0001/
Source Connector schema: Schema{name= test . emp ,columns=[
FixedPoint{name=empno,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=ename,nullable=true,type=TEXT,charSize=null}]}
2020-05-21 20:37:41 CST: BOOTING - Progress is not available
sqoop:002>
Check HDFS
[root@linux03 hadoop]# hadoop fs -ls /a
20/05/21 20:47:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 3 root supergroup 0 2020-05-21 20:38 /a/768ac6af-c8c7-4a4f-9350-7c6f6df4ce31.txt
-rw-r--r-- 3 root supergroup 22 2020-05-21 20:38 /a/ef9f9632-8120-4663-9d00-10bc1231a904.txt
[root@linux03 hadoop]# hadoop fs -cat /a/ef9f9632-8120-4663-9d00-10bc1231a904.txt
1,'zhangsan'
2,'lisi'
[root@linux03 hadoop]#
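A quick sanity check (a sketch): count the imported rows. On the real cluster the input would come from `hadoop fs -cat /a/*.txt`; here the two rows shown above are fed in directly to demonstrate the pipe.

```shell
# Count CSV rows coming out of the import. Replace the printf with
# `hadoop fs -cat /a/*.txt` on a host that has Hadoop on the PATH.
printf "1,'zhangsan'\n2,'lisi'\n" | awk -F, 'END { print NR, "rows" }'
```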
Example: importing from MySQL into HDFS
Preparation
- Install Hadoop and start all of its services.
- Create a directory on HDFS to hold the imported data:
  hadoop fs -mkdir /sqoopinput
  (/sqoopinput is the directory being created)
- Install MySQL, grant access to the Hadoop nodes, and prepare a table with data.
- Install Sqoop 1.99.7.
Import steps
- Start the Sqoop server
[root@linux01 server]# cd ${SQOOP_HOME}/bin
[root@linux01 bin]#
[root@linux01 bin]#
[root@linux01 bin]# pwd
/root/apps/sqoop-1.99.7-bin-hadoop200/bin
[root@linux01 bin]# ll
total 32
drwxr-xr-x. 3 root root 4096 May 21 14:59 @BASEDIR@
drwxr-xr-x. 2 root root 4096 May 21 14:58 @LOGDIR@
-rwxr--r--. 1 root root 1175 Jul 20  2016 sqoop2-server
-rwxr--r--. 1 root root 1166 Jul 20  2016 sqoop2-shell
-rwxr--r--. 1 root root 1163 Jul 20  2016 sqoop2-tool
-rwxr--r--. 1 root root 7171 Jul 20  2016 sqoop.sh
-rwxr--r--. 1 root root  942 Jul 20  2016 sqoop-sys.sh
[root@linux01 bin]#
[root@linux01 bin]# ./sqoop.sh server start
Setting conf dir: ./../conf
Sqoop home directory: /root/apps/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
0 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server.
102 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread
Sqoop2 server started.
[root@linux01 bin]#
- Enter the Sqoop client
[root@linux01 bin]# ./sqoop2-shell
Setting conf dir: /root/apps/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /root/apps/sqoop-1.99.7-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
- Set the server (linux01 is the Hadoop master node)
sqoop:000> set server --host linux01--port 12000 --webapp sqoop
Usage: set server
sqoop:000>
Note: the "Usage: set server" reply means the command was not parsed: a space is missing between linux01 and --port. Re-run it as set server --host linux01 --port 12000 --webapp sqoop.
- Set the log verbosity
sqoop:000> show option --name verbose
Verbose = false
sqoop:000> set option --name verbose --value true
Verbose option was changed to true
sqoop:000> show option --name verbose
Verbose = true
sqoop:000>
- Create the MySQL link
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysqllink3
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://linux01:3306/test
Username: root
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose:
New link was successfully created with validation status WARNING and name mysqllink3
sqoop:000>
- Create the HDFS link
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfslink3
HDFS cluster
URI: hdfs://linux01:9000
Conf directory: /root/apps/hadoop-2.7.7/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfslink3
sqoop:000> show link
+------------+------------------------+---------+
| Name | Connector Name | Enabled |
+------------+------------------------+---------+
| mysqllink3 | generic-jdbc-connector | true |
| hdfslink3 | hdfs-connector | true |
+------------+------------------------+---------+
sqoop:000>
- Create the job
sqoop:000> create job -f mysqllink3 -t hdfslink3
Creating job for links with from name mysqllink3 and to name hdfslink3
Please fill following values to create new job object
Name: mysqltohdfs3
Database source
Schema name: test
Table name: emp
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /sqoopinput
Append mode:
Throttling resources
Extractors: 2
Loaders: 2
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name mysqltohdfs3
sqoop:000>
- Start the job
Note: after starting a job, it is best to watch its progress in YARN and confirm it has finished; if the submission has not completed, a look at HDFS may find no data yet.
You can also skip YARN and simply wait a while before checking HDFS.
sqoop:000> start job -name mysqltohdfs3
Submission details
Job Name: mysqltohdfs3
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2020-05-22 10:27:11 CST
Lastly updated by: root
External ID: job_1590107885418_0001
http://linux01:8088/proxy/application_1590107885418_0001/
Source Connector schema: Schema{name= test . emp ,columns=[
FixedPoint{name=empno,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=ename,nullable=true,type=TEXT,charSize=null}]}
2020-05-22 10:27:11 CST: BOOTING - Progress is not available
sqoop:000>
- Check HDFS to confirm the transfer succeeded
[root@linux01 bin]# hadoop fs -ls /sqoopinput
-rw-r--r-- 3 root supergroup 22 2020-05-22 10:28 /sqoopinput/02d7eab2-cf84-4b46-a134-adaa4dc11f4a.txt
-rw-r--r-- 3 root supergroup 0 2020-05-22 10:28 /sqoopinput/d8142f84-2a12-427f-affd-0c59287a15cb.txt
[root@linux01 bin]#
[root@linux01 bin]# hadoop fs -cat /sqoopinput/02d7eab2-cf84-4b46-a134-adaa4dc11f4a.txt
1,'zhangsan'
2,'lisi'
[root@linux01 bin]#
Pitfalls encountered
The link and job creation wizards are interactive, and once a parameter is entered incorrectly it cannot be edited; the best workaround is to create a brand-new link. It cannot reuse the old name, otherwise the shell complains that the link already exists, even though show link does not display it, which is quite frustrating. Exiting the client, stopping the Sqoop server, and then recreating everything avoids the problem entirely.
For other courses, see juejin.cn/user/175884…