Installing an Airflow Cluster


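Note: Linux systems here default to Python 2.7, but some software needs Python 3.x, so this article starts by upgrading Python to 3.x.

Airflow documentation (scheduler page, version 1.10.1): https://airflow.apache.org/docs/apache-airflow/1.10.1/scheduler.html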


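I. Install the Python environment

1. Install dependencies

# The original list used Debian/Ubuntu package names; these are the yum equivalents
yum install -y gcc make zlib-devel ncurses-devel gdbm-devel nss-devel openssl-devel readline-devel libffi-devel curl bzip2-devel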

2. Download the source package

#wget https://www.python.org/ftp/python/3.7.10/Python-3.7.10.tar.xz
wget https://zhengyansheng.oss-cn-beijing.aliyuncs.com/Python-3.7.10.tar.xz
tar xf Python-3.7.10.tar.xz
cd Python-3.7.10

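3. Compile and install Python from source

pip3 needs Python's ssl module and fails with SSL errors when it is missing. Python 3.7's configure script has no --with-ssl option (it only warns about an unrecognized option and carries on); the ssl module is built when the OpenSSL development headers are installed, and --with-openssl=DIR is available if OpenSSL lives in a non-standard location.
./configure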
make
sudo make altinstall
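
Because a missing ssl module only shows up later as a pip error, it can help to check the freshly built interpreter right away. The following is a minimal sketch rather than part of the original procedure; the function name `describe_ssl_support` is made up for illustration, and the script is meant to be run with the new binary (for example `python3.7 check_ssl.py`).

```python
import ssl  # raises ImportError here if Python was built without OpenSSL support
import sys


def describe_ssl_support() -> str:
    """Return a one-line summary of the interpreter and the OpenSSL it is linked against."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} linked against {ssl.OPENSSL_VERSION}"


if __name__ == "__main__":
    print(describe_ssl_support())
```

If the import fails, install the OpenSSL development headers and rebuild before continuing.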

4. Make python3 the default python

unlink /usr/bin/python
# With the default prefix, make altinstall places the binary in /usr/local/bin
ln -sv /usr/local/bin/python3.7 /usr/bin/python
unlink /usr/bin/pip3
ln -sv /usr/local/bin/pip3.7 /usr/bin/pip3

5. Switch pip to a mirror index

cat > /etc/pip.conf << EOF
[global]
trusted-host = mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/

[list]
format=columns
EOF
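
The index URL above is plain http, which is why the same host also has to appear under trusted-host. As a rough sketch (the helper names `read_index_settings` and `host_is_trusted` are hypothetical, and the text is a copy of the file content shown above), the two settings can be cross-checked with the standard library:

```python
import configparser
from urllib.parse import urlsplit

# Copy of the pip.conf content from this step.
PIP_CONF = """\
[global]
trusted-host = mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/

[list]
format=columns
"""


def read_index_settings(text: str) -> dict:
    """Extract the index URL and trusted host from pip.conf text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "index_url": parser.get("global", "index-url"),
        "trusted_host": parser.get("global", "trusted-host"),
    }


def host_is_trusted(settings: dict) -> bool:
    """True when the index URL's host matches the trusted-host entry."""
    return urlsplit(settings["index_url"]).hostname == settings["trusted_host"]


if __name__ == "__main__":
    settings = read_index_settings(PIP_CONF)
    print(settings["index_url"], "trusted:", host_is_trusted(settings))
```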

6. Upgrade pip to 20.2.4

pip3.7 install --upgrade pip==20.2.4
# Check the versions
python --version
pip3.7 --version

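Caution: because the system's default Python has been changed, the bundled command-line tools (yum, urlgrabber-ext-down, yum-config-manager) no longer work as-is and have to be edited:

1. vi /usr/bin/yum
2. vi /usr/libexec/urlgrabber-ext-down
3. vi /usr/bin/yum-config-manager
Change the first line (the shebang) of each file back to the original Python 2.x interpreter: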
#!/usr/bin/python2.7
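
Editing three files by hand is easy to get wrong, so the same change can be scripted. This is only a sketch under stated assumptions: `pin_shebang` is a hypothetical helper, it assumes each script's first line points at an interpreter path ending in `/python`, and it keeps any interpreter flags that follow the path.

```python
from pathlib import Path


def pin_shebang(path: str, interpreter: str = "/usr/bin/python2.7") -> bool:
    """Rewrite a '#!.../python' first line to use `interpreter`.

    Returns True if the file was changed, False if it was left untouched.
    """
    script = Path(path)
    lines = script.read_text().splitlines(keepends=True)
    if not lines or not lines[0].startswith("#!"):
        return False  # no shebang, nothing to do
    parts = lines[0][2:].strip().split(None, 1)
    if not parts or not parts[0].endswith("/python"):
        return False  # already pinned, or not a generic python shebang
    flags = f" {parts[1]}" if len(parts) > 1 else ""
    lines[0] = f"#!{interpreter}{flags}\n"
    script.write_text("".join(lines))
    return True
```

Run as root, it would be called once per file listed above, for example `pin_shebang("/usr/bin/yum")`. Calling it again on an already pinned file changes nothing.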

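II. Install MySQL

1. Install

# Check for an existing MySQL installation
yum list installed | grep mysql
# Uninstall whatever is found
yum remove ....xxxx
# Add the MySQL 5.7 community repository
wget http://repo.mysql.com/mysql57-community-release-el7-8.noarch.rpm
rpm -ivh mysql57-community-release-el7-8.noarch.rpm
# On success, two new files appear under /etc/yum.repos.d/ (mysql-community.repo and mysql-community-source.repo)
# Install the MySQL server
yum install mysql-server
# Check the MySQL version
mysql -V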


2. Start the database

# 1. Start mysql
service mysqld start

# 2. Look up the initial root password (alternatively, log in with mysql -uroot -p and leave the password empty)
grep "password" /var/log/mysqld.log 
echo explicit_defaults_for_timestamp=1 >> /etc/my.cnf
systemctl restart mysqld.service

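3. Create the database

# Log in as root and enter the password <xxx> when prompted
mysql -uroot -p
-- Change the root password. MySQL 5.7 no longer has a password column in mysql.user, so use ALTER USER.
-- If the validate_password plugin rejects a password this weak, choose a stronger one.
ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
grant all privileges on *.* to root@"%" identified by "123456";
flush privileges;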
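-- Create the airflow database and the account Airflow connects with
CREATE DATABASE `airflow` /*!40100 DEFAULT CHARACTER SET utf8 */;
CREATE USER 'airflow_user'@'%' IDENTIFIED BY '123456';
GRANT ALL ON airflow.* TO 'airflow_user'@'%';
FLUSH PRIVILEGES;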

III. Install Redis

1. Install

# 1. Install remi yum repo
yum install -y epel-release yum-utils
yum install -y http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum-config-manager --enable remi

# 2. Install redis latest version
yum install -y redis

2. Configure

# vi /etc/redis.conf
bind 0.0.0.0

3. Start

# 1. Start redis
systemctl start redis && systemctl enable redis
systemctl status redis

# 2. View redis
ps -ef |grep redis

# 3. Test
redis-cli ping

# 4. View version
redis-cli --version
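
redis-cli ping only proves that Redis answers locally, while the point of bind 0.0.0.0 is that other nodes can reach it. The sketch below sends the same PING over a plain socket using the Redis wire protocol (RESP), so it can run from a worker node that has no Redis tools installed. The function names are illustrative, and it assumes no password is configured.

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    chunks = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        chunks.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(chunks)


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Return True if the server at host:port answers PING with +PONG."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("PING"))
        reply = conn.recv(64)
    return reply.startswith(b"+PONG")
```

For example, `redis_ping("hadoop102")` run from another machine should return True once the bind setting has taken effect. A connection error points to the bind address or a firewall.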

IV. Install Airflow

1. Install

For a Docker-based deployment, see https://github.com/airflow-cn/airflow-video/blob/master/3-deploy.md

# 1. Set env
export AIRFLOW_HOME=~/airflow

# 2. Install apache-airflow 2.1.0
AIRFLOW_VERSION=2.1.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# Optionally download the constraints file first; if the https URL times out, try http instead
wget http://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.7.txt
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
# or, using the downloaded file:
pip3.7 install "apache-airflow==${AIRFLOW_VERSION}" --constraint constraints-3.7.txt
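
The shell pipeline above derives the constraints URL from whatever `python` resolves to on the PATH. As an alternative sketch (the helper name `constraint_url` is hypothetical), the same URL can be built from the interpreter that will actually run Airflow, which avoids picking up a different Python by accident:

```python
import sys


def constraint_url(airflow_version: str, python_version: str = "") -> str:
    """Build the constraints URL for an Airflow release and a Python major.minor version."""
    if not python_version:
        # Default to the interpreter running this script.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


if __name__ == "__main__":
    print(constraint_url("2.1.0"))
```

Run with python3.7, it prints the URL of the same constraints-3.7.txt file that the wget command above downloads.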

2. Initialize the database

# 1. Set up database
## https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#
pip3.7 install pymysql
airflow config get-value core sql_alchemy_conn  # this step reports an error, but it creates the file /opt/module/airflow/airflow.cfg

# 2. Initialize the database
# vi ~/airflow/airflow.cfg
[core]
sql_alchemy_conn = mysql+pymysql://airflow_user:123456@localhost:3306/airflow

airflow db init

Error 1:
Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
Fix: add the following to the MySQL configuration file (/etc/my.cnf for the yum install above), then restart mysqld
[mysqld]
explicit_defaults_for_timestamp=1

Error 2:
Native table 'performance_schema'.'session_variables' has the wrong structure
Fix: run
mysql_upgrade -u root -p --force
service mysqld restart
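
Many airflow db init failures come from a connection string that does not match the account and database created in the MySQL section. The sketch below splits the sql_alchemy_conn value into its parts so they can be compared by eye. `describe_conn` is a hypothetical helper, and the password is deliberately left out of the output.

```python
from urllib.parse import urlsplit


def describe_conn(conn: str) -> dict:
    """Break a SQLAlchemy-style URL into driver, user, host, port and database."""
    parts = urlsplit(conn)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    info = describe_conn("mysql+pymysql://airflow_user:123456@localhost:3306/airflow")
    for key, value in info.items():
        print(f"{key}: {value}")
```

The driver part (mysql+pymysql) is the reason pymysql was installed in the previous step.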

3. Create a user

# Create superuser
airflow users create \
    --username admin \
    --firstname l \
    --lastname a \
    --role Admin \
    --email xxxx@qq.com
# Password entered at the prompt: 123456

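4. Start the services

# To run in the background, add -D: airflow webserver --port 8080 -D
airflow webserver --port 8080

Note: a yellow message in the web UI means the scheduler is not running, so a DAG that you switch on will not be scheduled. Once the scheduler has been started with the command that follows, a switched-on DAG runs. The task in this example executes once per second.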

airflow scheduler
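
Whether the scheduler is running can also be checked without opening the UI, because the Airflow 2.x webserver exposes a /health endpoint that reports the scheduler status as JSON. The sketch below assumes the webserver from this step is listening on localhost:8080; the function names are illustrative.

```python
import json
from urllib.request import urlopen


def scheduler_is_healthy(payload: dict) -> bool:
    """Interpret a /health response: True only if the scheduler reports 'healthy'."""
    return payload.get("scheduler", {}).get("status") == "healthy"


def fetch_health(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """Fetch and decode the webserver's /health document."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

Calling `scheduler_is_healthy(fetch_health())` before and after starting the scheduler should flip from False to True once the scheduler has sent a heartbeat.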


5. Distributed deployment

5.1 Install dependencies

pip install 'apache-airflow[celery]'
pip install 'celery[redis]'

5.2 Set the executor

[core]
# The executor class that airflow should use. Choices include
# ``SequentialExecutor``, ``LocalExecutor``, ``CeleryExecutor``, ``DaskExecutor``,
# ``KubernetesExecutor``, ``CeleryKubernetesExecutor`` or the
# full import path to the class when using a custom executor.
# executor = SequentialExecutor
executor = CeleryExecutor


[celery]
# broker_url = redis://redis:6379/0
broker_url = redis://hadoop102:6379/0

# result_backend = db+postgresql://postgres:airflow@postgres/airflow
result_backend = redis://hadoop102:6379/0
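
In a cluster, every node needs the same executor and broker settings. Airflow also reads configuration from environment variables named AIRFLOW__{SECTION}__{KEY}, which can be easier to distribute than an edited airflow.cfg. The sketch below generates export lines for the values shown above; the helper names are hypothetical.

```python
import shlex


def airflow_env_var(section: str, key: str) -> str:
    """Name of the environment variable that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def as_exports(settings: dict) -> list:
    """Turn {(section, key): value} into shell export lines."""
    return [
        f"export {airflow_env_var(section, key)}={shlex.quote(value)}"
        for (section, key), value in settings.items()
    ]


if __name__ == "__main__":
    cluster_settings = {
        ("core", "executor"): "CeleryExecutor",
        ("celery", "broker_url"): "redis://hadoop102:6379/0",
        ("celery", "result_backend"): "redis://hadoop102:6379/0",
    }
    print("\n".join(as_exports(cluster_settings)))
```

The printed lines can be added to the shell profile of each node. A value set this way takes precedence over the same key in airflow.cfg.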

5.3 Start the services

# 1. Start webserver
airflow webserver -p 8000

# 2. Start scheduler
airflow scheduler

# 3. Start celery worker
airflow celery worker

# 4. Start celery flower
airflow celery flower
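
With four processes started by hand, a quick port probe shows which of them are actually listening. This is a sketch under assumptions: port 8000 comes from the webserver command above, 5555 is assumed to be Flower's default port, and `port_is_open` and `check_services` are illustrative names.

```python
import socket

# Assumed ports: 8000 from the webserver command above, 5555 as Flower's default.
SERVICE_PORTS = {"webserver": 8000, "flower": 5555}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or host not resolvable


def check_services(host: str) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}
```

For example, `check_services("hadoop102")` returns a dictionary with one True or False entry per service. The scheduler and the Celery worker are not covered here and are better checked through their logs.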

5.4 Management UI

Screenshots (not reproduced): the Webserver UI and the Flower UI.

5.5 Demo

Screenshots (not reproduced): starting the services, the scheduler log, and the worker log.