LS-LINUX-003 Single-Server Automated Operations
I. Infrastructure Design (Single-Server Edition)
1. Core Components
graph TD
A[Control Node] -->|SSH/API| B[Managed Server]
B --> C[Ansible]
B --> D[Docker]
B --> E[Prometheus]
B --> F[ELK Stack]
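2. Recommended Technology Stack
- Configuration management: Ansible (agentless architecture)
- Containerization: Docker + Docker Compose
- Monitoring: Prometheus + Grafana + Node Exporter
- Logging: Elasticsearch + Logstash + Kibana (ELK)
- Continuous delivery: GitLab CI/CD or Jenkins
- Web management console: Flask/Django + Bootstrap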
II. Phased Implementation Plan
Phase 1: Infrastructure Automation (1-3 days)
Core task: establish the basic operations framework.
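# Install core tools (Docker Engine is required for the Compose stack in Phase 2)
sudo apt-get update
sudo apt-get install -y ansible python3-pip git docker.io
# Legacy Compose v1; on newer systems prefer the Compose v2 plugin ("docker compose")
pip3 install docker-compose
# Create the Ansible inventory file. The hardening playbook in Section III disables root SSH login,
# so connect as a sudo-capable user ("deploy" is a placeholder name) and escalate with become.
sudo mkdir -p /opt/automation/inventories
printf '[single_server]\n192.168.1.100 ansible_user=deploy ansible_become=true\n' | sudo tee /opt/automation/inventories/hosts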
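If the web console in Phase 3 later needs to list or add hosts, the inventory can also be produced and read from Python. The following is a minimal sketch that assumes the simple layout used here: a [group] header followed by one host per line with space-free key=value variables. It does not cover Ansible's full INI syntax ([group:vars], [group:children], host ranges), and the helper names render_inventory and parse_inventory are hypothetical.

```python
# Inventory shape: {group: {host: {variable: value}}}
Inventory = dict[str, dict[str, dict[str, str]]]


def render_inventory(inventory: Inventory) -> str:
    """Render the inventory as Ansible INI text: [group] headers, one host per line."""
    lines: list[str] = []
    for group, hosts in inventory.items():
        lines.append(f"[{group}]")
        for host, host_vars in hosts.items():
            var_text = " ".join(f"{key}={value}" for key, value in host_vars.items())
            lines.append(f"{host} {var_text}".rstrip())
        lines.append("")  # blank line after each group
    return "\n".join(lines)


def parse_inventory(text: str) -> Inventory:
    """Parse the simple layout produced by render_inventory (no :vars/:children sections)."""
    inventory: Inventory = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = inventory.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError(f"host line outside any group: {line!r}")
        host, *pairs = line.split()
        # Each variable must be key=value; a bare token makes dict() raise ValueError
        current[host] = dict(pair.split("=", 1) for pair in pairs)
    return inventory
```

For example, passing the output of render_inventory to pathlib.Path("/opt/automation/inventories/hosts").write_text(...) yields a file in the same format as the shell command above.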
Phase 2: Monitoring Stack Setup (sample configuration)
# docker-compose-monitor.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
  node_exporter:
    image: prom/node-exporter:latest
    # Read host (not container) metrics from the bind-mounted root filesystem
    command:
      - '--path.rootfs=/host'
    ports:
      - "9100:9100"
    pid: "host"
    volumes:
      - /:/host:ro,rslave
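Once the stack is up, a quick way to confirm that each service answers is to probe its HTTP endpoint. A minimal sketch, assuming it runs on the server itself with the port mappings from the compose file above: /-/healthy (Prometheus) and /api/health (Grafana) are the projects' documented health endpoints, while Node Exporter is simply checked through its /metrics page. The function names http_ok and check_stack are illustrative.

```python
import urllib.request

# Ports from the compose file; Prometheus and Grafana expose dedicated health
# endpoints, Node Exporter is checked through its metrics page.
ENDPOINTS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3000/api/health",
    "node_exporter": "http://localhost:9100/metrics",
}


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts are all OSError subclasses
        return False


def check_stack(endpoints=None, probe=http_ok) -> dict[str, bool]:
    """Probe each endpoint; probe is injectable so the logic can be tested offline."""
    endpoints = ENDPOINTS if endpoints is None else endpoints
    return {name: probe(url) for name, url in endpoints.items()}
```

check_stack() returns one boolean per service, so a cron job can log or alert on any False entry.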
Phase 3: Web Console Development (Python example)
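# app.py
import subprocess

from flask import Flask, abort

app = Flask(__name__)

# Only these playbooks may be triggered; the names are placeholders for your deploy_<service>.yml files
ALLOWED_SERVICES = {"nginx", "mysql"}

@app.route('/deploy/<service>', methods=['POST'])
def deploy_service(service):
    # Reject unknown names instead of passing URL input to a shell (command injection)
    if service not in ALLOWED_SERVICES:
        abort(404)
    result = subprocess.run(
        ["ansible-playbook", "-i", "inventories/hosts", f"deploy_{service}.yml"],
        capture_output=True,
        text=True,
    )
    status = 200 if result.returncode == 0 else 500
    return f"Deployment output:\n{result.stdout}{result.stderr}", status, {"Content-Type": "text/plain"}

if __name__ == '__main__':
    # Listen on localhost only; add authentication before exposing the console
    app.run(host='127.0.0.1', port=5000)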
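A deployment handler needs to know whether the playbook succeeded and should not hang forever on a stuck run. A minimal sketch of a helper that runs a command without a shell, enforces a timeout, and reports the exit code and stderr instead of raising; run_command and RunResult are hypothetical names, and the 600-second default is an arbitrary assumption.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    ok: bool                   # True only when the command exited with code 0
    returncode: Optional[int]  # None if the command timed out or was not found
    stdout: str
    stderr: str


def run_command(argv: list[str], timeout: float = 600.0) -> RunResult:
    """Run argv without a shell and report the outcome instead of raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired
        return RunResult(False, None, "", f"timed out after {timeout} s")
    except FileNotFoundError:
        return RunResult(False, None, "", f"command not found: {argv[0]}")
    return RunResult(proc.returncode == 0, proc.returncode, proc.stdout, proc.stderr)
```

In a handler this could be called as run_command(["ansible-playbook", "-i", "inventories/hosts", "deploy_nginx.yml"]), answering with HTTP 500 and result.stderr whenever result.ok is False.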
III. Key Automation Scenarios
1. Automated Security Hardening
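# security_hardening.yml
- hosts: single_server
  become: true
  tasks:
    - name: Apply security updates
      apt:
        upgrade: dist
        update_cache: yes

    - name: Allow SSH, HTTP and HTTPS through the firewall
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - 22
        - 80
        - 443

    - name: Enable the firewall and deny all other incoming traffic
      ufw:
        state: enabled
        direction: incoming
        policy: deny

    - name: Disable root SSH login
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PermitRootLogin'
        line: 'PermitRootLogin no'
        validate: '/usr/sbin/sshd -t -f %s'
      notify: restart sshd

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted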
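After the playbook runs, it is worth verifying the value sshd will actually use. sshd applies the first value it reads for most keywords, keywords are case-insensitive, and settings after a Match line apply only to matching connections. A minimal sketch of that lookup for PermitRootLogin, assuming a single file without Include directives: on Ubuntu the default sshd_config includes /etc/ssh/sshd_config.d/*.conf near the top, and a value set there takes precedence over the line the playbook edits. On the server itself, sshd -T prints the effective configuration and is the authoritative check. The function name is hypothetical.

```python
import re

# Keyword and value are separated by whitespace or a single "=" (sshd_config(5))
_SPLIT = re.compile(r"\s*=\s*|\s+")


def effective_permit_root_login(config_text: str) -> str:
    """Return the global PermitRootLogin value from one sshd_config file.

    Rules modelled: '#' comments are ignored, keywords are case-insensitive,
    the first value read wins, and lines after the first Match block are
    skipped because they only apply to matching connections.
    Include directives are NOT followed.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = _SPLIT.split(line, maxsplit=1)
        keyword = parts[0].lower()
        if keyword == "match":
            break
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip().lower()
    return "prohibit-password"  # OpenSSH default since 7.0
```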
2. Monitoring and Alerting
# prometheus/rules.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
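The alert expression takes, per instance, the average fraction of time the CPUs spent idle and subtracts it from 100. The arithmetic can be reproduced from two scrapes of the idle counter; this is a sketch of the calculation, not Prometheus's implementation, and the function name is hypothetical. Like irate, it uses only the last two samples and treats a decrease in the counter as a reset.

```python
def cpu_usage_percent(idle_prev: dict[str, float], idle_last: dict[str, float], interval_s: float) -> float:
    """Reproduce 100 - avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 for one instance.

    idle_prev / idle_last map each CPU id to its cumulative idle-seconds counter
    at the last two scrapes, taken interval_s seconds apart.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if not idle_last:
        raise ValueError("no CPUs given")
    idle_rates = []
    for cpu, last in idle_last.items():
        prev = idle_prev[cpu]
        # A counter that went down was reset (e.g. reboot); like irate, count the new value as the increase
        increase = last - prev if last >= prev else last
        idle_rates.append(increase / interval_s)  # idle seconds per second = idle fraction
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100
```

For two CPUs whose idle counters each grow by 1 s over a 10 s interval, the idle fraction is 0.1, so usage is 90% and the > 80 condition holds. Because of for: 5m, the alert stays pending and only fires once the condition has held for five minutes.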
IV. Advanced Extensions
1. Architecture Evolution Roadmap
graph LR
A[Single-Server Automation] --> B[Container Orchestration]
B --> C[Hybrid Cloud Management]
C --> D[AIOps]
style A fill:#f9f,stroke:#333
style D fill:#ccf,stroke:#f66
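2. Suggested Extensions
- Canary (gray) release system: split traffic with Nginx + Lua
- Configuration center: Apollo or Consul for dynamic configuration
- Disaster-recovery drills: chaos engineering tools such as Chaos Mesh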
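The core of a canary (gray) release is a stable mapping from a request key to a version, so that a given user keeps seeing the same version while the canary share grows. The Python sketch below shows only that bucketing logic, which a Lua handler in Nginx would apply per request; the user-id key, the 100 buckets and the SHA-256 hash are assumptions. Nginx's built-in split_clients directive offers similar hash-based splitting without Lua.

```python
import hashlib


def canary_bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user id to a stable bucket in [0, buckets) via a hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to 'canary' and the rest to 'stable'.

    A user always lands in the same bucket, so raising canary_percent only
    moves additional users to the canary; nobody flips back and forth.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if canary_bucket(user_id) < canary_percent else "stable"
```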
V. Security Considerations
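- Secret management:
    # Encrypt a sensitive value with Ansible Vault; --stdin-name reads the value from stdin,
    # which keeps the plaintext out of shell history
    ansible-vault encrypt_string --stdin-name 'db_password'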
- Access control matrix:
    # RBAC example: map each role to the actions it may perform
    def check_permission(user, action):
        permissions = {
            'admin': ['deploy', 'restart', 'config'],
            'developer': ['deploy', 'logs'],
            'guest': ['view'],
        }
        return action in permissions.get(user.role, [])
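The matrix above is only advisory until every sensitive action checks it. One way to enforce it in Python is a decorator; the sketch below assumes a user object with name and role attributes and reuses the role table from the example, while User, requires and deploy are illustrative names.

```python
from dataclasses import dataclass
from functools import wraps

# Role -> allowed actions, same matrix as the example
PERMISSIONS = {
    "admin": {"deploy", "restart", "config"},
    "developer": {"deploy", "logs"},
    "guest": {"view"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str


def check_permission(user: User, action: str) -> bool:
    return action in PERMISSIONS.get(user.role, set())


def requires(action: str):
    """Decorator: reject the call unless the first argument (a User) may perform action."""
    def decorator(func):
        @wraps(func)
        def wrapper(user: User, *args, **kwargs):
            if not check_permission(user, action):
                raise PermissionError(f"{user.name} ({user.role}) may not {action}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires("deploy")
def deploy(user: User, service: str) -> str:
    # Placeholder body; a real handler would trigger the deployment here
    return f"{user.name} deployed {service}"
```

Unknown roles fall back to an empty permission set, so they are denied by default.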
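VI. Recommended Learning Resources
- Books:
  - 《Ansible权威指南》 (The Definitive Guide to Ansible), Li Songtao, China Machine Press ^^ansible-book^^
  - 《SRE:Google运维解密》, the Chinese edition of Site Reliability Engineering: How Google Runs Production Systems, Publishing House of Electronics Industry, Beijing ^^sre-book^^
- Courses:
  - Geek Time (极客时间): 《运维体系管理课》 (Operations System Management) ^^geektime-course^^
  - Coursera: Google IT Automation with Python ^^coursera-course^^
- Toolchain:
    graph LR
    A[Development] --> B[GitLab]
    B --> C[Jenkins]
    C --> D[Ansible]
    D --> E[Kubernetes]
    E --> F[Prometheus]
    F --> G[Grafana]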
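Start with Ansible and Docker, then add monitoring and logging step by step. A single server is an excellent lab for learning automated operations; the key is to establish a standardized operations process. When you later scale out to a cluster, focus on standardizing configuration management and making services stateless.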