Hadoop Cluster Deployment Tutorial, Part 7 (continued)
Chapter 25: Security Hardening and Access Control
25.1 Kerberos Authentication Integration
- KDC service setup:
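yum install -y krb5-server krb5-libs krb5-workstation
# Edit /etc/krb5.conf and /var/kerberos/krb5kdc/kdc.conf for the realm first
kdb5_util create -s -r EXAMPLE.COM
systemctl enable --now krb5kdc kadmin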
- Hadoop core configuration (core-site.xml):
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
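Rather than checking the two switches by eye, you can parse core-site.xml and confirm both are set. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the sample string stands in for the real file (normally under $HADOOP_HOME/etc/hadoop/). Hadoop's defaults are "simple" authentication and authorization disabled, so a missing property counts as "not enabled".

```python
import xml.etree.ElementTree as ET

SECURITY_KEYS = {"hadoop.security.authentication", "hadoop.security.authorization"}


def security_settings(core_site_xml: str) -> dict:
    """Return the Hadoop security properties present in a core-site.xml string."""
    root = ET.fromstring(core_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in SECURITY_KEYS:
            found[name] = prop.findtext("value", default="").strip()
    return found


def kerberos_enabled(core_site_xml: str) -> bool:
    """True only if authentication is 'kerberos' and authorization is 'true'."""
    s = security_settings(core_site_xml)
    return (s.get("hadoop.security.authentication") == "kerberos"
            and s.get("hadoop.security.authorization", "").lower() == "true")


sample = """<configuration>
<property><name>hadoop.security.authentication</name><value>kerberos</value></property>
<property><name>hadoop.security.authorization</name><value>true</value></property>
</configuration>"""
print(kerberos_enabled(sample))  # True
```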
25.2 Access Management with Apache Ranger
- Plugin installation steps:
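cp ranger-2.3.0-hdfs-plugin.tar.gz /usr/hdp/current/hadoop-client
cd /usr/hdp/current/hadoop-client
tar -xzf ranger-2.3.0-hdfs-plugin.tar.gz
cd ranger-2.3.0-hdfs-plugin
# Set POLICY_MGR_URL and REPOSITORY_NAME in install.properties first, then run as root
./enable-hdfs-plugin.sh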
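- Example policy (access-time is assumed to be a custom condition added to the HDFS service definition; Ranger expects each condition's values as a list of strings):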
{
"policyName": "finance-data-access",
"resources": {
"path": {
"values": ["/data/finance/*"],
"isRecursive": true
}
},
"policyItems": [
{
"accesses": [
{"type": "read", "isAllowed": true},
{"type": "write", "isAllowed": true}
],
"users": ["finance_team"],
"conditions": [
{"type": "access-time", "values": ["mon-fri"]}
]
}
]
}
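The policy grants finance_team read and write access under /data/finance, on weekdays only. The Python below is a simplified model of that decision, not Ranger's policy engine: using fnmatch for the wildcard, treating the recursive flag as "the base directory also matches", and reading "mon-fri" as a weekday check are all assumptions made for illustration.

```python
import fnmatch
from datetime import datetime

# Simplified stand-in for the finance-data-access policy (not Ranger's data model)
POLICY = {
    "path": "/data/finance/*",
    "is_recursive": True,
    "users": {"finance_team"},
    "accesses": {"read", "write"},
    "weekdays_only": True,  # models the "mon-fri" access-time condition
}


def path_matches(policy_path: str, path: str, recursive: bool) -> bool:
    """Wildcard match; with recursion the wildcard-free base directory also matches."""
    # fnmatch's '*' also matches '/', so '/data/finance/*' covers nested files too
    if fnmatch.fnmatchcase(path, policy_path):
        return True
    if recursive:
        base = policy_path.rstrip("*").rstrip("/")
        return path == base or path.startswith(base + "/")
    return False


def is_allowed(user: str, access: str, path: str, when: datetime, policy: dict = POLICY) -> bool:
    """Grant only if user, access type, weekday and path all match the policy item."""
    if user not in policy["users"] or access not in policy["accesses"]:
        return False
    if policy["weekdays_only"] and when.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return path_matches(policy["path"], path, policy["is_recursive"])


# 2024-01-01 is a Monday
print(is_allowed("finance_team", "read", "/data/finance/q1/report.csv", datetime(2024, 1, 1)))  # True
```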
Chapter 26: Multi-Cluster Federation
26.1 ViewFS Configuration
- Core parameters (core-site.xml):
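<property>
<name>fs.defaultFS</name>
<value>viewfs://clusterFed/</value>
</property>
<property>
<name>fs.viewfs.mounttable.clusterFed.link./data</name>
<value>hdfs://cluster1/data</value>
</property>
<property>
<name>fs.viewfs.mounttable.clusterFed.link./logs</name>
<value>hdfs://cluster2/logs</value>
</property>
<property>
<name>fs.viewfs.mounttable.clusterFed.link./archive</name>
<value>hdfs://cluster3/archive</value>
</property>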
- Namespace mount table:
viewfs://clusterFed/
├── /data -> hdfs://cluster1/data
├── /logs -> hdfs://cluster2/logs
└── /archive -> hdfs://cluster3/archive
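ViewFS works like a client-side mount table: the leading mount point of a path is replaced by the link's target URI and the rest of the path is appended. Below is a minimal Python model of that lookup using the three links above. It leaves out fallback links and other ViewFS features, and the function name is hypothetical.

```python
# Mount links from the clusterFed namespace table
MOUNT_TABLE = {
    "/data": "hdfs://cluster1/data",
    "/logs": "hdfs://cluster2/logs",
    "/archive": "hdfs://cluster3/archive",
}


def resolve(viewfs_path: str, mounts: dict = MOUNT_TABLE) -> str:
    """Rewrite a viewfs://clusterFed/ path to the HDFS URI of the link that covers it."""
    for mount, target in mounts.items():
        # Compare whole path components so that /data does not capture /database
        if viewfs_path == mount or viewfs_path.startswith(mount + "/"):
            return target + viewfs_path[len(mount):]
    raise FileNotFoundError(f"no mount link covers {viewfs_path}")


print(resolve("/logs/2024/app.log"))  # hdfs://cluster2/logs/2024/app.log
```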
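26.2 Cross-Cluster Data Synchronization
- Tuned DistCp run (the dynamic strategy lets faster maps take on more chunks, -bandwidth caps each map at 100 MB/s, -m 20 allows up to 20 maps, and -update skips files that already match at the target):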
hadoop distcp \
-strategy dynamic \
-bandwidth 100 \
-m 20 \
-update \
hdfs://cluster1/src hdfs://cluster2/target
- Real-time sync monitoring dashboard (ECharts gauge):
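// syncProgress: completion percentage (0-100) from your own metrics source (assumed)
let syncProgress = 0;
// 'sync-gauge' is an illustrative element id
const chart = echarts.init(document.getElementById('sync-gauge'));
const option = {
  series: [{
    type: 'gauge',
    min: 0,
    max: 100,
    detail: { formatter: '{value}%' },
    data: [{
      value: syncProgress,
      name: 'Sync progress'
    }]
  }]
};
chart.setOption(option);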
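Two calculations support this subsection. First, DistCp's -bandwidth is a per-map cap in MB/s, so -bandwidth 100 with -m 20 gives an aggregate ceiling of 2000 MB/s, which sets a lower bound on copy time. Second, the gauge needs syncProgress as a value from 0 to 100. The Python sketch below computes both. It assumes that MB means 2**20 bytes, that data spreads evenly across maps (the dynamic strategy only approximates this), and that you fetch the done and total byte counts yourself, for example from the job's counters. The function names are hypothetical.

```python
def min_copy_seconds(total_bytes: int, maps: int = 20, mb_per_map: float = 100.0) -> float:
    """Best-case DistCp wall time if every map runs at its -bandwidth cap."""
    aggregate = maps * mb_per_map * 2**20  # bytes per second across all maps
    return total_bytes / aggregate


def sync_progress(done_bytes: int, total_bytes: int) -> float:
    """Percentage (0-100) for the gauge; an empty job counts as complete."""
    if total_bytes <= 0:
        return 100.0
    pct = 100.0 * done_bytes / total_bytes
    return round(min(max(pct, 0.0), 100.0), 1)


def gauge_option(progress: float) -> dict:
    """Python mirror of the ECharts option object, ready to serialise with json.dumps."""
    return {"series": [{"type": "gauge", "data": [{"value": progress, "name": "Sync progress"}]}]}


# 4 TiB at the 2000 MB/s ceiling: about 2097 s, i.e. roughly 35 minutes at best
print(round(min_copy_seconds(4 * 2**40)))
print(gauge_option(sync_progress(1 * 2**40, 4 * 2**40)))
```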
Chapter 27: Containerized Deployment
27.1 Building the Docker Image
- Base image Dockerfile:
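FROM openjdk:8-jre
# Download, unpack and delete the tarball in one layer so the archive does not bloat the image
RUN wget -q https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz \
    && tar -xzf hadoop-3.3.1.tar.gz -C /opt/ \
    && rm hadoop-3.3.1.tar.gz
ENV HADOOP_HOME=/opt/hadoop-3.3.1
ENV PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin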
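- Kubernetes deployment template (DataNode StatefulSet; apps/v1 requires a selector matching the pod labels, and the hadoop-data volume comes from a volumeClaimTemplate):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: datanode
spec:
  serviceName: "hadoop-dn"
  replicas: 3
  selector:
    matchLabels:
      app: hadoop-dn
  template:
    metadata:
      labels:
        app: hadoop-dn
    spec:
      containers:
        - name: datanode
          image: hadoop-dn:3.3.1
          ports:
            - containerPort: 9864
          volumeMounts:
            - mountPath: /hadoop/dfs/data
              name: hadoop-data
  volumeClaimTemplates:
    - metadata:
        name: hadoop-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi  # example size, adjust to your disks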
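Because this is a StatefulSet governed by the service "hadoop-dn", each DataNode pod keeps a stable ordinal name and DNS entry across restarts. The helper below spells out Kubernetes' naming pattern. It assumes the `default` namespace and the `cluster.local` cluster domain, and the DNS names resolve only if a headless Service called hadoop-dn exists. The function name is illustrative.

```python
def datanode_identities(name: str = "datanode", service: str = "hadoop-dn", replicas: int = 3,
                        namespace: str = "default", cluster_domain: str = "cluster.local"):
    """Return (pod name, stable DNS name) for each pod of the DataNode StatefulSet."""
    identities = []
    for ordinal in range(replicas):
        # StatefulSet pods are named <statefulset name>-<ordinal>, starting at 0
        pod = f"{name}-{ordinal}"
        # Per-pod DNS via the governing (headless) Service: <pod>.<service>.<namespace>.svc.<domain>
        fqdn = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod, fqdn))
    return identities


for pod, fqdn in datanode_identities():
    print(pod, fqdn)
```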
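27.2 Persistent Storage Options
Storage type | Typical use | Performance metric
---|---|---
hostPath | Development and test environments | Low latency
NFS | Small to medium production clusters | ~50 MB/s throughput
Ceph RBD | Large-scale clusters | 3000+ IOPS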
Chapter 28: Best-Practice Summary
28.1 Performance Tuning Checklist
- Configuration checks:
hdfs getconf -confKey dfs.namenode.handler.count
yarn node -list -showDetails | grep -i 'vcores'
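- Benchmark (TeraGen writes 10 GB of input, 100 million rows of 100 bytes each, and TeraSort sorts it; TeraSort's map count follows the input splits, so tune the reducer count instead):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar teragen \
-Dmapreduce.job.maps=100 \
100000000 /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar terasort \
-Dmapreduce.job.reduces=100 \
/input /output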
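Part of this checklist can be scripted. The sketch below compares dfs.namenode.handler.count against a commonly cited heuristic of 20 × ln(number of DataNodes). This is a vendor tuning rule of thumb, not a Hadoop requirement; Hadoop's own default is 10. The sketch also turns a target data size into the row count for TeraGen, which writes fixed 100-byte records to produce TeraSort's input. The function names are illustrative.

```python
import math

TERAGEN_ROW_BYTES = 100  # TeraGen records: 10-byte key + 90-byte value


def suggested_handler_count(datanodes: int) -> int:
    """Heuristic 20 * ln(N), never below Hadoop's default of 10."""
    if datanodes < 1:
        raise ValueError("need at least one DataNode")
    return max(10, round(20 * math.log(datanodes)))


def handler_count_ok(configured: int, datanodes: int) -> bool:
    """True if the configured value meets the heuristic."""
    return configured >= suggested_handler_count(datanodes)


def teragen_rows(target_bytes: int) -> int:
    """Rows for teragen so the generated input is at least target_bytes."""
    return -(-target_bytes // TERAGEN_ROW_BYTES)  # ceiling division


def sort_throughput_mb_s(data_bytes: int, elapsed_s: float) -> float:
    """TeraSort throughput in MB/s (1 MB = 10**6 bytes) from wall-clock time."""
    return data_bytes / 1e6 / elapsed_s


print(suggested_handler_count(100))  # 92
print(teragen_rows(10 * 10**9))      # 100000000 (10 GB)
```

To check a live cluster, pass in the value printed by hdfs getconf -confKey dfs.namenode.handler.count. For example, call subprocess.run with capture_output=True and text=True, then convert the result with int(result.stdout).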
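28.2 Troubleshooting Common Failures
- Diagnosis flowchart (Mermaid):
graph TD
A[Service failure] --> B{Log analysis}
B -->|NameNode| C[Check fsimage integrity]
B -->|DataNode| D[Check free disk space]
B -->|YARN| E[Check for resource over-allocation]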
- Key recovery commands:
hdfs fsck / -files -blocks -locations
hdfs debug recoverLease -path /corrupt/file -retries 3
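A script can handle both the flowchart's first branch (which daemon's log to read) and the fsck verdict. This sketch assumes two formats: the default daemon log naming `<prefix>-<user>-<daemon>-<host>.log` (for example hadoop-hdfs-namenode-nn1.log), and an fsck summary line of the form "The filesystem under path '/' is HEALTHY" or "is CORRUPT". Both can change with the Hadoop version and logging configuration, so check them against your own cluster. The function names are hypothetical.

```python
import re

# Flowchart branches: daemon name in the log file -> (branch, first check)
CHECKS = {
    "namenode": ("NameNode", "Check fsimage integrity"),
    "datanode": ("DataNode", "Check free disk space"),
    "resourcemanager": ("YARN", "Check for resource over-allocation"),
    "nodemanager": ("YARN", "Check for resource over-allocation"),
}

# Assumed default naming: <hadoop|yarn>-<user>-<daemon>-<host>.log (or .out)
LOG_NAME = re.compile(r"^(?:hadoop|yarn)-[^-]+-(?P<daemon>[a-z]+)-.+\.(?:log|out)$")

# Assumed fsck summary line, e.g. "The filesystem under path '/' is HEALTHY"
FSCK_VERDICT = re.compile(r"The filesystem under path '(?P<path>[^']*)' is (?P<state>HEALTHY|CORRUPT)")


def first_check(log_file: str):
    """Return (branch, check) for a daemon log file name, or None if unrecognised."""
    m = LOG_NAME.match(log_file)
    return CHECKS.get(m.group("daemon")) if m else None


def fsck_verdict(output: str):
    """Return (path, state) parsed from fsck output, or None if no verdict line is present."""
    m = FSCK_VERDICT.search(output)
    return (m.group("path"), m.group("state")) if m else None


print(first_check("hadoop-hdfs-datanode-worker-01.log"))
print(fsck_verdict("Status: CORRUPT\nThe filesystem under path '/' is CORRUPT\n"))
```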