Partial MAKER workflow: installing and running MAKER with Docker

First, install Docker and configure a registry mirror.

docker run -it --name maker_container -v /home/lilab/9.19-data/data:/data aflatoxing/maker3 bash

/home/lilab/9.19-data/data is the host directory mounted into the container. Inside the container it appears as /data, so cd /data reaches it.

cd /usr/local/maker/data
This directory contains the bundled example data:
root@c67b99735d40:/usr/local/maker/data# ls
dpp_contig.fasta  dpp_protein.fasta  hsap_contig.fasta  hsap_protein.fasta
dpp_est.fasta     evm_weights.txt    hsap_est.fasta     te_proteins.fasta


mkdir example1
mv dpp_contig.fasta dpp_est.fasta dpp_protein.fasta example1/
cd example1
maker -CTL    # generates the maker_*.ctl control files listed below

root@c67b99735d40:/usr/local/maker/data/example1# ls
dpp_contig.fasta  dpp_est.fasta  dpp_protein.fasta  maker_bopts.ctl  maker_evm.ctl  maker_exe.ctl  maker_opts.ctl
root@c67b99735d40:/usr/local/maker/data/example1# vi maker_opts.ctl

Set the following in maker_opts.ctl:

genome=dpp_contig.fasta
est=dpp_est.fasta
protein=dpp_protein.fasta
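You can apply these settings from a script instead of editing the file in vi. The sketch below is a minimal version. It assumes the generated maker_opts.ctl holds one `key=value #comment` option per line. `set_ctl_option` is a hypothetical helper, not part of MAKER:

```bash
# set_ctl_option FILE KEY VALUE
# Hypothetical helper (not part of MAKER): set KEY=VALUE in a MAKER control file,
# keeping the "#comment" that follows the value on generated lines.
set_ctl_option() {
    local file="$1" key="$2" value="$3" tmp
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 {           # line starts with "KEY="
            c = index($0, "#")            # keep any trailing comment
            print k "=" v ((c > 0) ? " " substr($0, c) : "")
            found = 1
            next
        }
        { print }                         # every other line is unchanged
        END { if (!found) exit 1 }        # fail if KEY was not in the file
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"              # overwrite in place, keeping permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example: `set_ctl_option maker_opts.ctl genome dpp_contig.fasta`. If the key is not present, the helper returns non-zero and leaves the file untouched.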

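Then run maker from example1. The run is complete when the output reports that MAKER has finished. Change into the output directory:
cd /usr/local/maker/data/example1/dpp_contig.maker.output/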
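Longer runs write their output to a log file, such as the `mpiexec ... >& log1 &` runs later in these notes. For those, it helps to poll the log for the completion message. This sketch assumes MAKER ends a successful run with the line `Maker is now finished!!!`. `maker_finished` is a hypothetical helper, and you can pass a different pattern if your MAKER version words the message differently:

```bash
# maker_finished LOG [PATTERN]
# Hypothetical helper (not part of MAKER): succeed if LOG exists and contains
# the completion message (assumed default: "Maker is now finished").
maker_finished() {
    local log="$1" pattern="${2:-Maker is now finished}"
    [ -f "$log" ] && grep -q -- "$pattern" "$log"
}
```

Usage: `until maker_finished log1; do sleep 300; done`. If MAKER crashes, this loop never ends, so also check the datastore index log.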

PART 1: First MAKER round
Edit maker_opts.ctl:

```bash
nano maker_opts.ctl
```
Set the following parameters:

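```
genome=/data/he_flye_scaffolds.fa
est=/data/Teinity.fasta
protein=/data/Ricciocarpos_test_proteins.faa,/data/MpTak_v6.1r2_proteins.faa
model_org=physcomitrella  # RepeatMasker species; pick a suitable close relative, or supply a RepeatModeler library via rmlib
rmlib=  # left empty because no custom repeat library is used
est2genome=1  # build gene models directly from EST alignments
protein2genome=1  # build gene models directly from protein alignments
TMP=/data/tmp  # temporary directory
```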
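Run the first MAKER round from /data, where the control files are:

```bash
# runs in the background; stdout and stderr both go to log1
mpiexec -n 48 maker -base cbp_rnd1 >& log1 &
```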
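MAKER also records per-contig progress in `cbp_rnd1.maker.output/cbp_rnd1_master_datastore_index.log`. The sketch below summarises that file. It assumes the file is tab-separated (contig ID, directory, status) and that the last line for each contig holds its current status. `datastore_status` is a hypothetical helper:

```bash
# datastore_status INDEX_LOG
# Hypothetical helper: count contigs by their most recent status
# (e.g. FINISHED, FAILED, DIED_SKIPPED_PERMANENT) in a MAKER datastore index log.
datastore_status() {
    awk -F'\t' '
        NF >= 3 { last[$1] = $3 }          # later lines overwrite earlier ones
        END {
            for (id in last) count[last[id]]++
            for (s in count) print s "\t" count[s]
        }
    ' "$1" | LC_ALL=C sort
}
```

Check any contigs whose final status is not FINISHED before you start SNAP training.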
PART 2: First round of SNAP training
1. Create the working directory for the first SNAP training round:

```bash
cd /data
mkdir snap1
cd snap1
```
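2. Merge the GFF files and build the SNAP training data:

```bash
gff3_merge -d ../cbp_rnd1.maker.output/cbp_rnd1_master_datastore_index.log  # writes cbp_rnd1.all.gff into snap1/
maker2zff -l 50 -x 0.5 cbp_rnd1.all.gff        # models with protein length >= 50 and AED <= 0.5 -> genome.ann, genome.dna
fathom -categorize 1000 genome.ann genome.dna  # sort models into uni/alt/err/olp/wrn sets with 1000 bp flanks
fathom -export 1000 -plus uni.ann uni.dna      # export the unique models on the plus strand -> export.ann, export.dna
forge export.ann export.dna                    # estimate SNAP parameters in the current directory
hmm-assembler.pl cbp . > ../cbp1.hmm           # assemble them into /data/cbp1.hmm
```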
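Before trusting the HMM, check how many gene models survived the maker2zff filter and fathom's categorisation. This minimal sketch relies on the ZFF convention that every multi-exon gene has exactly one `Einit` line and every single-exon gene has one `Esngl` line. `count_zff_genes` is a hypothetical helper:

```bash
# count_zff_genes ANN_FILE
# Hypothetical helper: count gene models in a SNAP ZFF annotation file
# (e.g. genome.ann from maker2zff or uni.ann from fathom -categorize).
count_zff_genes() {
    # one Einit per multi-exon gene, one Esngl per single-exon gene
    awk '$1 == "Einit" || $1 == "Esngl" { n++ } END { print n + 0 }' "$1"
}
```

Example, run in /data/snap1: `count_zff_genes genome.ann` and `count_zff_genes uni.ann`.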
3. Back up the previous configuration, then edit maker_opts.ctl:

```bash
cd /data
cp maker_opts.ctl maker_opts.ctl_backup_rnd1
nano maker_opts.ctl
```
Change maker_opts.ctl as follows:

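```
genome=/data/he_flye_scaffolds.fa
maker_gff=/data/snap1/cbp_rnd1.all.gff  # merged round-1 GFF, written by gff3_merge in /data/snap1
snaphmm=/data/cbp1.hmm  # HMM from the first SNAP training round
est_pass=1  # reuse the EST alignments in maker_gff
protein_pass=1  # reuse the protein alignments in maker_gff
rm_pass=1  # reuse the repeat masking in maker_gff
est=  # no EST file this round
protein=  # no protein file this round
model_org=  # no RepeatMasker species (repeats come from rm_pass)
rmlib=  # no new repeat masking
est2genome=0  # do not build gene models directly from ESTs
protein2genome=0  # do not build gene models directly from proteins
pred_stats=1  # report AED/QI statistics for all predictions
alt_splice=0  # do not predict alternative splicing
keep_preds=1  # keep predictions that lack evidence support
```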
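Editing this many keys by hand in nano is error-prone. The sketch below applies a block of `KEY=VALUE` lines to maker_opts.ctl in one pass. Trailing `#` comments in the block are allowed, and the file's own comments are kept. `apply_ctl_settings` is hypothetical and assumes the one-option-per-line layout of the generated file:

```bash
# apply_ctl_settings FILE < settings
# Hypothetical helper: read KEY=VALUE lines from stdin and set those keys in FILE.
# Comments in the settings ("# ...") are ignored; FILE keeps its own "#comment" text.
apply_ctl_settings() {
    local file="$1" settings tmp rc=0
    settings=$(mktemp) || return 1
    tmp=$(mktemp) || { rm -f "$settings"; return 1; }
    cat > "$settings"                          # save stdin so awk can read it as a file
    awk -v sfile="$settings" '
        {
            line = $0
            if (FILENAME == sfile) {           # settings: strip comments, store values
                sub(/[ \t]*#.*$/, "", line)
                eq = index(line, "=")
                if (eq > 1) val[substr(line, 1, eq - 1)] = substr(line, eq + 1)
                next
            }
            eq = index(line, "=")              # control file: replace known keys
            key = (eq > 1) ? substr(line, 1, eq - 1) : ""
            if (key != "" && (key in val)) {
                c = index(line, "#")
                print key "=" val[key] ((c > 0) ? " " substr(line, c) : "")
            } else {
                print
            }
        }
    ' "$settings" "$file" > "$tmp" && cat "$tmp" > "$file" || rc=1
    rm -f "$settings" "$tmp"
    return "$rc"
}
```

Usage: `apply_ctl_settings maker_opts.ctl < round2_settings.txt`, where round2_settings.txt is a hypothetical file holding the lines above.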
4. Run the second MAKER round:

```bash
mpiexec -n 48 maker -base cbp_rnd2 >& log2 &
```
PART 3: Second round of SNAP training
1. Create the working directory for the second SNAP training round:

```bash
cd /data
mkdir snap2
cd snap2
```
2. Merge the GFF files and build the SNAP training data:

```bash
gff3_merge -d ../cbp_rnd2.maker.output/cbp_rnd2_master_datastore_index.log
maker2zff -l 50 -x 0.5 cbp_rnd2.all.gff
fathom -categorize 1000 genome.ann genome.dna
fathom -export 1000 -plus uni.ann uni.dna
forge export.ann export.dna
hmm-assembler.pl cbp . > ../cbp2.hmm
```
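The SNAP training steps in PART 2, 3 and 4 differ only in the round number. The sketch below prints them from one template, so a stale `cbp_rnd1` path cannot slip into a later round. `snap_round_script` is hypothetical and assumes the /data layout used in these notes:

```bash
# snap_round_script N [DIR]
# Hypothetical generator: print the SNAP training commands for round N
# (reads DIR/cbp_rndN.maker.output, writes DIR/cbpN.hmm; DIR defaults to /data).
snap_round_script() {
    local n="$1" dir="${2:-/data}"
    cat <<EOF
set -e
mkdir -p $dir/snap$n
cd $dir/snap$n
gff3_merge -d ../cbp_rnd$n.maker.output/cbp_rnd${n}_master_datastore_index.log
maker2zff -l 50 -x 0.5 cbp_rnd$n.all.gff
fathom -categorize 1000 genome.ann genome.dna
fathom -export 1000 -plus uni.ann uni.dna
forge export.ann export.dna
hmm-assembler.pl cbp . > ../cbp$n.hmm
EOF
}
```

Usage: `snap_round_script 2 > snap2.sh`, review the file, then run `bash snap2.sh`. Printing the script instead of running it directly lets you check the paths first.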
3. Back up the previous configuration, then edit maker_opts.ctl:

```bash
cd /data
cp maker_opts.ctl maker_opts.ctl_backup_rnd2
nano maker_opts.ctl
```
Change maker_opts.ctl as follows:

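```
genome=/data/he_flye_scaffolds.fa
maker_gff=/data/snap2/cbp_rnd2.all.gff  # merged round-2 GFF, written by gff3_merge in /data/snap2
snaphmm=/data/cbp2.hmm  # HMM from the second SNAP training round
# all other options keep their round-2 values
```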
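To confirm that only the intended keys changed between the round-2 backup and the edited file, compare the two while ignoring comments. `ctl_changes` below is a hypothetical helper, shown as a sketch:

```bash
# ctl_changes OLD NEW
# Hypothetical helper: print the options whose values differ between two MAKER
# control files, ignoring "#" comments.
ctl_changes() {
    awk '
        {
            line = $0
            sub(/[ \t]*#.*$/, "", line)            # drop comments
            eq = index(line, "=")
            if (eq < 2) next                       # not a KEY=VALUE line
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (FILENAME == ARGV[1]) { old[key] = val; next }
            if (!(key in old)) print key ": (absent) -> " val
            else if (old[key] != val) print key ": " old[key] " -> " val
        }
    ' "$1" "$2"
}
```

Usage: `ctl_changes maker_opts.ctl_backup_rnd2 maker_opts.ctl`.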
4. Run the third MAKER round:

```bash
mpiexec -n 48 maker -base cbp_rnd3 >& log3 &
```
PART 4: Third round of SNAP training + Augustus
1. Create the working directory for the third SNAP training round:

```bash
cd /data
mkdir snap3
cd snap3
```
2. Merge the GFF files and build the SNAP training data:

```bash
gff3_merge -d ../cbp_rnd3.maker.output/cbp_rnd3_master_datastore_index.log
maker2zff -l 50 -x 0.5 cbp_rnd3.all.gff
fathom -categorize 1000 genome.ann genome.dna
fathom -export 1000 -plus uni.ann uni.dna
forge export.ann export.dna
hmm-assembler.pl cbp . > ../cbp3.hmm
```
3. Back up the previous configuration, then edit maker_opts.ctl:

```bash
cd /data
cp maker_opts.ctl maker_opts.ctl_backup_rnd3
nano maker_opts.ctl
```
Change maker_opts.ctl as follows:

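```
genome=/data/he_flye_scaffolds.fa
maker_gff=/data/snap3/cbp_rnd3.all.gff  # merged round-3 GFF, written by gff3_merge in /data/snap3
snaphmm=/data/cbp3.hmm  # HMM from the third SNAP training round
augustus_species=arabidopsis  # also run Augustus with the model of a related species
# all other options keep their round-3 values
```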
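`augustus_species` must name a model that the installed Augustus actually has. The sketch below is a quick check. It assumes Augustus keeps one directory per species under `$AUGUSTUS_CONFIG_PATH/species/`, which is its usual layout. `augustus_species_exists` is a hypothetical helper:

```bash
# augustus_species_exists NAME
# Hypothetical helper: succeed if an Augustus parameter directory for NAME exists.
# Returns 2 if AUGUSTUS_CONFIG_PATH is not set.
augustus_species_exists() {
    local species="$1" cfg="${AUGUSTUS_CONFIG_PATH:-}"
    if [ -z "$cfg" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2
        return 2
    fi
    [ -d "$cfg/species/$species" ]
}
```

Usage: `augustus_species_exists arabidopsis && echo ok`.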
4. Run the fourth MAKER round:

```bash
mpiexec -n 48 maker -base cbp_rnd4 >& log4 &
```
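PART 5: Extract the final GFF file and protein/transcript sequences
1. Extract the final GFF file. The -n flag leaves out the FASTA sequences, and -s writes to STDOUT so the redirect captures the output:

```bash
cd /data
gff3_merge -s -n -d cbp_rnd4.maker.output/cbp_rnd4_master_datastore_index.log > cbp_rnd4.noseq.gff
```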
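A quick sanity check on the final GFF is to count MAKER's gene-model features by type. This sketch assumes that, as in MAKER's output, gene models carry `maker` in the GFF source column while evidence alignments carry other sources. `gff_model_counts` is hypothetical:

```bash
# gff_model_counts GFF
# Hypothetical helper: count features whose source (column 2) is "maker",
# grouped by feature type (column 3); comments and non-feature lines are skipped.
gff_model_counts() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }
        $2 == "maker" { n[$3]++ }
        END { for (t in n) print t "\t" n[t] }
    ' "$1" | LC_ALL=C sort
}
```

Usage: `gff_model_counts cbp_rnd4.noseq.gff`. You can compare the result with the number of protein records from `grep -c '^>' cbp_rnd4.all.maker.proteins.fasta`.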
2. Extract the protein and transcript sequences:

```bash
fasta_merge -d cbp_rnd4.maker.output/cbp_rnd4_master_datastore_index.log
```
3. Resulting files:

```
cbp_rnd4.noseq.gff
cbp_rnd4.all.maker.proteins.fasta
cbp_rnd4.all.maker.transcripts.fasta
```
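MAKER records an annotation edit distance (AED) as an `_AED=` attribute on each of its mRNA lines. An AED of 0 means perfect agreement with the evidence, and 1 means no evidence support. The sketch below summarises these values. `aed_summary` is hypothetical, and the 0.5 cutoff is only an illustrative default:

```bash
# aed_summary GFF [CUTOFF]
# Hypothetical helper: count mRNAs with an _AED attribute and how many have
# AED <= CUTOFF (default 0.5). Exits non-zero if no _AED values are found.
aed_summary() {
    local gff="$1" cutoff="${2:-0.5}"
    awk -F'\t' -v cut="$cutoff" '
        $3 == "mRNA" && match($9, /_AED=[0-9.]+/) {
            aed = substr($9, RSTART + 5, RLENGTH - 5) + 0   # skip the "_AED=" prefix
            total++
            if (aed <= cut) good++
        }
        END {
            if (total == 0) { print "no mRNAs with _AED found"; exit 1 }
            printf "mRNAs: %d, AED <= %s: %d (%.1f%%)\n", total, cut, good, 100 * good / total
        }
    ' "$gff"
}
```

Usage: `aed_summary cbp_rnd4.noseq.gff`. The pattern `_AED=` does not match the separate `_eAED=` attribute.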

Installing MPICH (required for the mpiexec runs above)

apt-get update
apt-get install -y mpich
cd /usr/local/maker/src
nano Build.PL
In Build.PL, change the MPI option from N to Y so MAKER is built with MPI support.
perl Build.PL
./Build install
cd data/input
mpiexec -n 60 maker -base cbp_rnd1 >& log1 &
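The runs above use `-n 48` and `-n 60`. Asking mpiexec for more processes than the container has CPUs only oversubscribes the machine. The hypothetical `pick_np` helper below caps the count at the value reported by `nproc` (GNU coreutils). A sketch:

```bash
# pick_np WANTED
# Hypothetical helper: cap the requested MPI process count at the number of
# CPUs available to this container (as reported by nproc; falls back to 1).
pick_np() {
    local want="$1" have
    have=$(nproc 2>/dev/null) || have=1
    if [ "$want" -gt "$have" ]; then
        echo "$have"
    else
        echo "$want"
    fi
}
```

Usage: `mpiexec -n "$(pick_np 60)" maker -base cbp_rnd1 >& log1 &`.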