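ROD (Reduction of Diversity) and Pi-ratio (the ratio of nucleotide diversity) are used to detect genomic regions that may have been under natural or artificial selection. By measuring changes in genetic diversity, they reveal abnormal shifts in allele frequencies during a population's evolution and thereby locate candidate regions associated with adaptation or with specific traits.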
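ROD (Reduction of Diversity): the reduction-of-diversity index
1. Basic concept
ROD quantifies how much the genetic diversity of a target region is reduced relative to a reference region (or the genomic background). Under positive selection, especially strong selection, a favorable allele spreads quickly through the population, and the diversity of the surrounding region drops markedly because of a selective sweep. ROD measures the size of this reduction in order to locate regions that may be under selection.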
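2. Core principle
Because the favorable allele rises to high frequency or fixation, a region under positive selection shows:
- nucleotide diversity (for example π or Watterson's θ) markedly below the genome-wide average;
- a larger proportional reduction in diversity than unselected reference regions.
The core of ROD is to compare the diversity of the target with that of the reference and to quantify the reduction: the higher the value, the more likely the region is under selection. In the workflow later in this section, the reference is a second population measured over the same window, so ROD = 1 - π_subpop2 / π_subpop1.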
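As a minimal sketch of the calculation, assuming the reference is a second population measured over the same windows, ROD per window is 1 minus the ratio of the target π to the reference π. The π values below are made up for illustration, and `compute_rod` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: ROD = 1 - pi_target / pi_reference, per window.
# Windows whose reference pi is 0 are returned as NA instead of Inf/NaN.
compute_rod <- function(pi_target, pi_reference) {
  stopifnot(length(pi_target) == length(pi_reference))
  rod <- 1 - pi_target / pi_reference
  rod[which(pi_reference == 0)] <- NA_real_
  rod
}

# Made-up pi values for four windows (illustration only)
pi_reference <- c(0.004, 0.005, 0.002, 0)
pi_target    <- c(0.001, 0.005, 0.003, 0.001)

rod <- compute_rod(pi_target, pi_reference)
print(rod)
```

A positive ROD means the target has lost diversity relative to the reference, 0 means no difference, and a negative value means the target is the more diverse of the two.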
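Pi-ratio: the ratio of nucleotide diversity
1. Basic concept
Pi-ratio is the ratio of nucleotide diversity (π) between two contrasting populations (for example domesticated vs wild, or resistant vs susceptible) over the same genomic region. By comparing diversity between the two populations, it locates regions that have diverged genetically under a specific selective pressure such as domestication or stress resistance.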
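2. Core principle
In two evolutionarily related populations (for example a derived population and its ancestral population), a region under selection shows either:
- π in the derived population (such as the domesticated form) markedly lower than in the ancestral population (such as the wild form), because selection has removed diversity;
- or, in some cases of adaptive evolution, π in the derived population markedly higher than in the ancestral population, because balancing selection maintains diversity.
Pi-ratio locates selected regions by quantifying this difference, that is, how far the ratio departs from 1.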
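The ratio can be sketched in a few lines of R. The π values are made up, and the variable names are placeholders. The log2 transform is an optional convention added here as an assumption (it is not part of the workflow in this section); it makes departures from 1 symmetric in both directions.

```r
# Made-up windowed pi values for two populations (illustration only)
pi_wild <- c(0.004, 0.005, 0.002, 0.008)
pi_cult <- c(0.001, 0.005, 0.004, 0.002)

# Ratio > 1: lower diversity in the cultivated population for that window
pi_ratio <- pi_wild / pi_cult

# Optional: log2 scale, where a ratio of 1 maps to 0 and
# a two-fold change in either direction maps to +1 or -1
log2_ratio <- log2(pi_ratio)

print(data.frame(pi_wild, pi_cult, pi_ratio, log2_ratio))
```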
Hands-on
The following analysis was run on the 数信院 computing platform.
Prepare the output files from the previous section: subpop1.windowed.pi and subpop2.windowed.pi
library(ggplot2)
library(tidyr)
library(dplyr)
library(CMplot)
qtile <- 0.95          # quantile used as the threshold, e.g. 0.95 for the 95th percentile
prefix <- "outsample"  # prefix for output files
# Read the data (replace with the actual file paths)
wd <- read.table("subpop1.windowed.pi", header = TRUE) # Pi data for subpop1
cd <- read.table("subpop2.windowed.pi", header = TRUE) # Pi data for subpop2
# Data processing: build a window ID and inner-join the two tables
# Expected input columns: CHROM BIN_START BIN_END N_VARIANTS PI
wd1 <- unite(wd, ID, CHROM, BIN_START, BIN_END, sep = ":", remove = TRUE)
cd1 <- unite(cd, ID, CHROM, BIN_START, BIN_END, sep = ":", remove = TRUE)
# After the join, subpop1 columns carry the suffix .x and subpop2 columns carry .y
wild_cult <- inner_join(wd1, cd1, by = "ID")
# Compute ROD and the Pi ratios
wild_cult <- wild_cult %>%
  mutate(
    rod = 1 - PI.y / PI.x,     # ROD = 1 - (Pi of subpop2 / Pi of subpop1)
    pi_ratio1 = PI.x / PI.y,   # Pi of subpop1 / Pi of subpop2
    pi_ratio2 = PI.y / PI.x    # Pi of subpop2 / Pi of subpop1
  ) %>%
  # convert = TRUE turns start and end back into numbers so they can be used as positions
  separate(ID, c("chr", "start", "end"), sep = ":", convert = TRUE) %>%
  unite(ID, chr, start, sep = ":", remove = FALSE)
# ROD plot
{
  # ROD threshold at the chosen quantile
  rod_cut <- quantile(wild_cult$rod, qtile, na.rm = TRUE)
  # Keep only windows with ROD > 0 for plotting
  rod_data <- select(wild_cult, ID, chr, start, rod) %>% filter(rod > 0)
  # Save as PDF
  pdf(paste(prefix, "ROD.pdf", sep = "."), width = 10, height = 3)
  CMplot(
    rod_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = rod_cut,
    amplify = FALSE,
    ylab = expr(paste("ROD ( 1-", pi, "_subpop2/", pi, "_subpop1 )")),
    file.output = FALSE
  )
  dev.off()
  # Save as PNG
  png(paste(prefix, "ROD.png", sep = "."), width = 1000, height = 300)
  CMplot(
    rod_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = rod_cut,
    amplify = FALSE,
    ylab = expr(paste("ROD ( 1-", pi, "_subpop2/", pi, "_subpop1 )")),
    file.output = FALSE
  )
  dev.off()
}
# Pi_ratio1 plot (Pi of subpop1 / Pi of subpop2)
{
  ratio1_cut <- quantile(wild_cult$pi_ratio1, qtile, na.rm = TRUE)
  ratio1_data <- select(wild_cult, ID, chr, start, pi_ratio1)
  # PDF output
  pdf(paste(prefix, "PIratio1.pdf", sep = "."), width = 10, height = 3)
  CMplot(
    ratio1_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio1_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop1/", pi, "_subpop2")),
    file.output = FALSE
  )
  dev.off()
  # PNG output
  png(paste(prefix, "PIratio1.png", sep = "."), width = 1000, height = 300)
  CMplot(
    ratio1_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio1_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop1/", pi, "_subpop2")),
    file.output = FALSE
  )
  dev.off()
}
# Pi_ratio2 plot (Pi of subpop2 / Pi of subpop1)
{
  ratio2_cut <- quantile(wild_cult$pi_ratio2, qtile, na.rm = TRUE)
  ratio2_data <- select(wild_cult, ID, chr, start, pi_ratio2)
  # PDF output
  pdf(paste(prefix, "PIratio2.pdf", sep = "."), width = 10, height = 3)
  CMplot(
    ratio2_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio2_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop2/", pi, "_subpop1")),
    file.output = FALSE
  )
  dev.off()
  # PNG output
  png(paste(prefix, "PIratio2.png", sep = "."), width = 1000, height = 300)
  CMplot(
    ratio2_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio2_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop2/", pi, "_subpop1")),
    file.output = FALSE
  )
  dev.off()
}
# Write the summary table
outtable <- select(wild_cult, ID, chr, start, end, PI.x, PI.y, rod, pi_ratio1, pi_ratio2)
outfile <- paste(prefix, "ROD_PiRatio.table", sep = ".")
write.table(outtable, file = outfile, sep = "\t", quote = FALSE, row.names = FALSE)
# Write the threshold for each statistic
cutfile <- paste(prefix, "ROD_PiRatio.cutoff", sep = ".")
outcut <- data.frame(
  Item = c("ROD(1-subpop2/subpop1)", "Piratio1(subpop1/subpop2)", "Piratio2(subpop2/subpop1)"),
  cutoff = c(rod_cut, ratio1_cut, ratio2_cut)
)
write.table(outcut, file = cutfile, sep = "\t", quote = FALSE, row.names = FALSE)
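Once the table and cutoffs are written, a common next step is to pull out the windows above the cutoff and merge adjacent ones into candidate regions. The sketch below uses a made-up toy table with the same `chr`, `start`, `end` and `rod` columns as the summary table. It assumes non-overlapping windows where each window starts one base after the previous one ends, and at least one window above the cutoff. The cutoff value and the `regions` layout are illustrative choices, not part of the script above.

```r
# Toy table with the same columns as the summary table (values made up)
toy <- data.frame(
  chr   = c("1", "1", "1", "2", "2"),
  start = c(1, 10001, 20001, 1, 10001),
  end   = c(10000, 20000, 30000, 10000, 20000),
  rod   = c(0.10, 0.85, 0.90, 0.20, 0.95)
)
rod_cut <- 0.80  # placeholder; in the workflow this is the chosen ROD quantile

# Keep windows at or above the cutoff, sorted by chromosome and position
hits <- toy[!is.na(toy$rod) & toy$rod >= rod_cut, ]
hits <- hits[order(hits$chr, hits$start), ]

# Start a new region when the chromosome changes or the window is not adjacent
new_region <- c(TRUE,
                hits$chr[-1] != hits$chr[-nrow(hits)] |
                  hits$start[-1] != hits$end[-nrow(hits)] + 1)
hits$region <- cumsum(new_region)

# Collapse each run of adjacent windows into one candidate region
regions <- do.call(rbind, lapply(split(hits, hits$region), function(x) {
  data.frame(chr = x$chr[1], start = min(x$start), end = max(x$end),
             n_windows = nrow(x), max_rod = max(x$rod))
}))
print(regions)
```

With overlapping sliding windows the adjacency test would need to compare each start against the previous end rather than require an exact `end + 1` match.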
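Applications
1. ROD
- Detect regions under recent positive selection in a single population (for example genes under artificial selection during crop domestication, or genes involved in local adaptation in natural populations).
2. Pi-ratio
- Compare selection between two related populations (for example domesticated vs wild, resistant vs susceptible) and locate regions associated with specific traits such as crop yield or stress resistance;
- Suitable for detecting genetic differentiation caused by artificial selection (such as breeding) or ecological adaptation (such as environmental stress).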
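| Metric | Core comparison | Strengths | Limitations |
|---|---|---|---|
| ROD | Target region vs genomic background | Applicable to a single population; detects recent selective sweeps | Strongly affected by fluctuations in background genomic diversity |
| Pi-ratio | Population A vs population B (same region) | Directly reflects selection differences between two populations; localizes signals clearly | Depends on a consistent genetic background between populations; easily confounded by population structure |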
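Notes
- Data quality control: filter out low-quality SNPs (for example MQ < 30 or DP < 5) and remove sites with a high missing rate (for example > 20%) to avoid errors;
- Sliding-window choice: the window size has to balance resolution against stability (small windows are noisy, large windows can mask small selected regions; 10-100 kb is typical);
- Population-structure correction: if the populations are stratified, first use PCA or ADMIXTURE to identify and remove the confounding effect of population structure; otherwise the diversity estimates will be biased;
- Significance threshold: determine the threshold by permutation or from the empirical distribution (for example the top 5% of ROD or Pi-ratio values) to limit false positives.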
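The empirical-distribution option can be sketched as follows. The statistic here is random toy data, and the non-finite entries stand in for windows where a zero π in the denominator produces Inf or NaN. Dropping them before taking the quantile is an assumption about how such windows should be treated, not something the source specifies.

```r
set.seed(42)  # reproducible toy data

# Toy statistic: 1000 finite values plus the non-finite values that a
# division by a zero pi can produce (Inf, NaN) and one missing value
stat <- c(runif(1000), Inf, NaN, NA)

# Rank-based cutoff computed on finite values only
ok <- is.finite(stat)
cutoff <- quantile(stat[ok], probs = 0.95, names = FALSE)

# Flag windows above the cutoff; non-finite windows are never flagged
is_candidate <- ok & stat > cutoff
cat("cutoff:", round(cutoff, 3), " candidates:", sum(is_candidate), "\n")
```

A rank-based cutoff of this kind always flags a fixed fraction of windows, so it ranks candidates rather than testing them; a permutation approach would need the sample labels and genotypes, which the windowed π files do not contain.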
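Both ROD and Pi-ratio detect selected regions through changes in genetic diversity, but ROD focuses on the loss of diversity within a single population, whereas Pi-ratio focuses on the difference in diversity between two populations. In practice the two are often combined (for example by keeping only regions flagged by both ROD and Pi-ratio) and used together with other selection signals (such as Fₛₜ or iHS) to locate selected regions more accurately and to provide reliable candidates for subsequent functional validation of genes.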
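Combining signals usually comes down to intersecting the windows flagged by each statistic. The sketch below uses a made-up table in which `fst` stands in for any second per-window signal. The column names, values and the 70% quantile are all illustrative assumptions; the quantile is that low only because the toy table has ten windows.

```r
# Toy per-window statistics (values made up)
win <- data.frame(
  ID  = paste0("1:", seq(1, by = 10000, length.out = 10)),
  rod = c(0.10, 0.20, 0.90, 0.30, 0.95, 0.10, 0.40, 0.85, 0.20, 0.30),
  fst = c(0.05, 0.10, 0.60, 0.70, 0.65, 0.02, 0.08, 0.04, 0.10, 0.06)
)
top <- 0.70  # illustrative quantile; a real analysis would use e.g. 0.95

# Windows at or above the quantile for each statistic
rod_hits <- win$ID[win$rod >= quantile(win$rod, top)]
fst_hits <- win$ID[win$fst >= quantile(win$fst, top)]

# Candidate windows supported by both signals
both <- intersect(rod_hits, fst_hits)
print(both)
```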