重测序群体选择分析之ROD 和 Pi-ratio值计算ROD（Reduction of Diversity）：多样性减少

ROD（Reduction of Diversity，多样性减少） 和 Pi-ratio（核苷酸多样性比率） ，用于检测基因组中可能受到自然选择或人工选择的区域。通过衡量遗传多样性的变化，揭示群体在进化过程中基因频率的异常偏移，进而定位与适应性或特定性状相关的候选区域。

ROD（Reduction of Diversity）：多样性减少指数

1. 基本概念

ROD用于量化目标区域相对于参考区域（或基因组背景）的遗传多样性降低程度。在正向选择（尤其是强选择）过程中，有利等位基因会快速在群体中扩散，导致该区域的遗传多样性因“选择性清扫”（selective sweep）而显著降低。ROD通过计算这种“多样性减少”的幅度，定位可能受选择的区域。

2. 核心原理

受到正向选择的区域，由于有利等位基因的高频固定，会表现出：

核苷酸多样性（如π、θₐ等）显著低于基因组平均水平；
与未受选择的参考区域相比，多样性“减少”的比例更高。

ROD的核心是通过比较目标区域与参考区域的多样性，量化这种“减少”的程度，值越高，说明该区域受选择的可能性越大。

Pi-ratio：核苷酸多样性比率

1. 基本概念

Pi-ratio是两个对比群体（如驯化群体vs野生群体、抗性群体vs敏感群体）在同一基因组区域的核苷酸多样性（π）的比值。它通过比较两个群体的多样性差异，定位因特定选择压力（如驯化、抗逆）而发生遗传分化的区域。

2. 核心原理

两个具有进化关联的群体（如派生群体与祖先群体）中，受选择的区域会表现出：

派生群体（如驯化种）的π显著低于祖先群体（如野生种）（因选择导致多样性丢失）；
或在适应性进化中，派生群体的π显著高于祖先群体（因平衡选择维持多样性）。

Pi-ratio通过量化这种差异（比值偏离1的程度），定位选择区域。

实操

以下分析使用数信院计算平台完成

准备上一节的输出结果： subpop1.windowed.pi， subpop2.windowed.pi

library(ggplot2)
library(tidyr)
library(dplyr)
library(CMplot)

qtile <- 0.95  # 分位数阈值，例如0.95表示95%分位数
prefix <- "outsample"# 输出文件前缀

# 读取数据（替换为实际文件路径）
wd <- read.table("subpop1.windowed.pi", header = TRUE)  # subpop1的Pi数据
cd <- read.table("subpop2.windowed.pi", header = TRUE)  # subpop2的Pi数据

# 数据处理：合并ID并进行内连接
# 原始数据列名应为：CHROM   BIN_START BIN_END N_VARIANTS      PI
wd1 <- unite(wd, ID, CHROM, BIN_START, BIN_END, sep = ":", remove = TRUE)
cd1 <- unite(cd, ID, CHROM, BIN_START, BIN_END, sep = ":", remove = TRUE)
wild_cult <- inner_join(wd1, cd1, by = "ID")

# 计算ROD和Pi比率
wild_cult <- wild_cult %>%
  mutate(
    rod = 1 - PI.y / PI.x,  # ROD = 1 - (subpop2的Pi / subpop1的Pi)
    pi_ratio1 = PI.x / PI.y,  # subpop1的Pi / subpop2的Pi
    pi_ratio2 = PI.y / PI.x   # subpop2的Pi / subpop1的Pi
  ) %>%
  separate(ID, c("chr", "start", "end"), sep = ":") %>%
  unite(ID, chr, start, sep = ":", remove = FALSE)

# ROD绘图
{
# 计算ROD的阈值（指定分位数）
  rod_cut <- quantile(wild_cult$rod, qtile, na.rm = TRUE)

# 筛选ROD>0的数据并绘图
  rod_data <- select(wild_cult, ID, chr, start, rod) %>% filter(rod > 0)

# 保存为PDF
  pdf(paste(prefix, "ROD.pdf", sep = "."), width = 10, height = 3)
  CMplot(
    rod_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = rod_cut,
    amplify = FALSE,
    ylab = expr(paste("ROD ( 1-", pi, "_subpop2/", pi, "_subpop1 )")),
    file.output = FALSE
  )
  dev.off()

# 保存为PNG
  png(paste(prefix, "ROD.png", sep = "."), width = 1000, height = 300)
  CMplot(
    rod_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = rod_cut,
    amplify = FALSE,
    ylab = expr(paste("ROD ( 1-", pi, "_subpop2/", pi, "_subpop1 )")),
    file.output = FALSE
  )
  dev.off()
}

# Pi_ratio1绘图（subpop1的Pi / subpop2的Pi）
{
  ratio1_cut <- quantile(wild_cult$pi_ratio1, qtile, na.rm = TRUE)
  ratio1_data <- select(wild_cult, ID, chr, start, pi_ratio1)

# PDF格式
  pdf(paste(prefix, "PIratio1.pdf", sep = "."), width = 10, height = 3)
  CMplot(
    ratio1_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio1_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop1/", pi, "_subpop2")),
    file.output = FALSE
  )
  dev.off()

# PNG格式
  png(paste(prefix, "PIratio1.png", sep = "."), width = 1000, height = 300)
  CMplot(
    ratio1_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio1_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop1/", pi, "_subpop2")),
    file.output = FALSE
  )
  dev.off()
}

# Pi_ratio2绘图（subpop2的Pi / subpop1的Pi）
{
  ratio2_cut <- quantile(wild_cult$pi_ratio2, qtile, na.rm = TRUE)
  ratio2_data <- select(wild_cult, ID, chr, start, pi_ratio2)

# PDF格式
  pdf(paste(prefix, "PIratio2.pdf", sep = "."), width = 10, height = 3)
  CMplot(
    ratio2_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio2_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop2/", pi, "_subpop1")),
    file.output = FALSE
  )
  dev.off()

# PNG格式
  png(paste(prefix, "PIratio2.png", sep = "."), width = 1000, height = 300)
  CMplot(
    ratio2_data,
    type = "p",
    plot.type = "m",
    LOG10 = FALSE,
    col = c("blue4", "orange3"),
    cex = 0.2,
    band = 0.5,
    ylab.pos = 2.5,
    cex.axis = 0.8,
    threshold = ratio2_cut,
    amplify = FALSE,
    ylab = expr(paste(pi, "_subpop2/", pi, "_subpop1")),
    file.output = FALSE
  )
  dev.off()
}

# 输出汇总表格
outtable <- select(wild_cult, ID, chr, start, end, PI.x, PI.y, rod, pi_ratio1, pi_ratio2)
outfile <- paste(prefix, "ROD_PiRatio.table", sep = ".")
write.table(outtable, file = outfile, sep = "\t", quote = FALSE, row.names = FALSE)

# 输出统计阈值
cutfile <- paste(prefix, "ROD_PiRatio.cutoff", sep = ".")
outcut <- data.frame(
  Iterm = c("ROD(1-subpop2/subpop1)", "Piratio1(subpop1/subpop2)", "Piratio2(subpop2/subpop1)"),
  cutoff = c(rod_cut, ratio1_cut, ratio2_cut)
)
write.table(outcut, file = cutfile, sep = "\t", quote = FALSE, row.names = FALSE)

应用

1.ROD

检测单一群体中近期受到正向选择的区域（如作物驯化中人工选择的基因、自然群体中适应本地环境的基因）

2.Pi-ratio

比较两个相关群体（如驯化vs野生、抗性vs敏感）的选择差异，定位与特定性状（如作物产量、抗逆性）相关的区域；
适用于检测人工选择（如育种）或生态适应（如环境压力）导致的遗传分化。

指标	核心对比对象	优势	局限性
ROD	目标区域 vs 基因组背景	适用于单一群体，检测近期选择性清扫	受基因组背景多样性波动影响大
Pi-ratio	群体A vs 群体B（同区域）	直接反映两个群体的选择差异，定位明确	依赖群体遗传背景一致性，易受群体结构干扰

注意事项

数据质量控制：需过滤低质量SNP（如MQ < 30、DP < 5）、去除缺失率高的位点（如缺失率 > 20%），避免误差；
滑动窗口选择：窗口大小需平衡分辨率与稳定性（小窗口噪音大，大窗口可能掩盖小选择区域，通常10-100kb）；
群体结构校正：若群体存在分层，需先通过PCA或Admixture去除群体结构干扰，否则会导致多样性估计偏差；
显著性阈值：需通过 permutation 或经验分布确定显著阈值（如top 5%的ROD或Pi-ratio值），避免假阳性。

ROD和Pi-ratio均通过遗传多样性的变化检测选择区域，但ROD聚焦于单一群体内的多样性减少，Pi-ratio聚焦于两个群体的多样性差异。在实际分析中，常将两者结合（如同时使用ROD和Pi-ratio筛选重叠区域），并结合其他选择信号（如Fₛₜ、iHS等），提高选择区域定位的准确性，为后续基因功能验证提供可靠候选。