bulk RNA-Seq (5): Differential Expression Analysis


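Differential expression analysis tests whether the expression level of the same gene differs between two conditions (sample groups). The underlying statistical method is hypothesis testing, so each condition needs replicate samples.
Commonly used packages include DESeq2 and edgeR. Here I recommend run_DE_analysis.pl, a script that ships with the trinity package. To install it: conda install trinity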
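To make the point about replicates concrete, here is a minimal sketch in Python. It is not how DESeq2 or edgeR work internally; it only shows that a variance cannot be estimated from a single value, which is why a test on one sample per condition has nothing to measure variability with. The count values and the helper name `sample_variance_or_none` are hypothetical.

```python
import statistics


def sample_variance_or_none(counts):
    """Return the sample variance, or None when it cannot be estimated."""
    try:
        return statistics.variance(counts)
    except statistics.StatisticsError:
        # Raised when fewer than two data points are supplied.
        return None


# Hypothetical read counts for one gene in one condition.
single_sample = [120]
three_replicates = [120, 135, 110]

print(sample_variance_or_none(single_sample))
print(sample_variance_or_none(three_replicates))
```

With only one sample the function returns None, while three replicates give a usable estimate of the spread.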

run_DE_analysis.pl

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use Carp;
    use Getopt::Long qw(:config no_ignore_case bundling pass_through);
    use Cwd;
    use FindBin;
    use File::Basename;
    use lib ("$FindBin::RealBin/../../PerlLib");
    use Fasta_reader;
    use Data::Dumper;


    my $ROTS_B = 500;
    my $ROTS_K = 5000;


    my $usage = <<__EOUSAGE__;


    #################################################################################################
    #
    #  Required:
    #
    #  --matrix|m <string>          matrix of raw read counts (not normalized!)
    #
    #  --method <string>            edgeR|DESeq2|voom|ROTS
    #                                  note: you should have biological replicates.
    #                                        edgeR will support having no bio replicates with
    #                                        a fixed dispersion setting.
    #
    #  Optional:
    #
    #  --samples_file|s <string>    tab-delimited text file indicating biological replicate relationships.
    #                                  ex.
    #                                       cond_A    cond_A_rep1
    #                                       cond_A    cond_A_rep2
    #                                       cond_B    cond_B_rep1
    #                                       cond_B    cond_B_rep2
    #
    #
    #  General options:
    #
    #  --min_rowSum_counts <int>    default: 2  (only those rows of matrix meeting requirement will be tested)
    #
    #  --output|o                   name of directory to place outputs (default: $method.$pid.dir)
    #
    #  --reference_sample <string>  name of a sample to which all other samples should be compared.
    #                                  (default is doing all pairwise-comparisons among samples)
    #
    #  --contrasts <string>         file (tab-delimited) containing the pairs of sample comparisons to perform.
    #                                  ex.
    #                                       cond_A    cond_B
    #                                       cond_Y    cond_Z
    #
    #
    ###############################################################################################
    #
    #  ## EdgeR-related parameters
    #  ## (no biological replicates)
    #
    #  --dispersion <float>         edgeR dispersion value (Read edgeR manual to guide your value choice)
    #                                  http://www.bioconductor.org/packages/release/bioc/html/edgeR.html
    #  ## ROTS parameters
    #  --ROTS_B <int>               : number of bootstraps and permutation resampling (default: $ROTS_B)
    #  --ROTS_K <int>               : largest top genes size (default: $ROTS_K)
    #
    #
    ###############################################################################################
    #
    #  Documentation and manuals for various DE methods.  Please read for more advanced and more
    #  fine-tuned DE analysis than provided by this helper script.
    #
    #  edgeR:       http://www.bioconductor.org/packages/release/bioc/html/edgeR.html
    #  DESeq2:      http://bioconductor.org/packages/release/bioc/html/DESeq2.html
    #  voom/limma:  http://bioconductor.org/packages/release/bioc/html/limma.html
    #  ROTS:        http://www.btk.fi/research/research-groups/elo/software/rots/
    #
    ###############################################################################################
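The usage text above describes two things that are easy to get wrong by hand: the tab-delimited samples file and the row-sum filter. The sketch below, in Python, shows one way to generate the samples file and to preview which rows would survive `--min_rowSum_counts`. The helper names and the toy matrix are hypothetical, and the "row sum at least the threshold" rule is my reading of the usage text, not something checked against the script's R code.

```python
import csv
import io


def samples_file_text(groups):
    """Render {condition: [replicates]} as 'condition<TAB>replicate' lines."""
    return "".join(
        f"{cond}\t{rep}\n" for cond, reps in groups.items() for rep in reps
    )


def filter_by_row_sum(matrix_text, min_row_sum=2):
    """Keep rows of a tab-delimited count matrix whose counts sum to >= min_row_sum."""
    reader = csv.reader(io.StringIO(matrix_text), delimiter="\t")
    header = next(reader)
    kept = [row for row in reader if sum(float(x) for x in row[1:]) >= min_row_sum]
    return header, kept


# Hypothetical design: two conditions with two replicates each.
groups = {
    "cond_A": ["cond_A_rep1", "cond_A_rep2"],
    "cond_B": ["cond_B_rep1", "cond_B_rep2"],
}

# Hypothetical raw count matrix; column names must match the replicate names.
matrix_text = (
    "\tcond_A_rep1\tcond_A_rep2\tcond_B_rep1\tcond_B_rep2\n"
    "gene1\t10\t12\t30\t28\n"
    "gene2\t0\t1\t0\t0\n"
)

print(samples_file_text(groups), end="")
header, kept = filter_by_row_sum(matrix_text)
print([row[0] for row in kept])
```

Writing the string returned by `samples_file_text` to a file such as sample.txt gives the layout shown in the usage example. Comparing `header[1:]` with the replicate names is a quick check that the matrix and the samples file agree.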

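Script usage:

    # Replace the script path with the one from your own installation.
    # --matrix:       raw count matrix
    # --method:       software used for the differential analysis
    # --samples_file: sample table describing the groups
    perl anaconda3/opt/trinity-2.1.1/Analysis/DifferentialExpression/run_DE_analysis.pl \
        --matrix genes.counts.matrix \
        --method DESeq2 \
        --samples_file sample.txt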
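If you drive the pipeline from Python, the same call can be assembled as an argument list, which avoids shell quoting problems and lets you reject a mistyped method name before anything runs. This is only a sketch: `build_de_command` is a hypothetical helper, the script path is the one from this post and will differ on your machine, and the actual `subprocess.run` call is left commented out.

```python
import shlex

# Methods listed in the script's usage text.
ALLOWED_METHODS = {"edgeR", "DESeq2", "voom", "ROTS"}


def build_de_command(script, matrix, method, samples_file):
    """Return the argument list for run_DE_analysis.pl, validating the method name."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return [
        "perl", script,
        "--matrix", matrix,
        "--method", method,
        "--samples_file", samples_file,
    ]


cmd = build_de_command(
    "anaconda3/opt/trinity-2.1.1/Analysis/DifferentialExpression/run_DE_analysis.pl",
    "genes.counts.matrix",
    "DESeq2",
    "sample.txt",
)
print(shlex.join(cmd))

# To actually run it (requires perl, Trinity and R with DESeq2 installed):
# import subprocess
# subprocess.run(cmd, check=True)
```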

Input: the read count matrix. Output: the differential expression results.
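Once the results exist, the usual next step is to pick out the significant genes. The sketch below assumes a tab-delimited table with a `gene` column plus DESeq2's standard `log2FoldChange` and `padj` columns. The exact layout of the files written by run_DE_analysis.pl is not shown in this post, so treat the column names, the toy values, and the cutoffs (padj < 0.05, |log2FC| >= 1) as assumptions to adapt.

```python
import csv
import io


def significant_genes(table_text, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Return gene IDs passing both the adjusted p-value and fold-change cutoffs."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = []
    for row in reader:
        # DESeq2 reports NA for padj when a gene was filtered out; skip those rows.
        if row["padj"] == "NA" or row["log2FoldChange"] == "NA":
            continue
        if (float(row["padj"]) < padj_cutoff
                and abs(float(row["log2FoldChange"])) >= min_abs_log2fc):
            hits.append(row["gene"])
    return hits


# Hypothetical results table, for illustration only.
results_text = (
    "gene\tlog2FoldChange\tpadj\n"
    "gene1\t2.5\t0.001\n"
    "gene2\t0.3\t0.0004\n"
    "gene3\t-3.0\tNA\n"
    "gene4\t-1.8\t0.2\n"
    "gene5\t-2.1\t0.01\n"
)

print(significant_genes(results_text))
```

Both cutoffs are parameters, so the same function works for a stricter or looser definition of "differentially expressed".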
