Question

How to find Top 10 expressed genes in DESeq2

0

Entering edit mode

melkile26 • 0

@melkile26-16646

Last seen 5.8 years ago

I am a newbie to RNAseq analysis. Briefly, I mapped the RNA-seq reads to the reference genome using STAR and I used featureCounts to quantify the reads mapped to the reference genes and found the gene counts. I removed rows that have no counts, or only a single count across all samples using the code below as it is mentioned in RNA-seq workflow

dds <- dds[ rowSums(counts(dds)) > 1, ]

Then I run the function DESeq on the raw counts.

dds <- DESeq(dds)

I would like to extract top 10 expressed genes and then I sorted the result based on baseMean and found the following genes. Is it the right approach to extrct the top expressed genes?

baseMean    log2FoldChange  lfcSE   stat    pvalue  padj LUKE00000033088    24608.95    2.492   0.481   5.180   2.22038382101822E-07    5.88213544449403E-05 LUKE00000005348    23004.02    -1.559  0.366   -4.265  1.9950729451741E-05 0.001090627809135 LUKE00000038686   17687.10    -2.207  0.368   -5.993  2.06555823272055E-09    2.01779219858888E-06 LUKE010978756  9213.70 -2.064  0.354   -5.826  5.66036709013022E-09    4.21293036279692E-06 LUKE00000004836    8685.09 -1.576  0.378   -4.171  3.03873604272807E-05    0.001294153796944 LUKE00000002600   8268.26 -1.522  0.427   -3.565  0.000363547763205   0.005474230769653 LUKE010993945 7758.28 -1.873  0.495   -3.786  0.000153046441589   0.003240726698392 LUKE00000032342   7452.12 1.966   0.480   4.098   4.16481516975401E-05    0.001557322035963 LUKE00000003460   6717.35 -1.686  0.392   -4.304  1.68102625403878E-05    0.000991488315118 LUKE00000018131   6508.93 -1.722  0.402   -4.280  1.86954821598158E-05    0.00106646126335

Is it the right approach to extract the top expressed genes?

deseq2 baseMean differential gene expression rnaseqgene • 8.7k views

ADD COMMENT • link updated 5.8 years ago by swbarnes2 ★ 1.4k • written 5.8 years ago by melkile26 • 0

1

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 2 days ago

San Diego

This will give you the genes with the highest number of counts, but for some library preps, count numbers for a gene can be related to the length of the transcript, so you would want to correct for that if it was a factor with your library prep.

ADD COMMENT • link 5.8 years ago swbarnes2 ★ 1.4k

score 2 · Accepted Answer · 2019-02-04

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 days ago

United States

We have some code in the workflow:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

Basically:

head(res[order(res$padj),], 10)

ADD COMMENT • link 5.8 years ago Michael Love 43k

0

Entering edit mode

Dear Michael,

Thanks for your replay,

Are you recommending to extract the top expressed genes based on adjusted p-value rather than sorting baseMean?

Best regards,

Mel

ADD REPLY • link 5.8 years ago melkile26 • 0

0

Entering edit mode

Oh, I missed that you want to rank by expression. You should really use TPM for this, rather than counts. Counts are not proportional to expression across genes.

ADD REPLY • link 5.8 years ago Michael Love 43k

0

Entering edit mode

Is this because baseMean doesn't take into account the differences in library sizes between samples

ADD REPLY • link 5.8 years ago melkile26 • 0