I am a newbie to RNAseq analysis. Briefly, I mapped the RNA-seq reads to the reference genome using STAR and I used featureCounts to quantify the reads mapped to the reference genes and found the gene counts. I removed rows that have no counts, or only a single count across all samples using the code below as it is mentioned in RNA-seq workflow
dds <- dds[ rowSums(counts(dds)) > 1, ]
Then I run the function DESeq on the raw counts.
dds <- DESeq(dds)
I would like to extract top 10 expressed genes and then I sorted the result based on baseMean and found the following genes. Is it the right approach to extrct the top expressed genes?
baseMean log2FoldChange lfcSE stat pvalue padj LUKE00000033088 24608.95 2.492 0.481 5.180 2.22038382101822E-07 5.88213544449403E-05 LUKE00000005348 23004.02 -1.559 0.366 -4.265 1.9950729451741E-05 0.001090627809135 LUKE00000038686 17687.10 -2.207 0.368 -5.993 2.06555823272055E-09 2.01779219858888E-06 LUKE010978756 9213.70 -2.064 0.354 -5.826 5.66036709013022E-09 4.21293036279692E-06 LUKE00000004836 8685.09 -1.576 0.378 -4.171 3.03873604272807E-05 0.001294153796944 LUKE00000002600 8268.26 -1.522 0.427 -3.565 0.000363547763205 0.005474230769653 LUKE010993945 7758.28 -1.873 0.495 -3.786 0.000153046441589 0.003240726698392 LUKE00000032342 7452.12 1.966 0.480 4.098 4.16481516975401E-05 0.001557322035963 LUKE00000003460 6717.35 -1.686 0.392 -4.304 1.68102625403878E-05 0.000991488315118 LUKE00000018131 6508.93 -1.722 0.402 -4.280 1.86954821598158E-05 0.00106646126335
Is it the right approach to extract the top expressed genes?
Dear Michael,
Thanks for your replay,
Are you recommending to extract the top expressed genes based on adjusted p-value rather than sorting baseMean?
Best regards,
Mel
Oh, I missed that you want to rank by expression. You should really use TPM for this, rather than counts. Counts are not proportional to expression across genes.
Is this because baseMean doesn't take into account the differences in library sizes between samples