How to convert TMM RNA-seq file (generated by edgeR) to list of differentially expressed genes
@hamidrezarazzaghian-9208
Canada

Hi,

I have a TMM (trimmed mean of M values) CSV file of whole RNA sequencing data (generated by the edgeR package) with two groups: group A with 27 samples and group B with 18 samples. Each sample is in one column and each gene is in one row, and all of the columns have headers. The file contains the list of genes (in one column) plus the values for each gene in each sample. Using this file in R, I want to get the list of genes that are differentially expressed in group B compared to group A, with correction for multiple testing by the Benjamini-Hochberg method (FDR < 0.05). The output file should contain the log fold change, p-value and adjusted p-value for each differentially expressed gene.

Could anyone please share the code for this procedure with me?

Here is what the data look like (samples 1 to 27 belong to group A and samples 28 to 45 belong to group B):

gene  | sample1 | sample2 | sample3 | ... | sample27 | sample28 | ... | sample45
TSPAN | -2.994  | -0.651  | 0.274   | ... | 2.352    | 1.523    | ... | -2.486
LAP3  | 3.545   | -1.545  | 2.450   | ... | 1.298    | -1.476   | ... | 1.987
ALS2  | -1.910  | -2.224  | -1.720  | ... | -1.758   | 1.368    | ... | 2.154

edgeR DifferentialExpression TMM RNASeq

Is this normalized data not suitable for use with edgeR?

@gordon-smyth
WEHI, Melbourne, Australia

I'm the senior author of the edgeR package but I don't know what you mean by a "TMM csv file". TMM normalizes the library sizes rather than individual expression values. Do you perhaps mean that you have used edgeR's cpm function to generate log-CPM values?
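
To illustrate the distinction, here is a minimal sketch using simulated counts (the matrix, gene names and sample names are made up for illustration and are not the poster's data). It shows that TMM yields one scaling factor per sample, while a per-gene table of values only appears once cpm is called:

```r
library(edgeR)

# Simulated raw counts: 1000 hypothetical genes x 45 hypothetical samples
set.seed(1)
counts <- matrix(rnbinom(1000 * 45, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:45)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)            # TMM is the default method

# TMM output: one normalization factor per library, stored alongside lib.size
nf <- y$samples$norm.factors

# Per-gene values come from cpm(), which uses the TMM-adjusted library sizes
logCPM <- cpm(y, log = TRUE)
```

Here nf has 45 entries (one per sample), whereas logCPM is a 1000 x 45 matrix that could be written to a CSV file like the one described in the question.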

Anyway, edgeR analyses read counts rather than cpm values. Just follow one of the sample workflows, for example:

or else follow the edgeR User's Guide.
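
As a rough sketch of what such a count-based workflow might look like for a two-group design of this size (the counts are simulated here, and the object names are illustrative assumptions rather than anything taken from the poster's data):

```r
library(edgeR)

# Simulated raw counts standing in for real read counts
set.seed(1)
counts <- matrix(rnbinom(1000 * 45, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:45)))

# Samples 1-27 are group A, samples 28-45 are group B
group <- factor(rep(c("A", "B"), times = c(27, 18)), levels = c("A", "B"))

y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y)                   # remove genes with too few reads
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                   # TMM normalization of library sizes

design <- model.matrix(~ group)           # coefficient 2 is B relative to A
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = 2)

# Full table with Benjamini-Hochberg adjusted p-values in the FDR column
tab <- topTags(qlf, n = Inf, adjust.method = "BH")$table
de <- tab[tab$FDR < 0.05, c("logFC", "PValue", "FDR")]
```

With real data, counts would be replaced by the matrix of raw read counts, and de would hold the log fold change, p-value and adjusted p-value for the genes passing FDR < 0.05.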

Personally I use Rsubread::align followed by Rsubread::featureCounts to generate counts as input to edgeR.

If you really only have log-CPM values and not the original counts, then the limma-trend pipeline could be used instead of edgeR to get DE genes.
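
A minimal sketch of the limma-trend route, assuming the CSV really does contain log-CPM values (that is an assumption; the file name in the comment is hypothetical, and simulated values are used here so the example runs on its own):

```r
library(limma)

# With real data one might read the file instead, for example:
#   logCPM <- as.matrix(read.csv("logcpm.csv", row.names = 1))  # hypothetical file name
# Simulated log-CPM values: 1000 hypothetical genes x 45 samples
set.seed(1)
logCPM <- matrix(rnorm(1000 * 45, mean = 5, sd = 1), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:45)))
logCPM[1:50, 28:45] <- logCPM[1:50, 28:45] + 2   # spike in 50 truly changed genes

# Samples 1-27 are group A, samples 28-45 are group B
group <- factor(rep(c("A", "B"), times = c(27, 18)), levels = c("A", "B"))
design <- model.matrix(~ group)    # coefficient "groupB" is B minus A

fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend = TRUE)   # trend = TRUE is what makes this limma-trend

# All genes, with Benjamini-Hochberg adjusted p-values
results <- topTable(fit, coef = "groupB", number = Inf, adjust.method = "BH")
de <- results[results$adj.P.Val < 0.05, c("logFC", "P.Value", "adj.P.Val")]

# write.csv(de, "DE_genes_B_vs_A.csv")   # hypothetical output file name
```

The resulting de table has one row per gene with FDR < 0.05 and the three columns asked for in the question: logFC (B relative to A), P.Value and adj.P.Val.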
