limma for getting DEGs from my own input matrix
1
0
Entering edit mode
@shuvadeepmaity-19471
Last seen 5.7 years ago

Hi I have normalized RNAseq data for treatment and control of a particular cell line. The matrix is the log expression ratio(12hrscon/0hrs)and (12hrstrated/0hrs). Can I use limma to get the DEGs using the my matrix as direct input file?

I have 2 replicates (each with control and treatment). I designed my matrix as follows

design.mat<-cbind(c(1,1,0,0),c(0,0,1,1)) colnames(design.mat) <- c("Treated", "Con") design.mat Treated Con [1,] 1 0 [2,] 1 0 [3,] 0 1 [4,] 0 1 and contrast.mat <- matrix(c(1,-1), ncol = 1) dimnames(contrast.mat) <- list(c("Treated","Con"), "Diff") contrast.mat Diff Treated 1 Con -1

My input matrix
Gene_name Treated Treated.1 Con Con.1 1 1110004E09Rik 0.03105187 -0.14257259 0.02448855 0.1928913 2 2010300C02Rik -0.37487930 0.09699910 0.03967733 0.1089108 3 2210016F16Rik -0.57569502 -0.56298981 -0.01678530 -0.1019228 4 2410002F23Rik -0.01191708 -0.25381992 -0.05737920 -0.1057968 5 2700097O09Rik 0.04423673 -0.13028207 0.07958317 0.1220447 6 4931406P16Rik -0.00345034 0.06384793 -0.07985070 -0.1200976

Is it okay to use limma on my matrix data ? is the design is correct?

Thanks

limma • 1.5k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 25 minutes ago
The city by the bay

No, you should not use limma on your matrix of log-ratios. The log-ratios discard information about the mean-variance trend that is necessary for accurate modelling. For example, a log-ratio computed from large counts should be much less variable than a log-ratio computed from small counts - but if you only provide the log-ratio, the model is not able to consider the size of the counts from which they are computed, which reduces the power and/or specificity of the DE analysis. Moreover, your log-ratios were probably computed by adding some pseudo-count to avoid undefined values, which further distorts the true log-fold changes.

To avoid all of these problems, you should get your hands on the raw counts and do your analysis using voom (or edgeR, etc.). You should have 8 columns in the count matrix - 2 time points for 2 replicates for 2 treatment conditions. I'll assume that the time points were generated as one 0/12 pair per replicate (i.e., one or your replicates corresponds to a single time course), and block appropriately:

condition <- rep(c("C", "T"), each=4)
time <- rep(c("0", "12"), 4)
block <- gl(4, 2) # paired time points per replicate

design <- model.matrix(~0 + block + time:condition)
design <- design[,!grepl("time0", colnames(design))] # get to full rank
colnames(design) <- make.names(colnames(design)) # friendlier column names

The first four terms represent the blocking factors for each replicate, and are not of scientific interest. The last two terms represent the average log-fold change of expression in time 12 over 0 in the control or treatment conditions. Setting up a contrast to test for DE between control and treatment in edgeR or limma is as easy as:

con <- makeContrasts(time12.conditionT - time12.conditionC, levels=design)

Conceptually, this set-up tests the same null hypothesis as what you described in your post, except it does it better.

ADD COMMENT
0
Entering edit mode

Thank you Aaron for your explanations/help. Much Appreciated!

ADD REPLY
0
Entering edit mode

Thank you Aaron for your explanations/help. Much Appreciated!

ADD REPLY

Login before adding your answer.

Traffic: 686 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6