I have 2 replicates (each with control and treatment). I designed my matrix as follows

Question

limma for getting DEGs from my own input matrix

0

Entering edit mode

shuvadeep.maity • 0

@shuvadeepmaity-19471

Last seen 6.1 years ago

Hi I have normalized RNAseq data for treatment and control of a particular cell line. The matrix is the log expression ratio(12hrscon/0hrs)and (12hrstrated/0hrs). Can I use limma to get the DEGs using the my matrix as direct input file?

I have 2 replicates (each with control and treatment). I designed my matrix as follows

design.mat<-cbind(c(1,1,0,0),c(0,0,1,1)) colnames(design.mat) <- c("Treated", "Con") design.mat Treated Con [1,] 1 0 [2,] 1 0 [3,] 0 1 [4,] 0 1 and contrast.mat <- matrix(c(1,-1), ncol = 1) dimnames(contrast.mat) <- list(c("Treated","Con"), "Diff") contrast.mat Diff Treated 1 Con -1

My input matrix
Gene_name Treated Treated.1 Con Con.1 1 1110004E09Rik 0.03105187 -0.14257259 0.02448855 0.1928913 2 2010300C02Rik -0.37487930 0.09699910 0.03967733 0.1089108 3 2210016F16Rik -0.57569502 -0.56298981 -0.01678530 -0.1019228 4 2410002F23Rik -0.01191708 -0.25381992 -0.05737920 -0.1057968 5 2700097O09Rik 0.04423673 -0.13028207 0.07958317 0.1220447 6 4931406P16Rik -0.00345034 0.06384793 -0.07985070 -0.1200976

Is it okay to use limma on my matrix data ? is the design is correct?

Thanks

limma • 1.6k views

ADD COMMENT • link updated 6.3 years ago by Aaron Lun ★ 28k • written 6.3 years ago by shuvadeep.maity • 0

score 3 · Answer 1 · 2019-01-15

No, you should not use limma on your matrix of log-ratios. The log-ratios discard information about the mean-variance trend that is necessary for accurate modelling. For example, a log-ratio computed from large counts should be much less variable than a log-ratio computed from small counts - but if you only provide the log-ratio, the model is not able to consider the size of the counts from which they are computed, which reduces the power and/or specificity of the DE analysis. Moreover, your log-ratios were probably computed by adding some pseudo-count to avoid undefined values, which further distorts the true log-fold changes.

To avoid all of these problems, you should get your hands on the raw counts and do your analysis using voom (or edgeR, etc.). You should have 8 columns in the count matrix - 2 time points for 2 replicates for 2 treatment conditions. I'll assume that the time points were generated as one 0/12 pair per replicate (i.e., one or your replicates corresponds to a single time course), and block appropriately:

condition <- rep(c("C", "T"), each=4)
time <- rep(c("0", "12"), 4)
block <- gl(4, 2) # paired time points per replicate

design <- model.matrix(~0 + block + time:condition)
design <- design[,!grepl("time0", colnames(design))] # get to full rank
colnames(design) <- make.names(colnames(design)) # friendlier column names

The first four terms represent the blocking factors for each replicate, and are not of scientific interest. The last two terms represent the average log-fold change of expression in time 12 over 0 in the control or treatment conditions. Setting up a contrast to test for DE between control and treatment in edgeR or limma is as easy as:

con <- makeContrasts(time12.conditionT - time12.conditionC, levels=design)

Conceptually, this set-up tests the same null hypothesis as what you described in your post, except it does it better.