Differential expression of nomalised and log2 transformed data in limma
1
0
Entering edit mode
@stinehedensted-14017
Last seen 7.3 years ago
Denmark

Hello

 

I have normalized and log2 transformed RNA microarray data from GEO, and I want to do differential gene expression analysis with the limma package in R. 

My dataset contains 25 samples of cancer og 25 samples of stroma. Each sample have 33297 gene expression values. 

I've loaded my data as: 

> Array_tumorandstroma <- read.table("25sT_og25T-samples.csv", header = T, sep = ";", stringsAsFactors = T, nrow = 100)

> Array_tumorandstroma2 <- Array_tumorandstroma[,-1]

> rownames(Array_tumorandstroma2) <- Array_tumorandstroma[,1]

 

Then I just started to design a model matrix. I am unsure, whether I can just do this right away?

> design <-model.matrix((~0+factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2))))

> colnames(design) <- c("sT", "T")

> design 
   sT T
1   1 0
2   1 0
3   1 0
4   1 0
5   1 0
6   1 0
7   1 0
8   1 0
9   1 0
10  1 0
11  1 0
12  1 0
13  1 0
14  1 0
15  1 0
16  1 0
17  1 0
18  1 0
19  1 0
20  1 0
21  1 0
22  1 0
23  1 0
24  1 0
25  1 0
26  0 1
27  0 1
28  0 1
29  0 1
30  0 1
31  0 1
32  0 1
33  0 1
34  0 1
35  0 1
36  0 1
37  0 1
38  0 1
39  0 1
40  0 1
41  0 1
42  0 1
43  0 1
44  0 1
45  0 1
46  0 1
47  0 1
48  0 1
49  0 1
50  0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$`factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2))`
[1] "contr.treatment"

 

 

Is my matrix design correct? I am very insure about this, and nobody I my lab has experience in the limma package.  

 

When I try to do the moderated t-test, I run into trouble: 

 

> fit <- lm.fit(Array_tumorandstroma2, design)

Error in lm.fit(Array_tumorandstroma2, design) : incompatible dimensions

 

I guess, my matrix design does not fit the dimentions of my microarray dataset (Array_tumorandstroma2). And probably this is way the lm.fit function does not work. Can anyone please advise me to better matrix design? 

I also tried an other matrix design, but that resulted in the same problem: 

 

> design <- model.matrix((~0+factor(c(1,2))))

> colnames(design) <- c("sT", "T")

> design 
  sT T
1  1 0
2  0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$`factor(c(1, 2))`
[1] "contr.treatment"

> fit <- lm.fit(Array_tumorandstroma2, design)
Error in lm.fit(Array_tumorandstroma2, design) : incompatible dimensions

 

Thank you in advance for any help or suggestions!

/Stine 

limma lmfit model.matrix differential gene expression • 2.0k views
ADD COMMENT
1
Entering edit mode

It looks like you're using the wrong function.  Limma provides 'lmFit', not the lm.fit function you're using, which is something entirely different.  Also, it's probably worth turning Array_tumorandstroma2 into a numeric matrix, rather than the data.frame that read.table returns.  And for safety, I'd find some way of deriving the factor describing T and sT programmatically from the column names of the table, rather than explicitly enumerating them, or at least enumerating them in a more readable (and debug-able) fashion, e.g. rep(c("cancer", "stroma"), times=c(25,25)) - if you're really sure that the samples come in blocks, with cancer first.

ADD REPLY
0
Entering edit mode
@stinehedensted-14017
Last seen 7.3 years ago
Denmark

Hi Gavin, 

Thank you for your reply. You are totally right, I was using the wrong function. I have now turned the dataframe into a numeric matrix. I ordered all the samples in the csv file, so I know fore sure, that the 25 first samples are from cancer, and that the last 25 sampls are from stroma. 

I ran into a new problem, about my design matrix. I get the error-message:

"row dimension of design doesn't match column dimension of data object"

The dimentions of my dataobject is anf design matrix is:  

> dim(merge_mean)
[1] 100   2
> dim(design)
[1] 100   2

I don't understand what the problem is? Probably this is also due to poor understand of how the design-matrix works. 

I will be very gratefull, for your advise. 

My full coding is below, 

//Stine 

 

> #source("https://bioconductor.org/biocLite.R")

> #biocLite("limma")
> 
> #browseVignettes("limma")
> library("limma")

> #Import data
> setwd("U:/Speciale/Svitlana et al/Differential ekspression")

> Array_tumorandstroma <- read.table("25sT_og25T-samples.csv", header = T, sep = ";", stringsAsFactors = T, nrow = 100, dec = ",")

> Array_tumorandstroma2 <- Array_tumorandstroma[,-1]

> rownames(Array_tumorandstroma2) <- Array_tumorandstroma[,1]

> #Convert dataframe into matrix: 
> Array_tumorandstroma2 <- as.matrix(sapply(Array_tumorandstroma2,as.numeric))

> #Mean expressionvalue for cancer and stroma:
> mean_cancer <- rowMeans(Array_tumorandstroma2[,1:25])

> mean_stroma <- rowMeans(Array_tumorandstroma2[,26:50])

> merge_mean <- matrix(c(mean_cancer, mean_stroma), nrow = 100, ncol = 2)

> #Construct design matrix 
> design <- model.matrix((~0+factor(rep(1:2, each=50))))

> colnames(design) <- c("Cancer", "Stroma")

> #lm.fit bruges til at finde least? square model
> fit <- lmFit(merge_mean, design)
Error in lmFit(merge_mean, design) : 
  row dimension of design doesn't match column dimension of data object
ADD COMMENT
1
Entering edit mode

Your design matrix is constructed from a factor (vector) that is 100 elements long. The first 50 are "1" and the second "50" are "2", but you said yourself that you have 25 elements per group, so ... why are you doing rep(..., each=50)?

Fix it so that you have as many labels as samples ... and, btw: had you followed Gavin's advice re: extracting class labels (tumor/stroma) from column names of your samples, you wouldn't be having this problem ;-)

 

ADD REPLY

Login before adding your answer.

Traffic: 628 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6