Question

Limma sample groups

0

Entering edit mode

nancy.nassar1590 • 0

@nancynassar1590-24240

Last seen 4.6 years ago

hello!

I am analyzing a dataset of 78 samples (39 normal + 39 tumor). In order to identify the DE genes, I tried following this code to differentiate the normal samples and the cancer samples:

colnames(ph@data)[2]="source"

ph@data

groups = ph@data$source

f = factor(groups,levels=c("normal","tumor"))

ph@data [,2] = c("normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor")

design = model.matrix(~ 0 + f)

colnames(design) = c("normal","tumor")

Could you please tell me if this is correct???

After this I used this code:

data.fit = lmFit(data.matrix,design)

But I received this warning message: Error in lm.fit(data.matrix, design) : incompatible dimensions

Could anyone please help?

microarray limma affy cancer • 1.9k views

ADD COMMENT • link updated 4.6 years ago by Gordon Smyth 52k • written 4.6 years ago by nancy.nassar1590 • 0

2

Entering edit mode

While you have supplied some code, it's not enough for anybody to help. You also appear to be directly accessing slots in an S4 object, which is almost surely not what you should be doing. If there isn't an accessor function to get at the data, you probably shouldn't be hacking around like that (and what I am talking about is things like colnames(ph@data)).

The @ function allows you to access slots in an S4 object, but that's a pretty advanced move, and if you don't know what you are doing you may well just make a mess of things.

Can you point to where you got the code you are using? Or maybe give more code so we know what you are trying to do?

ADD REPLY • link 4.6 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you for your reply. I am just a beginner and I am trying to learn how to analyze microarray data on my own. I am using the code that I found on this web page (https://wiki.bits.vib.be/index.php/Analyze_your_own_microarray_data_in_R/Bioconductor#Retrieving_sample_annotation_using_affy)

There are 78 samples, and I want to identify the deferentially expressed genes between normal samples (39) and cancer samples (39). I want to tell Bioconductor which samples are normal and which samples are cancer samples, so that I will be able to compare them.

ADD REPLY • link updated 4.6 years ago by James W. MacDonald 68k • written 4.6 years ago by nancy.nassar1590 • 0

1

Entering edit mode

While what they show on that web page is technically correct, it's not what they should be teaching people. For example to get the phenotypic data or feature data, there are accessors that they barely mention, which is what you should be doing instead. So instead of doing

ph <- data@phenoData

You should be doing

ph <- pData(data)

And you shouldn't be calling an object 'data' to begin with. That's a function name, and all things equal it's better not to mask function names with object names.

But the above presupposes that you already put your phenodata in the phenoData slot of your ExpressionSet, which isn't really that common IMO for people to do. What I normally do is to start with a data.frame (the limma User's Guide calls this a 'targets' data.frame) that has all the sample information. So you have 39 tumor and 39 normal samples. I would make a data.frame that has at the very least the CEL file names and the phenotypic data. Here's an example from an analysis I have done:

!> head(samps)[,c(1,10,11)]
                            filename time treatment
 1 Lefebvre zfish 24hr control 1.CEL   24   control
 2 Lefebvre zfish 24hr control 2.CEL   24   control
 3 Lefebvre zfish 24hr control 3.CEL   24   control
 4      Lefebvre zfish 24hr DA 4.CEL   24        DA
 5      Lefebvre zfish 24hr DA 5.CEL   24        DA
 6      Lefebvre zfish 24hr DA 6.CEL   24        DA

Where I have ensured that the file name matches up with the other phenotata (treatment and time). I then read those data in using the filenames in the first column, which ensures that the samples are matched up with the phenotypic data. I can then just do something like

design <- model.matrix(~time + treatment)

or whatever, and I am ensured that the model matrix corresponds exactly to the data. But this is so because I took the time to make sure everything was good at the outset.

ADD REPLY • link 4.6 years ago James W. MacDonald 68k

score 1 · Answer 1 · 2020-09-28

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 4 hours ago

WEHI, Melbourne, Australia

What does

dim(data.matrix)

show?

Apparently the data matrix does not have 78 columns.

ADD COMMENT • link 4.6 years ago Gordon Smyth 52k

0

Entering edit mode

Thank you for your reply. it shows (54675 78). and dim(design) 78,2

ADD REPLY • link 4.6 years ago nancy.nassar1590 • 0

0

Entering edit mode

From what you say, the column dimension of the data.matrix matches the row dimension of the design matrix, so I don't see how you could have got the error message about incompatible dimensions.

It is all really much easier than the code tutorial you are trying to follow. I suggest you take take James' advice to go back to the beginning and follow the advice in the limma User's Guide.