Hello!
I am analyzing a dataset of 78 samples (39 normal + 39 tumor). In order to identify the DE genes, I tried following this code to differentiate the normal samples and the cancer samples:
colnames(ph@data)[2]="source"
ph@data
groups = ph@data$source
f = factor(groups,levels=c("normal","tumor"))
ph@data [,2] = c("normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","normal","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor","tumor")
design = model.matrix(~ 0 + f)
colnames(design) = c("normal","tumor")
Could you please tell me if this is correct?
After this I used this code:
data.fit = lmFit(data.matrix,design)
But I received this error message: Error in lm.fit(data.matrix, design) : incompatible dimensions
Could anyone please help?
While you have supplied some code, it's not enough for anybody to help. You also appear to be directly accessing slots in an S4 object, which is almost surely not what you should be doing. If there isn't an accessor function to get at the data, you probably shouldn't be hacking around like that (and what I am talking about is things like colnames(ph@data)).
The @ function allows you to access slots in an S4 object, but that's a pretty advanced move, and if you don't know what you are doing you may well just make a mess of things.
Can you point to where you got the code you are using? Or maybe give more code so we know what you are trying to do?
Thank you for your reply. I am just a beginner and I am trying to learn how to analyze microarray data on my own. I am using the code that I found on this web page (https://wiki.bits.vib.be/index.php/Analyze_your_own_microarray_data_in_R/Bioconductor#Retrieving_sample_annotation_using_affy)
There are 78 samples, and I want to identify the differentially expressed genes between normal samples (39) and cancer samples (39). I want to tell Bioconductor which samples are normal and which samples are cancer samples, so that I will be able to compare them.
While what they show on that web page is technically correct, it's not what they should be teaching people. For example, to get the phenotypic data or feature data, there are accessors that they barely mention, and those are what you should be using instead. So instead of reaching into the slots with @ (things like ph@data), you should be going through the accessors, such as pData() for the phenotypic data.
And you shouldn't be calling an object 'data' to begin with. That's a function name, and all things being equal it's better not to mask function names with object names.
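As a minimal sketch of the accessor approach, assuming the Biobase package is installed: the small ExpressionSet below is made up purely for illustration (the object name eset, the sample names, and the random values are all hypothetical), and the point is only to contrast slot access with the accessors.

```r
library(Biobase)

# Hypothetical sample annotation: 2 normal and 2 tumor samples
pd <- data.frame(
  source = rep(c("normal", "tumor"), each = 2),
  row.names = paste0("sample", 1:4)
)

# Hypothetical expression values (10 genes x 4 samples)
set.seed(1)
expr <- matrix(rnorm(40), nrow = 10,
               dimnames = list(paste0("gene", 1:10), rownames(pd)))

# Named 'eset' rather than 'data', so the data() function is not masked
eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pd))

# Slot access, which is what the reply advises against:
slot_version <- phenoData(eset)@data

# Accessor versions, which return the same information:
accessor_version <- pData(eset)
groups <- eset$source   # shorthand for pData(eset)$source
```

Both routes give the same data.frame here, but the accessor keeps working if the internal layout of the class ever changes.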
But the above presupposes that you already put your phenodata in the phenoData slot of your ExpressionSet, which isn't really that common IMO for people to do. What I normally do is start with a data.frame (the limma User's Guide calls this a 'targets' data.frame) that has all the sample information. So you have 39 tumor and 39 normal samples. I would make a data.frame that has at the very least the CEL file names and the phenotypic data. In an analysis I have done, for example, I ensured that the file name matches up with the other phenodata (treatment and time). I then read those data in using the file names in the first column, which ensures that the samples are matched up with the phenotypic data. I can then just do something like call model.matrix() on the columns of that data.frame, or whatever, and I am assured that the model matrix corresponds exactly to the data. But this is so because I took the time to make sure everything was good at the outset.
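For the 39 normal / 39 tumor case in the question, the targets approach might look roughly like the sketch below. This is only an illustration under stated assumptions: the file names are invented, the column names FileName and source are hypothetical, and the expression matrix is random noise standing in for normalized data that would really come from reading the CEL files listed in the targets data.frame.

```r
# Hypothetical targets table: one row per sample, file name plus group label
targets <- data.frame(
  FileName = sprintf("sample_%02d.CEL", 1:78),
  source   = rep(c("normal", "tumor"), each = 39),
  stringsAsFactors = FALSE
)

# Stand-in for the normalized expression matrix (genes in rows, samples in
# columns); the columns are in the same order as the rows of 'targets'
set.seed(1)
expr <- matrix(rnorm(100 * 78), nrow = 100,
               dimnames = list(paste0("gene", 1:100), targets$FileName))

# Build the design matrix from the targets table, not from typed-in labels
f <- factor(targets$source, levels = c("normal", "tumor"))
design <- model.matrix(~ 0 + f)
colnames(design) <- levels(f)

# Sanity checks before fitting: one design row per column of the data,
# and the samples in the same order as the targets table
stopifnot(ncol(expr) == nrow(design))
stopifnot(identical(colnames(expr), targets$FileName))

# With limma loaded, the fit would then be along the lines of:
# fit <- lmFit(expr, design)
```

Checking that the number of columns in the expression matrix equals the number of rows in the design matrix, before fitting, is a cheap way to catch a dimension mismatch early.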