Dear Ladies and Gentlemen,
I have a question regarding the two Bioconductor packages "affy" and "agilp". For my work I need to take expression data exported by Agilent software as a .txt file and use it to create tissue-specific models with the COBRA Toolbox.
To use the COBRA function "createTissueSpecificModel" you need present/absent calls for the genes in your data. For expression data from Affymetrix .CEL files you can use the Bioconductor package "affy" and run "ReadAffy" -> "mas5calls" -> "exprs": mas5calls returns an ExpressionSet, and exprs extracts the matrix of present/absent calls for the genes in each .CEL file.
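For reference, the affy workflow described above can be sketched as follows. This is a minimal sketch, assuming the .CEL files sit in one directory (celdir is a hypothetical argument) and that the Bioconductor packages affy and Biobase are installed. mas5calls() reports "P" (present), "M" (marginal) and "A" (absent); the helper calls_to_present() is our own illustration, not part of affy, and how to treat marginal calls is a choice you have to make.

```r
# Sketch of the affy present/absent workflow (requires affy and Biobase).
# Assumption: 'celdir' is a directory that contains the .CEL files.
mas5_pa_calls <- function(celdir) {
  abatch <- affy::ReadAffy(celfile.path = celdir)  # read all .CEL files in celdir
  calls  <- affy::mas5calls(abatch)                # ExpressionSet holding P/M/A calls
  Biobase::exprs(calls)                            # probe set x sample matrix of "P"/"M"/"A"
}

# Hypothetical helper: convert a P/M/A call matrix into logical "present" flags.
# By default "M" (marginal) counts as absent; set marginal_as_present = TRUE to change that.
calls_to_present <- function(pa, marginal_as_present = FALSE) {
  keep <- if (marginal_as_present) c("P", "M") else "P"
  matrix(pa %in% keep, nrow = nrow(pa), dimnames = dimnames(pa))
}
```

Usage would be along the lines of present <- calls_to_present(mas5_pa_calls("path/to/cel/files")), giving a TRUE/FALSE matrix with one row per probe set.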
Now my question: Is there something similar for Agilent's .txt files within the Bioconductor package "agilp"? If yes, could someone please explain to me how to get the same output as with a .CEL file, namely present/absent calls of the genes?
Thank you for your help.
Kind regards,
Maximilian
Dear Mr. MacDonald,
thank you for your response. The Agilent platform design files I am referring to are available in the GEO database under accession number GPL18948:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL18948
An example of the resulting raw data is available in the GEO database under accession number GSM1436498:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1436498
In this context, what are the foreground and the background probes you are referring to?
Thank you for your help.
Kind regards,
Maximilian
Well, the summarized data that people load onto GEO is sometimes less than useful, and this looks like a case of that. But they were kind enough to load up the raw data as well, so you can start from scratch.
That's not helpful. But they do give you a better set of annotations, so let's use that.
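A minimal sketch of loading the raw one-color data and a platform annotation table, assuming that the better annotations are the GPL18948 platform table on GEO and that the Bioconductor packages limma and GEOquery are installed. The function names read_agilent_raw() and get_platform_table() are hypothetical, and this is not necessarily the code used in the original answer.

```r
# Hypothetical helpers (require limma and GEOquery).

# Read one-color Agilent Feature Extraction .txt files: green channel only.
# Returns a limma EListRaw with foreground (E), background (Eb) and spot annotation (genes).
read_agilent_raw <- function(files) {
  limma::read.maimages(files, source = "agilent", green.only = TRUE)
}

# Download a GEO platform record and return its annotation table as a data.frame.
get_platform_table <- function(gpl = "GPL18948") {
  gplobj <- GEOquery::getGEO(gpl)  # fetches the platform record from GEO
  GEOquery::Table(gplobj)          # one row per feature in the platform description
}
```

With dat <- read_agilent_raw("GSM1436498_Parent_replicate1.txt"), the spot positions end up in dat$genes, which is what a Row/Col match against the platform table can use.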
So we have too many rows in our new annotations. Let's fix that.
This uses the idea that if we take the row and column and paste() them together, we get a unique identifier that we can match() on (e.g., we get 1_1, 1_2, etc., and these uniquely identify each spot). So I just paste them together and use match() to subset and reorder the 'bettergns' data.frame. I can then just put that into the 'dat' object.
Now we know which probes are positive and which are negative controls, and you could then, in principle, come up with a measure of present/absent to use.
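The Row/Col matching idea, and one way to turn negative controls into a present/absent call, can be sketched in base R as below. The columns Row, Col and ControlType (with -1 marking negative controls, as in Agilent Feature Extraction output) are assumptions about the annotation data, and the 1.1 x 95th-percentile cut-off is a common heuristic chosen here for illustration, not something proposed in the thread.

```r
# Align a platform annotation table to the spots of a raw array via Row/Col.
# 'genes' and 'annot' are hypothetical data.frames that both have Row and Col columns.
align_annotation <- function(genes, annot) {
  spot_id  <- paste(genes$Row, genes$Col, sep = "_")  # e.g. "1_1", "1_2", ...
  annot_id <- paste(annot$Row, annot$Col, sep = "_")
  idx <- match(spot_id, annot_id)                     # row of annot for each spot
  if (anyNA(idx)) stop("some spots have no matching annotation row")
  annot[idx, , drop = FALSE]                          # subset and reorder to array order
}

# Illustrative present/absent rule (an assumption, not from the thread):
# a probe is "present" on an array if its intensity exceeds mult x the
# prob-quantile of that array's negative-control intensities (ControlType == -1).
call_present <- function(E, control_type, prob = 0.95, mult = 1.1) {
  E <- as.matrix(E)
  neg <- control_type == -1
  cutoff <- mult * apply(E[neg, , drop = FALSE], 2, quantile, probs = prob, names = FALSE)
  sweep(E, 2, cutoff, FUN = ">")  # logical matrix: TRUE where E[i, j] > cutoff[j]
}
```

Applied to the objects above, this would look like dat$genes <- align_annotation(dat$genes, bettergns) followed by present <- call_present(dat$E, dat$genes$ControlType).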
You may also want to have a look at the package SCAN.UPC, specifically the function UPC_TwoColor(). This calculates a Universal exPression Code (UPC) score for two-channel microarrays, which estimates the probability that a probe is "active" (expressed) in a sample.
Dear Mr. Hooiveld,
thank you very much for suggesting the package "SCAN.UPC".
I read the manual and found the sample code, but because I am a beginner with Bioconductor and R in general I want to double-check: all I need to pass in is
and then I get a matrix with the presence/absence calls of the corresponding genes?
I tried to run it like this with the input file 'GSM1436498_Parent_replicate1.txt' from
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1436498
which is a plain raw Agilent .txt file, but I get this error:
Can you please tell me what I am doing wrong? I have no idea. Thank you for your help.
Kind regards,
Maximilian
Well, this is apparently a single-color Agilent experiment, and as stated in the SCAN.UPC vignette (PDF; section 3, top of page 5): "This package does not yet support normalizing Agilent one-color arrays."
To calculate a UPC score for a single-color array, you could switch to the function UPC_Generic(). Be sure to check the help pages for this function!
Using the object dat that is generated by the code James posted above, you could do something like this:
I noticed that some warning messages are returned, but AFAIK these can be ignored (I am not sure about this, since I have never used this for Agilent arrays...). Otherwise, modify the default value of the parameter convThreshold.
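A minimal sketch of computing UPC scores for every array in a one-color object such as dat, assuming (as we read the SCAN.UPC help page, so please verify with ?UPC_Generic) that UPC_Generic() takes one sample's expression values as its first argument and returns one UPC value per probe. The wrapper upc_per_sample() and the 0.5 cut-off in upc_to_present() are hypothetical choices for illustration, not SCAN.UPC recommendations.

```r
# Hypothetical wrapper around SCAN.UPC::UPC_Generic() for a one-color EListRaw 'dat'.
# Assumption: UPC_Generic() scores one sample at a time; extra arguments
# (e.g. convThreshold) are forwarded to it unchanged.
upc_per_sample <- function(dat, ...) {
  E <- as.matrix(dat$E)                              # probes x arrays intensity matrix
  upc <- apply(E, 2, SCAN.UPC::UPC_Generic, ...)     # score each array (column) separately
  matrix(upc, nrow = nrow(E), dimnames = dimnames(E))
}

# Turn UPC scores (estimated probabilities of being "active") into
# present/absent calls; the 0.5 threshold is an arbitrary illustrative choice.
upc_to_present <- function(upc, threshold = 0.5) {
  upc > threshold
}
```

Because extra arguments are forwarded, something like upc_per_sample(dat, convThreshold = 0.001) (an arbitrary illustrative value) would change the convergence threshold mentioned above.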