Hi,
Is it possible to write code that pulls out the data for several GDS
records (as a batch process) using GEOquery, so that they can
subsequently be analyzed with SAM or siggenes in a loop?
Thanks a bunch.
~V
On Thu, Sep 1, 2011 at 5:57 AM, Voke AO <ovokeraye at gmail.com>
wrote:
> Hi,
>
> Is it possible to write code that pulls out the data for several GDS
> records (as a batch process) using GEOquery, so that they can
> subsequently be analyzed with SAM or siggenes in a loop?
Yes. Here is a simple example. You will need to supply the code to
do any actual analysis and return the actual result, but I hope you
get the idea. I used sapply as the loop structure, but you could use
any loop structure that you like.
Hope that helps.
Sean
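> library(GEOquery)
> gdslist = c('GDS3717','GDS3718','GDS3719')
> analysisfunc = function(gdsid) {
+   # fetch the GDS record, caching the SOFT file in the current directory
+   gdsdat = getGEO(gdsid,destdir=".")
+   # convert the GDS object to an ExpressionSet
+   gdseset = GDS2eSet(gdsdat)
+   message("DO SIGGENES STUFF HERE")
+   return(sprintf("Results from %s would be here",gdsid))
+ }
> resultlist = sapply(gdslist,analysisfunc)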
File stored at:
./GDS3717.soft.gz
File stored at:
/var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL570.annot.gz
DO SIGGENES STUFF HERE
File stored at:
./GDS3718.soft.gz
File stored at:
/var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL1261.annot.gz
DO SIGGENES STUFF HERE
File stored at:
./GDS3719.soft.gz
File stored at:
/var/folders/23/234W5ZnqHPih-U4YppHVCU+++TI/-Tmp-//RtmpgeksBL/GPL1319.annot.gz
DO SIGGENES STUFF HERE
There were 50 or more warnings (use warnings() to see the first 50)
> resultlist
                             GDS3717                              GDS3718
"Results from GDS3717 would be here" "Results from GDS3718 would be here"
                             GDS3719
"Results from GDS3719 would be here"
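One caveat with a plain sapply over many accessions is that a single error (a failed download, say) aborts the whole batch. Below is a minimal sketch of a more forgiving loop. It uses lapply with tryCatch, which is a swapped-in loop structure rather than the one used above. The function analysis_stub and the "bogus" accession are hypothetical stand-ins so that the sketch runs without network access; a real version would call getGEO and GDS2eSet as in analysisfunc.

```r
# Hypothetical stand-in for analysisfunc; it only validates the accession
# string instead of downloading anything from GEO.
analysis_stub <- function(gdsid) {
  if (!grepl("^GDS[0-9]+$", gdsid)) stop("not a GDS accession: ", gdsid)
  sprintf("Results from %s would be here", gdsid)
}

# "bogus" is deliberately invalid, to exercise the error path
gdsids <- c("GDS3717", "GDS3718", "bogus")

# lapply always returns a list; tryCatch turns an error into NULL
results <- lapply(gdsids, function(id) {
  tryCatch(analysis_stub(id), error = function(e) {
    message("Skipping ", id, ": ", conditionMessage(e))
    NULL
  })
})
names(results) <- gdsids

# Drop the entries that failed, keeping the names of the rest
results <- Filter(Negate(is.null), results)
```

With 100 GDS records it is probably worth keeping the list of skipped accessions as well, so they can be retried or inspected by hand.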
On Thu, Sep 1, 2011 at 7:40 AM, Voke AO <ovokeraye at gmail.com>
wrote:
> Hi,
>
> This would work if I had the exact sample sizes and knew what columns
> are cases or controls. I guess my question is more like...if I knew
> 100 GDS files had data related to a disease of interest, it would be a
> vicious process to go through each one and create a cl or cls file to
> correspond to the data. In such a situation, will it be possible to
> have a code that will
> 1. Go through the specified number of GDS files
> 2. Detect the different classes of samples and assign numbers
> accordingly for SAM/Siggenes for each GDS file, that will eventually
> be called back in the loop process for the SAM/Siggenes analysis of
> each file.
>
> I'm very much a beginner in programming so, I'm hoping I don't sound
> too naive.
Unfortunately, there will be some programming involved here. As for
the class info, the phenoData slot of an ExpressionSet resulting from
a call to GDS2eSet will be fully populated. Often, there will be a
standard column "disease.state" that you could use to create the cls
object.
> gds = getGEO("GDS10")
> eset = GDS2eSet(gds)
> pData(eset)$disease.state
 [1] diabetic           diabetic           diabetic-resistant diabetic-resistant
 [5] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
 [9] diabetic-resistant diabetic-resistant nondiabetic        nondiabetic
[13] nondiabetic        nondiabetic        diabetic           diabetic
[17] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
[21] diabetic-resistant diabetic-resistant diabetic-resistant diabetic-resistant
[25] nondiabetic        nondiabetic        nondiabetic        nondiabetic
Levels: diabetic diabetic-resistant nondiabetic
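A factor like this can be turned into numeric class labels mechanically. Here is a minimal sketch, assuming the column exists for the GDS in question (it often does, but not always). The helper make_cl and the toy disease_state vector are hypothetical; in practice the input would be pData(eset)$disease.state. The 0/1 coding for two classes and 1..k for more than two reflects my reading of the siggenes documentation, so please check ?sam before relying on it. Note also that the numbering follows the alphabetical order of the levels, which says nothing about which group is the control.

```r
# Toy stand-in for pData(eset)$disease.state (values made up for illustration)
disease_state <- factor(c("diabetic", "diabetic", "nondiabetic",
                          "nondiabetic", "diabetic-resistant"))

# Hypothetical helper: turn a phenotype factor into integer class labels
make_cl <- function(state) {
  state <- droplevels(factor(state))       # ignore levels with no samples
  cl <- as.integer(state)                  # 1..k in the order of levels(state)
  if (nlevels(state) == 2) cl <- cl - 1L   # assumed 0/1 coding for two classes
  names(cl) <- as.character(state)         # keep the original label per sample
  cl
}

cl <- make_cl(disease_state)

# Cross-tabulate labels against codes to check the mapping by eye
table(names(cl), cl)
```

Inside the batch loop, the same helper could be called on each ExpressionSet, skipping any GDS whose phenoData lacks the column.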
Hope that helps.
Sean
> Thanks again.
>
> ~V