Hello everyone,
I am trying to analyze tiling array data using the Starr package
and I am stuck at reading the GFF files for the 7 genomic sequences
of C. elegans. The example that comes with the vignette uses a
single, minimal GFF file (20 lines?), which is nowhere near the
size of my GFF files (56 MB combined).
My question is: how do I read in multiple GFF files for analysis?
Among other things, I have tried reading them like this:
gffs <- c(file.path(dataPath,"chrI.gff"),
file.path(dataPath,"chrII.gff"), file.path(dataPath,"chrIII.gff"),
file.path(dataPath,"chrIV.gff"), file.path(dataPath,"chrV.gff"),
file.path(dataPath,"chrX.gff"))
transcriptAnno <- read.gffAnno(gffs, feature="transcript")
But none of my attempts worked.
I would appreciate any help in getting my analysis to the next level.
FYI:
I am trying to analyze differential expression between TEST and CONTROL
on the C. elegans Tiling Array 1.0 chips.
Thanks
Dear Feseha
I am not sure whether this will solve your question, but have you tried
cat chrI.gff chrII.gff chrIII.gff chrIV.gff chrV.gff chrX.gff > all.gff
(on the OS command line) and then
transcriptAnno = read.gffAnno("all.gff", feature="transcript")
(in R). Alternatively, if you are so unfortunate as to work with an
operating system that does not have 'cat', you could also e.g. use R's
readLines and writeLines.
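A minimal sketch of the readLines/writeLines route, for readers without 'cat'. The helper name combine_gff is hypothetical and not part of Starr; it only assumes that the per-chromosome files are plain text and that simple concatenation is acceptable, exactly as with 'cat'.

```r
# Hypothetical helper (not part of Starr): concatenate several GFF files
# into one file using only base R.
combine_gff <- function(paths, out) {
  # Fail early if any input file does not exist.
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("missing GFF file(s): ", paste(absent, collapse = ", "))
  }
  # Read each file as plain lines and append them in the order given.
  lines <- unlist(lapply(paths, readLines), use.names = FALSE)
  writeLines(lines, out)
  invisible(out)
}

# Usage, assuming dataPath is the directory that holds the files:
# chroms <- c("chrI", "chrII", "chrIII", "chrIV", "chrV", "chrX")
# combine_gff(file.path(dataPath, paste0(chroms, ".gff")), "all.gff")
# transcriptAnno <- read.gffAnno("all.gff", feature = "transcript")
```

As with 'cat', any header or comment lines at the top of each input file are copied through unchanged.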
Best wishes
Wolfgang
On Mar/2/11 3:48 AM, Feseha Abebe-Akele wrote:
> [...]
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
Dear Wolfgang,
"cat" indeed helped with reading the GFF. However, I am still unclear about
the feature="transcript" parameter. In the example that shipped with the
package, all entries are "transcript". In the GFF I downloaded from NCBI, the
same column is populated by things like CDS, gene, tRNA, etc. Am I supposed
to convert entries like CDS, gene, mRNA, tRNA, snRNA ... which appear in the
4th column of the GFF into a generic "transcript" entry, or would Starr take
them in as is with the feature="transcript" parameter and use them?
Thanks a lot.
Feseha
* Wolfgang Huber <whuber at embl.de> [Fri 04 Mar 2011 01:30:18 PM EST]:
> [...]
Dear Feseha
I would suggest omitting the 'feature' argument in your call to
'read.gffAnno' and then selecting the rows that you care about yourself.
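One way to do the row selection is on the GFF file itself, before it reaches Starr. The sketch below uses only base R; the helper name filter_gff is hypothetical, and it assumes a tab-separated file in which the feature type is the third field, as in the standard GFF layout. It does not call Starr, and the structure of the object returned by 'read.gffAnno' is not assumed here.

```r
# Hypothetical helper (not part of Starr): keep only the GFF rows whose
# feature type is listed in `features`, and write them to a new file.
filter_gff <- function(gff_file, features, out) {
  lines <- readLines(gff_file)
  # Drop comment/directive lines (starting with "#") and blank lines.
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]
  # Assumption: the feature type is the third tab-separated field.
  type <- vapply(strsplit(lines, "\t", fixed = TRUE),
                 function(fields) fields[3], character(1))
  keep <- type %in% features
  # Kept rows are written back unchanged.
  writeLines(lines[keep], out)
  # Return counts per retained feature type as a quick sanity check.
  table(type[keep])
}

# Usage with the feature types mentioned in this thread:
# filter_gff("all.gff", c("gene", "mRNA", "tRNA", "snRNA"), "selected.gff")
```

The filtered file could then be read with 'read.gffAnno' without the 'feature' argument, as suggested above.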
The 'Starr' maintainer might be able to provide more details in the
function's manual page, or to allow 'feature' to be a vector or a
regular expression.
Best wishes
Wolfgang
On Mar/4/11 9:44 PM, Feseha Abebe-Akele wrote:
> [...]
Dear Feseha,
sorry for the late reply. I am currently on holiday for some weeks.
I am going to make the documentation clearer regarding what is meant
by the "feature" argument. I hope everything works now.
Please contact me if you have any further questions on Starr.
Best,
Benedikt
Wolfgang Huber <whuber at embl.de> wrote:
> [...]