comparing multiple individual genomes in bioc?
Paul Shannon ★ 1.1k
@paul-shannon-578
I am becoming acquainted with the bioc packages helpful in DNA sequence analysis. There is lots of nice stuff.

We (like the rest of the world...) hope to soon have many individual human genomes. We wish to compare them, looking for fine-grained variations in intragenic and extragenic regions, for clues to phenotypic variety.

Is there support for this kind of analysis in bioc? If not, is this planned or hoped for?

Thanks,

- Paul Shannon
@kasper-daniel-hansen-2979
This is a rather difficult question to answer because no-one has really done what you are proposing to do. It is true that many people hope to be able to do something like this, but it is unclear how much data we are talking about, what form the data is in, and finally what kind of stuff we want to do with the data. Without a clearer specification it is pretty hard to answer this.

There is, for example, a difference between data that is just a list of SNPs, a collection of genomes in FASTA format, or a (big) collection of short reads that needs to be assembled.

Given that no-one has done this, it is clear that the first attempts will involve a lot of custom code, so don't expect an off-the-shelf method from any suite of software. I will say that Bioconductor is a priori neither more nor less suitable for this than any other piece of software.

Kasper

On Jan 27, 2009, at 9:21, Paul Shannon wrote:

> I am becoming acquainted with the bioc packages helpful in DNA
> sequence analysis. There is lots of nice stuff.
>
> We (like the rest of the world...) hope to soon have many individual
> human genomes. We wish to compare them, looking for fine-grained
> variations in intragenic and extragenic regions, for clues to
> phenotypic variety.
>
> Is there support for this kind of analysis in bioc? If not, is this
> planned or hoped for?
>
> Thanks -
>
> - Paul Shannon
Kasper Daniel Hansen wrote:

> This is a rather difficult question to answer because no-one has really
> done what you are proposing to do. It is true that many people hope to
> be able to do something like this, but it is unclear how much data we
> are talking about, what form the data is in and finally what kind of
> stuff we want to do with the data. Without a more clear specification it
> is pretty hard to answer this.
>
> There is for example a difference in whether the data is just a list of
> SNPs, a collection of genomes in FASTA format or a (big) collection of
> short reads that needs to be assembled.
>
> Given that no-one has done this, it is clear that the first attempts
> will involve a lot of custom code, so don't expect any off-the-shelf
> method for any suite of software. I will say that Bioconductor is a
> priori not more nor less suitable for this than any other piece of
> software.

Hmm. Some of the Bioc infrastructure might make for great building blocks. The BSgenome package has facilities for dealing with genome-scale data, especially reference genomes. The Biostrings tools for custom and comparatively fast pattern matching seem very suitable for exploratory analysis.

The conceptual foundations of the IRanges and Rle classes seem well suited to efficient representation of genome-scale 'features of interest', coupled with flexibility for investigating genome-scale questions. For instance, if SNPs were represented as Rle objects, it would be straightforward to summarize site-specific SNP abundance (just add the Rle objects using '+') in memory-efficient ways. Likewise, the overlap function of IRanges might provide a very useful tool for rapidly filtering per-genome annotations to identify features that are shared across samples.

Plus the usual arguments for R, viz., established statistical and visualization tools and a ready interface to databases, web resources, etc.

Agreed, though, that the question is open-ended and therefore hard to answer.
It would be really exciting to hear use cases sketched out, here or on the bioc-sig-sequencing mailing list.

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
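
Two of the ideas in the reply above lend themselves to a small illustration: summing per-sample SNP indicators held as run-length encodings, and filtering one sample's features by overlap with another's. What follows is only a rough, language-neutral sketch in Python. It is not the IRanges or Rle API, and every name in it (`rle_encode`, `rle_add`, `overlapping`, the toy data) is hypothetical and chosen for illustration. It assumes each sample is a 0/1 vector over the same coordinates and that features are closed, 1-based `(start, end)` intervals.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a sequence into a list of (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_add(a, b):
    """Add two run-length encodings element-wise without expanding them."""
    if sum(n for _, n in a) != sum(n for _, n in b):
        raise ValueError("encodings must cover the same number of positions")
    out = []
    i = j = 0
    left_a = a[0][1] if a else 0  # positions left in the current run of a
    left_b = b[0][1] if b else 0  # positions left in the current run of b
    while i < len(a) and j < len(b):
        step = min(left_a, left_b)
        total = a[i][0] + b[j][0]
        if out and out[-1][0] == total:
            out[-1] = (total, out[-1][1] + step)  # extend the previous run
        else:
            out.append((total, step))
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            left_a = a[i][1] if i < len(a) else 0
        if left_b == 0:
            j += 1
            left_b = b[j][1] if j < len(b) else 0
    return out


def overlapping(query, subject):
    """Return the query intervals that overlap at least one subject interval.

    Intervals are (start, end) pairs, 1-based and closed at both ends.
    This is a simple all-pairs scan; real data would need an index.
    """
    return [
        (qs, qe)
        for qs, qe in query
        if any(qs <= se and ss <= qe for ss, se in subject)
    ]


# Toy data (made up): 1 marks a SNP at that position in that sample.
sample_a = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
sample_b = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

# Site-specific SNP abundance across both samples, still run-length encoded.
abundance = rle_add(rle_encode(sample_a), rle_encode(sample_b))
print(abundance)

# Features of sample A that are also hit by some feature of sample B.
features_a = [(100, 200), (500, 650), (900, 950)]
features_b = [(150, 180), (640, 700)]
print(overlapping(features_a, features_b))
```

The point of working on the runs rather than on expanded vectors is that memory scales with the number of runs, not with genome length, which matters when SNPs are sparse. With many samples, `rle_add` can simply be applied repeatedly.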