GWASTools suggestions: explicit interface for GenotypeReaders,
Karl Forner ▴ 70
@karl-forner-2831
Last seen 7.9 years ago
Switzerland
Hello,

Explicit interface for GenotypeReaders
--------------------------------------
I am a big fan of the GenotypeData object architecture, which lets a single object type work with any representation or storage of the actual genotypes thanks to its GenotypeReader concept.

But from what I've seen, the different readers merely follow a common interface that is not clearly defined. For example, the method hasVariable() is not available for MatrixGenotypeReader. When developing functions that take a GenotypeData as argument, it is important to know which interface is safe to use.

I believe this is a very good use case for an abstract class GenotypeReader that each specialized reader should derive from.

Sorted GenotypeData
-------------------
I realized that some functions rely on the SNPs being sorted by chromosome. In assocTestRegression(), for instance, these lines of code are wrong if the chromosomes are not sorted:

    chrom <- getChromosome(genoData)
    unique_chrom <- unique(chrom)
    nChromosomes <- max(chrom)
    rle_chrom <- rle(chrom)
    rle_chrom2 <- rep(0, nChromosomes)
    rle_chrom2[unique_chrom] <- rle_chrom$lengths

I think that either the function documentation should clearly state that it takes sorted genotype data as argument, or a stronger requirement that all GenotypeData must be sorted should be enforced.

Subset GenotypeReader
---------------------
A useful addition would be a SubsetGenotypeReader that takes as arguments an existing GenotypeReader instance and lists of snpIDs and scanIDs. This reader would act as a database view on the data, and would allow subset data to be used with all functions taking GenotypeData arguments.

These are suggestions, and I realize that implementing them requires work, but if you ever need it I could contribute some code.

Thanks for your attention,
Karl Forner
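To make the abstract-class and subset-reader suggestions concrete, here is a minimal S4 sketch. It is not GWASTools code: the class names (AbstractReader, MatrixReader, SubsetReader) and the generics (readGenotype, readSnpID, readScanID) are hypothetical and chosen so they do not clash with the package's own names. The subset class simply reads from its parent and indexes the result, which is enough to show the "view" idea but says nothing about how an efficient implementation would access data on disk.

```r
library(methods)

# Hypothetical virtual base class: it cannot be instantiated, and it
# documents the interface every reader is expected to implement.
setClass("AbstractReader", representation("VIRTUAL"))

setGeneric("readGenotype", function(object, ...) standardGeneric("readGenotype"))
setGeneric("readSnpID", function(object, ...) standardGeneric("readSnpID"))
setGeneric("readScanID", function(object, ...) standardGeneric("readScanID"))

# Hypothetical in-memory reader: genotype is a SNP x scan matrix.
setClass("MatrixReader", contains = "AbstractReader",
         representation(genotype = "matrix",
                        snpID = "integer",
                        scanID = "integer"))
setMethod("readGenotype", "MatrixReader", function(object, ...) object@genotype)
setMethod("readSnpID", "MatrixReader", function(object, ...) object@snpID)
setMethod("readScanID", "MatrixReader", function(object, ...) object@scanID)

# Hypothetical subset view: wraps any reader and exposes only the
# requested SNPs and scans, in the requested order.
setClass("SubsetReader", contains = "AbstractReader",
         representation(parent = "AbstractReader",
                        snpID = "integer",
                        scanID = "integer"))
setMethod("readSnpID", "SubsetReader", function(object, ...) object@snpID)
setMethod("readScanID", "SubsetReader", function(object, ...) object@scanID)
setMethod("readGenotype", "SubsetReader", function(object, ...) {
  p <- object@parent
  rows <- match(object@snpID, readSnpID(p))
  cols <- match(object@scanID, readScanID(p))
  readGenotype(p)[rows, cols, drop = FALSE]
})

# Usage: 3 SNPs x 2 scans, then a view on SNPs 3 and 1 for scan 102.
m <- new("MatrixReader",
         genotype = matrix(c(0L, 1L, 2L, 1L, 0L, 2L), nrow = 3),
         snpID = 1:3, scanID = 101:102)
v <- new("SubsetReader", parent = m, snpID = c(3L, 1L), scanID = 102L)
readGenotype(v)
```

Because SubsetReader is itself an AbstractReader, any function written against the three generics would accept the view unchanged, which is the property the suggestion is after.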
@stephanie-m-gogarten-5121
Last seen 4 months ago
University of Washington
Hi Karl,

I updated the documentation for the functions that assume chromosomes are in blocks (this had been on my to-do list for a while). The other suggestions are good ideas, but we don't have time for them right now. If you want to work on them and send us some code, we'd be happy to incorporate it.

Stephanie

(In reply to Karl Forner's message of 12/19/13, 2:20 AM.)
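For readers who want to see what "chromosomes are in blocks" means in practice, the base-R sketch below wraps the run-length logic quoted in the original post and compares it with a plain count. The function names counts_via_rle and chrom_in_blocks are hypothetical, and the toy vectors stand in for the result of getChromosome(genoData). It suggests that the logic holds whenever each chromosome forms one contiguous block, even if the blocks are not in ascending order, and fails once a chromosome is split across runs.

```r
# The run-length logic quoted in the original post, wrapped in a function
# (the function name is hypothetical).
counts_via_rle <- function(chrom) {
  unique_chrom <- unique(chrom)
  nChromosomes <- max(chrom)
  rle_chrom <- rle(chrom)
  rle_chrom2 <- rep(0, nChromosomes)
  # If a chromosome occurs in more than one run, rle() returns more runs
  # than there are unique codes; R then warns and drops the extra lengths.
  suppressWarnings(rle_chrom2[unique_chrom] <- rle_chrom$lengths)
  rle_chrom2
}

# Hypothetical check: TRUE when every chromosome code forms a single
# contiguous block (the blocks need not be in ascending order).
chrom_in_blocks <- function(chrom) {
  anyDuplicated(rle(chrom)$values) == 0L
}

blocked     <- c(2L, 2L, 1L, 3L, 3L)      # contiguous blocks, not ascending
interleaved <- c(1L, 1L, 2L, 2L, 1L, 3L)  # chromosome 1 appears in two runs

chrom_in_blocks(blocked)                   # TRUE
counts_via_rle(blocked)                    # agrees with tabulate() below
tabulate(blocked, nbins = max(blocked))

chrom_in_blocks(interleaved)               # FALSE
counts_via_rle(interleaved)                # undercounts chromosome 1
tabulate(interleaved, nbins = max(interleaved))
```

A caller could guard a function with something like stopifnot(chrom_in_blocks(getChromosome(genoData))) before relying on block structure. Note that tabulate() only recovers per-chromosome counts; code that also needs the positions of each block would still require the data to be arranged in blocks.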