GWASTools suggestions: explicit interface for GenotypeReaders,


Karl Forner ▴ 70


Switzerland

Hello,

Explicit interface for GenotypeReaders
--------------------------------------
I am a big fan of the GenotypeData object architecture, which lets a single object type work with any representation or storage of the actual genotypes, thanks to its GenotypeReader concept.

But from what I've seen, the different readers just stick to a common interface that is not clearly defined. For example, the method hasVariable() is not available for MatrixGenotypeReader. When developing functions that take a GenotypeData as argument, it is important to know which interface is safe to use.

I believe this is a very good use case for an abstract class GenotypeReader that each specialized reader should derive from.

sorted GenotypeData
-------------------
I realized that some functions rely on the SNPs being sorted by chromosome. In assocTestRegression(), for instance, these lines of code are wrong if the chromosomes are not sorted:

    chrom <- getChromosome(genoData)
    unique_chrom <- unique(chrom)
    nChromosomes <- max(chrom)
    rle_chrom <- rle(chrom)
    rle_chrom2 <- rep(0, nChromosomes)
    rle_chrom2[unique_chrom] <- rle_chrom$lengths

When a chromosome appears in more than one run, rle() returns more run lengths than there are unique chromosomes, so the last assignment no longer matches lengths to chromosomes.

I think that either the function documentation should clearly state that it takes sorted genotype data as argument, or a stronger assumption should be enforced: that all GenotypeData must be sorted.

subset Genotype Reader
----------------------
A useful addition would be a SubsetGenotypeReader that would take as arguments an existing GenotypeReader instance and lists of snpIDs and scanIDs. This reader would act as a database view on the data, and would allow subset data to be used with all functions taking GenotypeData arguments.

These are suggestions, and I realize that implementing them requires work, but if you ever need it I could contribute some code.

Thanks for your attention

Karl Forner
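The first and third suggestions can be sketched together with R's S4 system. This is only an illustration under stated assumptions: every class and generic name below (AbstractReader, ToyMatrixReader, ToySubsetReader, snpIDs, scanIDs, genotypes) is hypothetical and is not part of GWASTools. A virtual base class fixes the interface, and a subset reader wraps a parent reader plus the IDs to expose, without copying data when it is created.

```r
library(methods)

# Hypothetical virtual base class: it cannot be instantiated directly.
setClass("AbstractReader", representation("VIRTUAL"))

# The interface that every concrete reader is expected to implement.
setGeneric("snpIDs",    function(object) standardGeneric("snpIDs"))
setGeneric("scanIDs",   function(object) standardGeneric("scanIDs"))
setGeneric("genotypes", function(object) standardGeneric("genotypes"))

# Toy reader backed by an in-memory matrix (SNPs in rows, scans in columns).
setClass("ToyMatrixReader", contains = "AbstractReader",
         representation(geno = "matrix"))
setMethod("snpIDs",    "ToyMatrixReader", function(object) rownames(object@geno))
setMethod("scanIDs",   "ToyMatrixReader", function(object) colnames(object@geno))
setMethod("genotypes", "ToyMatrixReader", function(object) object@geno)

# Toy subset reader: keeps a reference to a parent reader and the IDs to show.
setClass("ToySubsetReader", contains = "AbstractReader",
         representation(parent = "AbstractReader",
                        snps = "character", scans = "character"))
setMethod("snpIDs",    "ToySubsetReader", function(object) object@snps)
setMethod("scanIDs",   "ToySubsetReader", function(object) object@scans)
setMethod("genotypes", "ToySubsetReader", function(object) {
  # The subset is taken only when genotypes are requested.
  genotypes(object@parent)[object@snps, object@scans, drop = FALSE]
})

geno <- matrix(c(0L, 1L, 2L, 1L, 0L, 2L), nrow = 3,
               dimnames = list(c("rs1", "rs2", "rs3"), c("s1", "s2")))
full <- new("ToyMatrixReader", geno = geno)
view <- new("ToySubsetReader", parent = full,
            snps = c("rs1", "rs3"), scans = "s2")

# A function written against the interface accepts either reader.
count_snps <- function(reader) length(snpIDs(reader))
count_snps(full)
count_snps(view)
genotypes(view)
```

Because count_snps() only calls generics declared on the base class, it does not need to know whether it received the full reader or the view, which is the property the suggestion is after.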


updated 11.3 years ago by Stephanie M. Gogarten ▴ 890 • written 11.3 years ago by Karl Forner ▴ 70


Stephanie M. Gogarten ▴ 890


University of Washington

Hi Karl,

I updated the documentation for the functions that assume chromosomes are in blocks (this had been on my to-do list for a while). The other things are good ideas, but we don't have time for them right now. If you wanted to work on them and send us some code, we'd be happy to incorporate it.

Stephanie

On 12/19/13 2:20 AM, Karl Forner wrote:
> [...]
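The "chromosomes in blocks" assumption can be checked with base R alone. The sketch below is an illustration, not GWASTools code: the helper names chrom_in_blocks and snps_per_chrom are hypothetical, and chromosomes are assumed to be coded as positive integers. It reruns the run-length logic quoted in the thread on an unsorted vector and compares it with an order-independent count.

```r
# Hypothetical helper: TRUE when every chromosome code forms a single run.
chrom_in_blocks <- function(chrom) {
  anyDuplicated(rle(chrom)$values) == 0L
}

# Hypothetical helper: SNP counts per chromosome code, whatever the SNP order.
snps_per_chrom <- function(chrom) {
  tabulate(chrom, nbins = max(chrom))
}

# Chromosome 1 appears in two separate runs, so the data are not in blocks.
chrom <- c(1L, 1L, 2L, 2L, 1L, 3L)
chrom_in_blocks(chrom)

# The run-length logic from the thread: rle() yields four runs for three
# chromosomes, so the assignment warns and stores wrong counts.
unique_chrom <- unique(chrom)
rle_chrom <- rle(chrom)
rle_chrom2 <- rep(0, max(chrom))
suppressWarnings(rle_chrom2[unique_chrom] <- rle_chrom$lengths)
rle_chrom2

# Order-independent counts for comparison.
snps_per_chrom(chrom)

# Blocks that are contiguous but not in ascending order pass the check.
chrom_in_blocks(c(2L, 2L, 1L, 1L, 1L))
```

A check like chrom_in_blocks() at the top of a function would turn a silent miscount into an explicit error. It only requires contiguous blocks, which is weaker than requiring fully sorted chromosomes.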

written 11.3 years ago by Stephanie M. Gogarten ▴ 890
