Hello,
Explicit interface for GenotypeReaders
--------------------------------------
I am a big fan of the GenotypeData object architecture, which makes it
possible to use a single object type with any representation or storage
of the actual genotypes, thanks to its GenotypeReader concept.
But from what I've seen, the different readers only informally follow a
common interface that is not clearly defined.
For example, the method hasVariable() is not available for
MatrixGenotypeReader.
When developing functions that take a GenotypeData as argument, it is
important to know which interface is safe to use.
I believe this is a very good use case for an abstract class
GenotypeReader that each specialized reader should derive from.
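
To make the idea concrete, here is a minimal S4 sketch of what such an abstract class could look like. It is standalone and does not load GWASTools: the class names GenotypeReaderBase and ToyMatrixReader are hypothetical, and the generics hasVariable() and getGenotype() are redefined locally only for illustration, so their signatures are assumptions rather than the package's actual ones.

```r
library(methods)

# Hypothetical virtual base class: it cannot be instantiated directly.
setClass("GenotypeReaderBase", representation("VIRTUAL"))

# Generics forming the interface every concrete reader must implement.
setGeneric("hasVariable",
           function(object, varname) standardGeneric("hasVariable"))
setGeneric("getGenotype",
           function(object, ...) standardGeneric("getGenotype"))

# Fallback methods on the base class: a reader that forgets to implement
# part of the interface fails with an explicit message.
setMethod("hasVariable", "GenotypeReaderBase", function(object, varname) {
  stop("hasVariable() is not implemented for class ", class(object))
})
setMethod("getGenotype", "GenotypeReaderBase", function(object, ...) {
  stop("getGenotype() is not implemented for class ", class(object))
})

# Toy concrete reader (hypothetical) backed by an in-memory matrix.
setClass("ToyMatrixReader",
         representation(genotype = "matrix"),
         contains = "GenotypeReaderBase")
setMethod("hasVariable", "ToyMatrixReader", function(object, varname) {
  varname %in% "genotype"
})
setMethod("getGenotype", "ToyMatrixReader", function(object, ...) {
  object@genotype
})

reader <- new("ToyMatrixReader",
              genotype = matrix(c(0L, 1L, 2L, 1L), nrow = 2))
```

With this layout, a function can check is(reader, "GenotypeReaderBase") once and then rely on every method declared for the base class.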
Sorted GenotypeData
---------------------------
I realized that some functions rely on the SNPs being sorted by
chromosome.
In assocTestRegression(), for instance, these lines of code give wrong
results if the chromosomes are not sorted:
chrom <- getChromosome(genoData)
unique_chrom <- unique(chrom)
nChromosomes <- max(chrom)
rle_chrom <- rle(chrom)
rle_chrom2 <- rep(0, nChromosomes)
rle_chrom2[unique_chrom] <- rle_chrom$lengths
I think that either the function documentation should clearly state
that it takes sorted genotype data as argument, or the stronger
requirement that all GenotypeData objects must be sorted should be
enforced.
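
Here is a small standalone illustration of the problem, using a made-up unsorted chromosome vector. It reruns the logic quoted above, then shows one possible order-independent count with tabulate() and a cheap sortedness guard with is.unsorted(). The helper name check_sorted() is hypothetical, and this only addresses the per-chromosome counts; other parts of the function may depend on the ordering as well.

```r
# Hypothetical chromosome codes, deliberately not sorted.
chrom <- c(1L, 1L, 2L, 1L, 2L)

# Logic quoted above: rle() counts runs, not chromosomes, so with
# unsorted input there are more runs than unique chromosomes.
unique_chrom <- unique(chrom)
nChromosomes <- max(chrom)
rle_chrom <- rle(chrom)
rle_chrom2 <- rep(0, nChromosomes)
suppressWarnings(rle_chrom2[unique_chrom] <- rle_chrom$lengths)

# Order-independent alternative: count occurrences of each integer code.
snps_per_chrom <- tabulate(chrom, nbins = max(chrom))

# Cheap guard a function could run before relying on sorted input.
check_sorted <- function(chrom) {
  if (is.unsorted(chrom)) stop("SNPs must be sorted by chromosome")
  invisible(TRUE)
}
```

On this input, snps_per_chrom holds the true counts (3 SNPs on chromosome 1, 2 on chromosome 2), while rle_chrom2 does not, and check_sorted(chrom) stops with an error.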
Subset GenotypeReader
---------------------------------
A useful addition would be a SubsetGenotypeReader that takes as
arguments an existing GenotypeReader instance and lists of snpIDs and
scanIDs.
This reader would act like a database view on the data, and would make
it possible to use subset data with all functions taking GenotypeData
arguments.
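
As a rough sketch of the view idea, the standalone S4 code below wraps a toy parent reader and translates the requested IDs into row and column indices. All names here (ToyReader, SubsetReaderSketch, subsetReader(), readGenotype()) are hypothetical and not part of GWASTools, and I assume SNPs are stored in rows and scans in columns.

```r
library(methods)

# Toy stand-in for an existing reader (hypothetical, not a GWASTools class).
setClass("ToyReader",
         representation(genotype = "matrix",
                        snpID = "integer",
                        scanID = "integer"))

# Hypothetical view class: keeps the parent reader plus index vectors,
# without copying any genotype data.
setClass("SubsetReaderSketch",
         representation(parent = "ToyReader",
                        snpIndex = "integer",
                        scanIndex = "integer"))

# Constructor: translate the requested IDs into positions in the parent.
subsetReader <- function(parent, snpIDs, scanIDs) {
  snpIndex <- match(snpIDs, parent@snpID)
  scanIndex <- match(scanIDs, parent@scanID)
  if (anyNA(snpIndex) || anyNA(scanIndex)) {
    stop("unknown snpID or scanID")
  }
  new("SubsetReaderSketch", parent = parent,
      snpIndex = snpIndex, scanIndex = scanIndex)
}

setGeneric("readGenotype", function(object) standardGeneric("readGenotype"))
setMethod("readGenotype", "ToyReader", function(object) object@genotype)

# The view delegates to its parent and then applies the stored indices.
setMethod("readGenotype", "SubsetReaderSketch", function(object) {
  readGenotype(object@parent)[object@snpIndex, object@scanIndex,
                              drop = FALSE]
})

parent <- new("ToyReader",
              genotype = matrix(0:11, nrow = 4, ncol = 3),
              snpID = 101:104, scanID = 1:3)
view <- subsetReader(parent, snpIDs = c(104L, 102L), scanIDs = c(1L, 3L))
sub_geno <- readGenotype(view)
```

The toy method reads the whole parent matrix before subsetting; a reader backed by a file would more likely pass the indices down to the underlying storage so that only the requested block is read.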
These are only suggestions, and I realize that implementing them
requires work, but if you ever need it I could contribute some code.
Thanks for your attention
Karl Forner