The devel version of GENESIS includes a vignette that may be helpful: http://bioconductor.org/packages/devel/bioc/vignettes/GENESIS/inst/doc/assoc_test_seq.html
1) How many SNPs should be used for the KING relationship matrix?
We recommend LD pruning to select SNPs. The SNPRelate function snpgdsLDpruning can be used for this. We usually set a minor allele frequency threshold in the pruning function to eliminate rare variants. After pruning, we usually end up with 200,000 to 300,000 SNPs.
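As a rough sketch of the pruning step followed by the KING calculation: the file name genotypes.gds is hypothetical, and the maf, ld.threshold, and slide.max.bp values are illustrative assumptions rather than fixed recommendations, so adjust them for your data.

```r
library(SNPRelate)

# Hypothetical GDS file; replace with your own genotype data
gds <- snpgdsOpen("genotypes.gds")

# LD pruning with a MAF filter to drop rare variants (thresholds are illustrative)
set.seed(100)  # pruning picks a random starting SNP on each chromosome
snpset <- snpgdsLDpruning(gds, method = "corr", maf = 0.01,
                          slide.max.bp = 10e6, ld.threshold = sqrt(0.1),
                          verbose = FALSE)
pruned <- unlist(snpset, use.names = FALSE)
length(pruned)  # number of SNPs retained after pruning

# KING-robust kinship estimates from the pruned SNPs
king <- snpgdsIBDKING(gds, snp.id = pruned, verbose = FALSE)
kingMat <- king$kinship
dimnames(kingMat) <- list(king$sample.id, king$sample.id)

snpgdsClose(gds)
```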
2) What is the best way to select the number of PCs to be used as covariates?
You want to select PCs that are informative for distinguishing populations. A good way to do this is to make a parallel coordinates plot color-coded by population or self-identified race, as illustrated in the vignette. Look for the last PC that separates groups of colors instead of looking like noise.
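One possible way to draw such a plot is with ggparcoord from the GGally package. The data below are simulated purely for illustration (the column names PC1 to PC6 and the pop labels are made up); with real data you would use the PC-AiR eigenvectors and your own population labels.

```r
library(GGally)

# Simulated stand-in for PCA output: 90 samples, 6 PCs, 3 populations
set.seed(1)
n <- 90
pop <- rep(c("A", "B", "C"), each = n / 3)
pcs <- data.frame(
  PC1 = rnorm(n, mean = c(A = -2, B = 0, C = 2)[pop]),  # separates all groups
  PC2 = rnorm(n, mean = c(A = 1, B = -2, C = 1)[pop]),  # separates B from A and C
  PC3 = rnorm(n),                                       # remaining PCs are noise
  PC4 = rnorm(n),
  PC5 = rnorm(n),
  PC6 = rnorm(n),
  pop = pop
)

# One line per sample, colored by population; each axis is a PC
ggparcoord(pcs, columns = 1:6, groupColumn = "pop",
           scale = "uniminmax", alphaLines = 0.5)
```

In this toy example the colors separate on PC1 and PC2 and are mixed from PC3 onward, so PC1 and PC2 would be the ones to carry forward as covariates.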
3) How many SNPs should be used to estimate PCs using the PC-AiR and PC-Relate methods?
The recommendations for LD pruning apply here also. We often do another round of LD pruning using only unrelated samples (selected with the pcairPartition function).
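A sketch of that second round, written as a helper function: prune_unrelated is a hypothetical name, kingMat is assumed to be a KING kinship matrix with sample IDs as row and column names, and the thresholds are illustrative.

```r
library(SNPRelate)
library(GENESIS)

# Hypothetical helper: re-run LD pruning using only the unrelated samples.
# gds     : an open GDS file handle (e.g. from snpgdsOpen)
# kingMat : KING kinship matrix with sample IDs as dimnames
prune_unrelated <- function(gds, kingMat, maf = 0.01) {
  # Partition samples into related and unrelated sets;
  # the KING matrix serves as both the kinship and the divergence measure
  part <- pcairPartition(kinobj = kingMat, divobj = kingMat,
                         kin.thresh = 2^(-9/2), div.thresh = -2^(-9/2))

  # LD pruning restricted to the unrelated set (thresholds are illustrative)
  snpset <- snpgdsLDpruning(gds, sample.id = part$unrels, method = "corr",
                            maf = maf, slide.max.bp = 10e6,
                            ld.threshold = sqrt(0.1), verbose = FALSE)

  list(unrels = part$unrels, snp.id = unlist(snpset, use.names = FALSE))
}

# Usage, given an open gds handle and a KING matrix:
# res <- prune_unrelated(gds, kingMat)
# length(res$snp.id)
```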
4) Does the association model take care of NA in the phenotype data, or do the samples need to be removed before performing association testing?
fitNullModel will remove any samples with NA in the phenotype data prior to fitting the null model. However, I recommend explicitly selecting non-missing samples with the sample.id argument, because it makes it much easier to keep track of exactly how many samples are being used in your analysis and reduces the possibility of errors.
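For example, something along these lines. The phenotype table here is toy data invented for illustration, and no covariance matrix is supplied, so this fits a plain linear model; in a real analysis you would pass your GRM through cov.mat.

```r
library(Biobase)
library(GENESIS)

# Toy phenotype data (made up); fitNullModel expects a "sample.id" column
pheno <- data.frame(
  sample.id = paste0("S", 1:8),
  outcome   = c(1.2, NA, 0.7, 2.1, 1.5, 0.9, 1.8, 1.1),
  age       = c(34, 51, 47, NA, 29, 62, 45, 38),
  stringsAsFactors = FALSE
)
annot <- AnnotatedDataFrame(pheno)

# Keep only samples with no missing values in the outcome or covariates
model.vars <- c("outcome", "age")
keep <- pheno$sample.id[complete.cases(pheno[, model.vars])]
length(keep)  # record how many samples go into the model

# Fit the null model on exactly those samples
nullmod <- fitNullModel(annot, outcome = "outcome", covars = "age",
                        family = "gaussian", sample.id = keep,
                        verbose = FALSE)
```

The same keep vector can then be reused when subsetting genotype data, so the sample set stays consistent across every step.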