Hello devels/users,
I'm interested in using GENESIS package for admixMapMM function to perform local ancestry association on data on a three-way admixed population. I've inferred local ancestry using RfmixV1. I know location of each SNP, name and per ancestry dosage (0,1,2) information per person. I've two doubts in pipeline I'm learning from (links at end ).
1) Doubt 1
genoDataList <- list()
for (anc in c("eur", "afr", "amer")){
gdsr <- GdsGenotypeReader(gds, genotypeVar=paste0("dosage_", anc))
genoDataList[[anc]] <- GenotypeData(gdsr, scanAnnot=scanAnnot)
}
Over here - I think (maybe I'm wrong) the code wants to extract dosage per ancestry. However, when I look at genoDataList$afr
, for example, the .gds data contain eur, as well as amer dosage. Detailed code below:
gds.afr <- GdsGenotypeReader(gds, genotypeVar="dosage_afr")
gds.afr
File: /tmp/Rtmpt27CXN/file74522ce3bdf6 (66.9M)
+ [ ] *
|--+ sample.id { Int32,factor 173 ZIP(40.9%), 283B } *
|--+ snp.id { Int32 20000 ZIP(34.6%), 27.1K }
|--+ snp.position { Int32 20000 ZIP(34.6%), 27.1K }
|--+ snp.chromosome { Int32 20000 ZIP(0.13%), 103B }
|--+ genotype { Bit2 20000x173, 844.7K } *
|--+ dosage_eur { Int32 173x20000, 13.2M }
|--+ dosage_afr { Float64 173x20000, 26.4M }
\--+ dosage_amer { Float64 173x20000, 26.4M }
2) Doubt 2
Information per marker I've is inferred using data (position and chromsome) from imputation(SHAPEIT2-IMPUTE2) which were transformed for RFmixv1 tool input data using reference panels to infer ancestry haplotypes. I don't have genotype (0,1,2) information per SNP per person. However, I have SNP IMPUTE2 dosage information from each SNP, per person for which I've inferred local ancestry. More on IMPUTE2 dosage in doubt 2.c.
The data structure employed in analysis, as shown in above example code has genotype variable. I'd like to know following things:
2.a - Is genotype information used in joint association analysis?
2.b If the genotype information isn't used then how do I create dummy genotype for each person per marker?
2.c If genotype information is used, then, how do I proceed? I can convert imputed data into plink file and proceed, but that's approximated based on posterior probabilities from IMPUTE2 files. The dosage information from IMPUTE2 range from 0-2, a quantitative value, and not the ancestry dosage(0,1,2).
2.d If genotype information is used in association analysis, do you think IMPUTE2 dosage information could be used to perform a joint analysis?
"https://www.biostat.washington.edu/sites/default/files/modules/2017SISG1211admixture_mapping.pdf" https://rdrr.io/bioc/GENESIS/man/admixMapMM.html
R tools and version info:
[1] GENESIS_2.8.1 gdsfmt_1.14.1 GWASTools_1.24.1
[4] Biobase_2.38.0 BiocGenerics_0.24.0
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Dear Stephanie,
Thank you very much for your prompt and detailed reply.
Can you please help with genotype variable in gds data structure (Question 2.a 2.b and 2.c)? Or, as stated:
so, the genotype variable is of no use.
I'm creating the .gds per chromosome using an in-house script by adding variable
add.gdsn
function that's why I'm bit lost.I'll not use IMPUTE2 dosage information in admix. I'll update the package, definitely.
Thank you, again.
It's true that K-1 ancestries must be used in analysis, otherwise the matrix generated would be non-invertible. Example in manual (Page 5) uses three ancestries, that's why I used three in my data due to three-way admixture present in individuals.
Manual link: https://bioconductor.org/packages/release/bioc/manuals/GENESIS/man/GENESIS.pdf
I'm afraid I don't understand your specific question about the data structure above. The example in the manual page is somewhat misleading; if you have three ancestries total, you should only test two of them. I've updated the example here: https://github.com/UW-GAC/GENESIS/blob/master/man/admixMap.Rd. The update should appear in the release of GENESIS next week.
Hi,
Thank you for clarifying. :) Sorry, for not being clearer regarding my doubt in data structure.
gds.afr
has certain attributes such as snp.id, position and position also, genotype.Are values from
genotype
ingds.afr
used in association for admixture mapping?Hi,
Thank you for clarifying. :) Sorry, for not being clearer regarding my doubt in data structure.
gds.afr
has certain attributes such as snp.id, position and position also, genotype.Are values from
genotype
ingds.afr
used in association for admixture mapping?No, not the way you've set it up here. It is possible to use three separate GDS files instead of a combined one, where each GDS file stores the local ancestry values in the "genotype" node. Then you would create the
GdsGenotypeReader
object with the default value ofgenotypeVar="genotype"
. In that case, it would use the genotype node.Thank you very much for helping me with these doubts. :)
Is it possible to have beta in this analysis? If so, then It would be great to see it in future releases.
The "Est" column in the output is the beta estimate.
Thank you. That helps.
Is
Stat joint
Joint.beta? (hopefully this is last question. :) )When analyzing data with 3 or more ancestries (so including 2 or more ancestries in the model), the Joint.Stat and Joint.pval are the test statistic and p-value of the joint Wald test of the null hypothesis that the beta for each ancestry is 0; i.e. that there is no association for any ancestry at that variant.
In this case, the function will return an Est for each of the input ancestries. This is the beta estimate for the effect of that ancestry, relative to the reference ancestry (which is the one you did not include in the model). There isn't really an equivalent concept of a "joint beta".