Question

Genesis: doubts with admixMAPMM

0

Entering edit mode

GENOMIC_region • 0

@genomic_region-13050

Last seen 4.2 years ago

Hello devels/users,

I'm interested in using GENESIS package for admixMapMM function to perform local ancestry association on data on a three-way admixed population. I've inferred local ancestry using RfmixV1. I know location of each SNP, name and per ancestry dosage (0,1,2) information per person. I've two doubts in pipeline I'm learning from (links at end ).

1) Doubt 1

genoDataList <- list()
for (anc in c("eur", "afr", "amer")){
  gdsr <- GdsGenotypeReader(gds, genotypeVar=paste0("dosage_", anc))
  genoDataList[[anc]] <- GenotypeData(gdsr, scanAnnot=scanAnnot)
}

Over here - I think (maybe I'm wrong) the code wants to extract dosage per ancestry. However, when I look at genoDataList$afr, for example, the .gds data contain eur, as well as amer dosage. Detailed code below:

 gds.afr <- GdsGenotypeReader(gds, genotypeVar="dosage_afr")
 gds.afr
File: /tmp/Rtmpt27CXN/file74522ce3bdf6 (66.9M)
+    [  ] *
|--+ sample.id   { Int32,factor 173 ZIP(40.9%), 283B } *
|--+ snp.id   { Int32 20000 ZIP(34.6%), 27.1K }
|--+ snp.position   { Int32 20000 ZIP(34.6%), 27.1K }
|--+ snp.chromosome   { Int32 20000 ZIP(0.13%), 103B }
|--+ genotype   { Bit2 20000x173, 844.7K } *
|--+ dosage_eur   { Int32 173x20000, 13.2M }
|--+ dosage_afr   { Float64 173x20000, 26.4M }
\--+ dosage_amer   { Float64 173x20000, 26.4M }

2) Doubt 2

Information per marker I've is inferred using data (position and chromsome) from imputation(SHAPEIT2-IMPUTE2) which were transformed for RFmixv1 tool input data using reference panels to infer ancestry haplotypes. I don't have genotype (0,1,2) information per SNP per person. However, I have SNP IMPUTE2 dosage information from each SNP, per person for which I've inferred local ancestry. More on IMPUTE2 dosage in doubt 2.c.

The data structure employed in analysis, as shown in above example code has genotype variable. I'd like to know following things:

2.a - Is genotype information used in joint association analysis?

2.b If the genotype information isn't used then how do I create dummy genotype for each person per marker?

2.c If genotype information is used, then, how do I proceed? I can convert imputed data into plink file and proceed, but that's approximated based on posterior probabilities from IMPUTE2 files. The dosage information from IMPUTE2 range from 0-2, a quantitative value, and not the ancestry dosage(0,1,2).

2.d If genotype information is used in association analysis, do you think IMPUTE2 dosage information could be used to perform a joint analysis?

"https://www.biostat.washington.edu/sites/default/files/modules/2017SISG1211admixture_mapping.pdf" https://rdrr.io/bioc/GENESIS/man/admixMapMM.html

R tools and version info:

[1] GENESIS_2.8.1       gdsfmt_1.14.1       GWASTools_1.24.1
[4] Biobase_2.38.0      BiocGenerics_0.24.0
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

genesis admixmap • 3.4k views

ADD COMMENT • link updated 6.0 years ago by Stephanie M. Gogarten ▴ 890 • written 6.0 years ago by GENOMIC_region • 0

score 2 · Accepted Answer · 2019-04-24

2

Entering edit mode

Stephanie M. Gogarten ▴ 890

@stephanie-m-gogarten-5121

Last seen 9 months ago

University of Washington

The admixMapMM function is deprecated, and has been replaced by the admixMap function in current versions of GENESIS. However, I think your questions apply equally to the old and new versions.

The show method for a GdsGenotypeReader object displays all the information in the GDS file, regardless of which node is chosen to read genotypes from. When you call the getGenotype method on genoDataList$afr, it will read from the dosage_afr node.
The admixMapMM and admixMap functions will only use data from the nodes you specify with the genotypeVar argument when you create the GdsGenotypeReader object, as in the example. It looks like you have stored RFMix local ancestry dosages in the "dosageafr", "dosageeur", and "dosage_amer" nodes, which is exactly what the function expects. It will perform a joint test across as many ancestries as you provide in the genoDataList object, which should perhaps be two instead of three, if you only have three ancestries in total. Quoting one of my GENESIS co-authors: "The reason for this is that the sum of all local ancestries at a particular locus must add up to 2, so if there are K ancestry groups, then only K-1 genotypes can be included since one local ancestry count can be written as a linear combination of all of the other local ancestry counts, resulting in collinearity and a matrix that won't be invertible."

The IMPUTE2 genotype dosage should not be used in the admixMap function, but can be used to test for association using assocTestSingle in current versions of GENESIS (or assocTestMM in the version you are using, but please be aware that we can no longer provide support for that version).

ADD COMMENT • link 6.0 years ago Stephanie M. Gogarten ▴ 890

0

Entering edit mode

Dear Stephanie,

Thank you very much for your prompt and detailed reply.
Can you please help with genotype variable in gds data structure (Question 2.a 2.b and 2.c)? Or, as stated:

The admixMapMM and admixMap functions will only use data from the nodes you specify with the genotypeVar argument

so, the genotype variable is of no use.
I'm creating the .gds per chromosome using an in-house script by adding variable add.gdsn function that's why I'm bit lost.

I'll not use IMPUTE2 dosage information in admix. I'll update the package, definitely.

Thank you, again.

ADD REPLY • link 6.0 years ago GENOMIC_region • 0

0

Entering edit mode

It's true that K-1 ancestries must be used in analysis, otherwise the matrix generated would be non-invertible. Example in manual (Page 5) uses three ancestries, that's why I used three in my data due to three-way admixture present in individuals.

dosage_eur <- sample(0:2, nsnp*nsamp, replace=TRUE)
dosage_afr <- ifelse(dosage_eur == 2, 0, sample(0:1, nsnp*nsamp, replace=TRUE))
dosage_amer <- 2 - dosage_eur - dosage_afr
....

genoDataList <- list()
for (anc in c("eur", "afr", "amer")){

Manual link: https://bioconductor.org/packages/release/bioc/manuals/GENESIS/man/GENESIS.pdf

ADD REPLY • link 6.0 years ago GENOMIC_region • 0

1

Entering edit mode

I'm afraid I don't understand your specific question about the data structure above. The example in the manual page is somewhat misleading; if you have three ancestries total, you should only test two of them. I've updated the example here: https://github.com/UW-GAC/GENESIS/blob/master/man/admixMap.Rd. The update should appear in the release of GENESIS next week.

ADD REPLY • link 6.0 years ago Stephanie M. Gogarten ▴ 890

0

Entering edit mode

Hi,
Thank you for clarifying. :) Sorry, for not being clearer regarding my doubt in data structure.

gds.afr has certain attributes such as snp.id, position and position also, genotype.

gds.afr <- GdsGenotypeReader(gds, genotypeVar="dosage_afr")
 gds.afr
File: /tmp/Rtmpt27CXN/file74522ce3bdf6 (66.9M)
+    [  ] *
|--+ genotype   { Bit2 20000x173, 844.7K } *

Are values from genotype in gds.afr used in association for admixture mapping?

ADD REPLY • link 6.0 years ago GENOMIC_region • 0

0

Entering edit mode

Hi,
Thank you for clarifying. :) Sorry, for not being clearer regarding my doubt in data structure.

gds.afr has certain attributes such as snp.id, position and position also, genotype.

gds.afr <- GdsGenotypeReader(gds, genotypeVar="dosage_afr")
 gds.afr
File: /tmp/Rtmpt27CXN/file74522ce3bdf6 (66.9M)
+    [  ] *
|--+ genotype   { Bit2 20000x173, 844.7K } *

Are values from genotype in gds.afr used in association for admixture mapping?

ADD REPLY • link 6.0 years ago GENOMIC_region • 0

1

Entering edit mode

No, not the way you've set it up here. It is possible to use three separate GDS files instead of a combined one, where each GDS file stores the local ancestry values in the "genotype" node. Then you would create the GdsGenotypeReader object with the default value of genotypeVar="genotype". In that case, it would use the genotype node.

ADD REPLY • link 6.0 years ago Stephanie M. Gogarten ▴ 890

0

Entering edit mode

Thank you very much for helping me with these doubts. :)

Is it possible to have beta in this analysis? If so, then It would be great to see it in future releases.

ADD REPLY • link 6.0 years ago GENOMIC_region • 0

1

Entering edit mode

The "Est" column in the output is the beta estimate.

ADD REPLY • link 6.0 years ago Stephanie M. Gogarten ▴ 890

0

Entering edit mode

Thank you. That helps.

Is Stat joint Joint.beta? (hopefully this is last question. :) )

ADD REPLY • link 6.0 years ago GENOMIC_region • 0

1

Entering edit mode

When analyzing data with 3 or more ancestries (so including 2 or more ancestries in the model), the Joint.Stat and Joint.pval are the test statistic and p-value of the joint Wald test of the null hypothesis that the beta for each ancestry is 0; i.e. that there is no association for any ancestry at that variant.

In this case, the function will return an Est for each of the input ancestries. This is the beta estimate for the effect of that ancestry, relative to the reference ancestry (which is the one you did not include in the model). There isn't really an equivalent concept of a "joint beta".

ADD REPLY • link 5.9 years ago mconomos ▴ 70