Sample order changes in the output of snpgdsSlidingWindow when calculating Fst, and how the window Fst is aggregated
serpalma.v ▴ 60
@serpalmav-8912
Last seen 2.9 years ago
Germany

Hello!

I am using SNPRelate to calculate Fst for sliding windows. There are two things that I cannot find information about.

(1) Suppose I pass a set of samples in a specific order, together with their corresponding populations, for example:

> samps
 [1] "H07750-L1" "H07754-L1" "H07760-L1" "H07775"    "H07762-L1" "H07782-L1"
 [7] "H07758-L1" "H07792-L1" "H07793-L1" "H07742-L1" "H07751-L1" "H07784"
[13] "H07746-L1" "H07767-L1" "H07781-L1" "H07741-L1" "H07779-L1" "H07748-L1"
[19] "H07778"    "H07773-L1"

> pops
 [1] pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2
Levels: pop1 pop2

After running the command:

res <- snpgdsSlidingWindow(genofile, winsize = 500000, shift = 250000, FUN = "snpgdsFst", sample.id = samps, population = pops, method = "W&C84")

The order of the samples is changed (sorted) in the output:

> res$sample.id
 [1] "H07741-L1" "H07742-L1" "H07746-L1" "H07748-L1" "H07750-L1" "H07751-L1"
 [7] "H07754-L1" "H07758-L1" "H07760-L1" "H07762-L1" "H07767-L1" "H07773-L1"
[13] "H07775"    "H07778"    "H07779-L1" "H07781-L1" "H07782-L1" "H07784"
[19] "H07792-L1" "H07793-L1"

I'm not sure which of these it means:

  • Is this the order in which the samples are matched to the argument population? --> not desired
  • Or does res$sample.id just list the samples that were used, while each sample was still assigned to its population as originally intended?

(2) Finally, how is the Fst score of a window calculated? Is it the arithmetic mean of all per-SNP Fst scores within the window?

Thanks in advance

SNPRelate • 1.1k views
zhengx ▴ 30
@zhengx-7950
Last seen 5.4 years ago
United States

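SNPRelate reorders "population" internally to follow the order of the sample IDs, so the sorted output does not change which population each sample belongs to. res$sample.id gives the samples in the order in which they are stored in the GDS file.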

If you are not sure whether the population order is correct, you can sort your input sample IDs into the same order as in the GDS file and supply the population labels in that same order.
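A minimal sketch of that reordering in base R, so it runs without a GDS file. The vector gds_ids is a made-up stand-in for the sample order stored in the file (with a real file it could be read with read.gdsn(index.gdsn(genofile, "sample.id")) from gdsfmt), and the four samples and their labels are illustrative, not the full data set from the question:

```r
# Hypothetical stand-in for the sample order stored in the GDS file.
# With a real file: gds_ids <- read.gdsn(index.gdsn(genofile, "sample.id"))
gds_ids <- c("H07741-L1", "H07742-L1", "H07746-L1", "H07750-L1")

# Samples and population labels in the user's own order (illustrative values).
samps <- c("H07750-L1", "H07742-L1", "H07746-L1", "H07741-L1")
pops  <- factor(c("pop1", "pop1", "pop2", "pop2"))

# Keep only the file's samples that were requested, in file order,
# and find where each of them sits in the user's vector.
keep <- gds_ids[gds_ids %in% samps]
idx  <- match(keep, samps)

# Reorder IDs and labels with the same index so every pair stays together.
samps_gds <- samps[idx]
pops_gds  <- pops[idx]

print(data.frame(sample = samps_gds, population = pops_gds))
```

Passing samps_gds and pops_gds as sample.id and population means the input order and the order of res$sample.id should agree, which makes the pairing easy to check by eye.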

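See the documentation of the function "snpgdsFst": it reports two Fst estimates, a weighted Fst and a mean Fst. snpgdsSlidingWindow() returns the weighted Fst for each window (the estimate that "W&C84" suggests), not the arithmetic mean of the per-SNP Fst values.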
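To illustrate why the two estimates differ, here is a small sketch with made-up numbers. It assumes the usual multi-locus form of the Weir and Cockerham (1984) estimator, where the weighted Fst is the sum of the per-SNP numerators divided by the sum of the per-SNP denominators, while the mean Fst averages the per-SNP ratios. The vectors num and den are hypothetical variance components, not values produced by SNPRelate:

```r
# Hypothetical per-SNP variance components for one window (made-up values):
# num = between-population component, den = total variance.
num <- c(0.02, 0.00, 0.10)
den <- c(0.40, 0.05, 0.50)

# Per-SNP Fst, then the plain arithmetic mean of those ratios.
fst_snp  <- num / den
mean_fst <- mean(fst_snp)

# Weighted Fst: ratio of sums, so SNPs with larger total variance count more.
weighted_fst <- sum(num) / sum(den)

print(c(mean = mean_fst, weighted = weighted_fst))
```

The two numbers coincide only in special cases, for example when every SNP has the same denominator, so a window score should not be read as a simple average of per-SNP Fst.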
