"set genotype" function for gds?
2
0
Entering edit mode
Lna • 0
@lna-10651
Last seen 7.6 years ago

I'm using GWASTools for my GWAS analysis. It turned out that there are some genotyping problems with a few hundred SNPs on the X-chromosome. I already have all my data stored in gds files for use in GWASTools and I want to change the genotypes of the erroneous SNPs directly in the gds file. I was hoping there is kind of a "setGenotype" function with works exactly like the "getGenotype" function that I use to obtain specific SNP genotypes, but couldn't find anything like that.

Did I miss anything? How can genotype changes of specific SNPs be done in gds files without going back to raw data file and creating a new gds file based on them?

Thank you for your help!

gwastools gds • 2.2k views
ADD COMMENT
1
Entering edit mode
@stephanie-m-gogarten-5121
Last seen 6 months ago
University of Washington

You need to use the gdsfmt package to modify GDS files directly.

library(gdsfmt)

file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gdsfile <- tempfile()
file.copy(file, gdsfile)

# open file in write mode
gdsobj <- openfn.gds(gdsfile, readonly=FALSE)

# get node of the genotype variable
node <- index.gdsn(gdsobj, "genotype")

# uncompress node so we can write to it
compression.gdsn(node, "")

# change some values for 100 SNPs
nsnp <- 100
nsamp <- 77
newsnps <- matrix(sample(c(0,1,2), size=nsnp*nsamp, replace=TRUE), nrow=nsnp, ncol=nsamp)

# write new values for SNPs 101:200
write.gdsn(node, newsnps, start=c(101,1), count=c(100,77))

# close
closefn.gds(gdsobj)
cleanup.gds(gdsfile)

file.remove(gdsfile)

 

ADD COMMENT
0
Entering edit mode

There's one thing... When I read an entry for one SNP which includes "NA"s 

(like: NA   NA   NA   NA   NA   NA   NA   NA    2    NA    NA    NA    NA    NA    NA    NA     1    NA    NA    NA     2    NA     2   2  NA    NA    NA    NA)

from a gds file and try to write it with write.gdsn to another, the "NA"s are exchanged by "0"s when I read the same data from the changed gds file
(0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 2 0 2 2 0 0 0 0)

I checked the help entry for write.gdsn, because I hoped I can choose what happens to missing entries and it says:

"GDS format does not support missing characters NA, and any NA will be converted to a blank string "". "

I don't really get it. In general there can be missing values in a gds file (that is where my data came from), but how can I set them with write.gdsn?

ADD REPLY
0
Entering edit mode

The standard missing code for genotype gds files is 3. The GWASTools getGenotype function takes care of converting a value of 3 to NA, but the gdsfmt package does not. When writing the new genotype values, set the missing genotypes to 3 before calling write.gdsn. For example:

geno[is.na(geno)] <- 3

write.gdsn(node, geno, ...)
ADD REPLY
0
Entering edit mode

Thank you! Now everything seems to work. Maybe it would make sense to include this information in the help text for write.gdsn...

ADD REPLY
0
Entering edit mode
Lna • 0
@lna-10651
Last seen 7.6 years ago

Thank you! That worked fine!

ADD COMMENT

Login before adding your answer.

Traffic: 800 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6