
Importing Affymetrix genotype calls with GWASTools


Vinicius Henrique da Silva


I am trying to import Affymetrix genotype call data (-1, 0, 1 and 2) using createDataFile from the GWASTools package. Here is my code, followed by the error I am getting:

library(GWASTools)
snp.anno <-   'snpID chromosome position      snpName
  AX-100676796          1   501997 AX-100676796
  AX-100120875          1   503822 AX-100120875
  AX-100067350          1   504790 AX-100067350'
snp.anno <- read.table(text=snp.anno, header=T)
signals <-  'probeset_id    sample1.cel  sample2.cel   sample3.cel
  AX-100676796-A   2126.7557   1184.8638  1134.2687
  AX-100676796-B   427.1864  2013.8512   1495.0654
  AX-100120875-A   1775.5816 2013.8512  651.1691
  AX-100120875-B    335.9226  2013.8512  1094.7429
  AX-100067350-A   2365.7755  2695.0053  2758.1739
  AX-100067350-B    2515.4818   2518.2818  28181.289 '
p1summ <- read.table(text=signals, header=T)
write.table(p1summ, "del.txt", sep="\t", col.names=T, row.names=F, quote=F)

### Make Scan
mdf <- p1summ
names <- as.data.frame(names(mdf))
names <- as.data.frame(names[-1,])
colnames(names) <- "scanName"
names$scanID <- 1:nrow(names)
names$file <- "del.txt"
scan.anno <- subset(names, select = c(scanID, scanName, file))
scan.anno$scanName <- gsub(".cel", "", scan.anno$scanName)

#scan.anno <- data.frame(scanID=1L, scanName="sample1", file="del.txt")
snp.anno$snpID <- 1:nrow(snp.anno)

p1summ <- createAffyIntensityFile(path=".", filename="tmp.gds", snp.annotation=snp.anno, scan.annotation=scan.anno, verbose=FALSE)
p1summ

(gds <- GdsIntensityReader("tmp.gds"))

getX(gds)

### Creating genotype files

geno <-  'probeset_id    sample1.cel  sample2.cel   sample3.cel
  AX-100676796   1   0  1
  AX-100120875   2 1  0
  AX-100067350   0  1  0'
geno <- read.table(text=geno, header=T)
write.table(geno, "geno.txt", sep="\t", col.names=T, row.names=F, quote=F)

col.nums <- 'snp sample
1 2'
col.nums <- read.table(text=col.nums, header=T)

path <- system.file("geno.txt", package="GWASdata")

diag.geno <- createDataFile(path=path, filename="tmp.gen", col.nums=col.nums, col.total=4, sep.type="\t", variables = "genotype", snp.annotation=snp.anno, scan.annotation=scan.anno, verbose=FALSE)

Error in .checkVars(variables, col.nums, col.total, intensity.vars) :
snp id missing in col.nums

I have probably misunderstood what 'col.nums' stands for, but I am really stuck here and would be grateful for any pointers.
Thank you very much.


Question written by Vinicius Henrique da Silva; link updated by Stephanie M. Gogarten.

Accepted Answer · 2015-10-08

As it says in the man page for createDataFile, col.nums is not a data.frame but a named integer vector. There are some other issues with your code. createDataFile (and createAffyIntensityFile) assume that you have one file per sample, with the file name given in the "file" column of the scan.annotation data frame. "path" is the directory where all these files are found, something like "/Users/my_name/my_project/raw_data". When reading genotype data, the function expects alleles, either A/B (allele.coding="AB") or A/C/G/T (allele.coding="nucleotide"), although the genotypes are stored as 0/1/2 after input (the number of A alleles, so AA = 2, AB = 1 and BB = 0). I encourage you to go through the examples in the "Data Cleaning" and "Preparing Affymetrix Data" vignettes, as they should help you understand how to use these functions.
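To make the col.nums distinction concrete, here is a small base-R check (nothing GWASTools-specific) contrasting the one-row data.frame built in the question with a named integer vector. The names "snp" and "geno" follow the working code below; see the createDataFile man page for the full set of allowed names.

```r
# What the question built: a one-row data.frame
col.df <- read.table(text = "snp sample\n1 2", header = TRUE)
is.data.frame(col.df)   # TRUE

# What createDataFile expects: a named integer vector giving,
# for each variable, its column number in the raw text file
col.nums <- c(snp = 1L, geno = 2L)
is.integer(col.nums)    # TRUE
names(col.nums)         # "snp" "geno"
```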

Here is a working version of your code:

library(GWASTools)
snp.anno <-   'snpID chromosome position      snpName
AX-100676796          1   501997 AX-100676796
AX-100120875          1   503822 AX-100120875
AX-100067350          1   504790 AX-100067350'
snp.anno <- read.table(text=snp.anno, header=T)
snp.anno$snpID <- 1:nrow(snp.anno)

# one row per sample; "file" names that sample's genotype file inside path
scan.anno <- data.frame(scanID=1:2, scanName=paste0("sample", 1:2), file=paste0("geno", 1:2, ".txt"))

geno <-  'probeset_id    sample1.cel
  AX-100676796   AB
  AX-100120875   AA
  AX-100067350   BB'
geno <- read.table(text=geno, header=T)
write.table(geno, "geno1.txt", sep="\t", col.names=T, row.names=F, quote=F)

geno <-  'probeset_id   sample2.cel
  AX-100676796   BB
  AX-100120875   AB
  AX-100067350   AB'
geno <- read.table(text=geno, header=T)
write.table(geno, "geno2.txt", sep="\t", col.names=T, row.names=F, quote=F)

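# named integer vector: column 1 holds the SNP name, column 2 the genotype
col.nums <- as.integer(c(1,2)); names(col.nums) <- c("snp", "geno")

# col.total=2: each file has two columns; skip.num=1 skips the header line;
# scan.name.in.file=0: sample names are not read from the files
diag.geno <- createDataFile(path=".", filename="tmp.gds", col.nums=col.nums, col.total=2,
                            sep.type="\t", variables="genotype",
                            snp.annotation=snp.anno, scan.annotation=scan.anno,
                            skip.num=1, scan.name.in.file=0, verbose=FALSE)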

And the result:

> (gds <- GdsGenotypeReader("tmp.gds"))
File: /projects/users/stephanie/Code/Bioconductor/tmp.gds (1.3 KB)
+    [  ]
|--+ sample.id   { Int32 2 ZIP(175.00%), 14 bytes }
|--+ snp.id   { Int32 3 ZIP(141.67%), 17 bytes }
|--+ snp.chromosome   { UInt8 3 ZIP(366.67%), 11 bytes }
|--+ snp.position   { Int32 3 ZIP(166.67%), 20 bytes }
|--+ snp.rs.id   { Int32,factor 3 ZIP(141.67%), 17 bytes } *
|--+ genotype   { Bit2 3x2, 2 bytes } *
> getGenotype(gds)
     [,1] [,2]
[1,]    1    0
[2,]    2    1
[3,]    0    1
> close(gds)
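The question starts from a single table of numeric calls with one column per sample, whereas the working code above writes one A/B file per sample by hand. Below is a minimal base-R sketch of that conversion step. The helper name split_calls_to_ab and its code.map argument are hypothetical (not part of GWASTools). The default map assumes the numbers count A alleles (2 = AA, 1 = AB, 0 = BB), which matches how the question's calls were translated in the working example above. Affymetrix call files may use the opposite convention (0 = AA, 2 = BB), so check your own data and reverse the map if needed. Calls of -1 (no call) are written as NA; how createDataFile treats that value is an assumption to verify against its man page.

```r
# Hypothetical helper, not part of GWASTools: split a table of numeric calls
# (probeset_id + one column per sample) into one tab-delimited A/B file per
# sample and return a matching scan.annotation data frame.
# Assumed coding: 2 = AA, 1 = AB, 0 = BB; any other code (e.g. -1) becomes NA.
split_calls_to_ab <- function(calls, out.dir,
                              code.map = c("2" = "AA", "1" = "AB", "0" = "BB")) {
  sample.cols <- setdiff(names(calls), "probeset_id")
  scan.names <- sub("\\.cel$", "", sample.cols)   # "sample1.cel" -> "sample1"
  files <- paste0(scan.names, ".txt")
  for (i in seq_along(sample.cols)) {
    # Look up each numeric call by name; codes missing from code.map give NA
    ab <- unname(code.map[as.character(calls[[sample.cols[i]]])])
    out <- data.frame(probeset_id = calls$probeset_id, genotype = ab)
    write.table(out, file.path(out.dir, files[i]), sep = "\t",
                col.names = TRUE, row.names = FALSE, quote = FALSE)
  }
  data.frame(scanID = seq_along(sample.cols), scanName = scan.names,
             file = files, stringsAsFactors = FALSE)
}

# Usage with the calls from the question
geno <- read.table(text = 'probeset_id sample1.cel sample2.cel sample3.cel
AX-100676796 1 0 1
AX-100120875 2 1 0
AX-100067350 0 1 0', header = TRUE)
out.dir <- tempdir()
scan.anno <- split_calls_to_ab(geno, out.dir)
scan.anno
```

The returned data frame could then be passed as scan.annotation to the same createDataFile call as in the working code, with path=out.dir; col.total=2 and skip.num=1 still fit, because each file has one header line and two columns, and the snpName values in snp.anno must match the probeset_id column.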