Entering edit mode
bkauf
•
0
@bkauf-12726
Last seen 7.8 years ago
Sorry for total newbie question, but searching has turned up nothing.
I am running R3.3.3 with bioconductor on a mac. trying to compile coding variants from a snp files, but I can't figure out the source of error. Here is the script with traceback at end.
Thanks!
Brett
-------
> library(AnnotationHub) > library(VariantAnnotation) > # Get Ensembl 85 gene location. > hub = AnnotationHub() snapshotDate(): 2016-10-11 > hub = query(hub, c("ensembl", "mus musculus", "gtf")) > ensembl = hub[["AH51040"]] Importing File into R .. loading from cache ‘/Users/brettkaufman//.AnnotationHub/57778’ using guess work to populate seqinfo > # Sanger VCF file location. > vcf.file = "~/Box Sync/Lab unfiled data/bioconductor/mgp.v5.merged.snps_all.dbSNP142.vcf.gz" > # Pick a set of your favorite genes. > target.genes = c("Apob", "Dicer1", "Serpina1a") > vcf.header = scanVcfHeader(vcf.file) > strains = samples(vcf.header)[c(5, 2, 26, 28, 16, 30, 35)] > results = NULL > for(i in 1:length(target.genes)) { + print(paste(i, "of", length(target.genes))) + gene.loc = ensembl[ensembl$gene_name == target.genes[i] & ensembl$type == "gene"] + if(length(gene.loc) == 0) { + stop(paste("Gene", target.genes[i], "not found.")) + } + param = ScanVcfParam(geno = "GT", samples = strains, + which = GRanges(seqnames = seqnames(gene.loc)[1], ranges = + IRanges(start = start(gene.loc)[1], end = end(gene.loc)[1]))) + snps = readVcf(file = vcf.file, genome = "mm10", param = param) + csq = as.list(info(snps)$CSQ) + keep = lapply(csq, function(z) { grep("missense|stop_lost|stop_gained", z) }) + keep = which(sapply(keep, length) > 0) + snps = snps[keep] + snps = genotypeCodesToNucleotides(snps) + results = rbind(results, cbind(as.data.frame(rowRanges(snps)), geno(snps)$GT)) + } # for(i) [1] "1 of 3" Error in matrix(unlist(GTstr), ncol = 2, byrow = TRUE) : 'data' must be of a vector type, was 'NULL' > write.csv(results, file = "missense_snps.csv", quote = F, row.names = F) > traceback() 3: matrix(unlist(GTstr), ncol = 2, byrow = TRUE) 2: .geno2geno(NULL, ALT, REF, GT) 1: genotypeCodesToNucleotides(snps) at #15 >
It would be good to include the output of
sessionInfo()
in your post so people can see exactly what version of the packages you're using. Currently I'm unable to replicate this error with your code. The only difference I made is to use a remote version of the VCF file since I don't want to download 21GB e.g.Here's my
sessionInfo()
. You can compared it with yours to see if there are any obvious package version differences: