Error in locateVariants
1
0
Entering edit mode
@jolly-shrivastava-6060
Last seen 10.1 years ago
Hello Valerie, I am running VariantAnnotation package to locate the position of SNPs using locateVariants. The commands that I gave to extract vcf file and the gff file are as follows: ###Reading the vcf file vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa") rd<-rowData(vcf1) ####imported and converted the gff3 file to GRangesList object subject<-import.gff("annotation.gff3") myGRanges<-as(subject,"GRanges") myGRangesList<-GRangesList(myGRanges) ### made sure that both of them have same sequence levels rd<-keepSeqlevels(rd,intersect(seqlevels(rd),seqlevels(myGRangesList)) myGRangesList<-keepSeqlevels(myGRangesList,intersect(seqlevels(rd),seq levels(myGRangesList)) ###Ran the locateVariants command variants_found<-locateVariants(rd,myGRangesList,CodingVariants()) ###Got the following error Error in DataFrame(...) : different row counts implied by arguments Can you please point to what I am doing wrong? Regards Jolly -- Jolly Shrivastava Ph.D candidate (Genetics, Genomics and Bioinformatics) University of California, Riverside [[alternative HTML version deleted]]
VariantAnnotation VariantAnnotation VariantAnnotation VariantAnnotation • 1.7k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.7 years ago
United States
Hi Jolly, I need a reproducible example in order to help. Are 'rd' and 'myGRangesList' too large to send? Also provide the output of sessionInfo(). Valerie On 07/24/2013 11:38 AM, Jolly Shrivastava wrote: > Hello Valerie, > > I am running VariantAnnotation package to locate the position of SNPs using > locateVariants. > > The commands that I gave to extract vcf file and the gff file are as > follows: > > ###Reading the vcf file > vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa") > rd<-rowData(vcf1) > > ####imported and converted the gff3 file to GRangesList object > subject<-import.gff("annotation.gff3") > myGRanges<-as(subject,"GRanges") > myGRangesList<-GRangesList(myGRanges) > > > ### made sure that both of them have same sequence levels > rd<-keepSeqlevels(rd,intersect(seqlevels(rd),seqlevels(myGRangesList)) > myGRangesList<-keepSeqlevels(myGRangesList,intersect(seqlevels(rd),s eqlevels(myGRangesList)) > > ###Ran the locateVariants command > variants_found<-locateVariants(rd,myGRangesList,CodingVariants()) > > ###Got the following error > Error in DataFrame(...) : different row counts implied by arguments > > Can you please point to what I am doing wrong? > > Regards > Jolly > >
ADD COMMENT
0
Entering edit mode
Thanks for sending the files. Please keep the conversation 'on-list' by cc'ing bioconductor at r-project.org. There was a bug in locateVariants() related to 'features' not having txid or cdsid. That is now fixed in devel (1.7.38) and release (1.6.7). Thanks for reporting this. A couple things to note: (1) The second argument to readVcf() is 'genome'. Here you are setting the genome to the string 'Pinfestans.fa' string which is probably not what you intended to do. vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa") (2) When 'features' is a GRangesList it must be an appropriate extraction to match the 'region' argument (explained on the man page). You are interested in CodingVariants so the GRangesList must be equivalent to a cdsBy(txdb, "transcript") extraction. The 'myGRangesList' object is of length 1 with 801 different seqlevels in the single list element. myGRangesList<-GRangesList(myGRanges) > length(myGRangesList) [1] 1 > length(seqlevels(myGRangesList)) [1] 801 The idea is that cds regions from the same transcript (i.e., same seqlevel) are grouped in list elements. Assuming that the seqlevels in this file are transcript names, you could subset by 'type' and split on seqname to get cds by transcripts. > cds <- myGRanges[myGRanges$type == "CDS"] > cdsby <- split(cds, seqnames(cds)) > length(cdsby) [1] 801 An alternative approach to is to make a TranscriptDb from your gff file with makeTranscriptDbFromGFF() in GenomicFeatures. Either the txdb or cdsby extraction can be used as 'features'. txdb <- makeTranscriptDbFromGFF("annotation.gff3") cdsby <- cdsBy(txdb, "transcript") Oops. Looks like this gff file has something strange going on. I'll send the report to Marc. > txdb <- makeTranscriptDbFromGFF("PI_T30-4_FINAL_CALLGENES_3.annotation.gff3") Error in .parse_attrCol(attrCol, file, colnames) : Some attributes do not conform to 'tag=value' format At any rate, makeTranscriptDbFromGFF() can be a useful function to convert gff to txdb for well behaved files. Valerie On 07/24/2013 01:34 PM, Valerie Obenchain wrote: > Hi Jolly, > > I need a reproducible example in order to help. Are 'rd' and > 'myGRangesList' too large to send? Also provide the output of > sessionInfo(). > > Valerie > > > On 07/24/2013 11:38 AM, Jolly Shrivastava wrote: >> Hello Valerie, >> >> I am running VariantAnnotation package to locate the position of SNPs >> using >> locateVariants. >> >> The commands that I gave to extract vcf file and the gff file are as >> follows: >> >> ###Reading the vcf file >> vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa") >> rd<-rowData(vcf1) >> >> ####imported and converted the gff3 file to GRangesList object >> subject<-import.gff("annotation.gff3") >> myGRanges<-as(subject,"GRanges") >> myGRangesList<-GRangesList(myGRanges) >> >> >> ### made sure that both of them have same sequence levels >> rd<-keepSeqlevels(rd,intersect(seqlevels(rd),seqlevels(myGRangesList)) >> myGRangesList<-keepSeqlevels(myGRangesList,intersect(seqlevels(rd), seqlevels(myGRangesList)) >> >> >> ###Ran the locateVariants command >> variants_found<-locateVariants(rd,myGRangesList,CodingVariants()) >> >> ###Got the following error >> Error in DataFrame(...) : different row counts implied by arguments >> >> Can you please point to what I am doing wrong? >> >> Regards >> Jolly >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6