mutant allele read counts
4
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.2 years ago
Hi, I am interested in extracting information for functional annotation using CRAVAT. It requires the data to be in the following format. =========================================== # UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional) TR1 chr17 7577506 - G T TCGA-02-0231 TR2 chr10 123279680 - G A TCGA-02-3512 TR3 chr13 49033967 + C A TCGA-02-3532 TR4 chr7 116417505 + G T TCGA-02-1523 TR5 chr7 140453136 - T A TCGA-02-0023 TR6 chr17 37880998 + G T TCGA-02-0252 Ins1 chr17 37880998 + G GT TCGA-02-0252 Del1 chr17 37880998 + GA G TCGA-02-0252 CSub1 chr2 39871235 + ATGCT GA TCGA-02-0252 =============================================== http://www.cravat.us/help.jsp?chapter=how_to_cite&article=# I am trying to extract this information from vcf files generated by mutect. I am using VariantAnnotation extract this information. I have read the file using readVcf(), and renamed the chromosomes according to txdb. rowData(newVcfData) GRanges with 62991 ranges and 5 metadata columns: seqnames ranges strand | paramRangeID <rle> <iranges> <rle> | <factor> 1:109641_A/G chr1 [109641, 109641] * | <na> 1:526561_T/G chr1 [526561, 526561] * | <na> 1:691958_G/A chr1 [691958, 691958] * | <na> 1:763781_A/T chr1 [763781, 763781] * | <na> rs6594026 chr1 [782981, 782981] * | <na> ... ... ... ... ... ... rs480725 chrX [154903224, 154903224] * | <na> X:154925893_C/T chrX [154925893, 154925893] * | <na> X:155038107_C/G chrX [155038107, 155038107] * | <na> X:155204257_G/T chrX [155204257, 155204257] * | <na> X:155234730_T/C chrX [155234730, 155234730] * | <na> REF ALT QUAL FILTER <dnastringset> <dnastringsetlist> <numeric> <character> 1:109641_A/G A G 8.90 PASS 1:526561_T/G T G 9.19 PASS 1:691958_G/A G A 13.74 PASS 1:763781_A/T A T 16.03 PASS rs6594026 C T 11.24 PASS ... ... ... ... ... rs480725 A T 6.39 PASS X:154925893_C/T C T 6.53 PASS X:155038107_C/G C G 6.64 PASS X:155204257_G/T G T 6.35 PASS X:155234730_T/C T C 6.51 PASS --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX NA NA NA NA NA NA ... NA NA NA NA NA NA Can the information be extracted using VariantAnnotation()? I would appreciate your help with this. Thanks ../Murli -- output of sessionInfo(): > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 [2] GenomicFeatures_1.14.5 [3] AnnotationDbi_1.24.0 [4] Biobase_2.22.0 [5] VariantAnnotation_1.8.13 [6] Rsamtools_1.14.3 [7] Biostrings_2.30.1 [8] GenomicRanges_1.14.4 [9] XVector_0.2.0 [10] IRanges_1.20.7 [11] BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7 [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.7 stats4_3.0.2 [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 -- Sent via the guest posting facility at bioconductor.org.
VariantAnnotation VariantAnnotation VariantAnnotation VariantAnnotation • 1.8k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi, Use readVcfAsVRanges() then coerce to a data.frame. fl <- system.file("extdata", "chr7-sub.vcf.gz", package="VariantAnnotation") vr <- readVcfAsVRanges(fl, "hg19") df <- as.data.frame(vr) You'll have some extra columns in the data.frame but you can remove / rename columns as necessary. Valerie On 06/13/2014 10:46 AM, Murli [guest] wrote: > Hi, > I am interested in extracting information for functional annotation using CRAVAT. It requires the data to be in the following format. > =========================================== > # UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional) > TR1 chr17 7577506 - G T TCGA-02-0231 > TR2 chr10 123279680 - G A TCGA-02-3512 > TR3 chr13 49033967 + C A TCGA-02-3532 > TR4 chr7 116417505 + G T TCGA-02-1523 > TR5 chr7 140453136 - T A TCGA-02-0023 > TR6 chr17 37880998 + G T TCGA-02-0252 > Ins1 chr17 37880998 + G GT TCGA-02-0252 > Del1 chr17 37880998 + GA G TCGA-02-0252 > CSub1 chr2 39871235 + ATGCT GA TCGA-02-0252 > > =============================================== > http://www.cravat.us/help.jsp?chapter=how_to_cite&article=# > > I am trying to extract this information from vcf files generated by mutect. I am using VariantAnnotation extract this information. I have read the file using readVcf(), and renamed the chromosomes according to txdb. > > rowData(newVcfData) > GRanges with 62991 ranges and 5 metadata columns: > seqnames ranges strand | paramRangeID > <rle> <iranges> <rle> | <factor> > 1:109641_A/G chr1 [109641, 109641] * | <na> > 1:526561_T/G chr1 [526561, 526561] * | <na> > 1:691958_G/A chr1 [691958, 691958] * | <na> > 1:763781_A/T chr1 [763781, 763781] * | <na> > rs6594026 chr1 [782981, 782981] * | <na> > ... ... ... ... ... ... > rs480725 chrX [154903224, 154903224] * | <na> > X:154925893_C/T chrX [154925893, 154925893] * | <na> > X:155038107_C/G chrX [155038107, 155038107] * | <na> > X:155204257_G/T chrX [155204257, 155204257] * | <na> > X:155234730_T/C chrX [155234730, 155234730] * | <na> > REF ALT QUAL FILTER > <dnastringset> <dnastringsetlist> <numeric> <character> > 1:109641_A/G A G 8.90 PASS > 1:526561_T/G T G 9.19 PASS > 1:691958_G/A G A 13.74 PASS > 1:763781_A/T A T 16.03 PASS > rs6594026 C T 11.24 PASS > ... ... ... ... ... > rs480725 A T 6.39 PASS > X:154925893_C/T C T 6.53 PASS > X:155038107_C/G C G 6.64 PASS > X:155204257_G/T G T 6.35 PASS > X:155234730_T/C T C 6.51 PASS > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX > NA NA NA NA NA NA ... NA NA NA NA NA NA > > > Can the information be extracted using VariantAnnotation()? I would appreciate your help with this. > Thanks ../Murli > > > > -- output of sessionInfo(): > >> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 > [2] GenomicFeatures_1.14.5 > [3] AnnotationDbi_1.24.0 > [4] Biobase_2.22.0 > [5] VariantAnnotation_1.8.13 > [6] Rsamtools_1.14.3 > [7] Biostrings_2.30.1 > [8] GenomicRanges_1.14.4 > [9] XVector_0.2.0 > [10] IRanges_1.20.7 > [11] BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7 > [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.7 stats4_3.0.2 > [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > -- Valerie Obenchain Program in Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: vobencha at fhcrc.org Phone: (206) 667-3158
ADD COMMENT
0
Entering edit mode
Hi Valerie, Thanks for the help. I am getting the following errors when I am reading the vcf files. vr <- readVcfAsVRanges(vcfFile, "hg19") Error in lapply(ivar[inms], drop) : error in evaluating the argument 'X' in selecting a method for function 'lapply': Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names With another file I am getting the following > source("vcfToCravat.R") Error in validObject(.Object) : invalid class VRanges Cheers../Murli On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain <vobencha@fhcrc.org> wrote: > Hi, > > Use readVcfAsVRanges() then coerce to a data.frame. > > fl <- system.file("extdata", "chr7-sub.vcf.gz", > package="VariantAnnotation") > vr <- readVcfAsVRanges(fl, "hg19") > df <- as.data.frame(vr) > > You'll have some extra columns in the data.frame but you can remove / > rename columns as necessary. > > Valerie > > > > > > On 06/13/2014 10:46 AM, Murli [guest] wrote: > >> Hi, >> I am interested in extracting information for functional annotation using >> CRAVAT. It requires the data to be in the following format. >> =========================================== >> # UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID >> (optional) >> TR1 chr17 7577506 - G T TCGA-02-0231 >> TR2 chr10 123279680 - G A TCGA-02-3512 >> TR3 chr13 49033967 + C A TCGA-02-3532 >> TR4 chr7 116417505 + G T TCGA-02-1523 >> TR5 chr7 140453136 - T A TCGA-02-0023 >> TR6 chr17 37880998 + G T TCGA-02-0252 >> Ins1 chr17 37880998 + G GT TCGA-02-0252 >> Del1 chr17 37880998 + GA G TCGA-02-0252 >> CSub1 chr2 39871235 + ATGCT GA TCGA-02-0252 >> >> =============================================== >> http://www.cravat.us/help.jsp?chapter=how_to_cite&article=# >> >> I am trying to extract this information from vcf files generated by >> mutect. I am using VariantAnnotation extract this information. I have read >> the file using readVcf(), and renamed the chromosomes according to txdb. >> >> rowData(newVcfData) >> GRanges with 62991 ranges and 5 metadata columns: >> seqnames ranges strand | paramRangeID >> <rle> <iranges> <rle> | <factor> >> 1:109641_A/G chr1 [109641, 109641] * | <na> >> 1:526561_T/G chr1 [526561, 526561] * | <na> >> 1:691958_G/A chr1 [691958, 691958] * | <na> >> 1:763781_A/T chr1 [763781, 763781] * | <na> >> rs6594026 chr1 [782981, 782981] * | <na> >> ... ... ... ... ... ... >> rs480725 chrX [154903224, 154903224] * | <na> >> X:154925893_C/T chrX [154925893, 154925893] * | <na> >> X:155038107_C/G chrX [155038107, 155038107] * | <na> >> X:155204257_G/T chrX [155204257, 155204257] * | <na> >> X:155234730_T/C chrX [155234730, 155234730] * | <na> >> REF ALT QUAL FILTER >> <dnastringset> <dnastringsetlist> <numeric> <character> >> 1:109641_A/G A G 8.90 PASS >> 1:526561_T/G T G 9.19 PASS >> 1:691958_G/A G A 13.74 PASS >> 1:763781_A/T A T 16.03 PASS >> rs6594026 C T 11.24 PASS >> ... ... ... ... ... >> rs480725 A T 6.39 PASS >> X:154925893_C/T C T 6.53 PASS >> X:155038107_C/G C G 6.64 PASS >> X:155204257_G/T G T 6.35 PASS >> X:155234730_T/C T C 6.51 PASS >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 >> chrX >> NA NA NA NA NA NA ... NA NA NA NA NA >> NA >> >> >> Can the information be extracted using VariantAnnotation()? I would >> appreciate your help with this. >> Thanks ../Murli >> >> >> >> -- output of sessionInfo(): >> >> sessionInfo() >>> >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 >> [2] GenomicFeatures_1.14.5 >> [3] AnnotationDbi_1.24.0 >> [4] Biobase_2.22.0 >> [5] VariantAnnotation_1.8.13 >> [6] Rsamtools_1.14.3 >> [7] Biostrings_2.30.1 >> [8] GenomicRanges_1.14.4 >> [9] XVector_0.2.0 >> [10] IRanges_1.20.7 >> [11] BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7 >> [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.7 >> stats4_3.0.2 >> [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> > > -- > Valerie Obenchain > Program in Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, Seattle, WA 98109 > > Email: vobencha@fhcrc.org > Phone: (206) 667-3158 > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Sorry, forgot to post the code contained in the R file in the earlier post. These are the few lines in vcfToCravat.R that I am sourcing. library(VariantAnnotation) DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/analysi s/" vcfFile=paste(DataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep ="") vcfFile=paste(DataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="") vr <- readVcfAsVRanges(vcfFile, "hg19") df <- as.data.frame(vr) On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair@gmail.com> wrote: > Hi Valerie, > Thanks for the help. I am getting the following errors when I am reading > the vcf files. > vr <- readVcfAsVRanges(vcfFile, "hg19") > Error in lapply(ivar[inms], drop) : > error in evaluating the argument 'X' in selecting a method for function > 'lapply': Error in normalizeSingleBracketSubscript(j, x) : > subscript contains invalid names > > With another file I am getting the following > > source("vcfToCravat.R") > Error in validObject(.Object) : > invalid class VRanges > > Cheers../Murli > > > On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain <vobencha@fhcrc.org> > wrote: > >> Hi, >> >> Use readVcfAsVRanges() then coerce to a data.frame. >> >> fl <- system.file("extdata", "chr7-sub.vcf.gz", >> package="VariantAnnotation") >> vr <- readVcfAsVRanges(fl, "hg19") >> df <- as.data.frame(vr) >> >> You'll have some extra columns in the data.frame but you can remove / >> rename columns as necessary. >> >> Valerie >> >> >> >> >> >> On 06/13/2014 10:46 AM, Murli [guest] wrote: >> >>> Hi, >>> I am interested in extracting information for functional annotation >>> using CRAVAT. It requires the data to be in the following format. >>> =========================================== >>> # UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID >>> (optional) >>> TR1 chr17 7577506 - G T TCGA-02-0231 >>> TR2 chr10 123279680 - G A TCGA-02-3512 >>> TR3 chr13 49033967 + C A TCGA-02-3532 >>> TR4 chr7 116417505 + G T TCGA-02-1523 >>> TR5 chr7 140453136 - T A TCGA-02-0023 >>> TR6 chr17 37880998 + G T TCGA-02-0252 >>> Ins1 chr17 37880998 + G GT TCGA-02-0252 >>> Del1 chr17 37880998 + GA G TCGA-02-0252 >>> CSub1 chr2 39871235 + ATGCT GA TCGA-02-0252 >>> >>> =============================================== >>> http://www.cravat.us/help.jsp?chapter=how_to_cite&article=# >>> >>> I am trying to extract this information from vcf files generated by >>> mutect. I am using VariantAnnotation extract this information. I have read >>> the file using readVcf(), and renamed the chromosomes according to txdb. >>> >>> rowData(newVcfData) >>> GRanges with 62991 ranges and 5 metadata columns: >>> seqnames ranges strand | >>> paramRangeID >>> <rle> <iranges> <rle> | >>> <factor> >>> 1:109641_A/G chr1 [109641, 109641] * | >>> <na> >>> 1:526561_T/G chr1 [526561, 526561] * | >>> <na> >>> 1:691958_G/A chr1 [691958, 691958] * | >>> <na> >>> 1:763781_A/T chr1 [763781, 763781] * | >>> <na> >>> rs6594026 chr1 [782981, 782981] * | >>> <na> >>> ... ... ... ... ... >>> ... >>> rs480725 chrX [154903224, 154903224] * | >>> <na> >>> X:154925893_C/T chrX [154925893, 154925893] * | >>> <na> >>> X:155038107_C/G chrX [155038107, 155038107] * | >>> <na> >>> X:155204257_G/T chrX [155204257, 155204257] * | >>> <na> >>> X:155234730_T/C chrX [155234730, 155234730] * | >>> <na> >>> REF ALT QUAL >>> FILTER >>> <dnastringset> <dnastringsetlist> <numeric> >>> <character> >>> 1:109641_A/G A G 8.90 >>> PASS >>> 1:526561_T/G T G 9.19 >>> PASS >>> 1:691958_G/A G A 13.74 >>> PASS >>> 1:763781_A/T A T 16.03 >>> PASS >>> rs6594026 C T 11.24 >>> PASS >>> ... ... ... ... >>> ... >>> rs480725 A T 6.39 >>> PASS >>> X:154925893_C/T C T 6.53 >>> PASS >>> X:155038107_C/G C G 6.64 >>> PASS >>> X:155204257_G/T G T 6.35 >>> PASS >>> X:155234730_T/C T C 6.51 >>> PASS >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 >>> chr9 chrX >>> NA NA NA NA NA NA ... NA NA NA NA >>> NA NA >>> >>> >>> Can the information be extracted using VariantAnnotation()? I would >>> appreciate your help with this. >>> Thanks ../Murli >>> >>> >>> >>> -- output of sessionInfo(): >>> >>> sessionInfo() >>>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 >>> [2] GenomicFeatures_1.14.5 >>> [3] AnnotationDbi_1.24.0 >>> [4] Biobase_2.22.0 >>> [5] VariantAnnotation_1.8.13 >>> [6] Rsamtools_1.14.3 >>> [7] Biostrings_2.30.1 >>> [8] GenomicRanges_1.14.4 >>> [9] XVector_0.2.0 >>> [10] IRanges_1.20.7 >>> [11] BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7 >>> [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.7 >>> stats4_3.0.2 >>> [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> >> >> -- >> Valerie Obenchain >> Program in Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, Seattle, WA 98109 >> >> Email: vobencha@fhcrc.org >> Phone: (206) 667-3158 >> > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi, I see you've got an old version of the package. The release version is 1.10.1. FYI you can see all release versions here: http://www.bioconductor.org/checkResults/release/bioc-LATEST/ Please update R and reinstall packages with biocLite(). If you need any help with that guidelines are here: http://www.bioconductor.org/install/ If you want to send me a small test file offline I can confirm that the current version can read it. Valerie On 06/13/2014 03:06 PM, Murli wrote: > Sorry, forgot to post the code contained in the R file in the earlier > post. These are the few lines in vcfToCravat.R that I am sourcing. > library(VariantAnnotation) > DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/analy sis/" > vcfFile=paste(DataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep > ="") > vcfFile=paste(ataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="") > vr <- readVcfAsVRanges(vcfFile, "hg19") > df <- as.data.frame(vr) > > > On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair at="" gmail.com=""> <mailto:murlinair at="" gmail.com="">> wrote: > > Hi Valerie, > Thanks for the help. I am getting the following errors when I am > reading the vcf files. > vr <- readVcfAsVRanges(vcfFile, "hg19") > Error in lapply(ivar[inms], drop) : > error in evaluating the argument 'X' in selecting a method for > function 'lapply': Error in normalizeSingleBracketSubscript(j, x) : > subscript contains invalid names > > With another file I am getting the following > > source("vcfToCravat.R") > Error in validObject(.Object) : > invalid class VRanges > > Cheers../Murli > > > On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain > <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">> wrote: > > Hi, > > Use readVcfAsVRanges() then coerce to a data.frame. > > fl <- system.file("extdata", "chr7-sub.vcf.gz", > package="VariantAnnotation") > vr <- readVcfAsVRanges(fl, "hg19") > df <- as.data.frame(vr) > > You'll have some extra columns in the data.frame but you can > remove / rename columns as necessary. > > Valerie > > > > > > On 06/13/2014 10:46 AM, Murli [guest] wrote: > > Hi, > I am interested in extracting information for functional > annotation using CRAVAT. It requires the data to be in the > following format. > ==============================__============= > # UID / Chr. / Position / Strand / Ref. base / Alt. base / > Sample ID (optional) > TR1 chr17 7577506 - G T TCGA-02-0231 > TR2 chr10 123279680 - G A > TCGA-02-3512 > TR3 chr13 49033967 + C A > TCGA-02-3532 > TR4 chr7 116417505 + G T > TCGA-02-1523 > TR5 chr7 140453136 - T A > TCGA-02-0023 > TR6 chr17 37880998 + G T > TCGA-02-0252 > Ins1 chr17 37880998 + G GT > TCGA-02-0252 > Del1 chr17 37880998 + GA G > TCGA-02-0252 > CSub1 chr2 39871235 + ATGCT GA > TCGA-02-0252 > > ==============================__================= > http://www.cravat.us/help.jsp?__chapter=how_to_cite&article=# <http: www.cravat.us="" help.jsp?chapter="how_to_cite&amp;article=#"> > > I am trying to extract this information from vcf files > generated by mutect. I am using VariantAnnotation extract > this information. I have read the file using readVcf(), and > renamed the chromosomes according to txdb. > > rowData(newVcfData) > GRanges with 62991 ranges and 5 metadata columns: > seqnames ranges strand > | paramRangeID > <rle> <iranges> <rle> > | <factor> > 1:109641_A/G chr1 [109641, 109641] * > | <na> > 1:526561_T/G chr1 [526561, 526561] * > | <na> > 1:691958_G/A chr1 [691958, 691958] * > | <na> > 1:763781_A/T chr1 [763781, 763781] * > | <na> > rs6594026 chr1 [782981, 782981] * > | <na> > ... ... ... ... > ... ... > rs480725 chrX [154903224, 154903224] * > | <na> > X:154925893_C/T chrX [154925893, 154925893] * > | <na> > X:155038107_C/G chrX [155038107, 155038107] * > | <na> > X:155204257_G/T chrX [155204257, 155204257] * > | <na> > X:155234730_T/C chrX [155234730, 155234730] * > | <na> > REF ALT > QUAL FILTER > <dnastringset> <dnastringsetlist> > <numeric> <character> > 1:109641_A/G A G > 8.90 PASS > 1:526561_T/G T G > 9.19 PASS > 1:691958_G/A G A > 13.74 PASS > 1:763781_A/T A T > 16.03 PASS > rs6594026 C T > 11.24 PASS > ... ... ... > ... ... > rs480725 A T > 6.39 PASS > X:154925893_C/T C T > 6.53 PASS > X:155038107_C/G C G > 6.64 PASS > X:155204257_G/T G T > 6.35 PASS > X:155234730_T/C T C > 6.51 PASS > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 > chr7 chr8 chr9 chrX > NA NA NA NA NA NA ... NA NA > NA NA NA NA > > > Can the information be extracted using VariantAnnotation()? > I would appreciate your help with this. > Thanks ../Murli > > > > -- output of sessionInfo(): > > sessionInfo() > > R version 3.0.2 (2013-09-25) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils > datasets methods > [8] base > > other attached packages: > [1] TxDb.Hsapiens.UCSC.hg19.__knownGene_2.10.1 > [2] GenomicFeatures_1.14.5 > [3] AnnotationDbi_1.24.0 > [4] Biobase_2.22.0 > [5] VariantAnnotation_1.8.13 > [6] Rsamtools_1.14.3 > [7] Biostrings_2.30.1 > [8] GenomicRanges_1.14.4 > [9] XVector_0.2.0 > [10] IRanges_1.20.7 > [11] BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 > DBI_0.2-7 > [5] RCurl_1.95-4.1 RSQLite_0.11.4 > rtracklayer_1.22.7 stats4_3.0.2 > [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 > > > -- > Sent via the guest posting facility at bioconductor.org > <http: bioconductor.org="">. > > > > -- > Valerie Obenchain > Program in Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, Seattle, WA 98109 > > Email: vobencha at fhcrc.org <mailto:vobencha at="" fhcrc.org=""> > Phone: (206) 667-3158 <tel:%28206%29%20667-3158> > > > -- Valerie Obenchain Program in Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: vobencha at fhcrc.org Phone: (206) 667-3158
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Thanks for providing the test files. The subset bug you hit was fixed in devel a few weeks ago. I've ported it to release as 1.10.2 which should be available via biocLite() Monday noon-ish or from svn immediately. With 1.10.2 I can read the snv file fine: vr <- readVcfAsVRanges("3_Middle_lobe_tumor-- 2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf", "") >> vr[1:3,] > VRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand ref alt > <rle> <iranges> <rle> <character> <characterorrle> > [1] 1 [109641, 109641] + A G > [2] 1 [526561, 526561] + T G > [3] 1 [691958, 691958] + G A > totalDepth refDepth altDepth sampleNames > <integerorrle> <integerorrle> <integerorrle> <factororrle> > [1] <na> 31 5 3_Middle_lobe_tumor > [2] <na> 56 5 3_Middle_lobe_tumor > [3] <na> 41 7 3_Middle_lobe_tumor > softFilterMatrix > <matrix> > [1] > [2] > [3] Trying to read the indel file gives this error: > vr <- readVcfAsVRanges("3_Middle_lobe_tumor-- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf", "") Error in validObject(.Object) : invalid class ?VRanges? object: 'refDepth' must be non-negative Look at the depth variable only with readGeno(). > fl <- "3_Middle_lobe_tumor- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf" > ad <- readGeno(fl, "AD") > dim(ad) [1] 547698 2 2 > min(ad) [1] -23 The negative depth may need some investigating. Valerie >> sessionInfo() > R version 3.1.0 Patched (2014-04-18 r65405) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] VariantAnnotation_1.10.2 Rsamtools_1.16.0 Biostrings_2.32.0 > [4] XVector_0.4.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 > [7] IRanges_1.22.9 BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 > [4] Biobase_2.24.0 BiocParallel_0.6.1 biomaRt_2.20.0 > [7] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 > [10] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 > [13] fail_1.2 foreach_1.4.2 GenomicAlignments_1.0.1 > [16] GenomicFeatures_1.16.2 iterators_1.0.7 plyr_1.8.1 > [19] Rcpp_0.11.2 RCurl_1.95-4.1 RSQLite_0.11.4 > [22] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 > [25] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 > [28] zlibbioc_1.10.0 On 06/13/2014 04:33 PM, Valerie Obenchain wrote: > Hi, > > I see you've got an old version of the package. The release version is > 1.10.1. FYI you can see all release versions here: > > http://www.bioconductor.org/checkResults/release/bioc-LATEST/ > > Please update R and reinstall packages with biocLite(). If you need any > help with that guidelines are here: > http://www.bioconductor.org/install/ > > If you want to send me a small test file offline I can confirm that the > current version can read it. > > > Valerie > > > On 06/13/2014 03:06 PM, Murli wrote: >> Sorry, forgot to post the code contained in the R file in the earlier >> post. These are the few lines in vcfToCravat.R that I am sourcing. >> library(VariantAnnotation) >> DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/anal ysis/" >> vcfFile=paste(DataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep >> >> ="") >> vcfFile=paste(ataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="") >> >> vr <- readVcfAsVRanges(vcfFile, "hg19") >> df <- as.data.frame(vr) >> >> >> On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair at="" gmail.com="">> <mailto:murlinair at="" gmail.com="">> wrote: >> >> Hi Valerie, >> Thanks for the help. I am getting the following errors when I am >> reading the vcf files. >> vr <- readVcfAsVRanges(vcfFile, "hg19") >> Error in lapply(ivar[inms], drop) : >> error in evaluating the argument 'X' in selecting a method for >> function 'lapply': Error in normalizeSingleBracketSubscript(j, x) : >> subscript contains invalid names >> >> With another file I am getting the following >> > source("vcfToCravat.R") >> Error in validObject(.Object) : >> invalid class VRanges >> >> Cheers../Murli >> >> >> On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain >> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">> wrote: >> >> Hi, >> >> Use readVcfAsVRanges() then coerce to a data.frame. >> >> fl <- system.file("extdata", "chr7-sub.vcf.gz", >> package="VariantAnnotation") >> vr <- readVcfAsVRanges(fl, "hg19") >> df <- as.data.frame(vr) >> >> You'll have some extra columns in the data.frame but you can >> remove / rename columns as necessary. >> >> Valerie >> >> >> >> >> >> On 06/13/2014 10:46 AM, Murli [guest] wrote: >> >> Hi, >> I am interested in extracting information for functional >> annotation using CRAVAT. It requires the data to be in the >> following format. >> ==============================__============= >> # UID / Chr. / Position / Strand / Ref. base / Alt. base / >> Sample ID (optional) >> TR1 chr17 7577506 - G T TCGA-02-0231 >> TR2 chr10 123279680 - G A >> TCGA-02-3512 >> TR3 chr13 49033967 + C A >> TCGA-02-3532 >> TR4 chr7 116417505 + G T >> TCGA-02-1523 >> TR5 chr7 140453136 - T A >> TCGA-02-0023 >> TR6 chr17 37880998 + G T >> TCGA-02-0252 >> Ins1 chr17 37880998 + G GT >> TCGA-02-0252 >> Del1 chr17 37880998 + GA G >> TCGA-02-0252 >> CSub1 chr2 39871235 + ATGCT GA >> TCGA-02-0252 >> >> ==============================__================= >> >> http://www.cravat.us/help.jsp?__chapter=how_to_cite&article=# >> <http: www.cravat.us="" help.jsp?chapter="how_to_cite&amp;article=#"> >> >> I am trying to extract this information from vcf files >> generated by mutect. I am using VariantAnnotation extract >> this information. I have read the file using readVcf(), and >> renamed the chromosomes according to txdb. >> >> rowData(newVcfData) >> GRanges with 62991 ranges and 5 metadata columns: >> seqnames ranges strand >> | paramRangeID >> <rle> <iranges> <rle> >> | <factor> >> 1:109641_A/G chr1 [109641, 109641] * >> | <na> >> 1:526561_T/G chr1 [526561, 526561] * >> | <na> >> 1:691958_G/A chr1 [691958, 691958] * >> | <na> >> 1:763781_A/T chr1 [763781, 763781] * >> | <na> >> rs6594026 chr1 [782981, 782981] * >> | <na> >> ... ... ... ... >> ... ... >> rs480725 chrX [154903224, 154903224] * >> | <na> >> X:154925893_C/T chrX [154925893, 154925893] * >> | <na> >> X:155038107_C/G chrX [155038107, 155038107] * >> | <na> >> X:155204257_G/T chrX [155204257, 155204257] * >> | <na> >> X:155234730_T/C chrX [155234730, 155234730] * >> | <na> >> REF ALT >> QUAL FILTER >> <dnastringset> <dnastringsetlist> >> <numeric> <character> >> 1:109641_A/G A G >> 8.90 PASS >> 1:526561_T/G T G >> 9.19 PASS >> 1:691958_G/A G A >> 13.74 PASS >> 1:763781_A/T A T >> 16.03 PASS >> rs6594026 C T >> 11.24 PASS >> ... ... ... >> ... ... >> rs480725 A T >> 6.39 PASS >> X:154925893_C/T C T >> 6.53 PASS >> X:155038107_C/G C G >> 6.64 PASS >> X:155204257_G/T G T >> 6.35 PASS >> X:155234730_T/C T C >> 6.51 PASS >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 >> chr7 chr8 chr9 chrX >> NA NA NA NA NA NA ... NA NA >> NA NA NA NA >> >> >> Can the information be extracted using VariantAnnotation()? >> I would appreciate your help with this. >> Thanks ../Murli >> >> >> >> -- output of sessionInfo(): >> >> sessionInfo() >> >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils >> datasets methods >> [8] base >> >> other attached packages: >> [1] TxDb.Hsapiens.UCSC.hg19.__knownGene_2.10.1 >> [2] GenomicFeatures_1.14.5 >> [3] AnnotationDbi_1.24.0 >> [4] Biobase_2.22.0 >> [5] VariantAnnotation_1.8.13 >> [6] Rsamtools_1.14.3 >> [7] Biostrings_2.30.1 >> [8] GenomicRanges_1.14.4 >> [9] XVector_0.2.0 >> [10] IRanges_1.20.7 >> [11] BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 >> DBI_0.2-7 >> [5] RCurl_1.95-4.1 RSQLite_0.11.4 >> rtracklayer_1.22.7 stats4_3.0.2 >> [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 >> >> >> -- >> Sent via the guest posting facility at bioconductor.org >> <http: bioconductor.org="">. >> >> >> >> -- >> Valerie Obenchain >> Program in Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, Seattle, WA 98109 >> >> Email: vobencha at fhcrc.org <mailto:vobencha at="" fhcrc.org=""> >> Phone: (206) 667-3158 <tel:%28206%29%20667-3158> >> >> >> > > -- Valerie Obenchain Program in Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: vobencha at fhcrc.org Phone: (206) 667-3158
ADD COMMENT
0
Entering edit mode
Thanks Valerie. Shall use the new version and go from there. Cheers../Murli On Sat, Jun 14, 2014 at 11:16 PM, Valerie Obenchain <vobencha@fhcrc.org> wrote: > Thanks for providing the test files. The subset bug you hit was fixed in > devel a few weeks ago. I've ported it to release as 1.10.2 which should be > available via biocLite() Monday noon-ish or from svn immediately. > > With 1.10.2 I can read the snv file fine: > > vr <- readVcfAsVRanges("3_Middle_lobe_tumor--2_mucosal_normal. > snv.mutect.v1.1.4.annotated.vcf", "") > > vr[1:3,] >>> >> VRanges with 3 ranges and 0 metadata columns: >> seqnames ranges strand ref alt >> <rle> <iranges> <rle> <character> <characterorrle> >> [1] 1 [109641, 109641] + A G >> [2] 1 [526561, 526561] + T G >> [3] 1 [691958, 691958] + G A >> totalDepth refDepth altDepth sampleNames >> <integerorrle> <integerorrle> <integerorrle> <factororrle> >> [1] <na> 31 5 3_Middle_lobe_tumor >> [2] <na> 56 5 3_Middle_lobe_tumor >> [3] <na> 41 7 3_Middle_lobe_tumor >> softFilterMatrix >> <matrix> >> [1] >> [2] >> [3] >> > > Trying to read the indel file gives this error: > > > vr <- readVcfAsVRanges("3_Middle_lobe_tumor-- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf", > "") > Error in validObject(.Object) : > invalid class “VRanges” object: 'refDepth' must be non- negative > > > Look at the depth variable only with readGeno(). > > > fl <- "3_Middle_lobe_tumor- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf" > > ad <- readGeno(fl, "AD") > > dim(ad) > [1] 547698 2 2 > > min(ad) > [1] -23 > > The negative depth may need some investigating. > > Valerie > > sessionInfo() >>> >> R version 3.1.0 Patched (2014-04-18 r65405) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] VariantAnnotation_1.10.2 Rsamtools_1.16.0 Biostrings_2.32.0 >> >> [4] XVector_0.4.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 >> [7] IRanges_1.22.9 BiocGenerics_0.10.0 >> >> loaded via a namespace (and not attached): >> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 >> [4] Biobase_2.24.0 BiocParallel_0.6.1 biomaRt_2.20.0 >> [7] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 >> [10] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 >> [13] fail_1.2 foreach_1.4.2 >> GenomicAlignments_1.0.1 >> [16] GenomicFeatures_1.16.2 iterators_1.0.7 plyr_1.8.1 >> [19] Rcpp_0.11.2 RCurl_1.95-4.1 RSQLite_0.11.4 >> [22] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 >> [25] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 >> [28] zlibbioc_1.10.0 >> > > > On 06/13/2014 04:33 PM, Valerie Obenchain wrote: > >> Hi, >> >> I see you've got an old version of the package. The release version is >> 1.10.1. FYI you can see all release versions here: >> >> http://www.bioconductor.org/checkResults/release/bioc-LATEST/ >> >> Please update R and reinstall packages with biocLite(). If you need any >> help with that guidelines are here: >> http://www.bioconductor.org/install/ >> >> If you want to send me a small test file offline I can confirm that the >> current version can read it. >> >> >> Valerie >> >> >> On 06/13/2014 03:06 PM, Murli wrote: >> >>> Sorry, forgot to post the code contained in the R file in the earlier >>> post. These are the few lines in vcfToCravat.R that I am sourcing. >>> library(VariantAnnotation) >>> DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_ >>> tumor/analysis/" >>> vcfFile=paste(DataDir,"3_Middle_lobe_tumor--2_mucosal_ >>> normal.snv.mutect.v1.1.4.annotated.vcf",sep >>> >>> ="") >>> vcfFile=paste(ataDir,"3_Middle_lobe_tumor--2_mucosal_ >>> normal.indel.sominddet.v2.3-9.vcf",sep="") >>> >>> vr <- readVcfAsVRanges(vcfFile, "hg19") >>> df <- as.data.frame(vr) >>> >>> >>> On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair@gmail.com>>> <mailto:murlinair@gmail.com>> wrote: >>> >>> Hi Valerie, >>> Thanks for the help. I am getting the following errors when I am >>> reading the vcf files. >>> vr <- readVcfAsVRanges(vcfFile, "hg19") >>> Error in lapply(ivar[inms], drop) : >>> error in evaluating the argument 'X' in selecting a method for >>> function 'lapply': Error in normalizeSingleBracketSubscript(j, x) : >>> subscript contains invalid names >>> >>> With another file I am getting the following >>> > source("vcfToCravat.R") >>> Error in validObject(.Object) : >>> invalid class VRanges >>> >>> Cheers../Murli >>> >>> >>> On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain >>> <vobencha@fhcrc.org <mailto:vobencha@fhcrc.org="">> wrote: >>> >>> Hi, >>> >>> Use readVcfAsVRanges() then coerce to a data.frame. >>> >>> fl <- system.file("extdata", "chr7-sub.vcf.gz", >>> package="VariantAnnotation") >>> vr <- readVcfAsVRanges(fl, "hg19") >>> df <- as.data.frame(vr) >>> >>> You'll have some extra columns in the data.frame but you can >>> remove / rename columns as necessary. >>> >>> Valerie >>> >>> >>> >>> >>> >>> On 06/13/2014 10:46 AM, Murli [guest] wrote: >>> >>> Hi, >>> I am interested in extracting information for functional >>> annotation using CRAVAT. It requires the data to be in the >>> following format. >>> ==============================__============= >>> >>> # UID / Chr. / Position / Strand / Ref. base / Alt. base / >>> Sample ID (optional) >>> TR1 chr17 7577506 - G T TCGA-02-0231 >>> TR2 chr10 123279680 - G A >>> TCGA-02-3512 >>> TR3 chr13 49033967 + C A >>> TCGA-02-3532 >>> TR4 chr7 116417505 + G T >>> TCGA-02-1523 >>> TR5 chr7 140453136 - T A >>> TCGA-02-0023 >>> TR6 chr17 37880998 + G T >>> TCGA-02-0252 >>> Ins1 chr17 37880998 + G GT >>> TCGA-02-0252 >>> Del1 chr17 37880998 + GA G >>> TCGA-02-0252 >>> CSub1 chr2 39871235 + ATGCT GA >>> TCGA-02-0252 >>> >>> ==============================__================= >>> >>> http://www.cravat.us/help.jsp?__chapter=how_to_cite&article=# >>> >>> <http: www.cravat.us="" help.jsp?chapter="how_to_cite&amp;article=#"> >>> >>> I am trying to extract this information from vcf files >>> generated by mutect. I am using VariantAnnotation extract >>> this information. I have read the file using readVcf(), and >>> renamed the chromosomes according to txdb. >>> >>> rowData(newVcfData) >>> GRanges with 62991 ranges and 5 metadata columns: >>> seqnames ranges strand >>> | paramRangeID >>> <rle> <iranges> <rle> >>> | <factor> >>> 1:109641_A/G chr1 [109641, 109641] * >>> | <na> >>> 1:526561_T/G chr1 [526561, 526561] * >>> | <na> >>> 1:691958_G/A chr1 [691958, 691958] * >>> | <na> >>> 1:763781_A/T chr1 [763781, 763781] * >>> | <na> >>> rs6594026 chr1 [782981, 782981] * >>> | <na> >>> ... ... ... ... >>> ... ... >>> rs480725 chrX [154903224, 154903224] * >>> | <na> >>> X:154925893_C/T chrX [154925893, 154925893] * >>> | <na> >>> X:155038107_C/G chrX [155038107, 155038107] * >>> | <na> >>> X:155204257_G/T chrX [155204257, 155204257] * >>> | <na> >>> X:155234730_T/C chrX [155234730, 155234730] * >>> | <na> >>> REF ALT >>> QUAL FILTER >>> <dnastringset> <dnastringsetlist> >>> <numeric> <character> >>> 1:109641_A/G A G >>> 8.90 PASS >>> 1:526561_T/G T G >>> 9.19 PASS >>> 1:691958_G/A G A >>> 13.74 PASS >>> 1:763781_A/T A T >>> 16.03 PASS >>> rs6594026 C T >>> 11.24 PASS >>> ... ... ... >>> ... ... >>> rs480725 A T >>> 6.39 PASS >>> X:154925893_C/T C T >>> 6.53 PASS >>> X:155038107_C/G C G >>> 6.64 PASS >>> X:155204257_G/T G T >>> 6.35 PASS >>> X:155234730_T/C T C >>> 6.51 PASS >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 >>> chr7 chr8 chr9 chrX >>> NA NA NA NA NA NA ... NA NA >>> NA NA NA NA >>> >>> >>> Can the information be extracted using VariantAnnotation()? >>> I would appreciate your help with this. >>> Thanks ../Murli >>> >>> >>> >>> -- output of sessionInfo(): >>> >>> sessionInfo() >>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils >>> datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] TxDb.Hsapiens.UCSC.hg19.__knownGene_2.10.1 >>> >>> [2] GenomicFeatures_1.14.5 >>> [3] AnnotationDbi_1.24.0 >>> [4] Biobase_2.22.0 >>> [5] VariantAnnotation_1.8.13 >>> [6] Rsamtools_1.14.3 >>> [7] Biostrings_2.30.1 >>> [8] GenomicRanges_1.14.4 >>> [9] XVector_0.2.0 >>> [10] IRanges_1.20.7 >>> [11] BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 >>> DBI_0.2-7 >>> [5] RCurl_1.95-4.1 RSQLite_0.11.4 >>> rtracklayer_1.22.7 stats4_3.0.2 >>> [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org >>> <http: bioconductor.org="">. >>> >>> >>> >>> >>> -- >>> Valerie Obenchain >>> Program in Computational Biology >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, Seattle, WA 98109 >>> >>> Email: vobencha@fhcrc.org <mailto:vobencha@fhcrc.org> >>> Phone: (206) 667-3158 <tel:%28206%29%20667-3158> >>> >>> >>> >>> >> >> > > -- > Valerie Obenchain > Program in Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, Seattle, WA 98109 > > Email: vobencha@fhcrc.org > Phone: (206) 667-3158 > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
On 06/14/14 20:16, Valerie Obenchain wrote: > Thanks for providing the test files. The subset bug you hit was fixed in > devel a few weeks ago. I've ported it to release as 1.10.2 which should > be available via biocLite() Monday noon-ish or from svn immediately. > > With 1.10.2 I can read the snv file fine: > > vr <- > readVcfAsVRanges("3_Middle_lobe_tumor-- 2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf", > "") > >>> vr[1:3,] >> VRanges with 3 ranges and 0 metadata columns: >> seqnames ranges strand ref alt >> <rle> <iranges> <rle> <character> <characterorrle> >> [1] 1 [109641, 109641] + A G >> [2] 1 [526561, 526561] + T G >> [3] 1 [691958, 691958] + G A >> totalDepth refDepth altDepth sampleNames >> <integerorrle> <integerorrle> <integerorrle> <factororrle> >> [1] <na> 31 5 3_Middle_lobe_tumor >> [2] <na> 56 5 3_Middle_lobe_tumor >> [3] <na> 41 7 3_Middle_lobe_tumor >> softFilterMatrix >> <matrix> >> [1] >> [2] >> [3] > > Trying to read the indel file gives this error: > > > vr <- > readVcfAsVRanges("3_Middle_lobe_tumor-- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf", > "") > Error in validObject(.Object) : > invalid class ?VRanges? object: 'refDepth' must be non-negative > > > Look at the depth variable only with readGeno(). > > > fl <- "3_Middle_lobe_tumor- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf" > > ad <- readGeno(fl, "AD") > > dim(ad) > [1] 547698 2 2 > > min(ad) > [1] -23 > > The negative depth may need some investigating. Sorry, this isn't clear. What I should have said was readVcf() readVcfAsVRanges() operate as expected on this file. readVcfAsVRanges() requires AD to be positive if it is present. If the negative AD values don't concern you another approach would be to use readVcf() with a param that drops all info and geno variables. param <- ScanVcfParam(info=NA, geno=NA) vcf <- readVcf(fl, "genome", param=param) Then coerce to a VRanges. Valerie > > Valerie > >>> sessionInfo() >> R version 3.1.0 Patched (2014-04-18 r65405) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] VariantAnnotation_1.10.2 Rsamtools_1.16.0 Biostrings_2.32.0 >> [4] XVector_0.4.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 >> [7] IRanges_1.22.9 BiocGenerics_0.10.0 >> >> loaded via a namespace (and not attached): >> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6 >> [4] Biobase_2.24.0 BiocParallel_0.6.1 biomaRt_2.20.0 >> [7] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 >> [10] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 >> [13] fail_1.2 foreach_1.4.2 >> GenomicAlignments_1.0.1 >> [16] GenomicFeatures_1.16.2 iterators_1.0.7 plyr_1.8.1 >> [19] Rcpp_0.11.2 RCurl_1.95-4.1 RSQLite_0.11.4 >> [22] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 >> [25] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 >> [28] zlibbioc_1.10.0 > > > On 06/13/2014 04:33 PM, Valerie Obenchain wrote: >> Hi, >> >> I see you've got an old version of the package. The release version is >> 1.10.1. FYI you can see all release versions here: >> >> http://www.bioconductor.org/checkResults/release/bioc-LATEST/ >> >> Please update R and reinstall packages with biocLite(). If you need any >> help with that guidelines are here: >> http://www.bioconductor.org/install/ >> >> If you want to send me a small test file offline I can confirm that the >> current version can read it. >> >> >> Valerie >> >> >> On 06/13/2014 03:06 PM, Murli wrote: >>> Sorry, forgot to post the code contained in the R file in the earlier >>> post. These are the few lines in vcfToCravat.R that I am sourcing. >>> library(VariantAnnotation) >>> DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/ana lysis/" >>> >>> vcfFile=paste(DataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep >>> >>> >>> ="") >>> vcfFile=paste(ataDir,"3_Middle_lobe_tumor-- 2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="") >>> >>> >>> vr <- readVcfAsVRanges(vcfFile, "hg19") >>> df <- as.data.frame(vr) >>> >>> >>> On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair at="" gmail.com="">>> <mailto:murlinair at="" gmail.com="">> wrote: >>> >>> Hi Valerie, >>> Thanks for the help. I am getting the following errors when I am >>> reading the vcf files. >>> vr <- readVcfAsVRanges(vcfFile, "hg19") >>> Error in lapply(ivar[inms], drop) : >>> error in evaluating the argument 'X' in selecting a method for >>> function 'lapply': Error in normalizeSingleBracketSubscript(j, x) : >>> subscript contains invalid names >>> >>> With another file I am getting the following >>> > source("vcfToCravat.R") >>> Error in validObject(.Object) : >>> invalid class VRanges >>> >>> Cheers../Murli >>> >>> >>> On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain >>> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">> wrote: >>> >>> Hi, >>> >>> Use readVcfAsVRanges() then coerce to a data.frame. >>> >>> fl <- system.file("extdata", "chr7-sub.vcf.gz", >>> package="VariantAnnotation") >>> vr <- readVcfAsVRanges(fl, "hg19") >>> df <- as.data.frame(vr) >>> >>> You'll have some extra columns in the data.frame but you can >>> remove / rename columns as necessary. >>> >>> Valerie >>> >>> >>> >>> >>> >>> On 06/13/2014 10:46 AM, Murli [guest] wrote: >>> >>> Hi, >>> I am interested in extracting information for functional >>> annotation using CRAVAT. It requires the data to be in the >>> following format. >>> ==============================__============= >>> # UID / Chr. / Position / Strand / Ref. base / Alt. base / >>> Sample ID (optional) >>> TR1 chr17 7577506 - G T TCGA-02-0231 >>> TR2 chr10 123279680 - G A >>> TCGA-02-3512 >>> TR3 chr13 49033967 + C A >>> TCGA-02-3532 >>> TR4 chr7 116417505 + G T >>> TCGA-02-1523 >>> TR5 chr7 140453136 - T A >>> TCGA-02-0023 >>> TR6 chr17 37880998 + G T >>> TCGA-02-0252 >>> Ins1 chr17 37880998 + G GT >>> TCGA-02-0252 >>> Del1 chr17 37880998 + GA G >>> TCGA-02-0252 >>> CSub1 chr2 39871235 + ATGCT GA >>> TCGA-02-0252 >>> >>> ==============================__================= >>> >>> http://www.cravat.us/help.jsp?__chapter=how_to_cite&article=# >>> <http: www.cravat.us="" help.jsp?chapter="how_to_cite&amp;article=#"> >>> >>> I am trying to extract this information from vcf files >>> generated by mutect. I am using VariantAnnotation extract >>> this information. I have read the file using readVcf(), and >>> renamed the chromosomes according to txdb. >>> >>> rowData(newVcfData) >>> GRanges with 62991 ranges and 5 metadata columns: >>> seqnames ranges strand >>> | paramRangeID >>> <rle> <iranges> <rle> >>> | <factor> >>> 1:109641_A/G chr1 [109641, 109641] * >>> | <na> >>> 1:526561_T/G chr1 [526561, 526561] * >>> | <na> >>> 1:691958_G/A chr1 [691958, 691958] * >>> | <na> >>> 1:763781_A/T chr1 [763781, 763781] * >>> | <na> >>> rs6594026 chr1 [782981, 782981] * >>> | <na> >>> ... ... ... ... >>> ... ... >>> rs480725 chrX [154903224, 154903224] * >>> | <na> >>> X:154925893_C/T chrX [154925893, 154925893] * >>> | <na> >>> X:155038107_C/G chrX [155038107, 155038107] * >>> | <na> >>> X:155204257_G/T chrX [155204257, 155204257] * >>> | <na> >>> X:155234730_T/C chrX [155234730, 155234730] * >>> | <na> >>> REF ALT >>> QUAL FILTER >>> <dnastringset> <dnastringsetlist> >>> <numeric> <character> >>> 1:109641_A/G A G >>> 8.90 PASS >>> 1:526561_T/G T G >>> 9.19 PASS >>> 1:691958_G/A G A >>> 13.74 PASS >>> 1:763781_A/T A T >>> 16.03 PASS >>> rs6594026 C T >>> 11.24 PASS >>> ... ... ... >>> ... ... >>> rs480725 A T >>> 6.39 PASS >>> X:154925893_C/T C T >>> 6.53 PASS >>> X:155038107_C/G C G >>> 6.64 PASS >>> X:155204257_G/T G T >>> 6.35 PASS >>> X:155234730_T/C T C >>> 6.51 PASS >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 >>> chr7 chr8 chr9 chrX >>> NA NA NA NA NA NA ... NA NA >>> NA NA NA NA >>> >>> >>> Can the information be extracted using VariantAnnotation()? >>> I would appreciate your help with this. >>> Thanks ../Murli >>> >>> >>> >>> -- output of sessionInfo(): >>> >>> sessionInfo() >>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils >>> datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] TxDb.Hsapiens.UCSC.hg19.__knownGene_2.10.1 >>> [2] GenomicFeatures_1.14.5 >>> [3] AnnotationDbi_1.24.0 >>> [4] Biobase_2.22.0 >>> [5] VariantAnnotation_1.8.13 >>> [6] Rsamtools_1.14.3 >>> [7] Biostrings_2.30.1 >>> [8] GenomicRanges_1.14.4 >>> [9] XVector_0.2.0 >>> [10] IRanges_1.20.7 >>> [11] BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 >>> DBI_0.2-7 >>> [5] RCurl_1.95-4.1 RSQLite_0.11.4 >>> rtracklayer_1.22.7 stats4_3.0.2 >>> [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 >>> >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org >>> <http: bioconductor.org="">. >>> >>> >>> >>> -- >>> Valerie Obenchain >>> Program in Computational Biology >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, Seattle, WA 98109 >>> >>> Email: vobencha at fhcrc.org <mailto:vobencha at="" fhcrc.org=""> >>> Phone: (206) 667-3158 <tel:%28206%29%20667-3158> >>> >>> >>> >> >> > >
ADD COMMENT

Login before adding your answer.

Traffic: 722 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6