On 06/14/14 20:16, Valerie Obenchain wrote:
> Thanks for providing the test files. The subset bug you hit was
fixed in
> devel a few weeks ago. I've ported it to release as 1.10.2 which
should
> be available via biocLite() Monday noon-ish or from svn immediately.
>
> With 1.10.2 I can read the snv file fine:
>
> vr <-
> readVcfAsVRanges("3_Middle_lobe_tumor--
2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",
> "")
>
>>> vr[1:3,]
>> VRanges with 3 ranges and 0 metadata columns:
>> seqnames ranges strand ref alt
>> <rle> <iranges> <rle> <character> <characterorrle>
>> [1] 1 [109641, 109641] + A G
>> [2] 1 [526561, 526561] + T G
>> [3] 1 [691958, 691958] + G A
>> totalDepth refDepth altDepth
sampleNames
>> <integerorrle> <integerorrle> <integerorrle>
<factororrle>
>> [1] <na> 31 5
3_Middle_lobe_tumor
>> [2] <na> 56 5
3_Middle_lobe_tumor
>> [3] <na> 41 7
3_Middle_lobe_tumor
>> softFilterMatrix
>> <matrix>
>> [1]
>> [2]
>> [3]
>
> Trying to read the indel file gives this error:
>
> > vr <-
> readVcfAsVRanges("3_Middle_lobe_tumor--
2_mucosal_normal.indel.sominddet.v2.3-9.vcf",
> "")
> Error in validObject(.Object) :
> invalid class ?VRanges? object: 'refDepth' must be non-negative
>
>
> Look at the depth variable only with readGeno().
>
> > fl <- "3_Middle_lobe_tumor-
2_mucosal_normal.indel.sominddet.v2.3-9.vcf"
> > ad <- readGeno(fl, "AD")
> > dim(ad)
> [1] 547698 2 2
> > min(ad)
> [1] -23
>
> The negative depth may need some investigating.
Sorry, this isn't clear. What I should have said was readVcf()
readVcfAsVRanges() operate as expected on this file.
readVcfAsVRanges()
requires AD to be positive if it is present. If the negative AD values
don't concern you another approach would be to use readVcf() with a
param that drops all info and geno variables.
param <- ScanVcfParam(info=NA, geno=NA)
vcf <- readVcf(fl, "genome", param=param)
Then coerce to a VRanges.
Valerie
>
> Valerie
>
>>> sessionInfo()
>> R version 3.1.0 Patched (2014-04-18 r65405)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
methods
>> [8] base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.10.2 Rsamtools_1.16.0
Biostrings_2.32.0
>> [4] XVector_0.4.0 GenomicRanges_1.16.3
GenomeInfoDb_1.0.2
>> [7] IRanges_1.22.9 BiocGenerics_0.10.0
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6
>> [4] Biobase_2.24.0 BiocParallel_0.6.1 biomaRt_2.20.0
>> [7] bitops_1.0-6 brew_1.0-6
BSgenome_1.32.0
>> [10] codetools_0.2-8 DBI_0.2-7 digest_0.6.4
>> [13] fail_1.2 foreach_1.4.2
>> GenomicAlignments_1.0.1
>> [16] GenomicFeatures_1.16.2 iterators_1.0.7 plyr_1.8.1
>> [19] Rcpp_0.11.2 RCurl_1.95-4.1 RSQLite_0.11.4
>> [22] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0
>> [25] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1
>> [28] zlibbioc_1.10.0
>
>
> On 06/13/2014 04:33 PM, Valerie Obenchain wrote:
>> Hi,
>>
>> I see you've got an old version of the package. The release version
is
>> 1.10.1. FYI you can see all release versions here:
>>
>>
http://www.bioconductor.org/checkResults/release/bioc-LATEST/
>>
>> Please update R and reinstall packages with biocLite(). If you need
any
>> help with that guidelines are here:
>>
http://www.bioconductor.org/install/
>>
>> If you want to send me a small test file offline I can confirm that
the
>> current version can read it.
>>
>>
>> Valerie
>>
>>
>> On 06/13/2014 03:06 PM, Murli wrote:
>>> Sorry, forgot to post the code contained in the R file in the
earlier
>>> post. These are the few lines in vcfToCravat.R that I am sourcing.
>>> library(VariantAnnotation)
>>> DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/ana
lysis/"
>>>
>>> vcfFile=paste(DataDir,"3_Middle_lobe_tumor--
2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep
>>>
>>>
>>> ="")
>>> vcfFile=paste(ataDir,"3_Middle_lobe_tumor--
2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="")
>>>
>>>
>>> vr <- readVcfAsVRanges(vcfFile, "hg19")
>>> df <- as.data.frame(vr)
>>>
>>>
>>> On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair at="" gmail.com="">>> <mailto:murlinair at="" gmail.com="">> wrote:
>>>
>>> Hi Valerie,
>>> Thanks for the help. I am getting the following errors when I
am
>>> reading the vcf files.
>>> vr <- readVcfAsVRanges(vcfFile, "hg19")
>>> Error in lapply(ivar[inms], drop) :
>>> error in evaluating the argument 'X' in selecting a method
for
>>> function 'lapply': Error in normalizeSingleBracketSubscript(j,
x) :
>>> subscript contains invalid names
>>>
>>> With another file I am getting the following
>>> > source("vcfToCravat.R")
>>> Error in validObject(.Object) :
>>> invalid class VRanges
>>>
>>> Cheers../Murli
>>>
>>>
>>> On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain
>>> <vobencha at="" fhcrc.org="" <mailto:vobencha="" at="" fhcrc.org="">> wrote:
>>>
>>> Hi,
>>>
>>> Use readVcfAsVRanges() then coerce to a data.frame.
>>>
>>> fl <- system.file("extdata", "chr7-sub.vcf.gz",
>>> package="VariantAnnotation")
>>> vr <- readVcfAsVRanges(fl, "hg19")
>>> df <- as.data.frame(vr)
>>>
>>> You'll have some extra columns in the data.frame but you
can
>>> remove / rename columns as necessary.
>>>
>>> Valerie
>>>
>>>
>>>
>>>
>>>
>>> On 06/13/2014 10:46 AM, Murli [guest] wrote:
>>>
>>> Hi,
>>> I am interested in extracting information for
functional
>>> annotation using CRAVAT. It requires the data to be in
the
>>> following format.
>>> ==============================__=============
>>> # UID / Chr. / Position / Strand / Ref. base / Alt.
base /
>>> Sample ID (optional)
>>> TR1 chr17 7577506 - G T
TCGA-02-0231
>>> TR2 chr10 123279680 - G A
>>> TCGA-02-3512
>>> TR3 chr13 49033967 + C A
>>> TCGA-02-3532
>>> TR4 chr7 116417505 + G T
>>> TCGA-02-1523
>>> TR5 chr7 140453136 - T A
>>> TCGA-02-0023
>>> TR6 chr17 37880998 + G T
>>> TCGA-02-0252
>>> Ins1 chr17 37880998 + G GT
>>> TCGA-02-0252
>>> Del1 chr17 37880998 + GA G
>>> TCGA-02-0252
>>> CSub1 chr2 39871235 + ATGCT GA
>>> TCGA-02-0252
>>>
>>> ==============================__=================
>>>
>>>
http://www.cravat.us/help.jsp?__chapter=how_to_cite&article=#
>>> <http: www.cravat.us="" help.jsp?chapter="how_to_cite&article=#">
>>>
>>> I am trying to extract this information from vcf files
>>> generated by mutect. I am using VariantAnnotation
extract
>>> this information. I have read the file using
readVcf(), and
>>> renamed the chromosomes according to txdb.
>>>
>>> rowData(newVcfData)
>>> GRanges with 62991 ranges and 5 metadata columns:
>>> seqnames ranges
strand
>>> | paramRangeID
>>> <rle> <iranges>
<rle>
>>> | <factor>
>>> 1:109641_A/G chr1 [109641, 109641]
*
>>> | <na>
>>> 1:526561_T/G chr1 [526561, 526561]
*
>>> | <na>
>>> 1:691958_G/A chr1 [691958, 691958]
*
>>> | <na>
>>> 1:763781_A/T chr1 [763781, 763781]
*
>>> | <na>
>>> rs6594026 chr1 [782981, 782981]
*
>>> | <na>
>>> ... ... ...
...
>>> ... ...
>>> rs480725 chrX [154903224, 154903224]
*
>>> | <na>
>>> X:154925893_C/T chrX [154925893, 154925893]
*
>>> | <na>
>>> X:155038107_C/G chrX [155038107, 155038107]
*
>>> | <na>
>>> X:155204257_G/T chrX [155204257, 155204257]
*
>>> | <na>
>>> X:155234730_T/C chrX [155234730, 155234730]
*
>>> | <na>
>>> REF ALT
>>> QUAL FILTER
>>> <dnastringset> <dnastringsetlist>
>>> <numeric> <character>
>>> 1:109641_A/G A G
>>> 8.90 PASS
>>> 1:526561_T/G T G
>>> 9.19 PASS
>>> 1:691958_G/A G A
>>> 13.74 PASS
>>> 1:763781_A/T A T
>>> 16.03 PASS
>>> rs6594026 C T
>>> 11.24 PASS
>>> ... ... ...
>>> ... ...
>>> rs480725 A T
>>> 6.39 PASS
>>> X:154925893_C/T C T
>>> 6.53 PASS
>>> X:155038107_C/G C G
>>> 6.64 PASS
>>> X:155204257_G/T G T
>>> 6.35 PASS
>>> X:155234730_T/C T C
>>> 6.51 PASS
>>> ---
>>> seqlengths:
>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5
chr6
>>> chr7 chr8 chr9 chrX
>>> NA NA NA NA NA NA ... NA
NA
>>> NA NA NA NA
>>>
>>>
>>> Can the information be extracted using
VariantAnnotation()?
>>> I would appreciate your help with this.
>>> Thanks ../Murli
>>>
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> sessionInfo()
>>>
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils
>>> datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] TxDb.Hsapiens.UCSC.hg19.__knownGene_2.10.1
>>> [2] GenomicFeatures_1.14.5
>>> [3] AnnotationDbi_1.24.0
>>> [4] Biobase_2.22.0
>>> [5] VariantAnnotation_1.8.13
>>> [6] Rsamtools_1.14.3
>>> [7] Biostrings_2.30.1
>>> [8] GenomicRanges_1.14.4
>>> [9] XVector_0.2.0
>>> [10] IRanges_1.20.7
>>> [11] BiocGenerics_0.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] biomaRt_2.18.0 bitops_1.0-6
BSgenome_1.30.0
>>> DBI_0.2-7
>>> [5] RCurl_1.95-4.1 RSQLite_0.11.4
>>> rtracklayer_1.22.7 stats4_3.0.2
>>> [9] tools_3.0.2 XML_3.98-1.1
zlibbioc_1.8.0
>>>
>>>
>>> --
>>> Sent via the guest posting facility at
bioconductor.org
>>> <http: bioconductor.org="">.
>>>
>>>
>>>
>>> --
>>> Valerie Obenchain
>>> Program in Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, Seattle, WA 98109
>>>
>>> Email: vobencha at fhcrc.org <mailto:vobencha at="" fhcrc.org="">
>>> Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
>>>
>>>
>>>
>>
>>
>
>