VariantAnnotation ALT Field
@samuel-younkin-5497
I have been looking at the VariantAnnotation vignette and have encountered something strange. The R code is below. See how the ALT field lists only ########. The vignette, however, correctly shows the alternate allele. The data file chr22.vcf.gz also correctly contains the alternate allele information.

Any suggestions?

Thanks.

Sam

~~

> library(VariantAnnotation)
> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> head( fixed(vcf), 3 )
GRanges with 3 ranges and 5 metadata columns:
              seqnames               ranges strand | paramRangeID
                 <Rle>            <IRanges>  <Rle> |     <factor>
    rs7410291       22 [50300078, 50300078]      * |         <NA>
  rs147922003       22 [50300086, 50300086]      * |         <NA>
  rs114143073       22 [50300101, 50300101]      * |         <NA>
                         REF                ALT      QUAL      FILTER
              <DNAStringSet> <DNAStringSetList> <numeric> <character>
    rs7410291              A           ########       100        PASS
  rs147922003              C           ########       100        PASS
  rs114143073              G           ########       100        PASS
  ---
  seqlengths:
   22
   NA
> sessionInfo()
R version 2.15.2 Patched (2012-10-28 r61038)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=C                     LC_NAME=C
 [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] VariantAnnotation_1.4.5 Rsamtools_1.10.2        Biostrings_2.26.2
[4] GenomicRanges_1.10.5    IRanges_1.16.4          BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.20.3   Biobase_2.18.0         biomaRt_2.14.0
 [4] bitops_1.0-5           BSgenome_1.26.1        DBI_0.2-5
 [7] GenomicFeatures_1.10.1 parallel_2.15.2        RCurl_1.95-3
[10] RSQLite_0.11.2         rtracklayer_1.18.1     stats4_2.15.2
[13] tools_2.15.2           XML_3.95-0.1           zlibbioc_1.4.0
VariantAnnotation • 1.5k views
Paul Shannon ▴ 750
@paul-shannon-5161
Hi Sam,

Here's a quick workaround:

  fixed(vcf)[ , c("REF", "ALT")]

The backstory on this is that the ALT field is a DNAStringSetList which, until very recently (the change is in bioc-devel), displayed itself, via its show method, as '######'. Since that was less than helpful, the latest version of VariantAnnotation displays the ALT sequence in a more natural way.

In the meantime, if you do not use bioc-devel, the explicit extraction of REF and ALT demonstrated above should get you part of what you want.

 - Paul

On Nov 21, 2012, at 6:50 AM, Samuel Younkin wrote:

> I have been looking at the VariantAnnotation vignette and have encountered something strange. The R code is below. See how the ALT field lists only ########. The vignette, however, correctly shows the alternate allele. The data file chr22.vcf.gz also correctly contains the alternate allele information.
>
> Any suggestions?
>
> Thanks.
>
> Sam
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
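
For readers whose installed release still prints '########', another way to see the alleles is to pull them out of the object and convert them to plain character strings, so nothing depends on the show method. This is only a minimal sketch: it assumes the ref() and alt() accessors are available in the installed VariantAnnotation (they are in current releases; this was not checked against 1.4.5), and alt3, alt_chr and tab are illustrative names, not anything from the package.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# ALT is a DNAStringSetList: one DNAStringSet (possibly several alleles) per record
alt3 <- alt(vcf)[1:3]

# Collapse each record's alternate alleles into one comma-separated string
alt_chr <- vapply(as.list(alt3),
                  function(x) paste(as.character(x), collapse = ","),
                  character(1))

# Plain data.frame of REF and ALT for the first three records
tab <- data.frame(REF = as.character(ref(vcf)[1:3]),
                  ALT = alt_chr,
                  row.names = rownames(vcf)[1:3])
tab
```

Collapsing with a comma mirrors how multiple alternate alleles are written in the VCF file itself, so multi-allelic records stay on a single row.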
Also, Val has added the ExpandedVCF class in bioc-devel. It is like the old VCF class, except that it has one row per position+ALT, so each row carries a single ALT and a position can occur multiple times, once for each ALT. This simplifies the ALT column from a DNAStringSetList to a DNAStringSet, which is much easier to manipulate.

Michael

On Wed, Nov 21, 2012 at 9:19 AM, Paul Shannon <pshannon@fhcrc.org> wrote:

> Hi Sam,
>
> Here's a quick workaround:
>
> fixed(vcf)[ , c("REF", "ALT")]
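
To illustrate the difference Michael describes, the sketch below expands a small VCF and compares the class of the ALT column before and after. It assumes a VariantAnnotation release that provides expand() and the alt() accessor, and it uses the ex2.vcf example file shipped with the package rather than chr22.vcf.gz, since ex2.vcf is small and contains multi-allelic records; evcf is just an illustrative name.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# One row per position + ALT allele; multi-allelic records are repeated
evcf <- expand(vcf)

class(alt(vcf))   # list-like: a record may hold several ALT alleles
class(alt(evcf))  # flat: exactly one ALT allele per row

# A flat DNAStringSet converts directly to a character vector
head(as.character(alt(evcf)), 3)
```

Because every row of the expanded object has exactly one ALT allele, per-allele operations such as comparing REF and ALT need no list handling.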