Add data to VCF INFO field
@moiz-bootwalla-5215

I have a VCF file to which I'm trying to add annotations. How can I add new fields to the info DataFrame? Is there a helper function that does this in a straightforward manner, or do I need to manipulate the info DataFrame directly?

Thanks,

Moiz

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RMySQL_0.9-3                            AnnotationForge_1.8.1                   human.db0_3.0.0                        
 [4] Homo.sapiens_1.1.2                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 org.Hs.eg.db_3.0.0                     
 [7] GO.db_3.0.0                             RSQLite_0.11.4                          DBI_0.3.1                              
[10] OrganismDbi_1.8.0                       GenomicFeatures_1.18.1                  AnnotationDbi_1.28.0                   
[13] Biobase_2.26.0                          VariantAnnotation_1.12.1                Rsamtools_1.18.0                       
[16] Biostrings_2.34.0                       XVector_0.6.0                           GenomicRanges_1.18.1                   
[19] GenomeInfoDb_1.2.0                      IRanges_2.0.0                           S4Vectors_0.4.0                        
[22] BiocGenerics_0.12.0                     BiocInstaller_1.16.0                   

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.4           BBmisc_1.7              BiocParallel_1.0.0      biomaRt_2.22.0         
 [6] bitops_1.0-6            brew_1.0-6              BSgenome_1.34.0         checkmate_1.5.0         codetools_0.2-8        
[11] digest_0.6.4            evaluate_0.5.5          fail_1.2                foreach_1.4.2           formatR_1.0            
[16] GenomicAlignments_1.2.0 graph_1.44.0            iterators_1.0.7         knitr_1.7               RBGL_1.42.0            
[21] RCurl_1.95-4.3          rtracklayer_1.26.1      sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[26] XML_3.98-1.1            yaml_2.1.13             zlibbioc_1.12.0

 

@valerie-obenchain-4275

Hi,

You can use the info() getter and setter. All getters and setters for the VCF class are listed on the ?VCF man page.

library(VariantAnnotation)
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") 

vcf <- readVcf(fl, "hg19")

> names(info(vcf))
[1] "NS" "DP" "AF" "AA" "DB" "H2"

Use the standard '$' operator to add a variable. You'll see a warning that the new field has no corresponding header information:

> info(vcf)$newVar <- 1:5
Warning message:
info fields with no header: newVar 

> names(info(vcf))
[1] "NS"     "DP"     "AF"     "AA"     "DB"     "H2"     "newVar"
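The new column must hold exactly one value per variant (one per row of the VCF). A minimal sketch, assuming the same ex2.vcf example file, that derives a value from fields already present; the name `meanDP` is hypothetical and chosen only for illustration:

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## One value per record: nrow(vcf) is the number of variants.
## Hypothetical annotation: total depth divided by number of samples with data.
## DP and NS are single integers per record in ex2.vcf.
meanDP <- info(vcf)$DP / info(vcf)$NS
stopifnot(length(meanDP) == nrow(vcf))

## No header line exists for meanDP yet, so this still warns
## "info fields with no header: meanDP".
info(vcf)$meanDP <- meanDP
```

Because `meanDP` has no header line, the same warning as for `newVar` is expected here.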

You can add a line for 'newVar' to the INFO header DataFrame so that the field is described in the header. The header is accessed with header(), and its INFO lines with info(header(vcf)):

> info(header(vcf))
DataFrame with 6 rows and 3 columns
        Number        Type                 Description
   <character> <character>                 <character>
NS           1     Integer Number of Samples With Data
DP           1     Integer                 Total Depth
AF           A       Float            Allele Frequency
AA           1      String            Ancestral Allele
DB           0        Flag dbSNP membership, build 129
H2           0        Flag          HapMap2 membership
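A minimal sketch of adding such a header line, assuming `DataFrame()` (from S4Vectors) is available once VariantAnnotation is attached; the field name `newVar` and its description are placeholders. The new row mirrors the Number/Type/Description columns shown above and is appended with `rbind()`. Adding the header line before the data column should avoid the "no header" warning:

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## Header row for the new field; the row name is the INFO key.
## "newVar" and its description are illustrative placeholders.
newInfo <- DataFrame(Number="1", Type="Integer",
                     Description="Example annotation added in R",
                     row.names="newVar")

## Append the row to the existing INFO header lines ...
info(header(vcf)) <- rbind(info(header(vcf)), newInfo)

## ... then add the data column, one integer per variant.
info(vcf)$newVar <- seq_len(nrow(vcf))

info(header(vcf))["newVar", ]
```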

To remove variables instead of adding them, subset with '[':

> info(vcf) <- info(vcf)[,1:2]
> info(vcf)
DataFrame with 5 rows and 2 columns
                      NS        DP
               <integer> <integer>
rs6054257              3        14
20:17330_T/A           3        11
rs6040355              2        10
20:1230237_T/.         3        13
microsat1              3         9
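Selecting columns by name is less fragile than selecting by position if the field order changes. A sketch under the assumption that you also want to drop the header lines for the removed fields (the answer above does not require this step); `drop=FALSE` keeps the result a DataFrame even when only one field is kept:

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## Fields to keep, by name rather than by column position.
keep <- c("NS", "DP")
info(vcf) <- info(vcf)[, keep, drop=FALSE]

## Optional: keep the INFO header in sync with the remaining fields.
info(header(vcf)) <- info(header(vcf))[keep, ]

names(info(vcf))
```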

 

Valerie

Thanks Valerie. This is exactly what I was looking for.
