Question

Annotate mutations on DNA level to splice transcripts

0

Entering edit mode

Kasper_holm2 • 0

@013d2701

Last seen 3.7 years ago

Denmark

Hello

I would like to ask for some help on annotating variants in transcripts with predictCoding() in the VariantAnnotation package.

I can get the amino acid change by inputting the gff3 file to make a TxDB object and get variants on gene level.

But I cant get this to work when genes consist of multiple exons or spliced genes. In some gff3 files I see the exon, mRNA and transcript features, but i cannot make VariantAnnotation annotate variants in these transcripts.

Example with spliced genes: See HPV16_genes_PaVE, where part of E1 and part of E2 are spliced together to form E1^E4. This means that the amino acid sequence is changed and in some cases like this the frame may get shifted.

gene E1 have amino acids MADPAGTNGE...

splice transcript E1^E4 have amino acids MADPAAATKYP...

One variant in position 3400 will affect the E2 gene AND the E1^E4 splice transcript, but I can only get the ALT amino acid in gene E2. I'm not sure if the gff3 files are correctly set up to pass the info of the spliced genes to predictCoding() or if it is not possible to do it?

Thank you very much.

Functions used:

library(GenomicFeatures)
library(VariantAnnotation)

txdb <- makeTxDbFromGFF("GFFfile", format="gff3")
vcffile <- readVcf("vcffile.vcf")
faf <- open(FaFile("Ref.fasta"))
vcf_anno <- predictCoding(vcffile, txdb, seqSource = faf)

sessionInfo( )
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=da_DK.UTF-8       LC_NUMERIC=C               LC_TIME=da_DK.UTF-8        LC_COLLATE=da_DK.UTF-8     LC_MONETARY=da_DK.UTF-8   
 [6] LC_MESSAGES=da_DK.UTF-8    LC_PAPER=da_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=da_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] genbankr_1.20.0             Repitools_1.38.0            forcats_0.5.1               stringr_1.4.0              
 [5] dplyr_1.0.7                 purrr_0.3.4                 readr_1.4.0                 tidyr_1.1.3                
 [9] tibble_3.1.2                ggplot2_3.3.5               tidyverse_1.3.1             GenomicFeatures_1.44.0     
[13] AnnotationDbi_1.54.1        VariantAnnotation_1.38.0    Rsamtools_2.8.0             Biostrings_2.60.1          
[17] XVector_0.32.0              SummarizedExperiment_1.22.0 Biobase_2.52.0              GenomicRanges_1.44.0       
[21] GenomeInfoDb_1.28.1         IRanges_2.26.0              S4Vectors_0.30.0            MatrixGenerics_1.4.0       
[25] matrixStats_0.59.0          BiocGenerics_0.38.0        

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2         rjson_0.2.20             ellipsis_0.3.2           DNAcopy_1.66.0           fs_1.5.0                
  [6] rstudioapi_0.13          affyio_1.62.0            bit64_4.0.5              fansi_0.5.0              lubridate_1.7.10        
 [11] xml2_1.3.2               splines_4.1.0            cachem_1.0.5             jsonlite_1.7.2           broom_0.7.8             
 [16] annotate_1.70.0          cluster_2.1.2            vsn_3.60.0               dbplyr_2.1.1             png_0.1-7               
 [21] BiocManager_1.30.16      compiler_4.1.0           httr_1.4.2               backports_1.2.1          assertthat_0.2.1        
 [26] Matrix_1.3-3             fastmap_1.1.0            limma_3.48.1             cli_3.0.0                prettyunits_1.1.1       
 [31] tools_4.1.0              gtable_0.3.0             glue_1.4.2               GenomeInfoDbData_1.2.6   affy_1.70.0             
 [36] rappdirs_0.3.3           tinytex_0.32             Rcpp_1.0.7               cellranger_1.1.0         vctrs_0.3.8             
 [41] preprocessCore_1.54.0    rtracklayer_1.52.0       xfun_0.24                rvest_1.0.0              lifecycle_1.0.0         
 [46] restfulr_0.0.13          gtools_3.9.2             XML_3.99-0.6             edgeR_3.34.0             zlibbioc_1.38.0         
 [51] MASS_7.3-54              scales_1.1.1             BSgenome_1.60.0          hms_1.1.0                yaml_2.2.1              
 [56] curl_4.3.2               memoise_2.0.0            biomaRt_2.48.2           stringi_1.6.2            RSQLite_2.2.7           
 [61] genefilter_1.74.0        BiocIO_1.2.0             caTools_1.18.2           Ringo_1.56.0             filelock_1.0.2          
 [66] BiocParallel_1.26.1      truncnorm_1.0-8          rlang_0.4.11             pkgconfig_2.0.3          bitops_1.0-7            
 [71] Rsolnp_1.16              lattice_0.20-44          GenomicAlignments_1.28.0 bit_4.0.4                tidyselect_1.1.1        
 [76] magrittr_2.0.1           R6_2.5.0                 gplots_3.1.1             generics_0.1.0           DelayedArray_0.18.0     
 [81] DBI_1.1.1                gsmoothr_0.1.7           pillar_1.6.1             haven_2.4.1              withr_2.4.2             
 [86] survival_3.2-11          KEGGREST_1.32.0          RCurl_1.98-1.3           modelr_0.1.8             crayon_1.4.1            
 [91] KernSmooth_2.23-20       utf8_1.2.1               BiocFileCache_2.0.0      progress_1.2.2           locfit_1.5-9.4          
 [96] grid_4.1.0               readxl_1.3.1             blob_1.2.1               reprex_2.0.0             digest_0.6.27           
[101] xtable_1.8-4             munsell_0.5.0

VariantAnnotation • 812 views

ADD COMMENT • link 3.8 years ago Kasper_holm2 • 0