Question

GRanges object out-of-bound

0

Entering edit mode

Kasper_holm2 • 0

@013d2701

Last seen 3.7 years ago

Denmark

Hi, I have trouble using the predictCoding function in VariantAnnotation. I have the following gff3 file, which contains:

A gene which consists of 2 coding regions (865-880 and 3358-3620)

A gene which consists of one coding region (2756-3853)

##gff-version 3
##sequence-region HPV16_K02718_1_revised    1   7906
HPV16_K02718_1_revised  PaVE    region  1   7906    .   .   .   ID=HPV16_K02718_1_revised
HPV16_K02718_1_revised  PaVE    gene    865 3620    .   +   .   ID=E4_splice
HPV16_K02718_1_revised  PaVE    mRNA    865 3620    .   +   .   ID=E4_spliceRNA1;Parent=E4_splice
HPV16_K02718_1_revised  PaVE    exon    865 880 .   +   .   ID=E4exon1;Parent=E4_spliceRNA1
HPV16_K02718_1_revised  PaVE    exon    3358    3620    .   +   .   ID=E4exon2;Parent=E4_spliceRNA1
HPV16_K02718_1_revised  PaVE    CDS 865 880 .   +   0   ID=E4cds1;Parent=E4_spliceRNA1
HPV16_K02718_1_revised  PaVE    CDS 3358    3620    .   +   2   ID=E4cds2;Parent=E4_spliceRNA1
HPV16_K02718_1_revised  PaVE    gene    2756    3853    .   +   .   ID=E2
HPV16_K02718_1_revised  PaVE    mRNA    2756    3853    .   +   .   ID=E2mRNA;Parent=E2
HPV16_K02718_1_revised  PaVE    CDS 2756    3853    .   +   0   ID=E2cds;Parent=E2mRNA

I also have a vcf file with the following variants:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1
HPV16_K02718_1_revised  3360    .   G   T   5455.04 PASS    AC=1;AF=1.00;AN=1;BaseQRankSum=0.624;DP=156;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=31.63;ReadPosRankSum=-0.565;SOR=0.416   GT:AD:DP:GQ:PL  1:1,154:155:99:5465,0
HPV16_K02718_1_revised  3367    .   A   T   5455.04 PASS    AC=1;AF=1.00;AN=1;BaseQRankSum=0.624;DP=156;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=31.63;ReadPosRankSum=-0.565;SOR=0.416   GT:AD:DP:GQ:PL  1:1,154:155:99:5465,0
HPV16_K02718_1_revised  3373    .   C   T   5455.04 PASS    AC=1;AF=1.00;AN=1;BaseQRankSum=0.624;DP=156;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=31.63;ReadPosRankSum=-0.565;SOR=0.416   GT:AD:DP:GQ:PL  1:1,154:155:99:5465,0
HPV16_K02718_1_revised  3374    .   T   A   5455.04 PASS    AC=1;AF=1.00;AN=1;BaseQRankSum=0.624;DP=156;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=31.63;ReadPosRankSum=-0.565;SOR=0.416   GT:AD:DP:GQ:PL  1:1,154:155:99:5465,0

However I get the error:

GRanges object contains 2 out-of-bound ranges located on sequence 1. Note that ranges located on a sequence whose
  length is unknown (NA) or on a circular sequence are not considered out-of-bound (use seqlengths() and isCircular()
  to get the lengths and circularity flags of the underlying sequences). You can use trim() to trim these ranges. See
  ?`trim,GenomicRanges-method` for more information.

... when trying to predict the amino acid mutations from these variants with predictCoding. Unless im completely blind, the variants and gff file have no positions outside the sequence, but something is making the function think so.

I think it has something to do with the first CDS region as I can avoid this error by changing the 2nd CDS stop position to =< 3373 (making it 15 bases long, like the first CDS region), but this ofcourse does not allow calls outside 3373 in the E4 gene.

I am very confused by the behavior of the txdb object and predictCoding function and have spent hours trying to figure this out. Can anyone help in creating a working gff file?

Functions used:

txdb <- makeTxDbFromGFF("GFFfile", format="gff3")
vcffile <- readVcf("vcffile.vcf")
faf <- open(FaFile("Ref.fasta"))
vcf_anno <- predictCoding(vcffile, txdb, seqSource = faf)

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=da_DK.UTF-8       LC_NUMERIC=C               LC_TIME=da_DK.UTF-8        LC_COLLATE=da_DK.UTF-8     LC_MONETARY=da_DK.UTF-8   
 [6] LC_MESSAGES=da_DK.UTF-8    LC_PAPER=da_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=da_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] genbankr_1.20.0             Repitools_1.38.0            forcats_0.5.1               stringr_1.4.0              
 [5] dplyr_1.0.7                 purrr_0.3.4                 readr_1.4.0                 tidyr_1.1.3                
 [9] tibble_3.1.2                ggplot2_3.3.5               tidyverse_1.3.1             GenomicFeatures_1.44.0     
[13] AnnotationDbi_1.54.1        VariantAnnotation_1.38.0    Rsamtools_2.8.0             Biostrings_2.60.1          
[17] XVector_0.32.0              SummarizedExperiment_1.22.0 Biobase_2.52.0              GenomicRanges_1.44.0       
[21] GenomeInfoDb_1.28.1         IRanges_2.26.0              S4Vectors_0.30.0            MatrixGenerics_1.4.0       
[25] matrixStats_0.59.0          BiocGenerics_0.38.0        

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2         rjson_0.2.20             ellipsis_0.3.2           DNAcopy_1.66.0           fs_1.5.0                
  [6] rstudioapi_0.13          affyio_1.62.0            bit64_4.0.5              fansi_0.5.0              lubridate_1.7.10        
 [11] xml2_1.3.2               splines_4.1.0            cachem_1.0.5             jsonlite_1.7.2           broom_0.7.8             
 [16] annotate_1.70.0          cluster_2.1.2            vsn_3.60.0               dbplyr_2.1.1             png_0.1-7               
 [21] BiocManager_1.30.16      compiler_4.1.0           httr_1.4.2               backports_1.2.1          assertthat_0.2.1        
 [26] Matrix_1.3-3             fastmap_1.1.0            limma_3.48.1             cli_3.0.0                prettyunits_1.1.1       
 [31] tools_4.1.0              gtable_0.3.0             glue_1.4.2               GenomeInfoDbData_1.2.6   affy_1.70.0             
 [36] rappdirs_0.3.3           tinytex_0.32             Rcpp_1.0.7               cellranger_1.1.0         vctrs_0.3.8             
 [41] preprocessCore_1.54.0    rtracklayer_1.52.0       xfun_0.24                rvest_1.0.0              lifecycle_1.0.0         
 [46] restfulr_0.0.13          gtools_3.9.2             XML_3.99-0.6             edgeR_3.34.0             zlibbioc_1.38.0         
 [51] MASS_7.3-54              scales_1.1.1             BSgenome_1.60.0          hms_1.1.0                yaml_2.2.1              
 [56] curl_4.3.2               memoise_2.0.0            biomaRt_2.48.2           stringi_1.6.2            RSQLite_2.2.7           
 [61] genefilter_1.74.0        BiocIO_1.2.0             caTools_1.18.2           Ringo_1.56.0             filelock_1.0.2          
 [66] BiocParallel_1.26.1      truncnorm_1.0-8          rlang_0.4.11             pkgconfig_2.0.3          bitops_1.0-7            
 [71] Rsolnp_1.16              lattice_0.20-44          GenomicAlignments_1.28.0 bit_4.0.4                tidyselect_1.1.1        
 [76] magrittr_2.0.1           R6_2.5.0                 gplots_3.1.1             generics_0.1.0           DelayedArray_0.18.0     
 [81] DBI_1.1.1                gsmoothr_0.1.7           pillar_1.6.1             haven_2.4.1              withr_2.4.2             
 [86] survival_3.2-11          KEGGREST_1.32.0          RCurl_1.98-1.3           modelr_0.1.8             crayon_1.4.1            
 [91] KernSmooth_2.23-20       utf8_1.2.1               BiocFileCache_2.0.0      progress_1.2.2           locfit_1.5-9.4          
 [96] grid_4.1.0               readxl_1.3.1             blob_1.2.1               reprex_2.0.0             digest_0.6.27           
[101] xtable_1.8-4             munsell_0.5.0

VariantAnnotation • 1.2k views

ADD COMMENT • link updated 3.7 years ago by Vincent J. Carey, Jr. 6.7k • written 3.8 years ago by Kasper_holm2 • 0

0

Entering edit mode

Can we reproduce this without Ref.fasta?

ADD REPLY • link 3.7 years ago Vincent J. Carey, Jr. 6.7k