Overall Goal: Use ggplot to create individual gene plots of average signal ± SEM for biological replicates of a given gene. After I finishing troubleshooting code, I'll create a function/loop to take a csv of gene names and return plots for each gene.
I'm using rtracklayer to import the region of the BigWig file corresponding to the gene of interest, in this case a 1938 bp section of Chromosome XII. Right now I'm ignoring strand information, but this will be coded eventually.
Issue: The extracted list containing the coverage information is not always the expected length (1938 bp). I believe this is due to the way I create Views, selecting regions that contain coverage greater than 0 (some replicates have more than 1 View - see Rep2). However, I cannot find a way to create the View explicitly for a given range (Chromosome XII, 805387 - 807325).
I'm new to GenomicRanges and the associated data structures. Please point out if you see a better way of accomplishing my overall goal or changes to the way I'm currently importing and selecting the the range of interest.
Thanks for any help!
library(rtracklayer)
library(trackViewer)
#Import region of bigwig file for desired gene
Control_Rep1 <- import("/path/to/file/B28-JV31_S31_R1_001_MaskedCPM.bigwig", format="BigWig",
which=GRanges("chrXII",
IRanges(805387, 807325)))
Control_Rep2 <- import("/path/to/file/B28-JV32_S32_R1_001_MaskedCPM.bigwig", format="BigWig",
which=GRanges("chrXII",
IRanges(805387, 807325)))
Control_Rep3 <- import("/path/to/file/B28-JV33_S33_R1_001_MaskedCPM.bigwig", format="BigWig",
which=GRanges("chrXII",
IRanges(805387, 807325)))
#Get coverage information
cRep1_cov <- coverage(Control_Rep1, weight = "score")
cRep2_cov <- coverage(Control_Rep2, weight = "score")
cRep3_cov <- coverage(Control_Rep3, weight = "score")
#Create views for regions with coverage > 0
cRep1_view <- Views(cRep1_cov, cRep1_cov > 0)
cRep2_view <- Views(cRep2_cov, cRep2_cov > 0)
cRep3_view <- Views(cRep3_cov, cRep3_cov > 0)
#Extract values as a list. [[#A]][[#B]] where #A is the chromosome and #B is the view number
cRep1_Values <- as.list(cRep1_view[[12]][[1]])
cRep2_Values <- as.list(cRep2_view[[12]][[1]])
cRep3_Values <- as.list(cRep3_view[[12]][[1]])
> length(cRep1_Values)
[1] 1939
> length(cRep2_Values)
[1] 1239
> length(cRep3_Values)
[1] 1939
> length(cRep1_view$chrXII)
[1] 1
> length(cRep2_view$chrXII)
[1] 2
> length(cRep3_view$chrXII)
[1] 1
> sessionInfo( )
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6.5
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] trackViewer_1.28.1 Rcpp_1.0.8.3 rtracklayer_1.52.1 GenomicRanges_1.44.0 GenomeInfoDb_1.28.4
[6] IRanges_2.26.0 S4Vectors_0.30.2 BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 rjson_0.2.21 ellipsis_0.3.2 biovizBase_1.40.0
[5] htmlTable_2.4.0 XVector_0.32.0 base64enc_0.1-3 dichromat_2.0-0
[9] rstudioapi_0.13 bit64_4.0.5 AnnotationDbi_1.54.1 fansi_1.0.3
[13] xml2_1.3.3 splines_4.1.0 cachem_1.0.6 knitr_1.38
[17] Formula_1.2-4 Rsamtools_2.8.0 cluster_2.1.3 dbplyr_2.1.1
[21] png_0.1-7 grImport_0.9-5 graph_1.70.0 compiler_4.1.0
[25] httr_1.4.2 backports_1.4.1 assertthat_0.2.1 Matrix_1.4-1
[29] fastmap_1.1.0 lazyeval_0.2.2 cli_3.2.0 htmltools_0.5.2
[33] prettyunits_1.1.1 tools_4.1.0 gtable_0.3.0 glue_1.6.2
[37] GenomeInfoDbData_1.2.6 dplyr_1.0.8 rappdirs_0.3.3 Biobase_2.52.0
[41] vctrs_0.4.0 Biostrings_2.60.2 rhdf5filters_1.4.0 xfun_0.30
[45] stringr_1.4.0 lifecycle_1.0.1 restfulr_0.0.13 ensembldb_2.16.4
[49] XML_3.99-0.9 InteractionSet_1.20.0 zlibbioc_1.38.0 scales_1.1.1
[53] BSgenome_1.60.0 VariantAnnotation_1.38.0 hms_1.1.1 MatrixGenerics_1.4.3
[57] ProtGenerics_1.24.0 SummarizedExperiment_1.22.0 rhdf5_2.36.0 AnnotationFilter_1.16.0
[61] RColorBrewer_1.1-3 yaml_2.3.5 curl_4.3.2 memoise_2.0.1
[65] gridExtra_2.3 ggplot2_3.3.5 biomaRt_2.48.3 rpart_4.1.16
[69] latticeExtra_0.6-29 stringi_1.7.6 RSQLite_2.2.12 BiocIO_1.2.0
[73] plotrix_3.8-2 checkmate_2.0.0 GenomicFeatures_1.44.2 filelock_1.0.2
[77] BiocParallel_1.26.2 rlang_1.0.2 pkgconfig_2.0.3 matrixStats_0.61.0
[81] bitops_1.0-7 lattice_0.20-45 purrr_0.3.4 Rhdf5lib_1.14.2
[85] GenomicAlignments_1.28.0 htmlwidgets_1.5.4 bit_4.0.4 tidyselect_1.1.2
[89] magrittr_2.0.3 R6_2.5.1 generics_0.1.2 Hmisc_4.6-0
[93] DelayedArray_0.18.0 DBI_1.1.2 pillar_1.7.0 foreign_0.8-82
[97] survival_3.3-1 KEGGREST_1.32.0 RCurl_1.98-1.6 nnet_7.3-17
[101] tibble_3.1.6 crayon_1.5.1 utf8_1.2.2 BiocFileCache_2.0.0
[105] jpeg_0.1-9 progress_1.2.2 data.table_1.14.2 blob_1.2.2
[109] Rgraphviz_2.36.0 digest_0.6.29 munsell_0.5.0 Gviz_1.36.2
Added response as an Answer so thread can be closed.