Hello,
Basically, what I'm trying to do is to plot the linkage disequilibrium (LD) in a certain genomic region in a similar way as this website does: https://analysistools.nci.nih.gov/LDlink/?tab=ldmatrix. The problem is that the figure from this website has a low quality and I need the figure for publication purposes. All the sample files I have used are at the end of the post so that you can reproduce my code.
This is what I have done so far:
library(ggplot2) library(reshape) library(scales) library(ggbio) library(GenomicRanges) library(Homo.sapiens) library(rtracklayer) library(GenomicRanges) library(BSgenome.Hsapiens.UCSC.hg19) library(VariantAnnotation) source("bed2granges.R") data(hg19IdeogramCyto, package = "biovizBase") data(hg19Ideogram, package = "biovizBase") data(genesymbol, package = "biovizBase") hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))
I downloaded the matrix txt file from the LDlink website to use it for my LD plot. I melted the dataframe to obtain the grading colors:
# FIRST PLOT - LD plot for CEU population #Load r-squared values in Table format CEU_plot<- read.table("r2_SOD1_matrix.txt", header = TRUE) #Load chromosomal positions for each rs ID RS_location <- read.table("SOD1_SNPs.bed", header = FALSE) #Melt table to be able to make a gradient CEU_plot_m <- melt(CEU_plot) # x-axis values - chromosomal position CEU_plot_m$RS_number <- factor(RS_location[,2], levels = RS_location[,2]) # y-axis - variable CEU_plot_m$variable <- factor(CEU_plot_m$variable, levels = sort(CEU_plot_m$variable, decreasing = TRUE)) CEU = ggplot(CEU_plot_m, aes(x = RS_number, y = variable)) + labs (title = "r2 plot for SOD1 gene in CEU population ", x = element_blank (), y = "rs IDs") + theme(text = element_text(size=8), axis.text.x = element_blank(), axis.text.y = element_text(hjust=0)) + geom_tile(aes(fill = value), colour = "white") + scale_fill_gradient(low = "white" ,high = "red") CEU
Then I use ggbio package to generate the third and fourth track of my final plot
# DATA FOR REST OF PLOTS wh <- genesymbol[c("SOD1")] chr_name <- as.vector(slot(wh@seqnames,"values")) chr_start <- slot(wh@ranges, "start") + 0 - 2000 chr_end <- slot (wh@ranges, "start") + slot (wh@ranges, "width") - 1 + 2000 gene_name <- slot(wh@ranges, "NAMES") #THIRD PLOT - SOD1 SNPs SNP <- bed_to_granges("SOD1_SNPs.bed") SNP_chr <- slot(SNP@seqnames,"values") if (chr_name %in% SNP_chr) { seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))] SNP_dn <- keepSeqlevels(unique(SNP), chr_name) } SNPs_plot <- autoplot(SNP_dn, xlab = chr_name) + guides (colour = TRUE) + theme_bw()+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank()) + theme(legend.position="none") + scale_color_manual(values = score <- c("black")) + scale_fill_manual (breaks = score <- c("2"), values= score <- c("black"), name = "Variant type", labels = expression(bold(SNP))) + xlim(chr_start,chr_end) + scale_x_sequnit("Mb") fixed(SNP_IDs) <- TRUE SNPs_plot ##FOURTH PLOT - SOD1 gene GENES_plot <- autoplot(Homo.sapiens, which = wh, layout = "linear", xlab = chr_name) + guides (colour = FALSE) + xlim(chr_start,chr_end) + scale_x_sequnit("Mb") fixed(GENES_plot) <- TRUE GENES_plot # Main tracks plot title plot_title <- as.expression(bquote('Variation in'~italic(.(gene_name))~'gene')) # X axis scale according to gene start and gene end scale_combined <- GRanges(chr_name, IRanges(start = chr_start, end = chr_end)) # Combination of SNPs plot with Genes plot Gene_SNPs <- tracks(SNPs = SNP_IDs, Genes=GENES_plot, heights = c(0.5,2.0), xlim = scale_combined, xlab = paste("\n",chr_name,"\n"), title = plot_title, label.bg.fill = "grey60") + scale_x_sequnit("Mb") Gene_SNPs
The problem comes when I want to combine the CEU plot (first track) with the SNPs (third track) and the Gene (fourth track) plots. I need to combine a ggplot list (CEU) with two GGBio objects (SNPs_plot and GENES_plot). On top of that I would like to generate an additional track between the LD plot and the SNPs_plot that links the discrete values of the x axis from the CEU plot with the SNPs located in a continuous scale from the SNPs_plot.
In summary, this is what I have:
LD Plot:
Combined SNPs data with Gene track:
An this is the final combined plot that I want:
Thank you very much for your help in advance,
These are the original files I'm using:
'r2_SOD1_matrix.txt' input file
https://drive.google.com/open?id=0B34ok3wh5PjTMklhNzBrT3JoM0k
'SOD1_SNPs.bed' input file
https://drive.google.com/open?id=0B34ok3wh5PjTaGx6QWdZd3pmbUE
'bed2granges.R' function
https://drive.google.com/open?id=0B34ok3wh5PjTMHNFa1pod0liYzA
R version 3.3.2 (2016-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods [9] base other attached packages: [1] scales_0.4.1 reshape_0.8.6 [3] VariantAnnotation_1.20.2 Rsamtools_1.26.1 [5] SummarizedExperiment_1.4.0 BSgenome.Hsapiens.UCSC.hg19_1.4.0 [7] BSgenome_1.42.0 Biostrings_2.42.1 [9] XVector_0.14.0 rtracklayer_1.34.1 [11] Homo.sapiens_1.3.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [13] org.Hs.eg.db_3.4.0 GO.db_3.4.0 [15] OrganismDbi_1.16.0 GenomicFeatures_1.26.2 [17] AnnotationDbi_1.36.1 Biobase_2.34.0 [19] GenomicRanges_1.26.2 GenomeInfoDb_1.10.2 [21] IRanges_2.8.1 S4Vectors_0.12.1 [23] ggbio_1.22.3 BiocGenerics_0.20.0 [25] ggplot2_2.2.1 loaded via a namespace (and not attached): [1] httr_1.2.1 AnnotationHub_2.6.4 [3] splines_3.3.2 Formula_1.2-1 [5] shiny_1.0.0 assertthat_0.1 [7] interactiveDisplayBase_1.12.0 latticeExtra_0.6-28 [9] RBGL_1.50.0 yaml_2.1.14 [11] RSQLite_1.1-2 backports_1.0.5 [13] lattice_0.20-34 biovizBase_1.22.0 [15] digest_0.6.11 RColorBrewer_1.1-2 [17] checkmate_1.8.2 colorspace_1.3-2 [19] htmltools_0.3.5 httpuv_1.3.3 [21] Matrix_1.2-8 plyr_1.8.4 [23] XML_3.98-1.5 biomaRt_2.30.0 [25] zlibbioc_1.20.0 xtable_1.8-2 [27] BiocParallel_1.8.1 htmlTable_1.8 [29] tibble_1.2 nnet_7.3-12 [31] lazyeval_0.2.0 survival_2.40-1 [33] magrittr_1.5 mime_0.5 [35] memoise_1.0.0 GGally_1.3.0 [37] foreign_0.8-67 graph_1.52.0 [39] BiocInstaller_1.24.0 tools_3.3.2 [41] data.table_1.10.0 stringr_1.1.0 [43] munsell_0.4.3 cluster_2.0.5 [45] ensembldb_1.6.2 grid_3.3.2 [47] RCurl_1.95-4.8 dichromat_2.0-0 [49] labeling_0.3 bitops_1.0-6 [51] base64enc_0.1-3 gtable_0.2.0 [53] DBI_0.5-1 reshape2_1.4.2 [55] R6_2.2.0 GenomicAlignments_1.10.0 [57] gridExtra_2.2.1 knitr_1.15.1 [59] Hmisc_4.0-2 stringi_1.1.2 [61] Rcpp_0.12.9 rpart_4.1-10 [63] acepack_1.4.1
Dear Michael,
I'll have a look at the `plotRangesLinkedToData()` to see if I can get what I'm trying to do.
Anyway, what do you mean by using a more modular approach? Can I import the data in a different way that would make the plot easier to generate?
Thank you very much for your help
I just meant a more modular approach to the design of ggbio, which pretty close to abandonware. It would be good not to write your own BED parser though.