Missing levels in legend using autoplot () + scale_color_manual () functions
0
0
Entering edit mode
@oswaldolorenzo-11449
Last seen 7.2 years ago

Hello,

I'm trying to plot three different bed files (one with SNP data, another one with deletions and a third one with duplication data), but I cannot manage the legend to contain the values of the three layers unless I put the data altogether in the same file. The problem about combining the three files into a single one is that I'm not able to set ylims for each of the levels of the variable.

This is an example of one of my input files (the one containing the SNP info):

chr10    47000019    47000019    rs150696937    2    +
chr11    1017064    1017064    NA    2    +
chr11    1017280    1017280    rs199539548    2    +
chr11    1017294    1017294    NA    2    +
chr11    1017756    1017756    NA    2    +
chr13    31898038    31898038    rs200460848    2    +
chr13    40298639    40298639    NA    2    +
chr13    48996928    48996928    rs530812916    2    +
chr13    50204777    50204777    rs117251022    2    +
chr14    20216005    20216005    rs566685404    2    +
chr14    20404076    20404076    rs114526346    2    +
chr21    10944668    10944668    rs138088406    2    +

I'm using the score column to specify the kind of variant type that I want to plot in the following way: "1" = Deletion; "2" = SNP and "3" = Duplication.

The code that I'm using to generate the plot is the following one:

## Load libraries and required databases
library(ggbio)
data(hg19IdeogramCyto, package = "biovizBase")

library(GenomicRanges)
hg19 <- keepSeqlevels(hg19IdeogramCyto, paste0("chr", c(1:22, "X", "Y")))

biovizBase::isIdeogram(hg19)

data("hg19IdeogramCyto", package = "biovizBase")

data("hg19Ideogram", package = "biovizBase")

I use the Bed2GRanges function available at this website: http://davetang.org/muse/2015/02/04/bed-granges/ to convert my bed file into a GRanges object.

# Required Bed2GRanges function

# BED to GRanges
#
# This function loads a BED-like file and stores it as a GRanges object.
# The tab-delimited file must be ordered as 'chr', 'start', 'end', 'id', 'score', 'strand'.
# The minimal BED file must have the 'chr', 'start', 'end' columns.
# Any columns after the strand column are ignored.
#
# @param file Location of your file
# @keywords BED GRanges
# @export
# @examples
# bed_to_granges('my_bed_file.bed')

bed_to_granges <- function(file){
        df <- read.table(file,
                         header=F,
                         stringsAsFactors=F)
        
        if(length(df) > 6){
                df <- df[,-c(7:length(df))]
        }
        
        if(length(df)<3){
                stop("File has less than 3 columns")
        }
        
        header <- c('chr','start','end','id','score','strand')
        names(df) <- header[1:length(names(df))]
        
        if('strand' %in% colnames(df)){
                df$strand <- gsub(pattern="[^+-]+", replacement = '*', x = df$strand)
        }
        
        library("GenomicRanges")
        
        if(length(df)==3){
                gr <- with(df, GRanges(chr, IRanges(start, end)))
        } else if (length(df)==4){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id))
        } else if (length(df)==5){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score)))
        } else if (length(df)==6){
                gr <- with(df, GRanges(chr, IRanges(start, end), id=id, score=as.character(score), strand=strand))
        }
        return(gr)
}

 

I import my bedfile:

## Import bed files as GRanges file
SNP <- bed_to_granges("SNPs.bed")
seqlengths(SNP) <- seqlengths(hg19Ideogram)[names(seqlengths(SNP))]
SNP_dn <- keepSeqlevels(SNP, paste0("chr", c(1:22, "X", "Y")))

I plot my data:

#Plotting SNP_dn according to score column
test <- autoplot(SNP_dn, aes(color = score)) +
        scale_color_manual("Variant type",
                           values = score <- c("black", "red", "blue"),
                           breaks = c("2","1","3"),
                           drop = FALSE,
                           labels = c("SNP", "Deletion", "Duplication")) +
        theme(legend.position = "right")

test

Even I specify the option drop = FALSE, I always miss the levels "Deletion" and "Duplication" in the legend.

Here you have an example of the output plot:

I would like to have a legend that includes the three levels that I have specified with the scale_color_manual () function (i.e. SNP, Deletion, Duplication), even if there are none of them in the bed file.

Thank you very much,

sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                         
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biovizBase_1.20.0    ggbio_1.20.2         GenomicRanges_1.24.3 GenomeInfoDb_1.8.7   IRanges_2.6.1      
[6] S4Vectors_0.10.3     ggplot2_2.1.0        BiocGenerics_0.18.0 

loaded via a namespace (and not attached):
[1] Rcpp_0.12.7                   lattice_0.20-34               Rsamtools_1.24.0            
[4] Biostrings_2.40.2             digest_0.6.10                 mime_0.5                    
[7] R6_2.1.3                      plyr_1.8.4                    chron_2.3-47                
[10] acepack_1.3-3.3               RSQLite_1.0.0                 BiocInstaller_1.22.3        
[13] httr_1.2.1                    zlibbioc_1.18.0               GenomicFeatures_1.24.5      
[16] data.table_1.9.6              rpart_4.1-10                  Matrix_1.2-7.1              
[19] labeling_0.3                  splines_3.3.1                 BiocParallel_1.6.6          
[22] AnnotationHub_2.4.2           stringr_1.1.0                 foreign_0.8-67              
[25] RCurl_1.95-4.8                biomaRt_2.28.0                munsell_0.4.3               
[28] shiny_0.14                    httpuv_1.3.3                  rtracklayer_1.32.2          
[31] htmltools_0.3.5               nnet_7.3-12                   SummarizedExperiment_1.2.3  
[34] gridExtra_2.2.1               interactiveDisplayBase_1.10.3 Hmisc_3.17-4                
[37] XML_3.98-1.4                  reshape_0.8.5                 GenomicAlignments_1.8.4     
[40] bitops_1.0-6                  RBGL_1.48.1                   grid_3.3.1                  
[43] xtable_1.8-2                  GGally_1.2.0                  gtable_0.2.0                
[46] DBI_0.5-1                     magrittr_1.5                  scales_0.4.0                
[49] graph_1.50.0                  stringi_1.1.1                 XVector_0.12.1              
[52] reshape2_1.4.1                latticeExtra_0.6-28           Formula_1.2-1               
[55] RColorBrewer_1.1-2            ensembldb_1.4.7               tools_3.3.1                 
[58] dichromat_2.0-0               OrganismDbi_1.14.1            BSgenome_1.40.1             
[61] Biobase_2.32.0                survival_2.39-5               AnnotationDbi_1.34.4        
[64] colorspace_1.2-6              cluster_2.0.4                 VariantAnnotation_1.18.7     
ggbio ggplot2 legend bioconductor karyiogram • 2.2k views
ADD COMMENT
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 927 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6