How to transform chromosome start end to band in batch using biomaRt
1
0
Entering edit mode
574233829 • 0
@574233829-16391
Last seen 5.6 years ago

Hi, I want to transform 'chromosome start end' to 'band' in batch using biomaRt now. But I met some questions. Could you help me ? I refer to the examples on the user manual.

getBM(attributes = c('affy_hg_u133_plus_2','ensembl_gene_id'), 
      filters = c('chromosome_name','start','end'),
      values = list(16,1100000,1250000), 
      mart = ensembl)

But this example can not transform in batch. There are 328 items needed to annotate. My data structure:

    seqnames     start       end width strand no.cpgs    minfdr  Stouffer  maxbetafc  meanbetafc
1        16  54971700  54973047  1348      *       5  0.00e+00  0.00e+00  0.3311636  0.28623709
2        14  36992001  36993517  1517      *       4 2.93e-235 2.50e-278  0.3493025  0.32264970
3         6  31650760  31651158   399      *       7 8.20e-288 3.14e-256  0.2683500  0.23795246
4        17  79004850  79005386   537      *       3 7.45e-249 1.69e-202  0.3201577  0.29638743
5        10 134150489 134150690   202      *       4 1.88e-232 1.34e-197  0.2779639  0.26302729
6         3 169384410 169384648   239      *       3 1.50e-234 7.84e-186  0.3797378  0.30127753
7         1 151810485 151811364   880      *       6 2.08e-188 1.18e-164  0.2746188  0.23411387
8         7   5271516   5271655   140      *       2 2.54e-212 7.38e-163  0.3677478  0.34913817
9        16  54316049  54317413  1365      *       4 8.52e-184 2.45e-162  0.3459313  0.26682181
10        5  37834958  37836246  1289      *       5 1.35e-178 8.61e-155  0.2931416  0.21486234
11        7  36764019  36764140   122      *       2 9.41e-197 2.13e-148 -0.3248119 -0.28964118
12       12  16762789  16762831    43      *       2 2.69e-173 8.30e-137  0.3245738  0.31208577
13       16  68676557  68676806   250      *       2 4.21e-156 9.80e-128  0.2981041  0.28468263
14        1 247712237 247712239     3      *       2 1.69e-154 1.43e-126 -0.2896461 -0.28435472
15        5   2747922   2748369   448      *       2 7.63e-136 2.13e-113  0.2947875  0.28194606
16       11  86383696  86383761    66      *       2 3.24e-130 1.30e-109  0.2938009  0.26314046
17       16   4714794   4714815    22      *       2 2.37e-129 3.62e-109  0.2962017  0.28677182
18       19  58220295  58220818   524      *       8 3.68e-112 3.14e-108  0.1722510  0.15602547
19        1 147782116 147782452   337      *       2 7.91e-128 3.76e-108  0.2651204  0.26211464
20       19  13135662  13135808   147      *       2 1.46e-127 7.33e-108  0.2977374  0.27440837
21       10  77155376  77156421  1046      *       4  9.50e-92 3.01e-105  0.2826025  0.23205180
22       12  30976001  30976016    16      *       2 1.95e-122 2.96e-104  0.2653739  0.25972562
23       15  89951787  89953295  1509      *       4  1.25e-87 1.92e-102  0.1948644  0.17701206
24       12  58021295  58021917   623      *       4 1.00e-110 3.62e-100  0.2756699  0.20133960
25       22  30476206  30476452   247      *       3 4.68e-110  1.28e-98  0.2147699  0.18921464
26       12  54408664  54409599   936      *       7 5.71e-104  5.27e-98  0.1720937  0.14245105
27        6  10416007  10416508   502      *       4 2.52e-102  1.98e-95 -0.2120803 -0.17534331
28        7  96650096  96650192    97      *       3 7.07e-113  3.65e-95  0.2560395  0.21009693
29        7  96642096  96643508  1413      *       4 1.46e-105  3.09e-92  0.2656333  0.18161992
30        3 156324038 156324118    81      *       2 7.00e-106  4.97e-92  0.2429422  0.23746324
31       12  47219737  47219920   184      *       5  6.61e-95  1.09e-90  0.2079795  0.17331125
32        2 172953134 172953925   792      *       4 7.05e-115  2.85e-89  0.2600936  0.18543391
33        6  10420798  10422322  1525      *       5  1.04e-82  1.48e-87  0.2163537  0.16366814
34        6   7540834   7541066   233      *       3  2.00e-95  4.06e-87  0.2151257  0.21056723
35        2 175193377 175193551   175      *       2 4.88e-107  9.56e-87  0.2864994  0.26058300
36        1 226187852 226187876    25      *       2  4.13e-95  1.98e-83  0.2609258  0.24550333
37        5   2754753   2755540   788      *       2  4.75e-74  6.31e-82  0.3180281  0.27040688
38       15  89949617  89949776   160      *       2  3.59e-92  1.91e-81  0.2766399  0.24868424

And I try to write my code.

DMRS_Band=getBM(attributes = c("chromosome_name", "start_position","end_position", "band"),
                filters = c('chromosome_name','start','end'),
                values = list(chromosome_name=Methy_5_DMRs_S1_S3_annotation$seqnames,start=Methy_5_DMRs_S1_S3_annotation$start,end=Methy_5_DMRs_S1_S3_annotation$end),
                mart = mart)

But the annotation results return 55512 records. I think the result should be return 328 records in theory. But there are 55512 records returned. So think there may be error in my code. This follows an example.

    chromosome_name start_position end_position   band
1                 1       16873708     16873851 p36.13
2                 1       12507246     12507397 p36.21
3                 1      202070841    202070943  q32.1
4                 1       45592722     45592814  p34.1
5                 1       23370254     23370346 p36.12
6                 1       12166943     12167038 p36.22
7                 1      167998660    167998726  q24.2
8                 1       32086949     32087007  p35.2
9                 1      171101728    171101806  q24.3
10                1       28648600     28648733  p35.3
11                1       62078786     62078859  p31.3
12                1      220117853    220117962    q41
13                1      148334479    148334554  q21.2

I want to solve this problem. So I write a loop to annotate one by one. But there are another error. We can not call biomaRt API for many times. Otherwise there will be error. My code:

DMRs_Final<-data.frame(seqnames=c(NA), start=c(NA), end=c(NA), width=c(NA), strand=c(NA), no.cpgs=c(NA), minfdr=c(NA),
                       Stouffer=c(NA), maxbetafc=c(NA), meanbetafc=c(NA), overlapping.promoters=c(NA),chromosome_name=c(NA),  
                       start_position=c(NA),  end_position=c(NA),  band=c(NA))

for(i in 1:nrow(Methy_5_DMRs_S1_S3_annotation)){
  DMRs_original=Methy_5_DMRs_S1_S3_annotation[i,]
  DMRS_Band=getBM(attributes = c("chromosome_name", "start_position","end_position", "band"), 
        filters = c('chromosome_name','start','end'),
        values = list(DMRs_original$seqnames,DMRs_original$start,DMRs_original$end), 
        mart = mart)
  if(nrow(DMRS_Band)==0){
    i=i+1
  }else{
    DMRs_merge=cbind(DMRs_original,DMRS_Band)
    DMRs_Final=rbind(DMRs_Final,DMRs_merge)
  }
}

There follows the error.

Error in getBM(attributes = c("chromosome_name", "start_position", "end_position",  : 
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1. 
Please report this on the support site at http://support.bioconductor.org

So, could you tell me to how to transform 'chromosome start end' to 'band' in batch using biomaRt? Thank you very much.

software error • 599 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 21 minutes ago
United States

I would do it a different way.

> library(biovizBase)
> library(GenomicRanges)

## just the first bit of your GRanges
> gr <- GRanges(paste0("chr", c(16,14,6)), IRanges(c(54971700,36992001,31650760), c(54973047,36993517,31651158)))
> ideo <- getIdeogram("hg19")
> mcols(gr) <- mcols(subsetByOverlaps(ideo, gr))
> gr
GRanges object with 3 ranges and 2 metadata columns:
      seqnames            ranges strand |     name gieStain
         <Rle>         <IRanges>  <Rle> | <factor> <factor>
  [1]    chr16 54971700-54973047      * |    q13.3     gneg
  [2]    chr14 36992001-36993517      * |    q12.2   gpos50
  [3]     chr6 31650760-31651158      * |   p21.33     gneg
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
ADD COMMENT
0
Entering edit mode

I should also note that you have Ensembl chromosomes that you need to convert to UCSC style

> gr <- GRanges( c(16,14,6), IRanges(c(54971700,36992001,31650760), c(54973047,36993517,31651158)))
> gr
GRanges object with 3 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]       16 54971700-54973047      *
  [2]       14 36992001-36993517      *
  [3]        6 31650760-31651158      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> seqlevelsStyle(gr) <- "UCSC"
> gr
GRanges object with 3 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr16 54971700-54973047      *
  [2]    chr14 36992001-36993517      *
  [3]     chr6 31650760-31651158      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
ADD REPLY

Login before adding your answer.

Traffic: 867 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6