replace, merge and delete entries in GRangeList
1
0
Entering edit mode
@shanrongzhao-10606
Last seen 7.7 years ago

Background:  I have GRangeLit object, and want to know how to merge, replace and delete entries.

txdb <- makeTxDbFromGFF(file="hg19.gencode.v19.gtf", format="gtf")

seqlevels(txdb, force=TRUE) <- "chr22"

exons <- reduce(exonsBy(txdb, by = "gene"))

 

Then I identify a cluster of overlapping of genes, let’s say #3, #321. Now I want to replace exons[3] by merging exons[3] and exons[321]. My code snippet loos like:

     ids <- c(3,321)

     id <- ids[1]

    exons[[id]] <- reduce(unlist(exons[ids]))

 

It works, but the statement “exons[[id]] <- reduce(unlist(exons[ids]))” is very slow, and I don’t understand why.

Other generic questions are:

For GRangeList exons object above, what are the most elegant way to delete/merge/and replace entries.

1. replace: asked above, but performance is bad

2. delete:  I cannot do something like exons[3] <- NULL.


> exons[[100]] <- NULL
> exons[[100]]
GRanges object with 2 ranges and 0 metadata columns:
   seqnames               ranges strand
      <Rle>            <IRanges>  <Rle>
      chr22 [50247490, 50247637]      +
      chr22 [50276982, 50283726]      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> exons[[100]] <- NULL
> exons[[100]]
GRanges object with 2 ranges and 0 metadata columns:
   seqnames               ranges strand
      <Rle>            <IRanges>  <Rle>
      chr22 [50919985, 50920386]      +
      chr22 [50920996, 50924869]      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> exons[[100]] <- NULL
> exons[[100]]
GRanges object with 12 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
         chr22 [20067755, 20067906]      +
         chr22 [20068092, 20068208]      +
         chr22 [20073210, 20074844]      +
         chr22 [20077192, 20077781]      +
         chr22 [20078958, 20079155]      +
  ...      ...                  ...    ...
         chr22 [20082235, 20082318]      +
         chr22 [20093700, 20093800]      +
         chr22 [20094090, 20094559]      +
         chr22 [20094794, 20096707]      +
         chr22 [20097548, 20099400]      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
 

grangeslist • 1.2k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.9 years ago
United States

Let's assume you have a list of those clusters, call it idList. If it is a partitioning (a many-to-one grouping) of your exons, then (in devel) you can cast it as such and regroup the exons (currently grouped by transcript) to be grouped by cluster:

exonClusters <- regroup(exons, ManyToOneGrouping(idList))

Then just reduce them:

reduce(exonClusters)

In general, you want to avoid loops and avoid replacement. Instead, model your analysis as functional transformations, where the structure, such as the grouping, is inherent in the object, not your code. Your code should convey only the high-level semantics.

ADD COMMENT
0
Entering edit mode

Thanks. I am using GenomicFeatures_1.20.1, and the two functions you mentioned are not available.

Do you know where I get them (regroup, ManyToOneGrouping)? Also how does ManyToOneGrouping merge the orignal names?

 

ADD REPLY
0
Entering edit mode

They are from IRanges in devel. Actually, I never got around to exporting and documenting regroup(), so I just did that. IRanges 2.5.47.

ADD REPLY

Login before adding your answer.

Traffic: 584 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6