reduce function after combining 2 GRanges objects from cn.mops output
2
0
Entering edit mode
Haiying.Kong ▴ 110
@haiyingkong-9254
Last seen 5.8 years ago
Germany

  I have 2 GRanges objects:

> cnv1
GRanges object with 3 ranges and 7 metadata columns:
      seqnames             ranges strand |   T10229   T10380   T10713    T2523
         <Rle>          <IRanges>  <Rle> | <factor> <factor> <factor> <factor>
  [1]        1 [  65509,  721942]      * |      CN1      CN2      CN2      CN2
  [2]        1 [ 762280, 3417947]      * |      CN1      CN1      CN1      CN1
  [3]        1 [7289685, 7737801]      * |      CN2      CN1      CN2      CN2
          T2378     T2501     T2556
      <logical> <logical> <logical>
  [1]      <NA>      <NA>      <NA>
  [2]      <NA>      <NA>      <NA>
  [3]      <NA>      <NA>      <NA>
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
> cnv2
GRanges object with 5 ranges and 7 metadata columns:
      seqnames             ranges strand |    T10229    T10380    T10713
         <Rle>          <IRanges>  <Rle> | <logical> <logical> <logical>
  [1]        1 [  14620,  783151]      * |      <NA>      <NA>      <NA>
  [2]        1 [1374455, 1438931]      * |      <NA>      <NA>      <NA>
  [3]        1 [1593783, 1596048]      * |      <NA>      <NA>      <NA>
  [4]        1 [1601455, 1630886]      * |      <NA>      <NA>      <NA>
  [5]        1 [1634934, 1683496]      * |      <NA>      <NA>      <NA>
          T2523    T2378    T2501    T2556
      <logical> <factor> <factor> <factor>
  [1]      <NA>      CN2      CN2      CN2
  [2]      <NA>      CN2      CN2      CN2
  [3]      <NA>      CN2      CN2      CN2
  [4]      <NA>      CN2      CN2      CN2
  [5]      <NA>      CN3      CN1      CN1
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

If I combine them:

cnv = c(cnv1, cnv2)

> cnv

GRanges object with 8 ranges and 7 metadata columns:
      seqnames             ranges strand |   T10229   T10380   T10713    T2523
         <Rle>          <IRanges>  <Rle> | <factor> <factor> <factor> <factor>
  [1]        1 [  65509,  721942]      * |      CN1      CN2      CN2      CN2
  [2]        1 [ 762280, 3417947]      * |      CN1      CN1      CN1      CN1
  [3]        1 [7289685, 7737801]      * |      CN2      CN1      CN2      CN2
  [4]        1 [  14620,  783151]      * |     <NA>     <NA>     <NA>     <NA>
  [5]        1 [1374455, 1438931]      * |     <NA>     <NA>     <NA>     <NA>
  [6]        1 [1593783, 1596048]      * |     <NA>     <NA>     <NA>     <NA>
  [7]        1 [1601455, 1630886]      * |     <NA>     <NA>     <NA>     <NA>
  [8]        1 [1634934, 1683496]      * |     <NA>     <NA>     <NA>     <NA>
         T2378    T2501    T2556
      <factor> <factor> <factor>
  [1]     <NA>     <NA>     <NA>
  [2]     <NA>     <NA>     <NA>
  [3]     <NA>     <NA>     <NA>
  [4]      CN2      CN2      CN2
  [5]      CN2      CN2      CN2
  [6]      CN2      CN2      CN2
  [7]      CN2      CN2      CN2
  [8]      CN3      CN1      CN1

( In the end, all NAs will be replaced by CN2)

If I apply reduce function on this, I lose all elementMetadata:

> reduce(cnv)
GRanges object with 2 ranges and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]        1 [  14620, 3417947]      *
  [2]        1 [7289685, 7737801]      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

 

-----------------------------------------------------------------------------------------------------------------

Two problems here:

(1) This is too much collpased. I need to use same level of collapsing as cn.mops. I could not find in source code what cn.mops is using for "min.gapwidth".

(2) I need to create a new elementMetadata for cnv. The only thing I can figure out is comparing the ranges of each row in cnv1 and cnv2 with the ranges of cnv. Is there any easier way of doing the whole thing?

--------------------------------------------------------------------------------------------------------------------

Also, originally cnv1 and cnv2 did not have compatible elementMetadata, so I did:

n.col.1 = ncol(mcols(cnv1))
n.col.2 = ncol(mcols(cnv2))
n.col = n.col.1 + n.col.2
sam.names = c(names(mcols(cnv1)), names(mcols(cnv2)))

cn1 = as.data.frame(matrix(NA, nrow=length(seq_along(cnv1)), ncol=n.col))
names(cn1) = sam.names
cn1[ ,1:n.col.1] = as.data.frame(mcols(cnv1))
mcols(cnv1) = cn1

cn2 = as.data.frame(matrix(NA, nrow=length(seq_along(cnv2)), ncol=n.col))
names(cn2) = sam.names
cn2[ ,(n.col.1+1):n.col] = as.data.frame(mcols(cnv2))
mcols(cnv2) = cn2

Any short cut for this?

--------------------------------------------------------------------------------------------------------

Even when 2 GRanges objects are completely compatible, if I combine them with:

   cnv = c(cnv1, cnv2)

cnv is a list of 2 GRanges object.

GRanges cn.mops • 2.2k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States

See the with.revmap= argument for reduce(). It gives you the subscripts for merging the original input into the reduced output. I'm not sure what you mean by "too much collapsed". How much collapsed do you want it to be? For merging the tables, you could do this:

mcols(cnv1)$dummy <- seq_along(cnv1)
mcols(cnv2)$dummy <- seq_along(cnv2) + length(cnv1)
merge(mcols(cnv1), mcols(cnv2), by="dummy", all=TRUE)

 

 

ADD COMMENT
0
Entering edit mode

Thank you very much for your reply.

(1) I need 2 ways of combining. One is merging, one is concatenating for 2 GRanges objects with compatible elementMetadata.

  For the second case, if I run cnv = c(cnv1, cnv2) on GRanges objects created by cn.mops, this gives GRangesList object. If I run  cnv = c(cnv1, cnv2) on GRanges objects created by myself, this gives GRanges object. The only difference I can say is that in my GRanges objects, the columns in elementMetadata are characters, while those in the other case are factors.

(2) By "too much collapsed", I meant there are too few rows after merging. Probably "min.gapwidth" can adjust this? I have to give same value to this parameter as cn.mops.

ADD REPLY
0
Entering edit mode

Please post the actual code that produces that inconsistency. That could happen if you pass named arguments to c(). The min.gapwidth could make it so that only overlapping (not adjacent) ranges are merged.

ADD REPLY
0
Entering edit mode

> class(CNVs)
[1] "CNVDetectionResult"
attr(,"package")
[1] "cn.mops"
> cnv = cnvs(CNVs)[1:4, ]
> cnv
GRanges object with 4 ranges and 4 metadata columns:
      seqnames               ranges strand | sampleName     median      mean
         <Rle>            <IRanges>  <Rle> |   <factor>  <numeric> <numeric>
  [1]        1 [   65509,   721942]      * |     T10229 -0.9529843   -1.5198
  [2]        1 [ 1643790,  1645523]      * |     T10229 -4.1718849   -3.5928
  [3]        1 [21806582, 21809808]      * |     T10229 -3.0625941   -2.9810
  [4]        1 [24172257, 24192026]      * |     T10229 -0.9788206   -0.9915
               CN
      <character>
  [1]         CN1
  [2]         CN1
  [3]         CN1
  [4]         CN1
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
> c(c(), cnv)
[[1]]
GRanges object with 4 ranges and 4 metadata columns:
      seqnames               ranges strand | sampleName     median      mean
         <Rle>            <IRanges>  <Rle> |   <factor>  <numeric> <numeric>
  [1]        1 [   65509,   721942]      * |     T10229 -0.9529843   -1.5198
  [2]        1 [ 1643790,  1645523]      * |     T10229 -4.1718849   -3.5928
  [3]        1 [21806582, 21809808]      * |     T10229 -3.0625941   -2.9810
  [4]        1 [24172257, 24192026]      * |     T10229 -0.9788206   -0.9915
               CN
      <character>
  [1]         CN1
  [2]         CN1
  [3]         CN1
  [4]         CN1
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

 

ADD REPLY
1
Entering edit mode

I'm not sure that code is relevant, but anyway, that happens because c() produces NULL and NULL is not a GRanges, so it won't make a GRangesList. One could argue that c() should always drop NULL and dispatch on the first non-NULL argument. Btw, c() is a bit weird in that it is a variadic primitive, so dispatch happens only on the first argument, so maybe we should fix that, too. That would require changes in R though, so I would just not do what you're doing.

ADD REPLY
0
Entering edit mode

I tried, and it works now. Concatenating is working correctly now.

For merging with the function "reduce",  min.gapwidth cannot do what I am trying to do:

Ranges separated by a gap of at least min.gapwidth positions are not merged.

Is there any way to set a restriction about at least how much overlap the ranges should have in order to be merged?

Thank you very much for your help.

ADD REPLY
2
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States

No, there is no way to do that with reduce() directly. You could use findOverlaps() and specify a minoverlap. Then get the Hits into a graph and find connected components like this:Select from Overlapping Genomic Ranges based on Score or other Metadata.

ADD COMMENT
0
Entering edit mode

Thanks again for your help.

findOverlaps() needs 2 GRanges objects, query and subject, and find overlaps between them. This is slightly different from what I am trying to do. I have only one GRanges object, and the ranges in this object has overlapping genomic regions. I am trying to collapse those with much overlapping regions.

This probably is possible that I do a loop. First, I compare the first row and the second row, and save the result, then compare the 3rd row with the result, and update the result, and 4th row, ........ Is there any easier way of doing this?

ADD REPLY
0
Entering edit mode

Check out that link in my answer. It shows the basics of this approach, including that you can call findOverlaps() with only a single argument.

ADD REPLY

Login before adding your answer.

Traffic: 537 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6