rtracklayer::lifOver one-to-many regions and not match UCSC
1
0
Entering edit mode
m.guelfi • 0
@mguelfi-13759
Last seen 7.3 years ago

Dear All,

I'm using the liftOver function to lift a genomic region (chr2:243181072-243181572) from hg19 to hg38. I have downloaded the file chain from here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz and the commands I'm using are the following: 

chain = import.chain("hg19ToHg38.over.chain")

liftOver(GRanges("chr2" , IRanges(243181072,243181572)), chain )

And this is the outcome:

> chain = import.chain("hg19ToHg38.over.chain")

Found more than one class "file" in cache; using the first, from namespace 'BiocGenerics'

Also defined by ‘filehash’

> liftOver(GRanges("chr2" , IRanges(243181072,243181572)), chain )

GRangesList object of length 1:

[[1]] 

GRanges object with 7 ranges and 0 metadata columns:

      seqnames         ranges strand

         <Rle>      <IRanges>  <Rle>

  [1]     chr1 [96548, 96552]      *

  [2]     chr1 [96554, 96625]      *

  [3]     chr1 [96627, 96848]      *

  [4]     chr1 [96850, 96893]      *

  [5]     chr1 [96895, 96944]      *

  [6]     chr1 [96946, 96991]      *

  [7]     chr1 [96993, 97048]      *


seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

My region gets lifted to version hg38 but divided in smaller non-overlapping regions. However, if I use the UCSC liftover web tool the region gets lifted to one single region; chr1:96548-97048. The boundaries of the first region and the last region correspond to the outcome using the liftOver function, so I wonder why my region gets divided in smaller pieces? Is there something I'm missing here?

 

Thanks,

Seb  Guelfi

 

 

 

 

 

 

 

rtracklayer liftover bioconductor R • 3.2k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

The inconsistency is probably due to the UCSC lift over "smoothing over" the gaps in the alignment. rtracklayer takes a more literal interpretation of the chain file. Does this cause a problem for your use case? If a good case is made, I could add an option for the smoothing to rtracklayer. It would have to be a pretty good case though.

ADD COMMENT
1
Entering edit mode

Not sure whether this is what you are aiming for with rtracklayer::liftOver, but I assume general consistency with the UCSC online liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver) to be a desired feature for many users.

I can think of several good use cases, but to give you a specific one:

Consider hg19-based absolute copy number calls as obtained with ABSOLUTE on TCGA data (https://software.broadinstitute.org/software/cprg/?q=node/27).

Now, you might want to do a consistency check of these CNV calls with an alternative tool such as PureCN bioconductor.org/packages/PureCN which has been carried out hg38-based.

If you accordingly lift over an example hg19-based ABSOLUTE call

chr1:3218923-74907316

to hg38 using the online tool, you get a single region back

chr1:3302359-74441632

enabling straightforward comparison with the hg38-based PureCN calls.

However, if I use rtracklayer::liftOver

> chain.file <- "hg19ToHg38.over.chain"

> ch <- import.chain(chain.file)

> rtracklayer::liftOver(absGRL[[1]][1], ch)
GRangesList object of length 1:
[[1]]
GRanges object with 1072 ranges and 3 metadata columns:
         seqnames            ranges strand
            <Rle>         <IRanges>  <Rle>
     [1]     chr1   3302359-3928704      *
     [2]     chr1   3935210-3936533      *
     [3]     chr1   3936534-7366717      *
     [4]     chr1   7366718-7382536      *
     [5]     chr1   7382538-8504448      *
     ...      ...               ...    ... .          ...          ...
  [1068]     chr1 13203465-13203520      *
  [1069]     chr1 13203521-13203532      *
  [1070]     chr1 13203533-13203548      *
  [1071]    chr20 38461961-38462015      *
  [1072]     chr1 12893309-12893335      *

it would give me a hard time to figure out to which PureCN calls I should actually now compare to.

 

ADD REPLY
1
Entering edit mode

Consistency with the UCSC liftOver is going to be tough, since the algorithm is not well described and the code has a restricted license. It looks like it does some smoothing and filtering on top of the chain mappings. Exactly how would take a lot of reverse engineering.

ADD REPLY

Login before adding your answer.

Traffic: 867 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6