Dear All,
I'm using the liftOver function to lift a genomic region (chr2:243181072-243181572) from hg19 to hg38. I have downloaded the file chain from here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz and the commands I'm using are the following:
chain = import.chain("hg19ToHg38.over.chain")
liftOver(GRanges("chr2" , IRanges(243181072,243181572)), chain )
And this is the outcome:
> chain = import.chain("hg19ToHg38.over.chain")
Found more than one class "file" in cache; using the first, from namespace 'BiocGenerics'
Also defined by ‘filehash’
> liftOver(GRanges("chr2" , IRanges(243181072,243181572)), chain )
GRangesList object of length 1:
[[1]]
GRanges object with 7 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [96548, 96552] *
[2] chr1 [96554, 96625] *
[3] chr1 [96627, 96848] *
[4] chr1 [96850, 96893] *
[5] chr1 [96895, 96944] *
[6] chr1 [96946, 96991] *
[7] chr1 [96993, 97048] *
seqinfo: 1 sequence from an unspecified genome; no seqlengths
My region gets lifted to version hg38 but divided in smaller non-overlapping regions. However, if I use the UCSC liftover web tool the region gets lifted to one single region; chr1:96548-97048. The boundaries of the first region and the last region correspond to the outcome using the liftOver function, so I wonder why my region gets divided in smaller pieces? Is there something I'm missing here?
Thanks,
Seb Guelfi
Not sure whether this is what you are aiming for with rtracklayer::liftOver, but I assume general consistency with the UCSC online liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver) to be a desired feature for many users.
I can think of several good use cases, but to give you a specific one:
Consider hg19-based absolute copy number calls as obtained with ABSOLUTE on TCGA data (https://software.broadinstitute.org/software/cprg/?q=node/27).
Now, you might want to do a consistency check of these CNV calls with an alternative tool such as PureCN bioconductor.org/packages/PureCN which has been carried out hg38-based.
If you accordingly lift over an example hg19-based ABSOLUTE call
chr1:3218923-74907316
to hg38 using the online tool, you get a single region back
chr1:3302359-74441632
enabling straightforward comparison with the hg38-based PureCN calls.
However, if I use rtracklayer::liftOver
> chain.file <- "hg19ToHg38.over.chain"
> ch <- import.chain(chain.file)
> rtracklayer::liftOver(absGRL[[1]][1], ch)
GRangesList object of length 1:
[[1]]
GRanges object with 1072 ranges and 3 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 3302359-3928704 *
[2] chr1 3935210-3936533 *
[3] chr1 3936534-7366717 *
[4] chr1 7366718-7382536 *
[5] chr1 7382538-8504448 *
... ... ... ... . ... ...
[1068] chr1 13203465-13203520 *
[1069] chr1 13203521-13203532 *
[1070] chr1 13203533-13203548 *
[1071] chr20 38461961-38462015 *
[1072] chr1 12893309-12893335 *
it would give me a hard time to figure out to which PureCN calls I should actually now compare to.
Consistency with the UCSC liftOver is going to be tough, since the algorithm is not well described and the code has a restricted license. It looks like it does some smoothing and filtering on top of the chain mappings. Exactly how would take a lot of reverse engineering.