Question

how should I apply "cpg.annotate" to TCGA methylation data in hg38 for HM450K?

xiaofeiwang198266 • 0

The GDC legacy archive has been retired, so I downloaded the TCGA methylation data (450K) in hg38 using TCGAbiolinks. I'd like to use DMRcate to find DMRs, but the annotation it uses is hg19 (IlluminaHumanMethylation450kanno.ilmn12.hg19). I am trying to modify the "cpg.annotate" function, but got stuck when trying to make the "makeGenomicRatioSetFromMatrix" function use my homemade hg38 annotation for the 450K array (based on the files at http://zwdzwd.github.io/InfiniumAnnotation). Specifically, I don't know how to modify the part of makeGenomicRatioSetFromMatrix shown below so that it uses my annotation rather than the supplied value of "ilmn12.hg19". I also tried to build a GenomicRatioSet directly from my homemade annotation, but failed.

out <- GenomicRatioSet(gr = gr[ind2, ], Beta = NULL,
                       M = mat[ind1, , drop = FALSE], CN = NULL, colData = pData,
                       annotation = c(array = array, annotation = annotation),
                       preprocessMethod = preprocessing)

So, the question is: how should I apply "cpg.annotate" to TCGA methylation data in hg38 (450K)?

Another point of confusion: I see that "DMR.plot" has a "genome" option that can be set to "hg38" (https://www.bioconductor.org/packages/devel/bioc/manuals/DMRcate/man/DMRcate.pdf). Is "hg38" only for EPICv2 data?

See a related question here https://www.biostars.org/p/9587144/ .

Thanks a lot!

minfi DMRcate • 503 views

ADD COMMENT • link updated 5 hours ago by James W. MacDonald 67k • written 7 weeks ago by xiaofeiwang198266 • 0

Hi Xiaofei,

Thanks for this. If your data is from 450K, you'll have to call your DMRs in hg19, and then lift the DMR ranges over to hg38 post-hoc. DMRcate is one-to-one with regards to platform -> reference, since it follows the Illumina-provided annotation.

450K: IlluminaHumanMethylation450kanno.ilmn12.hg19

EPICv1: IlluminaHumanMethylationEPICanno.ilm10b4.hg19

EPICv2: IlluminaHumanMethylationEPICv2anno.20a1.hg38

cpg.annotate() isn't built for customised/homemade annotations, but you're more than welcome to fork the git repository (https://github.com/timpeters82/DMRcate-devel/) and adapt it for your own needs.
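As a rough illustration of the "call in hg19, then lift over" route described above, here is a minimal sketch of the lift-over step. It assumes the rtracklayer package is installed and that an uncompressed UCSC chain file (hg19ToHg38.over.chain) has been downloaded locally; the function name lift_dmrs_to_hg38 and the choice to drop DMRs that split into several pieces are illustrative assumptions, not part of DMRcate.

```r
# Sketch only: lift hg19 DMR ranges over to hg38 with rtracklayer.
# `dmr_ranges_hg19` is assumed to be a GRanges object, e.g. the result of
# DMRcate::extractRanges(dmrs, genome = "hg19").
# `chain_path` is assumed to point to an uncompressed hg19ToHg38.over.chain file.
lift_dmrs_to_hg38 <- function(dmr_ranges_hg19, chain_path) {
  if (!requireNamespace("rtracklayer", quietly = TRUE)) {
    stop("the rtracklayer package is required for liftOver")
  }
  if (!file.exists(chain_path)) {
    stop("chain file not found: ", chain_path)
  }

  chain  <- rtracklayer::import.chain(chain_path)
  # liftOver() returns a GRangesList with one element per input DMR
  lifted <- rtracklayer::liftOver(dmr_ranges_hg19, chain)

  # A DMR can map to zero or several hg38 pieces; as a conservative
  # (hypothetical) policy, keep only DMRs that map to exactly one range.
  n_pieces <- S4Vectors::elementNROWS(lifted)
  keep     <- n_pieces == 1L
  message(sum(!keep), " of ", length(keep),
          " DMRs did not lift over as a single range and were dropped")

  BiocGenerics::unlist(lifted[keep])
}

# Hypothetical usage, given `dmrs` returned by DMRcate::dmrcate():
# dmr_hg19 <- DMRcate::extractRanges(dmrs, genome = "hg19")
# dmr_hg38 <- lift_dmrs_to_hg38(dmr_hg19, "hg19ToHg38.over.chain")
```

DMRs that split across the two assemblies may deserve a manual look rather than silent removal, so the message reports how many were dropped.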

Cheers, Tim

ADD REPLY • link 5 weeks ago Tim Peters ▴ 200

Also, re DMR.plot(): yes, the understanding is that all DMRs from EPICv2 should be plotted in hg38. I've left that implied for the user, since the same function also handles sequencing data, but if it gets too confusing I'll force the annotations for array data in a future commit.

ADD REPLY • link 5 weeks ago Tim Peters ▴ 200

Hi Tim, Thanks for your reply! Yes, my data is 450K, and it is TCGA methylation data downloaded with TCGAbiolinks. The problem is that I can only get the data in hg38 now that the GDC legacy archive has been retired; I don't know how to retrieve it in hg19 anymore. Best, Xiaofei

ADD REPLY • link 5 weeks ago xiaofeiwang198266 • 0

Hi Xiaofei,

Is it possible to generate a matrix of beta values or M-values with rownames as probe IDs from this data? If so, cpg.annotate() will automatically reannotate them to hg19 and you can proceed as usual.
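For illustration, a minimal sketch of preparing such a matrix, assuming `beta` is a numeric probe-by-sample matrix with Illumina probe IDs as rownames (for a TCGAbiolinks download this would typically come from something like SummarizedExperiment::assay(se), which is an assumption about your object). The helper name beta_to_mvalues and the offset value are hypothetical choices, and the probe IDs in the toy matrix are placeholders.

```r
# Sketch only: convert a beta matrix (rownames = probe IDs) to M-values.
beta_to_mvalues <- function(beta, offset = 1e-3) {
  stopifnot(is.matrix(beta), !is.null(rownames(beta)))
  # Drop probes with a missing value in any sample
  beta <- beta[stats::complete.cases(beta), , drop = FALSE]
  # Clamp betas away from 0 and 1 so the logit transform stays finite
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

# Toy example with made-up values
toy <- matrix(c(0.5, 0.8,
                NA,  0.2,
                0,   1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("cg00000029", "cg00000108", "cg00000109"),
                              c("S1", "S2")))
m <- beta_to_mvalues(toy)

# Hypothetical follow-up, with `design` being your own model matrix:
# annotated <- DMRcate::cpg.annotate("array", m, what = "M", arraytype = "450K",
#                                    analysis.type = "differential",
#                                    design = design, coef = 2)
```

Because only the probe IDs are passed on, any hg38 coordinates stored alongside the TCGA data play no role in this step.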

Cheers, Tim

ADD REPLY • link 5 weeks ago Tim Peters ▴ 200

Thanks for your reply!

ADD REPLY • link 5 weeks ago xiaofeiwang198266 • 0

I am in a similar situation. I have a matrix of beta values with probe IDs as row names (annotated to hg38, since it comes from the harmonized TCGA dataset), and I am using cpg.annotate with arraytype = "450K". When I try to plot with DMR.plot using hg38 it does not work, but it works with hg19. So, if I read Tim's suggestion correctly, does this mean I should use hg19 with DMR.plot, because cpg.annotate automatically re-annotates to hg19? Thank you! Prakash

ADD REPLY • link 6 hours ago Prakash Sah • 0

Yes. If you have the Illumina IDs, then all the annotation data that cpg.annotate uses will be based on hg19. It doesn't matter if TCGA lifted the data over to hg38, because you aren't using their location data, you are using Illumina's location data.
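To see this for yourself, a small sketch that looks up the coordinates Illumina's hg19 annotation assigns to a set of probe IDs. It assumes minfi and IlluminaHumanMethylation450kanno.ilmn12.hg19 are installed; the helper name probe_positions_hg19 is hypothetical.

```r
# Sketch only: hg19 positions for probe IDs, taken from Illumina's annotation
# package rather than from any coordinates shipped with the TCGA files.
probe_positions_hg19 <- function(probe_ids) {
  anno_pkg <- "IlluminaHumanMethylation450kanno.ilmn12.hg19"
  for (pkg in c("minfi", anno_pkg)) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("package not installed: ", pkg)
    }
  }
  # getAnnotation() returns a DataFrame with probe IDs as rownames
  anno  <- minfi::getAnnotation(anno_pkg)
  # Ignore IDs that are not on the 450K array
  found <- intersect(probe_ids, BiocGenerics::rownames(anno))
  anno[found, c("chr", "pos", "strand")]
}

# Hypothetical usage with the rownames of your own beta matrix:
# probe_positions_hg19(head(rownames(beta)))
```

The positions returned are hg19, which is why the DMRs should be plotted with genome = "hg19" in DMR.plot.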

ADD REPLY • link 5 hours ago James W. MacDonald 67k