Hi everyone,
I am analyzing Methylation EPIC array data with the ChAMP package. After applying champ.load(), I observed that some extra probes are added to my probe list. These probes were not present in the data imported by champ.import(), and they are not added when I run champ.filter().
Since I want to use "SWAN" normalization in champ.norm(), I had to use champ.load(). After champ.load(), the number of probes matching the manifest (EPIC.manifest.hg19) is lower than the number of probes retained. Ideally the manifest should contain all probes and coordinates for the array. In my case, however, not all probes in myLoad$beta (the output of champ.load()) or in myNorm match the manifest, and a few hundred probes remain unassigned. My concern is that champ.load() seems to add extra probes that are not present in my data. Here is my command:
myLoad_2 <- champ.load(directory = getwd(),
                       method = "minfi",
                       methValue = "B",
                       autoimpute = TRUE,
                       filterDetP = TRUE,
                       ProbeCutoff = 0,
                       SampleCutoff = 0.1,
                       detPcut = 0.01,
                       filterBeads = TRUE,
                       beadCutoff = 0.05,
                       filterNoCG = TRUE,
                       filterSNPs = TRUE,
                       population = NULL,
                       filterMultiHit = TRUE,
                       filterXY = TRUE,
                       force = FALSE,
                       arraytype = "EPIC")
Number of probes remaining after filtering: 705948
Number of filtered probes matching the manifest: 705762
Extra probes that did not overlap with the manifest (checked manually): 186
I also tried to trace their origin, and some of these 186 probes overlap with the human.450K.manifest.csv file. How is it possible that, when I specify EPIC, probes that are in neither my EPIC manifest nor my original data are incorporated after champ.load()?
I tried to look for a solution but could not find anything.
I would appreciate your help with this.
Thank you
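For anyone who wants to reproduce the overlap check described in this post, here is a minimal sketch in base R. It is not ChAMP code: compare_probe_sets() is a hypothetical helper, and the probe IDs are toy values. In a real session the two vectors would presumably be rownames(myLoad$beta) and the probe names of whichever manifest or annotation object you are comparing against.

```r
# Hypothetical helper (not part of ChAMP or minfi): split the loaded probe IDs
# into those found in a manifest and those missing from it.
compare_probe_sets <- function(loaded_ids, manifest_ids) {
  matched   <- intersect(loaded_ids, manifest_ids)
  unmatched <- setdiff(loaded_ids, manifest_ids)
  list(n_loaded    = length(loaded_ids),
       n_matched   = length(matched),
       n_unmatched = length(unmatched),
       unmatched   = unmatched)
}

# Toy IDs (assumption): stand-ins for rownames(myLoad$beta) and the manifest.
loaded_ids   <- c("cg00000029", "cg00000103", "cg00000109", "cg99999999")
manifest_ids <- c("cg00000029", "cg00000103", "cg00000109", "cg00000155")

res <- compare_probe_sets(loaded_ids, manifest_ids)
print(res$n_matched)   # number of loaded probes present in the manifest
print(res$unmatched)   # IDs of loaded probes absent from the manifest
```

Keeping the unmatched IDs, and not just their count, makes it possible to intersect them with other annotation files afterwards.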
Hi Yuan,
Thank you for the reply.
I tried what you suggested. Again the overlap is incomplete:
This is the same as the previous result: those 186 probes still do not overlap with "AnnoEPIC".
I am not able to figure out where the problem is.
The only observation I have is that when I intersected them with the 450K Anno file, 87 out of 186 overlapped.
I don't understand why these 450K probes and the other probes are incorporated into my EPIC data.
Also, the champ.import() + champ.filter() pipeline adds no extra probes. What could be the issue with champ.load() only?
Please help.
Thanks
I would like to add another observation:
The above two commands give different numbers of probes retained after filtering. I think it has to do with the "minfi" method.
What is the issue, and what should I do?
Please suggest.
Thanks
Hi,
I also tried older versions, but the results are the same.
If you have any information, please let me know.
Thank you
Hi Ankit:
I did some tests with the latest version of ChAMP, using GSE137541 downloaded for testing. The data was published recently, so I assume it uses the latest version of EPIC.
1: All CpGs in myLoad or myImport match the AnnoEPIC annotation if you use the default "ChAMP" method. AnnoEPIC is used in champ.import(), so that is what I expected. If you use the "minfi" method, AnnoEPIC is not used when loading.
2: champ.load() gets exactly the same result as champ.import() + champ.filter(), as it should, because if you check the champ.load() code, it simply combines champ.import() and champ.filter() without any modification. One thing worth noticing: if you load data with champ.import() and then use champ.filter(), the beadcount parameter in champ.filter() should be set to myImport$beadcount.
3: I compared the "minfi" method and the "ChAMP" method in champ.load(). On 450K data they give exactly the same result. On EPIC data, the "minfi" method does include 100+ more CpGs, but when I compared the common CpGs between the two loading results, they are exactly the same. I did a quick check: the difference arises when the data are read by minfi's function read.metharray(). So I think there may be a small difference in EPIC annotation version between the two methods, but I need time to find out what it is. In any case, the difference only affects 100+ CpGs; all the rest are exactly the same. If, as you said, the old version matched, I would assume the "minfi" method was updated at some point, because ChAMP's loading method and annotation have not been modified in the past 2 years. So my guess is that minfi slightly changed its annotation to include 100+ more CpGs. However, I did not see any update on the Illumina website; the latest version is the one published in 2017. So it's kind of weird...
I now think your "extra" probes are imported by the new "minfi" method, right? (ChAMP simply calls minfi's function, so if minfi is upgraded, the "minfi" method is upgraded as well.) Second, see whether the result is the same if you add the "beadcount" parameter in champ.filter(). Third, you may select the common CpGs between the two loading methods and see whether they are the same.
I am currently on vacation (with extremely poor internet), so I will check it next week when I am back at work.
Best Tian
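The third check suggested above (select the common CpGs between the two loading methods and see whether they agree) could be sketched roughly as follows. This is only an illustration in base R under stated assumptions: beta_champ and beta_minfi are toy matrices standing in for the beta matrices returned by the "ChAMP" and "minfi" loading methods, and the probe and sample names are invented.

```r
# Toy beta matrices (assumption): rows are probes, columns are samples.
set.seed(1)
beta_champ <- matrix(runif(6), nrow = 3,
                     dimnames = list(c("cg01", "cg02", "cg03"), c("S1", "S2")))
extra <- matrix(runif(4), nrow = 2,
                dimnames = list(c("cg98", "cg99"), c("S1", "S2")))
# The "minfi" stand-in holds the same probes in a different order plus extras.
beta_minfi <- rbind(beta_champ[c("cg03", "cg01", "cg02"), ], extra)

# Probes shared by both methods, and probes unique to each.
common     <- intersect(rownames(beta_champ), rownames(beta_minfi))
only_minfi <- setdiff(rownames(beta_minfi), rownames(beta_champ))
only_champ <- setdiff(rownames(beta_champ), rownames(beta_minfi))

# Compare beta values on the shared probes, aligned by probe and sample name.
same_values <- isTRUE(all.equal(
  beta_champ[common, , drop = FALSE],
  beta_minfi[common, colnames(beta_champ), drop = FALSE]
))

print(length(common))
print(only_minfi)
print(same_values)
```

Indexing both matrices by name rather than by position matters here, because the two loading methods are not guaranteed to return probes or samples in the same order.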
Hi Yuan,
Thank you for the reply.
I checked the ChAMP pipeline with a 450K dataset and the two different EPIC datasets we have. With respect to your comments, my observations are as follows:
1: All CpG in myLoad or myImport are matched with AnnoEPIC annotation if you use the default "ChAMP" method.
-> Yes, I agree. With the "ChAMP" method, all CpG probes matched the AnnoEPIC annotation in BOTH datasets.
2: champ.load() get exactly the same result as champ.import() + champ.filter().
-> Yes, I checked. The filtered probes from champ.load() and from champ.import() + champ.filter() are exactly the same in both of our EPIC datasets.
Scripts used with EPIC dataset 1:
"minfi" method
---same script as previous---
"ChAMP" method (default)
Scripts used with EPIC dataset 2:
"minfi" method
---same script as previous---
"ChAMP" method (default)
3: I compared "minfi" method and "ChAMP" method in champ.load() function. In 450K data they get exactly the same result.
-> Yes, I tested it with one of our 450K datasets. The filtered probes are the same with both the "minfi" and "ChAMP" methods.
In EPIC data, "minfi" method indeed included 100+ more CpGs, but if I compared "common CpGs" between two loading results, they are exactly the same.
-> With one of the EPIC datasets there is a complete overlap (apart from the extra CpGs), but with the other dataset one probe ID did not match, in addition to the extra CpGs (Venn diagram attached).
a) EPIC dataset 1 (full match):
----Overlap----
b) EPIC dataset 2 (one probe not matched):
----Overlap----
However, the difference only influence 100+ CpGs, all the rest are exactly the same.
-> Will it affect normalization when running champ.norm(), for example SWAN normalization?
If as you said, the old version is matched.
-> For the current analysis I used the new versions of ChAMP and minfi:
ChAMP 2.16.1, minfi 1.32.0
By "old version" I meant ChAMP 2.8.9. I also downgraded minfi to 1.22.1, following the ChAMP 2.8.9 version page, and retested the pipeline. The results were similar to those obtained with the newer versions.
I am not sure whether something else needs to be downgraded to remove the discrepancy in the number of CpGs between the two methods.
And see if you add "beadcount" parameter in champ.filter(), the result would be the same.
-> Yes, the results were the same.
And third, you may select common CpGs between two loading methods, see if they are the same.
-> As I mentioned, in one dataset there is a full match (apart from the extra CpGs) but not in the other dataset (one probe is different, plus the extra CpGs).
Please help.
Thank you
Kind regards
Ankit
Hi
Any update on this?
Thanks
Hi, did you resolve this issue?
Sorry, I am busy with something; I will check it this weekend.
Hi:
A short update on my investigation. I tried to find the difference this weekend but have not managed it yet. I can confirm that if you use minfi to load EPIC data, you get 232 more CpGs than with the ChAMP method. The rest is exactly the same. I compared them right after loading (the champ.import() and read.metharray.exp() functions), without any filtering.
Oddly, the 232 extra CpGs cannot be found in the B4 version of the annotation package, but they can be found in the B2 version. So I still suspect that minfi added the B2-only CpGs to its default annotation (maybe to reduce waste...), while ChAMP uses the official B4 annotation.
But as I said, apart from those extra CpGs, the two methods return exactly the same result for the remaining CpGs, so both should be reliable to use. Since minfi's method is very well structured and encapsulated, I need more time to figure out the reason.
Best Tian
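One way to organise the comparison described above (extra CpGs absent from B4 but present in B2) is a membership table of the extra probe IDs against several annotations. The following is a minimal sketch in base R with toy data. The list names EPIC_B4, EPIC_B2 and HM450 are placeholders chosen for illustration, not objects shipped by ChAMP or minfi, and the IDs are invented; the real ID vectors would have to be extracted from whichever annotation packages or manifest files are being compared.

```r
# Toy data (assumption): extra probe IDs reported by one loading method only.
extra_ids <- c("cgA", "cgB", "cgC", "cgD")

# Placeholder annotation ID sets; the names are illustrative only.
annotations <- list(
  EPIC_B4 = c("cgX"),
  EPIC_B2 = c("cgA", "cgB", "cgC"),
  HM450   = c("cgA")
)

# Logical matrix: one row per extra probe, one column per annotation.
membership <- sapply(annotations, function(ids) extra_ids %in% ids)
rownames(membership) <- extra_ids

# How many extra probes each annotation accounts for.
print(colSums(membership))

# Extra probes that no annotation accounts for.
unexplained <- extra_ids[rowSums(membership) == 0]
print(unexplained)
```

A table like this gives every extra probe a stated origin, which is what is needed to explain the retained and filtered probe counts in an analysis report.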
Hi Tian, thanks for your investigation. So can I continue using minfi loading with the present settings and not worry about the extra CpGs? I urgently need to process the data, and without knowing the exact reason for the extra CpGs it will be difficult to explain the number of filtered-out CpGs in the analysis.
Please let me know if you figure out the reason. Maybe you could update the ChAMP package so that minfi loading uses the new B4 annotation.
Thanks
Hi, any updates? Did you resolve the issue?
Did you update the version? Please look into this, as I have to use minfi loading with SWAN normalization; I have some restrictions that prevent me from using ChAMP's default loading. I would like every number in the minfi loading output to be explainable. If there is a bug in the current version, please fix it and release an update as soon as possible. Thanks
I will try to identify the reason this week. I suspect it is because minfi does not use the B4 version, but I cannot modify minfi's function, as it belongs to another package; I just use its function for loading.
Best Tian
Hi, let me know if you have any updates.
Hi:
I can't modify minfi's loading function or force other people to use my annotation. However, I suspect the only difference is that minfi uses another annotation, which contains about 200+ additional CpGs; the rest should be exactly the same.
Could you email me to discuss this? tianyuan1991hit@gmail.com
Best Tian