I have performed gene set analysis (GSA) using gsameth from the missMethyl package for the EPICv2 array. I noticed that probes annotated to more than one gene get completely excluded from the analysis. I found the cause and corrected it locally, but I am curious whether this is an isolated case. The problem lies in one of the subfunctions of gsameth.
In gsameth => getMappedEntrezIDs => .getFlatAnnotation
# This is the way I tested it
Anno <- getAnnotation(IlluminaHumanMethylationEPICv2anno.20a1.hg38)
flat_test <- .getFlatAnnotation(array.type = "EPIC_V2", anno = Anno)
> head(rownames(flat_test))
[1] "cg25324105_BC111" "cg25383568_TC111" "cg25623721_TC111" "cg25898577_BC11" "cg25908985_BC11" "cg25910443_TC111"
# And this is the line where the problem is located within the .getFlatAnnotation
flat <- data.frame(symbol = unlist(geneslist), group = unlist(grouplist))
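This results in an inaccurate transformation of the list into a data frame for probes with multiple genes. unlist() appends a running index to the names of list elements that hold more than one value, so the row names of these probes change, e.g. from cg25324105_BC11 to cg25324105_BC111, and no longer match the original probe IDs.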
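A minimal toy reproduction of this naming behaviour, independent of missMethyl. The list below is a made-up stand-in for geneslist (the probe IDs and gene symbols are invented for illustration), so it only sketches the mechanism and is not the package's actual data:

```r
# Toy stand-in for geneslist: names are probe IDs, values are gene symbols.
# Probe IDs and gene symbols are hypothetical, for illustration only.
geneslist_toy <- list(
  cg00000001_BC11 = c("GENEA", "GENEB"),  # probe annotated to two genes
  cg00000002_TC11 = "GENEC"               # probe annotated to one gene
)

# unlist() keeps the name of a length-one element unchanged, but appends
# a running index (1, 2, ...) to the names of longer elements.
flat_names <- names(unlist(geneslist_toy))
print(flat_names)

# Which flattened names still match an original probe ID?
print(flat_names %in% names(geneslist_toy))
```

Only the single-gene probe keeps a name that can be matched back to the original IDs, which is consistent with multi-gene probes dropping out downstream.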
I then changed it to:
flat <- data.frame(
  rowname = rep(names(geneslist), lengths(geneslist)),
  symbol = unlist(geneslist),
  group = unlist(grouplist))
> head(flat$rowname)
[1] "cg00381604_BC11" "cg00381604_BC11" "cg00381604_BC11" "cg00381604_BC11" "cg00381604_BC11" "cg21870274_BC21"
# And in subsequent lines I changed rownames(flat) to flat$rowname
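For anyone who wants to check the idea without loading the annotation, here is a small self-contained sketch of the same approach on made-up data. The toy lists, probe IDs, gene symbols and group labels are all hypothetical. The use.names = FALSE argument is my own addition to keep the suffixed names out of the data frame's row names; it is not part of the change described above.

```r
# Hypothetical stand-ins for geneslist and grouplist (not real annotation)
geneslist_toy <- list(
  cg00000001_BC11 = c("GENEA", "GENEB"),
  cg00000002_TC11 = "GENEC"
)
grouplist_toy <- list(
  cg00000001_BC11 = c("TSS200", "Body"),
  cg00000002_TC11 = "TSS1500"
)

flat_toy <- data.frame(
  # Repeat each probe ID once per gene, so the ID itself is never altered
  rowname = rep(names(geneslist_toy), lengths(geneslist_toy)),
  # use.names = FALSE (an assumption, see above) drops the suffixed names
  symbol = unlist(geneslist_toy, use.names = FALSE),
  group = unlist(grouplist_toy, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Every row can now be matched back to an original probe ID
stopifnot(all(flat_toy$rowname %in% names(geneslist_toy)))
print(flat_toy)
```

The multi-gene probe appears once per gene with its ID intact, so any later lookup by probe ID should use the rowname column rather than rownames().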
