I have DNA methylation data that I collected from male and female subjects using the Illumina 450K Array platform. I have read the data into R as a "MethyLumiSet" from .IDAT files using the "methylumi" package, and I plan to pre-process the data using some of the normalization functions in the "wateRmelon" package:
suppressPackageStartupMessages(require(methylumi))
suppressPackageStartupMessages(require(wateRmelon))
suppressPackageStartupMessages(require(IlluminaHumanMethylation450kanno.ilmn12.hg19))
# Read sample data.
# phenoData is a data.frame with the row names = SentrixPosition_barcode
# and columns containing sample group, age and gender.
# Barcodes are pulled from a column in phenoData called 'barcodes'.
phenoData <- read.csv("/Users/Martens/Desktop/08272014/IDATs/Sample_Sheet.csv",header=TRUE)
barcodes <- subset(phenoData, select=barcodes)
# Import .IDAT files as methyLumiSet
methyLumiSet <- methylumIDAT(barcodes = barcodes, pdat = phenoData,
idatPath = "/Users/Martens/Desktop/08272014/IDATs")
As confirmation that all of the features were imported, I checked the number of rows in methyLumiSet:
nrow(methyLumiSet)
Features
  485577
The methyLumiSet is based on the eSet class in Biobase. I would like to remove all features on the X and Y chromosomes, as is common practice in DNA methylation analysis.
As an initial attempt, I tried to determine which probes fall on the Y chromosome using the following code with the idea that I would then remove those probes from the methyLumiSet.
methyLumiSet.ChrY <- methyLumiSet[fData(methyLumiSet)$CHROMOSOME=="Y", ]
However, when I check the number of probes, the result is 0 features:
nrow(methyLumiSet.ChrY)
Features
  0
I cannot figure out why I am unable to subset the features of my methyLumiSet, but a potential issue might be the annotation. When I try to run the methylumi function 'featureFilter' to remove the X chromosome, I get the following warning and error:
methyLumiSet.Xfilt <- featureFilter(methyLumiSet, exclude.ChrX = TRUE)
Warning message:
In .featureFilter(eset, require.entrez = require.entrez, require.GOBP = require.GOBP, :
  HumanMethylation450k probes annotate to multiple accessions(!)
Error in mget(featureNames(eset), envir = annotate::getAnnMap("CHR", annChip), :
  error in evaluating the argument 'envir' in selecting a method for function 'mget': Error: getAnnMap: package IlluminaHumanMethylation450k not available
When I try to install IlluminaHumanMethylation450k, I get the following:
source("http://bioconductor.org/biocLite.R")
biocLite("IlluminaHumanMethylation450k")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.3), R version 3.2.0.
Installing package(s) ‘IlluminaHumanMethylation450k’
Old packages: 'stringi', 'VariantAnnotation'
Update all/some/none? [a/s/n]:
I type 'a' to update all, and after updating I get the following warning message:
# Warning message: package ‘IlluminaHumanMethylation450k’ is not available (for R version 3.2.0)
My only guess is that my issue with subsetting probes by chromosome comes from not being able to load the annotation information, but I am stuck on how to fix it. According to the reference manual for methylumi, the package should be compatible with R version 3.2.0 and depends on IlluminaHumanMethylation450kanno.ilmn12.hg19, but even when I require this package I don't understand how to link the annotation data to the methyLumiSet.
Any advice on properly annotating a methyLumiSet and/or removing the X and Y chromosomes from a methyLumiSet would be great. I am pretty new to R.
I would prefer to do this without coercing to another structure (e.g., SummarizedExperiment/minfi type of object) if possible. Thanks!
-Chris
1) Why would you want to toss out the X chromosome? There are an awful lot of sites on chrX that matter.
2) Why not add a term for gender in the regression fit / DMRcate / whatever instead?
3) Why not check that the sex of the samples (from X copy number and/or X inactivation) matches the putative sex of the subjects?
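To make point 2 concrete, here is a minimal sketch in base R of fitting a group effect with gender as an adjustment covariate for a single probe. The data are simulated and the column names (group, gender) are assumptions for illustration only; a real analysis would fit every probe (e.g. with a package built for that) rather than calling lm once.

```r
# Toy data: methylation M-values for one probe in 8 hypothetical samples.
set.seed(1)
pheno <- data.frame(
  group  = factor(rep(c("control", "case"), each = 4)),
  gender = factor(rep(c("F", "M"), times = 4))
)

# Simulate a probe whose level depends on gender but not on group.
mval <- 0.5 * (pheno$gender == "M") + rnorm(nrow(pheno), sd = 0.05)

# Fit the group effect with gender included as a covariate.
fit <- lm(mval ~ group + gender, data = pheno)
coef(summary(fit))
```

The genderM row absorbs the sex difference, so the group coefficient is estimated after adjusting for it, without discarding any probes.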
I should provide more information in ?methylumIDAT, to be sure, and I'll patch that as soon as I have a moment (to point to the somewhat extensive vignette, and perhaps add a few functions for the above, which have been languishing in another package). However, I would like to caution that simply throwing out sex chromosomes is often not the best idea. If you do want to do so, a good time to perform this task would be after normalization and mapping to the genome, which is explored to a degree in http://www.bioconductor.org/packages/release/bioc/vignettes/methylumi/inst/doc/methylumi450k.pdf .
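If the sex chromosomes do have to go, one way to think about it is as a lookup from probe ID to chromosome. The sketch below uses base R and a made-up lookup; autosomalProbes is a hypothetical helper, not a methylumi function, and the probe IDs are invented. It also shows one likely reason a subset on fData(...)$CHROMOSOME can come back empty, assuming that column is simply absent: a missing column is NULL, and NULL == "Y" has length zero.

```r
# Hypothetical helper: return the feature IDs that do NOT map to chrX/chrY.
# `chrByProbe` is a character vector of chromosome labels named by probe ID.
# Probes absent from the lookup (NA after indexing) are kept.
autosomalProbes <- function(featureIds, chrByProbe,
                            sexChr = c("chrX", "chrY")) {
  chr <- chrByProbe[featureIds]
  featureIds[!(chr %in% sexChr)]
}

# Toy stand-in for the real annotation; these probe IDs are made up.
chrByProbe <- c(cg01 = "chr1", cg02 = "chrX", cg03 = "chrY", cg04 = "chr7")
keep <- autosomalProbes(c("cg01", "cg02", "cg03", "cg04", "rs05"), chrByProbe)
keep

# A column that does not exist is NULL, so the comparison selects nothing.
fd <- data.frame(Probe_ID = names(chrByProbe))
emptyIndex <- fd$CHROMOSOME == "Y"
length(emptyIndex)

# With the real packages this might look like the following (assumed, untested):
# library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
# data(Locations)
# chrByProbe <- setNames(as.character(Locations$chr), rownames(Locations))
# ids <- autosomalProbes(featureNames(methyLumiSet), chrByProbe)
# methyLumiSet.auto <- methyLumiSet[ids, ]
```

Keeping probes that are missing from the lookup is a design choice; drop them as well if the pipeline being reproduced did so.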
Best,
--t
Thanks Tim,
I hear you on keeping the chrX probes and using gender as a covariate instead and will keep that in mind. For my current analysis, I am trying to validate targets from a previously published dataset in my own independent set of samples and thus, want to follow the exact analysis pipeline used in the original paper. Because they removed the sex chromosomes, there is no point in me keeping them.
One thing I'm still stuck on. Any idea why the featureFilter function will not work for me?
The error message I get is:
Thanks again,
-Chris