reading with lumi


Tim Triche ★ 4.2k


This is not going to be as short as I wish it would be, but here goes...

1) That data is NOT in Final Report format. (I did run KIRC 450k through GenomeStudio to check our work, but our internal pipeline has been to use methylumIDAT on raw scanner output.) Older archives like 27k COAD are structured so that you can work either from level 3 (masked beta values) or from level 1 data (M and U intensities, which are fed as matrices to methylumi or minfi). Each file represents one sample (tumor, normal, or cell line control), the details of which are included in the mage-tab directory.

2) Newer archives (primarily 450k data, but also some 27k data, such as LAML and KIRC) include IDAT files and a mapping from sample name to IDAT barcode, both in the MAGE-tab experiment description and in the AUX directory. I always suggest using those.

3) Older archives (primarily 27k data; I can't think of a single 450k archive like this) are most easily processed using the Level 3 data, which is to say, beta values that have been masked to NA for SNPs and for detection p-values > 0.05. Converting beta values to M-values (log2(M/U), which equals log2(beta / (1 - beta)) when beta = M / (M + U)) is trivial; the only objection I have to using level 3 data is that it doesn't recapitulate the entire process.

My preference would have been to use IDAT files right from the beginning, but I only got involved in packaging last summer, and at that point there were a number of "data freeze" events that needed to be taken care of. By the time we put BRCA up (the largest 450k dataset within TCGA), the levels (IDATs as level 1, M/U/p as level 2, betas as level 3) had solidified into the current structure. The IDF and SDRF files represent the experimental design as faithfully as we are able, given the time constraints and the MAGE-tab spec.

One sensible thing to do here is to make sure that an up-to-date vignette, preprocessing one 27k tumor and one 450k tumor, is included in methylumi for the upcoming release.
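To make point 3 concrete, here is a minimal base-R sketch of the masking and the beta-to-M-value conversion. It is an illustration under stated assumptions, not the TCGA pipeline: the intensities, probe names, sample names, and the `is_snp_probe` flag are all made up, and beta is computed as M / (M + U) with no offset term, which may not match how a given archive was produced.

```r
# Toy methylated (M) and unmethylated (U) intensities: 4 probes x 2 samples.
# Every value and name here is hypothetical, chosen only for illustration.
M <- matrix(c(3000, 150, 800, 40,
              2800, 200, 900, 35),
            ncol = 2,
            dimnames = list(paste0("probe", 1:4), c("tumor", "normal")))
U <- matrix(c(1000, 2850, 800, 60,
              1200, 2800, 900, 65),
            ncol = 2, dimnames = dimnames(M))

# Detection p-values per probe and sample (hypothetical).
detP <- matrix(c(0.001, 0.002, 0.01, 0.30,
                 0.001, 0.001, 0.02, 0.25),
               ncol = 2, dimnames = dimnames(M))

# Hypothetical flag marking probes that overlap a SNP.
is_snp_probe <- c(FALSE, TRUE, FALSE, FALSE)

# Beta values, assuming beta = M / (M + U) with no offset.
beta <- M / (M + U)

# Level 3 style masking: NA for SNP probes and for detection p > 0.05.
beta[is_snp_probe, ] <- NA
beta[detP > 0.05] <- NA

# Beta -> M-value is the base-2 logit; with the beta definition above
# this is the same quantity as log2(M / U).
beta_to_m <- function(b) log2(b / (1 - b))
mvals <- beta_to_m(beta)

print(round(mvals, 3))
```

Masked entries stay NA through the conversion, so downstream code needs to handle missing values (for example with `na.rm = TRUE`) rather than assume a complete matrix.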
The primary goal here is for every step in any TCGA paper to be easily reproducible. Packages that predate my involvement in packaging did not (in my opinion) make that very easy, so I lobbied for the format changes.

On Wed, Feb 29, 2012 at 2:48 PM, Ed Siefker <ebs15242@gmail.com> wrote:

> I am trying to read Level 1 methylation data from the TCGA into
> Bioconductor. The platform is HumanMethylation27, which is
> supported by lumi, right?
>
> Here is my R session:
>
> > library(lumi)
> Loading required package: methylumi
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'browseVignettes()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation("pkgname")'.
>
> Loading required package: nleqslv
> KernSmooth 2.23 loaded
> Copyright M. P. Wand 1997-2009
>
> Attaching package: lumi
>
> The following object(s) are masked from package:methylumi:
>
> estimateM, getHistory
>
> Warning message:
> found methods to import for function as.list but not the generic itself
>
> > fileName <-
> 'jhu-usc.edu_COAD.HumanMethylation27.1.lvl-1.TCGA-AA-3555-01A-01D-0820-05.txt'
> > example.lumi <- lumiR(fileName)
> Error in gregexpr("\t", dataLine1)[[1]] : subscript out of bounds

--
A model is a lie that helps you see the truth.
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

Preprocessing PROcess lumi methylumi
