frma normalization and batch effects
Guest User ★ 13k
@guest-user-4897
Dear all,

I am working on expression classifiers for leukemic subtypes using Affymetrix Plus2 arrays. The training data consist of several batches. The developed classifier will be used to predict the subtype of new sets of samples as well as single samples. So far, I have co-normalized new arrays with the training set, but this is not ideal.

I have read the frma paper by McCall et al., and it seems the perfect solution. Before I start, I have a few conceptual questions:

1. The training data consist of several batches of different sizes, some of them biased towards a single subtype. Does normalization per batch using summarize="random_effect" remove biology in this case? ComBat clearly did, and I ended up not correcting for batch effects, which worked fine for the classifiers I am using. Any suggestion as to which summarization would be best in this case?

2. Is there a minimum number of arrays to use with summarize="random_effect"?

Any suggestions on how best to implement frma in this project are very welcome!

Cheers,
Judith

-- output of sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

--
Sent via the guest posting facility at bioconductor.org.
Normalization frma • 1.8k views
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Hi Judith,

I am sure the frma people will have more specific recommendations. In addition, both of your questions could be interpreted as questions of parameter choice in a classifier (a somewhat complex one, since it includes the preprocessing and batch adjustment). An often useful way of making such choices is cross-validation on a dataset that mimics the kind of data you expect to see in the future.

I guess you might also enjoy Jeff Leek's recent talk, with frozen sva and top scoring pairs:
http://www.birs.ca/events/2013/5-day-workshops/13w5083/videos/watch/201308151110-Leek.mp4

Best wishes,
Wolfgang

On 23 Aug 2013, at 10:55, Judith Boer [guest] <guest at bioconductor.org> wrote:
> [...]
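The cross-validation idea in this reply can be illustrated with a minimal sketch in base R. Everything here is an assumption made for illustration: the data are simulated, the nearest-centroid classifier is a stand-in for whatever classifier is actually used, and holding out one whole batch at a time is just one possible way to mimic "new" data arriving from an unseen batch.

```r
# Leave-one-batch-out cross-validation on simulated data (base R only).
set.seed(1)

n_genes   <- 50
n_samples <- 60
batch     <- factor(rep(c("A", "B", "C"), each = 20))   # hypothetical batches
subtype   <- factor(rep(c("s1", "s2"), times = 30))     # hypothetical subtypes

# Simulated log-expression matrix: genes in rows, samples in columns.
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
# The first five genes carry the subtype signal.
expr[1:5, subtype == "s2"] <- expr[1:5, subtype == "s2"] + 2
# Each batch gets its own gene-wise shift (a crude batch effect).
for (b in levels(batch)) {
  shift <- rnorm(n_genes, sd = 0.5)
  expr[, batch == b] <- expr[, batch == b] + shift
}

# Stand-in classifier: assign each test sample to the closest class centroid.
nearest_centroid <- function(train, labels, test) {
  centroids <- sapply(levels(labels), function(k) {
    rowMeans(train[, labels == k, drop = FALSE])
  })
  # Squared Euclidean distance from every test sample to every centroid.
  d <- apply(centroids, 2, function(ctr) colSums((test - ctr)^2))
  d <- matrix(d, nrow = ncol(test), dimnames = list(NULL, colnames(centroids)))
  factor(colnames(d)[max.col(-d, ties.method = "first")],
         levels = levels(labels))
}

# Hold out one batch at a time, train on the rest, predict the held-out batch.
cv_accuracy <- sapply(levels(batch), function(b) {
  held_out <- batch == b
  pred <- nearest_centroid(expr[, !held_out, drop = FALSE],
                           subtype[!held_out],
                           expr[, held_out, drop = FALSE])
  mean(pred == subtype[held_out])
})
print(round(cv_accuracy, 2))
```

To compare preprocessing choices, the same loop would be run once per candidate setting (for example, once per summarization method), with the preprocessing redone inside each fold so that held-out arrays never influence the training data.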
Judith,

You're probably fine using the default frma summarization unless your data are in some way atypical. The random effect summarization just allows the probe effects in your data to differ slightly from the global (frozen) probe effects. Also, if you're going to have singletons for which you want to predict in the future, then the default summarization is definitely the way to go.

Depending on how big your dataset is going to be, you might consider creating your own custom frma implementation using the frmaTools package.

Finally, frma addresses only one type of batch effect: the kind that operates at the probe level. It does nothing when the batch effect exists at the probeset level. So something like fSVA, after preprocessing with frma, would certainly be a good idea as well.

Best,
Matt

On Aug 23, 2013 7:56 AM, "Wolfgang Huber" <whuber@embl.de> wrote:
> [...]
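A rough sketch of the workflow described in this reply might look like the following. It is not a tested pipeline. It assumes the Bioconductor packages affy, frma, hgu133plus2frmavecs, Biobase and sva are installed. The function names preprocess_frma and adjust_with_fsva and their arguments are hypothetical helpers made up for illustration. The default value shown for summarize is what I understand frma's default to be, so check ?frma for your installed version.

```r
# Sketch only: the helper names and arguments below are hypothetical.

# Preprocess CEL files with frma. 'cel_files' is a character vector of
# paths to HG-U133 Plus 2.0 CEL files (one file is enough for the default).
preprocess_frma <- function(cel_files,
                            summarize = "robust_weighted_average") {
  needed <- c("affy", "frma", "hgu133plus2frmavecs", "Biobase")
  absent <- needed[!vapply(needed, requireNamespace, logical(1),
                           quietly = TRUE)]
  if (length(absent) > 0) {
    stop("Missing packages: ", paste(absent, collapse = ", "))
  }
  raw  <- affy::ReadAffy(filenames = cel_files)    # read raw probe data
  eset <- frma::frma(raw, summarize = summarize)   # frozen RMA
  Biobase::exprs(eset)                             # probesets x samples
}

# Frozen SVA on frma-preprocessed data: estimate surrogate variables on the
# training set, then adjust both the training data and the new samples.
adjust_with_fsva <- function(train_expr, subtype, new_expr) {
  if (!requireNamespace("sva", quietly = TRUE)) {
    stop("Missing package: sva")
  }
  mod  <- model.matrix(~ subtype)                        # full model
  mod0 <- model.matrix(~ 1, data = data.frame(subtype))  # null model
  sv   <- sva::sva(train_expr, mod, mod0)
  # Returns a list holding the adjusted training data and new samples.
  sva::fsva(dbdat = train_expr, mod = mod, sv = sv, newdat = new_expr)
}
```

With real data, one would call preprocess_frma() separately on the training CEL files and on each incoming array, then pass the two expression matrices to adjust_with_fsva(). Passing summarize = "random_effect" would switch to the batch-aware summarization discussed in the thread, which is meant for a batch of arrays and not for a single array.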