Question

ComBat


W. Evan Johnson ▴ 870

@w-evan-johnson-5447

Last seen 10 months ago

United States

Hi Tam,

Sorry about the confusion. Two items:

1. Adding a column of numbers (row numbers) is a known "harmless" bug in the ComBat function. It basically adds a column of row numbers to your dataset, which can easily be deleted in Excel (and the data can be shifted over). However, I recognize that your case is a little different, hence:

2. On line 3154, your gene description has some character formatting that is causing issues with R's "read.table" function, which is used by ComBat. I think it's reading the apostrophe as the start of quoted text, so it then concatenates everything after that as text until the next apostrophe.

Anyway, here is how you fix it: open up your dataset ('12arraysCombatImputed_2.txt') in Excel, then save it as a .csv. Then run ComBat using the option: type='csv'. Alternatively, you can remove the second column from your dataset for ComBat and then add it back in after adjustment. I just did this myself on your data and it worked. However, you still need to delete the column of numbers from item #1 before the data are ready to go!

Also, I noticed that your variances are not well-behaved (see the plot that comes up), so I'd recommend that you use a non-parametric prior (par.prior=F). Note that this may take an hour or so to run, so make sure that the parametric prior is working before you try the non-parametric one.

Thanks!

Evan

On Oct 31, 2012, at 10:33 AM, SSc Array Core wrote:

> I am running Combat on the attached files. 2 channel array (with reference). Problem is the adjusted file returns an added column of numbers where CLID should be. This column then stops delivering said numbers around line 3154, returning to CLID, shifting all the information and data to the left. I am stumped as to why this is happening. Please advise.
>
> thanks,
>
> tam
>
> Reading Sample Information File
> Reading Expression Data File
> Found 2 batches
> Found 1 covariate(s)
> Found 260 Missing Data Values
> Standardizing Data across genes
> Fitting L/S model and finding priors
> Finding parametric adjustments
> Adjusting the Data
> Adjusted data saved in file: Adjusted_12arraysCombatImputed_2.txt_.xls
>
> ComBat('12arraysCombatImputed_2.txt','sample_info_file_mouse.txt', skip=2,write=T)
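The second workaround in the answer above (drop the description column before ComBat, then add it back after adjustment) can also be done in R instead of Excel. The following is only a minimal sketch on made-up data: the column names `CLID`, `NAME`, `S1`, `S2` and the toy values are assumptions for illustration, and the ComBat call itself is left as a comment because it depends on your copy of the script.

```r
# Toy stand-in for the expression file (hypothetical IDs and values).
expr <- data.frame(
  CLID = c("g1", "g2", "g3"),
  NAME = c("gene one", "5' nucleotidase", "gene three"),
  S1 = c(1.0, 3.0, 5.0),
  S2 = c(2.0, 4.0, 6.0),
  stringsAsFactors = FALSE
)

# Set the description column (column 2) aside, keyed by the gene ID.
descriptions <- expr[, c("CLID", "NAME")]
for_combat <- expr[, -2]

# Write the stripped table; row.names = FALSE keeps write.table() from
# adding an extra column of row numbers to the file.
stripped_file <- tempfile(fileext = ".txt")
write.table(for_combat, stripped_file, sep = "\t", quote = FALSE,
            row.names = FALSE)

# ... run ComBat on stripped_file here ...

# Stand-in for reading the adjusted output back in.
adjusted <- read.table(stripped_file, sep = "\t", header = TRUE,
                       quote = "", stringsAsFactors = FALSE)

# Re-attach the descriptions by matching on gene ID, not on row order.
adjusted$NAME <- descriptions$NAME[match(adjusted$CLID, descriptions$CLID)]
restored <- adjusted[, c("CLID", "NAME", "S1", "S2")]
```

Matching on the ID column rather than pasting the column back by position guards against the rows having been reordered or filtered in between.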

• 1.7k views

ADD COMMENT • link 12.5 years ago • updated 12.4 years ago W. Evan Johnson ▴ 870

score 0 · Answer 1 · 2012-11-02


Gerhard Thallinger ▴ 180

@gerhard-thallinger-1552

Last seen 6 months ago

Austria

Hi,

> 2. On line 3154, your gene description has some character formatting
> that is causing issues with R's "read.table" function--which is used
> by ComBat. I think it's reading the apostrophe as quoted text, so it
> is then concatenating everything after that as text until the next
> apostrophe.
> Anyway, here is how you fix it: open up your dataset
> ('12arraysCombatImputed_2.txt') in Excel, then save it as a .csv.
> Then run ComBat using the option: type='csv'.
> Alternatively, you can remove the second column from your
> dataset for ComBat and then add them back in after adjustment.

To make life easier for ComBat users and to avoid similar problems in the future, quote="" should be added to the read.table() call in ComBat.

Regards

Gerhard

--
Dr. Gerhard Thallinger
Institute for Genomics and Bioinformatics
Graz University of Technology
Petersgasse 14/V, 8010 Graz, Austria
E-mail: Gerhard.Thallinger at tugraz.at
Web: http://genome.tugraz.at
Tel: +43 316 873 5343
Fax: +43 316 873 105343
Map: http://genome.tugraz.at/Loc.html
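As a rough illustration of the quote="" suggestion, here is a small sketch using base R only. The toy table and its column names are made up for the example; it is not the actual read.table() call inside ComBat, which may pass other arguments as well.

```r
# Toy tab-delimited expression table; the apostrophes in the NAME column
# mimic the kind of gene description that broke the read (hypothetical data).
tsv <- c(
  "CLID\tNAME\tS1\tS2",
  "g1\tgene one\t1.0\t2.0",
  "g2\t5' nucleotidase\t3.0\t4.0",
  "g3\tgene three\t5.0\t6.0",
  "g4\t3' exonuclease\t7.0\t8.0"
)

# read.table() treats both " and ' as quote characters by default
# (quote = "\"'"). Setting quote = "" turns quoting off entirely, so an
# apostrophe is read as an ordinary character.
dat <- read.table(text = tsv, sep = "\t", header = TRUE, quote = "",
                  stringsAsFactors = FALSE)

print(dat)
```

With quoting disabled, every input line stays one row and the descriptions keep their apostrophes.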

ADD COMMENT • link 12.5 years ago Gerhard Thallinger ▴ 180


Gerhard,

Thanks for the suggestion. I will definitely do this. I will also try to fix the column names bug in the ComBat.R script. I don't believe that the Bioconductor version of ComBat has either of these bugs.

Thanks!

Evan

On Nov 2, 2012, at 3:54 AM, Gerhard Thallinger wrote:

> To make life easier for ComBat users and to avoid similar problems
> in the future, quote="" should be added to the read.table() call
> in ComBat.

ADD REPLY • link 12.5 years ago W. Evan Johnson ▴ 870

score 0 · Answer 2 · 2012-11-12


W. Evan Johnson ▴ 870

@w-evan-johnson-5447

Last seen 10 months ago

United States

Hey Simona,

Thanks for your email. This is a difficult question. If the number of samples and number of miRNAs in the data are large, then I think it would be safe to apply ComBat to the data. The non-parametric prior option might be best. For small numbers of samples or miRNAs, I don't think there is a good tool out there at the moment.

I'd be interested to see how it works on your data. Thanks!

Evan

On Nov 12, 2012, at 3:45 PM, Simona Rossi wrote:

> Hello,
>
> I'm Simona and I'm a Bioinformatician.
>
> I'm studying some miRNASeq data that are affected by a
> strong batch effect. In your opinion, could ComBat be
> applied to miRNASeq data?
>
> Thank you very much in advance,
>
> Best Regards,
>
> Simona Rossi
>
> --
> Dr. Simona Rossi
> Bioinformatician
> SIB | Swiss Institute of Bioinformatics
> Quartier Sorge, Bâtiment Génopode
> CH-1015 Lausanne, Switzerland
> simona.rossi@isb-sib.ch
> www.isb-sib.ch

ADD COMMENT • link 12.4 years ago W. Evan Johnson ▴ 870


Hello Evan,

thank you very much for the prompt answer!

Well, I might give it a try then: I have 755 miRNAs (after filtering, ~600 of them will survive) and 400 samples. What do you think?

Best,
Simona

On Mon, Nov 12, 2012 at 9:55 PM, W. Evan Johnson <wej@bu.edu> wrote:

> Hey Simona,
>
> Thanks for your email. This is a difficult question. If the number of
> samples and number of miRNAs in the data are large, then I think it would
> be safe to apply ComBat to the data. The non-parametric prior option might
> be best. For small sample or miRNA sizes, I don't think there is a good
> tool out there at the moment.
>
> I'd be interested to see how it works on your data. Thanks!
>
> Evan

--
Dr. Simona Rossi
Bioinformatician
SIB | Swiss Institute of Bioinformatics
Quartier Sorge, Bâtiment Génopode
CH-1015 Lausanne, Switzerland
t +41 21 692 40 83
f +41 21 692 40 55
simona.rossi@isb-sib.ch
www.isb-sib.ch

ADD REPLY • link 12.4 years ago Simona Rossi ▴ 40


Simona,

Yes, these sample and miRNA numbers look great. My greatest concern was the discrete nature of sequencing data, as well as the potentially small number of miRNAs in a given sample. The problem with parametric ComBat is that it is based on a Normal-Normal hierarchical model, and even the non-parametric version assumes normal data (but with non-parametric priors). For small datasets, say 50 miRNAs and 5-10 samples, the Normal assumptions might be a bad idea. However, with your large numbers I think ComBat should work fine.

Let me know how it goes! Thanks!

Evan

On Nov 12, 2012, at 4:01 PM, Simona Rossi wrote:

> Hello Evan,
>
> thank you very much for the prompt answer!
>
> Well, I might give it a try then: I have 755 miRNAs (after filtering ~600 of them will survive) and 400 samples, what do you think?
>
> Best, Simona
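For readers wondering what the "Normal" assumptions refer to, the ComBat model is usually written roughly as below. This is a sketch from memory of the published method, not something stated in this thread; the notation (i for batch, j for sample, g for gene or miRNA) is an assumption for illustration.

```latex
Y_{ijg} = \alpha_g + X\beta_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg},
\qquad \varepsilon_{ijg} \sim N(0, \sigma_g^2)

\gamma_{ig} \sim N(\gamma_i, \tau_i^2),
\qquad \delta_{ig}^2 \sim \text{Inverse-Gamma}(\lambda_i, \theta_i)
```

Here \gamma_{ig} and \delta_{ig} are the additive and multiplicative batch effects for gene g in batch i. The parametric version puts the priors in the second line on them, while the non-parametric version drops those prior forms but keeps the Normal error term in the first line, which is the part that discrete count data with few features may not satisfy well.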

ADD REPLY • link 12.4 years ago W. Evan Johnson ▴ 870