Question

questions on the ImaGene data using limma package

Ming YI [Contr] ▴ 30

@ming-yi-contr-3108

Last seen 9.7 years ago

United States

Hi, Dear Gordon:

I tried to use limma on an ImaGene dataset that I downloaded from ArrayExpress. I have never dealt with ImaGene data before and am not familiar with the format, except for knowing that the Cy5 and Cy3 signals are stored in two separate files for the same sample. I tried to read the data into limma and normalize them there, but I keep running into issues and errors, and I hope you can help me in this regard.

I have attached a file (E-NCMF-8_sdrf.txt), downloaded from ArrayExpress, that could potentially be used to make the targets file, and two raw data files from the dataset as examples. What is bothering me is the following: Extract 3538 and Extract 3526 (see column "Extract Name" of E-NCMF-8_sdrf.txt) each have one Cy5 file and one matched Cy3 file, so that is fine with me. But "Extract reference pool of 61 HNSCC" (see E-NCMF-8_sdrf.txt) has multiple Cy3 and Cy5 files. How should we incorporate that into the targets file?

I intended to use the following code to deal with this ImaGene data:

targets <- readTargets()
files <- targets[, c("FileNameCy3", "FileNameCy5")]
RG <- read.maimages(files, source="imagene")

but I need the right targets file to start with, particularly given the issue I mentioned above.

Also, for normalization, is

RG <- backgroundCorrect(RG, method="normexp", offset=50)

still appropriate for ImaGene data?

Thanks so much for your help!

Ming Yi
ABCC
P.O.Box B, Bldg 430
National Cancer Institute/SAIC-Frederick, Inc
Frederick, Maryland
USA

Normalization Cancer limma • 937 views

15.9 years ago Ming YI [Contr] ▴ 30

score 0 · Answer 1 · 2008-10-31

>Dear Gordon:
>
>I am re-posting my issue below. I think I cannot cc "Bioconductor mailing list" <bioconductor at stat.math.ethz.ch>, which is why my earlier message failed to reach the mailing list.
>
>Thanks a lot for your comments and suggestions. Following your suggestion, I have already successfully read all the data into limma objects with the generic method, using the attached targets file that I edited from their annotation file, as I sent to you earlier. I assumed that the Cy3 channel is the common reference, as you guessed.
>
>But the issue remains, as you mentioned, of how they actually did the experiment. Based on their E-NCMF-8.idf.txt file from ArrayExpress, it appears to be a dye_swap_design, which is exactly what you guessed. So the data appear to have been collated by ArrayExpress into data matrices with the Cy3 and Cy5 intensities in the same file for each sample. My concern is the "Label" column in the file E-NCMF-8_sdrf.txt that I sent you in my last email: what do Cy3 and Cy5 mean there for each sample? It looks like this column may tell, for each sample (and corresponding raw data file), which dye was used for the sample, with the other dye used for the common reference, which is not mentioned in their annotation file. What do you think? If this is true, I may need to change my targets file accordingly to accommodate this information. This assumption makes more sense, at least in explaining the repeated samples in the dataset, which would be the dye-swap data.
>
>I have tried to contact them for details of the experimental design, which should help to sort this out.
>
>By the way, I am not sure why my post did not go to the mailing list. I changed the address a bit this time; I hope it works.
>
>Thanks again for your help. Any additional suggestions would be appreciated as well.
>
>Best regards,
>
>Ming Yi
>ABCC
>P.O.Box B, Bldg 430
>National Cancer Institute/SAIC-Frederick, Inc
>Frederick, Maryland
>USA
>
>At 09:25 PM 10/29/2008, Gordon K Smyth wrote:
>>Dear Ming,
>>
>>Thank you for mailing me example data sets and the annotation spreadsheet from ArrayExpress.
>>
>>You are assuming that the data from ArrayExpress are in ImaGene format. This is incorrect. The reason that limma gives a special treatment to ImaGene files is that, unlike other image analysis software, ImaGene writes the Cy3 and Cy5 channels into separate files. However ArrayExpress has collated the original data into data matrices with the Cy3 and Cy5 intensities in the same file for each sample. Therefore you should ignore all references to ImaGene in the limma manual, and instead use the instructions for generic two-color platforms.
>>
>>The data sets you sent me can easily be read into limma using the instructions in the limma User's Guide starting page 14 "What should you do if your image analysis program is not in the above list?" I demonstrate this below.
>>
>>Your emails suggest that you have not yet read any two-color data into limma. It is essential that you try some simple examples before trying a large dataset from ArrayExpress, which will have a complex structure you might not fully understand.
>>
>>I don't fully understand the sample annotation file from ArrayExpress that you sent me, but I doubt that you are interpreting it correctly. It is not in the format you need for a limma targets file. My guess is that each row of the file corresponds to one array, and that each array has been hybridized with a common reference that is not mentioned in the annotation file. This means that the repeated sample names you have noted do not represent matched Cy3 and Cy5 channels, but rather represent dye-swap technical replicates. That is, they are separate arrays.
>>
>>If my guess is correct, then a targets file would be something like below.
>>
>>Let me emphasize that I do not offer a plug-in service to read experimental data posted to ArrayExpress. It is your responsibility to figure out the experimental design and the ArrayExpress data formats. I am just guessing.
>>
>>Best wishes
>>Gordon
>>
>>READING YOUR DATA FILES
>>
>> > f
>> [1] "E-NCMF-8-raw-data-1363346838.txt" "E-NCMF-8-raw-data-1363346856.txt"
>>
>> > ann <- c("Database NCMF:DB:omadhuman","Database ebi.ac.uk:Database:ens_trscrpt_id","Feature coordinates: metaColumn","metaRow","column","row","Reporter identifier","Reporter sequence type")
>>
>> > columns <- list(Rf="ImaGene:Signal Mean_Cy5",Rb="ImaGene:Background Median_Cy5",Gf="ImaGene:Signal Mean_Cy3",Gb="ImaGene:Background Median_Cy3")
>>
>> > RG <- read.maimages(files=f,annotation=ann,columns=columns)
>> Read E-NCMF-8-raw-data-1363346838.txt
>> Read E-NCMF-8-raw-data-1363346856.txt
>>
>> > dim(RG)
>> [1] 37632 2
>>
>>A POSSIBLE TARGETS FILE
>>
>> > targets <- readTargets()
>> > targets
>>                       Source            DiseaseState              ArrayDataMatrixFile       Cy3       Cy5
>> 1                       3560 Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346838.txt Reference   SCC3560
>> 2 reference pool of 61 HNSCC Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346856.txt Reference PoolHNSCC
>>
>>On Wed, 29 Oct 2008, Ming YI [Contr] wrote:
>>
>>>Hi, Dear Gordon:
>>>
>>>I tried to use limma on an ImaGene dataset that I downloaded from ArrayExpress. I have never dealt with ImaGene data before and am not familiar with the format, except for knowing that the Cy5 and Cy3 signals are stored in two separate files for the same sample. I tried to read the data into limma and normalize them there, but I keep running into issues and errors, and I hope you can help me in this regard.
>>>
>>>I have attached a file (E-NCMF-8_sdrf.txt), downloaded from ArrayExpress, that could potentially be used to make the targets file, and two raw data files from the dataset as examples. What is bothering me is the following: Extract 3538 and Extract 3526 (see column "Extract Name" of E-NCMF-8_sdrf.txt) each have one Cy5 file and one matched Cy3 file, so that is fine with me. But "Extract reference pool of 61 HNSCC" (see E-NCMF-8_sdrf.txt) has multiple Cy3 and Cy5 files. How should we incorporate that into the targets file?
>>>
>>>I intended to use the following code to deal with this ImaGene data:
>>>
>>>targets <- readTargets()
>>>files <- targets[, c("FileNameCy3", "FileNameCy5")]
>>>RG <- read.maimages(files, source="imagene")
>>>
>>>but I need the right targets file to start with, particularly given the issue I mentioned above.
>>>
>>>Also, for normalization, is
>>>
>>>RG <- backgroundCorrect(RG, method="normexp", offset=50)
>>>
>>>still appropriate for ImaGene data?
>>>
>>>Thanks so much for your help!
>>>
>>>Ming Yi
>>>ABCC
>>>P.O.Box B, Bldg 430
>>>National Cancer Institute/SAIC-Frederick, Inc
>>>Frederick, Maryland
>>>USA
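
For readers following the thread, here is a minimal base-R sketch of the dye-swap reading of the "Label" column discussed above. It is only an illustration under stated assumptions, not the confirmed design of E-NCMF-8: it assumes each annotation row is one array, that "Label" names the dye carried by the sample, and that the other channel holds an unnamed common reference. The file names, the helper functions `make_targets` and `orient_logratios`, the "Reference" label and the toy log-ratios are all hypothetical.

```r
# Hypothetical SDRF-like rows: one row per array (an assumption, see above).
sdrf <- data.frame(
  Source   = c("3560", "3560", "reference pool of 61 HNSCC"),
  Label    = c("Cy5", "Cy3", "Cy5"),   # dye assumed to be on the sample
  FileName = c("array1.txt", "array2.txt", "array3.txt"),  # made-up names
  stringsAsFactors = FALSE
)

# Build a limma-style targets table with Cy3 and Cy5 columns.
# The channel not named in Label is assigned to the common reference.
make_targets <- function(sdrf, ref = "Reference") {
  sample_on_cy5 <- sdrf$Label == "Cy5"
  data.frame(
    FileName = sdrf$FileName,
    Cy3 = ifelse(sample_on_cy5, ref, sdrf$Source),
    Cy5 = ifelse(sample_on_cy5, sdrf$Source, ref),
    stringsAsFactors = FALSE
  )
}

targets <- make_targets(sdrf)

# With M = log2(Cy5 / Cy3), arrays with the reference on Cy5 measure
# reference-vs-sample, so their sign is flipped to give sample-vs-reference.
orient_logratios <- function(M, targets, ref = "Reference") {
  flip <- ifelse(targets$Cy3 == ref, 1, -1)
  sweep(M, 2, flip, FUN = "*")
}

# Toy log-ratios: 2 probes x 3 arrays; array 2 is the dye swap of array 1.
M <- matrix(c(1, -2, -1, 2, 0.5, 0), nrow = 2,
            dimnames = list(c("probe1", "probe2"), targets$FileName))
oriented <- orient_logratios(M, targets)
print(targets)
print(oriented)
```

In an actual limma analysis the sign handling would normally be left to the design matrix (limma's `modelMatrix()` builds one from the Cy3 and Cy5 columns of a targets frame) rather than done by hand; the manual flip here is only meant to show what a dye swap does to the log-ratios.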