single-channel GenePix data


Svetlana Bulashevska ▴ 20

@svetlana-bulashevska-1761

Last seen 10.2 years ago

Dear colleagues,

I have single-channel GenePix data, which I have managed to read in with the limma package, although limma is designed for two-color data. Could you please give me a tip on what to do next to normalize the data and to find differentially expressed genes?

Thank you very much for the help,
Svetlana Bulashevska.

limma limma • 1.2k views

ADD COMMENT • link updated 18.4 years ago by Henrik Bengtsson ★ 2.4k • written 18.5 years ago by Svetlana Bulashevska ▴ 20


Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

On 6/20/06 8:18 AM, "Svetlana Bulashevska" <s.bulashevska at dkfz-heidelberg.de> wrote:

> Dear colleagues,
> I have single-channel GenePix data, which I have managed to read in with
> the limma package, although limma is designed for two-color data.
> Could you please give me a tip on what to do next to normalize the data
> and to find differentially expressed genes?
> Thank you very much for the help,

Svetlana,

The major difference between one- and two-channel data is at the normalization stage. For one-channel data, there typically needs to be a "between-array" normalization rather than a "within-array" normalization. Limma offers some between-array normalization methods, but you could also look at some of the methods in the affy packages. Quantile normalization is one that has gotten a fair amount of good press and use. There are others, though. You will probably need to evaluate them in your own context before you can make a decision.

Sean
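
To make the idea of between-array quantile normalization concrete, here is a minimal base-R sketch. The function name quantile_normalize() and the simulated intensities are assumptions for illustration only; in practice limma's normalizeBetweenArrays() with method = "quantile" is the packaged route, and it handles ties and missing values more carefully than this sketch does.

```r
# Minimal quantile normalization: force every array (column) to share
# the same empirical distribution, namely the mean of the sorted columns.
# Assumes a numeric matrix with at least two rows and no missing values.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) > 1, !anyNA(x))
  ord <- apply(x, 2, order)       # per column: row indices in increasing order
  sorted <- apply(x, 2, sort)     # per column: sorted values
  target <- rowMeans(sorted)      # reference distribution shared by all arrays
  out <- x
  for (j in seq_len(ncol(x))) {
    # the k-th smallest value of column j is replaced by the k-th target value
    out[ord[, j], j] <- target
  }
  out
}

# Simulated single-channel intensities for three arrays (hypothetical data)
set.seed(1)
intensities <- cbind(
  array1 = 2^rnorm(1000, mean = 8, sd = 2),
  array2 = 2^rnorm(1000, mean = 9, sd = 2.5),
  array3 = 2^rnorm(1000, mean = 7, sd = 1.5)
)
normalized <- quantile_normalize(log2(intensities))
```

After this step each column of normalized has exactly the same set of values, while the ranking of the spots within each array is unchanged. Whether such a strong assumption is appropriate depends on the experiment.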

ADD COMMENT • link 18.4 years ago Sean Davis 21k


Dear Bioconductor mailing list,

I am exploring the two packages ScISI and y2hStat. The aim of these packages is to build an interactome starting from the available protein-protein interaction datasets and combining them according to a certain model. I would like to compare the interactome obtained with these packages with interactomes produced by other approaches, including my own. These other putative interactomes are generally in the format "proteinA proteinB". My question is: how do I go from the incidence matrix (the final output of the ScISI package) to this other format without knowing which protein was the bait and which was the prey? (I am interested in doing this with the merged result, the object ScISI.rda.)

Maybe a "solution" could be to transform the incidence matrix into a list with one element per complex by doing:

    complexes.list <- list()
    for (i in seq_len(ncol(incidence.matrix))) {
      # names of the proteins that are members of complex i
      comp <- rownames(incidence.matrix)[incidence.matrix[, i] == 1]
      complexes.list[[i]] <- comp
    }

and then apply the "matrix model" to every list component? But in this way the result is not comparable with the other interactomes, which are generally based on the so-called spoke model...

Thanks for your attention,

regards,
maria

Maria Persico, PhD student
http://cbm.bio.uniroma2.it/~maria/
MINT database group
Universita' di Tor Vergata, via della Ricerca scientifica 11
00133 Roma, Italy
Tel +39 0672594315 (Supervisor's room)
Fax +39 0672594766
Mobile phone: +393479715662
e-mail maria at cbm.bio.uniroma2.it

ADD REPLY • link 18.4 years ago Maria Persico ▴ 100


Hi Maria,

Let me try to answer your questions here fairly concisely, which will probably generate more questions from you...

First, we developed the ScISI package so as to get away from other interactomes with the format "protein A : protein B". One reason is that ScISI uses the hypergraph model (equivalently, the bipartite graph model) for protein complex membership. The relationship is protein membership in a protein complex, so this relationship is not one-to-one as the "protein A : protein B" format entails.

The incidence matrix represents the hypergraph for the protein complex interactome of the organism: the rows are indexed by the genes and the columns are indexed by the protein complexes. A one in the (i,j) position of the matrix signifies that protein i is a member of protein complex j.

The matrix and spoke models are methods that use one-to-one (or binary) relationships to model protein complex co-membership (i.e. are two proteins common to any complex at all in the interactome), loosely based on the affinity purification-mass spectrometry technology. The problem with these models is that they don't offer any insight into non-binary relationships.

One thing to note when you analyze the data is that protein co-membership between two proteins (p_1 and p_2) does not imply that these two proteins will directly, physically interact; it means they are constituent members of some protein complex. So please do not compare protein co-membership binary data with protein physical interaction data. They are related but not the same.

Before you can convert the data to the list you want, you need to convert the incidence matrix (hypergraph model) to an adjacency matrix (graph model), which does model binary relationships. What you will need to do is this:

    library("ScISI")
    data(ScISI)
    IM <- ScISI
    AM <- IM %*% t(IM)

At this point, the matrix AM will be an adjacency matrix where both rows and columns are indexed by the genes. The (i,j) entry is a nonnegative integer which counts the number of distinct protein complexes of which protein i and protein j are both members. (The diagonal entry (i,i) simply counts the complexes that contain protein i.) Therefore any non-zero off-diagonal entry of AM gives you a protein co-membership relationship "protein A : protein B". If you don't care about the multiplicity, you can run these two lines of code:

    mode(AM) <- "logical"
    mode(AM) <- "numeric"

This will make AM into a {0,1}-matrix where the entry 1 implies co-membership and 0 implies none. From here you can generate the "protein A : protein B" relationships fairly easily with the code you have given.

One last thing is that ScISI does not have any information about baits and preys. ScISI.rda estimates some true state of nature within an organism, which will not have bait and prey relationships. Bait and prey information is only relevant to experimental data (where it is actually pretty important). If you want the bait-to-prey data for AP-MS experiments, 5 empirical data sets can be found in the R package apComplex (TAP.rda, HMSPCI.rda, Krogan.rda, gavinBP2006.rda, and kroganBPMat2006.rda).

The above is only really valid for protein complex data. If you are looking for physical interaction information, y2hStat has both small- and large-scale data sets from Y2H experiments:

    library("y2hStat")
    data(y2h)
    names(y2h)

The structure is a list of lists of lists. The top level is a list of 42 experiments. Each of the 42 is a list of bait-to-prey interactions. Each sub-list of an experiment represents a bait (the name of the sub-list is the gene name of the bait), and the contents of this sub-list is a character vector of the prey found by this bait. These data sets can be compared with physical interaction data.
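
As a minimal sketch of that last conversion step, assuming a small made-up incidence matrix (the protein names P1 to P4 and the complexes C1 and C2 are hypothetical and not taken from ScISI), the adjacency matrix can be turned into "proteinA proteinB" pairs with base R like this:

```r
# Toy incidence matrix: rows are proteins, columns are complexes.
# C1 = {P1, P2, P3}, C2 = {P3, P4}. Hypothetical data, not from ScISI.
IM <- matrix(0, nrow = 4, ncol = 2,
             dimnames = list(paste0("P", 1:4), c("C1", "C2")))
IM[c("P1", "P2", "P3"), "C1"] <- 1
IM[c("P3", "P4"), "C2"] <- 1

# Co-membership counts: AM[i, j] = number of complexes shared by i and j
AM <- IM %*% t(IM)

# Keep each unordered pair once (upper triangle) and drop the diagonal
idx <- which(AM > 0 & upper.tri(AM), arr.ind = TRUE)
pairs <- data.frame(
  proteinA    = rownames(AM)[idx[, "row"]],
  proteinB    = colnames(AM)[idx[, "col"]],
  n_complexes = AM[idx],
  stringsAsFactors = FALSE
)
print(pairs)
```

Using upper.tri() is a design choice that treats co-membership as undirected, so "P1 P2" and "P2 P1" are not both listed. The same code should apply to the real AM, although the resulting table is co-membership under the matrix model and not physical interaction.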
Cheers,
--Tony

On Thu, 22 Jun 2006 maria at cbm.bio.uniroma2.it wrote:

> Dear Bioconductor mailing list,
>
> I am exploring the two packages ScISI and y2hStat. The aim of these
> packages is to build an interactome starting from the available
> protein-protein interaction datasets and combining them according to a
> certain model. I would like to compare the interactome obtained with these
> packages with interactomes produced by other approaches, including my own.
> These other putative interactomes are generally in the format
> "proteinA proteinB". My question is: how do I go from the incidence matrix
> (the final output of the ScISI package) to this other format without
> knowing which protein was the bait and which was the prey? (I am interested
> in doing this with the merged result, the object ScISI.rda.)
>
> Maybe a "solution" could be to transform the incidence matrix into a
> list with one element per complex by doing:
>
>     complexes.list <- list()
>     for (i in seq_len(ncol(incidence.matrix))) {
>       # names of the proteins that are members of complex i
>       comp <- rownames(incidence.matrix)[incidence.matrix[, i] == 1]
>       complexes.list[[i]] <- comp
>     }
>
> and then apply the "matrix model" to every list component? But in this
> way the result is not comparable with the other interactomes, which are
> generally based on the so-called spoke model...
>
> Thanks for your attention,
>
> regards,
> maria
>
> Maria Persico, PhD student
> http://cbm.bio.uniroma2.it/~maria/
> MINT database group
> Universita' di Tor Vergata, via della Ricerca scientifica 11
> 00133 Roma, Italy
> Tel +39 0672594315 (Supervisor's room)
> Fax +39 0672594766
> Mobile phone: +393479715662
> e-mail maria at cbm.bio.uniroma2.it
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 18.4 years ago Tony Chiang ▴ 570


Henrik Bengtsson ★ 2.4k

@henrik-bengtsson-4333

Last seen 6 months ago

United States

Hi.

If you can assume that your spot intensities "should" be roughly the same for all your arrays, then I suggest that you:

1) Check the empirical distributions of the log (base 2) intensities for all your arrays. Do they differ a lot at higher intensities? That indicates a difference in scale between arrays (different scanner settings, labelling or hybridization efficiency, ...). Don't worry too much about the lower intensities for now, because even small differences will blow up there due to the log scale.

2) Do some MA plots for random array pairs. Do you see a curvature at lower intensities? That indicates that you have a background/offset in your data. Do you see a shift away from M=0 at high intensities? That is because of different scales. Note that you might see a curvature even when the offsets are the same in all arrays, because of different scales. You might want to plot all possible pairs in the same MA plot. If it looks like the MA clouds converge to the same "point" at the lower end of the intensity range, that indicates a common offset in all arrays.

3) Plot pairs of raw (= non-log) spot signals from two random arrays in an XY plot. Zoom in at the lower intensities too, i.e. 0-500 or so. You might want to plot all possible pairs in different colors in the same scatter plot. Add a diagonal line too, e.g. abline(a=0, b=1). Do the different data clouds (rays) converge toward the origin (0,0) or not? If toward (0,0), you have little background/offset in your data. If toward a different point, you have an offset. If toward a point along the diagonal, you might have an offset in your scanner.

4) If you have a common offset in all arrays, you might have identified a scanner offset [1]. This you can calibrate for if you scan your arrays at multiple PMT levels. See my reply to "[BioC] multi PMT scan combination" on June 14, 2006 [http://article.gmane.org/gmane.science.biology.informatics.conductor/8998]. If you have already scanned your arrays, a second-best option is to scan one array multiple times and estimate the offset in the scanner. Subtract this offset from the spot signals in all your arrays. This should work, because we found that the scanner offset was very stable across arrays [1].

5) Look at (1)-(3) again. Even when the scanner offset is as low as 10-15 units (on a 0-65535 scale) you will see a difference at the lower intensities. At this point it might be enough to just rescale the spot signals to the same average intensity. Verify with (1)-(3).

6) If there are still offset effects remaining in (1)-(3), they may have been added somewhere in the process up to (but excluding) the scanning. To correct for such background we have to turn to less reliable assumptions/modelling. The simplest model is to assume a background plus a scale difference (but no higher-order terms). Mathematically this is modelled by an affine function f(x)=a+bx+noise. Thus, try affine normalization [2] of all your arrays at once. Since such a model is not fully identifiable (without spike-ins), there is one parameter you have to tune by hand/visually. In practice, the parameter specifies how much background you allow to be subtracted or, alternatively, how many non-positive signals you allow.

7) Look at (1)-(3) again. It should look better now. Note, however, that at lower log-intensities the non-log signals are very weak and small shifts may look huge on the log scale. Don't be afraid of those.

8) If it still does not look good, we have to turn to other assumptions beyond the offset and scale differences. At this point I would try out the quantile normalization methods (in addition). If possible, try one that allows you to set the smoothness of the estimated quantiles. This will roughly correspond to estimating f(x)=a + bx + cx^2 + dx^3 + ... with more and more coefficients.

All of the above is explained more or less explicitly in [1] and [2].
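
The checks in (1)-(3) and the rescaling in (5) can be sketched in base R as follows. Everything here is simulated: the common offset of 20, the scale factor of 1.8, and the helper names ma_values() and rescale_to_common_mean() are assumptions made for illustration, not functions from aroma.* or limma, and the offset is treated as known although in practice it would have to be estimated as described in (4).

```r
set.seed(1)
n <- 2000
truth <- 2^runif(n, min = 4, max = 15)   # simulated "true" spot signals

# Two simulated arrays: the same offset (20), different scales (1.0 vs 1.8),
# and multiplicative noise. All numbers are made up.
arrays <- cbind(
  array1 = 20 + 1.0 * truth * exp(rnorm(n, sd = 0.05)),
  array2 = 20 + 1.8 * truth * exp(rnorm(n, sd = 0.05))
)

# M (log-ratio) and A (average log-intensity) for a pair of arrays
ma_values <- function(x, y) {
  list(M = log2(y) - log2(x), A = (log2(x) + log2(y)) / 2)
}

# Step (5): rescale every array (column) to the same average intensity
rescale_to_common_mean <- function(x) {
  target <- mean(colMeans(x))
  sweep(x, 2, target / colMeans(x), `*`)
}

raw <- ma_values(arrays[, "array1"], arrays[, "array2"])

offset <- 20                                   # assumed known here
calibrated <- rescale_to_common_mean(arrays - offset)
fixed <- ma_values(calibrated[, "array1"], calibrated[, "array2"])

if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(density(log2(arrays[, "array1"])), main = "(1) log2 densities")
  lines(density(log2(arrays[, "array2"])), col = "red")
  plot(raw$A, raw$M, pch = ".", main = "(2) MA plot, raw")
  abline(h = 0)
  plot(arrays, xlim = c(0, 500), ylim = c(0, 500), pch = ".",
       main = "(3) XY plot, raw, low end")
  abline(a = 0, b = 1)
  plot(fixed$A, fixed$M, pch = ".", main = "MA plot after (4)-(5)")
  abline(h = 0)
  par(op)
}
```

In the raw MA plot the cloud sits near log2(1.8) at high intensities and bends toward M=0 at the low end, which is the curvature described in (2). After subtracting the offset and rescaling, the cloud should be centered on M=0 across the whole range.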
Also, I prefer to work with foreground signals only and not do background subtraction based on image-analysis background estimates.

References:
[1] H. Bengtsson, J. Vallon-Christersson and G. Jönsson, Calibration and assessment of channel-specific biases in microarray data with extended dynamical range, BMC Bioinformatics, 5:177, 2004.
[2] H. Bengtsson, G. Jönsson and J. V. Christersson, Calibration and assessment of channel-specific biases in microarray data with extended dynamical range, BMC Bioinformatics, 5, 2004.

Talks: http://www.maths.lth.se/bioinformatics/talks/
Software: The aroma.* packages at http://www.braju.com/R/.

Hope this gives you some ideas on how to proceed.

Henrik

On 6/20/06, Svetlana Bulashevska <s.bulashevska at dkfz-heidelberg.de> wrote:

> Dear colleagues,
> I have single-channel GenePix data, which I have managed to read in with
> the limma package, although limma is designed for two-color data.
> Could you please give me a tip on what to do next to normalize the data
> and to find differentially expressed genes?
> Thank you very much for the help,
> Svetlana Bulashevska.

ADD COMMENT • link 18.4 years ago Henrik Bengtsson ★ 2.4k
