Order in which ReadAffy() and read.affybatch() read in CEL files

hrishikesh deshmukh ▴ 420

@hrishikesh-deshmukh-1008

Last seen 10.6 years ago

Hello All,

I have questions about the order in which ReadAffy() and read.affybatch() read in affy CEL files. I need this piece of information because I want to label the arrays when I look at hist() and boxplot(). I want to make sure that the right labels (filenames) are displayed for their corresponding lines/boxplots.

Is there a book specifically on BioC? That would be a big help.

In general, on what basis does one accept/reject arrays from a pool of replicates? The hist() and boxplot() show clearly that not all the arrays (replicates) show the same "behaviour".

Here are the code fragments:

library(affy)
library(hgu95av2cdf)
library(hgu95av2probe)
library(matchprobes)
data(hgu95av2probe)
summary(hgu95av2probe)
file.names <- c("1.CEL", "2.CEL", "3.CEL", "4.CEL", "5.CEL", "6.CEL", "7.CEL", "8.CEL", "9.CEL",
                "10.CEL", "11.CEL", "12.CEL", "13.CEL", "14.CEL", "15.CEL", "16.CEL", "17.CEL")
M <- read.affybatch(filenames=file.names, description=NULL, notes="", compress=FALSE,
                    rm.mask=FALSE, rm.outliers=FALSE, rm.extra=FALSE, verbose=TRUE)
hist(M)
legend(12, 1.2, sampleNames(M), col=1:17, lty=1:17)

When I run the legend line, I see that hist() displays different "lines" and the legend does not match correctly!

Thanks in advance.
Hrishi

affy affy • 2.3k views

updated 20.1 years ago by Matthew Hannah ▴ 940 • written 20.1 years ago by hrishikesh deshmukh ▴ 420

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 5 days ago

United States

Hrishikesh Deshmukh wrote:
> Hello All,
>
> I have questions about the order in which ReadAffy()
> and read.affybatch() reads in affy CEL files. I need
> this piece of information because i want to label the
> arrays when i look at hist() and boxplot(). I want to
> make sure that right labels (filenames) are displayed
> for its corresponding lines/boxplots.

Samples are read in alphanumeric order. In your case this would be 1.CEL, 10.CEL, 11.CEL, 12.CEL, ... If you want them to be read in the same order as your file.names list, you need to prepend a zero to all the CEL file names that have a single digit (e.g., 01.CEL, 02.CEL, etc.).

> hist(M)
> legend(12,1.2,sampleNames(M),col=1:17,lty=1:17)

Note that col=1:17 is the same as col=c(1:8, 1:8, 1). If you want more colors, you will have to resort to actual color names (there are hundreds of those; see colors()). Also note that if you want the density plots and the legend to match, you have to supply the same variables to both:

hist(M, lty=c(rep(1,8), rep(2,8), 3), col=1:17)
legend(12, 1.2, sampleNames(M), col=1:17, lty=c(rep(1,8), rep(2,8), 3))

> When i run the legend line i see hist() displays
> different "lines" and legend does not match correctly!
>
> Thanks in advance.
> Hrishi
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
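Both points above (file-name ordering and colour recycling) can be checked in base R without any CEL files. This is a minimal sketch, not part of the original reply: the file names are hypothetical, sort(method = "radix") is used here only to get a locale-independent (C-locale, byte-wise) ordering, which may or may not be exactly how a given OS lists a directory, and the colour check assumes the default 8-colour palette().

```r
# Hypothetical file names following the pattern in the original post
file.names <- paste0(1:17, ".CEL")

# method = "radix" sorts strings in C-locale (byte) order, so the result
# does not depend on the locale settings of the R session
sorted.names <- sort(file.names, method = "radix")
print(sorted.names)  # "1.CEL" "10.CEL" ... "17.CEL" "2.CEL" ... "9.CEL"

# Zero-padding single-digit numbers makes alphabetical order match numeric order
padded.names <- sprintf("%02d.CEL", 1:17)
print(identical(sort(padded.names, method = "radix"), padded.names))  # TRUE

# Integer colour codes index palette() and wrap around its length
# (8 colours by default), so colour 9 is drawn exactly like colour 1
print(identical(col2rgb(9), col2rgb(1)))

# Style vectors defined once and reused for both the plot and the legend,
# e.g. hist(M, col = cols, lty = ltys) and
# legend(12, 1.2, sampleNames(M), col = cols, lty = ltys)
cols <- 1:17
ltys <- c(rep(1, 8), rep(2, 8), 3)

# Check that every array gets a distinct colour/line-type combination
rgb.codes <- apply(col2rgb(cols), 2, paste, collapse = ",")
print(anyDuplicated(paste(rgb.codes, ltys)) == 0)
```

Defining cols and ltys once is the design choice that matters here: as long as hist() and legend() receive the same two vectors, the legend entries cannot drift out of step with the curves, whatever the colours resolve to.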

20.1 years ago James W. MacDonald 68k

Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675

Last seen 10.6 years ago

See comments below.

On Fri, 2005-03-18 at 08:26 -0800, Hrishikesh Deshmukh wrote:
> Hello All,
>
> I have questions about the order in which ReadAffy()
> and read.affybatch() reads in affy CEL files. I need

Alphabetically, but the behaviour may vary between Windows and Linux due to case sensitivity.

> this piece of information because i want to label the
> arrays when i look at hist() and boxplot(). I want to

This is a dangerous practice, as you will be assuming that filenames are read alphabetically. If you work on multiple OSes, this might be a nightmare.

Besides, since the filenames are used as the column names in ReadAffy, you do not need to care about the order in which it reads in the files.

raw <- ReadAffy()
head( exprs( raw ) )

       a.CEL   b.CEL   c.CEL   d.CEL
[1,]   253.8   335.8   176.5   238.3
[2,] 19607.3 19437.5 11239.5 20985.5
[3,]   218.0   275.3   169.5   263.5
[4,] 20284.5 19956.8 11324.8 21180.5
[5,]    87.5    94.8   100.3    78.5
[6,]   224.5   237.8   186.5   165.8

Then you can strsplit() the column names or match() them to something else.

> make sure that right labels (filenames) are displayed
> for its corresponding lines/boxplots.
>
> Is there a book specifically on BioC, this would be a
> big help?
>
> In general on what basis does one accept/reject arrays
> from a pool of replicates! The hist() and boxplot()
> shows clearly that all the arrays (replicates) do not
> show the same "behaviour".

This is before preprocessing, right? There could be systematic noise that preprocessing algorithms can handle. I think people usually reject arrays on the basis of biological evidence such as housekeeping genes, RNA degradation plots, or eyeballing the chip.

> Here are the code fragments:
> M<-read.affybatch(filenames=file.names,
> description=NULL,notes="",compress=F,
> m.mask=F,rm.outliers=F,rm.extra=F,verbose=T)

Why not just do ReadAffy()? It will return the filenames as column names.

> hist(M)
> legend(12,1.2,sampleNames(M),col=1:17,lty=1:17)

Interesting. Why do I get a density plot when I call hist() on an object of class AffyBatch?

> When i run the legend line i see hist() displays
> different "lines" and legend does not match correctly!
>
> Thanks in advance.
> Hrishi
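A minimal base-R sketch of the match() and strsplit() approach suggested above. The column names and the sample sheet (its file and label columns and their values) are hypothetical; with a real AffyBatch the names would presumably come from sampleNames(raw) or colnames(exprs(raw)).

```r
# Hypothetical column names, in whatever order the files happened to be read
read.order <- c("10.CEL", "1.CEL", "2.CEL")

# Hypothetical sample sheet, kept independently of the read order
sample.sheet <- data.frame(
  file  = c("1.CEL", "2.CEL", "10.CEL"),
  label = c("control_A", "control_B", "treated_A"),
  stringsAsFactors = FALSE
)

# match() returns, for each column name, the row of the sample sheet that
# describes it, so the labels follow the data rather than the read order
idx <- match(read.order, sample.sheet$file)
if (anyNA(idx)) stop("some files are missing from the sample sheet")
labels <- sample.sheet$label[idx]
print(labels)  # "treated_A" "control_A" "control_B"

# strsplit() alternative: drop the extension to get a bare identifier
ids <- sapply(strsplit(read.order, ".", fixed = TRUE), `[`, 1)
print(ids)  # "10" "1" "2"
```

The labels vector could then be passed to legend() or boxplot(names = ...) in the same order as the columns, which is the point of matching on names instead of relying on the order in which files were read.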

20.1 years ago Adaikalavan Ramasamy ★ 1.8k

hrishikesh deshmukh ▴ 420

@hrishikesh-deshmukh-1008

Last seen 10.6 years ago

Hi All,

I am not that familiar with BioC terms. I know readaffy() and read.affybatch() make it easy to read CEL files, and that two "different" kinds of objects are created, and that typically some "kinds" of analyses, for example boxplot(), may work with readaffy() and not with read.affybatch(), while hist() might work well with read.affybatch()!! But these are the kinds of questions for which docs are non-existent! Vignettes help, but they only whet your appetite; they do not satisfy your hunger!! Sorry, I went in a different direction.

I do work with multiple OSes, and thanks for that very important piece of information. One simple way to make sure, no matter what order the files are read in, is to do a simple hist() and/or boxplot() and then make sure that the right labels (filenames) are given to the lines/plots..... Now, to do this simple thing, one has to go through a lot of documentation! Ahhh!!!

Is there a book specifically on BioC which will help people become conversant with the terms and use it efficiently!!!

But hats off to the mailing list members for answering my simple/naive questions.

Regards,
Hrishi

20.1 years ago hrishikesh deshmukh ▴ 420

See comments below.

On Fri, 2005-03-18 at 11:18 -0800, Hrishikesh Deshmukh wrote:
> Hi All,
>
> I am not that familiar with BioC terms, i know
> readaffy() and read.affybatch() makes it easy to read
> CEL file and two "different" kinds of objects are

You need help()! From the Details section of help("ReadAffy") or help("read.affybatch"):

  'ReadAffy' is a wrapper for 'read.affybatch' that permits the user to read in phenoData, MIAME information, and CEL files using widgets. One can also define files where to read phenoData and MIAME information.

BTW, there is no such function as readaffy(). Remember that R is case sensitive.

> created and typically some "kinds" of analyses for
> example boxplot() may work with readaffy() and not
> with read.affybatch() and hist() might work well with
> read.affybatch()!! But these are the kinds of

These are two different ways of reading in the data. They both read CEL files into an AffyBatch object.

> questions for which docs are non-existant! Vignettes

Try looking up the appropriate help pages, or search via help.search() and/or the mailing list archives.

> help but they only whet your appetite but do not
> satisfy your hunger!! Sorry went in different
> direction.
>
> I do work with multiple OS and thanks for the piece of
> very important information. One simple way to make
> sure no matter what order the files are read, doing
> simple hist() and/or boxplot() and then making sure
> that right labels(filenames) are given for the

Err, how does looking at histograms tell you which columns belong to which file, especially considering that many thousands of points make them look very similar? Often a simple head() and a check against the CEL files would be sufficient.

> lines/plots.....now to do this simple thing one has to
> go through lot of documentation! Ahhh!!!

This is the process of learning, and it is not guaranteed to be easy.

> Is there a book on BioC specifically which will help
> people be conversant with terms and use it
> efficiently!!!

Since a) this is a very dynamic field and b) IMHO most BioC members are busy improving the techniques used for design and analysis, I am not sure a book on Bioconductor would be available. If one is, it may grow outdated fairly rapidly. Your best bet is to either look at help() or look under "Documentation" on the Bioconductor website. I have benefited from the documents from Short Courses, Lab Materials, Research Talks, ...

> But hats off to the mailing list members for answering
> my simple/naive questions.
>
> Regards,
> Hrishi

20.1 years ago Adaikalavan Ramasamy ★ 1.8k

Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 10.6 years ago

Hrishi,

The BioC documentation, "An Introduction to R", short courses, the BioC archive search and plain Google searches reward the persistent. Here are a couple of pointers that may also help.

# R ignores anything on the same line following a #
help.start()  # starts the HTML help
# ?command opens the documentation for that command, e.g. ?par
# (all you need to know for controlling and labelling graphs)

library(affy)
data <- ReadAffy()

# Get your filenames and remove the .CEL extension
# ("\\." matches a literal dot and "$" anchors the match at the end of the name)
sampleNames(data)
sampleNames(data) <- sub("\\.CEL$", "", sampleNames(data), ignore.case = TRUE)

# Look at plots of log2 raw intensity
hist(data, lwd=2, lty=1, col=rainbow(25), main="Chip raw intensity")
legend(13, 0.4, legend=sampleNames(data), lty=1, lwd=2, col=rainbow(25))
boxplot(data, col=rainbow(25), las=2)

# Now look at RMA-normalized data
eset <- rma(data)
boxplot(exprs(eset))
boxplot(as.data.frame(exprs(eset)), col=rainbow(25), las=2)

# Compare expression between samples
par(mfrow = c(2, 2))
x <- exprs(eset)
plot(x[,1], x[,2], pch=".", main="Sample 1 vs 2", xlab="RMA exprs sample 1", ylab="RMA exprs sample 2")
abline(0, 1, col="red")
plot(x[,1], x[,3], pch=".", main="Sample 1 vs 3", xlab="RMA exprs sample 1", ylab="RMA exprs sample 3")
abline(0, 1, col="red")
plot(x[,2], x[,3], pch=".", main="Sample 2 vs 3", xlab="RMA exprs sample 2", ylab="RMA exprs sample 3")
abline(0, 1, col="red")

history(50)
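Two details of the script above can be checked in base R alone. This is an illustrative sketch, not part of the original post: the file names are hypothetical (one deliberately contains "CEL" elsewhere in the name), the matrix is simulated with rnorm() as a stand-in for exprs(eset), and combn() is used here to generate every pairwise comparison so that each plot title is built from the same indices that select the data.

```r
# In a regular expression "." matches any character, so an unanchored
# ".CEL" pattern can remove more than the extension
file.ids <- c("ACEL1.CEL", "sample2.CEL")   # hypothetical file names
greedy   <- gsub(".CEL", "", file.ids)      # "1" "sample2": "ACEL" removed too
anchored <- sub("\\.CEL$", "", file.ids)    # "ACEL1" "sample2"
print(greedy)
print(anchored)

# Hypothetical stand-in for exprs(eset): simulated log2 values for 3 samples
set.seed(1)
expr.mat <- matrix(rnorm(300, mean = 8), ncol = 3,
                   dimnames = list(NULL, c("s1", "s2", "s3")))

# combn() lists every pair of columns; titles and axis labels are built
# from the same indices i and j that select the data being plotted
pair.idx <- combn(ncol(expr.mat), 2)
titles <- character(0)
op <- par(mfrow = c(2, 2))
for (k in seq_len(ncol(pair.idx))) {
  i <- pair.idx[1, k]
  j <- pair.idx[2, k]
  title.k <- paste("Sample", i, "vs", j)
  plot(expr.mat[, i], expr.mat[, j], pch = ".", main = title.k,
       xlab = paste("RMA exprs sample", i),
       ylab = paste("RMA exprs sample", j))
  abline(0, 1, col = "red")
  titles <- c(titles, title.k)
}
par(op)  # restore the previous layout
print(titles)  # "Sample 1 vs 2" "Sample 1 vs 3" "Sample 2 vs 3"
```

Generating the titles inside the loop avoids the copy-and-paste slip where several panels end up with the same "Sample 1 vs 2" title; with more than a handful of arrays, pairs() on the matrix may be a more compact alternative.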

20.1 years ago Matthew Hannah ▴ 940
