Limma toptable output using write.table and column names


Ken Termiso

I apologize in advance if this is confusing...

When I use write.exprs (which, as I understand it, makes a call to write.table) to write expression data to a text file, the output file has one fewer column name than it has columns (the probe ID column does not get a name), and the other column names are shifted all the way to the left margin. When this text file is read into R using the command read.table(file="exprs.txt", header=TRUE), R converts the file into a data frame and correctly displays the probeset IDs as the row labels.

(The spacing may be a little off here, depending on the display font, but you can see that the probeset name is the row label.)

               6187.CEL 6188.CEL 6189.CEL 6190.CEL 6191.CEL 6192.CEL
    1007_s_at  8.779289 8.732751 8.822360 8.743272 8.768605 8.813886
    1053_at    3.508310 3.389342 3.434458 3.410836 3.373940 3.387063
    117_at     3.139897 3.105285 3.114203 3.131865 3.073855 3.038960

However, with the limma toptables, each column has a name, including the probeset column ("ID"). When I write a toptable to a text file and then read it back into R, R thinks that the probeset IDs are a column of data (since it is labelled "ID") and adds row numbers to the data frame. This makes it difficult to do other operations (at least in my novice hands!!)

    > tt[1:3,]
             ID            M        A           t   P.Value         B
    1 1007_s_at -0.002879009 8.776694 -0.09459093 0.9999627 -6.721547
    2   1053_at -0.053423214 3.417325 -1.60706334 0.9999627 -5.499340
    3    117_at -0.038235209 3.100678 -1.42248721 0.9999627 -5.724391

If I open the toptable text file in Excel and delete the "ID" column name without shifting the other names over, this is what happens:

    > tt_spc[1:3,]
              X            M        A           t   P.Value         B
    1 1007_s_at -0.002879009 8.776694 -0.09459093 0.9999627 -6.721547
    2   1053_at -0.053423210 3.417325 -1.60706300 0.9999627 -5.499340
    3    117_at -0.038235210 3.100678 -1.42248700 0.9999627 -5.724391

R silently filled in the blank column name with "X".

If I open the toptable file in Excel, delete the "ID" column name, shift the other column names one position to the left, and then open the text file in R, it looks perfect:

    > tt_shft[1:3,]
                     M        A       t   P.Value         B
    1007_s_at -0.00288 8.776694 -0.0946 0.9999627 -6.721547
    1053_at   -0.05340 3.417325 -1.6100 0.9999627 -5.499340
    117_at    -0.03820 3.100678 -1.4200 0.9999627 -5.724391

BUT, I don't want to have to edit each toptable file in Excel before re-opening it in R.

I also tried setting the column name to "", and also giving the toptable data frame a vector of names without the ID, but neither one worked; in both cases R filled in "NA" for the column name.

Is there any way for me to avoid editing the file in Excel, so that I can write it to a text file, read it back into R, and have it display the probeset names as the row labels?

I guess what I'm asking is this -- is there a way for me to modify the toptable data frame so that the "ID" column is removed and R uses its contents as the row labels?

Thanks in advance,
-Ken
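The two header layouts described above can be reproduced with a small sketch. The data frame here is a made-up stand-in for a real toptable (the values and the reduced column set are assumptions for illustration only); the point is that read.table takes the first column as row names when the header line has one fewer field than the data lines.

```r
# Toy stand-in for a toptable; the numbers are illustrative, not real results.
tt <- data.frame(
  ID = c("1007_s_at", "1053_at", "117_at"),
  M  = c(-0.0029, -0.0534, -0.0382),
  A  = c(8.7767, 3.4173, 3.1007),
  stringsAsFactors = FALSE
)

# Layout 1: the IDs are an ordinary named column.
f1 <- tempfile(fileext = ".txt")
write.table(tt, file = f1, sep = "\t", quote = FALSE, row.names = FALSE)
r1 <- read.table(f1, header = TRUE, sep = "\t")
# r1 keeps "ID" as a data column and gets automatic row numbers.

# Layout 2: the IDs are row names, so write.table emits a header
# with one fewer field than each data line.
tt2 <- tt[, c("M", "A")]
rownames(tt2) <- tt$ID
f2 <- tempfile(fileext = ".txt")
write.table(tt2, file = f2, sep = "\t", quote = FALSE)
header_fields <- strsplit(readLines(f2, n = 1), "\t")[[1]]
r2 <- read.table(f2, header = TRUE, sep = "\t")
# r2 uses the first (unnamed) column as row names.
```

Comparing rownames(r1) with rownames(r2) shows the difference: the first gives automatic row numbers, the second the probeset IDs.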


Julia Engelmann

Hi Ken,

When you read the toptable text file back into R, try setting the row.names argument of read.table:

    read.table("file.txt", header=TRUE, row.names=1, ...)

This will use the first column of your text file as the row names.

Hope that helps,
Julia

Julia Engelmann
Bioinformatics, University of Wuerzburg
Am Hubland, 97074 Wuerzburg, Germany
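As a minimal sketch of this suggestion, assuming a tab-separated file whose first column is headed "ID" (the toy values and the temporary file are hypothetical, not taken from the thread):

```r
# Toy table with a named ID column; values are made up for illustration.
tt <- data.frame(
  ID = c("1007_s_at", "1053_at", "117_at"),
  M  = c(-0.0029, -0.0534, -0.0382),
  t  = c(-0.0946, -1.6071, -1.4225),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".txt")
write.table(tt, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# row.names = 1 tells read.table to use the first column as row names
# and to drop it from the data columns.
tp <- read.table(path, header = TRUE, sep = "\t", row.names = 1)

# The column can also be selected by name instead of by position.
tp_by_name <- read.table(path, header = TRUE, sep = "\t", row.names = "ID")
```

Either call leaves the file itself untouched, so the "ID" header stays in place for other programs that expect a name on every column.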


Ken Termiso

Excellent. Thanks for your help.

What I did was:

    > tt <- data.frame(tt, row.names=tt$ID)  # make row names probeset IDs
    > tt$ID <- NULL  # get rid of the ID column (since it is now redundant)

This produced output text files that were read back into R as intended.

    > write.table(tt, file="tt", row.names=TRUE, col.names=TRUE, sep="\t")
    > tp <- read.table(file="tt", header=TRUE)
    > tp[1:3,]
                         M        A           t   P.Value         B
    1007_s_at -0.002879009 8.776694 -0.09459093 0.9417878 -6.721547
    1053_at   -0.053423214 3.417325 -1.60706334 0.3285045 -5.499340
    117_at    -0.038235209 3.100678 -1.42248721 0.3308744 -5.724391

> From: Julia Engelmann
> To: Ken Termiso
> CC: bioconductor@stat.math.ethz.ch
> Subject: Re: [BioC] Limma toptable output using write.table and column names
> Date: Wed, 09 Feb 2005 10:22:36 +0100
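A possible variation on the approach above, if the file also needs to open cleanly in a spreadsheet: write.table accepts col.names = NA, which writes a blank header cell above the row names so that the column names line up with their data. This is only a sketch with made-up values and a temporary file; reading the result back then needs row.names = 1, because the header no longer has one fewer field than the data lines.

```r
# Toy table that already has probeset IDs as row names (illustrative values).
tt <- data.frame(
  M = c(-0.0029, -0.0534, -0.0382),
  A = c(8.7767, 3.4173, 3.1007),
  row.names = c("1007_s_at", "1053_at", "117_at")
)

path <- tempfile(fileext = ".txt")

# col.names = NA (with row names written) adds an empty first header cell.
write.table(tt, file = path, sep = "\t", quote = FALSE, col.names = NA)
header <- readLines(path, n = 1)   # begins with a tab: the blank cell

# The header now has as many fields as each data line, so the row-name
# column has to be named explicitly when reading back.
tp <- read.table(path, header = TRUE, sep = "\t", row.names = 1)
```

The trade-off is that the file no longer round-trips with a bare read.table(file, header=TRUE); without row.names = 1 the IDs would come back as a data column named "X".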
