Question

modify colClasses in read.columns?


Henrik Parn ▴ 20

@henrik-parn-2779


Dear all,

I have received some data sets in which some variables look numeric but are not: they are individual IDs composed of numbers separated by ".", e.g. 6534231.18 and 8783234.20. Not surprisingly, read.columns treats them as numeric, so 8783234.20 ends up as 8783234.2 when read into R. When I used read.table, I specified in colClasses that these variables should be read as character. However, read.columns uses required.col and text.to.search to set up the colClasses argument of read.table. Does anyone have a suggestion for how I can modify the read.columns function so that I can specify colClasses myself?

Thanks in advance!

--
Henrik Pärn
Centre for Conservation Biology, Department of Biology
Norwegian University of Science and Technology
NO-7491 Trondheim, Norway
Office: +47 73596285  Fax: +47 73596100  Mobile: +47 90989255
E-mail: henrik.parn at bio.ntnu.no
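The loss of the trailing zero described above can be reproduced with base R alone. The following is a minimal sketch, assuming a throwaway one-column file; the column name ID and the temporary file are made up for illustration, and only the two ID values come from the question.

```r
# Write a one-column file holding the two example IDs from the question.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID", "6534231.18", "8783234.20"), tmp)

# Default type detection parses the column as numeric, so "8783234.20"
# is stored as the number 8783234.2 and the trailing zero is gone.
as_num <- read.table(tmp, header = TRUE)

# Forcing the class to character keeps each ID exactly as written.
as_chr <- read.table(tmp, header = TRUE, colClasses = "character")

print(as.character(as_num$ID))
print(as_chr$ID)
```

Comparing the two printed vectors shows that only the character version preserves the IDs as they appear in the file.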


ADD COMMENT • link updated 16.8 years ago by Hervé Pagès 16k • written 16.8 years ago by Henrik Parn ▴ 20

score 0 · Answer 1 · 2008-04-25


Hervé Pagès 16k

@herve-pages-1542


Seattle, WA, United States

Hi Henrik,

I don't have read.columns() when I start a fresh R session, so it looks like it's not part of the default R installation. Which package does it belong to? Providing your sessionInfo() is always a good idea, as it would at least give us a clue about where to look for the read.columns() function. A small example (with code) of what you are trying to do would also be very useful.

Thanks!
H.

ADD COMMENT • link 16.8 years ago Hervé Pagès 16k


Dear Hervé,

Thanks for your rapid answer!

Sorry, I forgot to paste the sessionInfo into my previous mail:

    > sessionInfo()
    R version 2.7.0 (2008-04-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] coda_0.13-1 limma_2.13.8 lme4_0.99875-9 Matrix_0.999375-9 lattice_0.17-6

    loaded via a namespace (and not attached):
    [1] grid_2.7.0 tools_2.7.0

The read.columns function is part of the limma package in Bioconductor:

    source("http://bioconductor.org/biocLite.R")
    biocLite("limma")

I would like to use the read.columns function to read a subset of columns from several data files. Here are some example columns (out of many) and rows of the data:

    ID i        ID j        Ni  Nj  S   A  R1        B  R2        C  R3         D  R4
    8414341.20  8414342.20  1   2   -1  1  0.425183  1  0.758413  1  0.551275   1  0.543045
    8414341.20  8414343.20  1   3   -1  1  0.128981  1  0.034859  1  -0.001998  1  0.002093

In this example there are 13 tab-delimited columns, of which I want to use only ID i, ID j, R1, R2, R3 and R4. The problem with the data in its current form is the unfortunate format of the ID i and ID j columns: I need them to be treated as character although they look numeric (if they are read as numeric, the .20 becomes .2). When I have used read.table(), I have first read all columns, preserving the format of ID i and ID j with the argument colClasses = c("character", "character", ...). In the next step I have selected only the relevant columns.

I thought read.columns could be a convenient alternative for selecting only the relevant columns while reading the data, by using e.g. required.col = c("ID i", "ID j"), text.to.search = "R". However, in read.columns I cannot specify colClasses. As the help text says, "It uses required.col and text.to.search to set up the colClasses argument of read.table." So I wonder whether anyone could advise me on how to modify the read.columns code so that I can specify colClasses, if it is not too complicated.

Thanks in advance!

Henrik

ADD REPLY • link 16.8 years ago Henrik Parn ▴ 20


Dear Henrik,

with a tab-delimited file test.txt as follows:

    A  B     C
    1  4711  34.50
    2  ZAZA  01.40

and the call

    z = read.table("test.txt", colClasses = c("integer", "NULL", "character"), header = TRUE, sep = "\t")

I get

    > str(z)
    'data.frame': 2 obs. of 2 variables:
     $ A: int 1 2
     $ C: chr "34.50" "01.40"

so maybe the functionality you wish for is already provided by read.table?

From looking at its code and man page, I don't think read.columns is designed to accept user input for what it takes as colClasses. In fact, when I try to supply colClasses to read.columns, I get:

    Errore in read.table(file = file, header = TRUE, col.names = allcnames: l'argumento formale "colClasses" è associato a diversi argomenti passati

(In English: formal argument "colClasses" matched by multiple actual arguments.)

Best wishes
Wolfgang

    > sessionInfo()
    R version 2.8.0 Under development (unstable) (2008-04-27 r45517)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] fortunes_1.3-4

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
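The same read.table approach can be applied to the 13-column layout from Henrik's example. This is a minimal sketch, assuming the file is tab-delimited with the header shown in the thread; the temporary file and the variable names cols, rows and classes are invented for illustration.

```r
# Recreate the two example rows from the thread as a tab-delimited file.
cols <- c("ID i", "ID j", "Ni", "Nj", "S", "A", "R1", "B", "R2", "C",
          "R3", "D", "R4")
rows <- c(
  "8414341.20\t8414342.20\t1\t2\t-1\t1\t0.425183\t1\t0.758413\t1\t0.551275\t1\t0.543045",
  "8414341.20\t8414343.20\t1\t3\t-1\t1\t0.128981\t1\t0.034859\t1\t-0.001998\t1\t0.002093"
)
tmp <- tempfile(fileext = ".txt")
writeLines(c(paste(cols, collapse = "\t"), rows), tmp)

# Start with every column skipped ("NULL"), then switch on only the
# wanted ones: the IDs as character, the R columns as numeric.
classes <- rep("NULL", length(cols))
names(classes) <- cols
classes[c("ID i", "ID j")] <- "character"
classes[c("R1", "R2", "R3", "R4")] <- "numeric"

# colClasses is passed positionally; check.names = FALSE keeps the
# header names with spaces ("ID i") unchanged.
z <- read.table(tmp, header = TRUE, sep = "\t",
                colClasses = unname(classes), check.names = FALSE)
str(z)
```

Naming the classes vector by column makes the selection readable, while passing it unnamed keeps the match to the file purely positional.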

ADD REPLY • link 16.8 years ago Wolfgang Huber ★ 13k


Dear Henrik,

Continuing on from Wolfgang's reply ...

The main reason for the read.columns() function in the limma package is to avoid having to go through the rigmarole of setting up the colClasses argument to read.table(). If you want to set up colClasses yourself, it is expected that you will use read.table() directly. I will add some comments to the read.columns help page to make this clearer.

Best wishes
Gordon
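For readers who want name-based column selection together with their own classes, a small wrapper around read.table() is one option. The sketch below is not limma code and does not reproduce how read.columns() works internally: the function name read_selected and its arguments pattern and character.col are hypothetical, and it assumes an unquoted, single-line header.

```r
# Hypothetical helper (not part of limma): pick columns by name and
# force some of them to be read as character.
read_selected <- function(file, required.col = character(0), pattern = NULL,
                          character.col = character(0), sep = "\t") {
  # Read only the header line to learn the column names.
  header <- strsplit(readLines(file, n = 1), sep, fixed = TRUE)[[1]]

  # Keep columns named explicitly, plus any whose name matches `pattern`.
  keep <- header %in% c(required.col, character.col)
  if (!is.null(pattern)) keep <- keep | grepl(pattern, header)

  # "NULL" skips a column; NA lets read.table() detect the type itself.
  classes <- ifelse(keep, NA_character_, "NULL")
  classes[header %in% character.col] <- "character"

  read.table(file, header = TRUE, sep = sep, colClasses = classes,
             check.names = FALSE)
}

# Usage on a small tab-delimited file modelled on the thread's example.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID i\tID j\tNi\tR1\tB\tR2",
             "8414341.20\t8414342.20\t1\t0.425183\t1\t0.758413",
             "8414341.20\t8414343.20\t1\t0.128981\t1\t0.034859"), tmp)
z <- read_selected(tmp, pattern = "^R[0-9]+$",
                   character.col = c("ID i", "ID j"))
str(z)
```

Leaving the unlisted kept columns as NA means their types are still detected automatically, so only the ID columns need to be spelled out.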

ADD REPLY • link 16.8 years ago Gordon Smyth 52k