Regarding the ArrayExpress package

Amit Kumar

Hello list,

I am trying to build an object from ArrayExpress processed data using the Bioconductor package ArrayExpress. I did the following:

    CAGE99d = getAE("E-GAGE-99", type = "processed")
    colname = getcolproc(CAGE99d)
    CAGE99p = procset(CAGE99d, colname[3])

and got the following error:

    Error in `row.names<-.data.frame`(`*tmp*`, value = c(6995L, 7017L, 7006L,  :
      duplicate 'row.names' are not allowed
    In addition: Warning message:
    non-unique values when setting 'row.names': R:A-MEXP-58:210099, R:A-MEXP-58:210100, R:A-MEXP-58:210111, R:A-MEXP-58:210123, R:A-MEXP- [... truncated]

I cannot figure out what mistake I am making. Please help!

Amit

audrey

Dear Amit,

You are not making any mistake. This is the proper way of calling the functions to create an object from a processed dataset. The problem comes from the dataset itself: it contains duplicate probe identifiers in the column used as row names, and read.table, which procset uses internally, does not allow duplicate row names.

Unfortunately, I do not know how to prevent this. Does anyone know how I could allow duplicate row names in my function?

Best regards,
Audrey

--
Audrey Kauffmann
EMBL - EBI
Cambridge UK
+44 (0) 1223 492 631
http://www.ebi.ac.uk/~audrey

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
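A minimal, self-contained reproduction of the same "duplicate 'row.names'" failure, using a few invented lines of tab-separated text in place of the real E-GAGE-99 file (the column name `Reporter` and the values are assumptions, not taken from the dataset). It shows that `read.table()` refuses a non-unique row-name column, and that reading the identifiers as an ordinary column instead lets you list the offending IDs.

```r
# Invented tab-separated text standing in for a processed-data file.
# The repeated identifier mimics those listed in the warning above.
txt <- paste(
  "Reporter\tvalue",
  "R:A-MEXP-58:210099\t1.2",
  "R:A-MEXP-58:210099\t1.5",
  "R:A-MEXP-58:210100\t0.7",
  sep = "\n"
)

# Using the identifier column as row names fails because it is not unique;
# tryCatch() captures the error message instead of stopping.
err <- tryCatch(
  read.table(text = txt, header = TRUE, sep = "\t", row.names = 1),
  error = function(e) conditionMessage(e)
)
print(err)

# Reading without row names works; the identifiers stay an ordinary column.
d <- read.table(text = txt, header = TRUE, sep = "\t",
                stringsAsFactors = FALSE)

# Identifiers that occur more than once.
dup_ids <- unique(d$Reporter[duplicated(d$Reporter)])
print(dup_ids)
```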

Hi,

Without tweaking read.table, you would have to read the row names as one of the data columns, run make.names on that set of names, and set the row names to the modified ones. So, something like:

    d <- read.table("foo.tab")                     ## if read.table("foo.tab", row.names=1) fails
    rownames(d) <- make.names(d[,1], unique=TRUE)  ## de-duplicated (and syntactically valid) names
    d <- d[, -1, drop=FALSE]                       ## remove the column used; drop=FALSE keeps a data frame

Whether these newly made "unique" row names are what you need is a good question... :)

--Misha

On Thu, 10 Sep 2009, audrey at ebi.ac.uk wrote:
> [...]
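Note that `make.names()` does more than de-duplicate: it also rewrites characters that are not valid in syntactic R names, so identifiers like those in this thread lose their `:` and `-`. The sketch below (illustrative identifiers, not read from the real file) compares it with `make.unique()`, which only appends a suffix to repeats, and then applies the recipe with `make.unique()` swapped in. Whether suffixed names are acceptable downstream is still the open question raised above.

```r
# Identifiers in the style of those reported in this thread (illustrative).
ids <- c("R:A-MEXP-58:210099", "R:A-MEXP-58:210099", "R:A-MEXP-58:210100")

# make.names() de-duplicates, but also replaces characters that are not
# valid in syntactic R names (here ":" and "-") with ".".
mn <- make.names(ids, unique = TRUE)
# "R.A.MEXP.58.210099" "R.A.MEXP.58.210099.1" "R.A.MEXP.58.210100"

# make.unique() only appends ".1", ".2", ... to repeats and keeps the rest.
mu <- make.unique(ids)
# "R:A-MEXP-58:210099" "R:A-MEXP-58:210099.1" "R:A-MEXP-58:210100"

# The same recipe with make.unique(); drop = FALSE keeps a one-column
# data frame (and its row names) instead of collapsing to a bare vector.
d <- data.frame(id = ids, value = c(1.2, 1.5, 0.7), stringsAsFactors = FALSE)
rownames(d) <- make.unique(d$id)
d <- d[, -1, drop = FALSE]
print(d)
```

Because `make.unique()` leaves the first occurrence untouched and only adds a suffix, the original identifier remains recognisable as a prefix of each new row name.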

Reply by Misha Kapushesky

Hi,

This would work, assuming the featureData is kept synchronised with the assayData. I guess the alternative would be to take the mean or median of the duplicated reporters, which might be more useful in some cases. Perhaps that could be added as an option? I know quite a few custom-printed arrays had duplicated reporter identifiers such as these; it should be less of a problem for commercial arrays.

Cheers,
Tim

2009/9/10 Misha Kapushesky <ostolop at ebi.ac.uk>:
> [...]
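The mean/median alternative can be sketched in base R. `collapse_reporters()` is a hypothetical helper, not part of the ArrayExpress package; it assumes a numeric matrix whose row names may repeat and returns one row per identifier, using `median` by default or any summary function passed as `fun`.

```r
# Hypothetical helper: summarise rows that share an identifier.
# m   : numeric matrix (rows = reporters, columns = samples)
# ids : reporter identifiers, one per row (defaults to the row names)
# fun : summary applied within each group, e.g. median or mean
collapse_reporters <- function(m, ids = rownames(m), fun = median) {
  # Levels in first-appearance order rather than alphabetical order
  groups <- factor(ids, levels = unique(ids))
  out <- apply(m, 2, function(col) tapply(col, groups, fun, na.rm = TRUE))
  # apply() returns a plain vector when there is only one group,
  # so rebuild the matrix shape and dimnames explicitly
  matrix(out, nrow = nlevels(groups),
         dimnames = list(levels(groups), colnames(m)))
}

# Usage with illustrative values (matrices, unlike data frames,
# accept duplicated row names)
expr <- matrix(c(1.2, 1.5, 0.7, 2.0, 2.4, 0.9), ncol = 2,
               dimnames = list(c("R:A-MEXP-58:210099", "R:A-MEXP-58:210099",
                                 "R:A-MEXP-58:210100"), c("s1", "s2")))
print(collapse_reporters(expr))
print(collapse_reporters(expr, fun = mean))
```

Keeping the groups in first-appearance order makes it easier to line the result up with the original file. Any featureData would need to be collapsed in the same way, which is the synchronisation issue mentioned above.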

Reply by Tim Rayner

Tim Rayner wrote:
> [...]

This comes up fairly regularly when using ExpressionSets with custom arrays. The rationale for having unique row names (and consequently featureNames) is that non-unique names imply some kind of software 'decision', e.g., that reporters with the same ID should be averaged, or that their names should be mangled. There doesn't seem to be a universally right answer, so in my own work I usually put duplicate reporter names into a column of featureData and leave the rows unnamed. I then have to think explicitly about what to do with the duplicates at each stage of the analysis where this matters. The problem with this for ArrayExpress is that the appropriate column of featureData is an ad hoc convention ('column X of featureData') rather than something enforced by the software.

Martin
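The convention described above can be sketched with plain base-R objects: a numeric matrix stands in for the assayData and a data frame for the featureData of an ExpressionSet. This sketch does not use Biobase itself, the values are invented, and the column name `reporter` is an assumption rather than an ArrayExpress convention.

```r
# Illustrative values: expression matrix whose rows are deliberately unnamed.
expr <- matrix(c(1.2, 1.5, 0.7, 2.0, 2.4, 0.9), ncol = 2,
               dimnames = list(NULL, c("s1", "s2")))

# Feature annotation kept row-for-row with 'expr'; the reporter identifier
# is an ordinary column, so duplicates are allowed.
features <- data.frame(
  reporter = c("R:A-MEXP-58:210099", "R:A-MEXP-58:210099", "R:A-MEXP-58:210100"),
  stringsAsFactors = FALSE
)
stopifnot(nrow(features) == nrow(expr))

# Subset both objects with the same index so they stay in step.
keep <- features$reporter != "R:A-MEXP-58:210100"
expr_sub <- expr[keep, , drop = FALSE]
features_sub <- features[keep, , drop = FALSE]

# Where duplicates matter, handle them explicitly: group row indices by
# reporter before deciding whether to average, pick one, or keep all.
rows_by_reporter <- split(seq_len(nrow(expr)), features$reporter)
print(rows_by_reporter)
```

The point of the design is that nothing is averaged or renamed implicitly; every step that touches duplicates has to consult the `reporter` column on purpose.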

Reply by Martin Morgan

Hi Tim,

This seems like a good alternative. I will have a look into it.

Thank you,
Audrey

Reply by audrey

Hi Misha,

Thanks for the trick; it seems to work. But it brings another problem: the identifiers in the array design file and the row names in the expression file are now different. So maybe the right question is: why do we have duplicate identifiers in the expression file, and do we really want to read a file with those duplicates? Sorry, maybe we should continue this conversation off list.

To the people using the ArrayExpress package on processed data: I am sorry for the problems that are still to be fixed. Thank you for your feedback; it helps me identify the problems so that I can try to fix them.

Audrey
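One way to start answering the question of why the duplicates are there is to look at every copy of each duplicated identifier and check whether the copies carry the same values (perhaps exact repeats) or different ones (perhaps separate spots sharing an identifier on a custom array). The data frame below is invented for illustration; with a real file it would come from reading the identifiers as an ordinary column rather than as row names.

```r
# Invented example data: one identifier per row, duplicates allowed.
d <- data.frame(
  reporter = c("R:A-MEXP-58:210099", "R:A-MEXP-58:210099",
               "R:A-MEXP-58:210100",
               "R:A-MEXP-58:210111", "R:A-MEXP-58:210111"),
  value = c(1.2, 1.5, 0.7, 3.1, 3.1),
  stringsAsFactors = FALSE
)

# Flag every copy of a duplicated identifier, not just the second and
# later ones (duplicated() alone only flags the repeats).
all_copies <- duplicated(d$reporter) | duplicated(d$reporter, fromLast = TRUE)
dups <- d[all_copies, ]
print(dups)

# For each duplicated identifier: do all of its copies carry the same value?
same_values <- tapply(dups$value, dups$reporter,
                      function(v) length(unique(v)) == 1L)
print(same_values)
```

If the copies turn out to be identical, dropping the repeats is harmless; if they differ, some explicit choice (summarising, keeping all, or consulting the array design) is needed before building the object.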

Reply by audrey
