Question

Help with subsetting a data frame of DE genes

Scott Ochsner ▴ 300

@scott-ochsner-599

Dear list,

I have a data frame with three columns. The first column is the probe set ID, the second column is the associated gene symbol, and the third column is a p-value statistic:

hgu133a ID    Gene Symbol    Combined p-value
217757_at     A2M            0.787923912
214440_at     NAT1           0.240689023
206797_at     NAT2           0.497092074
202376_at     SERPINA3       3.88E-13
Etc....

I would like to end up with a data frame where each row is a unique Gene Symbol. Where a gene symbol occurs more than once, I want to keep the row with the lowest Combined p-value. The above case would transform into:

hgu133a ID    Gene Symbol    Combined p-value
217757_at     A2M            0.787923912
214440_at     NAT1           0.240689023
202376_at     SERPINA3       3.88E-13
Etc....

Could someone point me to a function which would help me in this regard? If this is more of an R mailing list post I apologize and will post there.

Thanks,

> sessionInfo()
R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] lumi_1.4.0           mgcv_1.3-29          affycoretools_1.10.0
 [4] annaffy_1.10.0       KEGG_2.0.0           GO_2.0.0
 [7] gcrma_2.10.0         matchprobes_1.10.0   biomaRt_1.12.0
[10] RCurl_0.8-1          GOstats_2.4.0        Category_2.4.0
[13] genefilter_1.16.0    survival_2.32        RBGL_1.14.0
[16] annotate_1.16.0      xtable_1.5-1         GO.db_2.0.0
[19] AnnotationDbi_1.0.4  RSQLite_0.6-3        DBI_0.2-3
[22] graph_1.16.1         affy_1.16.0          preprocessCore_1.0.0
[25] affyio_1.6.0         Biobase_1.16.0       limma_2.12.0

loaded via a namespace (and not attached):
[1] cluster_1.11.10 XML_1.93-2.2

Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227

GO hgu133a probe • 808 views

updated 16.7 years ago by Joern Toedling ▴ 730 • written 16.7 years ago by Scott Ochsner ▴ 300

score 0 · Answer 1 · 2008-04-04

Hi Scott,

leaving aside the question of whether this is the ideal way to combine multiple probe sets per gene, I do not think you need a special function for this purpose. Basic R functions will suffice. Let A be your data.frame, then

    # first reorder the rows of your data.frame by p-value (smallest first)
    A <- A[order(A$"Combined p-value"),]
    # then remove any row whose gene symbol was already seen in an earlier row
    B <- A[!duplicated(A$"Gene Symbol"),]

Because the rows are sorted by p-value before duplicated() is applied, the first row kept for each gene symbol is the one with the lowest p-value.

Regards,
Joern
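For anyone who wants to try the order() / duplicated() approach on a small example first, here is a minimal sketch. The first four rows are the ones from the question; the fifth row, with the made-up probe set ID "hypothetical_at", is an assumption added only so that one gene symbol (A2M) actually appears twice. The column names are set after construction so that they keep their spaces.

```r
# Toy data: rows 1-4 are taken from the question. Row 5 is hypothetical,
# added only to create a duplicated gene symbol (A2M) with a lower p-value.
A <- data.frame(
  id     = c("217757_at", "214440_at", "206797_at", "202376_at", "hypothetical_at"),
  symbol = c("A2M", "NAT1", "NAT2", "SERPINA3", "A2M"),
  p      = c(0.787923912, 0.240689023, 0.497092074, 3.88e-13, 0.05),
  stringsAsFactors = FALSE
)
# Assign the original column names afterwards so the spaces are preserved.
names(A) <- c("hgu133a ID", "Gene Symbol", "Combined p-value")

# Step 1: sort rows by p-value, smallest first.
A_sorted <- A[order(A[["Combined p-value"]]), ]

# Step 2: keep only the first row seen for each gene symbol; after sorting,
# that is the row with the lowest p-value for that symbol.
B <- A_sorted[!duplicated(A_sorted[["Gene Symbol"]]), ]

print(B)
```

Note that if the table is read with read.table() or read.csv() using the default settings, the column names will probably have been converted to syntactic names (for example Combined.p.value), so check names(A) before indexing.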