Help with subsetting a data frame of DE genes
Scott Ochsner (@scott-ochsner-599)
Dear list,

I have a data frame with three columns. The first column holds probe set IDs, the second the associated gene symbol, and the third a p-value statistic:

hgu133a ID   Gene Symbol   Combined p-value
217757_at    A2M           0.787923912
214440_at    NAT1          0.240689023
206797_at    NAT2          0.497092074
202376_at    SERPINA3      3.88E-13
Etc....

I would like to end up with a data frame in which each row is a unique gene symbol. Where a gene symbol appears in more than one row, I want to keep the row with the lowest combined p-value. The above case would transform into:

hgu133a ID   Gene Symbol   Combined p-value
217757_at    A2M           0.787923912
214440_at    NAT1          0.240689023
202376_at    SERPINA3      3.88E-13
Etc....

Could someone point me to a function that would help me with this? If this is more of an R mailing list post, I apologize and will post there.

Thanks,

> sessionInfo()
R version 2.6.0 (2007-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] lumi_1.4.0          mgcv_1.3-29         affycoretools_1.10.0
 [4] annaffy_1.10.0      KEGG_2.0.0          GO_2.0.0
 [7] gcrma_2.10.0        matchprobes_1.10.0  biomaRt_1.12.0
[10] RCurl_0.8-1         GOstats_2.4.0       Category_2.4.0
[13] genefilter_1.16.0   survival_2.32       RBGL_1.14.0
[16] annotate_1.16.0     xtable_1.5-1        GO.db_2.0.0
[19] AnnotationDbi_1.0.4 RSQLite_0.6-3       DBI_0.2-3
[22] graph_1.16.1        affy_1.16.0         preprocessCore_1.0.0
[25] affyio_1.6.0        Biobase_1.16.0      limma_2.12.0

loaded via a namespace (and not attached):
[1] cluster_1.11.10 XML_1.93-2.2

Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227
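To make the question reproducible, the sample rows can be entered as an R data frame. This is a minimal sketch using only the four rows shown in the question. The check.names = FALSE argument is an assumption about how the real data were loaded, and it matters: by default data.frame() (like read.table() and read.delim()) converts column names to syntactic names, so "Combined p-value" would become "Combined.p.value" and code that refers to A$"Combined p-value" would get NULL back.

```r
# Sample rows from the question. check.names = FALSE keeps the column
# names with spaces and hyphens exactly as written (otherwise they would
# be converted to e.g. "Combined.p.value").
A <- data.frame(
  "hgu133a ID"       = c("217757_at", "214440_at", "206797_at", "202376_at"),
  "Gene Symbol"      = c("A2M", "NAT1", "NAT2", "SERPINA3"),
  "Combined p-value" = c(0.787923912, 0.240689023, 0.497092074, 3.88e-13),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Inspect the column names and types.
str(A)
```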
@joern-toedling-1244
Hi Scott,

setting aside the question of whether this is the ideal way to combine multiple probe sets per gene, I do not think you need a special function for this. Basic R functions will suffice. Let A be your data.frame; then

    # first reorder the rows of your data.frame by p-value
    A <- A[order(A$"Combined p-value"), ]
    # and remove any rows containing a gene symbol mentioned in a previous row
    B <- A[!duplicated(A$"Gene Symbol"), ]

Because the rows are sorted by ascending p-value, the first occurrence of each gene symbol is its row with the lowest p-value, and duplicated() flags every later occurrence, so those are the rows that get dropped.

Regards,
Joern
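A self-contained run of the order() plus duplicated() approach is sketched below. Because the four sample rows in the question contain no repeated symbol, one extra NAT1 row with the probe ID "hypothetical_at" and p-value 0.01 is added purely for illustration; it is not real data. The code uses A[["..."]] instead of A$"...", which selects the same column.

```r
# Example input: the rows from the question plus one made-up NAT1 probe set
# ("hypothetical_at"), included only so that a gene symbol is duplicated.
A <- data.frame(
  "hgu133a ID"       = c("217757_at", "214440_at", "206797_at", "202376_at",
                         "hypothetical_at"),
  "Gene Symbol"      = c("A2M", "NAT1", "NAT2", "SERPINA3", "NAT1"),
  "Combined p-value" = c(0.787923912, 0.240689023, 0.497092074, 3.88e-13,
                         0.01),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Sort rows by ascending p-value so each gene's best row comes first.
A <- A[order(A[["Combined p-value"]]), ]

# 2. Keep only the first (i.e. lowest p-value) row for each gene symbol.
B <- A[!duplicated(A[["Gene Symbol"]]), ]

# Optional: reset the row names carried over from the original data frame.
rownames(B) <- NULL
print(B)
```

Here the made-up NAT1 row (p = 0.01) is kept and the 214440_at NAT1 row is dropped. Note that order() places NA p-values last by default, so a gene's NA row survives only if that gene has no other row.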