newbie subset question
Tom Keller ▴ 70
@tom-keller-4959
Last seen 10.2 years ago
Greetings,
I have a dataframe:

> str(traces)
'data.frame': 2366 obs. of 14 variables:
 $ sample.name : chr "leechi_CH001_" "leechi_CH002" "leechi_CH003" "leechi_CH004" ...
 $ well.id : Factor w/ 96 levels "A1","A10","A11",..: 1 13 25 37 49 61 73 85 5 17 ...
 $ clear.range.length : int 807 188 825 779 853 864 0 776 369 50 ...
 $ signal.noise : num 195.98 9.22 169.21 126.44 158.65 ...
 $ contiguous.read.length : int 976 502 990 923 976 979 -1 966 439 621 ...
 $ clear.range.start : int 15 168 14 27 8 11 0 11 12 268 ...
 $ clear.range.stop : int 822 356 839 806 861 875 0 787 381 318 ...
 $ num.low.quality.bases : int 155 286 181 242 144 161 5 192 470 216 ...
 $ num.high.quality.bases : int 907 343 923 832 918 918 0 897 389 358 ...
 $ num.medium.quality.bases: int 42 46 30 56 42 19 0 35 14 73 ...
 $ sample.score : num 53.6 41.9 53.7 44.2 54.8 ...
 $ comment : Factor w/ 1787 levels "","162194","162195",..: 2 3 4 5 6 7 8 9 10 11 ...
 $ container_name : Factor w/ 37 levels "111201a","111201arr",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ file.name : chr "/Users/kellert/Desktop/1112/111201a/leechi_CH001__A01.ab1" "/Users/kellert/Desktop/1112/111201a/leechi_CH002_B01.ab1" "/Users/kellert/Desktop/1112/111201a/leechi_CH003_C01.ab1" "/Users/kellert/Desktop/1112/111201a/leechi_CH004_D01.ab1" ...

I would like to compare the $num.high.quality.bases for all rows where $well.id is, for example, a member of
c("H1","H3","H5","H7","H9","H11")

I thought this would work:
cap1 = traces[traces$well.id = c("H1","H3","H5","H7","H9","H11"), ]
or
cap1 = traces[traces$well.id == match("H1","H3","H5","H7","H9","H11"), ]
but both give errors.

The data itself looks like:
sample.name well.id clear.range.length signal.noise contiguous.read.length clear.range.start clear.range.stop num.low.quality.bases num.high.quality.bases num.medium.quality.bases sample.score comment container_name
1 leechi_CH001_ A1 807 195.983 976 15 822 155 907 42 53.629 162194 111201a
2 leechi_CH002 B1 188 9.220 502 168 356 286 343 46 41.940 162195 111201a
3 leechi_CH003 C1 825 169.206 990 14 839 181 923 30 53.665 162196 111201a
4 leechi_CH004 D1 779 126.441 923 27 806 242 832 56 44.197 162197 111201a
5 leechi_CH005 E1 853 158.646 976 8 861 144 918 42 54.815 162198 111201a
6 leechi_CH006 F1 864 161.874 979 11 875 161 918 19 54.474 162199 111201a
7 leechi_CH007 G1 0 3.916 -1 0 0 5 0 0 0.000 162200 111201a
8 leechi_CH008 H1 776 156.605 966 11 787 192 897 35 53.025 162201 111201a
9 leechi_CH009 A2 369 177.872 439 12 381 470 389 14 52.632 162202 111201a
10 leechi_CH010 B2 50 6.514 621 268 318 216 358 73 33.080 162203 111201a
11 leechi_CH011 C2 853 154.255 998 12 865 177 917 42 53.154 162204 111201a
12 leechi_CH012 D2 773 121.261 933 32 805 232 840 57 43.304 162205 111201a
13 leechi_CH013 E2 850 201.700 923 10 860 176 872 29 55.949 162206 111201a
14 leechi_CH014 F2 863 186.988 980 11 874 162 922 30 53.485 162207 111201a
15 leechi_CH015 G2 0 4.001 -1 0 0 5 0 0 0.000 162208 111201a
...........

How do I subset based on a match to specific values of $well.id?

thanks,
Tom
kellert@ohsu.edu
503-494-2442
Ben Tupper ▴ 60
@ben-tupper-5045
Last seen 10.2 years ago
Hi,

You can use %in%

cap1 = traces[traces$well.id %in% c("H1","H3","H5","H7","H9","H11"), ]

or %in% with subset()

cap1 <- subset(traces, traces$well.id %in% c("H1","H3","H5","H7","H9","H11"))

Cheers,
Ben

P.S. The easiest way to share example data is to paste the output of dput(traces) in your email. If it is very large, then consider using dput() on a small subset of the original data. Others can then cut-and-paste it into their own R session. You'll get waaaaay better assistance by doing that than by simply dumping your data into the email. dput() is a great tool and fits the purpose perfectly!

On Feb 27, 2012, at 3:49 PM, Tom Keller wrote:

> How do I subset based on a match to specific values of $well.id?

Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine 04575-0475
http://www.bigelow.org
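For anyone who wants to try the %in% approach without the original file, here is a minimal sketch. The small `traces` data frame below is a made-up stand-in (only two of the 14 columns, with invented values), and `wanted`, `keep`, and `cap2` are names introduced just for this illustration. The comments also note, as far as I can tell, why the two attempts in the question do not work.

```r
# Toy stand-in for the poster's `traces` data frame (values are invented).
traces <- data.frame(
  well.id = factor(c("A1", "H1", "B2", "H3", "H11", "G2")),
  num.high.quality.bases = c(907L, 897L, 358L, 912L, 640L, 0L)
)

# The wells of interest, taken from the question.
wanted <- c("H1", "H3", "H5", "H7", "H9", "H11")

# %in% tests each well.id for membership in `wanted` and returns one
# TRUE/FALSE per row, which is what row indexing with [ , ] needs.
keep <- traces$well.id %in% wanted
cap1 <- traces[keep, ]

# subset() evaluates the condition inside the data frame, so the column
# can also be named directly without the `traces$` prefix.
cap2 <- subset(traces, well.id %in% wanted)

# Why the attempts in the question fail:
#  - `=` is not a comparison operator, so it cannot express the test.
#  - `==` against a vector compares position by position (recycling the
#    shorter vector); it does not test membership.
#  - match() expects the lookup values and the table as two vectors,
#    e.g. match(traces$well.id, wanted), not a run of separate strings.

# Compare the column of interest for the selected wells.
summary(cap1$num.high.quality.bases)

# Share a small reproducible sample, as suggested in the P.S. above.
dput(head(traces, 3))
```

With this toy data, `cap1` and `cap2` both contain the three rows for wells H1, H3, and H11. Note that `well.id` stays a factor, so unused levels remain in the result; calling droplevels() on it afterwards is one option if they get in the way.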