newbie subset question


Tom Keller ▴ 70


Greetings,

I have a dataframe:

> str(traces)
'data.frame': 2366 obs. of 14 variables:
 $ sample.name             : chr "leechi_CH001_" "leechi_CH002" "leechi_CH003" "leechi_CH004" ...
 $ well.id                 : Factor w/ 96 levels "A1","A10","A11",..: 1 13 25 37 49 61 73 85 5 17 ...
 $ clear.range.length      : int 807 188 825 779 853 864 0 776 369 50 ...
 $ signal.noise            : num 195.98 9.22 169.21 126.44 158.65 ...
 $ contiguous.read.length  : int 976 502 990 923 976 979 -1 966 439 621 ...
 $ clear.range.start       : int 15 168 14 27 8 11 0 11 12 268 ...
 $ clear.range.stop        : int 822 356 839 806 861 875 0 787 381 318 ...
 $ num.low.quality.bases   : int 155 286 181 242 144 161 5 192 470 216 ...
 $ num.high.quality.bases  : int 907 343 923 832 918 918 0 897 389 358 ...
 $ num.medium.quality.bases: int 42 46 30 56 42 19 0 35 14 73 ...
 $ sample.score            : num 53.6 41.9 53.7 44.2 54.8 ...
 $ comment                 : Factor w/ 1787 levels "","162194","162195",..: 2 3 4 5 6 7 8 9 10 11 ...
 $ container_name          : Factor w/ 37 levels "111201a","111201arr",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ file.name               : chr "/Users/kellert/Desktop/1112/111201a/leechi_CH001__A01.ab1" "/Users/kellert/Desktop/1112/111201a/leechi_CH002_B01.ab1" "/Users/kellert/Desktop/1112/111201a/leechi_CH003_C01.ab1" "/Users/kellert/Desktop/1112/111201a/leechi_CH004_D01.ab1" ...

I would like to compare the $num.high.quality.bases for all rows where $well.id is, for example, a member of

c("H1","H3","H5","H7","H9","H11")

I thought this would work:

cap1 = traces[traces$well.id = c("H1","H3","H5","H7","H9","H11"), ]

or

cap1 = traces[traces$well.id == match("H1","H3","H5","H7","H9","H11"), ]

but both give errors.
The data itself looks like:

   sample.name   well.id clear.range.length signal.noise contiguous.read.length clear.range.start clear.range.stop
 1 leechi_CH001_ A1                     807      195.983                    976                15              822
 2 leechi_CH002  B1                     188        9.220                    502               168              356
 3 leechi_CH003  C1                     825      169.206                    990                14              839
 4 leechi_CH004  D1                     779      126.441                    923                27              806
 5 leechi_CH005  E1                     853      158.646                    976                 8              861
 6 leechi_CH006  F1                     864      161.874                    979                11              875
 7 leechi_CH007  G1                       0        3.916                     -1                 0                0
 8 leechi_CH008  H1                     776      156.605                    966                11              787
 9 leechi_CH009  A2                     369      177.872                    439                12              381
10 leechi_CH010  B2                      50        6.514                    621               268              318
11 leechi_CH011  C2                     853      154.255                    998                12              865
12 leechi_CH012  D2                     773      121.261                    933                32              805
13 leechi_CH013  E2                     850      201.700                    923                10              860
14 leechi_CH014  F2                     863      186.988                    980                11              874
15 leechi_CH015  G2                       0        4.001                     -1                 0                0

   num.low.quality.bases num.high.quality.bases num.medium.quality.bases sample.score comment container_name
 1                   155                    907                       42       53.629  162194        111201a
 2                   286                    343                       46       41.940  162195        111201a
 3                   181                    923                       30       53.665  162196        111201a
 4                   242                    832                       56       44.197  162197        111201a
 5                   144                    918                       42       54.815  162198        111201a
 6                   161                    918                       19       54.474  162199        111201a
 7                     5                      0                        0        0.000  162200        111201a
 8                   192                    897                       35       53.025  162201        111201a
 9                   470                    389                       14       52.632  162202        111201a
10                   216                    358                       73       33.080  162203        111201a
11                   177                    917                       42       53.154  162204        111201a
12                   232                    840                       57       43.304  162205        111201a
13                   176                    872                       29       55.949  162206        111201a
14                   162                    922                       30       53.485  162207        111201a
15                     5                      0                        0        0.000  162208        111201a
...........

How do I subset based on a match to specific values of $well.id?

thanks,
Tom
kellert@ohsu.edu
503-494-2442



Ben Tupper ▴ 60


Hi,

You can use %in%:

cap1 = traces[traces$well.id %in% c("H1","H3","H5","H7","H9","H11"), ]

or %in% with subset():

cap1 <- subset(traces, traces$well.id %in% c("H1","H3","H5","H7","H9","H11"))

Cheers,
Ben

P.S. The easiest way to share example data is to paste the output of dput(traces) into your email. If it is very large, consider using dput() on a small subset of the original data. Others can then cut and paste it into their own R session; you'll get way better assistance by doing that than by simply dumping your data into the email. dput() is a great tool and fits the purpose perfectly!

On Feb 27, 2012, at 3:49 PM, Tom Keller wrote:

> How do I subset based on a match to specific values of $well.id?

Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine 04575-0475
http://www.bigelow.org
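For anyone who wants to try the %in% approach without the original file, here is a minimal sketch on a small made-up data frame standing in for traces. The six rows and their values are invented for illustration, and the names wanted, keep, elementwise and pos are just labels chosen here, not anything from the original post. It also shows, as far as I can tell, why an == comparison does not do the job: it compares position by position instead of testing membership.

```r
# Toy stand-in for the poster's `traces` data frame (values are made up).
traces <- data.frame(
  well.id = factor(c("A1", "H1", "B2", "H3", "H11", "G2")),
  num.high.quality.bases = c(907L, 897L, 358L, 910L, 880L, 0L)
)

wanted <- c("H1", "H3", "H5", "H7", "H9", "H11")

# %in% returns one TRUE/FALSE per row, so it can index rows directly.
keep <- traces$well.id %in% wanted
cap1 <- traces[keep, ]

# `==` pairs the two vectors up element by element (row 1 with "H1",
# row 2 with "H3", ...), which is not a membership test.
elementwise <- as.character(traces$well.id) == wanted

# match() returns the position of each well in `wanted`, or NA if absent;
# %in% is the same as asking whether that position is not NA.
pos <- match(traces$well.id, wanted)

# Optional: drop factor levels that no longer occur after subsetting.
cap1$well.id <- droplevels(cap1$well.id)

# Compare the column of interest on the selected rows.
summary(cap1$num.high.quality.bases)

# Print a small, paste-able version of the result to share on the list.
dput(head(cap1, 3))
```

On this toy data, keep selects the H1, H3 and H11 rows, while elementwise is FALSE everywhere because no row happens to line up with the same position in wanted.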
