adding factors to a data frame from a dataframe


Tom Keller ▴ 70


Greetings,

I read a table as a dataframe that contains read metadata for DNA sequences. Each row contains the well.id and various parameters such as signal-to-noise, e.g.

    > welldfrm[1:3,]
      well.id signal.noise contiguous.read.length num.high.quality.bases sample.score comment container_name
    1      A1      195.983                    976                    907       53.629  162194        111201a
    3      C1      169.206                    990                    923       53.665  162196        111201a
    4      D1      126.441                    923                    832       44.197  162197        111201a

What I don't have and would like to add is the capillary that each well was loaded into, so I created a dataframe with those groupings.

I would like to add the capillary (cap1, cap2, ... cap16) to each row, based on whether the well.id is a member of the wells that capillary draws from. I can't quite figure out how to do that.

    > capillaries$cap1
    [1] A1  A3  A5  A7  A9  A11
    Levels: A1 A11 A3 A5 A7 A9
    > capillaries$cap5
    [1] C1  C3  C5  C7  C9  C11
    Levels: C1 C11 C3 C5 C7 C9

So, for example, every row with a well.id in the cap1 list would have the factor "cap1":

      well.id signal.noise crl num  score ... capillary
    1      A1      195.983 976 907 53.629 ...      cap1
    3      C1      169.206 990 923 53.665 ...      cap5

I hope that makes sense. I think one of the 'apply' functions is the way to go, or perhaps rearranging capillaries with stack (?), but I'm stumbling with the syntax (not to mention thinking in terms of complex data structures 8-)

Thanks for any suggestions,

Tom
MMI DNA Services Core Facility
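The stack() idea mentioned in the question can be sketched roughly as follows. This is only an illustration under assumptions: the two small data frames are hypothetical stand-ins for capillaries and welldfrm, and the factor columns are converted to character first because stack() only keeps plain vector columns.

```r
# Hypothetical miniature of the capillaries data frame (two capillaries only),
# with factor columns as shown in the question.
capillaries <- data.frame(cap1 = factor(c("A1", "A3", "A5")),
                          cap5 = factor(c("C1", "C3", "C5")))

# stack() skips factor columns, so convert them to character first.
capillaries[] <- lapply(capillaries, as.character)

# Long form: 'values' holds the well, 'ind' holds the capillary name.
long <- stack(capillaries)

# Hypothetical stand-in for welldfrm with only two of its columns.
welldfrm <- data.frame(well.id = c("A1", "C1", "D1"),
                       signal.noise = c(195.983, 169.206, 126.441),
                       stringsAsFactors = FALSE)

# match() finds each well's row in the long table; unlisted wells get NA.
welldfrm$capillary <- long$ind[match(as.character(welldfrm$well.id), long$values)]
```

Here well D1 is deliberately absent from the hypothetical capillaries, so its capillary comes out as NA rather than raising an error.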


Martin Morgan 25k


On 02/29/2012 02:29 PM, Tom Keller wrote:
> I would like to add the capillary, e.g. cap1, cap2, ... cap16, to each row based on whether the well.id was a member of the wells that capillary draws from.
>
> So for example, every row with a well.id in the cap1 list would have the factor "cap1":
>   well.id signal.noise crl num  score ... capillary
> 1      A1      195.983 976 907 53.629 ...      cap1
> 3      C1      169.206 990 923 53.665 ...      cap5

Hi Tom,

Not exactly sure that I've got it, but maybe Biobase::reverseSplit can help?

    > library(Biobase)
    > capillaries = data.frame(cap1=c("A1", "A3"), cap5=c("C1", "C3"))
    > map = unlist(reverseSplit(capillaries))
    > map
        A1     A3     C1     C3
    "cap1" "cap1" "cap5" "cap5"

and then

    welldfrm$capillary = map[welldfrm$well.id]

(if well.id is a factor then you'll want to coerce it to character, map[as.character(welldfrm$well.id)], because a factor used as an index is treated as its integer codes rather than its labels).

It might have been more straightforward to create 'map' directly, along the lines of

    map = setNames(paste("cap", rep(c(1, 5), each=2), sep=""),
                   paste(rep(c("A", "C"), each=2), c(1, 3), sep=""))

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
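For readers without Biobase installed, a similar lookup vector can probably be built in base R alone. The sketch below is an assumption-laden illustration, not Martin's method: the capillaries list and the well.id factor are hypothetical, and the last two lines contrast a lookup by name with a lookup by factor codes.

```r
# Hypothetical two-capillary subset of the wells, as a named list.
capillaries <- list(cap1 = c("A1", "A3"), cap5 = c("C1", "C3"))

# Named character vector: names are wells, values are capillary names.
map <- setNames(rep(names(capillaries), lengths(capillaries)),
                unlist(capillaries, use.names = FALSE))

# well.id stored as a factor, as read.table did by default before R 4.0.
well.id <- factor(c("A1", "C1", "D1"))

# Coercing to character looks each well up by name (D1 is not in map).
by_name <- unname(map[as.character(well.id)])

# Indexing with the factor itself uses its integer codes 1, 2, 3 instead,
# so it silently returns the first three elements of map.
by_code <- unname(map[well.id])
```

With these inputs, by_name pairs A1 with cap1 and C1 with cap5 and leaves D1 as NA, whereas by_code returns cap1, cap1, cap5, which is wrong for two of the three wells.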
