Average based on group

Fabrice Tourre

Dear list,

I have a data frame whose second column is a grouping factor; each group has 10 items. The data are as follows:

    chr10 rs9971029 71916552 0.1
    chr10 rs9971029 71916553 0.4
    chr10 rs9971029 71916554 0.3
    chr10 rs9971029 71916555 0.9
    chr10 rs9971029 71916556 1
    chr10 rs9971029 71916557 2
    chr10 rs9971029 71916558 4
    chr10 rs9971029 71916559 0.8
    chr10 rs9971029 71916560 0.9
    chr10 rs9971029 71916561 0.8
    chr10 rs9971030 71916726 0.6
    chr10 rs9971030 71916727 0.5
    chr10 rs9971030 71916728 0.4
    chr10 rs9971030 71916729 0.7
    chr10 rs9971030 71916730 0
    chr10 rs9971030 71916731 0
    chr10 rs9971030 71916732 0.6
    chr10 rs9971030 71916733 0.8
    chr10 rs9971030 71916734 0.9
    chr10 rs9971030 71916735 1

I want the average of each item across the groups (based on the group factor), so that in the end I get a vector of length 10, calculated as:

    (0.1+0.6)/2
    (0.4+0.5)/2
    ...
    (0.8+1)/2

Thank you very much in advance.
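To pin down the requested calculation, here is a minimal R sketch that rebuilds the example data and applies the formula from the question literally, pairing the k-th score of one SNP with the k-th score of the other. The column names (seqnames, snp.id, position, score) are borrowed from the dput() output posted later in the thread; the variable names s1, s2 and target are illustrative only. It covers exactly two SNPs of equal size and nothing more general.

```r
# Example data from the question. Column names follow the dput() output
# posted later in this thread.
df <- data.frame(
  seqnames = "chr10",
  snp.id   = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  position = c(71916552:71916561, 71916726:71916735),
  score    = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
               0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

# (0.1 + 0.6)/2, (0.4 + 0.5)/2, ..., (0.8 + 1)/2:
# average the k-th score of the first SNP with the k-th score of the second.
s1 <- df$score[df$snp.id == "rs9971029"]
s2 <- df$score[df$snp.id == "rs9971030"]
target <- (s1 + s2) / 2
print(target)
# [1] 0.35 0.45 0.35 0.80 0.50 1.00 2.30 0.80 0.90 0.90
```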


Achilleas Pitsillides

Hi,

Assuming I understood the question correctly, you can use the by() function from the base package, i.e.:

    by(MyData[,4], MyData[,2], mean)

where the factor is in the second column and the numerical data is in the fourth column.

cheers,
Achilleas

_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
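For comparison, a sketch of what grouping by the SNP id returns on the example data: one mean per SNP, i.e. a vector of length 2 rather than the 10 values the question asks for. tapply() stands in for by() here because it passes FUN a plain numeric vector; by() hands FUN data-frame subsets, and in current R versions mean() of a data frame may not return a number. The name per_snp is illustrative.

```r
df <- data.frame(
  snp.id = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  score  = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
             0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

# Grouping by SNP id collapses each SNP's 10 scores into a single mean,
# so there is one value per SNP (rs9971029: 1.12, rs9971030: 0.55),
# not one value per item.
per_snp <- tapply(df$score, df$snp.id, mean)
print(per_snp)
```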



I think this is not what I want. Neither

    aggregate(MyData[,4], MyData[,2], mean)

nor

    tapply(MyData[,4], MyData[,2], mean)

works for my purpose: the return value should be a vector of length 10.
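A sketch of how tapply() can still produce the length-10 result: group by each row's position within its SNP instead of by the SNP id. The within-group index is built with ave(); the column name item is a hypothetical helper, and the approach assumes rows are already in the intended order within each SNP.

```r
df <- data.frame(
  snp.id = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  score  = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
             0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

# Hypothetical helper column: 1, 2, ..., 10 within each SNP.
# ave() applies seq_along() separately to each snp.id group.
df$item <- ave(seq_along(df$score), df$snp.id, FUN = seq_along)

# Group by item instead of by SNP: one mean per item position.
item_means <- tapply(df$score, df$item, mean)
print(item_means)
```

The same index also works with aggregate(score ~ item, data = df, FUN = mean), which returns the result as a data frame with columns item and score.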

Reply by Fabrice Tourre

Steve Lianoglou

Hi,

On Thu, May 12, 2011 at 11:20 AM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
> Dear list,
> I have dataframe, the second column is groups factor, each group has
> 10 items.
> [...]

In addition to the great plyr package, if your data.frame is at all large you could also look into the data.table package; it's generally much faster[*].

I don't see how your data.frame corresponds to what you say, though: you mention that the second column is the group factor and that you expect an answer of length 10, but I only see one snp_id in your second column...

Anyway, assuming your data.frame is named `df` and has the columns seqnames, snp.id, position and score, to get the average score for each SNP using data.table you do:

    R> library(data.table)
    R> dt <- data.table(df, key='snp.id')
    R> avg <- dt[, list(avg=mean(score), by=snp.id]

(Instead of mean(score), you might want to use .Internal(mean(score)), since apparently doing it the "normal" way is somehow slow.)

HTH,
-steve

[*] A disclaimer: I help develop the data.table package. I'm not trying to proselytize for it over plyr, as I like and use both. It's just that for (really) large data.frame-like objects, the speed differences between the two are quite dramatic.

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Sorry, my mental R parser was broken: I missed a closing parenthesis. Instead of:

    R> library(data.table)
    R> dt <- data.table(df, key='snp.id')
    R> avg <- dt[, list(avg=mean(score), by=snp.id]

that last line should be:

    R> avg <- dt[, list(avg=mean(score)), by=snp.id]

or

    R> avg <- dt[, list(avg=.Internal(mean(score))), by=snp.id]

-steve

Reply by Steve Lianoglou


Thanks for your reply, but it doesn't serve my purpose. In fact, there are two SNPs in the example, rs9971029 and rs9971030. For this data I expect the following output:

    0.35 0.45 0.35 0.80 0.50 1.00 2.30 0.80 0.90 0.90

You can run this example to get the above values:

-----------------------------R code------------------------------------
df <- structure(list(
    seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                           1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                         .Label = "chr10", class = "factor"),
    snp.id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                         2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                       .Label = c("rs9971029", "rs9971030"), class = "factor"),
    position = c(71916552L, 71916553L, 71916554L, 71916555L, 71916556L,
                 71916557L, 71916558L, 71916559L, 71916560L, 71916561L,
                 71916726L, 71916727L, 71916728L, 71916729L, 71916730L,
                 71916731L, 71916732L, 71916733L, 71916734L, 71916735L),
    score = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
              0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)),
  .Names = c("seqnames", "snp.id", "position", "score"),
  class = "data.frame", row.names = c(NA, -20L))

a <- df[1:10,]     # rows of rs9971029
b <- df[11:20,]    # rows of rs9971030
cbind(a,b) -> c    # put the two SNPs side by side
(c[,4]+c[,8])/2    # average the two score columns
----------------------------------------------------------------

Reply by Fabrice Tourre


Hi,

On Thu, May 12, 2011 at 12:38 PM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
> Thanks for your reply, but it doesn't serve my purpose. In fact, there
> are two SNPs in the example, rs9971029 and rs9971030.

Yes, I see that now, sorry about that.

> I expect the following output for this data:
>
> 0.35 0.45 0.35 0.80 0.50 1.00 2.30 0.80 0.90 0.90
>
> You can run this example to get the above values:
> [...]
> a<-df[1:10,]
> b<-df[11:20,]
> cbind(a,b)->c
> (c[,4]+c[,8])/2

So, given your original data, without your cbind(a,b) trick, how would one know that the first row for rs9971029 should be matched (averaged) with the first row of the rs9971030 info? Should we assume that the data is already in order, that you have the same number of "scores" for each SNP, and that you want the average of the first elements, the second elements, etc.? For instance, would this do?

    R> score.matrix <- do.call(cbind, split(df$score, df$snp.id))
    R> rowMeans(score.matrix)
    [1] 0.35 0.45 0.35 0.80 0.50 1.00 2.30 0.80 0.90 0.90

-steve
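The split()/cbind() reshaping relies on every SNP having the same number of scores; if they differ, cbind() recycles the shorter vectors, warning only when the lengths are not multiples of each other. A sketch that checks group sizes first; items_by_row is a hypothetical helper name, not an existing function.

```r
df <- data.frame(
  snp.id = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  score  = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
             0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

# Hypothetical helper: average the k-th score across groups, refusing to
# run when the groups differ in size (cbind() would otherwise recycle).
items_by_row <- function(score, group) {
  pieces <- split(score, group)   # one numeric vector per group
  n <- lengths(pieces)
  if (length(unique(n)) != 1L) {
    stop("groups have different sizes: ", paste(n, collapse = ", "))
  }
  score.matrix <- do.call(cbind, pieces)  # rows = items, columns = groups
  rowMeans(score.matrix)
}

res <- items_by_row(df$score, df$snp.id)
print(res)
```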

Reply by Steve Lianoglou


OK, I get it now. If your data is as shown, i.e. sorted, can't you just create a dummy variable

    rep(1:10, n)

where n is the number of groups, and then use by() or tapply()? So in your example:

    by(df[,4], rep(1:10,2), mean)

cheers,
Achilleas
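A sketch of the same dummy-variable idea with the 10 and the 2 derived from the data instead of hard-coded. It assumes, as the reply does, that rows are grouped by SNP (in factor-level order) and that every SNP has the same number of rows, and it checks both before building the index; n_groups, k and item are illustrative names. tapply() does the averaging.

```r
df <- data.frame(
  snp.id = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  score  = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
             0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

# Assumptions: rows grouped by SNP in level order, equal group sizes.
stopifnot(!is.unsorted(as.integer(df$snp.id)))
n_groups <- nlevels(df$snp.id)     # number of SNPs (2 here)
k <- nrow(df) %/% n_groups         # items per SNP (10 here)
stopifnot(k * n_groups == nrow(df))

# Dummy variable 1..k repeated once per SNP, i.e. rep(1:10, 2) here.
item <- rep(seq_len(k), times = n_groups)
res <- tapply(df$score, item, mean)
print(res)
```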

Reply by Achilleas Pitsillides


It would probably be better to construct a meaningful factor that reflects the correct interpretation. (I tend to dislike code that assumes that the order of things is always preserved and that no rows got accidentally omitted.) I assume you really want to relate things based on their offset from the actual SNP position. So you might want to compute the minimum position within each SNP id group and compute an "offset" relative to that minimum position. You could then use the offset as the new grouping factor for the averages you want. Here is (completely untested and written on the fly) pseudo-code to do this:

    # first (minimum) position of each SNP
    startpos <- tapply(df$position, df$snp.id, min)
    # distance of each row from its own SNP's first position
    offset <- df$position - startpos[df$snp.id]
    # average the scores that share the same offset
    myavg <- tapply(df$score, offset, mean)

Kevin
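A runnable sketch of the offset approach on the example data. It looks up each row's start position by SNP name and wraps the lookup in as.vector() so the offset is a plain integer vector; offset_means is a hypothetical helper name. The second call drops one row to show that, because grouping is by distance from each SNP's first position, the remaining rows stay correctly aligned (the dropped row is purely for illustration).

```r
df <- data.frame(
  snp.id   = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  position = c(71916552:71916561, 71916726:71916735),
  score    = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
               0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

offset_means <- function(d) {
  # First (minimum) position of each SNP.
  startpos <- tapply(d$position, d$snp.id, min)
  # Distance of every row from its own SNP's first position.
  offset <- d$position - as.vector(startpos[as.character(d$snp.id)])
  # Rows at the same offset are averaged across SNPs.
  tapply(d$score, offset, mean)
}

print(offset_means(df))

# Drop row 13 (rs9971030 at offset 2). Offset 2 now averages only the
# rs9971029 score; all other offsets still combine both SNPs.
print(offset_means(df[-13, ]))
```

A hard-coded index such as rep(1:10, 2) would not even have the right length after the row is dropped, which is the failure mode the offset avoids.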

Reply by Kevin Coombes

Daniel Brewer

Hi Fabrice,

Check out the plyr library, in particular the function ddply(), which will do exactly what you want. There are also various built-in functions for this kind of thing, but their results aren't as sensible as those from ddply().

Dan


Moshe Olshansky

You can use aggregate as below:

    > junk
          V1        V2       V3  V4
    1  chr10 rs9971029 71916552 0.1
    2  chr10 rs9971029 71916553 0.4
    3  chr10 rs9971029 71916554 0.3
    4  chr10 rs9971029 71916555 0.9
    5  chr10 rs9971029 71916556 1.0
    6  chr10 rs9971029 71916557 2.0
    7  chr10 rs9971029 71916558 4.0
    8  chr10 rs9971029 71916559 0.8
    9  chr10 rs9971029 71916560 0.9
    10 chr10 rs9971029 71916561 0.8
    11 chr10 rs9971030 71916726 0.6
    12 chr10 rs9971030 71916727 0.5
    13 chr10 rs9971030 71916728 0.4
    14 chr10 rs9971030 71916729 0.7
    15 chr10 rs9971030 71916730 0.0
    16 chr10 rs9971030 71916731 0.0
    17 chr10 rs9971030 71916732 0.6
    18 chr10 rs9971030 71916733 0.8
    19 chr10 rs9971030 71916734 0.9
    20 chr10 rs9971030 71916735 1.0
    > colnames(junk) <- c("chr","group","someNumber","value")
    > aggregate(junk$value,list(junk$group),mean)
        Group.1    x
    1 rs9971029 1.12
    2 rs9971030 0.55



On Thu, May 12, 2011 at 11:33 PM, Moshe Olshansky <olshansky@wehi.edu.au> wrote:
> You can use aggregate as below:
> [...]
> > colnames(junk) <- c("chr","group","someNumber","value")
> > aggregate(junk$value,list(junk$group),mean)
>     Group.1    x
> 1 rs9971029 1.12
> 2 rs9971030 0.55

For many groups, this is way faster:

    rowsum(junk$value, junk$group) / table(junk$group)
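A sketch that checks the sums-over-counts idea against tapply() on the example data, then reuses it with a within-SNP item index to get the 10 averages the question asked for. It is written with drop() and as.vector() so that both operands of the division are plain vectors; the item column is a hypothetical helper and assumes rows are in order within each SNP. No timing is measured here.

```r
df <- data.frame(
  snp.id = factor(rep(c("rs9971029", "rs9971030"), each = 10)),
  score  = c(0.1, 0.4, 0.3, 0.9, 1, 2, 4, 0.8, 0.9, 0.8,
             0.6, 0.5, 0.4, 0.7, 0, 0, 0.6, 0.8, 0.9, 1)
)

# Group sums divided by group sizes = group means (one per SNP).
# drop() turns rowsum()'s one-column matrix into a named vector and
# as.vector() strips the table class.
per_snp <- drop(rowsum(df$score, df$snp.id)) / as.vector(table(df$snp.id))
stopifnot(isTRUE(all.equal(unname(per_snp),
                           as.vector(tapply(df$score, df$snp.id, mean)))))

# Same trick with a within-SNP item index (hypothetical helper column)
# gives the length-10 result from the question.
df$item <- ave(seq_along(df$score), df$snp.id, FUN = seq_along)
per_item <- drop(rowsum(df$score, df$item)) / as.vector(table(df$item))
print(per_item)
```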

Reply by Michael Lawrence
