Hello bioC users,
as you can see below, this was posted over a year ago. Unfortunately, I
tried the same thing today and, for some mysterious reason, it no longer
works correctly.
What I have is the same data.frame:
> dat
            id flybasename_gene flybase_gene_id entrezgene
1 1616608_a_at             Gpdh     FBgn0001128      33824
2 1622892_s_at          CG33057     FBgn0053057     318833
3 1622892_s_at            mkg-p     FBgn0035889      38955
4   1622893_at              IM3     FBgn0040736      50209
5   1622894_at          CG15120     FBgn0034454      37248
  GOMF
1 carboxylesterase activity:hydrolase activity:3',5'-cyclic-nucleotide phosphodiesterase activity:protein binding:
2 nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity
3 nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity
4 aminopeptidase activity:metalloexopeptidase activity:hydrolase activity:manganese ion binding
5 protein binding
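For anyone who wants to reproduce this, the data frame shown above can be rebuilt roughly as follows. This is only a sketch: the column types are an assumption, in particular `stringsAsFactors = TRUE`, which was the `data.frame()` default in R 2.15.0 and would make `GOMF` a factor.

```r
# Rebuild the example data frame from the printed output above.
# Assumption: character columns are stored as factors, which was the
# data.frame() default in R 2.15.0 (stringsAsFactors = TRUE).
dat <- data.frame(
  id = c("1616608_a_at", "1622892_s_at", "1622892_s_at",
         "1622893_at", "1622894_at"),
  flybasename_gene = c("Gpdh", "CG33057", "mkg-p", "IM3", "CG15120"),
  flybase_gene_id = c("FBgn0001128", "FBgn0053057", "FBgn0035889",
                      "FBgn0040736", "FBgn0034454"),
  entrezgene = c(33824, 318833, 38955, 50209, 37248),
  GOMF = c(
    "carboxylesterase activity:hydrolase activity:3',5'-cyclic-nucleotide phosphodiesterase activity:protein binding:",
    "nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity",
    "nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity",
    "aminopeptidase activity:metalloexopeptidase activity:hydrolase activity:manganese ion binding",
    "protein binding"
  ),
  stringsAsFactors = TRUE
)

# Inspect the column classes; GOMF is a factor under the assumption above.
str(dat)
```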
What I would like to have is a second data frame with the GO categories as
row names, listing for each category the gene IDs that belong to it,
like this:
GO                                           genes
protein binding                              FBgn0001128 FBgn0053057 FBgn0035889 etc.
ammonium transmembrane transporter activity  FBgn0053057 FBgn0035889
hydrolase activity                           FBgn0040736 FBgn0001128
Below is the script I used before. As far as I can remember, it worked
well:
lst <- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) dat[x, "GOMF"])
lst2 <- lapply(lst, function(x) unlist(strsplit(as.character(x), ":")))
unlst <- cbind(rep(names(lst2), sapply(lst2, length)),
               unlist(lst2, use.names = FALSE))
done <- tapply(1:nrow(unlst), unlst[, 2], function(x) unlst[x, 1])
done_df <- lapply(done, paste, collapse = ",")
out <- data.frame(GO = names(done_df), FBgn = unlist(done_df))
But the result I am getting does not contain the GO categories. Instead,
the GO column holds plain numbers next to the gene IDs, like this:
> out
GO FBgn
1 1 FBgn0040736
2 2 FBgn0001128
3 3 FBgn0035889,FBgn0053057
4 4 FBgn0034454
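One possible explanation, offered as an assumption and not a confirmed diagnosis: if `GOMF` is stored as a factor, `tapply()` simplifies the one-value-per-gene results into a plain array and keeps only the factor's integer codes, so `as.character()` later yields "1", "2", and so on. The numbers 1 to 4 in the output are consistent with the four distinct `GOMF` strings sorted alphabetically. The sketch below rebuilds only the two relevant columns and converts to character before splitting; the names `terms_by_gene`, `pairs`, and `genes_by_term` are illustrative.

```r
# Two-column version of the example data; GOMF is deliberately a factor
# (assumption: this mirrors the stringsAsFactors = TRUE default of R 2.15.0).
go_a <- "carboxylesterase activity:hydrolase activity:3',5'-cyclic-nucleotide phosphodiesterase activity:protein binding:"
go_b <- "nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity"
go_c <- "aminopeptidase activity:metalloexopeptidase activity:hydrolase activity:manganese ion binding"
dat <- data.frame(
  flybase_gene_id = c("FBgn0001128", "FBgn0053057", "FBgn0035889",
                      "FBgn0040736", "FBgn0034454"),
  GOMF = c(go_a, go_b, go_b, go_c, "protein binding"),
  stringsAsFactors = TRUE
)

# Convert to character first so the GO strings, not the factor codes, are split.
terms_by_gene <- strsplit(as.character(dat$GOMF), ":", fixed = TRUE)

# One row per (gene, GO term) pair.
pairs <- data.frame(
  FBgn = rep(as.character(dat$flybase_gene_id), sapply(terms_by_gene, length)),
  GO = unlist(terms_by_gene),
  stringsAsFactors = FALSE
)

# Invert the mapping: collect the gene IDs under each GO term.
genes_by_term <- split(pairs$FBgn, pairs$GO)
out <- data.frame(
  GO = names(genes_by_term),
  FBgn = vapply(genes_by_term, paste, character(1), collapse = ","),
  row.names = NULL,
  stringsAsFactors = FALSE
)
out
```

If the original script is to be kept as is, wrapping the first step's result in `as.character()`, i.e. `function(x) as.character(dat[x, "GOMF"])`, may be enough to avoid the factor codes.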
I would like to know whether something has changed in the apply family of
functions that would explain why the results differ from before. I would
appreciate your help.
Thanks
Assa
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base