GOstats and microRNA pipeline using biomaRt
David ▴ 860
@david-3335
Hi Marc,

Thanks for the tip. I realized that my function was too slow. I agree with
you: getting all GO IDs in one shot is a much better approach.

On 03/31/2011 09:23 PM, Marc Carlson wrote:
> Hi David,
>
> If this was your function you would first of all want to just pass in a
> big vector (with your universe of transcript IDs in it) to get out all
> the data. Then making the GOFrame is just a matter of taking all the
> gene IDs (Entrez gene IDs), all the GO IDs (from any of the three
> ontologies), and the evidence codes into a single data.frame as outlined
> in this document here:
>
> http://www.bioconductor.org/packages/2.7/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf
>
> But if it were me, I would attempt to save a little headache when making
> the final table by getting only the data I needed from getBM (and
> since they keep the three ontologies separate, that means I would make
> three calls to getBM). So like this:
>
> getBioProcgoids <- function (id) {
>   getBM(attributes=c(
>     'go_biological_process_id',
>     'go_biological_process_linkage_type',
>     'entrezgene'),
>   filters="ensembl_transcript_id", values=id, mart=mart)
> }
> BioGOs <- getBioProcgoids(
>   yourBigUniverseVectorOfEnsemblTranscriptIDsGoesHere )
>
> Then write separate small functions to get the other two ontologies and
> call them, etc.
>
> Then something like this:
>
> myGOFrame <- rbind(BioGOs, CCGOs, MFGOs)
>
> to stick them all together.
>
> Does that help?
>
> Marc
>
> On 03/31/2011 02:47 AM, David martin wrote:
>> Ok thanks,
>> Any idea on how to turn the biomart output into a valid GOFrame input?
>>
>> For example, I wrote this function:
>>
>> getgoids <- function (id) {
>>   getBM(attributes=c(
>>     'entrezgene',
>>     'ensembl_transcript_id',
>>     'go_biological_process_id',
>>     'go_biological_process_linkage_type',
>>     'go_cellular_component_id',
>>     'go_cellular_component_linkage_type',
>>     'go_molecular_function_id',
>>     'go_molecular_function_linkage_type'),
>>   filters="ensembl_transcript_id", values=id, mart=mart)
>> }
>> foo
>>
>> How do I turn this into a valid GOFrame object?
>>
>> thanks,
>> david
>>
>> On 03/31/2011 10:10 AM, James F. Reid wrote:
>>> Hi David,
>>>
>>> On 03/30/2011 08:31 PM, David martin wrote:
>>> > Yes, absolutely. A few Ensembl releases ago UTRs tended to be smaller,
>>> > but this is getting better now. How would you normalize that based on
>>> > length?
>>>
>>> I'm afraid that I don't have a simple answer to this; it would need
>>> thinking out, especially with respect to your GO enrichment analysis.
>>> Any ideas from the members of the list?
>>>
>>> Best,
>>> J.
>>>
>>>> On 03/30/2011 07:00 PM, James F. Reid wrote:
>>>>> Hi David,
>>>>>
>>>>> I understand your reasoning for counting the number of miRNA binding
>>>>> sites within the 3' UTR of a predicted target: you are trying to
>>>>> include the 'combinatorial' effect of miRNA targeting.
>>>>> I would try to include the length of each UTR, however (some kind of
>>>>> normalization if you wish), since the longer the UTR, the greater the
>>>>> chance that a miRNA will bind.
>>>>> Does this make sense?
>>>>>
>>>>> Best,
>>>>> J.
>>>>>
>>>>> On 03/30/2011 05:23 PM, David martin wrote:
>>>>>> On 03/30/2011 04:56 PM, Steve Lianoglou wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, Mar 30, 2011 at 9:43 AM, David martin <vilanew at gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> I am opening this new discussion so as not to confuse it with the
>>>>>>>> previous one.
>>>>>>>>
>>>>>>>> The objective here is to look for overrepresented GO terms among
>>>>>>>> microRNA targets. One microRNA can have several targets (genes) and
>>>>>>>> one single gene can be targeted by several microRNAs. The aim is to
>>>>>>>> check, for a specific microRNA, which GO terms are overrepresented.
>>>>>>>>
>>>>>>>> Ok, so let's say my microRNA of interest is miR-A.
>>>>>>>>
>>>>>>>> Step 1: based on my favorite prediction algorithm I have managed to
>>>>>>>> get a list of genes targeted by miR-A. The genes are Ensembl
>>>>>>>> transcripts and, as I said before, miR-A can target the same
>>>>>>>> transcript several times (at different locations), so I need to
>>>>>>>> account for this.
>>>>>>>>
>>>>>>>> miR-A targets ->
>>>>>>>> (ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......)
>>>>>>>> up to 300 different transcripts.
>>>>>>>
>>>>>>> I don't get why you'd want to have the same transcript multiple times
>>>>>>> as a target for the miRNA -- if the miRNA targets the same transcript
>>>>>>> in two different locations, you then want to double count the GO
>>>>>>> terms associated with that transcript?
>>>>>>
>>>>>> That's correct. The idea behind that is that a transcript targeted at
>>>>>> different locations is more "likely to be twice targeted" and
>>>>>> therefore the GO terms associated with this transcript have to be
>>>>>> replicated. This sounds good to me, but I do not expect you to agree
>>>>>> on that.
>>>>>>
>>>>>> I have managed to get all GO IDs with a small function. Basically you
>>>>>> input one transcript ID in a loop:
>>>>>>
>>>>>> l = length(genes) # list of all ensembl transcripts
>>>>>> for (l in 1:l)
>>>>>> {
>>>>>>   goid[l] <- getgoids("ENST...")
>>>>>> }
>>>>>> getgoids <- function (id) {
>>>>>>   getBM(attributes=c(
>>>>>>     'go_biological_process_id',
>>>>>>     'go_biological_process_linkage_type',
>>>>>>     'go_cellular_component_id',
>>>>>>     'go_cellular_component_linkage_type',
>>>>>>     'go_molecular_function_id',
>>>>>>     'go_molecular_function_linkage_type'),
>>>>>>   filters="ensembl_transcript_id", values=id, mart=mart)
>>>>>> }
>>>>>>
>>>>>> I agree with you that I might need to add the transcript_id to be
>>>>>> able to use it for the GOstats mapping between transcripts and GO IDs.
>>>>>>
>>>>>> Now I want to use that as the universe set for GOstats and do
>>>>>> hyperG to compare with the GO terms for a specific microRNA.
>>>>>>
>>>>>> I guess:
>>>>>>
>>>>>> goframeData = data.frame(frame$go_id, frame$Evidence, frame$gene_id)
>>>>>> # list of all GO IDs from all transcripts targeted by all microRNAs
>>>>>>
>>>>>> goFrame = GOFrame(goframeData, organism = "Homo sapiens")
>>>>>> goAllFrame = GOAllFrame(goFrame) # gene ID to ALL GO ID mapping
>>>>>>
>>>>>> In the GSEAGOHyperGParams function below, can you correct me?
>>>>>> geneSetCollection = list of all GO IDs of all transcripts targeted by
>>>>>> all microRNAs
>>>>>> single_mir_transcript_ids = list of Ensembl transcript IDs targeted by
>>>>>> a specific microRNA
>>>>>> universeGeneIds = list of transcript to GO mappings
>>>>>> Is this correct?
>>>>>>
>>>>>> gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
>>>>>> params <- GSEAGOHyperGParams(name = "My Custom GSEA based annot Params",
>>>>>>   geneSetCollection = gsc, geneIds = single_mir_transcripts_ids,
>>>>>>   universeGeneIds = universe, ontology = "BP", pvalueCutoff = 0.05,
>>>>>>   conditional = FALSE, testDirection = "over")
>>>>>>
>>>>>>>
>>>>>>> Somehow that seems wrong to me -- if the "hit count" of the miRNA to
>>>>>>> the transcript is important to you, one thing you can do is store
>>>>>>> your miR-A vector as its "table()" so the names will be the
>>>>>>> transcripts, and the values will be the number of hits.
>>>>>>>
>>>>>>>> I use biomart to get the corresponding GO IDs for these transcripts
>>>>>>>>
>>>>>>>> ....
>>>>>>>> #Select mart database
>>>>>>>> mart<- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>>>>>>>
>>>>>>>> #Get GO for a specific transcript
>>>>>>>> # First problem: biomart will not return GO terms twice for
>>>>>>>> duplicated transcripts. The example below shows that for transcript
>>>>>>>> c("ENST00000347770","ENST00000347770") I get the same GO terms as
>>>>>>>> for transcript c("ENST00000347770").
>>>>>>>> # As I said before, a microRNA can target the same transcript
>>>>>>>> several times, so there should be twice the number of GO terms
>>>>>>>> associated with this particular microRNA. Can we force biomart to
>>>>>>>> return redundant GO terms?
>>>>>>>
>>>>>>> I'm actually still not sure what you want to do, but if you follow my
>>>>>>> advice above, you can manipulate the data.frame you get from getBM to
>>>>>>> replicate rows (or whatever you're trying to do).
>>>>>>>
>>>>>>> You will also want to add "ensembl_transcript_id" to your vector of
>>>>>>> attributes so you can reassociate the rows in the table that is
>>>>>>> returned to you with the original Ensembl transcripts you are
>>>>>>> querying for, e.g.:
>>>>>>>
>>>>>>> R> gomir<- getBM(attributes=c('ensembl_transcript_id', 'go..', ...),
>>>>>>>      filters='ensembl_transcript_id', values=c("ENST..."), mart=mart)
>>>>>>>
>>>>>>> Hope that helps,
>>>>>>> -steve
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
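
For readers following Marc's suggestion of three per-ontology getBM calls: one practical catch is that each call returns differently named columns, and rbind() on data frames refuses to stack tables whose column names do not match. The sketch below is only an illustration under stated assumptions. It uses small mock tables in place of real getBM output (the IDs are made up for the example), the helper toGOFrameCols is hypothetical, and the column names go_id, Evidence and gene_id are taken from David's goframeData line in the thread.

```r
# Mock results standing in for three per-ontology getBM() calls.
# (Hypothetical data; column names follow the attributes quoted in the thread.)
BioGOs <- data.frame(
  go_biological_process_id = c("GO:0006915", "GO:0008283", ""),
  go_biological_process_linkage_type = c("IDA", "IEA", ""),
  entrezgene = c(7157, 7157, 672),
  stringsAsFactors = FALSE
)
CCGOs <- data.frame(
  go_cellular_component_id = "GO:0005634",
  go_cellular_component_linkage_type = "IDA",
  entrezgene = 672,
  stringsAsFactors = FALSE
)
MFGOs <- data.frame(
  go_molecular_function_id = c("GO:0003677", "GO:0003677"),
  go_molecular_function_linkage_type = c("IDA", "IDA"),
  entrezgene = c(7157, 7157),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename the three columns (GO ID, evidence code,
# gene ID, in that order) to one shared set of names so rbind() can stack them.
toGOFrameCols <- function(df) {
  stopifnot(ncol(df) == 3)
  names(df) <- c("go_id", "Evidence", "gene_id")
  df
}

goframeData <- rbind(
  toGOFrameCols(BioGOs),
  toGOFrameCols(CCGOs),
  toGOFrameCols(MFGOs)
)

# Drop rows with no GO annotation or no gene ID, then drop exact duplicates.
keep <- !is.na(goframeData$go_id) & goframeData$go_id != "" &
  !is.na(goframeData$gene_id)
goframeData <- unique(goframeData[keep, ])
rownames(goframeData) <- NULL

print(goframeData)
```

The resulting data.frame has the three columns, in the order used in the thread, that are then passed to GOFrame(goframeData, organism = "Homo sapiens"). Because gene_id holds Entrez gene IDs here, the geneIds and universeGeneIds given to GSEAGOHyperGParams would need to be Entrez gene IDs as well, not Ensembl transcript IDs.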
GOstats biomaRt microRNA miRNA Normalization GO Organism • 1.4k views