apply function on genomicsRanges ob
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Hello,

Does an apply function exist for a GenomicRanges object? I am not talking about a GenomicRangesList object here, but a single GenomicRanges object. Would it be worthwhile to implement one?

Currently I populate my gr object with a for loop: depending on the position of the gene, I add some information to mcols(gr). Unsurprisingly, the for loop is very inefficient.

Greg.
Lady Davis Institute
Montreal

-- output of sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] ade4_1.6-2          IRanges_1.20.7      BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1

--
Sent via the guest posting facility at bioconductor.org.
@valerie-obenchain-4275
Last seen 3.0 years ago
United States
Hi,

I think you mean the GRanges class. GenomicRanges is a virtual class; GRanges is the concrete subclass.

Please show a reproducible example of what you're trying to do. When you provide an example, instead of asking for an apply function for GRanges, others on the list can see what the end goal is and suggest alternatives. Using an *apply function may not be the best approach.

Valerie

On 06/26/2014 06:41 AM, Maintainer wrote:
> Hello,
>
> Does an apply function exist for a GenomicRanges object? I am not talking about a GenomicRangesList object here, but a single GenomicRanges object.
> Would it be worthwhile to implement one?
>
> Currently I populate my gr object with a for loop: depending on the position of the gene, I add some information to mcols(gr).
> Unsurprisingly, the for loop is very inefficient.
>
> Greg.
> Lady Davis Institute
> Montreal

--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fhcrc.org
Phone: (206) 667-3158
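To illustrate the class distinction and the point about *apply, here is a minimal sketch, assuming the GenomicRanges package is installed. The coordinates and the `region` column are invented for illustration and are not from the original post.

```r
library(GenomicRanges)

# Toy ranges; the coordinates are made up for illustration.
gr <- GRanges("chr1", IRanges(start = c(10, 50, 200), width = 5))

# GRanges is a concrete class that extends the virtual GenomicRanges class.
is(gr, "GenomicRanges")
isVirtualClass("GenomicRanges")

# A metadata column can be filled for every range in a single vectorized
# assignment, without a loop or an *apply call.
# 'region' is a hypothetical column name.
gr$region <- ifelse(start(gr) < 100, "proximal", "distal")
gr
```

Accessors such as start(), end() and width() return one value per range, so most per-range computations can probably be written as ordinary vector operations on them.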
Hi Valerie,

This is the code. I call my function:

obj_build_list5 = mclapply(obj_build_list3, function(x) {addconservedTFSInformation(x)}, mc.cores = nbcores)

addconservedTFSInformation <- function(annot_chr){ #annot_chr is a gr object
  #load the initial TFS data and create a GRanges Obj
  conservedTFSFile = as.data.frame(read.table("../FullAnnotation450K/Annotation450KBuilder/data/conserved_TFBS_sites_ucsc_JUNE2014.gtf"))
  conservedTFS = GRanges(seqnames = conservedTFSFile$V2, ranges = IRanges(conservedTFSFile$V3, conservedTFSFile$V4))
  mcols(conservedTFS) = conservedTFSFile[,c(5,7,8)]
  colnames(mcols(conservedTFS)) = c("TS_name","TS_strand","Z_score")

  #add the metaColumn for the conservedTFS annotation
  values(annot_chr) <- cbind(values(annot_chr), DataFrame(conservedTFS_name     = rep(NA, length(annot_chr))))
  values(annot_chr) <- cbind(values(annot_chr), DataFrame(conservedTFS_position = rep(NA, length(annot_chr))))
  values(annot_chr) <- cbind(values(annot_chr), DataFrame(conservedTFS_distance = rep(NA, length(annot_chr))))
  values(annot_chr) <- cbind(values(annot_chr), DataFrame(conservedTFS_score    = rep(NA, length(annot_chr))))
  values(annot_chr) <- cbind(values(annot_chr), DataFrame(conservedTFS_strand   = rep(NA, length(annot_chr))))

  #add information for each CpG
  for (j in 1:length(annot_chr)){
    #find overlap
    ov_current = nearest(annot_chr[j], conservedTFS)
    conservedTFS_current = conservedTFS[ov_current]
    if(!(length(ov_current)==0)){
      mcols(annot_chr)[["conservedTFS_name"]][j]     = as.vector(mcols(conservedTFS_current[1])[["TS_name"]])
      mcols(annot_chr)[["conservedTFS_position"]][j] = start(ranges(conservedTFS_current[1]))
      mcols(annot_chr)[["conservedTFS_distance"]][j] = start(ranges(conservedTFS_current[1])) - start(ranges(annot_chr[j]))
      mcols(annot_chr)[["conservedTFS_score"]][j]    = as.vector(mcols(conservedTFS_current[1])[["Z_score"]])
    }
  }

  #save the object : one object by chromosome
  nb = unlist(strsplit(chr, "chr"))[2]
  name_obj = paste0(nb, ".FullAnnotation450K_", nb, ".RData")
  save(annot_chr, file = name_obj)

  #return
  return(annot_chr)
}

On Thursday, 26 June 2014 at 12:04, Valerie Obenchain <vobencha@fhcrc.org> wrote:
> Hi,
>
> I think you mean the GRanges class. GenomicRanges is a virtual class, GRanges is the concrete subclass.
>
> Please show a reproducible example of what you're trying to do. When you provide an example, instead of asking for an apply function for GRanges, others on the list can see what the end goal is and suggest alternatives. Using an *apply function may not be the best approach.
>
> Valerie
A copy of your code is not a reproducible example. By 'reproducible' I mean code that others can copy from your email into an R session and it will run. Please read the posting guide:

http://www.bioconductor.org/help/mailing-list/posting-guide/

If your question is about adding metadata columns to a GRanges, create a small GRanges

gr <- GRanges("chr1", IRanges(1:10, width=1))

or show a few lines of the GRanges read in with addconservedTFSInformation(). Use this small GRanges to demonstrate the loop you're having trouble with. Be clear about what your matching criteria are (simple overlap of ranges? gene id?) and what metadata you are trying to add to the rows that match.

Valerie
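A self-contained example of the kind requested here might look like the sketch below. It reuses the small `gr` suggested in the reply; the `tfbs` object, its coordinates, and the `TS_name` and `Z_score` values are hypothetical stand-ins for the real TFBS annotation, and the loop is reduced to a single metadata column.

```r
library(GenomicRanges)

# Query ranges, as suggested in the reply.
gr <- GRanges("chr1", IRanges(1:10, width = 1))

# Hypothetical subject ranges standing in for the TFBS annotation;
# names and scores are invented.
tfbs <- GRanges("chr1", IRanges(start = c(3, 9), width = 2),
                TS_name = c("tfA", "tfB"), Z_score = c(2.1, 3.4))

# The per-range loop from the question, reduced to one metadata column.
gr$tfbs_name <- NA_character_
for (j in seq_along(gr)) {
  hit <- nearest(gr[j], tfbs)   # index into tfbs, or NA if nothing is found
  if (!is.na(hit)) {
    gr$tfbs_name[j] <- tfbs$TS_name[hit]
  }
}
gr
```

Anyone can paste this into a fresh R session, which makes it straightforward to time the loop and to compare it against alternatives.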
On Thu, Jun 26, 2014 at 9:53 AM, gregory voisin <voisingreg@yahoo.fr> wrote:

> Hi Valirie,
> this is the code .
>
> I call my function:
>
> obj_build_list5 = mclapply(obj_build_list3, function(x)
> {addconservedTFSInformation(x)} ,mc.cores =nbcores)
>
> addconservedTFSInformation<- function(annot_chr){#annot_chr is a gr object
>
> #load the initial TFS data and create a GRanges Obj
> conservedTFSFile = as.data.frame(read.table())
> conservedTFS = GRanges(seqnames = conservedTFSFile$V2, ranges=
> IRanges(conservedTFSFile$V3, conservedTFSFile$V4))
> mcols(conservedTFS)= conservedTFSFile[,c(5,7,8)]
> colnames(mcols(conservedTFS)) = c("TS_name","TS_strand","Z_score")

The above lines are turning a GFF file into a GRanges. This is a common operation, so we have implemented it as:

conservedTFS <- rtracklayer::import("../FullAnnotation450K/Annotation450KBuilder/data/conserved_TFBS_sites_ucsc_JUNE2014.gtf")

> #add the metaColumn for the conservedTFS annotation
> values(annot_chr) <- cbind(values(annot_chr),
> DataFrame(conservedTFS_name = rep(NA, length(annot_chr))))
> values(annot_chr) <- cbind(values(annot_chr),
> DataFrame(conservedTFS_position = rep(NA, length(annot_chr))))
> values(annot_chr) <- cbind(values(annot_chr),
> DataFrame(conservedTFS_distance = rep(NA, length(annot_chr))))
> values(annot_chr) <- cbind(values(annot_chr),
> DataFrame(conservedTFS_score = rep(NA, length(annot_chr))))
> values(annot_chr) <- cbind(values(annot_chr),
> DataFrame(conservedTFS_strand = rep(NA, length(annot_chr))))
>
> #add information for each CpG
> for (j in 1:length(annot_chr)){
> #find overlap
> ov_current = nearest(annot_chr[j],conservedTFS)
> conservedTFS_current= conservedTFS[ov_current]
> if(!(length(ov_current)==0)){
> mcols(annot_chr)[["conservedTFS_name"]][j] =
> as.vector(mcols(conservedTFS_current[1])[["TS_name"]])
> mcols(annot_chr)[["conservedTFS_position"]][j] =
> start(ranges(conservedTFS_current[1]))
> mcols(annot_chr)[["conservedTFS_distance"]][j] =
> start(ranges(conservedTFS_current[1]))- start(ranges(annot_chr[j]))
> mcols(annot_chr)[["conservedTFS_score"]][j] =
> as.vector(mcols(conservedTFS_current[1])[["Z_score"]])
> }
> }

The nearest() function is vectorized, so you could do:

n <- nearest(annot_chr, conservedTFS)

to get the index of the nearest TFS to each annot_chr range, as a vector in the same order as annot_chr. Then merge the information like this:

annot_chr$conservedTFS_name <- conservedTFS$name[n]
annot_chr$conservedTFS_position <- start(conservedTFS)[n]
annot_chr$conservedTFS_distance <- abs(start(conservedTFS)[n] - start(annot_chr))
annot_chr$conservedTFS_score <- conservedTFS$score[n]

Hope this helps get you started,
Michael

> #save the object : one object by chromosome
> nb= unlist(strsplit(chr, "chr"))[2]
> name_obj = paste0(nb,".FullAnnotation450K_",nb, ".RData")
> save (annot_chr, file = name_obj)
>
> #return
> return(annot_chr)
> }
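The vectorized approach can be tried on toy data without the GTF file. The sketch below is only an illustration: the coordinates, names and scores are invented, and the `name` and `score` columns are assumed to match what the imported file would provide. One query range sits on a sequence with no TFBS at all, to show what happens when nearest() finds nothing.

```r
library(GenomicRanges)

# Toy stand-ins for the real objects; all values are invented.
annot_chr <- GRanges(c("chr1", "chr1", "chr2"),
                     IRanges(start = c(5, 100, 20), width = 1))
conservedTFS <- GRanges("chr1", IRanges(start = c(10, 90), width = 5),
                        name = c("tfA", "tfB"), score = c(2.5, 3.1),
                        seqinfo = Seqinfo(c("chr1", "chr2")))

# One vectorized call instead of a loop over ranges.
# n is NA for the chr2 range because no subject lies on that sequence.
n <- nearest(annot_chr, conservedTFS)

# Indexing with NA yields NA, so unmatched ranges keep missing values.
annot_chr$conservedTFS_name     <- conservedTFS$name[n]
annot_chr$conservedTFS_position <- start(conservedTFS)[n]
annot_chr$conservedTFS_distance <- abs(start(conservedTFS)[n] - start(annot_chr))
annot_chr$conservedTFS_score    <- conservedTFS$score[n]
annot_chr
```

Because subsetting by NA propagates NA, no explicit check for unmatched ranges is needed here. Note that the distance computed this way is start-to-start; if the gap between ranges is wanted instead, distanceToNearest() in GenomicRanges may be a better fit.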