Fwd: Segfault in MEDIPS MEDIPS.methylProfiling() with ROI file
1
0
Entering edit mode
Lukas Chavez ▴ 570
@lukas-chavez-5781
Last seen 6.9 years ago
USA/La Jolla/UCSD
Hi Stephen, please excuse my late respond. Indeed, the segfault occurs whenever the currently processed genomic region is of negative length (i.e. the start coordinate is larger than the end coordinate), the coordinates are outside of the length of the chromosome, or the chromosome of the ROI is not represented by the regions used as input for creating the MEDIPS SETs. Typically, the error can be avoided by correcting the ROI file. Although you have already checked problematic genomic regions, I would be more than happy, if you once more check these regions (and also the immediate neighbors) with respect to the constraints I just mentioned. Please let me know, if your ROI file appears to be fine. In this case I will try to back-track the error message. However, please note that I have extensively revised the MEDIPS package (also avoiding segfault errors) which I intend to update as soon as possible, especially in advance of the next Bioconductor release. I strongly recommend to switch to the new version as soon as available. Thank you and all the best, Lukas On Fri, Feb 15, 2013 at 9:23 AM, Stephen Turner <vustephen@gmail.com> wrote: > Lukas, and others: > > I'm trying to use the MEDIPS package to look for differentially > methylated regions, supplying a regions of interest file (essentially > a bed file). I was able to successfully run MEDIPS.methylProfiling > supplying the frame_size=500 argument to look for DMRs in 500-bp > windows. Now I'd like to supply my own regions of interest to look for > DMR around genes that are differentially expressed from microarray. > > I get the following segfault: > > ############### > > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, > ROI_file="ROI_file.txt", select=2) > Preprocessing... > Reading ROIs... > Extract data according to given ROI... > Differential methylation will be calculated on the ROI data set > Analysed 379 / 2893 > *** caught segfault *** > address 0x2b61ee9d5e98, cause 'memory not mapped' > > Traceback: > 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), > as.integer(chr_binposition), data1, data2, environment(wilcox.test), > wilcox.test, environment(var), var, environment(math), math, > t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) > 2: withCallingHandlers(expr, warning = function(w) > invokeRestart("muffleWarning")) > 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), > as.matrix(ROI2), as.integer(chr_binposition), data1, data2, > environment(wilcox.test), wilcox.test, environment(var), var, > environment(math), math, t.test, environment(t.test), > as.numeric(factor(chr_names(data1))))) > 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, > ROI_file = "ROI_file.txt", select = 2) > aborting ... > ############### > > I ran this on a machine with 128GB RAM, so I know that wasn't the > problem. It looks like the segfault was happening with line 379 in the > sample above. I went into the regions of interest (ROI) file > containing gene coordinates. Nothing looked weird about this line, but > I deleted it anyway. When re-running, I still get segfaults, just at > different positions. > > Thanks for any insight you might have. > Stephen > > > > sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.8.0 > [3] BSgenome_1.26.1 Biostrings_2.26.3 > [5] GenomicRanges_1.10.6 IRanges_1.16.4 > [7] BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] gtools_2.7.0 parallel_2.15.2 stats4_2.15.2 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
Microarray Preprocessing BSgenome BSgenome MEDIPS Microarray Preprocessing BSgenome MEDIPS • 1.3k views
ADD COMMENT
0
Entering edit mode
@stephen-turner-4916
Last seen 6.4 years ago
United States
Lukas, I checked and there are no positions where start>end position. I'm getting my regions of interest by using biomaRt to map Illumina WG6v2 identifiers to position and gene name, and using this as my ROI file. Here's how I'm getting the position information: ######################### library(biomaRt) genes <- read.table("genes.txt", T) # genes$illumina contains the wg6v2 id mart <- useMart("ensembl", dataset="mmusculus_gene_ensembl") listAttributes(mart) attributes <- c("illumina_mousewg_6_v2", "chromosome_name", "start_position","end_position", "ensembl_gene_id", "external_gene_id", "description") genes_annotated <- getBM(attributes=attributes, filters="illumina_mousewg_6_v2", values=genes$illumina, mart=mart, uniqueRows=T) roi_file <- subset(genes_annotated, chromosome_name!="Y", select=c(chromosome_name, start_position, end_position, external_gene_id)) roi_file$chromosome_name <- paste("chr", roi_file$chromosome_name, sep="") subset(roi_file, end_position<start_position) #returns="" nothing="" write.table(roi_file,="" file="ROI_file.txt" ,="" row="F," col="F," quote="F," sep="\t" )="" #########################="" you="" can="" get="" my="" roi="" file="" here:="" https:="" gist.github.com="" stephenturner="" 4997682="" the="" error="" it="" throws="" suggests="" region="" #379="" is="" the="" problem.="" that="" corresponds="" to="" chr1:135720061-135752232="" (ensmusg00000026421),="" which="" isn't="" outside="" the="" length="" of="" the="" chromosome.="" not="" sure="" what="" the="" problem="" is.="" here's="" the="" error="" below:="" #########################=""> dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, ROI_file="ROI_file.txt", select=2) Preprocessing... Reading ROIs... Extract data according to given ROI... Differential methylation will be calculated on the ROI data set Analysed 379 / 2893 *** caught segfault *** address 0x2ad79ec9ce98, cause 'memory not mapped' Traceback: 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), as.integer(chr_binposition), data1, data2, environment(wilcox.test), wilcox.test, environment(var), var, environment(math), math, t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) 2: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), as.integer(chr_binposition), data1, data2, environment(wilcox.test), wilcox.test, environment(var), var, environment(math), math, t.test, environment(t.test), as.numeric(factor(chr_names(data1))))) 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, ROI_file = "ROI_file.txt", select = 2) aborting ... ######################### Thanks for any help you can provide! Stephen On Tue, Feb 19, 2013 at 3:18 PM, Lukas Chavez <lukas.chavez.mailings at="" googlemail.com=""> wrote: > > Hi Stephen, > > please excuse my late respond. > > Indeed, the segfault occurs whenever the currently processed genomic region > is of negative length (i.e. the start coordinate is larger than the end > coordinate), the coordinates are outside of the length of the chromosome, or > the chromosome of the ROI is not represented by the regions used as input > for creating the MEDIPS SETs. Typically, the error can be avoided by > correcting the ROI file. Although you have already checked problematic > genomic regions, I would be more than happy, if you once more check these > regions (and also the immediate neighbors) with respect to the constraints I > just mentioned. > > Please let me know, if your ROI file appears to be fine. In this case I will > try to back-track the error message. However, please note that I have > extensively revised the MEDIPS package (also avoiding segfault errors) which > I intend to update as soon as possible, especially in advance of the next > Bioconductor release. I strongly recommend to switch to the new version as > soon as available. > > Thank you and all the best, > Lukas > > > > > > On Fri, Feb 15, 2013 at 9:23 AM, Stephen Turner <vustephen at="" gmail.com=""> wrote: >> >> Lukas, and others: >> >> I'm trying to use the MEDIPS package to look for differentially >> methylated regions, supplying a regions of interest file (essentially >> a bed file). I was able to successfully run MEDIPS.methylProfiling >> supplying the frame_size=500 argument to look for DMRs in 500-bp >> windows. Now I'd like to supply my own regions of interest to look for >> DMR around genes that are differentially expressed from microarray. >> >> I get the following segfault: >> >> ############### >> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, >> > ROI_file="ROI_file.txt", select=2) >> Preprocessing... >> Reading ROIs... >> Extract data according to given ROI... >> Differential methylation will be calculated on the ROI data set >> Analysed 379 / 2893 >> *** caught segfault *** >> address 0x2b61ee9d5e98, cause 'memory not mapped' >> >> Traceback: >> 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), >> as.integer(chr_binposition), data1, data2, environment(wilcox.test), >> wilcox.test, environment(var), var, environment(math), math, >> t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) >> 2: withCallingHandlers(expr, warning = function(w) >> invokeRestart("muffleWarning")) >> 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), >> as.matrix(ROI2), as.integer(chr_binposition), data1, data2, >> environment(wilcox.test), wilcox.test, environment(var), var, >> environment(math), math, t.test, environment(t.test), >> as.numeric(factor(chr_names(data1))))) >> 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, >> ROI_file = "ROI_file.txt", select = 2) >> aborting ... >> ############### >> >> I ran this on a machine with 128GB RAM, so I know that wasn't the >> problem. It looks like the segfault was happening with line 379 in the >> sample above. I went into the regions of interest (ROI) file >> containing gene coordinates. Nothing looked weird about this line, but >> I deleted it anyway. When re-running, I still get segfaults, just at >> different positions. >> >> Thanks for any insight you might have. >> Stephen >> >> >> > sessionInfo() >> R version 2.15.2 (2012-10-26) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.8.0 >> [3] BSgenome_1.26.1 Biostrings_2.26.3 >> [5] GenomicRanges_1.10.6 IRanges_1.16.4 >> [7] BiocGenerics_0.4.0 >> >> loaded via a namespace (and not attached): >> [1] gtools_2.7.0 parallel_2.15.2 stats4_2.15.2 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD COMMENT
0
Entering edit mode
Hi Stephen, thank you for the additional details about the segfault error. You have convinced me that your genomic regions appear to be fine. The only thing that came to my attention is that your IDs are not unique. The MEDIPS.methylProfiling() function is calling the C function roiprofile.c which has problems with non-unique IDs for the ROIs. It would be great, if you can once more try your analysis with unique IDs (e.g. just adding a counter to each ID)? One more thing: the majority of your regions of interest are very large (up to 2,960,899 nt). In my opinion, a single averaged methylation value for such a long region is difficult to interpret. MeDIP signals typically vary in a much smaller resolution along the chromosomes and a single methylation as well as CpG density value for a long stretch will be probably just an average of many local enriched regions and many non enriched region. However, as I do not know what exactly you are analyzing your approach might be reasonable and I hope we get the segfault fixed. As I mentioned previously, the soon to be released update does not depend on C functions anymore but makes extensive use of the GenomicRanges and other packages so that the current segfault error will be avoided anyway. Thank you, Lukas On Wed, Feb 20, 2013 at 10:24 AM, Stephen Turner <vustephen@gmail.com>wrote: > Lukas, > > I checked and there are no positions where start>end position. I'm > getting my regions of interest by using biomaRt to map Illumina WG6v2 > identifiers to position and gene name, and using this as my ROI file. > > Here's how I'm getting the position information: > > ######################### > library(biomaRt) > genes <- read.table("genes.txt", T) # genes$illumina contains the wg6v2 id > mart <- useMart("ensembl", dataset="mmusculus_gene_ensembl") > listAttributes(mart) > attributes <- c("illumina_mousewg_6_v2", "chromosome_name", > "start_position","end_position", "ensembl_gene_id", > "external_gene_id", "description") > genes_annotated <- getBM(attributes=attributes, > filters="illumina_mousewg_6_v2", values=genes$illumina, mart=mart, > uniqueRows=T) > roi_file <- subset(genes_annotated, chromosome_name!="Y", > select=c(chromosome_name, start_position, end_position, > external_gene_id)) > roi_file$chromosome_name <- paste("chr", roi_file$chromosome_name, sep="") > subset(roi_file, end_position<start_position) #returns="" nothing=""> write.table(roi_file, file="ROI_file.txt", row=F, col=F, quote=F, sep="\t") > ######################### > > You can get my ROI file here: > > > > The error it throws suggests region #379 is the problem. That > corresponds to chr1:135720061-135752232 (ENSMUSG00000026421), which > isn't outside the length of the chromosome. Not sure what the problem > is. Here's the error below: > > ######################### > > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, > ROI_file="ROI_file.txt", select=2) > Preprocessing... > Reading ROIs... > Extract data according to given ROI... > Differential methylation will be calculated on the ROI data set > Analysed 379 / 2893 > *** caught segfault *** > address 0x2ad79ec9ce98, cause 'memory not mapped' > > Traceback: > 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), > as.integer(chr_binposition), data1, data2, environment(wilcox.test), > wilcox.test, environment(var), var, environment(math), math, > t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) > 2: withCallingHandlers(expr, warning = function(w) > invokeRestart("muffleWarning")) > 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), > as.matrix(ROI2), as.integer(chr_binposition), data1, data2, > environment(wilcox.test), wilcox.test, environment(var), var, > environment(math), math, t.test, environment(t.test), > as.numeric(factor(chr_names(data1))))) > 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, > ROI_file = "ROI_file.txt", select = 2) > aborting ... > ######################### > > Thanks for any help you can provide! > > Stephen > > On Tue, Feb 19, 2013 at 3:18 PM, Lukas Chavez > <lukas.chavez.mailings@googlemail.com> wrote: > > > > Hi Stephen, > > > > please excuse my late respond. > > > > Indeed, the segfault occurs whenever the currently processed genomic > region > > is of negative length (i.e. the start coordinate is larger than the end > > coordinate), the coordinates are outside of the length of the > chromosome, or > > the chromosome of the ROI is not represented by the regions used as input > > for creating the MEDIPS SETs. Typically, the error can be avoided by > > correcting the ROI file. Although you have already checked problematic > > genomic regions, I would be more than happy, if you once more check these > > regions (and also the immediate neighbors) with respect to the > constraints I > > just mentioned. > > > > Please let me know, if your ROI file appears to be fine. In this case I > will > > try to back-track the error message. However, please note that I have > > extensively revised the MEDIPS package (also avoiding segfault errors) > which > > I intend to update as soon as possible, especially in advance of the next > > Bioconductor release. I strongly recommend to switch to the new version > as > > soon as available. > > > > Thank you and all the best, > > Lukas > > > > > > > > > > > > On Fri, Feb 15, 2013 at 9:23 AM, Stephen Turner <vustephen@gmail.com> > wrote: > >> > >> Lukas, and others: > >> > >> I'm trying to use the MEDIPS package to look for differentially > >> methylated regions, supplying a regions of interest file (essentially > >> a bed file). I was able to successfully run MEDIPS.methylProfiling > >> supplying the frame_size=500 argument to look for DMRs in 500-bp > >> windows. Now I'd like to supply my own regions of interest to look for > >> DMR around genes that are differentially expressed from microarray. > >> > >> I get the following segfault: > >> > >> ############### > >> > dmrGenes <- MEDIPS.methylProfiling(data1=CONTROL.SET, data2=BPA.SET, > >> > ROI_file="ROI_file.txt", select=2) > >> Preprocessing... > >> Reading ROIs... > >> Extract data according to given ROI... > >> Differential methylation will be calculated on the ROI data set > >> Analysed 379 / 2893 > >> *** caught segfault *** > >> address 0x2b61ee9d5e98, cause 'memory not mapped' > >> > >> Traceback: > >> 1: .Call("roiprofile", input, as.numeric(select), as.matrix(ROI2), > >> as.integer(chr_binposition), data1, data2, environment(wilcox.test), > >> wilcox.test, environment(var), var, environment(math), math, > >> t.test, environment(t.test), as.numeric(factor(chr_names(data1)))) > >> 2: withCallingHandlers(expr, warning = function(w) > >> invokeRestart("muffleWarning")) > >> 3: suppressWarnings(.Call("roiprofile", input, as.numeric(select), > >> as.matrix(ROI2), as.integer(chr_binposition), data1, data2, > >> environment(wilcox.test), wilcox.test, environment(var), var, > >> environment(math), math, t.test, environment(t.test), > >> as.numeric(factor(chr_names(data1))))) > >> 4: MEDIPS.methylProfiling(data1 = CONTROL.SET, data2 = BPA.SET, > >> ROI_file = "ROI_file.txt", select = 2) > >> aborting ... > >> ############### > >> > >> I ran this on a machine with 128GB RAM, so I know that wasn't the > >> problem. It looks like the segfault was happening with line 379 in the > >> sample above. I went into the regions of interest (ROI) file > >> containing gene coordinates. Nothing looked weird about this line, but > >> I deleted it anyway. When re-running, I still get segfaults, just at > >> different positions. > >> > >> Thanks for any insight you might have. > >> Stephen > >> > >> > >> > sessionInfo() > >> R version 2.15.2 (2012-10-26) > >> Platform: x86_64-unknown-linux-gnu (64-bit) > >> > >> locale: > >> [1] C > >> > >> attached base packages: > >> [1] stats graphics grDevices utils datasets methods base > >> > >> other attached packages: > >> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.8.0 > >> [3] BSgenome_1.26.1 Biostrings_2.26.3 > >> [5] GenomicRanges_1.10.6 IRanges_1.16.4 > >> [7] BiocGenerics_0.4.0 > >> > >> loaded via a namespace (and not attached): > >> [1] gtools_2.7.0 parallel_2.15.2 stats4_2.15.2 > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 629 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6