Hi, I'm fairly new to R and bioinformatics and I'm working on a project with DESeq2 in RStudio. I've been trying to run the summarizeOverlaps() function on 12 BAM files. I let it run over the weekend and the process still hasn't finished. I don't think it's frozen, but I'm not sure if I should restart it. Does anyone know of a faster way, or a workaround for summarizeOverlaps()?
You will have to give more information than that. How big are the BAM files? What are you summarizing overlaps against? What is the output of sessionInfo()? How much RAM does your computer have?
As Jim says, we need more information in order to help you. Please show the code of how you are calling summarizeOverlaps(). There are several options for managing memory when counting BAM files: 'yieldSize' in BamFile(), ScanBamParam() for reading in subsets, etc. You may want to look at the "Counting reads with summarizeOverlaps" vignette, specifically section 2.
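For instance, a minimal sketch of those two options; the file name "sample1.bam" and the chr1 region are placeholders, not anything from your data:

library(GenomicAlignments)   # also attaches Rsamtools, GenomicRanges and IRanges

## read a BAM in chunks of 100,000 records rather than all at once
bf <- BamFile("sample1.bam", yieldSize=100000)

## or restrict the records read in to a region of interest
param <- ScanBamParam(which=GRanges("chr1", IRanges(1, 1e6)))
## se <- summarizeOverlaps(features, bf, param=param)   # 'features' is your GRangesList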
Sorry, I'm using R 3.2.1 (via RStudio) and I'm trying to run 12 BAM files that are each about 100,000 kb in size against a UCSC TxDb GRangesList. The computer I'm using is a Windows 8 machine with 16 GB of RAM and a 3.4 GHz processor. When I terminate the summarizeOverlaps() function I get this warning message:
"Warning message: running command 'env MASTER=localhost PORT=11880 OUT=/dev/null RPROG=C:/PROGRA~1/R/R-32~1.1/bin/R R_LIBS= C:/Users/lchen/Documents/R/win-library/3.2/BiocParallel/RSOCKnode.sh' had status 127"
I wasn't sure what status 127 meant, but I checked the folder and found that the RSOCKnode.sh file was still there, so I'm not sure why I keep getting this error.
Please show your code so we can see how you are calling the function.
Provide the output of sessionInfo() so we can see the versions of R and packages you are using.
Did you read the Counting reads with summarizeOverlaps vignette, section 2? The last paragraph starting with "By default ..." talks about iterating through files in chunks and controlling the number of cores used.
Start with one file. When you have success with that move on to processing more. summarizeOverlaps() iterates through files in chunks defined by 'yieldSize' in a BamFile object and processes files in parallel using bplapply().
Thanks for posting your code and sessionInfo(). It looks like you're on the right track. Again, I think it's wise to start with a single file, make sure your setup is correct, then move to multiple files from there.
You may have done this already, but it's good to start with a man page example to make sure all is working as expected. This code counts a single BAM file with a 'yieldSize' and should work for you. If it doesn't, then we have a different problem.
res <- summarizeOverlaps(exons, bf, singleEnd=FALSE, fragments=TRUE)
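Spelled out in full, a sketch along those lines might look like the following. The TxDb.Hsapiens.UCSC.hg19.knownGene annotation, the gene-level grouping, and the "sample1.bam" path are stand-ins for your own TxDb and files, and singleEnd=FALSE / fragments=TRUE assume paired-end data:

library(GenomicAlignments)
library(BiocParallel)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)   # placeholder annotation; use your own TxDb

## features: exons grouped by gene, as a GRangesList
exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene")

## one BAM file, read in chunks of 100,000 records
stopifnot(file.exists("sample1.bam"))
bf <- BamFile("sample1.bam", yieldSize=100000)

## count paired-end fragments; SerialParam() avoids parallel workers entirely
res <- summarizeOverlaps(exons, bf, singleEnd=FALSE, fragments=TRUE,
                         BPPARAM=SerialParam())
head(assay(res))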
Thanks for posting the full error message. That is helpful.
It's possible the port being used to start the parallel workers is blocked. Do you have a system administrator you can ask about blocked ports? They should be able to tell you which are not blocked. Here is a similar post, derfinder: analyzeChr() freezes in f-stat calculation, with more discussion about this problem.
Once you have a non-blocked port number, you can use it to set the R environment variable R_PARALLEL_PORT, e.g., starting R with something like R_PARALLEL_PORT=12345 R --vanilla. Also see ?Sys.setenv for setting environment variables from within R.
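From inside an existing R session, the same idea looks something like this; 12345 is only an example port, substitute one you know is open:

## choose an open port before any parallel workers are created
Sys.setenv(R_PARALLEL_PORT = 12345)
Sys.getenv("R_PARALLEL_PORT")   # confirm it took effect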
A little more troubleshooting:
SerialParam (does not use workers/ports) should work:
library(BiocParallel)
bplapply(1:2, function(i) { Sys.sleep(1); i }, BPPARAM=SerialParam())
SnowParam will likely fail:
bplapply(1:2, function(i) { Sys.sleep(1); i }, BPPARAM=SnowParam(2))
And when I run SnowParam, I get the same error message:
> bplapply(1:2, function(i) { Sys.sleep(1); i }, BPPARAM=SnowParam(2))
Warning message:
running command 'env MASTER=localhost PORT=11444 OUT=/dev/null RPROG=C:/PROGRA~1/R/R-32~1.1/bin/R R_LIBS= C:/Users/lchen/Documents/R/win-library/3.2/BiocParallel/RSOCKnode.sh' had status 127
I'm an administrator on this computer; will I be able to check for blocked ports? Or will SerialParam work to run summarizeOverlaps? I added SerialParam to the code but it was not working.
As for finding unblocked ports on Windows, I'm not an expert. It looks like you can run netstat -a on the command line and look for LISTENING processes, or use the Task Manager - more details here.
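If you prefer to stay inside R, one rough way to do the same thing is to shell out to netstat (Windows syntax; the output format can vary by system):

## list connections and keep only ports already in a LISTENING state
out <- system2("netstat", c("-a", "-n"), stdout = TRUE)
head(grep("LISTENING", out, value = TRUE))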
Then I replaced the example data with my own BAM files and database, and got a different error:
bf <- BamFile("list", yieldSize=100000)
> se <- summarizeOverlaps(exons, bf, BPPARAM = SerialParam())
Error in .dispatchBamFiles(features, BamFileList(reads), mode, match.arg(algorithm), :
file(s): list do not exist
As for the unblocked ports, I'll check out that link to find out more about that and see if I can find a way to unblock them!
Also, when I run summarizeOverlaps() I get these warning messages:
> summarizeOverlaps()
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘summarizeOverlaps’ for signature ‘"missing", "missing"’
In addition: Warning messages:
1: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
cannot open compressed file 'C:/Users/lchen/Documents/R/win-library/3.2/lattice/DESCRIPTION', probable reason 'No such file or directory'
2: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
cannot open compressed file 'C:/Users/lchen/Documents/R/win-library/3.2/survival/DESCRIPTION', probable reason 'No such file or directory'
3: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
cannot open compressed file 'C:/Users/lchen/Documents/R/win-library/3.2/foreign/DESCRIPTION', probable reason 'No such file or directory'
4: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
cannot open compressed file 'C:/Users/lchen/Documents/R/win-library/3.2/cluster/DESCRIPTION', probable reason 'No such file or directory'
5: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
cannot open compressed file 'C:/Users/lchen/Documents/R/win-library/3.2/MASS/DESCRIPTION', probable reason 'No such file or directory'
6: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
cannot open compressed file 'C:/Users/lchen/Documents/R/win-library/3.2/rpart/DESCRIPTION', probable reason 'No such file or directory'
7: In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
cannot open compressed file 'C:/Users/lchen/Documents/R/win-library/3.2/nnet/DESCRIPTION', probable reason 'No such file or directory'
Could not being able to open those files be part of the problem? I've never seen those warning messages before.
You now have a small example that works so you know summarizeOverlaps() runs on your system.
Next you need to check your input arguments to summarizeOverlaps() and see why they aren't working. Start with one of your files (don't start with 12). Make sure the single file is found; use file.exists() as I showed earlier, and read the man page ?file.exists if you aren't sure how to use it. Check the GRangesList you're using as 'features': does everything look as expected? You also need to check that the chromosome names in the BAM files match those in the GRangesList. Use scanBamHeader() on one of the files to see the chromosome names. If you need to rename chromosomes in the TxDb to match the BAM, see ?renameSeqlevels; there are several functions on that page that can help.
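Putting those checks together, a rough sketch; "sample1.bam", the 'exons' GRangesList from the earlier example, and the "1"="chr1" mapping are placeholders for your own file, features, and whatever renaming your data actually needs:

library(GenomicAlignments)

## 1. is the BAM file actually where you think it is?
file.exists("sample1.bam")

## 2. chromosome names used in the BAM header
names(scanBamHeader("sample1.bam")[[1]]$targets)

## 3. chromosome names used in the annotation
seqlevels(exons)

## 4. if the two disagree (e.g. "1" vs "chr1"), rename one side to match
## exons <- renameSeqlevels(exons, c("1"="chr1"))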
That last error you see is telling you there is no method for summarizeOverlaps with missing 'features' and missing 'reads'. Makes sense. That means it can't find whatever you gave it as 'features' and 'reads'.
showMethods() shows the available methods for a function.
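For example, with GenomicAlignments loaded:

showMethods("summarizeOverlaps")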
I played around with it this morning and it worked! I ended up checking my file names, restarting R, and using SerialParam instead of SnowParam. Thanks for all your help :)
I've found that featureCounts (http://bioinf.wehi.edu.au/featureCounts/) outside of R is great and super fast for counting reads. Then I just import the counts into R.
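For what it's worth, a sketch of pulling a featureCounts table back into R; the "counts.txt" name and the assumption that the default output layout was used (Geneid plus five annotation columns, then one column per BAM) are mine:

## featureCounts writes a tab-delimited file with a leading '#' comment line
fc <- read.delim("counts.txt", comment.char="#", row.names="Geneid")
## drop the Chr/Start/End/Strand/Length columns, keep the per-sample counts
counts <- as.matrix(fc[ , -(1:5)])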