I am attempting to run the latest version of MEDIPS. I have been following the example in this tutorial to set up my R script: http://www.bioconductor.org/packages/2.12/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf
Everything runs without any errors, however, the output doesn't look right to me. For example, here is the first few lines from MEDIPS.meth
chr start stop CF X2668.SMC.0001.sort.bam.counts 1 chr1 1 100 0 4 2 chr1 101 200 0 6 3 chr1 201 300 2 7 4 chr1 301 400 2 6 5 chr1 401 500 0 6 6 chr1 501 600 2 4 7 chr1 601 700 0 2 8 chr1 701 800 0 2 9 chr1 801 900 0 0
I think there are supposed to be more columns (one for each sample). Set 1 had 6 samples, and set 2 had 11 samples. Additionally, the first sample did not have an "X" at the begining of its name (the rest of the name is correct). Below is the code that I used to generate this data. Note that I am calling this script via Rscript from a Linux bash script. All file paths and PATH issues in .bashrc have been addressed.
#Get Arguments args <- commandArgs(TRUE) filepath1 <- args[1] filepath2 <- args[2] chrm <- args[3] input1 <- args[4] input2 <- args[5] #Load MEDIPS libraries and set environment variables library(MEDIPS) library(BSgenome.Rnorvegicus.UCSC.rn5) BSgenome="BSgenome.Rnorvegicus.UCSC.rn5" uniq=TRUE extend=300 shift=0 ws=100 chr.select <- chrm #Get vectors with lists of files and file names files1=list.files(path = filepath1, pattern = "\\.bam$", all.files = FALSE, full.names = TRUE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. =FALSE) names1=list.files(path = filepath1, pattern = "\\.bam$", all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE) files2=list.files(path = filepath2, pattern = "\\.bam$", all.files = FALSE, full.names = TRUE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. =FALSE) names2=list.files(path = filepath2, pattern = "\\.bam$", all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE) #Get length of file list vectors len1 <- length(files1) len2 <- length(files2) #Set max lines for printing options(max.print=1000000000) #Loop through first set of files to create data set x <- 2:len1 set1 = MEDIPS.createSet(file = files1[1], BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select) #print(files1[1]) for (i in seq(along=x)) { y <- i + 1 set1 = c(set1, MEDIPS.createSet(file = files1[y], BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select)) #print(files1[y]) } set1 #Loop through second set of files to create data set x <- 2:len2 #print(files2[1]) set2 = MEDIPS.createSet(file = files2[1], BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select) for (i in seq(along=x)) { y <- i + 1 set2 = c(set2, MEDIPS.createSet(file = files2[y], BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select)) #print(files2[y]) } set2 #Generate input file sets input_set1 = MEDIPS.createSet(file = input1, BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select) input_set2 = MEDIPS.createSet(file = input2, BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select) #Generate coupling set CS = MEDIPS.couplingVector(pattern = "CG", refObj = set1[[1]]) #Calculate differential coverage mr.edgeR = MEDIPS.meth(MSet1 = set1, MSet2 = set2, CSet = CS, ISet1 = input_set1, ISet2 = input_set2, p.adj = "bonferroni", diff.method = "edgeR", prob.method = "poisson", MeDIP = T, CNV = F, type = "rpkm", minRowSum = 1) sink(paste0(chr.select,"_mr.edgeR.txt"), append=FALSE, split=FALSE) mr.edgeR sink()
Everything appeared to finish with no errors. Here are the contents of the various data structures:
files1
names1
files2
names2
set1
set2
I tried running again, changing two of the files in each group to match your specifications (- to _, no #s starting the file name), and got the same results.
Next I tried running against a smaller data set (3 files in each set) where all of the files have been renamed (- to _, no #s starting the file name). I'm still getting the same problem.
Here's the first 10 lines of the output file:
And here are the data structures again:
files1
names1
files2
names2
set1
set2
write.table worked!
The output looks right now. Thanks for your help. I'll post a new topic if something else looks off.