Leo -- thanks for your recent help with fullCoverage() in derfinder. After your updates, that function began working very well for me. However, I have discovered that analyzeChr() is still having issues with chromosomes > 22. When I give it a chromosome in the human range, it does fine, but once we get up into higher numbers, it breaks.
R 3.1.1
Bioconductor version 3.0 (BiocInstaller 1.16.0)
derfinder 1.0.4
GenomeInfoDb 1.2.2 (version including the Canis familiaris genome update)
My script:
library(derfinder) library(GenomeInfoDb) library(GenomicRanges) args <- commandArgs(trailingOnly = TRUE) perms = 200 # for alpha = 0.05 cores = 15 # too many -> memory problems #perms = 1 #cores = 25 chroms = c(args[1]) files <- rawFiles(datadir="tophat", samplepatt="tophat-juncs-*") # Set up our groups tameIds <- read.table("/home/hekman2/Dropbox/UIUC/transcriptome/hypothalamus/notes/samples-hyp-tame.txt") tameFileNames <- lapply(tameIds, function(id) { return(paste("tophat-juncs-", id, sep="")) } ) groups <- as.factor(sapply(names(files), function(filename) { ifelse(grepl(filename, tameFileNames), "tame", "aggr") })) # Initial data analysis over all chromosomes: print("Starting fullCoverage...") fullCov <- fullCoverage(files, chrs = chroms, species = 'canis_familiaris') sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1) models <- makeModels(sampleDepths, testvars = groups) # Keep only bases with coverage > 2 print("Filtering...") filteredCov <- lapply(fullCov, filterData, cutoff = 2) originalWd <- getwd() setwd(file.path(originalWd, 'derResults')) # Analyze each chromosome individually (analysis will be parallelized # and significance will be evaluated by permutations). for (chrom in chroms) { print(chrom) currentChrom <- paste("chr", chrom, sep="") analyzeChr(chr = currentChrom, filteredCov[[currentChrom]], models, mc.cores=cores, groupInfo = groups, writeOutput = TRUE, cutoffFstat = 0.05, nPermute = perms, seeds = 19731107 + seq_len(perms)) } print("Chromosome analysis completed.")
The error on chromosomes > 22:
2014-11-02 12:44:36 analyzeChr: Annotating regions Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) : subscript contains NAs or out-of-bounds indices Calls: analyzeChr ... extractROWS -> normalizeSingleBracketSubscript -> NSBS -> NSBS In addition: Warning message: In .Seqinfo.mergexy(x, y) : The 2 combined objects have no sequence levels in common. (Use suppressWarnings() to suppress this warning.) Execution halted
I also noticed this output message during execution of analyzeChr():
nearestgene: loading bumphunter hg19 transcript database
I'm guessing we should not be using hg19. In fact, this makes me wonder if the other low-numbered chromosomes which completed correctly have valid results or if they need to be re-run against a different transcript database.
Any help you can give is much appreciated. I suspect that the only solution will involve you editing the code again, for which I'm sorry if it's true!
Jessica
You might also want to check how to create a `TxDB.*` package (or at least the object) for Canis familiaris as I don't see one at http://www.bioconductor.org/packages/release/BiocViews.html#___AnnotationData Check more details at http://www.bioconductor.org/packages/release/bioc/vignettes/GenomicFeatures/inst/doc/GenomicFeatures.pdf
Sure. How would I use such an object (I'm not sure where it fits in to my workflow)?
I think that bumphunter::annotateNearest() could in principle use `TxDb` objects. If that's true, then you'll need one for Canis familiaris.
But well, I don't know if my 1st sentence is true