IRanges error from derfinder analyzeChr with custom TxDb?
1
0
Entering edit mode
fruce_ki ▴ 20
@fruce_ki-13220
Last seen 5.5 years ago
Austria/Vienna/Research Institute for M…

My ultimate goal is to be able to run derfinder on a custom GTF. But to get a hang of how to do it, I am first trying to succeed at it using a completely stock Ensembl GTF file (so no biomaRt suggestions please).


First I create a custom TxDb:

txdb <- makeTxDbFromGFF(file='~/devtestdata/Arabidopsis_thaliana.TAIR10.36.gtf',
                        organism='Arabidopsis thaliana', 
                        chrominfo=Seqinfo(seqnames=c('1','2','3','4','5','Mt','Pt'),          seqlengths=c(30427671,19698289,23459830,18585056,26975502,366924,154478), isCircular=c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE), genome='TAIR10') )

No errors reported up to here.
Then (following coverage loading from 4 BAM files using loadCoverage, setting up a dataframe with two conditions, calculating the sample depths and  building the models) I call derfinder:

analyzeChr(chr='4', 
           coverageInfo=chr4Cov, 
           txdb=txdb, 
           models=fwdModels, 
           groupInfo=covariates$condition, 
           returnOutput=TRUE, 
           chrsStyle=NULL, 
           cutoffPre=3, 
           cutoffFstat=5e-02, 
           nPermute=20, 
           seeds=NULL, 
           writeOutput=FALSE, 
           lowMemDir=NULL, 
           smooth=FALSE, 
           weights=NULL, 
           smoothFunction=bumphunter::locfitByCluster )

And I get this error:

2017-08-18 15:02:32 analyzeChr: Pre-processing the coverage data
2017-08-18 15:02:35 analyzeChr: Calculating statistics
2017-08-18 15:02:35 calculateStats: calculating the F-statistics
2017-08-18 15:02:46 analyzeChr: Calculating pvalues
2017-08-18 15:02:46 analyzeChr: Using the following theoretical cutoff for the F-statistics 161.447638797588
2017-08-18 15:02:46 calculatePvalues: identifying data segments
Error in attr(x, "tsp") <- c(1, NROW(x), 1) : 
  attempt to set an attribute on NULL​

This is the stack trace:

9: hasTsp(x)
8: start.default(position)
7: start(position)
6: start(position)
5: is(start, "Ranges")
4: IRanges(start = start(position)[runValue(position)], end = end(position)[runValue(position)])
3: .clusterMakerRle(position = position, maxGap = .advanced_argument("maxRegionGap", 
       0L, ...), ranges = TRUE)
2: calculatePvalues(coveragePrep = prep, models = models, fstats = fstats, 
       nPermute = nPermute, seeds = seeds, chr = chr, cutoff = cutoff, 
       lowMemDir = lowMemDir, smooth = smooth, smoothFunction = smoothFunction, 
       weights = weights, ...)
1: analyzeChr(chr = "4", coverageInfo = fwdCov[["4"]], txdb = txdb, 
       models = fwdModels, groupInfo = experiment$covariates$condition[fwd], 
       returnOutput = TRUE, chrsStyle = NULL, cutoffPre = 3, cutoffFstat = 0.05, 
       nPermute = 20, seeds = NULL, writeOutput = FALSE, lowMemDir = NULL, 
       smooth = FALSE, weights = NULL, smoothFunction = bumphunter::locfitByCluster)​

And this is my session:

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ensembldb_2.0.4                         AnnotationFilter_1.0.0                  biomaRt_2.32.1                          TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [5] BiocInstaller_1.26.0                    Rsamtools_1.28.0                        Biostrings_2.44.2                       XVector_0.16.0                         
 [9] GenomicFeatures_1.28.4                  AnnotationDbi_1.38.2                    treat_0.0.1                             DESeq2_1.16.1                          
[13] SummarizedExperiment_1.6.3              DelayedArray_0.2.7                      matrixStats_0.52.2                      Biobase_2.36.2                         
[17] GenomicRanges_1.28.4                    GenomeInfoDb_1.12.2                     IRanges_2.10.2                          S4Vectors_0.14.3                       
[21] BiocGenerics_0.22.0                     data.table_1.10.4                       derfinder_1.10.5                       

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.8.0            bitops_1.0-6                  bit64_0.9-7                   httr_1.3.0                    RColorBrewer_1.1-2           
 [6] GenomicFiles_1.12.0           tools_3.4.1                   backports_1.1.0               doRNG_1.6.6                   R6_2.2.2                     
[11] rpart_4.1-11                  Hmisc_4.0-3                   DBI_0.7                       lazyeval_0.2.0                colorspace_1.3-2             
[16] nnet_7.3-12                   gridExtra_2.2.1               curl_2.8.1                    bit_1.1-12                    compiler_3.4.1               
[21] htmlTable_1.9                 pkgmaker_0.22                 rtracklayer_1.36.4            scales_0.4.1                  checkmate_1.8.3              
[26] genefilter_1.58.1             stringr_1.2.0                 digest_0.6.12                 foreign_0.8-69                base64enc_0.1-3              
[31] pkgconfig_2.0.1               htmltools_0.3.6               BSgenome_1.44.0               htmlwidgets_0.9               rlang_0.1.2                  
[36] RSQLite_2.0                   shiny_1.0.4                   BiocParallel_1.10.1           acepack_1.4.1                 VariantAnnotation_1.22.3     
[41] RCurl_1.95-4.8                magrittr_1.5                  GenomeInfoDbData_0.99.0       Formula_1.2-2                 Matrix_1.2-11                
[46] Rcpp_0.12.12                  munsell_0.4.3                 yaml_2.1.14                   stringi_1.1.5                 zlibbioc_1.22.0              
[51] plyr_1.8.4                    qvalue_2.8.0                  bumphunter_1.16.0             AnnotationHub_2.8.2           grid_3.4.1                   
[56] blob_1.1.0                    crayon_1.3.2                  lattice_0.20-35               splines_3.4.1                 annotate_1.54.0              
[61] derfinderHelper_1.10.0        locfit_1.5-9.1                knitr_1.17                    rngtools_1.2.4                geneplotter_1.54.0           
[66] reshape2_1.4.2                codetools_0.2-15              XML_3.98-1.9                  latticeExtra_0.6-28           httpuv_1.3.5                 
[71] foreach_1.4.3                 testthat_1.0.2                gtable_0.2.0                  ggplot2_2.2.1                 mime_0.5                     
[76] xtable_1.8-2                  survival_2.41-3               tibble_1.3.3                  iterators_1.0.8               GenomicAlignments_1.12.2     
[81] registry_0.3                  memoise_1.1.0                 cluster_2.0.6                 interactiveDisplayBase_1.14.0

I am assuming I'm not setting up the TxDb properly but it is not obvious to me what is going on.
(The BAM files are aligned to the TAIR10 assembly, so there should be no mismatch from that)
Any help much appreciated!
 

derfinder genomicfeatures genomicranges • 2.8k views
ADD COMMENT
0
Entering edit mode

Hi,

The TxDb is not an issue here since the TxDb is used later by analyzeChr().

It looks like the F-stat cutoff might be too high, leaving no regions. But that use case should work. Here I run the example code with a very high F-stat cutoff:

```

> ## Collapse the coverage information
> collapsedFull <- collapseFullCoverage(list(genomeData$coverage), 
+     verbose = TRUE)
2017-08-18 12:11:42 collapseFullCoverage: Sorting fullCov
2017-08-18 12:11:42 collapseFullCoverage: Collapsing chromosomes information by sample
> 
> ## Calculate library size adjustments
> sampleDepths <- sampleDepth(collapsedFull, probs = c(0.5), nonzero=TRUE, 
+     verbose=TRUE)
2017-08-18 12:11:42 sampleDepth: Calculating sample quantiles
2017-08-18 12:11:42 sampleDepth: Calculating sample adjustments
> 
> ## Build the models
> groupInfo <- genomeInfo$pop
> adjustvars <- data.frame(genomeInfo$gender)
> models <- makeModels(sampleDepths, testvars=groupInfo, adjustvars=adjustvars)
> 
> ## Analyze the chromosome
> results <- analyzeChr(chr='21', coverageInfo=genomeData, models=models, 
+     cutoffFstat=1e10, cutoffType='manual', groupInfo=groupInfo, mc.cores=1, 
+     writeOutput=FALSE, returnOutput=TRUE, runAnnotation = FALSE)
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
2017-08-18 12:11:43 analyzeChr: Pre-processing the coverage data
2017-08-18 12:11:43 analyzeChr: Calculating statistics
2017-08-18 12:11:43 calculateStats: calculating the F-statistics
2017-08-18 12:11:43 analyzeChr: Calculating pvalues
2017-08-18 12:11:43 analyzeChr: Using the following manual cutoff for the F-statistics 1e+10
2017-08-18 12:11:43 calculatePvalues: identifying data segments
2017-08-18 12:11:43 findRegions: segmenting information
2017-08-18 12:11:43 findRegions: found no segments to work with!!
2017-08-18 12:11:43 analyzeChr: Annotating regions
> results$regions
$regions
NULL

$nullStats
NULL

$nullWidths
NULL

$nullPermutation
NULL

> packageVersion('derfinder')
[1] ‘1.11.6’

```

I wonder how the output of the following function call looks like:

```

prep <- preprocessCoverage(coverageInfo = chr4Cov,  groupInfo = covariates$condition, cutoff = 3, lowMemDir = NULL)

```

Best,

Leonardo

ADD REPLY
0
Entering edit mode

Hi Leonardo,

preprocessCoverage() seems to be working fine:

 

$coverageProcessed
DataFrame with 18585056 rows and 4 columns
         col0_Sample_01-Col-0_fwd.bam col0_Sample_02-Col-0_fwd.bam fpa8_Sample_19-fpa-8_fwd.bam fpa8_Sample_20-fpa-8_fwd.bam
                                <Rle>                        <Rle>                        <Rle>                        <Rle>
1                                   5                            5                            5                            5
2                                   5                            5                            5                            5
3                                   5                            5                            5                            5
4                                   5                            5                            5                            5
5                                   5                            5                            5                            5
...                               ...                          ...                          ...                          ...
18585052                            5                            5                            5                            5
18585053                            5                            5                            5                            5
18585054                            5                            5                            5                            5
18585055                            5                            5                            5                            5
18585056                            5                            5                            5                            5

$mclapplyIndex
$mclapplyIndex[[1]]
logical-Rle of length 18585056 with 2 runs
  Lengths:  5000000 13585056
  Values :     TRUE    FALSE

$mclapplyIndex[[2]]
logical-Rle of length 18585056 with 3 runs
  Lengths: 5000000 5000000 8585056
  Values :   FALSE    TRUE   FALSE

$mclapplyIndex[[3]]
logical-Rle of length 18585056 with 3 runs
  Lengths: 10000000  5000000  3585056
  Values :    FALSE     TRUE    FALSE

$mclapplyIndex[[4]]
logical-Rle of length 18585056 with 2 runs
  Lengths: 15000000  3585056
  Values :    FALSE     TRUE

$position
NULL

$meanCoverage
numeric-Rle of length 18585056 with 2343537 runs
  Lengths:   1419      9   2356     13    293      7     86      7     26     47      1     16      8     11      9      8     15      2      7     17      9     11     22 ...      2      1      1      1      7     25     67      1      4   1860    109    606    118    197     56     25    125     95      7     85     63   1008
  Values :      0   0.25      0   0.25      0   0.25      0   0.25      1   1.25   3.75   3.25    3.5   3.75      4   4.25   6.75   7.75    7.5   7.25   7.75    7.5      8 ...      4   3.75      3    2.5   2.25   4.75      4   3.75    2.5      0      5      0      1      0   0.25      0    0.5      0   0.25    0.5   0.25      0

$groupMeans
$groupMeans$col0
numeric-Rle of length 18585056 with 1326213 runs
  Lengths:  4190    73     1    52    15    75     7    44     9    15    64    11  1354    49    98    52  2178    11  1411    51    87   134    12    51   592     9 ...     3     1     2     8     2     1     1     1     1     1     1     1     1     1    40     1     3     1     8    25    67     1     4  1860   109  2385
  Values :     0     1     6     5    10    12    14     9    14     9     7     2     0     5    10     5     0   0.5     0     2     4     0     4     2     0   0.5 ...  91.5    92    86  86.5  80.5    64    58  53.5    44  43.5  28.5    20    18  18.5    19   9.5   6.5     5   4.5   9.5     8   7.5     5     0    10     0

$groupMeans$fpa8
numeric-Rle of length 18585056 with 2163864 runs
  Lengths:  1419     9  2356    13   293     7    86     7    26    64     8    11     9    25     7    17     9    11    22     7     7    17     8    11     8    30 ...     1     1     1     2     1     2     3     6     6    14    12     1     2     2     1  2679   118   197    56    25   125    95     7    85    63  1008
  Values :     0   0.5     0   0.5     0   0.5     0   0.5     1   1.5     2   2.5     3   3.5     3   2.5   3.5     3     4     5   5.5   9.5     9   8.5     8   7.5 ...    15    13  12.5   7.5     6     4   4.5     5   4.5   3.5   2.5     2   1.5     1   0.5     0     2     0   0.5     0     1     0   0.5     1   0.5     0
ADD REPLY
1
Entering edit mode

How did you create 

chr4Cov

and can you show me the head of it?

Thanks. It looks like it has to do with position = NULL. 

ADD REPLY
0
Entering edit mode

BINGO!
I was passing cutoff=NULL to loadCoverage() in an attempt to disable filtering at that stage (since analyzeChr() also has a cutoffPre) while keeping the output structure in the format recognised by analyzeChr(). Since the function did not complain, I assumed it was a valid value. I now tried passing cutoff=1 instead, which is probably more sensible overall, and analyzeChr() does not crash.

What is the difference between cutoff in filterData() and cutoffPre in analyzeChr()? Why are two cutoffs needed?

ADD REPLY
0
Entering edit mode

Although I'm using the latest R and Bioconductor available for MacOSX, I noticed your version of derfinder is newer. Is there any chance that you recently fixed something relevant to this?

ADD REPLY
0
Entering edit mode

I'm using Bioc-devel but your version of derfinder and mine are the same in terms of these functions.

ADD REPLY
0
Entering edit mode

I've tried different values for the F cutoff from 0.9 to 0.0005 and all of them crashed. It doesn't seem to depend on the value.

ADD REPLY
0
Entering edit mode

It looks like this is a dispatch issue, perhaps caused by some weird corruption of the methods cache. It might help to reinstall some of the packages involved, like BiocGenerics and S4Vectors.

ADD REPLY
0
Entering edit mode

Hi Michael,

I nuked all of my R and re-installed everything fresh. The error persists.

ADD REPLY
2
Entering edit mode
@lcolladotor
Last seen 4 weeks ago
United States

Hi,

The following code reproduces the problem.

library('derfinder')

datadir <- system.file('extdata', 'genomeData', package='derfinder')
files <- rawFiles(datadir = datadir, samplepatt = '*accepted_hits.bam$', 
    fileterm = NULL)
## Shorten the column names
names(files) <- gsub('_accepted_hits.bam', '', names(files))

## Read and filter the data
dataSmall <- loadCoverage(files = files, chr = '21', cutoff = NULL)

## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list('chr21' = dataSmall), verbose = TRUE)

## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull, probs = c(0.5), nonzero=TRUE, 
    verbose=TRUE)

## Build the models
groupInfo <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars=groupInfo, adjustvars=adjustvars)

## Analyze the chromosome
results <- analyzeChr(chr='21', coverageInfo= dataSmall, models=models, 
    cutoffFstat=1, cutoffType='manual', groupInfo=groupInfo, mc.cores=1, 
    writeOutput=FALSE, returnOutput=TRUE, runAnnotation = FALSE, cutoffPre = 0)
names(results)

The evaluated code would then look like this:

> ## Analyze the chromosome
> results <- analyzeChr(chr='21', coverageInfo= dataSmall, models=models,
+     cutoffFstat=1, cutoffType='manual', groupInfo=groupInfo, mc.cores=1,
+     writeOutput=FALSE, returnOutput=TRUE, runAnnotation = FALSE, cutoffPre = 0)
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
2017-08-21 13:31:30 analyzeChr: Pre-processing the coverage data
2017-08-21 13:31:32 analyzeChr: Calculating statistics
2017-08-21 13:31:32 calculateStats: calculating the F-statistics
2017-08-21 13:33:04 analyzeChr: Calculating pvalues
2017-08-21 13:33:04 analyzeChr: Using the following manual cutoff for the F-statistics 1
2017-08-21 13:33:04 calculatePvalues: identifying data segments
Error in attr(x, "tsp") <- c(1, NROW(x), 1) :
  attempt to set an attribute on NULL
> packageVersion('derfinder')
[1] ‘1.11.6’

The issue is that preprocessCoverage() re-filters the data using cutoff = cutoffPre only when colsubset is specified. The original idea was that users would load a filtered version of the data from the get go, that is, specifying some non-NULL value for cutoff in loadCoverage(). This would reduce the memory burden. I have made changes to the documentation to make this point clear. Now the error looks like this:

> ## Analyze the chromosome
> results <- analyzeChr(chr='21', coverageInfo= dataSmall, models=models,
+     cutoffFstat=1, cutoffType='manual', groupInfo=groupInfo, mc.cores=1,
+     writeOutput=FALSE, returnOutput=TRUE, runAnnotation = FALSE, cutoffPre = 0)
extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
2017-08-21 14:08:41 analyzeChr: Pre-processing the coverage data
Error in preprocessCoverage(coverageInfo = coverageInfo, groupInfo = groupInfo,  :
  'coverageInfo$position' is NULL when it shouldn't be. Use a cutoff when loading the data with loadCoverage() or fullCoverage(). Alternatively, you can specify 'colsubset' such that preproccessCoverage() will run filterData() again.

The new changes are part of derfinder 1.10.6 (bioc-release) and 1.11.8 (bioc-devel).

Best, Leonardo

ADD COMMENT
0
Entering edit mode

So what is the recommended/most efficient usage case? Should I use only one of the two options? Like set the desired cutoff right from the load and unset the cutoffPre?  Or maybe set both to the same value? Or is cutoffPre ignored when the data is already filtered and no subsetting is requested?

ADD REPLY
0
Entering edit mode

I normally set both to the same value when loading and when running analyzeChr(). analyzeChr() will only filter again if `colsubset` is used, which sometimes I need to use and others I don't. So setting the cutoff when loading and when running analyzeChr() saves me the trouble of remembering to use/update one when I use `colsubset`.

You could always run filterData() manually before using analyzeChr() if needed.

Indeed, `cutoffPre` is ignored if `colsubset` is not used in analyzeChr()/preprocessCoverage().

ADD REPLY
0
Entering edit mode

Ok, that clears it up quite a bit. Many thanks!

ADD REPLY

Login before adding your answer.

Traffic: 452 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6