Dear Aaron,
Hi! I have used csaw before and it worked beautifully. Recently I tried to use csaw from Rstudio. When I get to the normalisation step, things started to look a bit funny:
> require(csaw)
> pe.bam <- c("X_1.bam", "X_2.bam", "Y_1.bam", "Y_2.bam")
> pe.param <- readParam(minq=50, restrict=c("Pf3D7_01_v3", "Pf3D7_02_v3", "Pf3D7_03_v3","Pf3D7_04_v3","Pf3D7_05_v3","Pf3D7_06_v3","Pf3D7_07_v3","Pf3D7_08_v3","Pf3D7_09_v3","Pf3D7_10_v3","Pf3D7_11_v3","Pf3D7_12_v3","Pf3D7_13_v3","Pf3D7_14_v3"), max.frag=1150, pe="both")
> data <- windowCounts(pe.bam, param=pe.param, spacing=50, width=300)
> design <- model.matrix(~factor(c('X', 'X', 'Y', 'Y')))
> colnames(design) <- c("intercept", "condition")
> head(assay(data))
[,1] [,2] [,3] [,4]
[1,] 7 2 2 6
[2,] 8 8 4 8
[3,] 8 10 5 11
[4,] 9 12 5 13
[5,] 9 17 5 14
[6,] 9 21 5 17
> head(rowRanges(data))
GRanges object with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] Pf3D7_01_v3 [ 1, 300] *
[2] Pf3D7_01_v3 [ 51, 350] *
[3] Pf3D7_01_v3 [101, 400] *
[4] Pf3D7_01_v3 [151, 450] *
[5] Pf3D7_01_v3 [201, 500] *
[6] Pf3D7_01_v3 [251, 550] *
-------
seqinfo: 14 sequences from an unspecified genome
> data$totals
[1] 1593882 2422822 1685087 3548254
> require(edgeR)
> abundances <- aveLogCPM(asDGEList(data))
> summary(abundances)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.9018 4.0230 4.6160 4.5440 5.2340 7.8680
> keep.simple <- abundances > 2
> filtered.data <- data[keep.simple,]
> summary(keep.simple)
Mode FALSE TRUE NA's
logical 11260 446392 0
> data <- data[keep.simple,] > require(edgeR)
> binned <- windowCounts(pe.bam, param=pe.param, bin=TRUE, width=10000)
> normfacs <- normalize(binned)
> normfacs
class: RangedSummarizedExperiment
dim: 2335 4
metadata(6): spacing width ... param final.ext
assays(1): counts
rownames: NULL
rowData names(0):
colnames: NULL
colData names(5): bam.files totals ext rlen norm.factors
> y <- asDGEList(data, norm.factors=normfacs)
Error in (function (counts = matrix(0, 0, 0), lib.size = colSums(counts), :
Length of 'norm.factors' must equal number of columns in 'counts'
I have never seen this the last time I used csaw. Could you please kindly inform me what could have gone wrong?
Thank you very much from your time!
Jingyi from Melbourne
FYI, I believe that the relevant change actually happened a couple of releases ago, when
normalize
stopped returning normalization factors directly and started returning aSummarizedExperiment
object instead.normOffsets
will have the same behaviour if the input object is aSummarizedExperiment
. In any case, you should usenormOffsets
in the future; I am deprecatingnormalize
as it is too general and runs the risk of clashing with other packages.