Regarding performance, make sure you are on the very latest version (DiffBind_3.2.4); it includes a performance improvement that greatly speeds up adding a large number of samples. However, a 100,000-interval x 400-sample matrix will take a lot of memory, so if memory is limited it will still take a long time no matter how you do it.
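As a rough back-of-envelope check (a lower bound only; DiffBind keeps additional copies and metadata, so actual usage will be several times this):

```r
# One numeric (8-byte) count per interval per sample, raw matrix alone:
1e5 * 400 * 8 / 2^20   # ~305 MiB
```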
Regarding your code, I see two issues:
- The column names need to be exact (including case): SampleID, Condition
- The count files you are writing out should not include column headers, so set col.names=FALSE.
So, building on the code snippet from my previous response:
# Retrieve the consensus peaks as a data frame (chrom, start, end, ...)
peaks <- dba.peakset(tamoxifen, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)
# Write one headerless, tab-separated count file per sample
counts1 <- cbind(peaks[,1:3], countMatrix[,1])
counts2 <- cbind(peaks[,1:3], countMatrix[,2])
write.table(counts1, "samp1", row.names=FALSE, col.names=FALSE, sep="\t",
            quote=FALSE)
write.table(counts2, "samp2", row.names=FALSE, col.names=FALSE, sep="\t",
            quote=FALSE)
# Build a new DBA object, adding one sample (and its counts) at a time
newDBA <- dba.peakset(NULL, peaks=peaks, sampID="samp1",
                      condition="cond1", counts="samp1")
newDBA <- dba.peakset(newDBA, peaks=peaks, sampID="samp2",
                      condition="cond2", counts="samp2")
newDBA
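For hundreds of samples, the same pattern can be wrapped in a loop. This is just a sketch: it assumes countMatrix has one column per sample and that conditions is a character vector (one entry per column) that you supply yourself.

```r
# Hypothetical setup: conditions[i] gives the condition for column i of countMatrix
newDBA <- NULL
for (i in seq_len(ncol(countMatrix))) {
  sampID <- paste0("samp", i)
  # Headerless count file: chrom, start, end, count
  write.table(cbind(peaks[,1:3], countMatrix[,i]), sampID,
              row.names=FALSE, col.names=FALSE, sep="\t", quote=FALSE)
  # Passing NULL the first time creates the DBA object; later calls add to it
  newDBA <- dba.peakset(newDBA, peaks=peaks, sampID=sampID,
                        condition=conditions[i], counts=sampID)
}
```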
Cross-posted: https://www.biostars.org/p/9480926/
OK, I deleted the post on Biostars.
It was no problem - just making sure that users do not 'double up' on efforts. As far as I know, the DiffBind developer logs in here occasionally, so we should expect an answer eventually.