Filtering counts in SummarizedExperiment
rbronste ▴ 60
@rbronste-12189
Last seen 5.0 years ago

Hi, I am making a SummarizedExperiment from a DiffBind dba.peakset in the following way (to use in DESeq2):

rangedCounts <- dba.peakset(Adult_count, bRetrieve=TRUE)

nrows <- 1025488
ncols <- 8
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges<-GRanges(rangedCounts)

sampleN<-c("MBV1",   "MBV2",    "FBV1", "FBV2", "MBE7", "MBE8", "FBE1", "FBE2")
sampleS<-c("male", "male", "fem", "fem", "male", "male", "fem", "fem")
sampleT<-c("vehicle", "vehicle", "vehicle", "vehicle", "B", "B", "B", "B")
sampleB<-c("1","2","1","2", "1", "2", "1", "2")
colData<-data.frame(sampleName=sampleN, treatment=sampleT, batch=sampleB, sex=sampleS)

counts <- as.matrix(mcols(rangedCounts))

se<-SummarizedExperiment(assays=list(counts=counts),rowRanges=rowRanges, colData=colData)

If I look at the count matrix afterwards, I see something like this:

         MBV1  MBV2  MBV3  FBV1  FBV2  FBV3  MBE7  MBE8  MBE9  FBE1  FBE2  FBE3
  [1,]     1     1     1     1     1     1     1    66     1     1    50    34
  [2,]    11     1     1     1     1     1     6    98     1    11   100     1
  [3,]     1     1     1     1     1     1     1     1     1   116   108     1
  [4,]     1     1    22     2    84     1     1     4     1    64     1    40
  [5,]     1     1    18    74    74     1   102     1   126    22     1     1
  [6,]     1     1     1     1    44     1     1     1   122     1     1     1
  [7,]     1     1     1     1     1     1     1    42     1     1    96     1
  [8,]     1     2   156    20     1    58     1   250   130    62     4   282

I would like to take either this matrix or rangedCounts and filter at each position, for example by requiring a minimum of 100 for every count in the matrix, or apply some other manipulation. I know how to use rowSums and rowMeans but am not sure about other filtering. Please let me know if you can help out with this, thanks!

diffbind matrix summarizedexperiment rangedCounts • 1.7k views
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 13 days ago
Cambridge, UK

You can do this using dba.count() by setting filter=100 and filterFun=min.

You'll end up filtering out most of your peaks -- if any single sample has low binding at a site, that site will be removed.
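To illustrate why a minimum-based filter is so strict, here is a minimal base R sketch of the same logic applied to a plain matrix. It is not DiffBind's internal code, and it assumes sites are kept when the per-row minimum is at least the threshold. The first row is row 8 of the table in the question; the second row is a hypothetical site made up for illustration.

```r
# Two example sites across 12 samples.
counts_min <- rbind(
  c(  1,   2, 156,  20,   1,  58,   1, 250, 130,  62,   4, 282),  # row 8 from the question
  c(120, 150, 156, 200, 101, 158, 100, 250, 130, 162, 104, 282)   # hypothetical site
)

# Per-site minimum across samples (the role played by filterFun=min).
row_min <- apply(counts_min, 1, min)

# Assumed rule: keep a site only if its minimum reaches the threshold of 100.
keep_min <- row_min >= 100

filtered_min <- counts_min[keep_min, , drop = FALSE]
```

Only the hypothetical second site survives here; every row shown in the question contains at least one count of 1, so all of them would be dropped under this rule.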

@james-w-macdonald-5106
Last seen 1 day ago
United States

This isn't really a Bioconductor question, but instead is a basic 'how do I get R to do things' question. And you seem to want to do one thing, but maybe something else? I mean, do you want to filter to a minimum of 100 (really?) or something else?

Anyway, you can get a long way with simple tests like

z <- rowSums(assay(se) >= 100)

and then filtering on that, depending on how many samples have to have a count of that size in each genomic region.
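As a minimal sketch of that approach, the snippet below uses a plain matrix built from the first three rows of the table in the question as a stand-in for assay(se). The threshold of 100 comes from the question; the `min_samples` cutoff is a hypothetical value chosen only for illustration.

```r
# Stand-in for assay(se): first three rows of the count matrix in the question.
counts_demo <- matrix(
  c( 1, 1, 1, 1, 1, 1, 1, 66, 1,   1,  50, 34,
    11, 1, 1, 1, 1, 1, 6, 98, 1,  11, 100,  1,
     1, 1, 1, 1, 1, 1, 1,  1, 1, 116, 108,  1),
  nrow = 3, byrow = TRUE
)

# Number of samples per region with a count of at least 100.
z <- rowSums(counts_demo >= 100)

# Keep regions where at least `min_samples` samples reach the threshold
# (hypothetical cutoff, adjust to the experimental design).
min_samples <- 2
keep <- z >= min_samples

filtered_demo <- counts_demo[keep, , drop = FALSE]

# With a real SummarizedExperiment, the same logical vector subsets rows
# and keeps rowRanges and colData in sync:
# se_filtered <- se[keep, ]
```

Setting `min_samples` to the number of columns gives the "every count at least 100" filter from the question, while smaller values give progressively more lenient filters.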


This isn't really a Bioconductor answer either. I apologize for the vagueness of the question but I think you know what I was asking and why I asked it here. Your response rehashed what I indicated I already understood. Thank you.
