What does dba.PCA in DiffBind exactly do?
3
2
Entering edit mode
renanec ▴ 20
@renanec-9080
Last seen 9.2 years ago
United States

Hi there,

I’ve been using the dba.PCA function (Package DiffBind version 1.14.6) in the standard way described in the documentation. So far,  I have two very related questions:

1. How can I retrieve the scores, loadings, eigenvalues from dba.PCA?

2. I’m interested in understanding how dba.PCA works. To do so I’ve unsuccessfully tried to recreate the PCA myself, the results are closed but not identical. I’m guessing that there are other steps in between or perhaps a different implementation of PCA that you’re using . Could you please take a look at my code ? Ideally I would like to recreate outside of dba.PCA each of the internal steps in the function

samples =dba(sampleSheet = "sample_sheet_all.csv")
samples = dba.count(samples,score = DBA_SCORE_READS)​

#Retrieve read count data

peaks.init.ranges<-dba.peakset(samples, bRetrieve=TRUE)

#Create a matrix with the read counts (there might be a more direct way of creating the inputPCA_matrix)
inputPCA = mcols(peaks.init.ranges)
inputPCA = as.data.frame(inputPCA)
inputPCA_matrix = data.matrix(inputPCA)

#Transpose matrix so that each row is a sample and the columns are read counts
inputPCA_matrix=t(inputPCA_matrix)
#It uses the covariance matrix by default
pr.res=prcomp(inputPCA_matrix)
scores = as.data.frame(pr.res$x)
biplot(pr.res)

Many Thanks,

Renan Escalante-Chong

 

diffbind pca • 4.0k views
ADD COMMENT
2
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 9 weeks ago
Cambridge, UK

Hi Renan-

You're on the right track, except for a few items:

  1. By default when you call dba.plotPCA(), the parameter bLog=TRUE, which means the log2() values of the read counts are used. Setting bLog=FALSE will use the raw read counts. Logs are default because it is less likely to identify a single outlier, with all the other sample bunched together, than with unlogged data.
  2. dba.plotPCA() uses princomp(), not prcomp(). This shouldn't make a noticeable difference however (indeed, I'm likely to change it to prcomp() in the next release).
  3. dba.plotPCA() plots the raw loadings, equivalent to pr.res$rotation[,1:2] (actually pr.res$loadings[,1:2] as it comes back from princomp()).

Hope this helps-

Rory

 

ADD COMMENT
0
Entering edit mode

Hi Rory,

Sorry to dig up an old post but I had a follow up question; if you transpose the matrix as Renan suggested, wouldn't pr.res$rotation[,1:2] give you the loadings of the peaks, not of the samples?

I've been trying to replicate Renan's code and I get the same PC contributions as in the diffbind PCA plot only when I use the non-transposed matrix.

- Brook

ADD REPLY
0
Entering edit mode

DiffBind doesn't transpose the matrix.

-R

ADD REPLY
0
Entering edit mode

Hi Rory,

Is there a way to change the pch values in dba.plotPCA() by some metadata column? It seems round points is hard-coded into the function and can't be changed. And if not, is there a way to store the coordinates of the points for use in some other plotting package e.g. ggplot2?

ADD REPLY
1
Entering edit mode

This is indeed hard-coded.

One option you can consider is that dba.plotPCA() silently returns a trellis object, which you can update. For example:

library(lattice)
data(tamoxifen_analysis)
pcaplot <- dba.plotPCA(tamoxifen)
update(pcaplot, pch=17, cex=.7)
ADD REPLY
0
Entering edit mode
renanec ▴ 20
@renanec-9080
Last seen 9.2 years ago
United States

Thanks! This clarifies how the PCA works in the package.

ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 9 weeks ago
Cambridge, UK

I have gone ahead and changed it to use pr.comp() in the development version.

ADD COMMENT

Login before adding your answer.

Traffic: 324 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6