Hi,
I am not sure whether this question here is proper or not. If it is not, I will remove this post.
I have a principal component analysis (PCA) figure below produced by DiffBind. After discussing with my colleagues, we wonder why the variation of Principal Component #1 is smaller than Principal Component #2. Please note that the horizontal axis (PC1) is from -0.365 to -0.345 (range is only 0.019), and the vertical axis (PC2) is from 0.4 to -0.4 (range is 0.8). Is it normal? Below are command lines I adopted. Any command lines I used were not correct? Many thanks.
Example of MCAS2 commands I used
macs2 callpeak -t MSailH3K27ac_C153.bam -c MSailInput_C160_C161.bam --format BAM --gsize 1000000000 --keep-dup auto --outdir MSailH3K27ac_C153 --name MSailH3K27ac_C153 --verbose 3 --tsize 75 --nomodel --extsize 200 --qvalue 0.00001 --slocal 1000 --llocal 10000 --cutoff-analysis
DiffBind commands I used
> h3k27ac <- dba(sampleSheet = "h3k27ac.csv")
MSailH3K27ac_C153 Sail H3K27ac Male MaleSail 1 macs2
MSailH3K27ac_C156 Sail H3K27ac Male MaleSail 2 macs2
MFlightH3K27ac_C159 Flight H3K27ac Male MaleFlight 1 macs2
MFlightH3K27ac_C165 Flight H3K27ac Male MaleFlight 2 macs2
FSailH3K27ac_C168 Sail H3K27ac Female FemaleSail 1 macs2
FSailH3K27ac_C171 Sail H3K27ac Female FemaleSail 2 macs2
FFlightH3K27ac_C177 Flight H3K27ac Female FemaleFlight 1 macs2
FFlightH3K27ac_C180 Flight H3K27ac Female FemaleFlight 2 macs2
> h3k27ac
8 Samples, 13047 sites in matrix (19937 total):
ID Tissue Factor Condition Treatment Replicate Caller Intervals
1 MSailH3K27ac_C153 Sail H3K27ac Male MaleSail 1 macs2 10373
2 MSailH3K27ac_C156 Sail H3K27ac Male MaleSail 2 macs2 9158
3 MFlightH3K27ac_C159 Flight H3K27ac Male MaleFlight 1 macs2 9622
4 MFlightH3K27ac_C165 Flight H3K27ac Male MaleFlight 2 macs2 9059
5 FSailH3K27ac_C168 Sail H3K27ac Female FemaleSail 1 macs2 10853
6 FSailH3K27ac_C171 Sail H3K27ac Female FemaleSail 2 macs2 15717
7 FFlightH3K27ac_C177 Flight H3K27ac Female FemaleFlight 1 macs2 12572
8 FFlightH3K27ac_C180 Flight H3K27ac Female FemaleFlight 2 macs2 11963
> h3k27ac <- dba.count(h3k27ac, minOverlap = 1, score = DBA_SCORE_TMM_MINUS_FULL, fragmentSize = 0, bRemoveDuplicates = TRUE, bScaleControl = TRUE, mapQCth = 10, bCorPlot = TRUE, bUseSummarizeOverlaps = TRUE, bParallel = TRUE)
> savefile <- dba.save(h3k27ac, file = "h3k27ac_counts")
> h3k27ac <- dba.load(file = "h3k27ac_counts")
> h3k27ac
8 Samples, 19937 sites in matrix:
ID Feather.type Factor Sex Treatment Replicate Caller Intervals FRiP
1 MSailH3K27ac_C153 Sail H3K27ac Male MaleSail 1 counts 19937 0.09
2 MSailH3K27ac_C156 Sail H3K27ac Male MaleSail 2 counts 19937 0.08
3 MFlightH3K27ac_C159 Flight H3K27ac Male MaleFlight 1 counts 19937 0.09
4 MFlightH3K27ac_C165 Flight H3K27ac Male MaleFlight 2 counts 19937 0.08
5 FSailH3K27ac_C168 Sail H3K27ac Female FemaleSail 1 counts 19937 0.09
6 FSailH3K27ac_C171 Sail H3K27ac Female FemaleSail 2 counts 19937 0.11
7 FFlightH3K27ac_C177 Flight H3K27ac Female FemaleFlight 1 counts 19937 0.10
8 FFlightH3K27ac_C180 Flight H3K27ac Female FemaleFlight 2 counts 19937 0.10
> h3k27ac$config$condition <- "Sex"
> h3k27ac$config$tissue <- "Feather type"
> dba.plotPCA(h3k27ac, attributes = DBA_CONDITION, dotSize = 3, cor = TRUE, vColors = c("deepskyblue", "deeppink"))
Thanks a lot. May I know the prcomp() parameters DiffBind adopted? More importantly, may I set prcomp() parameters in DiffBind by myself, e.g. scale. = TRUE? I really appreciate your help.
By default (
bLog=TRUE
), the read scores are first converted usinglog()
. Thenprcomp()
is called with the log2 read score matrix. Only the data matrix is passed, no other parameters are set. SettingbLog=FALSE
will pass in the read scores without callinglog2()
first.There is currently no way to change how
DiffBind
callsprcomp()
. You can however retrieve the read scores (usingdba.peakset()
withbRetrieve=TRUE
for the global binding matrix, ordba.report()
withbCounts=TRUE
for a specific contrast) and do your own PCA.-Rory