Incorporating Salmon offset matrix into Wilcoxon DEG analysis
1
0
Entering edit mode
Andre • 0
@804722cd
Last seen 5 hours ago
Portugal

I´m working with this dataset(GSE234297) that provides a salmon counts file and a salmon offsets file(GSE234297_gene_raw_counts.txt.gz GSE234297_gene_offset_matrix.txt.gz) . The paper authors use them like this to do DEG:~

 # create DGEList
y <- DGEList(counts = counts, samples = samples, genes = genes, group = samples$Disease_status)

# add salmon offsets
y <- scaleOffset(y, offset = as.matrix(salmon_offset)) 

# filter low count genes 
keep <- filterByExpr(y, group = y$samples$group)
table(keep) 
y <- y[keep, , keep.lib.sizes=FALSE] 

# set design matrix
design <- model.matrix(~ Sex + GC_content + group, data = y$samples)
design

# estimate dispersion parameters using edgeR robust method
y <- estimateGLMRobustDisp(y, design)

But my question is if I want to use the salmon offsets for Wilcoxon (which can't use offsets directly), could I simply do:

Copy <- DGEList(counts)
y$offset <- offsets  
cpms <- edgeR::cpm(y, offset = y$offset, log = FALSE)

To get properly normalized values that account for the Salmon bias corrections?

edgeR • 126 views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 5 minutes ago
WEHI, Melbourne, Australia

To compute CPM values from edgeR:

y <- DGEList(counts = counts, samples = samples, genes = genes, group = samples$Disease_status)
y <- scaleOffset(y, offset = salmon_offset) 
cpms <- cpm(y)

Needless to say, using edgeR v4 quasi-likelihood would be much faster and better than Wilcoxon tests. I would be skipping the estimateGLMRobustDisp call shown in your code above and instead using:

fit <- glmQLFit(y, design, robust=TRUE)
ftest <- glmQLFTest(fit)
topTags(ftest)

BTW, please ask questions just once. I answered essentially the same question from you 3 hours ago, added as a comment to an old answer.

ADD COMMENT
0
Entering edit mode

Thank you. What is the difference between setting robust=TRUE in glmQLFit and estimateGLMRobustDisp? The paper i mentioned in the post used both:

y <- estimateGLMRobustDisp(y, design) 

fit <- glmQLFit(y, design, robust=TRUE)
ADD REPLY

Login before adding your answer.

Traffic: 701 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6