Co-expression between host and pathogen genes
2
0
Entering edit mode
jhj89 ▴ 10
@jhj89-9623
Last seen 7.5 years ago

Hi,

I have this host and pathogen RNA-seq data generated from the same individuals and I have performed DGEA on both data to determine the DE genes between two disease groups. With this, I would like to find out which host DE genes are correlated with which pathogen DE genes. I thought a simple pearson correlation would suffice, but I would like to adjust for several covariates of interest (ex. age).

I'm thinking a straightforward way is to do as follows:

design <- model.matrix(~1+pathogen_gene_exp+Age)

disp <- estimateDisp(y, design, robust = TRUE)

fit <- glmFit(disp, design, robust = TRUE)

lrt <- glmLRT(fit, coef = 2)

topTags(lrt)

This would compare the expression of one pathogen gene against all host genes. However, I can't think of a good way to represent the expression of pathogen gene - under edgeR, would it be acceptable in this scenario to have the expression of pathogen gene to be logCPM? Or is there another (better/more straightforward) way to perform co-expression analysis with adjustment for covariates? Thank you.

edger rna-seq • 1.7k views
ADD COMMENT
2
Entering edit mode
chris86 ▴ 420
@chris86-8408
Last seen 5.0 years ago
UCL, United Kingdom

hi, im not entirely sure what your trying to do with this code, but you say 'I thought a simple pearson correlation would suffice, but I would like to adjust for several covariates of interest (ex. age).'. You can do a partial correlation in R and just loop over the data set, so you can adjust for continuous covariates e.g. I do that sometimes.

#m[z,2] <- abs(pcor(c("gene", "DAS28.6M", "DAS28.0M"), var(df))) # partial correlation of gene expression with my clinical variable, adjusted for another clinical variable

You can do this to look at co expression with adjustment. You need to make a data frame first for this code.

ADD COMMENT
0
Entering edit mode

Thank you for your answer. I tried your approach and it works quite well.

ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 16 minutes ago
The city by the bay

Log-CPMs seem fine to me. edgeR uses a log-link function to fit each GLM, so you're effectively correlating the log-CPM of your pathogen gene to the log-(library size-adjusted-)count of each of your host genes. The log-transformations cancel out, so you just get a linear correlation between the pathogen and host expression. The estimated value of coefficient for the pathogen expression covariate represents the gradient of the log-log line; larger values correspond to a greater effect size. If you want to be more sophisticated, you can play around with splines by using the pathogen expression as a covariate; this will provide a more flexible fit in case the pathogen-host expression association is not linear, but at the cost of reducing the interpretability of your model.

ADD COMMENT
0
Entering edit mode

Thank you Aaron for your answer. I have a slight off-topic question but it is something related to what I have above. I am wondering whether it would be possible to extract logCPM from edgeR that takes into account the covariates such as Age. For example, if I were to use WGCNA package, it requires log-transformed expression values. Would this be doable? Thank you again.

ADD REPLY
1
Entering edit mode

Perhaps try something like this:

logCPMs <- cpm(y, prior.count=3, log=TRUE)
subdesign <- model.matrix(~pathogen_gene_exp)
logCPMs <- removeBatchEffect(logCPMs, design=subdesign, covariates=Age)

This will remove the effect of the age covariate from the log-CPMs.

ADD REPLY
0
Entering edit mode

Thank you. Just one more question: if there are multiple covariates that I want to adjust for, and if I have them in a matrix (ex. multiple_covariates), would the following work in the same way as having just one covariate (ex. Age)?

logCPMs <- removeBatchEffect(logCPMs, design=subdesign, covariates=multiple_covariates)
ADD REPLY
0
Entering edit mode

Yes. If you look at ?removeBatchEffect, you'll see that you can supply a matrix as covariates.

ADD REPLY

Login before adding your answer.

Traffic: 525 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6