Question about RNAseq analysis in edgeR to identify common and donor-specific differentially expressed genes
@mohammedtoufiq91-17679
United States

Hi,

This question is about RNAseq analysis in edgeR to identify common and donor-specific differentially expressed genes (DEGs). I have in-vitro cultured tissue from 3 different donors, each with two infection statuses: infected with virus (high dose, 6hr) and uninfected (baseline, 0hr). The samples were sequenced by RNAseq, then aligned and quantified. I now have a gene counts file ready to import into the edgeR RNAseq analysis pipeline. Using this data, I would like to know which DEGs in response to virus are common to all donors and which are specific to each donor.

  1. Identifying common DEGs in response to virus between infected vs. un-infected samples from 3 different donors?
  2. Following this, how do I perform a donor-specific analysis of the response to virus, to pull out the specific response per donor?

Sample info

#>   Samples Donor Time     Status
#> 1      S1    D1  0hr Uninfected
#> 2      S2    D1  6hr   Infected
#> 3      S3    D2  0hr Uninfected
#> 4      S4    D2  6hr   Infected
#> 5      S5    D3  0hr Uninfected
#> 6      S6    D3  6hr   Infected

1. Identifying common DEGs in response to virus between infected vs. un-infected samples from 3 different donors?


library(edgeR)
group.Status <-  factor(Sample_info$Status)
y <- DGEList(counts = gene_counts, group = group.Status, remove.zeros = TRUE)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method = "TMM")
cpm_with_log2 <- cpm(y, prior.count = 2, log = TRUE)


## Create design file

Donor_ID <- factor(Sample_info$Donor)
Status <- factor(Sample_info$Status, levels=c("Uninfected", "Infected"))
design <- model.matrix(~Donor_ID+Status)


## Dispersion estimation
y <- estimateDisp(y,design, robust=TRUE) 


# Fit the model
fit <- glmQLFit(y,design, robust = TRUE)


## To detect genes that are differentially expressed in Infected vs Uninfected:
## (coef=4 is the StatusInfected column of the design matrix)
qlf.Infected_vs_Uninfected <- glmQLFTest(fit, coef=4)
topTags(qlf.Infected_vs_Uninfected, n=10, adjust.method = "BH", sort.by = "PValue", p.value = 1)
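
As a quick check on which column coef=4 refers to, the design columns can be listed in base R. This is only a sketch: it rebuilds Sample_info from the "Sample info" table above, and status_coef is a name introduced here for illustration.

```r
# Rebuild the sample table from the post (values copied from "Sample info").
Sample_info <- data.frame(
  Samples = paste0("S", 1:6),
  Donor   = rep(c("D1", "D2", "D3"), each = 2),
  Time    = rep(c("0hr", "6hr"), times = 3),
  Status  = rep(c("Uninfected", "Infected"), times = 3)
)

Donor_ID <- factor(Sample_info$Donor)
Status   <- factor(Sample_info$Status, levels = c("Uninfected", "Infected"))
design   <- model.matrix(~ Donor_ID + Status)

# List the coefficient names in column order.
print(colnames(design))

# Look the coefficient up by name instead of hard-coding its position.
status_coef <- which(colnames(design) == "StatusInfected")
print(status_coef)
```

Looking the column up by name keeps the test pointing at the infection effect even if the design formula is later reordered.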

2. Following this, how do I perform a donor-specific analysis of the response to virus, to pull out the specific response per donor?

To perform donor specific analysis, is there a way to extract DEGs specific to donor from the above glmQLFTest?

OR, probably, just change the design formula to: design <- model.matrix(~0 + Donor_ID + Status + Donor_ID:Status)

Best Regards,

Toufiq

R edgeR RNASeq • 1.7k views
@gordon-smyth
WEHI, Melbourne, Australia

Trying to determine donor-specific DE genes is a non-standard thing to do in terms of statistical testing, but you can use a method similar to the one we used for the oral squamous cell carcinoma data in our paper McCarthy et al. (2012).

Starting with the code you already have, you fit a new model with donor-specific effects:

design.donorspecific <- model.matrix(~Donor_ID + Donor_ID:Status)
fit.donorspecific <- glmFit(y, design.donorspecific, dispersion=y$trended.dispersion)

The donor-specific model has no residual df, but the code will reuse the trended dispersions you estimated previously from the non-donor-specific model to achieve an approximate test. Now you can test for DE genes for each donor individually. To get DE genes for donor D1:

lrt.D1 <- glmLRT(fit.donorspecific, coef="Donor_IDD1:StatusInfected")
topTags(lrt.D1)
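
The coefficient names that can be passed to coef= can be checked by printing the columns of the donor-specific design. The following is a minimal base-R sketch, assuming the Donor_ID and Status factors are defined as in the question (with Uninfected as the reference level); residual_df is a name introduced here for illustration.

```r
# Factors as defined in the question: 3 donors, each Uninfected then Infected.
Donor_ID <- factor(rep(c("D1", "D2", "D3"), each = 2))
Status   <- factor(rep(c("Uninfected", "Infected"), times = 3),
                   levels = c("Uninfected", "Infected"))

design.donorspecific <- model.matrix(~ Donor_ID + Donor_ID:Status)

# One infection coefficient per donor, named after the non-reference level.
print(colnames(design.donorspecific))

# 6 samples minus 6 estimable coefficients leaves no residual df.
residual_df <- nrow(design.donorspecific) - qr(design.donorspecific)$rank
print(residual_df)
```

Because Uninfected is the reference level of Status, each donor's infection effect appears as a column ending in ":StatusInfected".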

Note that you must use glmFit and glmLRT rather than glmQLFit and glmQLFTest for this approximate method to work.

Despite the small sample numbers, the donor-specific test should be conservative rather than liberal on average, because it uses the dispersion estimated from the donor by status interaction in place of a donor-specific repeated measures dispersion, which would almost certainly be smaller.

This is an application of Method 3 from Section 2.12 of the edgeR User's Guide.
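
The same test can be repeated for each donor in turn. The sketch below is illustrative only: it simulates a hypothetical gene_counts matrix in place of the real data, skips filtering, and the names lrt_by_donor and n_de are introduced here, so any numbers it produces mean nothing biologically.

```r
library(edgeR)

set.seed(1)
donors   <- c("D1", "D2", "D3")
Donor_ID <- factor(rep(donors, each = 2))
Status   <- factor(rep(c("Uninfected", "Infected"), times = 3),
                   levels = c("Uninfected", "Infected"))

# Hypothetical stand-in for the real counts: 2000 genes x 6 samples.
gene_counts <- matrix(rnbinom(2000 * 6, mu = 100, size = 10), ncol = 6,
                      dimnames = list(paste0("gene", 1:2000), paste0("S", 1:6)))

# Estimate dispersions from the additive (non-donor-specific) model.
y <- DGEList(counts = gene_counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, model.matrix(~ Donor_ID + Status), robust = TRUE)

# Fit the donor-specific model, reusing the trended dispersions.
design.donorspecific <- model.matrix(~ Donor_ID + Donor_ID:Status)
fit.donorspecific <- glmFit(y, design.donorspecific,
                            dispersion = y$trended.dispersion)

# One likelihood ratio test per donor-specific infection coefficient.
lrt_by_donor <- lapply(setNames(donors, donors), function(d) {
  glmLRT(fit.donorspecific, coef = paste0("Donor_ID", d, ":StatusInfected"))
})

# Number of genes with FDR < 0.05 for each donor.
n_de <- sapply(lrt_by_donor, function(lrt) {
  sum(topTags(lrt, n = Inf, sort.by = "none")$table$FDR < 0.05)
})
print(n_de)
```

Genes called DE in only one donor's list are candidates for a donor-specific response, subject to the approximate nature of the test described above.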


Gordon Smyth, thank you very much for the input and suggestions. This approach seems very useful and addresses the questions that we are looking at at the moment.

@swbarnes2-14086
San Diego

You have 6 samples total?

You don't have enough power to do anything complex. Compare infected to uninfected. That's it.


swbarnes2, thank you for the response. Yes, I have 6 samples in total. In addition to the Infected vs. Uninfected comparison, I also wanted to look at each donor's specific response. I see a method in Section 2.12 of the edgeR User's Guide, "What to do if you have no replicates". Does it make sense to use this?


I understand what you want, but you have the bare minimum number of samples to do the simple comparison. I don't think you can do more than that.
