I would like to identify deferentially expressed genes.
I have 50 different individuals in three biological replicates (50 x 3=150) and they are not challenged for any condition (for e.g. control vs wild). I mean there is no treated vs non-treated.
I would like to perform an ANOVA. Could you please exemplify using some example code in edgeR?
Differential expression... between what? If you have no conditions, then the only contrast you can do is between individuals. This would only be sensible if you're looking for highly variable genes across individuals. Identifying DE genes between specific individuals seems pointless as this cannot be generalized to the population.
Anyway, to run an ANOVA-like comparison in edgeR, you should set up your design matrix like this:
And then run estimateDisp, glmFit and glmLRT (or their quasi-likelihood equivalents). For glmLRT, you need to drop all non-intercept coefficients via the coef argument:
res <- glmLRT(fit, coef=2:ncol(design))
... which will test against the null hypothesis that all individuals have the same expression.
Aaron already provided a suggestion for how to do it in his answer. If that suggestion didn't work, can you explain why it's not what you're looking for?
The line individuals<-factor(y$counts) makes no sense. I suggest you follow my original instructions. Even your subsetting is wrong; you've picked libraries 2 to 13, so you're either just using one replicate from each individual, or you're using all replicates from only 4 individuals. If you want to test it, then you should use all replicates from 12 individuals and use individuals <- factor(rep(1:12, each=3)).
Dear Aaron,
I am getting an error while generating Smear (MA) plot in edgeR. Is there
any way that I can generate a smear plot using all the individual and
highlight differentially expressed genes?
Thanks
*I used following Codes:*
individuals <- factor(rep(1:50, each=3))
design <- model.matrix(~individuals)
#design <- model.matrix(~0+individuals)
head(design)
#estimateDisp
y<-estimateGLMCommonDisp(y,design)
y<-estimateGLMTrendedDisp(y,design)
y<-estimateGLMTagwiseDisp(y,design)
res <- glmLRT(fit, coef=2:ncol(design))
top<-topTags(res,n=15)
tt = topTags(res, n=nrow(y))
head(tt$table)
nc = cpm(y, normalized.lib.sizes=TRUE)
rn = rownames(tt$table)
rn
deg = rn[tt$table$FDR < .05]
*plotSmear(res, de.tags=deg)*
Error in plotSmear(res, de.tags = deg) :
table$logFC slot in DGELRT object is NULL. We cannot produce an MA
(smear) plot if more than one coefficient from the GLM is being tested in
the likelihood ratio test as this results in more one logFC value per
gene---one for each coefficient.
*plotSmear(res$table$logFC.individuals2, de.tags=deg)*
Error in plotSmear(res$table$logFC.individuals2, de.tags = deg) :
Currently only supports DGEList/DGELRT/DGEExact objects as the object
argument.
On Thu, Feb 18, 2016 at 6:36 PM, myprogramming2016 [bioc] <
noreply@bioconductor.org> wrote:
> Activity on a post you are following on support.bioconductor.org
>
> User myprogramming2016 <https: support.bioconductor.org="" u="" 9741=""/> wrote Comment:
> ANOVA using edgeR <https: support.bioconductor.org="" p="" 78480="" #78560="">:
>
> Sorry Aaron! There was an error in subsetting. I have revised the code and
> its working fine now.
>
> Thanks for your help.
>
>
>
>
>
>
>
>
>
>
>
>
>
> ------------------------------
>
> Post tags: edger
>
> You may reply via email or visit
> C: ANOVA using edgeR
>
No, there isn't. The error message is fairly self-explanatory; the ANOVA yields multiple log-fold changes for each gene, so how on earth would you fit that into a single plot?
Is there any way to show differentially expressed genes?
Thanks
On Sat, Feb 20, 2016 at 11:59 AM, Aaron Lun [bioc] <noreply@bioconductor.org> wrote:
> Activity on a post you are following on support.bioconductor.org
>
> User Aaron Lun <https: support.bioconductor.org="" u="" 6732=""/> wrote Comment:
> ANOVA using edgeR <https: support.bioconductor.org="" p="" 78480="" #78612="">:
>
> No, there isn't. The error message is fairly self-explanatory; the ANOVA
> yields multiple log-fold changes for each gene, so how on earth would you
> fit that into a single plot?
>
> ------------------------------
>
> Post tags: edger
>
> You may reply via email or visit
> C: ANOVA using edgeR
>
With an ANOVA? Not unless you want to look at 50 plots (one showing the log-fold change for each individual against the rest). Even then, it wouldn't help; the log-fold change may only be large in one or two individuals/plots, but you'd end up marking them as DE in all plots because the ANOVA doesn't specify between which individuals the DE occurs.
No. edgeR is built to identify DE genes, not highly variable genes. If you want to identify HVGs across individuals, I would suggest fitting an intercept-only model (i.e., all individuals treated as replicates, probably after summing all libraries for each individual together so you get one sample per individual) and then ranking the genes on their tagwise dispersions after running estimateDisp. This will give you an idea of the top HVGs; it doesn't really make sense to ask whether they're significantly highly variable, because what amount of variability would you expect under the null hypothesis anyway?
Looking at "the genes expressed in each individual" doesn't make any sense. I'm sure that you'll find plenty of genes that are expressed, like actin or GAPDH or something equally boring. I can only presume you want to find genes that are unique to each individual, though even this seems pointless. Who cares about what happens in a particular individual, if you can't generalize it to/replicate the result in other individuals?
P.S. please use the "Add comment" field instead of the "Add answer" field to respond, otherwise the thread will be broken.
Hm... well, if you can re-use particular individuals for new experiments, perhaps individual-specific inferences might be useful. Still seems a bit iffy, though. Anyway, if you want to identify genes that are unique to an individual, there's a couple of things you can do. The least specific approach would be:
... which would identify genes that are up/down-regulated in plant 1 compared to the average of all other plants. This should give you a fair number of DE genes, but many of these may not be specific to plant 1 in the strict sense - for example, if a gene is expressed in both plant 1 and 50, you might still detect it as being DE if all other plants have lower expression.
The most specific approach would be something like this:
... which will identify genes that are DE in the same direction between plant 1 and each other plant. This will probably give no genes, as it's very conservative - lack of evidence for DE between any two plants will cause the gene to be discarded.
A compromise might be to do an ANOVA:
res <- glmLRT(fit, contrast=con) # using the same contrast above.
current.fdr <- p.adjust(res$table$PValue, method="BH")
all.signs <- sign(res$table[,grep("logFC", colnames(res$table)),drop=FALSE])
same.sign <- rowSums(all.signs[,1]!=all.signs)==0L
keep <- which(current.fdr <= 0.05 & same.sign)
... which will identify all genes that have any DE between plant 1 and any other plant, and where plant 1's expression is either higher or lower than that of all other plants.
I should add that you'll have to set up different contrasts for each plant. This can be done without re-writing the long makeContrasts call every time. For the first approach, you can just do:
con <- rep(-1/49, 50)
con[x] <- 1 # where 'x' is the plant of interest.
... while for the second and third approaches, you can do:
Do I need to change this code (con[x,] <- 1) and run for each plant separately. That means, I will have to run it 50 times.
Secondly, How do I get an estimate of how many genes expressed in each plants?
In the end, I would like to calculate common genes across all the plants.
Second question, that's generally not a sensible thing to ask, as it depends on what you consider "expressed". 1 read? 2 reads? 10 reads? How much of that is due to mapping errors, or leaky transcription, or contamination from something else? If you want to find non-DE genes, you can do an ANOVA like before, and just pick the genes with large p-values.
This doesn't seem related to the ANOVA question. Please keep the discussion relevant to the original question, or start a new post. In any case, it seems that Wei has provided you with some useful links in his answer (https://support.bioconductor.org/p/78611/#78641). If it doesn't help, respond to his answer directly.
Thanks Aaron.
I would like to look at the differential expression among 50 different individuals. Do you have any suggestion?
Aaron already provided a suggestion for how to do it in his answer. If that suggestion didn't work, can you explain why it's not what you're looking for?
Hi Aaron and Ryan,
I am getting an error while designing a model (design <- model.matrix(~individuals). This code is not working.
Could you please go through the following code?
Thanks for your help.
R code:
The line
individuals<-factor(y$counts)
makes no sense. I suggest you follow my original instructions. Even your subsetting is wrong; you've picked libraries 2 to 13, so you're either just using one replicate from each individual, or you're using all replicates from only 4 individuals. If you want to test it, then you should use all replicates from 12 individuals and useindividuals <- factor(rep(1:12, each=3))
.Sorry Aaron! There was an error in subsetting. I have revised the code and its working fine now.
Thanks for your help.
No, there isn't. The error message is fairly self-explanatory; the ANOVA yields multiple log-fold changes for each gene, so how on earth would you fit that into a single plot?
With an ANOVA? Not unless you want to look at 50 plots (one showing the log-fold change for each individual against the rest). Even then, it wouldn't help; the log-fold change may only be large in one or two individuals/plots, but you'd end up marking them as DE in all plots because the ANOVA doesn't specify between which individuals the DE occurs.