Question

DEseq2 results

0

MOHAMMAD • 0

@MOHAMMAD-24781

I am a beginner in R. I ran a differential gene expression analysis and have two questions:

1- Judging by the plots, is the list of differentially expressed genes reliable enough for further analysis?

I am disappointed because the samples do not segregate in the heatmap, and the variance explained by PC1 and PC2 is low.

I noticed that sample S1 differs from the other samples, which lowers PC1, but it is also different from the 10 controls. Is there any way to handle it?

2- When I export the list of DEGs, some genes appear many times. For example, if the gene ID is df3t_00100, I get records as follows:

df3t_00100

df3t_00100.1

df3t_00100.2

df3t_00100.3

What are those, and how can I handle them?

Thank you in advance!

heatmaps DEGs PCA • 1.6k views

ADD COMMENT • link updated 3.7 years ago by Kevin Blighe ★ 4.0k • written 3.7 years ago by MOHAMMAD • 0

score 2 · Accepted Answer · 2021-03-01

2

Kevin Blighe ★ 4.0k

@kevin

Republic of Ireland

Hi Mohammad,

Regarding the PCA bi-plot, I see no major issue, assuming that you have generated this PCA bi-plot in an unbiased ('unsupervised') way using all genes. Can you share the code that you used? Your 2 groups (Control + Sample) are almost exclusively segregated along PC1. The sample at the bottom-right is behaving differently, but it is still not grouping with Control.
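
As a point of reference, here is a minimal sketch of what an unsupervised PCA on all genes can look like in DESeq2. It runs on simulated data from makeExampleDESeqDataSet(), so the object sizes and the grouping column "condition" are assumptions and not the poster's data. Note that plotPCA() uses only the 500 most variable genes unless ntop is changed.

```r
library(DESeq2)

# Simulated stand-in for the real dataset (assumption: 5000 genes, 12 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# blind = TRUE estimates the transformation without using the design,
# which keeps the PCA unsupervised.
vsd <- vst(dds, blind = TRUE)

# ntop = nrow(vsd) uses every gene instead of the default top 500 by variance.
# returnData = TRUE returns the coordinates instead of drawing the plot.
pca <- plotPCA(vsd, intgroup = "condition", ntop = nrow(vsd), returnData = TRUE)

# Percentage of variance explained by PC1 and PC2.
percent_var <- round(100 * attr(pca, "percentVar"), 1)
percent_var
```

With the thread's own object, intgroup would be "C.S" rather than "condition".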

Then, in your second figure generated with pheatmap(), it seems that, yes, your groups are segregated perfectly via hierarchical clustering, and the heatmap colour shade also indicates this.

Regarding the gene naming issue, which species is this? Can you confirm how the read count quantification was performed and with which reference GTF? Generally, to help, please explain your broader analysis pipeline so that we can begin to try to solve this.

Kevin

ADD COMMENT • link 3.7 years ago Kevin Blighe ★ 4.0k

0

PCA code:

vsd <- vst(dds, blind = TRUE) # variance-stabilizing transformation

plotPCA(vsd, intgroup = "C.S")

2- The organism is Plasmodium falciparum.

design(dds) <- ~ C.S

dds <- DESeq(dds)

res <- results(dds)

summary(res)

resSort <- res[order(res$padj),]

library("org.Pf.plasmo.db")

geneinfo <- select(org.Pf.plasmo.db, keys=rownames(resSort)[1:20], columns=c("SYMBOL","GENENAME","GO"), keytype="SYMBOL")

geneinfo

geneinfo returns some repeated genes and some with a decimal suffix:
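
One possible explanation, offered as an assumption because the full pipeline is not shown: requesting a one-to-many column such as "GO" from select() returns one row per gene/GO pair, and indexing a data frame with repeated IDs makes base R append .1, .2, ... to keep row names unique. The sketch below reproduces that behaviour with base R only. The second gene ID, the GO terms, and the padj values are invented for illustration.

```r
# Hypothetical annotation table: one row per gene/GO pair, as a
# one-to-many lookup would return (GO values invented for illustration).
geneinfo <- data.frame(
  SYMBOL = c("df3t_00100", "df3t_00100", "df3t_00100", "df3t_00200"),
  GO     = c("GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"),
  stringsAsFactors = FALSE
)

# Hypothetical results table keyed by gene ID in the row names.
res_df <- data.frame(
  padj      = c(0.001, 0.02),
  row.names = c("df3t_00100", "df3t_00200")
)

# Indexing with repeated IDs forces R to make the row names unique,
# which is where suffixes such as .1 and .2 come from.
expanded <- res_df[geneinfo$SYMBOL, , drop = FALSE]
rownames(expanded)

# One way to get a single row per gene: paste the GO terms together.
collapsed <- aggregate(
  GO ~ SYMBOL,
  data = geneinfo,
  FUN  = function(x) paste(unique(x), collapse = ";")
)
collapsed
```

If this is what happened, the suffixed rows are the same gene repeated once per annotation, not separate genes, so collapsing or dropping the one-to-many column before exporting removes them.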

ADD REPLY • link 3.7 years ago MOHAMMAD • 0

1

Thanks, you are evidently not following the typical DESeq2 analysis pipeline - you are missing the lfcShrink() stage. Please take a look at the Quick start.

Are you showing all of the output of geneinfo? There seem to be at least 2 columns missing.

ADD REPLY • link 3.7 years ago Kevin Blighe ★ 4.0k

0

Thank you,

Can you kindly tell me where I should use the lfcShrink() stage?

The geneinfo output is OK; it is just cropped to show the gene_id.

ADD REPLY • link 3.7 years ago MOHAMMAD • 0

1

Hi, regarding lfcShrink, the information is in the Quick start (please see my other comment)
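
For orientation, here is a minimal sketch of where lfcShrink() sits in the Quick start order: after DESeq() and resultsNames(), alongside results(). It uses simulated data, so the coefficient name "condition_B_vs_A" is an assumption specific to this example. With a design of ~ C.S the name would come from whatever resultsNames(dds) prints. type = "apeglm" needs the apeglm package to be installed.

```r
library(DESeq2)

# Simulated stand-in for the real dataset (assumption: 1000 genes, 12 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Fit the model, then list the available coefficients.
dds <- DESeq(dds)
resultsNames(dds)

# Unshrunken results for the coefficient of interest.
res <- results(dds, name = "condition_B_vs_A")

# Shrunken log2 fold changes for the same coefficient.
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")

# Sort by adjusted p-value, as in the code posted earlier in the thread.
res_shrunk <- res_shrunk[order(res_shrunk$padj), ]
head(res_shrunk)
```

Shrinkage changes the log2 fold change estimates, which matters mainly for ranking and visualisation.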

ADD REPLY • link 3.7 years ago Kevin Blighe ★ 4.0k