Question

How can I generate a heatmap and cluster differentially expressed genes in RNA-seq data?


mg.mahabad1365 • 0


Hi all, I ran my RNA-seq analysis with Cufflinks. I have two samples (B and S) with about 1,300 differentially expressed genes. I want to create a heatmap in R from the FPKM values, and I need an example script and input file to draw it. Here is the head of my "exp gene file", the expression output from cuffdiff:

test_id gene_id gene    locus   sample_1    sample_2    status  value_1 value_2 log2(fold_change)   test_stat   p_value q_value significant

XLOC_000001 XLOC_000001 -   NC_001717.1:1003-16642  B   S   OK  619.381 503.44  -0.299007   -0.10151    0.5686  0.841981    no
XLOC_000002 XLOC_000002 -   NC_001717.1:1003-16642  B   S   OK  33.9561 38.1908 0.169555    0.00317568  0.8953  0.975653    no
XLOC_000003 XLOC_000003 LOC110523613    NC_035077.1:26930-49145 B   S   OK  1.22995 1.66132 0.433728    0.623074    0.2776  0.630311    no
XLOC_000004 XLOC_000004 LOC110523749    NC_035077.1:145369-149364   B   S   OK  1.41429 1.71394 0.277236    0.54581 0.33515 0.681568    no
XLOC_000005 XLOC_000005 LOC110523873    NC_035077.1:177503-190839   B   S   NOTEST  0.12787 0   -inf    0   1   1   no

I also have the "gene file" output from cuffdiff, with the following head:

tracking_id class_code  nearest_ref_id  gene_id gene_short_name tss_id  locus   length  coverage    B_FPKM  B_conf_lo   B_conf_hi   B_status    S_FPKM  S_conf_lo   S_conf_hi   S_status    
XLOC_000001 -   -   XLOC_000001 -   TSS1    NC_001717.1:1003-16642  -   -   619.381 0   2799.14 OK  503.44  0   1546.13 OK  
XLOC_000002 -   -   XLOC_000002 -   TSS2    NC_001717.1:1003-16642  -   -   33.9561 0   2312.08 OK  38.1908 0   1232.13 OK  
XLOC_000003 -   -   XLOC_000003 LOC110523613    TSS3    NC_035077.1:26930-49145 -   -   1.22995 0.339538    2.12037 OK  1.66132 0.601249    2.72139 OK

I do not know how to use these files to create a heatmap. Could you please send me an R script and an example input file? I am aiming for something similar to the following article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6858811/

With regards

Heat Map. FPKM. p_value. FDR • 1.9k views

ADD COMMENT • link 4.4 years ago mg.mahabad1365 • 0


Thank you, Dr. Kevin. However, I have two groups (disease and control), each with three replicates, and I analyzed them on Linux with cuffdiff. I just want to draw the heatmap from that information.

ADD REPLY • link 4.4 years ago mg.mahabad1365 • 0


Hi, then you need to extract the per-sample expression levels from your other Cufflinks output files.
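As a rough sketch of what that extraction could look like: cuffdiff normally writes a long-format file called genes.read_group_tracking with one row per gene and replicate, and it can be pivoted into a genes x samples matrix. The values below are hypothetical stand-ins (only the column names tracking_id, condition, replicate and FPKM are meant to mirror that file), so check them against your own output before relying on this.

```r
# Hypothetical long-format table mimicking genes.read_group_tracking:
# 2 genes x 2 conditions (B, S) x 3 replicates. The FPKM values are made up.
tracking <- data.frame(
  tracking_id = rep(c("XLOC_000001", "XLOC_000002"), each = 6),
  condition   = rep(rep(c("B", "S"), each = 3), times = 2),
  replicate   = rep(0:2, times = 4),
  FPKM        = c(600, 640, 615, 510, 495, 505,
                  30, 36, 34, 40, 37, 38),
  stringsAsFactors = FALSE)

# One column per replicate, e.g. "B_0", "B_1", ..., "S_2"
sample_id <- paste(tracking$condition, tracking$replicate, sep = "_")

# Pivot to a genes x samples matrix; each gene/sample pair occurs once,
# so sum() simply passes the single value through.
fpkm_mat <- tapply(tracking$FPKM,
                   list(tracking$tracking_id, sample_id),
                   sum)
print(fpkm_mat)
```

With a real file, the data frame would instead come from something like read.delim("genes.read_group_tracking"), and the resulting matrix has enough columns for column clustering to be meaningful.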

ADD REPLY • link 4.4 years ago Kevin Blighe ★ 4.0k

score 0 · Answer 1 · 2020-05-21

Hi,

This does not appear to relate to any particular Bioconductor package...? Also, the old TopHat / Cuffdiff pipeline has been superseded by HISAT2 / StringTie, which integrates more easily with Bioconductor via a Python script that exports data for DESeq2 and edgeR: Using StringTie with DESeq2 and edgeR.

With only 2 samples in your study, the data is not great and you would struggle to publish this work, I think. Nor is it suitable for clustering, but you could probably still generate a heatmap by disabling clustering on samples / columns. Unfortunately, FPKM expression units are also unsuitable for any type of differential expression analysis. If this is just for training purposes, then that is okay.

First, you need to extract the FPKM values from your data, and likely subset them to the statistically significant genes. You can use the exp gene file for this. So, if the exp gene file is stored in an object called res:

res_significant <- subset(res, significant == 'yes')
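The line above assumes res already holds the cuffdiff table. A minimal sketch of loading it follows; it assumes the file is tab-delimited with a header row. So that the block runs on its own, three rows copied from the question are written to a temporary file, which stands in for your real file. The status filter is an optional extra, not part of the original suggestion.

```r
# Header and three rows copied from the question's cuffdiff output.
header <- c("test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
            "status", "value_1", "value_2", "log2(fold_change)",
            "test_stat", "p_value", "q_value", "significant")
rows <- list(
  c("XLOC_000001", "XLOC_000001", "-", "NC_001717.1:1003-16642", "B", "S",
    "OK", "619.381", "503.44", "-0.299007", "-0.10151", "0.5686",
    "0.841981", "no"),
  c("XLOC_000002", "XLOC_000002", "-", "NC_001717.1:1003-16642", "B", "S",
    "OK", "33.9561", "38.1908", "0.169555", "0.00317568", "0.8953",
    "0.975653", "no"),
  c("XLOC_000005", "XLOC_000005", "LOC110523873",
    "NC_035077.1:177503-190839", "B", "S",
    "NOTEST", "0.12787", "0", "-inf", "0", "1", "1", "no"))

# Write a tab-delimited temp file; replace `path` with your own file path.
path <- tempfile(fileext = ".diff")
writeLines(vapply(c(list(header), rows), paste, character(1),
                  collapse = "\t"), path)

# check.names = FALSE keeps the column name "log2(fold_change)" unchanged.
res <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)

# Optional: drop genes that cuffdiff did not test (status other than "OK").
res_ok <- subset(res, status == "OK")
print(res_ok[, c("gene_id", "value_1", "value_2", "significant")])
```

None of the rows shown in the question are flagged significant, so the 'yes' subset only returns rows when run on the full file.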

Then, extract out the FPKM values:

res_significant <- data.matrix(
  data.frame(
    res_significant[,c('value_1','value_2')],
    row.names = as.character(res_significant$gene_id)))
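One optional extra step, not part of the suggestion above: FPKM values can span several orders of magnitude (619 versus 1.2 in the rows shown in the question), so a few highly expressed genes tend to dominate the colour scale. A common convention is to plot log2(FPKM + 1) instead. The sketch below uses three genes from the question; the pseudocount of 1 is an assumption, chosen so that zero FPKM stays finite.

```r
# FPKM values for three genes from the question (B = value_1, S = value_2).
fpkm <- matrix(c(619.381, 503.44,
                 33.9561, 38.1908,
                 1.22995, 1.66132),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("XLOC_000001", "XLOC_000002", "XLOC_000003"),
                               c("B", "S")))

# Log-transform with a pseudocount so that FPKM = 0 maps to 0, not -Inf.
log_fpkm <- log2(fpkm + 1)
print(round(log_fpkm, 2))

# Row z-scores (centre and scale each gene across samples).
z <- t(scale(t(log_fpkm)))
print(round(z, 3))
```

Note what the z-scores show: with only two columns, every row collapses to about +0.707 and -0.707, so row scaling carries no information beyond the direction of change. With two samples it is probably better to plot the log values unscaled.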

After that, you can use any heatmap function, including:

gplots::heatmap.2
ComplexHeatmap::Heatmap
pheatmap::pheatmap

I provide some code for FPKM data on Biostars: Question: Heatmap based with FPKM values

Here also is a quick and very simple example where I disable sample / column clustering:

mat <- data.matrix(data.frame(
  S1 = c(1,2,3,4),
  S2 = c(3,4,5,6),
  row.names = c('a','b','c','d')))
gplots::heatmap.2(mat, Colv = FALSE, dendrogram = 'row', keysize = 1.0)
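If you also want the gene clusters themselves, and not only the picture, the row clustering can be computed explicitly. By default heatmap.2 uses dist (Euclidean distance) and hclust (complete linkage) for its dendrograms, so the base R sketch below reproduces that step on a small hypothetical matrix and cuts the tree into two groups. The values and the choice of k = 2 are illustrative assumptions.

```r
# Hypothetical toy matrix: two low-expression and two high-expression genes.
mat <- data.matrix(data.frame(
  S1 = c(1, 2, 9, 10),
  S2 = c(3, 4, 12, 13),
  row.names = c("a", "b", "c", "d")))

# Same defaults as heatmap.2: Euclidean distance, complete linkage.
row_hc <- hclust(dist(mat))

# Cut the row dendrogram into k groups (k = 2 is an arbitrary choice here).
clusters <- cutree(row_hc, k = 2)
print(clusters)

# List the genes belonging to each cluster.
print(split(names(clusters), clusters))
```

The same tree can be handed back to heatmap.2 with Rowv = as.dendrogram(row_hc), so that the plotted dendrogram and the cluster labels agree.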

Kevin