Generate Heatmap from Gene Names after DESEQ2
1
0
Entering edit mode
uhlkatie • 0
@uhlkatie-20770
Last seen 2.3 years ago
United States

Enter the body of text here

Code should be placed in three backticks as shown below


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )

Hello,

I have a tentative grasp on DESEQ2 I am trying to generate a heatmap (and other DEG visualization tools) using my output from DESEQ2. I followed the tutorial (http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-un-normalized-counts) starting with salmon files and then using tximeta to create a summarized experiment. I performed DESEQ2 on my summarized experiment:

se <- tximeta(coldata)
library(DESeq2)
ddsTxi <- DESeqDataSet(se, design = ~ Condition)
ddsTxi

The object ddsTxi looks something like this:

Sample1     Sample2    Sample3    Sample4     Sample5
ENST00000456328.2        1.850       1.333      3.845      0.000       6.181
ENST00000450305.2        0.000       0.000      0.000      0.000       0.000
ENST00000488147.1       84.946     140.966    117.379    142.208      71.022

And then DESEQ2:

dds <- DESeq(dds)
res <- results(dds)
res
mcols(res, use.names = TRUE)

Which looks something like this:

                     baseMean log2FoldChange     lfcSE        stat    pvalue
                    <numeric>      <numeric> <numeric>   <numeric> <numeric>
ENST00000456328.2   13.750573     -0.2570229  1.268077 -0.20268715  0.839380
ENST00000488147.1  119.249079      0.2125775  0.305591  0.69562687  0.486663
ENST00000473358.1    0.287952     -0.0662034  7.678081 -0.00862239  0.993120

And I know that from there I can perform comparisons between any of my variables/conditions. My first issue is that I would really like the transcript IDs to be replaced by the Gene names. I've used Biomart before to convert the transcript ID to ensmbl gene names in the past. But when I do this, I end up with multiple transcripts for the same gene. I will look at the fold changes for Gene A, only to find that there are three different entries. Is there any way to generate results tables that take this into account and only have one entry per gene? I need to generate heatmaps for the different experimental comparisons, but for that I would also need a gene counts table with one entry per gene. I will take any advice on how to do the conversion, download a gene counts table, or any good packages for generating heatmaps with DESEQ2 data.

Thank you!

DESeq2 DESEQ2 tximeta SummarizedExperiment heatmaps • 1.7k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 5 days ago
United States

Have you read the tximeta vignette, particularly the part about summarizing to gene level and changing the identifiers?

ADD COMMENT
0
Entering edit mode

I just looked through the vignette and that helped! I was able to follow it through to create the "gse" object to use as input for DESEQ2. My question now is if there is a way to save that object, (which I'm presuming are the gene-level counts for each sample), as a data frame, matrix, or anything I can use outside of R? I tried the write.csv command and got an error message saying, "Error in as.vector(x) : no method for coercing this S4 class to a vector".

ADD REPLY
0
Entering edit mode

The object you wish to save acts like a data.frame in order to shield end users from the complications of trying to consistently manipulate several different things at once. The 'dds' (gse?) object is actually comprised of multiple objects; one or more matrices of the underlying count (and possible modifications of the counts) data, a DataFrame that describes the samples, and a GRanges object that describes the rows. The show method gives you a peek at all the data in an object.

> fake.o <- makeExampleDESeqDataSet()
> fake.o
class: DESeqDataSet 
dim: 1000 12 
metadata(1): version
assays(1): counts
rownames(1000): gene1 gene2 ... gene999 gene1000
rowData names(3): trueIntercept trueBeta trueDisp
colnames(12): sample1 sample2 ... sample11 sample12
colData names(1): condition

And you can inspect each underlying object using the relevant accessor methods.

> colData(fake.o)
DataFrame with 12 rows and 1 column
         condition
          <factor>
sample1          A
sample2          A
sample3          A
sample4          A
sample5          A
...            ...
sample8          B
sample9          B
sample10         B
sample11         B
sample12         B
> rowData(fake.o)
DataFrame with 1000 rows and 3 columns
         trueIntercept  trueBeta  trueDisp
             <numeric> <numeric> <numeric>
gene1        0.0455954         0  3.975559
gene2        2.0951366         0  1.036184
gene3        2.3462632         0  0.886619
gene4        3.8920042         0  0.369433
gene5        6.0617779         0  0.159880
...                ...       ...       ...
gene996        2.51762         0  0.798522
gene997        5.45743         0  0.191035
gene998        6.19437         0  0.154622
gene999        1.26165         0  1.768266
gene1000       2.49833         0  0.807924
> rowRanges(fake.o)
GRanges object with 1000 ranges and 3 metadata columns:
           seqnames       ranges strand | trueIntercept  trueBeta  trueDisp
              <Rle>    <IRanges>  <Rle> |     <numeric> <numeric> <numeric>
     gene1        1        1-100      * |     0.0455954         0  3.975559
     gene2        1      101-200      * |     2.0951366         0  1.036184
     gene3        1      201-300      * |     2.3462632         0  0.886619
     gene4        1      301-400      * |     3.8920042         0  0.369433
     gene5        1      401-500      * |     6.0617779         0  0.159880
       ...      ...          ...    ... .           ...       ...       ...
   gene996        1  99501-99600      * |       2.51762         0  0.798522
   gene997        1  99601-99700      * |       5.45743         0  0.191035
   gene998        1  99701-99800      * |       6.19437         0  0.154622
   gene999        1  99801-99900      * |       1.26165         0  1.768266
  gene1000        1 99901-100000      * |       2.49833         0  0.807924
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> dim(assay(fake.o))
[1] 1000   12

> head(assay(fake.o))
      sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
gene1       0       2       0       0       0       0       0       0       1
gene2       5       2      12       1      10       0       2       3       4
gene3       2      18       5       7       0       3       3       2       1
gene4       2       5      10      17       9       8       8      14      27
gene5      39      30      40      68      55     104      32      19      51
gene6       5       7       0       1       1       3       0       4       0
      sample10 sample11 sample12
gene1        0        2        0
gene2        1        5        0
gene3       11        1        4
gene4       18       29        5
gene5       59       61      109
gene6        0        0       14
>

And after running DESeq on that object, there are even more things in the resulting DESeqDataSet object. So 'writing to csv' isn't such a simple thing, and I would argue is unnecessary. To what end? All these objects are intended to be very complex, but simple to use, and there are any number of accessors that you can use to make plots or whatever. I can't figure out what you could do outside of R that wouldn't be easier to do from within R.

But anyway, you can extract anything you like from the object and then use write.csv or write.table directly.

ADD REPLY

Login before adding your answer.

Traffic: 456 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6