Question

How to use DESeqDataSetFromHTSeqCount


biok0423 ▴ 20

@biok0423-23341

Last seen 4.6 years ago

I am analyzing FPKM RNA-seq data downloaded from the DGC database. I am trying to identify genes that are differentially expressed between samples using DESeqDataSetFromHTSeqCount, but I get this error:

Error in DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, : Gene IDs (first column) differ between files.

I don't know how to resolve it and would appreciate suggestions. Here are the commands I ran.

`

setwd("~/Download/GC/FPKMs")
directory <- "~/Download/GC/FPKMs"
library(DESeq2)
sampletable <- data.frame(sampleName = samplesheet$path, fileName = samplesheet$path, condition = samplesheet$label)
sampletable

     sampleName      fileName condition
1 ECsample1.txt ECsample1.txt         1
2 ECsample2.txt ECsample2.txt         0
3 ECsample3.txt ECsample3.txt         1
 [ reached 'max' / getOption("max.print") -- omitted 249 rows ]

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampletable, directory = directory, design = ~ condition)

Error in DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, : Gene IDs (first column) differ between files.

`

The content of each sample file looks like this:

ENSG00000242268.2 0.0
ENSG00000270112.3 0.00258202876781
ENSG00000167578.15 3.30893315419
ENSG00000273842.1 0.0
ENSG00000078237.5 8.05933601781
ENSG00000146083.10 15.0446810186
ENSG00000225275.4 0.0
ENSG00000158486.12 0.221675972087

Could you advise what kind of data and format I need? Thank you!

deseq2 • 3.2k views

updated 4.6 years ago by swbarnes2 ★ 1.4k • written 4.6 years ago by biok0423 ▴ 20

score 0 · Answer 1 · 2020-04-15


swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 8 hours ago

San Diego

I am analyzing FPKM RNA-seq data

This never ends well. FPKM is unsuitable for DESeq. It wants raw counts only. If you lie and pretend that's what you have when it's not, you can't trust your results.
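A quick way to tell whether a file holds raw counts or normalized values such as FPKM is to check that the second column contains only non-negative whole numbers. The following is a minimal sketch in base R under stated assumptions: `looks_like_raw_counts` is a hypothetical helper (not part of DESeq2), the files are assumed to be headerless and tab-delimited with a gene ID column followed by a value column, and the toy file names are made up for illustration.

```r
# Hypothetical helper (not a DESeq2 function): TRUE if every value in the
# second column is a non-negative whole number, as raw read counts would be.
looks_like_raw_counts <- function(path) {
  tab <- read.table(path, header = FALSE, sep = "\t",
                    col.names = c("gene_id", "value"),
                    stringsAsFactors = FALSE)
  all(!is.na(tab$value) & tab$value >= 0 & tab$value == round(tab$value))
}

# Toy files written to a temporary directory, for illustration only.
tmp <- tempdir()
fpkm_file <- file.path(tmp, "toy_fpkm.txt")
count_file <- file.path(tmp, "toy_counts.txt")
writeLines(c("ENSG00000242268.2\t0.0",
             "ENSG00000167578.15\t3.30893315419"), fpkm_file)
writeLines(c("ENSG00000242268.2\t0",
             "ENSG00000167578.15\t512"), count_file)

fpkm_ok <- looks_like_raw_counts(fpkm_file)    # fractional values present
count_ok <- looks_like_raw_counts(count_file)  # whole numbers only
print(c(fpkm = fpkm_ok, counts = count_ok))
```

Passing this check does not prove that a file holds raw counts (rounded FPKM values would pass too), but failing it is a strong sign that the values have been normalized.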

written 4.6 years ago by swbarnes2 ★ 1.4k

Thank you for the answer. I understand that raw counts should be provided to DESeq, but I suspect this is not the cause of the error.

What does DESeq use as the reference for gene_id? In the example case, I can find a GFF file that contains the gene IDs and their transcripts. There is also a CSV file that combines the read data of all samples with gene IDs, in addition to the read txt file for each sample.

written 4.6 years ago by biok0423 ▴ 20
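Going by the wording of the error message, the comparison is between the first columns of the sample files themselves, not against a GFF or any other annotation file. A minimal base R sketch of how to find the offending files is shown here; `compare_gene_ids` and `build_matrix` are hypothetical helpers, the toy files and gene names are invented, and the files are assumed to be headerless and tab-delimited.

```r
# Read only the first (gene ID) column of a two-column, tab-delimited file.
read_ids <- function(path) {
  read.table(path, header = FALSE, sep = "\t", stringsAsFactors = FALSE)[[1]]
}

# Hypothetical diagnostic: compare each file's gene IDs with the first file.
compare_gene_ids <- function(paths) {
  ids <- lapply(paths, read_ids)
  ref <- ids[[1]]
  data.frame(
    file = basename(paths),
    n_genes = vapply(ids, length, integer(1)),
    same_set = vapply(ids, function(x) setequal(x, ref), logical(1)),
    same_order = vapply(ids, function(x) identical(x, ref), logical(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: keep the gene IDs shared by all files, in one order,
# and bind the value columns into a genes-by-samples matrix.
build_matrix <- function(paths) {
  tabs <- lapply(paths, read.table, header = FALSE, sep = "\t",
                 stringsAsFactors = FALSE)
  common <- Reduce(intersect, lapply(tabs, `[[`, 1))
  mat <- do.call(cbind, lapply(tabs, function(t) t[[2]][match(common, t[[1]])]))
  rownames(mat) <- common
  colnames(mat) <- basename(paths)
  mat
}

# Toy files: b.txt has the same genes in another order, c.txt lacks one gene.
tmp <- tempdir()
paths <- file.path(tmp, c("a.txt", "b.txt", "c.txt"))
writeLines(c("g1\t10", "g2\t0", "g3\t5"), paths[1])
writeLines(c("g3\t7", "g1\t3", "g2\t1"), paths[2])
writeLines(c("g1\t4", "g2\t2"), paths[3])

report <- compare_gene_ids(paths)
print(report)
mat <- build_matrix(paths)
print(mat)
```

Files flagged with `same_order = FALSE` are the ones that would trigger the "Gene IDs (first column) differ between files" error. If the values are genuine raw counts, an aligned matrix like `mat` could instead be passed to `DESeqDataSetFromMatrix` together with the sample table; that route does not make FPKM values suitable input.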