I am analyzing FPKM RNA-seq data downloaded from the DGC database. I am now trying to identify genes that are differentially expressed between samples using DESeqDataSetFromHTSeqCount, but I get the following error:
Error in DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, : Gene IDs (first column) differ between files.
I don't know how to fix this and would appreciate suggestions. Here are the commands I ran:
`
setwd("~/Download/GC/FPKMs")
directory <- "~/Download/GC/FPKMs"
library(DESeq2)
sampletable <- data.frame(sampleName = samplesheet$path, fileName = samplesheet$path, condition = samplesheet$label)
sampletable
     sampleName      fileName condition
1 ECsample1.txt ECsample1.txt         1
2 ECsample2.txt ECsample2.txt         0
3 ECsample3.txt ECsample3.txt         1
[ reached 'max' / getOption("max.print") -- omitted 249 rows ]
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampletable, directory = directory, design= ~ condition)
Error in DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, : Gene IDs (first column) differ between files.
`
The content of each sample file is like:
ENSG00000242268.2 0.0
ENSG00000270112.3 0.00258202876781
ENSG00000167578.15 3.30893315419
ENSG00000273842.1 0.0
ENSG00000078237.5 8.05933601781
ENSG00000146083.10 15.0446810186
ENSG00000225275.4 0.0
ENSG00000158486.12 0.221675972087
Could you advise what kind of data and file format I need? Thank you!
Thank you for the answer. I understand that raw counts should be provided to DESeq2. However, I still wonder whether that is really the cause of this error.
What does DESeq2 use as the reference for gene_id? In the example case, I can find a GFF file that contains the gene IDs and their transcripts, as well as a CSV file that combines the read data of all samples with gene IDs, in addition to the per-sample read txt files.
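
To narrow this down, one option is to compare the first column of every sample file against the first file. As far as I can tell from the error message, that is the comparison that fails, and no external GFF or CSV appears to be involved. Below is a minimal sketch in base R, assuming two-column, whitespace-separated files without a header. `find_mismatched_ids` is a hypothetical helper name, and the demo files are toy data (the IDs come from the excerpt above, the values are made up).

```r
# Hypothetical helper: returns the names of files whose first column
# (gene IDs, in order) is not identical to that of the first file.
find_mismatched_ids <- function(files, directory = ".") {
  ids <- lapply(files, function(fn) {
    tab <- read.table(file.path(directory, fn), header = FALSE,
                      stringsAsFactors = FALSE, fill = TRUE)
    as.character(tab[[1]])
  })
  names(ids) <- files
  reference <- ids[[1]]
  same <- vapply(ids, function(x) identical(x, reference), logical(1))
  names(same)[!same]
}

# Toy demonstration with three small files in a temporary directory.
demo_dir <- tempfile("fpkm_demo")
dir.create(demo_dir)
writeLines(c("ENSG00000242268.2\t0.0", "ENSG00000270112.3\t0.0025"),
           file.path(demo_dir, "a.txt"))
writeLines(c("ENSG00000242268.2\t1.5", "ENSG00000270112.3\t0.0"),
           file.path(demo_dir, "b.txt"))
# Same genes as a.txt, but in a different order.
writeLines(c("ENSG00000270112.3\t0.0", "ENSG00000242268.2\t2.0"),
           file.path(demo_dir, "c.txt"))

mismatched <- find_mismatched_ids(c("a.txt", "b.txt", "c.txt"), demo_dir)
print(mismatched)
```

On the real data this could be called as `find_mismatched_ids(sampletable$fileName, directory)`. The check is order-sensitive, so files that contain the same genes in a different order, or that differ in length, are reported as mismatches.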