Hi everyone,
I have several RNA-seq datasets: one control and one treated condition, each with 3 replicates (6 datasets in total). I already have the estimated counts for each gene from Kallisto. Can I use the count table from Kallisto as input for DESeq2 to get the normalized counts?
Thanks,
Yong
Hi Michael,
I'm working on B. subtilis 168.
First, I downloaded all the ORFs from a B. subtilis website and used them to build an index for Kallisto.
After Kallisto quantification was done for all 6 samples, I generated the count matrix for analysis in R: a script extracts the count data from the 6 independent output files and writes them to a new txt file. In the end, I ran the script and got the estimated count file for all 6 samples.
Thanks,
Yong
I moved this to a comment rather than an "answer", which should be an answer to the original question.
I was confused because I didn't know whether you were dealing with splicing (multiple isoforms per gene) or not. If you want to perform the analysis at the same level at which you did the quantification, then yes, you can just pass the estimated count matrix to DESeq2 using the DESeqDataSetFromMatrix function.
Regarding how to read in the column data (colData), I'd suggest you make an Excel sheet (and then export to CSV) or a CSV file in a text editor which contains all the information and read that in with read.csv() or the like. This way you can avoid mistakes which can happen using rep().
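As a minimal sketch of the CSV approach described above: the sample sheet is written to a temporary file here only so the example is self-contained. The sample names (ctrl_1, trt_1, ...), the "treatment" column, and the variable names are hypothetical placeholders, not taken from the actual experiment.

```r
# Hypothetical sample sheet; in practice this would be the CSV exported
# from Excel or typed in a text editor.
sample_sheet <- tempfile(fileext = ".csv")
writeLines(c(
  "sample,treatment",
  "ctrl_1,control",
  "ctrl_2,control",
  "ctrl_3,control",
  "trt_1,treated",
  "trt_2,treated",
  "trt_3,treated"
), sample_sheet)

# The first column becomes the row names, one row per sample.
coldata <- read.csv(sample_sheet, header = TRUE, row.names = 1,
                    stringsAsFactors = TRUE)

# Hypothetical column names of the count matrix, i.e. colnames(countMatrix).
count_columns <- c("ctrl_1", "ctrl_2", "ctrl_3", "trt_1", "trt_2", "trt_3")

# Every sample in the count matrix must be described in the sample sheet.
stopifnot(setequal(rownames(coldata), count_columns))

# Put the rows of coldata in the same order as the count matrix columns.
coldata <- coldata[count_columns, , drop = FALSE]
print(coldata)
```

The resulting data frame can then be passed as the colData argument of DESeqDataSetFromMatrix, with the count matrix as countData.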
Hi Michael,
Sorry to bother you again. I generated a .csv file for my dataset and typed the following code in R:
> countMatrix=read.csv("C:/Users/kingon001/Desktop/counts.csv",header=TRUE,row.names=1)
> head(countMatrix)
WT6A WT6B WT6C KO6A KO6B KO6C
dnaA 6594 7412 3401 9067 7344 3803
dnaN 4274 5048 3680 7368 7300 6571
yaaA 712 678 531 813 591 433
recF 11282 10414 6007 9485 8185 4954
yaaB 1246 1074 448 898 787 291
gyrB 22499 21507 6943 17273 14169 5334
> coldata=data.frame(row.names=c('WT6A', 'WT6B', 'WT6C','KO6A','KO6B','KO6C'),group=rep(c("WT6","KO6"),3,each=3),treatment=rep(c("control","treated"),each=3))
Error in data.frame(row.names = c("WT6A", "WT6B", "WT6C", "KO6A", "KO6B", :
row names supplied are of the wrong length.
Is this error related to the column order? I set the 1st column as row names. How can I avoid this error?
Yong
I'm suggesting that you create a CSV for the phenotypic data: which samples are in which group and which treatment, etc. This will make things easier for you and help to avoid sample labelling mistakes which can cause big errors downstream.
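For reference, the error in the data.frame() call above most likely comes from rep() rather than from the column order: combining times = 3 with each = 3 produces 18 values, while only 6 row names are supplied. The sketch below shows this, builds a six-row coldata directly in R (reading it from a CSV works equally well), and passes it to DESeq2 together with the six genes printed by head(countMatrix). The variable names bad_group, dds and normCounts are illustrative, and size factors estimated from six genes are only a demonstration; the full count matrix should be used in practice. The requireNamespace() guard is there only so the base R part runs where DESeq2 is not installed.

```r
# rep() with both 'times' and 'each' repeats the whole pattern three times.
bad_group <- rep(c("WT6", "KO6"), 3, each = 3)
length(bad_group)   # 18 values, but there are only 6 samples

samples <- c("WT6A", "WT6B", "WT6C", "KO6A", "KO6B", "KO6C")

# One value per sample: drop the extra 'times' argument.
coldata <- data.frame(
  row.names = samples,
  group     = factor(rep(c("WT6", "KO6"), each = 3), levels = c("WT6", "KO6")),
  treatment = factor(rep(c("control", "treated"), each = 3),
                     levels = c("control", "treated"))
)

# The six genes shown by head(countMatrix) in the post above.
countMatrix <- matrix(
  c( 6594,  7412, 3401,  9067,  7344, 3803,
     4274,  5048, 3680,  7368,  7300, 6571,
      712,   678,  531,   813,   591,  433,
    11282, 10414, 6007,  9485,  8185, 4954,
     1246,  1074,  448,   898,   787,  291,
    22499, 21507, 6943, 17273, 14169, 5334),
  nrow = 6, byrow = TRUE,
  dimnames = list(c("dnaA", "dnaN", "yaaA", "recF", "yaaB", "gyrB"), samples)
)

# Sample order must agree between the count matrix and coldata.
stopifnot(identical(rownames(coldata), colnames(countMatrix)))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = countMatrix,
                                        colData   = coldata,
                                        design    = ~ treatment)
  dds <- DESeq2::estimateSizeFactors(dds)
  # Counts divided by the per-sample size factors.
  normCounts <- DESeq2::counts(dds, normalized = TRUE)
  print(head(normCounts))
}
```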
Thanks!
Yong