Hello,
I am a novice for R and bioinfomatics. I am trying to run DESeq2 with my raw count table. I have been trying to follow the beginner's guide for the DESeq2 package, but it is still hard to understand because my experimental condition is different from the example. I tried to run with a kind help of poinsoAlien, but encounter the error. Would you please help me to figure it out? Thank you in advance for your help.
Here is my experiemtal design and the code I tried.
- Treatment: Control vs Treated
- Animal group: each treatment has 4 genotypes in duplicate.
Control (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).
Treated (genotype1-1, genotype1-2, genotype2-1, genotype2-2, genotype3-1, genotype3-2, genotype4-1, genotype4-2).
- Model: Treatment, Animal group, Aniaml * Treatment (interaction of animal and treatment).
I want to run DESeq2 with 2-way ANOVA-like frame work (2 treatment X 4 genotypes).
- Code
> sampleInfo <- read.csv("~/Desktop/KiPBSvsPIC.csv")
> head(sampleInfo)
ID Genotype1.1 Genotype1.2 Genotype2.1 Genotype2.2 Genotype3.1
1 unigene22398972 223736 97095 182659 161499 288360
2 unigene22407675 197279 218530 241506 194318 189604
3 unigene22402019 102051 50008 68617 60005 98462
4 unigene22387256 82289 93196 88090 69212 67281
5 unigene22410725 26403 51824 35820 36544 21938
6 unigene22400817 45626 20860 28781 24550 43365
Genotype3.2 Genotype4.1 Genotype4.2 Genotype1.1_treated Genotype1.2_treated
1 280347 160611 239753 162563 51277
2 158650 184416 173256 173624 185385
3 138085 39193 79512 68575 28035
4 55019 62363 71076 54943 58875
5 22089 33376 21841 31780 27587
6 55484 15369 30763 26308 11168
Genotype2.1_treated Genotype2.2_treated Genotype3.1_treated
1 286391 92844 243987
2 161664 154975 187180
3 93280 30827 88396
4 58179 54870 82084
5 32564 25289 27143
6 40633 13106 34436
Genotype3.2_treated Genotype4.1_treated Genotype4.2_treated
1 314833 120738 105184
2 149823 173155 163640
3 136919 33087 27698
4 55364 99173 80765
5 18275 33866 26191
6 54273 13731 10517
> coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))
> coldata$treatment = factor(x = coldata$treatment,levels = c('treated','control'))
> coldata
group treatment
genotype1-1 gt1 control
genotype1-2 gt1 control
genotype2-1 gt2 control
genotype2-2 gt2 control
genotype3-1 gt3 control
genotype3-2 gt3 control
genotype4-1 gt4 control
genotype4-2 gt4 control
genotype1-1_treated gt1 treated
genotype1-2_treated gt1 treated
genotype2-1_treated gt2 treated
genotype2-2_treated gt2 treated
genotype3-1_treated gt3 treated
genotype3-2_treated gt3 treated
genotype4-1_treated gt4 treated
genotype4-2_treated gt4 treated
> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in as.matrix(countData) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error: object 'countMatrix' not found
Calls: DESeqDataSetFromMatrix -> as.matrix
Thank you guy for your help.
Sorry again, I am in trouble with another error below.
> library(DESeq2)
> library("DESeq2")
> countMatrix <- read.table ("~/Desktop/KiPBSvsPIC.txt", header=TRUE)
> head (countMatrix)
ID Genotype1.1 Genotype1.2 Genotype2.1 Genotype2.2 Genotype3.1
1 unigene22398972 223736 97095 182659 161499 288360
2 unigene22407675 197279 218530 241506 194318 189604
3 unigene22402019 102051 50008 68617 60005 98462
4 unigene22387256 82289 93196 88090 69212 67281
5 unigene22410725 26403 51824 35820 36544 21938
6 unigene22400817 45626 20860 28781 24550 43365
Genotype3.2 Genotype4.1 Genotype4.2 Genotype1.1_treated Genotype1.2_treated
1 280347 160611 239753 162563 51277
2 158650 184416 173256 173624 185385
3 138085 39193 79512 68575 28035
4 55019 62363 71076 54943 58875
5 22089 33376 21841 31780 27587
6 55484 15369 30763 26308 11168
Genotype2.1_treated Genotype2.2_treated Genotype3.1_treated
1 286391 92844 243987
2 161664 154975 187180
3 93280 30827 88396
4 58179 54870 82084
5 32564 25289 27143
6 40633 13106 34436
Genotype3.2_treated Genotype4.1_treated Genotype4.2_treated
1 314833 120738 105184
2 149823 173155 163640
3 136919 33087 27698
4 55364 99173 80765
5 18275 33866 26191
6 54273 13731 10517
> #form coldata
> coldata = data.frame(row.names = c('genotype1-1', 'genotype1-2', 'genotype2-1', 'genotype2-2', 'genotype3-1', 'genotype3-2', 'genotype4-1', 'genotype4-2','genotype1-1_treated', 'genotype1-2_treated', 'genotype2-1_treated', 'genotype2-2_treated', 'genotype3-1_treated', 'genotype3-2_treated', 'genotype4-1_treated', 'genotype4-2_treated'),group = rep(c("gt1","gt2","gt3","gt4"),2,each = 2),treatment = rep(c("control","treated"),each = 8))
> #make factors for treatment column.
> coldata$treatment = factor(x = coldata$treatment,levels = c('control','treated'))
> coldata
group treatment
genotype1-1 gt1 control
genotype1-2 gt1 control
genotype2-1 gt2 control
genotype2-2 gt2 control
genotype3-1 gt3 control
genotype3-2 gt3 control
genotype4-1 gt4 control
genotype4-2 gt4 control
genotype1-1_treated gt1 treated
genotype1-2_treated gt1 treated
genotype2-1_treated gt2 treated
genotype2-2_treated gt2 treated
genotype3-1_treated gt3 treated
genotype3-2_treated gt3 treated
genotype4-1_treated gt4 treated
genotype4-2_treated gt4 treated
> #construct DESeq object from input matrix (countdata)
> dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group + treatment + group:treatment)
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject
In addition: Warning message:
In sort(rownames(colData)) == sort(colnames(countData)) :
longer object length is not a multiple of shorter object length
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
Calls: DESeqDataSetFromMatrix ... .local -> new -> initialize -> initialize -> validObject
Look at the error message: 'colData' nrow differs from 'assays' ncol
This could admittedly be written a bit more clearly, but it means: The number of rows ("nrow") of colData differs from the number of columns ("ncol") of the assays matrix (i.e., "countMatrix").
This is because the countMatrix contains one more column, namely the gene IDs in the first column. If you use
the first column in the file (i.e., the gene IDs) will be placed into the row names slot rather than into a separate column.
hi,
If you are just starting R, I have a couple of recommendations:
1) you should pay very careful attention to the text of the messages which are printed, and try to figure out what is wrong on your own through exploration of the variables you have defined.
Some very quick notes on how to explore your workspace are here:
http://www.statmethods.net/interface/workspace.html
http://www.statmethods.net/input/contents.html
2) I'd recommend that you take a free R course. Some good ones are:
http://swirlstats.com/
https://www.coursera.org/course/rprog
Some people can jump into R without any prior experience, usually because they have experience with other programming languages like python or Matlab. But investing a little time to learn the very basics of R will pay off quickly, and you can avoid mistakes.
While jumping straight into a Bioconductor package with no R experience is technically possible, it's just going to be much easier if you spend a little time learning the basics first.
Thank you for your suggestion. I will do it as you recommended.