There are two genotypes, 216 and 218.
Three developmental stages: 5 weeks (5W), 7W and 9W.
Three tissues: Ca, Co and Pa,
each with 2 biological replicates.
Only 218_5W_Ca has a single replicate.
My aim is to do pairwise comparisons and extract the genes upregulated in Ca tissue, first at a specific developmental stage within one genotype and then compared with the other genotype.
> countMatrix = read.table("count.row.txt",header=T,sep='\t',check.names=F)
> dim(countMatrix)
[1] 57894 34
Now I am not sure how to construct a DESeqDataSet: how do I make the colData and the design formula?
dds <- DESeqDataSetFromMatrix(countData = countMatrix, colData = colData, design = ~ condition)
I tried to make colData with the command below:
- > colData <- data.frame(genotypes = c('216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218'), development_stage = c('5W','5W','5W','5W','5W','5W','7W','7W','7W','7W','9W','9W','9W','9W','9W','9W','5W','5W','5W','5W','5W','7W','7W','7W','7W','7W''7W','9W','9W',9W','9W','9W','9W','9W','9W'),Tissue_type = c('Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa'))
Error:
- Error: unexpected string constant in "8','218','218','218','218',......
Regarding the error, try to write out the information in Excel (export to CSV) or in a text editor using TSV format and then read into R with read.csv or read.delim.
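As a side note, the error itself comes from the hand-typed vector: there is a missing comma in '7W''7W' and a missing opening quote in 9W'. One way to avoid typing the vectors at all is to derive the sample table from the sample names. The following is only a minimal sketch, assuming every name follows the genotype_stage_tissue-plus-replicate pattern seen in this thread; the five names are a hypothetical subset used for illustration.

```r
# Hypothetical subset of the sample names used in this thread
samples <- c("216_5W_Ca1", "216_5W_Ca2", "216_5W_Co1", "218_5W_Ca1", "218_7W_Pa2")

# Split each name on "_" into genotype, stage, and tissue-plus-replicate
parts <- do.call(rbind, strsplit(samples, "_", fixed = TRUE))

colData <- data.frame(
  Genotypes         = factor(parts[, 1]),
  Development_stage = factor(parts[, 2]),
  Tissue            = factor(sub("[0-9]+$", "", parts[, 3])),  # drop the replicate number
  row.names         = samples
)
print(colData)
```

With the real data, `samples` would be the sample columns of the count matrix, so the row names of the table match those column names by construction.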
Thanks, I prepared the colData in an Excel sheet and read it into R as a CSV file.
colData <- read.csv("sampleInfo.csv", check.names=F)
> head (colData)
             Genotypes Development_stage Tissue
1 216_5W_Ca1       216                5W     Ca
2 216_5W_Ca2       216                5W     Ca
3 216_5W_Co1       216                5W     Co
But now I have another small issue: the colData row names and the countMatrix column names do not match:
> all(rownames(colData) %in% colnames(countMatrix))
[1] FALSE
colnames(countMatrix)
 [1] ""           "216_5W_Ca1" "216_5W_Ca2" "216_5W_Co1" "216_5W_Co2"
 [6] "216_5W_Pa1" "216_5W_Pa2" "216_7W_Ca1" "216_7W_Ca2" "216_7W_Pa1"
[11] "216_7W_Pa2" "216_9W_Ca1" "216_9W_Ca2" "216_9W_Co1" "216_9W_Co2"
[16] "216_9W_Pa1" "216_9W_Pa2" "218_5W_Ca1" "218_5W_Co1" "218_5W_Co2"
[21] "218_5W_Pa1" "218_5W_Pa2" "218_7W_Ca1" "218_7W_Ca2" "218_7W_Co1"
[26] "218_7W_Co2" "218_7W_Pa1" "218_7W_Pa2" "218_9W_Ca1" "218_9W_Ca2"
[31] "218_9W_Co1" "218_9W_Co2" "218_9W_Pa1" "218_9W_Pa2"
> rownames(colData)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
[31] "31" "32" "33"
> head(colData)
             Genotypes Development_stage Tissue
1 216_5W_Ca1       216                5W     Ca
2 216_5W_Ca2       216                5W     Ca
3 216_5W_Co1       216                5W     Co
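The output above suggests two things: the sample names sit in the first column of colData rather than in its row names, and the first column of countMatrix (the one named "") probably holds the gene IDs. The following is a minimal sketch of lining the two objects up, using toy data with hypothetical values; the object name `counts` is introduced here for illustration only.

```r
# Toy count table: gene IDs in the first column, sample columns in a different order
countMatrix <- data.frame(
  gene         = c("g1", "g2", "g3"),
  `216_5W_Ca2` = c(5L, 0L, 12L),
  `216_5W_Ca1` = c(7L, 1L, 10L),
  check.names  = FALSE
)

# Toy sample table: sample names in the first column, default row names 1, 2, ...
colData <- data.frame(
  Sample            = c("216_5W_Ca1", "216_5W_Ca2"),
  Genotypes         = c("216", "216"),
  Development_stage = c("5W", "5W"),
  Tissue            = c("Ca", "Ca")
)

# Move the sample names into the row names
rownames(colData) <- colData[[1]]
colData <- colData[, -1]

# Move the gene IDs into the row names so only sample columns remain
counts <- as.matrix(countMatrix[, -1])
rownames(counts) <- countMatrix[[1]]

# Both objects must describe the same set of samples
stopifnot(setequal(colnames(counts), rownames(colData)))

# Reorder the count columns to follow the sample table, then confirm the order
counts <- counts[, rownames(colData)]
print(all(rownames(colData) == colnames(counts)))
```

Checking with `==` rather than `%in%` also confirms that the order matches, which matters because the count columns and the sample rows are paired by position.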
Thanks.
I did it and I also understand it:
> SampleInfo <- colData[,-1] # remove first column
> rownames(SampleInfo) <- colData[,1] # and then add first column as row names.
Now I need to understand the design:
for a pairwise comparison such as 216_5W_Ca1 vs 216_5W_Co1,
or for a multifactor design, e.g. upregulated and downregulated genes for 216_5W_Ca1 vs 216_7W_Ca1 compared with 218_5W_Ca1 vs 218_7W_Ca1.
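On the software side only, one option described in the DESeq2 vignette is to paste the factors into a single grouping variable and name the two groups of interest in `results()`. The following is a rough sketch under stated assumptions, not a recommendation for this experiment: the sample subset, the simulated counts, the `group` column name and the 0.05 cutoff are all hypothetical, and whether this design suits the biological question is a separate matter.

```r
# Toy sample table (hypothetical subset of the samples in this thread)
sampleInfo <- data.frame(
  Genotypes         = c("216", "216", "216", "216", "218", "218"),
  Development_stage = c("5W", "5W", "5W", "5W", "7W", "7W"),
  Tissue            = c("Ca", "Ca", "Co", "Co", "Ca", "Ca"),
  row.names = c("216_5W_Ca1", "216_5W_Ca2", "216_5W_Co1",
                "216_5W_Co2", "218_7W_Ca1", "218_7W_Ca2")
)

# One combined factor: each level is a genotype_stage_tissue combination
sampleInfo$group <- factor(with(sampleInfo,
  paste(Genotypes, Development_stage, Tissue, sep = "_")))

# Number of replicates in each group
print(table(sampleInfo$group))

# The DESeq2 part runs only if the package is installed
if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts, for illustration only
  counts <- matrix(rnbinom(500 * nrow(sampleInfo), mu = 100, size = 5),
                   ncol = nrow(sampleInfo),
                   dimnames = list(paste0("gene", 1:500), rownames(sampleInfo)))
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = sampleInfo,
                                        design = ~ group)
  dds <- DESeq2::DESeq(dds)
  # Log2 fold change of Ca over Co within genotype 216 at 5W
  res <- DESeq2::results(dds, contrast = c("group", "216_5W_Ca", "216_5W_Co"))
  # Assumed cutoff: adjusted p-value below 0.05 and a positive fold change
  up <- res[which(res$padj < 0.05 & res$log2FoldChange > 0), ]
}
```

The contrast vector lists the factor name, then the numerator level, then the denominator level, so a positive log2 fold change here means higher expression in Ca than in Co.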
I'll refer to my answer above.
You need to collaborate with a local statistician or bioinformatician for questions about choosing which statistical analysis to perform. The support site is for software questions.
You need to collaborate with a bioinformatician.
I'm sorry but I can't answer questions like this on this support site.