Hello.
I have the clinical data for the TCGA PAAD patients (n=182). However, when I try to run a differential expression (DE) analysis, I cannot prepare the data for the dds object that starts the DESeq2 workflow.
I don't know what the problem is. I get several errors, and I could not resolve them even after searching the internet and going through the DESeq2 tutorial and other examples. Note that I do not have the raw HTSeq count data, so I tried to download it with the TCGAbiolinks package. I would appreciate your comments. I have also attached my colData to this email. Here is the code I ran:
library(TCGAbiolinks)
library(DESeq2)
query1 <- GDCquery(project = "TCGA-PAAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", file.type = "htseq.counts.gz", experimental.strategy = "RNA-Seq", legacy = F)
GDCdownload(query1, method = "api", files.per.chunk = 10)
maindata <- GDCprepare(query1)
tumorsamplesonly <- maindata[,maindata$sample_type_id == "01"]
coldata01 <- read.csv(file = "colData.csv", row.names = 1)
class(coldata01)
row.names(coldata01) <- coldata01$bcr_patient_barcode
coldata01$tumor_status <- factor(coldata01$tumor_status, levels = c("with_tumor", "tumor_free", ordered = F))
(At this point I get stuck on the following commands, which are listed in order.)
coldata01 <- coldata01[,!is.na(coldata01$tumor_status)]
(THE ERROR: Error in `[.data.frame`(coldata01, , !is.na(coldata01$tumor_status)) : undefined columns selected)
coldata01$tumor_status <- relevel(coldata01$tumor_status, ref = "low")
(THE ERROR: Error in relevel.factor(coldata01$tumor_status, ref = "low") : 'ref' must be an existing level)
colData(tumorsamplesonly) <- cbind(colData(tumorsamplesonly), coldata01)
(THE ERROR: Error in DataFrame(..., check.names = FALSE) : different row counts implied by arguments)
dds <- DESeqDataSet(maindata,coldata01, ~tumor_status)
Tip: I hope that the dollar sign between words in commands can be seen clearly. For example, in this command: coldata01$tumor_status
No, I have this column in my colData
Step through what you are doing:
Does this make any sense? You are subsetting columns based on rows. I don't see why you would want to do that.
Furthermore, go slow, and when you encounter an error, take a look at the individual pieces. You'll be able to figure things out on your own computer:
Before the line that gives an error, examine the pieces of the command:
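As an illustration of examining the pieces, here is a minimal sketch on a toy data frame. The object `coldata` and its values are made up for this example and only stand in for `coldata01`. It shows that the logical index has one entry per row, so it belongs before the comma, not after it.

```r
# Toy stand-in for coldata01 (hypothetical values): 5 samples (rows), 2 columns.
coldata <- data.frame(
  tumor_status = c("with_tumor", NA, "tumor_free", "with_tumor", NA),
  age = c(61, 70, 55, 48, 66),
  row.names = paste0("patient", 1:5)
)

# Piece 1: the logical index. It has one entry per ROW (sample).
keep <- !is.na(coldata$tumor_status)
length(keep)
dim(coldata)

# Piece 2: the subsetting call. With 'keep' after the comma, R looks for
# columns 1, 3 and 4 in a 2-column table, which fails.
res <- try(coldata[, keep], silent = TRUE)
inherits(res, "try-error")

# With 'keep' before the comma, the rows with a known tumor_status are kept.
coldata_complete <- coldata[keep, , drop = FALSE]
nrow(coldata_complete)
```

Comparing `length(keep)` with `dim(coldata)` before running the subsetting line is usually enough to see which side of the comma the index belongs on.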
[1] tumor_free tumor_free with_tumor tumor_free with_tumor with_tumor tumor_free with_tumor
[9] with_tumor with_tumor tumor_free with_tumor tumor_free with_tumor <NA> with_tumor
[17] with_tumor with_tumor with_tumor with_tumor with_tumor with_tumor tumor_free tumor_free
[25] with_tumor with_tumor <NA> with_tumor with_tumor tumor_free tumor_free tumor_free
[33] tumor_free tumor_free tumor_free with_tumor tumor_free tumor_free with_tumor with_tumor
[41] <NA> with_tumor with_tumor tumor_free with_tumor with_tumor with_tumor tumor_free
[49] with_tumor with_tumor with_tumor tumor_free tumor_free with_tumor <NA> tumor_free
[57] with_tumor <NA> <NA> with_tumor tumor_free tumor_free tumor_free with_tumor
[65] with_tumor tumor_free with_tumor with_tumor <NA> with_tumor with_tumor with_tumor
[73] with_tumor with_tumor with_tumor with_tumor with_tumor tumor_free <NA> <NA>
[81] with_tumor <NA> <NA> <NA> with_tumor with_tumor with_tumor with_tumor
[89] tumor_free tumor_free with_tumor with_tumor with_tumor tumor_free with_tumor with_tumor
[97] with_tumor with_tumor with_tumor tumor_free with_tumor tumor_free tumor_free with_tumor
[105] tumor_free with_tumor with_tumor with_tumor with_tumor tumor_free tumor_free with_tumor
[113] with_tumor with_tumor with_tumor with_tumor <NA> with_tumor tumor_free with_tumor
[121] tumor_free tumor_free <NA> with_tumor with_tumor with_tumor with_tumor with_tumor
[129] with_tumor with_tumor with_tumor with_tumor with_tumor with_tumor with_tumor tumor_free
[137] with_tumor with_tumor with_tumor tumor_free <NA> with_tumor tumor_free with_tumor
[145] with_tumor with_tumor tumor_free tumor_free tumor_free with_tumor with_tumor <NA>
[153] tumor_free with_tumor with_tumor <NA> with_tumor <NA> tumor_free tumor_free
[161] with_tumor tumor_free with_tumor <NA> with_tumor tumor_free with_tumor with_tumor
[169] with_tumor tumor_free <NA> with_tumor tumor_free with_tumor with_tumor tumor_free
[177] <NA> tumor_free tumor_free
Levels: with_tumor tumor_free FALSE
This is what I get from coldata01$tumor_status
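The printed levels (with_tumor, tumor_free, FALSE) point to two further issues in the code as posted. In the `factor()` call, `ordered = F` sits inside `c()`, so it appears to be coerced to the string "FALSE" and added as a third level. And `relevel()` is asked for `ref = "low"`, which is not one of the levels. The sketch below reproduces both on a toy vector; `status`, `bad` and `good` are illustrative names, and the choice of "tumor_free" as the reference level is an assumption about the intended comparison.

```r
# Toy stand-in for coldata01$tumor_status (hypothetical values).
status <- c("with_tumor", "tumor_free", NA, "with_tumor")

# As posted: 'ordered = F' is inside c(), so it is coerced to the string
# "FALSE" and becomes an unused third level.
bad <- factor(status, levels = c("with_tumor", "tumor_free", ordered = F))
levels(bad)

# With c() closed first, 'ordered' is an argument of factor() as intended.
good <- factor(status, levels = c("with_tumor", "tumor_free"), ordered = FALSE)
levels(good)
table(good, useNA = "ifany")

# relevel() only accepts a reference that is already among the levels.
"low" %in% levels(good)

# Assumption for illustration: "tumor_free" is the intended reference group.
good <- relevel(good, ref = "tumor_free")
levels(good)
```

Checking `levels()` and `table(..., useNA = "ifany")` right after building the factor shows stray levels and the number of missing values before they reach the design formula.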
I'm not necessarily asking to look over your code, so much as pointing out why what you did 1) didn't make sense and 2) led to an error.
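The same approach applies to the "different row counts" error from `cbind()`: it pairs rows by position, so both tables need the same number of rows in the same order. A minimal sketch of aligning by key instead follows. The tables and barcodes are made up, and it assumes the clinical table is keyed by patient barcode while each sample barcode begins with that 12-character patient barcode.

```r
# Hypothetical sample-level table, standing in for colData(tumorsamplesonly).
samples <- data.frame(
  barcode = c("TCGA-AA-0001-01A", "TCGA-AA-0002-01A", "TCGA-AA-0003-01A"),
  stringsAsFactors = FALSE
)

# Hypothetical clinical table, standing in for coldata01: fewer rows and a
# different order than the samples.
clinical <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0003", "TCGA-AA-0001"),
  tumor_status = c("tumor_free", "with_tumor"),
  stringsAsFactors = FALSE
)

# The row counts differ, so binding the tables side by side cannot work.
nrow(samples) == nrow(clinical)

# Derive the patient key from the sample barcode (first 12 characters),
# then look up each sample's row in the clinical table.
samples$patient <- substr(samples$barcode, 1, 12)
idx <- match(samples$patient, clinical$bcr_patient_barcode)
samples$tumor_status <- clinical$tumor_status[idx]

# Samples without clinical data get NA and can be dropped afterwards. The
# same filter would have to be applied to the columns of the count matrix.
samples_complete <- samples[!is.na(samples$tumor_status), , drop = FALSE]
samples_complete
```

Matching on a key makes the result independent of row order and leaves an explicit NA wherever a sample has no clinical record.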
You may want to work with someone with R experience to look over your code with you.
Thank you, professor. Is there a forum where I could work with someone? I have posted my questions on Biostars, Stack Overflow, and other related web forums and could not get an answer.
I did give you an answer, but you have to work on it. I think I'm done replying to this thread now.