Hi all, I am having two issues with DESeq2. One is related to its use in Galaxy (I did not get an answer on the Galaxy forum, so I thought I would ask here), and the other is about how to supply the colData information alongside the count matrix before running DESeq2 on featureCounts data.
1) Using Galaxy with 2 factors (2 batches, i.e. 2 distinct studies from the literature), each with 3 levels that are not shared between the factors. In the first batch I have non-treated, treated 1h, and a selected population treated 1h; in the second batch I have 3 selected populations that I think could contribute to the 1h treatment of the first batch/study.
Also, the first study has duplicates and the second has triplicates.
I end up with the following error:
"Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-"
I tried changing the names of the factors and of the factor levels, and using duplicates everywhere; none of this worked.
However, with only one factor and everything entered as its factor levels, it works. My batch effect is not taken into account, though...
Could you tell me where the error could lie? Or at least what this row.names length error refers to?
2) I then tried to retrieve my featureCounts datasets from Galaxy so that I can run DESeq2 myself in R (I'm a beginner in R). I will merge my different featureCounts files using join in the terminal, so that the gene names are in the first column and the counts for each replicate are in one column each, then import the result into R and convert it to a matrix. I read the Bioconductor documentation for DESeq2, but I'm not sure I understand how to create the colData information that describes the factors. After some searching, I propose:
    (condition <- factor(c(rep("cond1", 2), rep("cond2", 2), rep("cond3", 2), rep("cond4", 3), rep("cond5", 3), rep("cond6", 3))))
    (batch <- factor(c(rep("batch1", 6), rep("batch2", 9))))
    (coldata <- data.frame(row.names=colnames(countdata), condition, batch))
    dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition, batch)
    dds <- DESeq(dds)
and then I can go on. Could you tell me if it is correct? Where could I find more explanation about how the colData is attached to the matrix? And if I have only one factor with its factor levels, what should I do? Keep only the "condition" line?
Thanks for any help you could provide, and let me know if you need any more information. Best regards
Hi, thanks a lot for your answer.
For 2), could I ask what the format of the CSV should be? I guess the ID comes first, then the second column could be the first factor, then the second factor, etc.
For 1), I can tell you the structure: in Galaxy I created 2 "factors":
factorname1 => 3 levels (3 conditions of the first paper): FactorLevel1_WT, FactorLevel2_injured, FactorLevel3_celltype1_injured, with 2 replicates in each factor level
factorname2 => 3 levels (3 sorted cell types from another tissue that may contribute to cells in the injured condition): FactorLevel4_celltype2, FactorLevel5_celltype3, FactorLevel6_celltype4, with 3 replicates each
Putting everything under one unique factor works fine (this example does not have the 3rd level in factor one, but I also tried it, and also tried to put duplicates only for factor 2). I've also tried a different structure in which factor one is "condition" and factor two is "batch", but then I get a duplicates error because I have the same featureCounts files (all of them) in both factors.
I can attach a screenshot if that helps; the bug report tells me this
Thank you again for your help.
Re: the format of the CSV, this is basic R input; I'd poke around the online R guides on how to read CSV data into R. Also, if you feel more comfortable doing this with data.frame and factor, go ahead. I won't be able to debug the Galaxy bit, sorry, due to time pressure. It just may not be possible to do all types of analyses within the Galaxy plugin.
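For the data.frame and factor route, a minimal base-R sketch might look like the following. It reuses the condition and batch layout from the question; the sample names and the temporary CSV file are hypothetical stand-ins (in practice the row names would be colnames(countdata)). The CSV layout shown, with sample IDs in the first column and one column per factor, is just one workable convention, not a required format. The sketch also checks whether each candidate design is full rank: with this layout every condition occurs in only one batch, so a design containing both batch and condition is rank deficient, which fits the observation that only the single-factor setup ran.

```r
# Hypothetical sample names; in a real analysis use colnames(countdata).
samples <- paste0("sample", 1:15)

# Same layout as in the question: 3 conditions x 2 replicates in batch1,
# 3 conditions x 3 replicates in batch2.
condition <- factor(c(rep("cond1", 2), rep("cond2", 2), rep("cond3", 2),
                      rep("cond4", 3), rep("cond5", 3), rep("cond6", 3)))
batch <- factor(c(rep("batch1", 6), rep("batch2", 9)))

# One row per sample, one column per factor.
coldata <- data.frame(row.names = samples, condition, batch)

# The same table written to and read back from a CSV file:
# first column = sample ID, remaining columns = factors.
csv_path <- tempfile(fileext = ".csv")  # hypothetical file location
write.csv(coldata, csv_path)
coldata_csv <- read.csv(csv_path, row.names = 1, stringsAsFactors = TRUE)

# Check whether each design matrix is full rank.
mm_cond <- model.matrix(~ condition, data = coldata)
mm_both <- model.matrix(~ batch + condition, data = coldata)
full_rank_cond <- qr(mm_cond)$rank == ncol(mm_cond)
full_rank_both <- qr(mm_both)$rank == ncol(mm_both)
print(c(condition_only = full_rank_cond, batch_plus_condition = full_rank_both))

# Assuming DESeq2 is installed and countdata is a count matrix whose
# columns are in the same order as rownames(coldata):
# library(DESeq2)
# dds <- DESeqDataSetFromMatrix(countData = countdata,
#                               colData = coldata,
#                               design = ~ condition)
# dds <- DESeq(dds)
```

Note that the design is a single formula: in `design=~condition, batch` the comma makes batch a separate argument to the function rather than a term in the design. Several factors are combined with `+` inside the formula (for example `~ batch + condition`), which only helps when the factors are not nested as they are here.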