Hello,
I would like to use coseq to cluster my results based upon transcripts that were identified as differentially expressed by DESeq2. Coseq mentions that I can take the object (e.g. dds) and use that directly.
With this said, I implemented coseq as such:
run <- coseq(dds2_sig, normFactors = "none", transformation = "none", model = "logclr",
parallel = TRUE, meanFilterCutoff = 50, nstart = 1, iter.max = 10, K=2:15, alpha=0.05)
but I receive an error:
****************************************
Co-expression analysis on DESeq2 output:
7479 DE genes at p-adj < 0.05
Error in transformRNAseq(y = y, normFactors = normFactors, transformation = "profile", :
‘normFactors’ must be one of
the following: none, TC, DESeq, TMM, or a vector oflength equal to the number of columns
in ‘y’
Looking at the code, I did specify a valid option for normFactors, but R is registering as invalid.
Users of coseq, any thoughts as to why this problem might be occuring?
Thanks.
apologies, here is my session information
Hi @varunorama,
I wanted to clarify the specific analysis you're trying to perform here:
normFactors = "none"
, which means that the clustering analysis will not make use of the library size normalization factors that were calculated by DESeq2. I would strongly recommend that you leave this to its default value (in the case of aDESeqDataSet
object,normFactors = "DESeq"
) unless you have a good reason for not wanting to normalize the raw counts for the clustering step.transformation = "none"
as well, which means that your raw counts will not be transformed prior to performing the K-means algorithm (which is the default) for clustering. Typically I would recommend using either the logCLR or CLR tranformation of expression profiles (transformation = "logclr"
, which is the default, ortransformation = "clr"
) in conjunction with the K-means algorithm, unless you have a good reason for wanting to use the raw counts.model = "logCLR"
, but the only valid model choices aremodel = "kmeans"
(defautl) ormodel = "Normal"
for a Gaussian mixture model.At the very least, you should thus change the value for the
model
argument in your call tocoseq
(although again, I would also recommend using default values fornormFactors
andtransformation
) and see if that helps things.Andrea (the coseq developer)
Hello Andrea,
Thank you very much for your response. Yes, looking back at your comments I realized the errors in my parameters. I am running it again using this command:
However, I still get the same error:
I noticed that in some other versions,
normFactors
was written asnorm
, but changing that did not alleviate the error.Do you have any suggestions moving forward?
Thanks for double-checking your parameters. Is there any way you could send a fully reproducible example? I'm having trouble reproducing your error.
When you run the analogous example from the vignette, do you get the same error you're seeing with your own data?
Also, regarding
norm
versusnormFactors
, R uses exact or unique partial matches for arguments, so using either of them is the same as using the full argument namenormFactors
.Hello again Andrea,
This will be a somewhat long message but I ran the vignette by itself and it worked. As the only difference between your code and mine was the usage of HTSFilter, I incorporated that into the vignette as such:
However, I stylized the code according to the vignette, but still continued to have the error in my code.
However, looking into it more, I managed to get an output when I called the counts specifically from dds2 as such:
The package I was able to go through the clusters and coseq ran successfully, and it seems to work! I am expecting that this route will produce the same results moving forward but I'd like to get your input on this.
EDIT (3/4/19 3:54 EST): Looking at the summary of my results, I noticed that a lot of Clusters had errors:
I take it that it may not be working as well as I assumed initially... :(
Thanks for the extra detail -- I think I know what's going on here. Because the data here were processed using tximport prior to the differential analysis, the
DESeqDataSet
object does not include any normalization factors that can be accessed viasizeFactors(dds)
(which is how coseq attempts to identify the normalization factors used in the differential analysis). SincesizeFactors(dds) = NULL
and coseq is expecting one of the supported normalization factor types, that throws the error you saw.For an initial work-around, I would suggest something like the following:
Note that I extracted the normalized counts from the
dds2
object using thecounts
accessor function from DESeq2 withnorm = TRUE
and then specifiednormFactors = "none"
in coseq so that it wouldn't try to renormalize counts. Could you try this and see if it works for you?Regarding the clusters that have errors, those represent clusters for with
ifault
from thekmeans
function returned an indicator for a possible algorithm problem (i.e., lack of convergence). I would try leaving thenstart
anditer.max
parameters at their default values (iter.max = 50
and andnstart = 50
) to fix this -- I used smaller values for these in the vignette simply to improve compilation time.As to the usage of HTSFilter here (which is actually another package I developed, to filter weakly expressed genes in RNA-seq data prior to running a differential analysis), there is no reason you need to use it here unless you want to; it is not at all necessary for or connected to coseq for clustering.
Hello Andrea,
Thank you so very much - I followed your advice and instructions which allowed me to run coseq successfully! Your explanations were very helpful as to why the workaround was successful - thank you for a patient answer.
As for HTSFilter (thank you for this package btw, I've used it endlessly), yes I can see why it's not needed to input into coseq (I take it the
[which(res$padj < 0.05),]
part of the command will only consider the significant genes from the LRT test). Thanks for pointing that out.Thanks for a great package!
Wonderful, I'm glad it worked! And yes you're right, the
which
index will indeed only consider the genes significant atpadj < 0.05
from the LRT test in my above code.I'm going to add an answer below to close out this question, and I will try to add something in the coseq documentation to indicate what needs to be done with data processed via tximport.
Thanks for using both coseq and HTSFilter!