Hi all,
I really need help with a problem I have come across since I received another batch of RNA-seq data, which I have combined with the first batch. Both batches contain the same organs but different biological replicates... for example, 2 replicates of the lungs in batch 1 and the third and fourth replicates of the lungs in batch 2.
The metadata, the replicate info and the column order of the counts table all line up with the row order of the metadata sheet, and all samples model well with DESeq2 with regard to replicates etc., apart from one organ: the small intestine. The small intestine is recognised as two separate samples and not as replicates (or the same organ) across different ages. The small intestine samples are separated according to batch, so that in a PCA plot, for example, there are separately coloured dots for one set of small intestine samples and another, as if they were two different organs.
This is not happening with the other organs and all organs are recognised correctly as one sample in terms of one organ within which there are different replicates for different ages.
Is this a known issue/bug? Could this result from mistakes in the metadata sheet? I have checked this over and unless I am missing something really obvious, I cannot see any inconsistencies in the metadata table... Any help would be greatly appreciated..
I am also happy to provide code although I don't know what to add as the steps are a standard DESeq2 pipeline!
Many thanks!
Hi Michael,
Thank you for the quick response.
No, the points are not separated; they cluster together. The problem is that two colours are assigned for organ, as if they were two separate samples even though they are the same. I have 24 samples in total for the small intestine, and they are split into two separate samples: about 20 named small intestine and another 5 also called small intestine but treated as a separate sample.
I am really sorry if this is unclear. I am not sure how to share the PCA plot through a link, as I don't know where to upload it.
Check table(dds$organ) and make sure that there isn’t a typo in the levels. Recent releases of DESeq2 check that there aren’t spaces or stray punctuation (typos) potentially affecting factor levels, but you may have an older version of DESeq2.
To be more clear, R won’t tolerate any changes in the exact characters. It doesn’t do any kind of fuzzy clumping of characters into levels. “small intestine” is different from “small intestine ”, which is different from “small.intestine”, etc.
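To illustrate the point about exact characters, here is a minimal sketch in base R. The coldata data frame and its labels are made up for illustration and only stand in for the real metadata sheet. Note that trimws() by default only strips spaces, tabs and line breaks, so other invisible characters (a non-breaking space, for example) would need a different check.

```r
# Hypothetical stand-in for the metadata sheet; one label has a trailing space.
coldata <- data.frame(
  organ = c("Small_Intestine", "Small_Intestine ", "Lung", "Lung"),
  stringsAsFactors = FALSE
)

# table() counts each distinct string separately, so the two
# small intestine labels end up as two entries.
counts_per_label <- table(coldata$organ)
print(counts_per_label)

# Wrapping each label in brackets makes surrounding whitespace visible.
print(sprintf("[%s]", names(counts_per_label)))

# Labels that change when whitespace is trimmed are the suspicious ones.
suspicious <- unique(coldata$organ[coldata$organ != trimws(coldata$organ)])
print(suspicious)
```

With a real DESeq2 object, the same checks could presumably be run on dds$organ instead of coldata$organ.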
Thank you Michael,
I get the following result:
Small_Intestine...12
Small_Intestine...8
I can only imagine there may be an alteration in the apostrophe, although there is none entered in the metadata sheet...
There is a small difference somewhere, which you can’t see by eye... R doesn’t make mistakes in comparisons. Just recode from scratch.
Like I said earlier, not sure what version of DESeq2 you are using, but if you used the ones from the past few years, they check if extra spaces are present in the coding of variables and warn the user.
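One possible way to recode, sketched in base R on a made-up coldata data frame (the names and values are assumptions, not the poster's actual data): convert the column to character, strip surrounding whitespace, and rebuild the factor so the stale level is dropped.

```r
# Hypothetical metadata where a trailing space has created an extra level.
coldata <- data.frame(
  organ = factor(c("Small_Intestine", "Small_Intestine ", "Lung", "Lung"))
)
levels_before <- nlevels(coldata$organ)

# Convert to character, trim surrounding whitespace, then rebuild the
# factor from scratch so only the cleaned labels remain as levels.
coldata$organ <- factor(trimws(as.character(coldata$organ)))
levels_after <- nlevels(coldata$organ)

print(table(coldata$organ))
```

If the labels are already inside a DESeq2 object, the equivalent step would likely be dds$organ <- factor(trimws(as.character(dds$organ))), followed by re-running the analysis. Fixing the metadata sheet itself and re-importing it is the more thorough option.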
Thanks Michael, I will edit from scratch as you suggest and update DESeq2!
Also, I just realised that 'small intestine' is different to 'small intestine ' with a space in the line! Will edit this all and report back if there is a fix for anyone else who might have this problem!
Many thanks
Solved! Indeed an invisible space in some of the samples! Thanks again!