Hi,
I have two sets of leukaemia RNA-Seq data, one from our lab and one from a published data set. The aim of the analysis is to further subgroup the cases by certain traits (e.g. translocation). The batch effect is very heavy, and my current approach is: filtering by TPM, then voom + quantile normalisation (or DESeq2 vst), then limma's removeBatchEffect, then unsupervised hierarchical clustering. The performance is not too bad, as I see some known subtypes grouping together in the analysis and the distribution of delta in gPCA is at 0.01. However, I am still looking for other ways to remove the batch effect without knowing the underlying biological factors (i.e. unknown subtypes), to see whether this further improves the grouping of the samples. I know pSVA can deal with samples with unknown biological info. Is there any other method that someone could recommend?
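To make the batch-removal step concrete, here is a minimal Python sketch of per-gene batch mean-centring on log-scale expression values. As far as I understand it, this is roughly what removeBatchEffect amounts to when only a batch factor is given and no design matrix is supplied, but it is an illustration and not limma's implementation. The function name `remove_batch_means` and the toy values are made up for the example.

```python
from statistics import mean

def remove_batch_means(expr, batches):
    """Shift each batch so its per-gene mean matches a common centre.

    expr:    list of genes, each a list of log-expression values (one per sample)
    batches: list of batch labels, one per sample, in the same order
    """
    levels = sorted(set(batches))
    corrected = []
    for row in expr:
        # Mean of this gene within each batch.
        batch_mean = {
            b: mean([v for v, lab in zip(row, batches) if lab == b])
            for b in levels
        }
        # Unweighted average of the batch means, used as the common centre.
        centre = mean([batch_mean[b] for b in levels])
        # Subtract the sample's batch mean and add the common centre back.
        corrected.append(
            [v - batch_mean[lab] + centre for v, lab in zip(row, batches)]
        )
    return corrected

# Toy data: one gene, two samples from each data set (hypothetical values).
expr = [[1.0, 2.0, 5.0, 6.0]]
batches = ["lab", "lab", "published", "published"]
corrected = remove_batch_means(expr, batches)
print(corrected)
```

Because nothing but the batch label goes into the correction, any unknown subtype that is over-represented in one data set will have part of its signal shifted along with the batch mean, which is the limitation I am trying to get around.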
Many thanks,
Kent
svaseq requires a priori knowledge of the biological factor in all samples, which is not suitable for my case. Even if it did not, I have already tried pSVA, which ended up giving the same results.
I realise there is a lack of tools for batch effect correction on samples with unknown biological factors for class detection. Perhaps it is statistically challenging to do so?