How to solve the problem of dataset imbalance?
I have 82 cancer samples and 390 control samples, and limma gives different results when I randomly select an equal number of control samples.
I read that imbalanced sample sizes have a significant effect on identifying differentially expressed genes.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-S4-S8#Sec9
There is no problem with unequal sample sizes. As long as you have enough cancer samples and enough control samples to be representative, the fact that one sample size is greater than the other does not bias the DE results at all.
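A quick way to convince yourself of this is a small null simulation. The sketch below is illustrative only: it uses simulated normal data with no true differences, the same group sizes as in the question (82 and 390), and an ordinary equal-variance t-test per gene rather than limma. The gene count and variable names are arbitrary choices, not anything from your data. If unequal group sizes biased the test, the proportion of genes with p < 0.05 would drift away from 5%.

```r
set.seed(1)

n_cancer  <- 82     # group sizes taken from the question
n_control <- 390
n_genes   <- 2000   # arbitrary number of simulated genes

# Null data: every gene has the same mean and variance in both groups
expr  <- matrix(rnorm(n_genes * (n_cancer + n_control)), nrow = n_genes)
group <- factor(rep(c("cancer", "control"), c(n_cancer, n_control)))

# Ordinary two-sample t-test (equal variances) for each gene
p_values <- apply(expr, 1, function(y) {
  t.test(y ~ group, var.equal = TRUE)$p.value
})

# Proportion of genes called significant at the 5% level under the null
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```

The rate should stay close to the nominal 0.05 despite the roughly 1:5 imbalance.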
With a large human RNA-seq study there are lots of things that need attention, such as outliers, variation in sample quality and batch effects, but unequal sample sizes are not one of them.
I do not agree with the conclusions of the paper you cite. The paper doesn't compare to limma anyway. The paper seems concerned about unequal variances between the groups but limma can easily handle that through the use of empirical quality weights.
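As a rough sketch of what that could look like, the code below keeps all samples and estimates group-level quality weights with `arrayWeights()` before fitting the linear model. The data are simulated purely for illustration (the cancer group is given a larger variance on purpose), and the object names are made up. It also assumes a reasonably recent limma release in which `arrayWeights()` accepts the `var.group` argument. For RNA-seq counts, `voomWithQualityWeights()` plays the analogous role.

```r
library(limma)
set.seed(1)

n_cancer  <- 82
n_control <- 390
n_genes   <- 500   # arbitrary, for illustration only

group <- factor(rep(c("control", "cancer"), c(n_control, n_cancer)),
                levels = c("control", "cancer"))

# Simulated log-expression values; cancer samples get twice the SD
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)
expr <- sweep(expr, 2, ifelse(group == "cancer", 2, 1), `*`)
rownames(expr) <- paste0("gene", seq_len(n_genes))

design <- model.matrix(~ group)

# One quality weight per group, estimated from the data
aw <- arrayWeights(expr, design, var.group = group)

# Use all samples; the more variable group is down-weighted
fit <- lmFit(expr, design, weights = aw)
fit <- eBayes(fit)
top <- topTable(fit, coef = "groupcancer", number = 10)
print(tapply(aw, group, mean))
```

In this simulation the cancer samples should receive smaller weights than the controls, so no control samples need to be discarded to balance the design.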