Question

High number of DEGs detected using EBSeq-HMM (updated)

fa.gholizadeh89 • 0

Hi,

I'm using EBSeq-HMM to find DE genes. The problem is that, out of 17611 genes, 16139 are identified as DE, which is very unusual.

Before testing for DE, I filtered out every gene that did not have at least 5 reads in at least one sample.
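
For concreteness, the filter described above can be sketched as follows. This is only a minimal Python illustration on a made-up count table, not the code used for the actual analysis; the gene names, the counts, and the `keep_gene` helper are all hypothetical.

```python
# Hypothetical toy count table: gene -> raw read counts, one value per sample.
counts = {
    "geneA": [0, 1, 2, 0],
    "geneB": [0, 0, 7, 3],
    "geneC": [5, 5, 5, 5],
    "geneD": [4, 4, 4, 4],
}

MIN_READS = 5    # threshold mentioned in the post
MIN_SAMPLES = 1  # "at least one sample"


def keep_gene(row, min_reads=MIN_READS, min_samples=MIN_SAMPLES):
    """Return True if at least `min_samples` samples have >= `min_reads` reads."""
    return sum(c >= min_reads for c in row) >= min_samples


# Keep only the genes that pass the filter.
kept = {gene: row for gene, row in counts.items() if keep_gene(row)}
print(sorted(kept))  # ['geneB', 'geneC']
```

Requiring the threshold in only one sample is a lenient rule; raising `MIN_SAMPLES` makes the filter stricter.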

Do you think my result is OK? I could run further analyses to pick out the most significant DE genes, but since I want to investigate how closely this model agrees with another model, I have to run it with its default settings.

What do you think accounts for this large number of DEGs? Has anyone worked with this package before? I read the paper behind the package, and the authors themselves seem to report that their model finds more DEGs than other methods such as edgeR, DESeq2, and voom. But calling 16139 out of 17611 genes DE is far too many.

If you haven't worked with this package but have a sense of what is probably going wrong, please let me know.

Could my filtering criterion for discarding lowly expressed genes be a reason? Should I filter genes more strictly before using this package?

Thanks a lot

ebseq-hmm • 1.4k views

fa.gholizadeh89, 8.2 years ago

Not an EBSeq-HMM user here, but it would still be handy to know how many timepoints (or ordinal levels) you have, and how many replicates. A large number of timepoints offers a large number of non-flat paths, and it wasn't immediately clear to me how the method deals with this.
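
To put rough numbers on the point about non-flat paths, here is a small Python sketch. It assumes that each transition between adjacent timepoints is in one of three states (Up, Down, or EE for equally expressed), and that a gene counts as DE whenever its path is not all-EE. That is my reading of how such path-based models work, not something confirmed in this thread, and the function names are made up for illustration.

```python
from itertools import product

# Assumed per-transition states: up, down, or equally expressed (EE).
STATES = ("Up", "Down", "EE")


def expression_paths(n_timepoints):
    """Enumerate every possible path over the n_timepoints - 1 transitions."""
    return list(product(STATES, repeat=n_timepoints - 1))


def count_non_flat(n_timepoints):
    """Count paths that differ from the all-EE (flat) path."""
    flat = ("EE",) * (n_timepoints - 1)
    return sum(path != flat for path in expression_paths(n_timepoints))


for t in (3, 4, 6):
    print(t, len(expression_paths(t)), count_non_flat(t))
# 3 9 8
# 4 27 26
# 6 243 242
```

Under these assumptions only one path out of 3^(T-1) is null, so the number of ways for a gene to be called DE grows quickly with the number of timepoints T.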

Maybe do a DESeq2 LRT on the ordinal factor, and visually inspect some of the genes that are DESeq2-null but not EBSeq-null.

Gavin Kelly, 8.2 years ago

Thank you for your response. I have 4 time points and 4 replicates at each time point. Other models identify about 10k to 12k genes by default. I want to compare two models against a benchmark model to see which one is better at detecting DEGs. So all I need to know is whether it is normal to find 16k DE genes out of 17500 when the goal is only to see how different models behave with their default settings.

fa.gholizadeh89, 8.2 years ago

No - I'd say that is unusual in my experience, but so is 10-12k. It makes me wonder whether the variance is being (artificially) reduced. Is it a real experiment, or are you simulating the data?

Gavin Kelly, 8.2 years ago

It's a real data set. I downloaded it from NCBI. These are the genes found with each model's default settings (based only on the adjusted p-value); I haven't filtered them by fold change or any further analysis.

fa.gholizadeh89, 8.2 years ago

Could the problem be uncontrolled batch effects in the data, leading to these strange results? They themselves used a non-standard normalization method to adjust for batch effects, but I didn't adjust for batch effects; I only corrected for sequencing depth bias using a standard normalization method (such as TMM).

fa.gholizadeh89, 8.2 years ago