Question

Genome annotation file in TxDb.Mmusculus.UCSC.mm10.knownGene

Kasit • 0

@kasit-24846

Last seen 3.6 years ago

United Kingdom

Dear all,

I'm currently trying to do an integrative analysis of ChIP-seq and RNA-seq data. I used the "ChIPseeker" and "TxDb.Mmusculus.UCSC.mm10.knownGene" packages to annotate the ChIP-seq peaks. As I understand it, the TxDb package uses the "mm10.knownGene.gtf" file from UCSC (https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/) for gene annotation (please correct me if I'm wrong). For my RNA-seq analysis, I therefore used this file for STAR mapping and RSEM quantification, together with the "knownIsoforms.txt" file downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/.

The problem is that many of the gene IDs assigned to peaks by "ChIPseeker" and "TxDb.Mmusculus.UCSC.mm10.knownGene" are not present in the RNA-seq results from RSEM, and vice versa. Did I use the right files and the correct analysis methods? If not, which annotation file should I use for my RNA-seq analysis so that the results are compatible with the ChIP-seq peak annotation from "TxDb.Mmusculus.UCSC.mm10.knownGene"?

Best regards, Kasit

ChIPseeker TxDb.Mmusculus.UCSC.mm10.knownGene • 2.5k views

ADD COMMENT • link 3.7 years ago Kasit • 0

You'll have to provide more information than that. You appear to be using transcripts (e.g., you pass a GTF to STAR and then use RSEM), but then you talk about Gene IDs, which are different from transcript IDs. Perhaps if you show some output of the peaks you are getting and the results from your RNA-Seq it would help clarify things.
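One way to make that comparison concrete is to count how many IDs the two results share, separately for transcript IDs and for gene IDs. The sketch below is only illustrative: the function names and the toy ID lists are made up for the example (the placeholder ID containing "HYPOTHETICAL" is not a real transcript), and it assumes that versioned transcript IDs should be compared without their ".N" suffix.

```python
def strip_version(identifier):
    """Drop a trailing '.N' version suffix, if there is one."""
    base, sep, suffix = identifier.rpartition(".")
    return base if sep and suffix.isdigit() else identifier


def overlap_report(ids_a, ids_b):
    """Count IDs shared by, and unique to, two collections of IDs."""
    a = {strip_version(i) for i in ids_a}
    b = {strip_version(i) for i in ids_b}
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Toy data (assumed for illustration, not taken from real output files).
chip_transcripts = ["ENSMUST00000000001.4"]
rsem_transcripts = ["ENSMUST00000000001.4", "ENSMUST_HYPOTHETICAL.1"]
chip_genes = ["14679"]
rsem_genes = ["1", "2"]

tx_report = overlap_report(chip_transcripts, rsem_transcripts)
gene_report = overlap_report(chip_genes, rsem_genes)
print("transcripts:", tx_report)
print("genes:", gene_report)
```

If the transcript-level overlap is high while the gene-level overlap is close to zero, the two analyses probably agree on the transcripts and differ only in how gene IDs were assigned.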

ADD REPLY • link 3.7 years ago James W. MacDonald 67k

Thank you James. I've just checked and found that all transcript IDs (in Ensembl format) from the ChIP-seq peak annotation are included in the transcript IDs (also in Ensembl format) from the RSEM output. However, the gene IDs in the RSEM output do not appear to be Entrez IDs, and I have no idea why.

Here is an example of output from RSEM.

For example, "ENSMUST00000000001.4" actually corresponds to Entrez gene ID "14679", not "1" as shown here.
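Since the transcript IDs agree, one possible workaround is to ignore the gene_id column and re-derive the gene IDs from the transcript IDs. The sketch below assumes a tab-delimited table with "transcript_id" and "gene_id" columns and a transcript-to-Entrez lookup obtained from the same annotation used for the peaks. The function name, the toy table and the second (placeholder) transcript are hypothetical; only the ENSMUST00000000001.4 / 14679 pair comes from the example above.

```python
import csv
import io


def relabel_gene_ids(rows, tx_to_entrez):
    """Return copies of rows with gene_id replaced via a transcript lookup.

    Transcripts missing from the lookup keep their original gene_id and
    are listed separately so they can be inspected.
    """
    relabeled, unmatched = [], []
    for row in rows:
        tx = row["transcript_id"]
        entrez = tx_to_entrez.get(tx)
        if entrez is None:
            unmatched.append(tx)
            relabeled.append(dict(row))
        else:
            relabeled.append({**row, "gene_id": entrez})
    return relabeled, unmatched


# Toy stand-in for a quantification table (not real output).
rsem_text = (
    "transcript_id\tgene_id\n"
    "ENSMUST00000000001.4\t1\n"
    "ENSMUST_HYPOTHETICAL.1\t2\n"
)
# Assumed lookup table: transcript ID -> Entrez gene ID.
tx_to_entrez = {"ENSMUST00000000001.4": "14679"}

rows = list(csv.DictReader(io.StringIO(rsem_text), delimiter="\t"))
relabeled, unmatched = relabel_gene_ids(rows, tx_to_entrez)
for row in relabeled:
    print(row["transcript_id"], row["gene_id"])
print("no Entrez ID found for:", unmatched)
```

This only relabels transcript-level rows. Gene-level estimates would still have to be re-aggregated, or the quantification re-run with a corrected transcript-to-gene map.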

ADD REPLY • link 3.7 years ago Kasit • 0

score 1 · Answer 1 · 2021-02-22

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 10 hours ago

United States

All of the gene_ids you present there are human, not mouse. I have no idea how you got the two mixed up like that. Anyway, this looks like an issue with how you ran RSEM rather than anything Bioconductor-related, so you should probably ask over on Biostars.org and provide more information about how you ran RSEM.

ADD COMMENT • link 3.7 years ago James W. MacDonald 67k

Ah, I see. Thank you very much James.

Best regards, Kasit

ADD REPLY • link 3.7 years ago Kasit • 0