Dear all,
I'm currently working on an integrative analysis of ChIP-seq and RNA-seq data. I used the "ChIPseeker" and "TxDb.Mmusculus.UCSC.mm10.knownGene" packages to annotate the ChIP-seq peaks. As I understand it, the TxDb package uses the "mm10.knownGene.gtf" file from UCSC (https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/) for gene annotation (please correct me if I'm wrong). For my RNA-seq analysis, I therefore used this file for STAR mapping and RSEM quantification, together with the "knownIsoforms.txt" file downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/.
The problem is that many of the gene IDs assigned to peaks by "ChIPseeker" and "TxDb.Mmusculus.UCSC.mm10.knownGene" are not present in the RNA-seq results from RSEM, and vice versa. Did I use the right files and analysis methods? If not, which annotation file should I use for my RNA-seq analysis so that the results are compatible with the ChIP-seq peak annotation from "TxDb.Mmusculus.UCSC.mm10.knownGene"?
Best regards, Kasit
You'll have to provide more information than that. You appear to be using transcripts (e.g., you pass a GTF to STAR and then use RSEM), but then you talk about Gene IDs, which are different from transcript IDs. Perhaps if you show some output of the peaks you are getting and the results from your RNA-Seq it would help clarify things.
Thank you James, I've just checked and found that all transcript IDs (in Ensembl format) from the ChIP-seq peak annotation are included among the transcript IDs (also in Ensembl format) in the RSEM output. However, the gene IDs in the RSEM output do not appear to be Entrez IDs, and I have no idea why.
Here is an example of output from RSEM.
For example, "ENSMUST00000000001.4" actually corresponds to Entrez gene ID "14679", not "1" as shown here.
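One way to check where the mismatch comes from, as a rough sketch only: as far as I can tell, the first column of "knownIsoforms.txt" is a UCSC cluster ID rather than an Entrez ID, and RSEM simply reports whatever gene ID the transcript-to-gene map gives it. The Python below assumes that layout (cluster ID, then transcript ID, tab-separated) and assumes a separate transcript-to-Entrez table is available from the same annotation used for the peaks. The function names and the `transcript_to_entrez` dictionary are hypothetical; the only values used are the ones from the example above.

```python
import csv
import io


def strip_version(transcript_id):
    """Drop a trailing version suffix, e.g. 'ENSMUST00000000001.4' -> 'ENSMUST00000000001'."""
    return transcript_id.split(".", 1)[0]


def cluster_to_entrez(known_isoforms_handle, transcript_to_entrez):
    """Map each gene ID used by RSEM (assumed to be a UCSC cluster ID) to Entrez IDs.

    known_isoforms_handle: open text handle, tab-separated, assumed layout
        <cluster ID> <TAB> <transcript ID>.
    transcript_to_entrez: dict of unversioned transcript ID -> Entrez gene ID
        (hypothetical table, e.g. exported from the peak annotation).
    """
    mapping = {}
    for row in csv.reader(known_isoforms_handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        cluster_id, transcript_id = row[0], strip_version(row[1])
        entrez_id = transcript_to_entrez.get(transcript_id)
        if entrez_id is not None:
            mapping.setdefault(cluster_id, set()).add(entrez_id)
    return mapping


# Values taken from the example in this thread.
example_isoforms = io.StringIO("1\tENSMUST00000000001.4\n")
example_entrez = {"ENSMUST00000000001": "14679"}
example_mapping = cluster_to_entrez(example_isoforms, example_entrez)
```

If the assumption holds, the "1" in the RSEM gene_id column would be relabelled as "14679" here, which would explain why the gene IDs do not overlap even though the transcript IDs do. A cluster that maps to more than one Entrez ID would need to be inspected by hand.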