Unexpected gene polymorphism using Salmon-tximeta-DESeq2
Ray ▴ 20
@Ray-24558
Hong Kong

We're analyzing RNAseq data with a pipeline consisting of Salmon, tximeta, and DESeq2.

We have a multi-factorial experimental design, and the experiment was performed on cell lines.

One thing that surprised us is that, in the result output, we observe many gene polymorphisms.

For example, for gene NLRP2 we observed multiple entries associated with different Ensembl IDs: ENSG00000022556, ENSG00000275082, ENSG00000275843, etc.

Entries of NLRP2 from one particular RNAseq experiment result

My question is: how do we interpret data like this, and how should we deal with this kind of situation? Can we add or average the different entries associated with the same gene?

tximeta DESeq2 • 1.6k views
@mikelove
United States

This is a consequence of the transcriptome you used for quantification. For human data I recommend the GENCODE reference transcripts, because GENCODE does not duplicate genes on haplotype chromosomes (which Ensembl does in its transcript FASTA files). Look at the chromosome listed for the entries other than the first: they appear as "Chromosome CHR_HSCHR19...", which is a haplotype of chr19.
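To see how many records in a transcript FASTA sit on such alternate scaffolds, something like the following may help. It is only a sketch: it assumes Ensembl-style headers in which the location field reads "chromosome:GRCh38:<seq_region>:..." and alternate scaffolds start with "CHR_". The function name and the demo records are made up for illustration.

```shell
# Count FASTA records located on alternate (haplotype/patch) scaffolds.
# Assumption: Ensembl-style headers, alternate scaffolds named "CHR_...".
count_alt_scaffold_records() {
    # gzip -cdf passes plain files through and decompresses gzipped ones
    gzip -cdf "$1" | grep '^>' | grep -c ':CHR_' || true
}

# Tiny made-up demo file (IDs and coordinates are illustrative, not real records)
demo="$(mktemp)"
cat > "$demo" <<'EOF'
>ENST_DEMO_1.1 cdna chromosome:GRCh38:19:100:200:1 gene:ENSG_DEMO_A.1 gene_symbol:NLRP2
ACGT
>ENST_DEMO_2.1 cdna chromosome:GRCh38:CHR_HSCHR19_DEMO:100:200:1 gene:ENSG_DEMO_B.1 gene_symbol:NLRP2
ACGT
EOF

alt_count="$(count_alt_scaffold_records "$demo")"
echo "records on alternate scaffolds: $alt_count"
```

A non-zero count on your own Ensembl file would be consistent with the duplicated NLRP2 entries described above.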

Another reason is that GENCODE provides a single file, while for Ensembl you need to combine the cDNA and ncRNA files to produce a transcriptome.
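
If you do stay with Ensembl, the combining step could be sketched roughly as below. The file names are hypothetical stand-ins for the real cDNA and ncRNA downloads, and the tiny demo records exist only so the snippet runs on its own.

```shell
# Hypothetical stand-ins for the Ensembl cDNA and ncRNA FASTA downloads
workdir="$(mktemp -d)"
printf '>tx_coding_demo\nACGT\n'    | gzip -c > "$workdir/cdna.demo.fa.gz"
printf '>tx_noncoding_demo\nTTGA\n' | gzip -c > "$workdir/ncrna.demo.fa.gz"

# Concatenated gzip streams form a valid gzip file, so no decompression is needed
cat "$workdir/cdna.demo.fa.gz" "$workdir/ncrna.demo.fa.gz" > "$workdir/combined.demo.fa.gz"

# Sanity check: the combined file should contain the records from both inputs
n_records="$(gzip -cd "$workdir/combined.demo.fa.gz" | grep -c '^>')"
echo "records in combined transcriptome: $n_records"
```

Note that combining the files does not remove the haplotype duplicates; it only gives you one transcriptome file to index.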


Thanks so much for the clarification Michael.

I was indeed confused by the alternative scaffolds included in the Ensembl genome.

Now that you've mentioned it, I will rebuild the Salmon index with the GENCODE reference transcriptome.


Oh, and a further recommendation: when you build the index with Salmon, specify --gencode, which will clean up the transcript names in the Salmon output.
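
As a rough illustration of what that cleaning amounts to: GENCODE FASTA headers are pipe-delimited, and --gencode keeps only the part of the name before the first '|'. The snippet below mimics that with cut on a made-up header; it is not Salmon's own code, and the file and index names in the commented command are placeholders.

```shell
# Made-up header in the pipe-delimited GENCODE style (IDs are illustrative)
header='>ENST_DEMO.1|ENSG_DEMO.1|-|-|DEMO-201|DEMO|1000|protein_coding|'

# Keep only the text before the first '|', mimicking the name that
# ends up in the quantification output when --gencode is used
clean_name="$(printf '%s\n' "$header" | sed 's/^>//' | cut -d'|' -f1)"
echo "$clean_name"

# The indexing call itself would look roughly like this (placeholder names):
#   salmon index -t gencode.transcripts.fa.gz -i salmon_index_gencode --gencode
```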


Thank you!!

Indeed I included the --gencode flag by following a tutorial from here https://biocorecrg.github.io/RNAseq_course_2019/salmon.html :)

Right now I'm trying to extract some extra information (e.g. gene symbol, description) from the rowRanges slot. When I was using the Ensembl reference, this information was automatically added to the SummarizedExperiment object from AnnotationHub, but with the GENCODE reference it is missing.

I've tried the makeLinkedTxome() function to link a local GENCODE GTF file, but it didn't seem to work.

Now I'm reading this vignette https://biodatascience.github.io/compbio/bioc/SE.html to see if I can add the information back directly from the GENCODE GTF file. Any suggestions?


Have you tried addIds from the tximeta package?


Just tried addIds and it worked, thanks a lot Michael!
