Question

Cross-species RNA-seq with DESeq2

1

Entering edit mode

vtartaglio ▴ 10

@vtartaglio-13451

Last seen 5.5 years ago

I am a complete noob to RNA-seq, but I'm trying to use DESeq2 to do differential expression analysis in a rather unusual system. I would like to compare gene expression between two closely related species (same genus). I have reference genomes for both. The RNA is just from control conditions, no treatment. I'm aware that a common approach for cross-species RNA-seq is to analyze only conserved exons, but because these species are so closely related--most genes in species A have a 1:1 ortholog in species B--that I'm hoping to just use a gene-level approach. Currently, I'm analyzing the data in two ways:

(1) Align reads from both species to the reference genome of species A, and proceed with downstream analysis as if my samples were two different treatments on species A.

(2) Align reads from each species to its own reference genome. Discard all genes that don't have a 1:1 orthology relationship. Change the gene names in the species B raw counts files to the names of their species A orthologs. Proceed with downstream analysis as if these were two different treatments on species A, analyzing only these "shared" genes.

I like this second approach because it takes advantage of both reference genomes (which my collaborators would like me to use), but I'm not sure if I'm making any truly bad assumptions, for example if it's bad that gene X is a different length in species A from its ortholog in species B. With either approach (1) or approach (2) I get a similar number of DEGs (a bit more for (1), as we would expect), and the majority of the DEGs from either analysis are called significant in both. Does anyone have opinions on this? Am I completely crazy for even attempting strategy (2)? Thank you so much in advance!!!

deseq2 rnaseq across species • 6.0k views

ADD COMMENT • link updated 20 months ago by James W. MacDonald 68k • written 7.6 years ago by vtartaglio ▴ 10

0

Entering edit mode

I am working on a similar project as yours. Have you figure out a good pipeline to do this? Would you mind to share with me? Thanks

ADD REPLY • link 5.7 years ago qwzhang0601 • 0

0

Entering edit mode

HI vtartaglio, I have a question can you please tell me how did you do 1:1 orthology relationship and discarded genes. because i have two species aligned to respective reference genome how can i do this to them please help i m new to the field.

2nd both species have different gene ids i think they have local ids how to solve this problem too how can i get common genes expressed in both i am dealing with non model organisms please help

ADD REPLY • link 20 months ago Syed Zaheer ▴ 10

0

Entering edit mode

Please don't respond to six year old posts!

If you have a question, make a new post, and describe what you are trying to do.

ADD REPLY • link 20 months ago James W. MacDonald 68k

score 3 · Answer 1 · 2017-07-11

3

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

I don't have a solution per se, I think this kind of analysis can be difficult to remove all "technical" issues to get to cross-species gene expression changes.

(1) has a systematic bias, because you're favoring the reads from species A, due to alignment problems for reads from species B.

(2) sounds better. You can use the gene lengths as a normalization factor. Make a matrix which gives the gene length for each gene and each sample (so the same size as the count matrix). Then before you run DESeq(), you can store this matrix to:

assays(dds)[["avgTxLength"]] <- mat

This will be picked up by DESeq() and used for normalization.

ADD COMMENT • link 7.6 years ago Michael Love 43k

0

Entering edit mode

Awesome, this is so helpful!! I really like this length normalization idea, I'll go ahead and try it out today. Thank you so much!

ADD REPLY • link 7.6 years ago vtartaglio ▴ 10

1

Entering edit mode

Perhaps you might try doing your alignments with salmon (or kallisto, license permitting) and using tximport to load up your results.

The tximport vignette shows you how to then import these data and summarize the transcript level counts to the gene level, as well as how you can incorporate the sample specific gene-length estimates salmon gives you using an offset matrix for DESeq2 or edgeR, or by using "lengthScaledTPM" with voom.

ADD REPLY • link 7.6 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

One note: tximport needs the transcript names to be the same across samples. So you can only do it across quant.sf files using the same index. And the length scaling for "lengthScaledTPM" is only relative across the samples that it's importing. But you can use this to get the lengths and counts as matrices for each species.

ADD REPLY • link 7.6 years ago Michael Love 43k

0

Entering edit mode

Michael,

I have a followup question to this excellent post on using DESeq2 for multi-species comparisons.

As suggested in DESeq2’s vignette, I generated count tables using Salmon and imported these count tables using tximport for my subsequent analyses in DESeq2. What I still haven’t been able to determine is if tximport will automatically correct for the gene length differences between species or if I also need to provide an additional matrix with gene lengths (as described above) in addition to using tximport. Based off of this post (inter-species RNA-seq with DEseq2: how to account for gene length differences), I believe the first approach is the most appropriate.

Thank you for the help.

ADD REPLY • link 2.9 years ago petersearle94 • 0

0

Entering edit mode

For tximport -> DESeq2, each sample has its own gene length. That is what is used in calculating the offset. So it should "just work" without need for any tweaking.

ADD REPLY • link 2.9 years ago Michael Love 43k

0

Entering edit mode

Awesome! Thank you for the help Michael.

ADD REPLY • link 2.9 years ago petersearle94 • 0

0

Entering edit mode

Hi Michael,

I am facing the same question (working with orthologs of two different species), but I am using DESeqDataSetFromMatrix. How can I store a matrix which gives the gene length for each gene and each sample and use it for normalization in this case?

I already did this matrix but I can't find a way to use it for normalization.

Thank you.

ADD REPLY • link 4.7 years ago mellec • 0

0

Entering edit mode

Isn’t this directly answered by (2) above?

ADD REPLY • link 4.7 years ago Michael Love 43k

0

Entering edit mode

When I try to use the: assays(dds)[["avgTxLength"]] <- mat

It says:

Error: object 'mat' not found

DESEq2 is loaded and I did all the steps before DESeq()

ADD REPLY • link 4.7 years ago mellec • 0

0

Entering edit mode

Oh, if you don't have the matrix, I don't have a solution for you. DESeq2 doesn't provide any utility functions for doing cross-species analysis (and I have almost zero expertise in this tricky area of bioinformatics).