Question

Salmon output file for differential expression analysis

Bob • 0

@bob-12005

Dear All

I want to estimate the expression of predicted lncRNAs from RNA-seq data. I built a database of all the predicted lncRNAs, indexed it, and then mapped the paired-end RNA-seq libraries to it with Salmon. However, the Salmon output files contain some non-integer NumReads values. What is the best approach for handling these non-integer numbers in DE analysis? I don't have GTF or gene ID files for the database sequences; the database contains only the putative long non-coding RNAs, which were retrieved from the genome.

Thanks

Name       Length  EffectiveLength  TPM       NumReads
CUFF.47.1  1011    845.627          21.0942   250.461
CUFF.53.2  734     570.457          45.1108   361.328
CUFF.54.1  760     596.362          87.4186   732
CUFF.57.1  825     661.123          268.776   2495
CUFF.58.2  338     176.503          356.296   883
CUFF.80.1  1348    1182.63          0.594387  9.86994

salmon rna seq deseq2 • 2.1k views

updated 6.3 years ago by Michael Love • written 6.3 years ago by Bob

score 0 · Answer 1 · 2018-11-02

Michael Love 43k

@mikelove

hi,

I wouldn't quantify against just a small subset of the transcriptome. If you are working with human or mouse, for example, I recommend using the Gencode transcriptome files, which have protein-coding and non-coding transcripts together. If you leave out of the reference transcripts that are present in the sample, quantification will be worse. You can subset to lncRNA after quantifying.
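
As a rough illustration of subsetting after quantification, the Python sketch below filters a quantification table down to a set of lncRNA IDs. The `mRNA.*` IDs and their counts are hypothetical placeholders standing in for the rest of the transcriptome, and the `CUFF.*` values are reused from the question purely as placeholders (they would change once the full transcriptome is in the index).

```python
# Estimated counts keyed by transcript ID, as they might look after
# quantifying against coding and non-coding transcripts together.
# ASSUMPTION: the mRNA.* entries are invented for illustration only.
quant = {
    "CUFF.47.1": 250.461,
    "CUFF.54.1": 732.0,
    "mRNA.0001": 1200.0,
    "mRNA.0002": 35.5,
}

# IDs of the predicted lncRNAs (in practice, read from the lncRNA FASTA headers).
lncrna_ids = {"CUFF.47.1", "CUFF.54.1"}

# Keep only the lncRNA rows; the other transcripts were still needed during
# quantification so that their reads were not forced onto the lncRNAs.
lncrna_quant = {tx: n for tx, n in quant.items() if tx in lncrna_ids}

for tx, n in sorted(lncrna_quant.items()):
    print(tx, n)
```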

The non-integer values are fine; see the tximport package for importing Salmon output and then running DESeq2 or other inference packages (edgeR, limma).
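
To make the point about non-integer values concrete: Salmon's NumReads is an estimated count, and reads compatible with more than one transcript are apportioned fractionally among them, so decimals are expected. The sketch below is an illustration only, written in Python rather than as part of the tximport/DESeq2 workflow. It parses the six rows posted in the question and checks that TPM is proportional to NumReads / EffectiveLength. The names `QUANT_SF` and `parse_quant` are made up for this example.

```python
# The six rows posted in the question, in Salmon's quant.sf column order.
QUANT_SF = """\
Name Length EffectiveLength TPM NumReads
CUFF.47.1 1011 845.627 21.0942 250.461
CUFF.53.2 734 570.457 45.1108 361.328
CUFF.54.1 760 596.362 87.4186 732
CUFF.57.1 825 661.123 268.776 2495
CUFF.58.2 338 176.503 356.296 883
CUFF.80.1 1348 1182.63 0.594387 9.86994
"""


def parse_quant(text):
    """Parse whitespace-separated quant.sf-style text into a list of dicts."""
    lines = text.strip().splitlines()
    rows = []
    for line in lines[1:]:  # skip the header line
        name, length, eff_len, tpm, num_reads = line.split()
        rows.append({
            "Name": name,
            "Length": int(length),
            "EffectiveLength": float(eff_len),
            "TPM": float(tpm),
            "NumReads": float(num_reads),  # estimated, so not always whole
        })
    return rows


rows = parse_quant(QUANT_SF)

# Reads per effective base for each transcript.
rates = {r["Name"]: r["NumReads"] / r["EffectiveLength"] for r in rows}

# TPM divided by that rate should be the same constant for every row,
# because TPM is the rate rescaled so that all transcripts sum to one million.
scale = {r["Name"]: r["TPM"] / rates[r["Name"]] for r in rows}

for name, value in scale.items():
    print(f"{name}\t{value:.3f}")
```

Every row prints the same scaling factor up to rounding, which shows that the fractional NumReads values are internally consistent with the TPM column and are not a sign of a problem.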

Answered 6.3 years ago by Michael Love

Thanks for your answer. This data is from a plant, and I don't have Gencode transcriptome files for it. It is just a prediction of lncRNAs in the plant, and I then want to estimate the expression of each putative lncRNA in our RNA-seq libraries. In that case, is Salmon's NumReads useful for quantification?

Regarding the tximport package, could you please help me prepare a tx2gene table for my data?

Thanks

Reply by Bob, 6.3 years ago

Preparing tx2gene is up to you as the analyst. If you want to combine transcripts to the gene level, you’ll need to provide that mapping.
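
One possible way to build such a mapping, sketched in Python under a loud assumption: the IDs in the question look like Cufflinks transcript IDs of the form `<gene>.<isoform>` (e.g. `CUFF.53.2` as isoform 2 of gene `CUFF.53`), so the gene ID can be obtained by dropping the final `.N`. If the IDs were assigned differently, this rule does not apply. tximport takes a two-column table with the transcript ID first and the gene ID second; the column names `TXNAME` and `GENEID` and the helper `gene_id_from_transcript` are illustrative choices.

```python
import csv
import io

# Transcript IDs from the "Name" column posted in the question.
transcript_ids = [
    "CUFF.47.1", "CUFF.53.2", "CUFF.54.1",
    "CUFF.57.1", "CUFF.58.2", "CUFF.80.1",
]


def gene_id_from_transcript(tx_id):
    """Drop a trailing '.<digits>' isoform suffix, if present.

    ASSUMPTION: IDs follow the Cufflinks '<gene>.<isoform>' convention.
    IDs that do not match are returned unchanged.
    """
    gene, sep, isoform = tx_id.rpartition(".")
    if not sep or not isoform.isdigit():
        return tx_id
    return gene


# Transcript-to-gene pairs, transcript first and gene second.
tx2gene = [(tx, gene_id_from_transcript(tx)) for tx in transcript_ids]

# Serialise as CSV text; in practice this would be written to a file
# and read into R as a data frame.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["TXNAME", "GENEID"])
writer.writerows(tx2gene)
tx2gene_csv = buffer.getvalue()

print(tx2gene_csv)
```

If transcript-level results are all that is needed, no mapping is required; tximport has a `txOut = TRUE` option for keeping the estimates at the transcript level.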

Again, I’d recommend quantifying against coding and non-coding transcripts together.

Reply by Michael Love, 6.3 years ago

Given that this is only a small subset of the transcriptome, if I take the TPM values from the Salmon output and simply compare them across the non-coding RNAs to find the most highly expressed one, will the result be robust?

Reply by Bob, 6.3 years ago

Sorry, I don't really follow what's going on here. If you have a question about using DESeq2 or tximport, feel free to follow up; otherwise, you may get better feedback on a more general bioinformatics forum such as Biostars.

Reply by Michael Love, 6.3 years ago