Dear all,
I want to estimate the expression of predicted lncRNAs from RNA-seq data. I built a database of all the predicted lncRNAs, indexed it, and quantified the paired-end RNA-seq libraries against it with Salmon. However, the Salmon output files contain some non-integer NumReads values. What is the best way to handle these non-integer values for DE analysis? I don't have GTF or gene ID files for the database sequences; the database contains only the putative long non-coding RNAs, which were retrieved from the genome.
Thanks
Name       Length  EffectiveLength  TPM       NumReads
CUFF.47.1  1011    845.627          21.0942   250.461
CUFF.53.2  734     570.457          45.1108   361.328
CUFF.54.1  760     596.362          87.4186   732
CUFF.57.1  825     661.123          268.776   2495
CUFF.58.2  338     176.503          356.296   883
CUFF.80.1  1348    1182.63          0.594387  9.86994
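For context, NumReads is an estimated count: Salmon splits reads that map to several transcripts fractionally among them, so non-integer values are expected. As far as I know, importing with tximport and building the dataset with DESeqDataSetFromTximport takes care of the rounding, so manual handling should not normally be needed. Purely as an illustration of what rounding the estimated counts looks like, here is a minimal Python sketch using the rows above. The function names are hypothetical, and the tab-separated layout is assumed to match a real quant.sf file.

```python
import csv
import io

# Rows copied from the excerpt above; a real quant.sf is assumed to be tab-separated.
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "CUFF.47.1\t1011\t845.627\t21.0942\t250.461\n"
    "CUFF.53.2\t734\t570.457\t45.1108\t361.328\n"
    "CUFF.54.1\t760\t596.362\t87.4186\t732\n"
    "CUFF.57.1\t825\t661.123\t268.776\t2495\n"
    "CUFF.58.2\t338\t176.503\t356.296\t883\n"
    "CUFF.80.1\t1348\t1182.63\t0.594387\t9.86994\n"
)


def read_num_reads(text):
    """Parse quant.sf-style text into {transcript name: estimated count}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def rounded_counts(num_reads):
    """Round estimated counts to the nearest integer for count-based DE tools."""
    return {name: int(round(value)) for name, value in num_reads.items()}


counts = rounded_counts(read_num_reads(QUANT_SF))
for name, count in counts.items():
    print(name, count)
```

Rounding changes each value by at most half a read, which is negligible for transcripts with reasonable coverage.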
Thanks for your answer. These data are from a plant, and I don't have GENCODE transcriptome files for it. This is just a prediction of lncRNAs in the plant, and I want to estimate the expression of each putative lncRNA in our RNA-seq libraries. In that case, is Salmon's NumReads useful for quantification?
Regarding the tximport package, could you please help me prepare a tx2gene table for my data?
Thanks
Preparing tx2gene is up to you as the analyst. If you want to combine transcripts to the gene level, you’ll need to provide that mapping.
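As one possible starting point when no GTF is available, the mapping can sometimes be derived from the transcript names themselves. The sketch below assumes that the IDs follow the Cufflinks-style pattern CUFF.<gene>.<isoform>, so that CUFF.47.1 belongs to gene CUFF.47. That assumption needs to be checked against your own assembly, and the function names are hypothetical. If it does not hold, the alternative is to stay at the transcript level, for example with txOut = TRUE in tximport.

```python
def gene_id_from_transcript(tx_id):
    """Strip the trailing isoform suffix, e.g. 'CUFF.47.1' -> 'CUFF.47'.

    Assumption: the text after the last dot is the isoform number.
    IDs without a dot are returned unchanged.
    """
    gene, sep, _isoform = tx_id.rpartition(".")
    return gene if sep else tx_id


def build_tx2gene(tx_ids):
    """Return (transcript, gene) pairs in the two-column order tximport expects."""
    return [(tx, gene_id_from_transcript(tx)) for tx in tx_ids]


def to_tsv(pairs):
    """Render the pairs as tab-separated text with a header line."""
    lines = ["TXNAME\tGENEID"]
    lines.extend(f"{tx}\t{gene}" for tx, gene in pairs)
    return "\n".join(lines) + "\n"


tx_ids = ["CUFF.47.1", "CUFF.53.2", "CUFF.54.1", "CUFF.57.1", "CUFF.58.2", "CUFF.80.1"]
tx2gene = build_tx2gene(tx_ids)
print(to_tsv(tx2gene), end="")
```

The resulting text can be saved to a file and read into R as a two-column data frame, with the transcript ID in the first column and the gene ID in the second.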
Again, I’d recommend quantifying against coding and non-coding transcripts together.
Because this is only a small subset of the transcripts, if I take the TPM values from the Salmon output and simply compare the abundance of each non-coding RNA to find the most highly expressed one, will that be a robust result?
I don't really follow what's going on here, sorry. If you have a question about using DESeq2 or tximport, feel free to follow up; otherwise, you may get better feedback on a more general bioinformatics forum such as Biostars.