Hi,
I would like to perform differential transcript (isoform) expression analysis on short-read data (150 bp) with the edgeR package, following the example in the edgeR User's Guide, section "4.6 Differential transcript expression of human lung adenocarcinoma cell lines."
I processed the data with the nf-core/rnasplice pipeline, which produced transcript-level counts and TPM values (see the nf-core/rnasplice output documentation).
Following example 4.6 in the edgeR User's Guide, I first tried importing the "quant.sf" files but ran into difficulties. Based on the example, I then imported the scaled counts as suggested:
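```r
library(edgeR)
# Define the path to your TSV file
file_path <- "/projects/salmon/tximport/salmon.merged.transcript_counts_scaled.tsv"
# Import the TSV file (read.delim is tab-separated by default; check.names = FALSE keeps sample names as-is)
scaled.counts <- read.delim(file_path, header = TRUE, row.names = 1, check.names = FALSE)
# Keep only numeric sample columns; the merged tables may also carry an annotation column such as gene_id
scaled.counts <- as.matrix(scaled.counts[, vapply(scaled.counts, is.numeric, logical(1)), drop = FALSE])
# Create DGEList object (rows of Samples_metadata must follow the column order of scaled.counts)
y <- DGEList(counts = scaled.counts, samples = Samples_metadata)
dim(y)
```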
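For reference, my understanding of section 4.6 is that its "scaled counts" are not a pre-computed table: the guide reads each sample's Salmon output with edgeR's catchSalmon() and divides every transcript's counts by an overdispersion estimate derived from Salmon's bootstrap or Gibbs replicates. The nf-core "_scaled" tables are presumably tximport-style scaled counts, a different kind of scaling, so it is worth checking the pipeline's tximport step before treating them as a substitute. The sketch below shows only the division step in base R; scale_by_overdispersion() is a hypothetical helper (not an edgeR function), and the toy numbers are made up.

```r
# Hypothetical helper (not part of edgeR): divide each transcript's counts by its
# overdispersion estimate, and stop early if that estimate is missing.
scale_by_overdispersion <- function(counts, overdispersion) {
  counts <- as.matrix(counts)
  if (length(overdispersion) != nrow(counts)) {
    stop("need exactly one overdispersion value per transcript (row)")
  }
  if (anyNA(overdispersion)) {
    stop("overdispersion has NA values: were Salmon bootstrap/Gibbs samples generated?")
  }
  # A vector of length nrow(counts) is recycled down each column,
  # so row i is divided by overdispersion[i].
  counts / overdispersion
}

# Toy example with made-up numbers: 2 transcripts x 2 samples
toy_counts <- matrix(c(10, 4, 20, 8), nrow = 2,
                     dimnames = list(c("tx1", "tx2"), c("s1", "s2")))
scaled <- scale_by_overdispersion(toy_counts, c(2, 4))
print(scaled)  # tx1: 5 10; tx2: 1 2

# With real data (not run here), the inputs would come from edgeR::catchSalmon():
# catch <- catchSalmon(paths = salmon_dirs)  # salmon_dirs: hypothetical Salmon output folders
# y <- DGEList(counts = scale_by_overdispersion(catch$counts, catch$annotation$Overdispersion),
#              samples = Samples_metadata, genes = catch$annotation)
```

Failing loudly on NA overdispersion is a deliberate choice: a plain division would silently return a matrix full of NA values.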
Which of the following files makes the most sense to import for "4.6 Differential transcript expression of human lung adenocarcinoma cell lines"?
Counts from nf-core/rnasplice:
- salmon.merged.transcript_counts.tsv: Matrix of isoform-level raw counts across all samples.
- salmon.merged.transcript_counts_scaled.tsv: Matrix of isoform-level scaled raw counts across all samples.
- salmon.merged.transcript_counts_length_scaled.tsv: Matrix of isoform-level length-scaled raw counts across all samples.
- salmon.merged.transcript_counts_dtu_scaled.tsv: Matrix of isoform-level dtu scaled raw counts across all samples.
TPMs from nf-core/rnasplice:
- salmon.merged.transcript_tpm.tsv: Matrix of isoform-level TPM values across all samples.
- salmon.merged.transcript_tpm_scaled.tsv: Matrix of isoform-level scaled TPM values across all samples.
- salmon.merged.transcript_tpm_length_scaled.tsv: Matrix of isoform-level length-scaled TPM values across all samples.
- salmon.merged.transcript_tpm_dtu_scaled.tsv: Matrix of isoform-level dtu scaled TPM values across all samples.
ATpoint, thank you. I did want to try using the quant.sf files from the Salmon run, following example 4.6 "Differential transcript expression of human lung adenocarcinoma cell lines" in the edgeR User's Guide.
I tried those steps, but the Overdispersion column contains NA values, and after computing scaled.counts the table is populated with NA values (see screenshots).
Probably nf-core did not run Salmon with bootstrap/Gibbs replicates. Check the pipeline code, and see the Salmon docs on how to turn that on.
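Before re-running the pipeline, this diagnosis can be checked directly. To my knowledge Salmon records the number of bootstrap or Gibbs replicates in each sample's aux_info/meta_info.json under a "num_bootstraps" field; treat that field name as an assumption and confirm it in a file from your own run. A minimal base-R sketch, where count_salmon_replicates() is a hypothetical helper and the demo JSON files are fabricated:

```r
# Hypothetical helper (not part of edgeR or Salmon): read the "num_bootstraps" field
# from <sample_dir>/aux_info/meta_info.json for each sample directory.
# A regular expression is used instead of a JSON parser so that only base R is needed.
count_salmon_replicates <- function(sample_dirs) {
  vapply(sample_dirs, function(d) {
    meta_file <- file.path(d, "aux_info", "meta_info.json")
    if (!file.exists(meta_file)) return(NA_integer_)  # no metadata file found
    txt <- paste(readLines(meta_file, warn = FALSE), collapse = "\n")
    hit <- regmatches(txt, regexpr('"num_bootstraps"[[:space:]]*:[[:space:]]*[0-9]+', txt))
    if (length(hit) == 0) return(NA_integer_)  # field not present
    as.integer(sub(".*:[[:space:]]*", "", hit))
  }, integer(1))
}

# Toy demonstration with fabricated meta_info.json files in a temporary directory
demo_dirs <- file.path(tempdir(), c("sampleA", "sampleB"))
for (d in demo_dirs) dir.create(file.path(d, "aux_info"), recursive = TRUE, showWarnings = FALSE)
writeLines('{"num_bootstraps": 0}', file.path(demo_dirs[1], "aux_info", "meta_info.json"))
writeLines('{"num_bootstraps": 50}', file.path(demo_dirs[2], "aux_info", "meta_info.json"))
print(count_salmon_replicates(demo_dirs))  # 0 for sampleA, 50 for sampleB

# On real output (not run here): zeros suggest Salmon ran without replicates,
# which would be consistent with NA values in the Overdispersion column.
# count_salmon_replicates(salmon_dirs)  # salmon_dirs: hypothetical Salmon output folders
```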
The OP needs to run Salmon with --numGibbsSamples=50. I suspect that nf-core just won't do that, so Salmon needs to be run directly.
All nf-core pipelines can be run with additional tool CLI flags on a per-user basis, without editing any pipeline code; see the nf-core documentation on how to do it.
In this case I think it would be a question of the OP setting this Nextflow config locally:
If it's an argument that should be set for all pipeline users then it's best to create an issue on the pipeline's GitHub repo.
I think this has already been raised; see https://github.com/nf-core/rnasplice/issues/162