Hi, I'm new to this field and trying to learn, so any advice would be appreciated. In my RNA-seq experiment, I used Salmon to map my reads to a transcriptome (there is no reference genome for stevia). Now I have my quant.sf files, which I want to import into DESeq2 using tximport. I have read the vignette (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html), but I am not sure how to get the part below:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")
Since there is no reference genome for stevia, how would this part work for me? Thank you,
Or you can perform transcript-level analysis by setting txOut=TRUE.
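For the gene-level route, tximport takes a tx2gene table: a two-column data frame with transcript IDs in the first column and gene IDs in the second. Without a TxDb, such a table can be built by hand. The sketch below is only an illustration and assumes Trinity-style transcript names, where the gene ID is the transcript ID without its trailing "_i<number>" suffix. The IDs are invented, and a different assembler would need a different rule.

```r
# Hypothetical transcript IDs; in practice these would come from the Name
# column of any one quant.sf file. Trinity-style naming is an ASSUMPTION.
tx_ids <- c("TRINITY_DN100_c0_g1_i1",
            "TRINITY_DN100_c0_g1_i2",
            "TRINITY_DN205_c1_g2_i1")

# Strip the trailing isoform suffix ("_i<number>") to get a gene-level ID.
gene_ids <- sub("_i[0-9]+$", "", tx_ids)

# Transcript IDs go in column 1 and gene IDs in column 2.
tx2gene <- data.frame(TXNAME = tx_ids, GENEID = gene_ids,
                      stringsAsFactors = FALSE)
print(tx2gene)
```

The resulting table would then be passed as tximport(files, type = "salmon", tx2gene = tx2gene), whereas txOut=TRUE skips the table entirely.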
Thank you for the quick and clear answer.
Sorry, one more question: if I can use txOut=TRUE in tximport (to perform transcript-level analysis), then what is the point of using tximport? I could just read the quant.sf files directly into RStudio and run DESeq2 on the TPMs. In this post (https://support.bioconductor.org/p/84883/) you said there is no difference between the TPM from Salmon and the TPM from tximport. Thank you,
When the data are imported with tximport, DESeq2 will make use of the effective transcript lengths. If you use Salmon, these account for, e.g., sample-specific GC biases or transcript length biases.
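To see why the lengths carry information that TPM alone does not, here is a toy calculation with invented numbers laid out like the columns of a quant.sf file (Name, Length, EffectiveLength, TPM, NumReads). It is a sketch of the standard TPM definition (reads per effective base, rescaled to sum to one million), not Salmon's actual code.

```r
# Toy values shaped like a Salmon quant.sf table; all numbers are invented.
quant <- data.frame(
  Name            = c("tx1", "tx2", "tx3"),
  Length          = c(1000, 2000, 500),
  EffectiveLength = c(800, 1850, 310),
  NumReads        = c(400, 925, 155)
)

# Reads per effective base, rescaled so the values sum to one million.
rate <- quant$NumReads / quant$EffectiveLength
quant$TPM <- rate / sum(rate) * 1e6
print(quant)
```

In this example the three transcripts have very different read counts but identical TPM, because the counts scale with the effective lengths. Starting from TPM alone would discard both the counts and the lengths, while tximport keeps them together in txi$counts and txi$length.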
That makes sense, so I am going to use tximport (with txOut=TRUE), but I am not sure how to define the replicates for each sample in tximport. The section "Importing transcript abundance with tximport" says to create a vector of filenames by reading in a table that contains the sample IDs, but it doesn't say anything about replicates. Thank you,
This isn’t something covered by tximport. Take a look at the DESeq2 vignette though. You need to provide a table of sample information, called colData. And you need to make sure the rows of that table match the order of files given to tximport.
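As a rough sketch of what such a table could look like: replicates are simply samples that share the same level of the design variable. The layout below (18 samples as 6 genotypes with 3 replicates each, plus the sample and genotype names) is an assumption for illustration, not taken from the actual experiment.

```r
# ASSUMPTION: 18 samples = 6 genotypes x 3 replicates; all names hypothetical.
sample_names <- paste0("sample", 1:18)
coldata <- data.frame(
  genotype  = factor(rep(paste0("G", 1:6), each = 3)),
  replicate = factor(rep(1:3, times = 6)),
  row.names = sample_names
)

# Stand-in for the named vector of quant.sf paths that is passed to tximport.
files <- setNames(file.path("salmon", sample_names, "quant.sf"), sample_names)

# The rows of coldata must be in the same order as the files.
stopifnot(identical(rownames(coldata), names(files)))

# Each genotype level should appear once per replicate.
print(table(coldata$genotype))
```

With a design of ~ genotype, DESeq2 treats the three rows that share a genotype level as replicates of that genotype.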
Thanks, I got that part. Now I'm trying to get to the next steps. For that, I made a data frame out of txi$counts, mydata.df <- data.frame(txi$counts), to use as a matrix for the rest of the process. Then I built colData and countsNonZero from mydata.df, and created the DESeq2 object with: dds <- DESeqDataSetFromMatrix(countsNonZero, colData = coldata, design = ~ genotype). Is this the right workflow? Thank you,
You should read over the documentation a bit more.
I think I got the workflow right this time (transcript-level analysis, with no tx2gene):
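library(tximport)
library(DESeq2)
# dir, samples and sampleTable are assumed to be defined already
files <- file.path(dir, "salmon", samples$run, "quant.sf")
names(files) <- paste0("sample", 1:18)
txi <- tximport(files, type = "salmon", txOut = TRUE)
# sampleTable rows must be in the same order as files
rownames(sampleTable) <- colnames(txi$counts)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~genotype)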
Thank you,