tximport and stevia: how to build tx2gene without a reference genome
@bahmanikmsuedu-23146
Michigan State University

Hi, I'm new to this field and trying to learn, so any advice would be appreciated. In my RNA-seq experiment, I used Salmon to map my reads to a transcriptome (there is no reference genome for stevia). Now I have my quant.sf files, which I want to import into DESeq2 using tximport. I have read the vignette (https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html), but I am not sure how to get this part:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")

Since there is no reference genome for stevia, how is this part going to work for me? Thank you,

@mikelove
United States

Here's a basic answer:

You obtained your reference transcript sequences from somewhere in order to quantify the samples. If that source provides a grouping of transcripts to genes, use that. If, as seems to be your case, no grouping of transcripts to genes is available from your reference source, then you need to use a computational method to produce one, and then provide that grouping to tximport.
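
As one hypothetical example of such a grouping: if the stevia transcriptome happens to be a Trinity de novo assembly (the post does not say where it came from), Trinity's transcript IDs already encode a transcript-to-gene grouping, and tx2gene can be derived by stripping the isoform suffix. A minimal base-R sketch, with made-up IDs:

```r
# Made-up transcript IDs in Trinity's naming style (assumption about the assembly):
# TRINITY_DN<cluster>_c<n>_g<gene>_i<isoform>
tx_ids <- c("TRINITY_DN1000_c0_g1_i1", "TRINITY_DN1000_c0_g1_i2",
            "TRINITY_DN1000_c0_g2_i1", "TRINITY_DN2001_c1_g1_i1")

# Dropping the trailing "_i<N>" isoform suffix leaves Trinity's gene ID
tx2gene <- data.frame(
  TXNAME = tx_ids,
  GENEID = sub("_i[0-9]+$", "", tx_ids),
  stringsAsFactors = FALSE
)
print(tx2gene)
```

In practice the IDs would come from the Name column of a quant.sf file (e.g. read.delim on one of the files), and the table would be passed as tximport(files, type = "salmon", tx2gene = tx2gene); tximport reads the first column as transcript ID and the second as gene ID. For assemblies whose IDs carry no gene grouping, a separate transcript-clustering step would be needed instead.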

Or you can perform transcript-level analysis by setting txOut=TRUE.
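
For comparison, this is the step txOut=TRUE skips: given a tx2gene table and the default countsFromAbundance = "no", tximport sums the transcript-level estimated counts within each gene. A toy base-R illustration of that summation (numbers and IDs are made up; this is not tximport's own code):

```r
# Made-up estimated counts: 3 transcripts x 2 samples
tx_counts <- matrix(c(10.5, 4.5, 7.0,
                      20.0, 1.0, 3.0),
                    nrow = 3,
                    dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2")))
tx2gene <- data.frame(TXNAME = c("tx1", "tx2", "tx3"),
                      GENEID = c("g1", "g1", "g2"),
                      stringsAsFactors = FALSE)

# Gene-level counts: sum the rows that share a gene ID
gene_of_tx <- tx2gene$GENEID[match(rownames(tx_counts), tx2gene$TXNAME)]
gene_counts <- rowsum(tx_counts, gene_of_tx)
gene_counts
```

With txOut=TRUE no such summation happens, and DESeq2 tests each transcript (each row of tx_counts) separately.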

Thank you for the quick and clear answer.

Sorry, one more question: if I can use txOut=TRUE in tximport (to perform transcript-level analysis), then what is the point of using tximport? I could just read the quant.sf files directly into RStudio and run DESeq2 on the TPMs. In this post (https://support.bioconductor.org/p/84883/) you said there is no difference between the TPM from Salmon and the TPM from tximport. Thank you,

When the dataset is built from tximport output, DESeq2 makes use of the effective transcript lengths. With Salmon, these effective lengths can account for, e.g., sample-specific GC bias or transcript length bias.
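
On the TPM part of the question: DESeq2 models counts, and TPM by construction removes sequencing depth. A toy base-R sketch, writing TPM in terms of estimated counts and effective lengths (made-up numbers), shows two libraries with identical composition but different depth getting identical TPMs, so the count scale that DESeq2's negative binomial model is built on is gone:

```r
# Made-up estimated counts and effective lengths for 3 transcripts
counts_a <- c(tx1 = 100, tx2 = 300, tx3 = 600)
counts_b <- 2 * counts_a                # same composition, twice the depth
eff_len  <- c(tx1 = 500, tx2 = 1500, tx3 = 3000)

# TPM: reads per unit effective length, rescaled so the values sum to 1e6
tpm <- function(counts, len) {
  rate <- counts / len
  rate / sum(rate) * 1e6
}

tpm_a <- tpm(counts_a, eff_len)
tpm_b <- tpm(counts_b, eff_len)
all.equal(tpm_a, tpm_b)   # TRUE: the depth difference disappears
```

This is why the tximport route passes estimated counts plus the effective lengths to DESeq2, rather than TPMs.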

That makes sense, so I am going to use tximport (with txOut=TRUE). But I am not sure how to define the replicates for each sample in tximport. The vignette "Importing transcript abundance with tximport" says to create a vector of filenames by reading in a table that contains the sample IDs, but doesn't say anything about replicates. Thank you,

This isn't something covered by tximport; take a look at the DESeq2 vignette instead. You need to provide a table of sample information, called colData, in which replicates are simply rows (samples) that share the same condition value, e.g. the same genotype. You also need to make sure the rows of that table are in the same order as the files given to tximport.
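
A minimal base-R sketch of such a table, assuming (hypothetically) 18 samples split into three genotypes with six replicates each. The run IDs, directory, and genotype labels are made up; the point is that replicates are rows sharing a genotype value and that row order follows the file vector:

```r
# Hypothetical design: 18 runs, genotypes A/B/C, 6 replicates each
samples <- data.frame(
  run      = sprintf("run%02d", 1:18),
  genotype = factor(rep(c("A", "B", "C"), each = 6)),
  stringsAsFactors = FALSE
)

# File vector for tximport, in the same order as the rows of 'samples'
files <- file.path("salmon", samples$run, "quant.sf")
names(files) <- samples$run

# colData for DESeq2: one row per sample; replicates share a genotype value
sampleTable <- data.frame(genotype = samples$genotype, row.names = samples$run)

# Guard against a mismatch between file order and colData row order
stopifnot(identical(rownames(sampleTable), names(files)))
table(sampleTable$genotype)
```

Deriving both the file vector and colData from the same table is one simple way to keep them aligned.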

Thanks, I got that part. Now I'm trying to move on to the next steps. For that, I made a data frame out of txi$counts (mydata.df <- data.frame(txi$counts)) to use as the count matrix for the rest of the process. Then I built coldata and countsNonZero from mydata.df, and ran DESeq2 with: dds <- DESeqDataSetFromMatrix(countsNonZero, colData = coldata, design = ~ genotype). Is this the right workflow? Thank you,

You should read over the documentation a bit more.
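
One concrete pitfall in the data.frame/DESeqDataSetFromMatrix route: Salmon's estimated counts (txi$counts) are generally not whole numbers, and DESeqDataSetFromMatrix expects integer counts, whereas DESeqDataSetFromTximport rounds the counts itself and also picks up the effective lengths. A base-R sketch of the rounding and a zero-row filter on a made-up matrix (the name countsNonZero only mirrors the question):

```r
# Made-up estimated counts, similar in kind to txi$counts (3 transcripts x 2 samples)
est_counts <- matrix(c(10.37, 0, 5.6,
                       120.02, 0, 7.91),
                     nrow = 3,
                     dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2")))

# Estimated counts are fractional, so they are not valid integer counts as-is
any(est_counts != round(est_counts))   # TRUE

# Round to whole numbers and store as an integer matrix (not a data.frame)
int_counts <- round(est_counts)
storage.mode(int_counts) <- "integer"

# Drop transcripts with zero counts in every sample
countsNonZero <- int_counts[rowSums(int_counts) > 0, , drop = FALSE]
countsNonZero
```

Passing txi straight to DESeqDataSetFromTximport avoids doing this by hand and keeps the length information.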

I think I got the workflow right this time (transcript-level analysis, with no tx2gene):
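library(tximport)
library(DESeq2)
# dir, samples and sampleTable are defined beforehand; sampleTable rows follow the order of samples$run
files <- file.path(dir, "salmon", samples$run, "quant.sf")
names(files) <- paste0("sample", 1:18)
txi <- tximport(files, type = "salmon", txOut = TRUE)  # transcript-level import, no tx2gene
rownames(sampleTable) <- colnames(txi$counts)  # relabels rows only; does not reorder them
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~genotype)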

Thank you,
