Hello, I am new to NGS analysis, so please excuse me if this query is too naive. I am working on small RNA-seq data (tRNA fragments specifically). I have 3 samples each for CONTROL and TEST, and all of them have different read numbers. I aligned them with the STAR aligner and obtained the aligned BAM files. I am processing the aligned BAMs through Salmon to obtain the transcript-level count matrix in terms of TPM. Since DESeq2 works on a gene-level count matrix, I am using tximport to map my transcripts to gene IDs. I then ran the DGE analysis with DESeq2 using the following code:
txi <- tximport(files, type="salmon", txIn = TRUE, txOut = FALSE, tx2gene=tx2gene, ignoreTxVersion=TRUE)
sampletype <- factor(c(rep("CONTROL", 3), rep("TEST", 3)))
meta <- data.frame(sampletype, row.names = colnames(txi$counts))
all(colnames(txi$counts) %in% rownames(meta))
all(colnames(txi$counts) == rownames(meta))
dds <- DESeqDataSetFromTximport(txi, colData = meta, design = ~ sampletype)
dds <- DESeq(dds)
sizeFactors(dds) <- estimateSizeFactorsForMatrix(txi$counts)
results <- results(dds)
significant_genes <- subset(results, padj < 0.05)
I have very little understanding of normalisation. I wanted to ask whether the end results in significant_genes are based on over-normalised data or not. Also, if someone could please verify my workflow (shown above), it would be a great help.
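For context, this is my understanding of what the size-factor step in the code above is meant to compute: a median-of-ratios scaling factor per sample. The sketch below is base R only. The toy counts and the helper name median_of_ratios are made up for illustration and are not part of DESeq2, so please treat it as an assumption about the method rather than the package's actual implementation.

```r
# Toy count matrix (made-up numbers): 4 genes x 3 samples.
# sample2 is 2x sample1 and sample3 is 4x sample1 for every gene.
counts <- matrix(
  c( 10,  20,  40,
    100, 200, 400,
      5,  10,  20,
     50, 100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Hypothetical helper: median-of-ratios size factors.
median_of_ratios <- function(counts) {
  # Log geometric mean of each gene across samples
  # (becomes -Inf for any gene with a zero count).
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    # Use only genes with a finite reference and a non-zero count.
    keep <- is.finite(log_geo_means) & cnts > 0
    # Median of the per-gene log ratios, back on the linear scale.
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
}

size_factors <- median_of_ratios(counts)

# Divide each sample (column) by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

Because the toy columns are exact multiples of each other, the size factors come out at roughly 0.5, 1 and 2, and the normalised columns become identical. What I cannot tell is whether applying a step like this on top of what tximport and DESeq() already do counts as normalising twice.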
Thank you.