To the developers,
We did some initial RNA-Seq analysis in CLC Bio and exported the results as Excel files. For each individual sample, we get the following columns:
Every column name starts with the sample prefix "NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - ", followed by one of:
1. Expression values
2. Normalized expression values
3. Gene name
4. Transcripts annotated
5. Detected transcripts
6. Transcript length
7. Unique transcript reads
8. Total transcript reads
9. Ratio of unique to total (transcript reads)
10. Exons
11. RPKM
12. Relative RPKM
13. Chromosome
14. Chromosome region start
15. Chromosome region end
We would like to use the DESeq2 workflow for the downstream analysis, but we are not sure which of these columns we can use as input. The expression values (column 1) are integers and have the same values as our total transcript reads (column 8). The rows are the list of differentially expressed genes across all the chromosomes in rice.
Can we use this as input data?
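In case it helps to see what I mean, this is the kind of check I have in mind, sketched in base R on a made-up table. The column names `expression_values`, `total_transcript_reads` and `RPKM`, the helper `looks_like_counts`, and all the numbers are placeholders for illustration, not our real CLC export:

```r
# Toy stand-in for one sample's CLC export; names and values are hypothetical.
clc <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  expression_values = c(120, 0, 37),
  total_transcript_reads = c(120, 0, 37),
  RPKM = c(15.2, 0, 4.8)
)

# TRUE when every value is a non-negative whole number, i.e. looks like a raw count.
looks_like_counts <- function(x) {
  is.numeric(x) && !anyNA(x) && all(x >= 0) && all(x == round(x))
}

# Are the two columns identical, and which columns look like raw counts?
same_as_total <- isTRUE(all.equal(clc$expression_values, clc$total_transcript_reads))
expr_is_count <- looks_like_counts(clc$expression_values)
rpkm_is_count <- looks_like_counts(clc$RPKM)
```

On the toy data the expression values pass the check and the RPKM column does not, which is the pattern we seem to see in our own files.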
Please kindly advise.
Sincerely,
Asher
Thank you very much for your response.
I did actually use the total transcript reads (the expression values) from the CLC Bio EDGE test results as the countData input. A dumb follow-up question: does this mean the total transcript reads or the expression values cannot be used as the input file? This is the code I ran:
library(DESeq2)
# Load the table containing the NIL Drought vs Control counts
countData <- read.table("Trial_RNASeq_NIL_DroughtxControl_all.txt", header = TRUE, row.names = 1)
# Inspect the first rows of countData
head(countData)
# Build the sample table colData with columns "condition" and "genotype";
# "condition" is parsed from the column names of countData,
# "genotype" assumes the first 8 columns are NIL and the last 7 are Swarna
colData <- data.frame(condition = factor(ifelse(grepl("Drought", colnames(countData)), "Drought", "Control")),
                      genotype = factor(c(rep("NIL", 8), rep("Swarna", 7))))
# Use the column names of countData as the row names of colData
rownames(colData) <- colnames(countData)
# Construct the DESeq2 data set, specifying the design formula here
dds <- DESeqDataSetFromMatrix(countData, colData, design = ~ genotype + condition + genotype:condition)
# Make "Drought" the reference (first) level of condition
colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Drought")
# Run the DESeq2 differential expression analysis on the data set
dds <- DESeq(dds)
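In case it matters for the answer, the sample table could also be built from the column names rather than from hard-coded positions, and checked against the count table before building the data set. A rough base R sketch, where the sample names, the tiny count matrix, and the name `sample_info` are all made up for illustration (our real column names may not follow this pattern):

```r
# Hypothetical column names standing in for the real count table's header.
sample_names <- c("NIL_Drought_1", "NIL_Control_1",
                  "Swarna_Drought_1", "Swarna_Control_1")

# Tiny made-up integer count matrix: 2 genes x 4 samples.
counts <- matrix(c(10L, 0L, 5L, 12L, 3L, 7L, 8L, 1L), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), sample_names))

# Parse both factors from the sample names instead of relying on column order.
sample_info <- data.frame(
  condition = factor(ifelse(grepl("Drought", sample_names), "Drought", "Control")),
  genotype  = factor(ifelse(grepl("^NIL", sample_names), "NIL", "Swarna")),
  row.names = sample_names
)

# Stop early if samples are misaligned or the values are not raw counts.
stopifnot(
  identical(rownames(sample_info), colnames(counts)),
  is.integer(counts),
  all(counts >= 0)
)

# Cross-tabulate the design to see how many samples fall in each cell.
design_table <- table(sample_info$genotype, sample_info$condition)
```

The cross-tabulation is only there to confirm that every genotype and condition combination actually has samples before the interaction term is fitted.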
Thanks and best regards.
Sincerely,
Asher
Thank you again for responding.
I checked the CLC Genomics manual and found this part.
The Expression value parameter describes how expression per gene or transcript can be defined
in different ways on both levels:
Total counts. When the reference is annotated with genes only, this value is the total
number of reads mapped to the gene. For un-annotated references, this value is the
total number of reads mapped to the reference sequence. For references annotated with
transcripts and genes, the value reported for each gene is the number of reads that map
to the exons of that gene. The value reported per transcript is the total number of reads
mapped to the transcript.
Unique counts. This is similar to the above, except only reads that are uniquely mapped are counted. This is the number of reads that match uniquely to the gene or its transcripts.
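To make sure I am reading these two definitions correctly, here is how I understand them as a toy base R sketch. The reads, the gene assignments, and the `uniquely_mapped` flag are invented for illustration, and how CLC actually assigns non-uniquely mapped reads to a gene is not something this sketch tries to reproduce:

```r
# Hypothetical read-to-gene assignments; reads r3 and r5 are flagged as
# not uniquely mapped.
hits <- data.frame(
  read = c("r1", "r2", "r3", "r4", "r5"),
  gene = factor(c("geneA", "geneA", "geneA", "geneB", "geneB")),
  uniquely_mapped = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# "Total counts": every read assigned to the gene.
total_counts <- table(hits$gene)

# "Unique counts": only the reads flagged as uniquely mapped.
unique_counts <- table(hits$gene[hits$uniquely_mapped])
```

Under this reading, both values are whole-number read counts per gene, and the unique count can never exceed the total count.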
Another dumb question: does this look like the counts or estimated counts of fragments that can be assigned to each gene?
Thank you so much for clarifying things.
Best regards,
Asher