Question

input data from CLC bio edge tests result

tarun2 • 0

@tarun2-11885

United States

To the developers,

We did some initial RNA-Seq analysis in CLC Bio and exported the results as Excel files. For each individual sample, we get the following columns:

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Expression values

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Normalized expression values

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Gene name

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Transcripts annotated

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Detected transcripts

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Transcript length

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Unique transcript reads

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Total transcript reads

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Ratio of unique to total (transcript reads)

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Exons

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - RPKM

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Relative RPKM

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Chromosome

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Chromosome region start

NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - Chromosome region end

We would like to continue with the DESeq2 workflow, but we're not sure which of these columns we can use as input. The expression values (column 1) are integers and are identical to the total transcript reads (column 8). The rows are the differentially expressed genes across all the chromosomes in rice.

Can we use this as input data?

Please kindly advise.

Sincerely,

Asher

deseq2 • 1.6k views

ADD COMMENT • link updated 8.4 years ago by Michael Love 43k • written 8.4 years ago by tarun2 • 0

score 1 · Answer 1 · 2016-11-21

Michael Love 43k

@mikelove

United States

You need to obtain a matrix of the count or estimated count of fragments that can be assigned to each gene in each sample. And you need this data over all the genes, not just the DE genes. From this point, you can simply follow the DESeq2 vignette or the workflow. You should request information from whoever processed your data on how to obtain this matrix for use with DESeq2.
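As a rough illustration of the kind of matrix described above, here is a minimal base R sketch that assembles one from per-sample tables. The tables are made up in memory, and the sample names and the column names `gene` and `total_reads` are assumptions for illustration only; they are not the actual CLC export format, which still has to be checked against the CLC documentation.

```r
# Hypothetical per-sample tables standing in for per-sample exports.
# The column names "gene" and "total_reads" are assumptions for illustration.
sample_tables <- list(
  NIL_Drought_1 = data.frame(gene = c("geneA", "geneB", "geneC"),
                             total_reads = c(10, 0, 250)),
  NIL_Control_1 = data.frame(gene = c("geneC", "geneA", "geneB"),
                             total_reads = c(190, 12, 3))
)

# Use the gene order of the first sample as the row order for all samples.
genes <- sample_tables[[1]]$gene

# One column per sample, one row per gene (all genes, not only the DE genes).
count_matrix <- sapply(sample_tables, function(tab) {
  tab$total_reads[match(genes, tab$gene)]
})
rownames(count_matrix) <- genes

# DESeq2 expects non-negative integer counts, so check before converting.
stopifnot(!anyNA(count_matrix),
          all(count_matrix >= 0),
          all(count_matrix == round(count_matrix)))
storage.mode(count_matrix) <- "integer"

count_matrix
```

Matching on gene identifiers, instead of relying on row order, guards against tables that list the genes in a different order.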

ADD COMMENT • link 8.4 years ago Michael Love 43k

Thank you very much for your response.

I actually used the total transcript reads (the expression value) from the CLC Bio edge test results as the countData input. Dumb follow-up question: does this mean the total transcript reads or the expression value cannot be used as input? This is the code I ran:

# Load DESeq2
library(DESeq2)

# Read the table containing the NIL Drought vs Control data (genes in rows, samples in columns)
countData <- read.table("Trial_RNASeq_NIL_DroughtxControl_all.txt", header = TRUE, row.names = 1)

# Inspect the first rows of countData
head(countData)

# Build colData with one row per sample: "condition" is parsed from the column names
# of countData; "genotype" assumes the first 8 columns are NIL and the last 7 are Swarna
colData <- data.frame(
  condition = factor(ifelse(grepl("Drought", colnames(countData)), "Drought", "Control")),
  genotype  = factor(c(rep("NIL", 8), rep("Swarna", 7)))
)

# Use the sample names (column names of countData) as the row names of colData
rownames(colData) <- colnames(countData)

# Construct the DESeq2 data set from countData and colData, specifying the design formula here
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData,
                              design = ~ genotype + condition + genotype:condition)

# Relevel condition so that "Drought" is the first (reference) level
colData(dds)$condition <- relevel(colData(dds)$condition, "Drought")

# Run the DESeq2 differential expression analysis on the data set
dds <- DESeq(dds)
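
The genotype column in the code above is filled in by position, so it is only correct if the first 8 columns really are NIL and the last 7 are Swarna. A possible alternative, sketched here with made-up sample names, is to parse genotype from the column names the same way condition is parsed. That the names start with the genotype is an assumption, not something confirmed for the real table.

```r
# Hypothetical sample names; the real column names will differ.
sample_names <- c("NIL_Drought_1", "NIL_Control_1",
                  "Swarna_Drought_1", "Swarna_Control_1")

# Derive both factors from the names so they cannot drift out of order.
colData <- data.frame(
  condition = factor(ifelse(grepl("Drought", sample_names), "Drought", "Control")),
  genotype  = factor(ifelse(grepl("^NIL", sample_names), "NIL", "Swarna")),
  row.names = sample_names
)

# Every genotype-by-condition cell should contain at least one sample.
table(colData$genotype, colData$condition)
```

Tabulating genotype against condition is a quick way to spot a mislabeled sample before building the DESeq2 data set.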

Thanks and best regards.

Sincerely,

Asher

ADD REPLY • link 8.4 years ago tarun2 • 0

I don't know about the output of the upstream software. That's for you to confirm by reading its documentation; if in doubt, you need to contact the developers.

ADD REPLY • link 8.4 years ago Michael Love 43k

Thank you again for responding.

I checked the CLC Genomics manual and found this passage:

The Expression value parameter describes how expression per gene or transcript can be defined in different ways on both levels:

Total counts. When the reference is annotated with genes only, this value is the total number of reads mapped to the gene. For un-annotated references, this value is the total number of reads mapped to the reference sequence. For references annotated with transcripts and genes, the value reported for each gene is the number of reads that map to the exons of that gene. The value reported per transcript is the total number of reads mapped to the transcript.

Unique counts. This is similar to the above, except only reads that are non-specifically mapped are counted. This is the number of reads that match uniquely to the gene or its transcripts.

Another dumb question: does this look like the counts or estimated counts of fragments that can be assigned to each gene?

Thank you so much for clarifying things.

Best regards,

Asher

ADD REPLY • link 8.4 years ago tarun2 • 0