Dear Prof Mark Robinson,
Good day.
> treat <-
>
> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2"
,"3h1","3h2","24h1","24h2","48h1","48h2"))
>
> Maybe it's better to call this 'time' and you'll need to change this
to
> something like:
> time <- factor( rep(c("C","3h","24h","48h"), each=4) )
>
I assigned the time factor as suggested and it showed me as:
> time
[1] C C C C 3h 3h 3h 3h 24h 24h 24h 24h 96h 96h 96h 96h
Levels: 24h 3h 96h C
However, may I ask, does this time factor vector matches the read
counts of
time in the input file?
The read counts of time in the input file is: C C 3h 3h 24h 24h 96h
96h
C C 3h 3h 24h 24h 96h 96h
Should I make some adjustment in my input file in order to suit the
time
factor vector? Please forgive me, if this is a stupid question.
If I understand your description, you are interested primarily in the
> differences in genotypes. I would suggest starting with:
>
> design <- model.matrix(~time+geno)
>
I would like to study the differentially expressed genes in both 2 HS
and
LS genotypes of tress across time course experiment.
If I would like to study the differentially expressed genes in HS
genotype
tress across time course, can I have the design matrix as below?
design <- model.matrix(~time, data=rawdata)
Thank you very much for your time and help.
Best regards,
KJ Lim
On 18 May 2012 11:32, Mark Robinson <mark.robinson@imls.uzh.ch> wrote:
> Hi KJ Lim,
>
> > trees <-
> >
> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS"
,"LS","LS","LS","LS"))
>
> This one seems ok, although could be written more compactly and I'll
give
> it a different name:
>
> geno <- factor( rep(c("HS","LS"),each=8))
>
> > treat <-
> >
> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2"
,"3h1","3h2","24h1","24h2","48h1","48h2"))
>
> Maybe it's better to call this 'time' and you'll need to change this
to
> something like:
>
> time <- factor( rep(c("C","3h","24h","48h"), each=4) )
>
> These two factor vectors match the columns of your count table, so
you
> should be ok there.
>
> If I understand your description, you are interested primarily in
the
> differences in genotypes. I would suggest starting with:
>
> design <- model.matrix(~time+geno)
>
> > design
> (Intercept) time3h time48h timeC genoLS
> 1 1 0 0 1 0
> 2 1 0 0 1 0
> 3 1 0 0 1 0
> 4 1 0 0 1 0
> 5 1 1 0 0 0
> 6 1 1 0 0 0
> 7 1 1 0 0 0
> 8 1 1 0 0 0
> 9 1 0 0 0 1
> 10 1 0 0 0 1
> 11 1 0 0 0 1
> 12 1 0 0 0 1
> 13 1 0 1 0 1
> 14 1 0 1 0 1
> 15 1 0 1 0 1
> 16 1 0 1 0 1
> attr(,"assign")
> [1] 0 1 1 1 2
> attr(,"contrasts")
> attr(,"contrasts")$time
> [1] "contr.treatment"
>
> attr(,"contrasts")$geno
> [1] "contr.treatment"
>
> Then, you can follow the usual steps as per the edgeR user's guide
--
> estimate dispersion, fit the glm with glmFit() and do the LR testing
with
> glmLRT() and so on.
>
> Hope that gets you started.
>
> Best,
> Mark
>
>
>
>
> On 18.05.2012, at 05:15, KJ Lim wrote:
>
> > Dear edgeR users,
> >
> > I am not a statistician nor R programming geek, please forgive me
if I
> ask
> > stupid question.
> >
> > I have RNA-seq data for 2 different genotype of trees with
different time
> > points 0hour(control),3hr,24hours,and 48hours. Each time point has
two
> > replicates. The experiment design like following:
> >
> > Sample harvested after treatment at
> > Tree H1 Ctrl 3hrs 24hrs 48hrs
> > Tree H2 Ctrl 3hrs 24hrs 48hrs
> >
> > Tree L1 Ctrl 3hrs 24hrs 48hrs
> > Tree L2 Ctrl 3hrs 24hrs 48hrs
> >
> > I would like to study genes that are differentially expressed (DE)
> > throughout the time points in these 2 genotype of trees with
edgeR. I
> read
> > from the edgeR user guide, the suitable DE analysis method for my
> expriment
> > is GLM likelihood ratio test.
> >
> > After read case study in the user guide, I have the RNA-Seq counts
in a
> > file as below in order to input into the edgeR package.
> >
> > Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C
L1_3H
> > L2_3H L1_1D L2_1D L1_2D L2_2D
> > AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
> > AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1
> > AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0
> > AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0
> >
> > I used the following commands to input the counts file into edgeR:
> >
> > rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE,
> > stringsAsFactor=FALSE)
> > trees <-
> >
> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS"
,"LS","LS","LS","LS"))
> > treat <-
> >
> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2"
,"3h1","3h2","24h1","24h2","48h1","48h2"))
> >
> > I'm not good in R programming, thus, having this file input into
the
> edgeR
> > and assign the factors as well as design-matrix is a challenge.
I'm stuck
> > how to tell the edgeR about my design matrix.
> >
> > Could you guys kindly help me on this? Have I input my RNA-Seq
counts
> > correctly? Please correct me if there is any mistake I have done
in my
> > edgeR input.
> >
> > I appreciate very much for your help and time. Have a nice day.
> >
> > Best regards,
> > KJ Lim
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@r-project.org
> >
https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics
> Institute of Molecular Life Sciences
> University of Zurich
> Winterthurerstrasse 190
> 8057 Zurich
> Switzerland
>
> v: +41 44 635 4848
> f: +41 44 635 6898
> e: mark.robinson@imls.uzh.ch
> o: Y11-J-16
> w:
http://tiny.cc/mrobin
>
>
[[alternative HTML version deleted]]