Hello List
I have RNA-seq data from different time points and I would like to
find
oscillating genes.
I thought of using the "cycle" package (which is based on Fourier
score) ,
but I'm not sure what values to use: FPKM or DESeq/edgeR normalized
values.
Any suggestion what would be more appropriate?
Thanks
Mali
[[alternative HTML version deleted]]
hi Mali,
You would certainly want to normalize each sample based on library
size,
which happens in all methods, although the DESeq and edgeR methods for
library size normalization use a robust estimator, while the 'sum of
mapped
reads' is is strongly influenced by the features with highest counts.
I can't tell from the documentation if the standardise() function in
the
cycle package is creating mean 0, variance 1 along the rows or
columns, but
this would be relevant to your question.
If you were calculating simple gene-gene distances and looking for
genes
which cluster together, it would be important to center the log-scale
counts along the genes, as you would probably like to find genes which
rise
and fall together regardless of the base level. (see the sweep()
function
in base R)
Another factor is the dependence of the variance of log-transformed
counts
on the mean. From a quick read of the vignette, the model in cycle
package
has normally-distributed error with a variance that depends on time,
but
not explicitly on the mean. You might try first using one of the
transformations described in the DESeq2 vignette and then working with
the
variance stabilized values.
Mike
On Tue, Apr 1, 2014 at 8:33 AM, mali salmon <shalmom1@gmail.com>
wrote:
> Hello List
> I have RNA-seq data from different time points and I would like to
find
> oscillating genes.
> I thought of using the "cycle" package (which is based on Fourier
score) ,
> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized
values.
> Any suggestion what would be more appropriate?
> Thanks
> Mali
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
Dear Mali,
It is easy to do such an analysis directly in edgeR.
Suppose for example that you are looking for a daily cycle, and that
you
have at least half a dozen time points spanning more than a full day.
The
method is to create basis vectors for a periodic expression pattern.
Suppose that 'time' is time in hours:
s1 <- sin(2*pi*time/24)
c1 <- cos(2*pi*time/24)
design <- model.matrix(~s1+c1)
This will give a design matrix with three columns: one for the
intercept
and two for the periodic signal. To test for a periodic cycle:
fit <- glmFit(y,design)
lrt <- glmLRT(fit,coef=2:3)
If you have lots of time points and you want to make a more complex
curve
you can add a harmonic:
s2 <- sin(4*pi*time/24)
c2 <- cos(4*pi*time/24)
design <- model.matrix(~s1+c1+s2+c2)
There are now four coefficients parametrizing the periodic curve.
(These
can be viewed as representing amplitudes and phase shifts for the two
harmonics.) I think you would seldom want more than two harmonics for
RNA-seq data.
This method can easily be adapted to find cell-cycle genes and so on.
Best wishes
Gordon
> Date: Tue, 1 Apr 2014 14:33:27 +0200
> From: mali salmon <shalmom1 at="" gmail.com="">
> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
> Subject: [BioC] detecting oscillating genes from RNA-seq data
>
> Hello List
> I have RNA-seq data from different time points and I would like to
find
> oscillating genes.
> I thought of using the "cycle" package (which is based on Fourier
score) ,
> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized
values.
> Any suggestion what would be more appropriate?
> Thanks
> Mali
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
Dear Prof. Smith
Thank you so much for the explanation of how to do the analysis.
If you don't mind I'll describe the design of our experiment just to
be
sure that I'm not doing any mistake.
We work on a sea animal, and we are interested with day/night
oscillating
genes (24 hours cycle) and with tidal genes (12.4 h cycle).
In the first week we have sacrified animals every 4 hours across 48 h,
and
then repeated the same procedure the week after.
So in total we have 24 samples, 4 "replicates" of 6 time points
(2am,6am,10am,2pm,6pm,10pm)
>From your mail I understand that time is a vector of time points in
hours,
but I'm not sure how to define it, it can be:
1) time <- rep(seq(2,46,4),2)
2) time <- seq(2,94,4)
3) time <- rep(seq(2,22,4),4)
I saw this matter for the cos vector.
I suppose that for detecting tidal genes I just have to repeat the
analysis
with 12.4 cycle.
Thanks again
Mali
On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth@wehi.edu.au>
wrote:
> Dear Mali,
>
> It is easy to do such an analysis directly in edgeR.
>
> Suppose for example that you are looking for a daily cycle, and that
you
> have at least half a dozen time points spanning more than a full
day. The
> method is to create basis vectors for a periodic expression pattern.
> Suppose that 'time' is time in hours:
>
> s1 <- sin(2*pi*time/24)
> c1 <- cos(2*pi*time/24)
> design <- model.matrix(~s1+c1)
>
> This will give a design matrix with three columns: one for the
intercept
> and two for the periodic signal. To test for a periodic cycle:
>
> fit <- glmFit(y,design)
> lrt <- glmLRT(fit,coef=2:3)
>
> If you have lots of time points and you want to make a more complex
curve
> you can add a harmonic:
>
> s2 <- sin(4*pi*time/24)
> c2 <- cos(4*pi*time/24)
> design <- model.matrix(~s1+c1+s2+c2)
>
> There are now four coefficients parametrizing the periodic curve.
(These
> can be viewed as representing amplitudes and phase shifts for the
two
> harmonics.) I think you would seldom want more than two harmonics
for
> RNA-seq data.
>
> This method can easily be adapted to find cell-cycle genes and so
on.
>
> Best wishes
> Gordon
>
>
> Date: Tue, 1 Apr 2014 14:33:27 +0200
>> From: mali salmon <shalmom1@gmail.com>
>> To: "bioconductor@r-project.org" <bioconductor@r-project.org>
>> Subject: [BioC] detecting oscillating genes from RNA-seq data
>>
>> Hello List
>> I have RNA-seq data from different time points and I would like to
find
>> oscillating genes.
>> I thought of using the "cycle" package (which is based on Fourier
score) ,
>> but I'm not sure what values to use: FPKM or DESeq/edgeR normalized
>> values.
>> Any suggestion what would be more appropriate?
>> Thanks
>> Mali
>>
>
>
______________________________________________________________________
> The information in this email is confidential and
inte...{{dropped:10}}
On Thu, 3 Apr 2014, mali salmon wrote:
> Dear Prof. Smith
> Thank you so much for the explanation of how to do the analysis.
> If you don't mind I'll describe the design of our experiment just to
be
> sure that I'm not doing any mistake.
> We work on a sea animal, and we are interested with day/night
oscillating
> genes (24 hours cycle) and with tidal genes (12.4 h cycle).
> In the first week we have sacrified animals every 4 hours across 48
h, and
> then repeated the same procedure the week after.
> So in total we have 24 samples, 4 "replicates" of 6 time points
> (2am,6am,10am,2pm,6pm,10pm)
> From your mail I understand that time is a vector of time points in
hours,
> but I'm not sure how to define it, it can be:
> 1) time <- rep(seq(2,46,4),2)
> 2) time <- seq(2,94,4)
> 3) time <- rep(seq(2,22,4),4)
Makes no difference. All three choices will give exactly the same
results.
> I saw this matter for the cos vector.
> I suppose that for detecting tidal genes I just have to repeat the
analysis
> with 12.4 cycle.
You need to carefully identify the tidal stage applicable to each of
the
24 samples. Clearly the tidal phase will have moved in the second
week.
For example you might code 'tide' to be between 0 and 1, with 0/1 for
high
tide and 0.5 for low tide. Then:
tides <- sin(2*pi*tide)
tidec <- cos(2*pi*tide)
Then you need to fit both day/night and tidal cycles to your data at
the
same time:
~s1+c1+tides+tidec
It may be difficult to distinguish a tidal cycle from a 12-hour cycle.
You might try to check check this by:
~s1+c1+s2+c2+tides+tidec
Gordon
> Thanks again
> Mali
>
>
> On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote:
>
>> Dear Mali,
>>
>> It is easy to do such an analysis directly in edgeR.
>>
>> Suppose for example that you are looking for a daily cycle, and
that you
>> have at least half a dozen time points spanning more than a full
day. The
>> method is to create basis vectors for a periodic expression
pattern.
>> Suppose that 'time' is time in hours:
>>
>> s1 <- sin(2*pi*time/24)
>> c1 <- cos(2*pi*time/24)
>> design <- model.matrix(~s1+c1)
>>
>> This will give a design matrix with three columns: one for the
intercept
>> and two for the periodic signal. To test for a periodic cycle:
>>
>> fit <- glmFit(y,design)
>> lrt <- glmLRT(fit,coef=2:3)
>>
>> If you have lots of time points and you want to make a more complex
curve
>> you can add a harmonic:
>>
>> s2 <- sin(4*pi*time/24)
>> c2 <- cos(4*pi*time/24)
>> design <- model.matrix(~s1+c1+s2+c2)
>>
>> There are now four coefficients parametrizing the periodic curve.
(These
>> can be viewed as representing amplitudes and phase shifts for the
two
>> harmonics.) I think you would seldom want more than two harmonics
for
>> RNA-seq data.
>>
>> This method can easily be adapted to find cell-cycle genes and so
on.
>>
>> Best wishes
>> Gordon
>>
>>
>> Date: Tue, 1 Apr 2014 14:33:27 +0200
>>> From: mali salmon <shalmom1 at="" gmail.com="">
>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
>>> Subject: [BioC] detecting oscillating genes from RNA-seq data
>>>
>>> Hello List
>>> I have RNA-seq data from different time points and I would like to
find
>>> oscillating genes.
>>> I thought of using the "cycle" package (which is based on Fourier
score) ,
>>> but I'm not sure what values to use: FPKM or DESeq/edgeR
normalized
>>> values.
>>> Any suggestion what would be more appropriate?
>>> Thanks
>>> Mali
>>>
>>
>>
______________________________________________________________________
>> The information in this email is confidential and intended solely
for the
>> addressee.
>> You must not disclose, forward, print or use it without the
permission of
>> the sender.
>>
______________________________________________________________________
>>
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
Thank you so much Prof. Smyth for your reply.
I have been trying to follow your suggestions, and encountered an
error
when trying to run estimateGLMTagwiseDisp.
Below is the code I used and the design matrix.
Many thanks
Mali
>library(edgeR)
>x <- read.delim("count.matrix", row.names=1, stringsAsFactors=FALSE)
>x<-round(x) // since I have RSEM values
>y <- DGEList(counts=x)
>keep <- rowSums(cpm(y)>1) >= 4
>y <- y[keep,]
>y$samples$lib.size <- colSums(y$counts)
>y <- calcNormFactors(y)
>time <- rep(seq(2,22,4),4)
>s1 <- sin(2*pi*time/24)
>c1 <- cos(2*pi*time/24)
>tide<-c(1,0,0.5,1,0,0.5,0.5,1,0,0.5,1,0.5,1,0,0.5,1,0.5,0.5,0.5,1,0,0
.5,1,0) *//
I used 1 for high tide, 0 for low tide, and 0.5 in-between (rising or
falling)*
>tides <- sin(2*pi*tide)
>tidec <- cos(2*pi*tide)
>design <- model.matrix(~s1+c1+tides+tidec)
>design
(Intercept) s1 c1 tides tidec
1 1 0.5 8.660254e-01 -2.449294e-16 1
2 1 1.0 6.123234e-17 0.000000e+00 1
3 1 0.5 -8.660254e-01 1.224647e-16 -1
4 1 -0.5 -8.660254e-01 -2.449294e-16 1
5 1 -1.0 -1.836970e-16 0.000000e+00 1
6 1 -0.5 8.660254e-01 1.224647e-16 -1
7 1 0.5 8.660254e-01 1.224647e-16 -1
8 1 1.0 6.123234e-17 -2.449294e-16 1
9 1 0.5 -8.660254e-01 0.000000e+00 1
10 1 -0.5 -8.660254e-01 1.224647e-16 -1
11 1 -1.0 -1.836970e-16 -2.449294e-16 1
12 1 -0.5 8.660254e-01 1.224647e-16 -1
13 1 0.5 8.660254e-01 -2.449294e-16 1
14 1 1.0 6.123234e-17 0.000000e+00 1
15 1 0.5 -8.660254e-01 1.224647e-16 -1
16 1 -0.5 -8.660254e-01 -2.449294e-16 1
17 1 -1.0 -1.836970e-16 1.224647e-16 -1
18 1 -0.5 8.660254e-01 1.224647e-16 -1
19 1 0.5 8.660254e-01 1.224647e-16 -1
20 1 1.0 6.123234e-17 -2.449294e-16 1
21 1 0.5 -8.660254e-01 0.000000e+00 1
22 1 -0.5 -8.660254e-01 1.224647e-16 -1
23 1 -1.0 -1.836970e-16 -2.449294e-16 1
24 1 -0.5 8.660254e-01 0.000000e+00 1
y <- estimateGLMCommonDisp(y,design)
y <- estimateGLMTrendedDisp(y,design)
y <- estimateGLMTagwiseDisp(y,design)
*Error in dispCoxReidInterpolateTagwise(y, design, offset = offset,
dispersion, : *
* design matrix must be full column rank*
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] edgeR_3.4.2 limma_3.18.12
loaded via a namespace (and not attached):
[1] tools_3.0.2
On Fri, Apr 4, 2014 at 2:19 AM, Gordon K Smyth <smyth@wehi.edu.au>
wrote:
>
> On Thu, 3 Apr 2014, mali salmon wrote:
>
> Dear Prof. Smith
>> Thank you so much for the explanation of how to do the analysis.
>> If you don't mind I'll describe the design of our experiment just
to be
>> sure that I'm not doing any mistake.
>> We work on a sea animal, and we are interested with day/night
oscillating
>> genes (24 hours cycle) and with tidal genes (12.4 h cycle).
>> In the first week we have sacrified animals every 4 hours across 48
h, and
>> then repeated the same procedure the week after.
>> So in total we have 24 samples, 4 "replicates" of 6 time points
>> (2am,6am,10am,2pm,6pm,10pm)
>> From your mail I understand that time is a vector of time points in
hours,
>> but I'm not sure how to define it, it can be:
>> 1) time <- rep(seq(2,46,4),2)
>> 2) time <- seq(2,94,4)
>> 3) time <- rep(seq(2,22,4),4)
>>
>
> Makes no difference. All three choices will give exactly the same
results.
>
>
> I saw this matter for the cos vector.
>> I suppose that for detecting tidal genes I just have to repeat the
>> analysis
>> with 12.4 cycle.
>>
>
> You need to carefully identify the tidal stage applicable to each of
the
> 24 samples. Clearly the tidal phase will have moved in the second
week.
> For example you might code 'tide' to be between 0 and 1, with 0/1
for high
> tide and 0.5 for low tide. Then:
>
> tides <- sin(2*pi*tide)
> tidec <- cos(2*pi*tide)
>
> Then you need to fit both day/night and tidal cycles to your data at
the
> same time:
>
> ~s1+c1+tides+tidec
>
> It may be difficult to distinguish a tidal cycle from a 12-hour
cycle. You
> might try to check check this by:
>
> ~s1+c1+s2+c2+tides+tidec
>
> Gordon
>
>
> Thanks again
>> Mali
>>
>>
>> On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth@wehi.edu.au>
wrote:
>>
>> Dear Mali,
>>>
>>> It is easy to do such an analysis directly in edgeR.
>>>
>>> Suppose for example that you are looking for a daily cycle, and
that you
>>> have at least half a dozen time points spanning more than a full
day.
>>> The
>>> method is to create basis vectors for a periodic expression
pattern.
>>> Suppose that 'time' is time in hours:
>>>
>>> s1 <- sin(2*pi*time/24)
>>> c1 <- cos(2*pi*time/24)
>>> design <- model.matrix(~s1+c1)
>>>
>>> This will give a design matrix with three columns: one for the
intercept
>>> and two for the periodic signal. To test for a periodic cycle:
>>>
>>> fit <- glmFit(y,design)
>>> lrt <- glmLRT(fit,coef=2:3)
>>>
>>> If you have lots of time points and you want to make a more
complex curve
>>> you can add a harmonic:
>>>
>>> s2 <- sin(4*pi*time/24)
>>> c2 <- cos(4*pi*time/24)
>>> design <- model.matrix(~s1+c1+s2+c2)
>>>
>>> There are now four coefficients parametrizing the periodic curve.
(These
>>> can be viewed as representing amplitudes and phase shifts for the
two
>>> harmonics.) I think you would seldom want more than two harmonics
for
>>> RNA-seq data.
>>>
>>> This method can easily be adapted to find cell-cycle genes and so
on.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>
>>> Date: Tue, 1 Apr 2014 14:33:27 +0200
>>>
>>>> From: mali salmon <shalmom1@gmail.com>
>>>> To: "bioconductor@r-project.org" <bioconductor@r-project.org>
>>>> Subject: [BioC] detecting oscillating genes from RNA-seq data
>>>>
>>>> Hello List
>>>> I have RNA-seq data from different time points and I would like
to find
>>>> oscillating genes.
>>>> I thought of using the "cycle" package (which is based on Fourier
>>>> score) ,
>>>> but I'm not sure what values to use: FPKM or DESeq/edgeR
normalized
>>>> values.
>>>> Any suggestion what would be more appropriate?
>>>> Thanks
>>>> Mali
>>>>
>>>>
>>>
______________________________________________________________________
>>> The information in this email is confidential and intended solely
for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the
permission of
>>> the sender.
>>>
______________________________________________________________________
>>>
>>>
>>
>
______________________________________________________________________
> The information in this email is confidential and
inte...{{dropped:10}}
On Sun, 6 Apr 2014, mali salmon wrote:
> Thank you so much Prof. Smyth for your reply.
> I have been trying to follow your suggestions, and encountered an
error
> when trying to run estimateGLMTagwiseDisp.
>
> Below is the code I used and the design matrix.
>
> Many thanks
> Mali
>
>> library(edgeR)
>> x <- read.delim("count.matrix", row.names=1,
stringsAsFactors=FALSE)
>> x<-round(x) // since I have RSEM values
>> y <- DGEList(counts=x)
>> keep <- rowSums(cpm(y)>1) >= 4
>> y <- y[keep,]
>> y$samples$lib.size <- colSums(y$counts)
>> y <- calcNormFactors(y)
>> time <- rep(seq(2,22,4),4)
>> s1 <- sin(2*pi*time/24)
>> c1 <- cos(2*pi*time/24)
>> tide<-c(1,0,0.5,1,0,0.5,0.5,1,0,0.5,1,0.5,1,0,0.5,1,0.5,0.5,0.5,1,0
,0.5,1,0) *//
> I used 1 for high tide, 0 for low tide, and 0.5 in-between (rising
or
> falling)*
No, this isn't going to work, for several reasons.
First, you have equated low and high tide as the same value, because a
periodic function returns to the same value at 1 as it had at 0. You
need
to do instead what I suggested, which was to use 0.5 for low tide and
0 or
1 (makes no difference which) for high tide.
Second, you can't simply put every in-between stage to the same value.
This is far, far too rough. The rising and falling stages are not the
same. The successive values for 'tide' have to be at intervals of
4hr/12.4hr = 0.323 for observations that are four hours apart.
Third, you have coded the second week as being in exactly the same
tidal
phase as for the first week at each clock time. This cannot be true,
and
is again far too rough. If the tidal cycle has a period of 12.4
hours,
then there must be a gradual phase shift in the tidal cycle by about
0.8/12.4 each day.
You have to code the tidal values more precisely, and you have to
determine accurately the phase change of the tidal cycle between the
first
week and the second. For the analysis to have any hope of working you
have to be able to interpolate all the tidal values accurately to at
least
one or two decimal places. If you can't do it as accurately as that,
then
it won't be worth doing it all.
It may also be good idea to consult a mathematician or statistician at
your university to help you interpret the results.
Good luck
Gordon
>> tides <- sin(2*pi*tide)
>> tidec <- cos(2*pi*tide)
>> design <- model.matrix(~s1+c1+tides+tidec)
>
>> design
> (Intercept) s1 c1 tides tidec
> 1 1 0.5 8.660254e-01 -2.449294e-16 1
> 2 1 1.0 6.123234e-17 0.000000e+00 1
> 3 1 0.5 -8.660254e-01 1.224647e-16 -1
> 4 1 -0.5 -8.660254e-01 -2.449294e-16 1
> 5 1 -1.0 -1.836970e-16 0.000000e+00 1
> 6 1 -0.5 8.660254e-01 1.224647e-16 -1
> 7 1 0.5 8.660254e-01 1.224647e-16 -1
> 8 1 1.0 6.123234e-17 -2.449294e-16 1
> 9 1 0.5 -8.660254e-01 0.000000e+00 1
> 10 1 -0.5 -8.660254e-01 1.224647e-16 -1
> 11 1 -1.0 -1.836970e-16 -2.449294e-16 1
> 12 1 -0.5 8.660254e-01 1.224647e-16 -1
> 13 1 0.5 8.660254e-01 -2.449294e-16 1
> 14 1 1.0 6.123234e-17 0.000000e+00 1
> 15 1 0.5 -8.660254e-01 1.224647e-16 -1
> 16 1 -0.5 -8.660254e-01 -2.449294e-16 1
> 17 1 -1.0 -1.836970e-16 1.224647e-16 -1
> 18 1 -0.5 8.660254e-01 1.224647e-16 -1
> 19 1 0.5 8.660254e-01 1.224647e-16 -1
> 20 1 1.0 6.123234e-17 -2.449294e-16 1
> 21 1 0.5 -8.660254e-01 0.000000e+00 1
> 22 1 -0.5 -8.660254e-01 1.224647e-16 -1
> 23 1 -1.0 -1.836970e-16 -2.449294e-16 1
> 24 1 -0.5 8.660254e-01 0.000000e+00 1
>
>
> y <- estimateGLMCommonDisp(y,design)
> y <- estimateGLMTrendedDisp(y,design)
> y <- estimateGLMTagwiseDisp(y,design)
>
> *Error in dispCoxReidInterpolateTagwise(y, design, offset = offset,
> dispersion, : *
> * design matrix must be full column rank*
>
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets
methods
> [8] base
>
> other attached packages:
> [1] edgeR_3.4.2 limma_3.18.12
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>
>
> On Fri, Apr 4, 2014 at 2:19 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote:
>
>>
>> On Thu, 3 Apr 2014, mali salmon wrote:
>>
>> Dear Prof. Smith
>>> Thank you so much for the explanation of how to do the analysis.
>>> If you don't mind I'll describe the design of our experiment just
to be
>>> sure that I'm not doing any mistake.
>>> We work on a sea animal, and we are interested with day/night
oscillating
>>> genes (24 hours cycle) and with tidal genes (12.4 h cycle).
>>> In the first week we have sacrified animals every 4 hours across
48 h, and
>>> then repeated the same procedure the week after.
>>> So in total we have 24 samples, 4 "replicates" of 6 time points
>>> (2am,6am,10am,2pm,6pm,10pm)
>>> From your mail I understand that time is a vector of time points
in hours,
>>> but I'm not sure how to define it, it can be:
>>> 1) time <- rep(seq(2,46,4),2)
>>> 2) time <- seq(2,94,4)
>>> 3) time <- rep(seq(2,22,4),4)
>>>
>>
>> Makes no difference. All three choices will give exactly the same
results.
>>
>>
>> I saw this matter for the cos vector.
>>> I suppose that for detecting tidal genes I just have to repeat the
>>> analysis
>>> with 12.4 cycle.
>>>
>>
>> You need to carefully identify the tidal stage applicable to each
of the
>> 24 samples. Clearly the tidal phase will have moved in the second
week.
>> For example you might code 'tide' to be between 0 and 1, with 0/1
for high
>> tide and 0.5 for low tide. Then:
>>
>> tides <- sin(2*pi*tide)
>> tidec <- cos(2*pi*tide)
>>
>> Then you need to fit both day/night and tidal cycles to your data
at the
>> same time:
>>
>> ~s1+c1+tides+tidec
>>
>> It may be difficult to distinguish a tidal cycle from a 12-hour
cycle. You
>> might try to check check this by:
>>
>> ~s1+c1+s2+c2+tides+tidec
>>
>> Gordon
>>
>>
>> Thanks again
>>> Mali
>>>
>>>
>>> On Thu, Apr 3, 2014 at 6:16 AM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote:
>>>
>>> Dear Mali,
>>>>
>>>> It is easy to do such an analysis directly in edgeR.
>>>>
>>>> Suppose for example that you are looking for a daily cycle, and
that you
>>>> have at least half a dozen time points spanning more than a full
day.
>>>> The
>>>> method is to create basis vectors for a periodic expression
pattern.
>>>> Suppose that 'time' is time in hours:
>>>>
>>>> s1 <- sin(2*pi*time/24)
>>>> c1 <- cos(2*pi*time/24)
>>>> design <- model.matrix(~s1+c1)
>>>>
>>>> This will give a design matrix with three columns: one for the
intercept
>>>> and two for the periodic signal. To test for a periodic cycle:
>>>>
>>>> fit <- glmFit(y,design)
>>>> lrt <- glmLRT(fit,coef=2:3)
>>>>
>>>> If you have lots of time points and you want to make a more
complex curve
>>>> you can add a harmonic:
>>>>
>>>> s2 <- sin(4*pi*time/24)
>>>> c2 <- cos(4*pi*time/24)
>>>> design <- model.matrix(~s1+c1+s2+c2)
>>>>
>>>> There are now four coefficients parametrizing the periodic curve.
(These
>>>> can be viewed as representing amplitudes and phase shifts for the
two
>>>> harmonics.) I think you would seldom want more than two
harmonics for
>>>> RNA-seq data.
>>>>
>>>> This method can easily be adapted to find cell-cycle genes and so
on.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>
>>>> Date: Tue, 1 Apr 2014 14:33:27 +0200
>>>>
>>>>> From: mali salmon <shalmom1 at="" gmail.com="">
>>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
>>>>> Subject: [BioC] detecting oscillating genes from RNA-seq data
>>>>>
>>>>> Hello List
>>>>> I have RNA-seq data from different time points and I would like
to find
>>>>> oscillating genes.
>>>>> I thought of using the "cycle" package (which is based on
Fourier
>>>>> score) ,
>>>>> but I'm not sure what values to use: FPKM or DESeq/edgeR
normalized
>>>>> values.
>>>>> Any suggestion what would be more appropriate?
>>>>> Thanks
>>>>> Mali
>>>>>
>>>>>
>>>>
______________________________________________________________________
>>>> The information in this email is confidential and intended solely
for the
>>>> addressee.
>>>> You must not disclose, forward, print or use it without the
permission of
>>>> the sender.
>>>>
______________________________________________________________________
>>>>
>>>>
>>>
>>
______________________________________________________________________
>> The information in this email is confidential and intended solely
for the
>> addressee.
>> You must not disclose, forward, print or use it without the
permission of
>> the sender.
>>
______________________________________________________________________
>>
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}