I am running a model that includes a time-of-day variable. I have tried formatting the time variable both as hours-minutes-seconds (hms) and as a date-time with POSIXct. In both cases the time values in the output turn into numbers that make no sense as times.
Example:
mod <- model.matrix(~ factor(CaseControl) + factor(Gender) + factor(Smoker) + Age + Blooddraw_time, data = targets)
Output:
The Blooddraw_time column now contains values such as 36900, 33300, 61800, 3300, 68400, 29100, etc., instead of recognisable times.
Any suggestions as to what has happened here?
Jonelle Villar
Any date/time class in R (or in pretty much any software) stores the number of time units since some origin. POSIXct, for example, is the number of seconds since 1/1/1970 00:00:00 UTC. The print method just shows you something pretty that makes sense.
> times <- Sys.time() - seq(0, 3600*10, 3600)
> times
[1] "2020-09-08 14:01:13 EDT" "2020-09-08 13:01:13 EDT"
[3] "2020-09-08 12:01:13 EDT" "2020-09-08 11:01:13 EDT"
[5] "2020-09-08 10:01:13 EDT" "2020-09-08 09:01:13 EDT"
[7] "2020-09-08 08:01:13 EDT" "2020-09-08 07:01:13 EDT"
[9] "2020-09-08 06:01:13 EDT" "2020-09-08 05:01:13 EDT"
[11] "2020-09-08 04:01:13 EDT"
## but that's not really what the times are - they are really big numbers
> as.numeric(times)
[1] 1599588073 1599584473 1599580873 1599577273 1599573673 1599570073
[7] 1599566473 1599562873 1599559273 1599555673 1599552073
## and if you use them in a formula, they get coerced
> model.matrix(~times)
   (Intercept)      times
1            1 1599588073
2            1 1599584473
3            1 1599580873
4            1 1599577273
5            1 1599573673
6            1 1599570073
7            1 1599566473
8            1 1599562873
9            1 1599559273
10           1 1599555673
11           1 1599552073
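The same coercion explains the numbers in the original question. Assuming Blooddraw_time was an hms column (which stores a time of day as seconds since midnight), the values decode to ordinary clock times. A quick base R check:

```r
# Values copied from the question's Blooddraw_time column, assumed to be
# seconds since midnight (which is how hms stores a time of day)
secs <- c(36900, 33300, 61800, 3300, 68400, 29100)

# Split into whole hours and leftover minutes, then format as HH:MM
hh <- as.integer(secs %/% 3600)
mm <- as.integer((secs %% 3600) %/% 60)
clock <- sprintf("%02d:%02d", hh, mm)
clock          # "10:15" "09:15" "17:10" "00:55" "19:00" "08:05"

# Minutes since midnight, a more convenient scale for a model covariate
secs / 60      # 615 555 1030 55 1140 485
```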
You probably want to do something more reasonable, like setting the blood draw time as minutes after some baseline (after what? I dunno. After you did the thing that the blood draw follows?). That's if the blood draw time is just a nuisance variable. Otherwise it could be a factor, if you are doing things like testing interactions between case/control status and time points.
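Both options can be sketched in base R. The data frame and its values, the choice of the earliest draw as the baseline, and the 12:00 morning/afternoon cut point are all hypothetical; Gender and Smoker are left out only to keep the example short.

```r
# Hypothetical sample sheet: column names follow the question, values are
# invented; Gender and Smoker are omitted for brevity
targets <- data.frame(
  CaseControl    = c("case", "control", "case", "control", "case", "control"),
  Age            = c(54, 61, 47, 58, 50, 63),
  Blooddraw_time = c(29100, 33300, 36900, 61800, 68400, 55200)  # secs since midnight
)

# Option 1: continuous covariate, minutes after a baseline.
# The baseline is assumed here to be the earliest draw in the data set.
baseline <- min(targets$Blooddraw_time)
targets$draw_min <- (targets$Blooddraw_time - baseline) / 60   # 0 70 130 545 655 435
design_cont <- model.matrix(~ factor(CaseControl) + Age + draw_min, data = targets)

# Option 2: time as a factor, so it can interact with case/control status.
# The morning/afternoon split at 12:00 is an arbitrary example.
targets$draw_period <- cut(targets$Blooddraw_time,
                           breaks = c(0, 12, 24) * 3600,
                           labels = c("morning", "afternoon"),
                           right  = FALSE)
design_fac <- model.matrix(~ factor(CaseControl) * draw_period, data = targets)
colnames(design_fac)
```

With the continuous version the intercept refers to a sample drawn at the baseline time; with the factor version the interaction column asks whether the afternoon effect differs between cases and controls.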
Thanks for your comments. I will look into setting the minutes or maybe just the hour. Time of blood draw does make a difference in our data when looking at the activities of white blood cell proportions. Anyway, I'll let you know what happens.
James,
A new day, a new question. I gave some thought today to setting the blood draw time as you suggested.
Rethinking things, I want to keep the blooddrawtime variable in the model because I want to use limma to investigate whether there is a differential effect associated with time of blood draw, compared with a model without the time variable. So I don't actually need to be able to read the time variable as hms. The hms values will be coerced to numbers, like your times column, but does that matter as long as limma can handle it? Or should I make a new column and divide by 60 seconds to get the total minutes, and hopefully a smaller number?
There are two parts to your question. First, does it matter if the time variable is some huge number, rather than a smaller number? Effectively, no, at least I don't think so. If you fit the model using the POSIXct values, then the parameter estimate is the change in gene expression for each unit change in time (i.e. per second). So that's probably OK, but like you say, you could divide by 60 to make it the change in gene expression per minute.
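A small single-gene illustration of why the scale does not matter, using made-up numbers and an ordinary lm() fit rather than limma: dividing the covariate by 60 multiplies its coefficient by 60 and leaves the fitted values and the t statistic unchanged. The same argument should carry over to the per-gene linear models that limma fits, since the residuals are identical.

```r
# Hypothetical draw times (seconds since midnight) and made-up expression
# values for one gene; the small offsets stand in for noise
time_sec <- c(29100, 33300, 36900, 55200, 61800, 68400)
expr     <- 5 + 0.0001 * time_sec + c(0.03, -0.02, 0.01, -0.04, 0.02, 0.00)

fit_sec  <- lm(expr ~ time_sec)          # slope = change per second
time_min <- time_sec / 60
fit_min  <- lm(expr ~ time_min)          # slope = change per minute

# Slope per minute is 60 times the slope per second
coef(fit_min)[["time_min"]] / coef(fit_sec)[["time_sec"]]   # 60

# Fitted values and the t statistic for time are unchanged
all.equal(fitted(fit_sec), fitted(fit_min))
summary(fit_sec)$coefficients["time_sec", "t value"]
summary(fit_min)$coefficients["time_min", "t value"]
```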
However, using super wacky numbers like that in your model affects the intercept, which you may or may not care about. Even if you convert the POSIXct values to minutes instead of seconds, your intercept is the gene expression on 1/1/1970, which is probably not what you want. That's why people tend to do things like what I recommended (set the baseline to 0 and the other times as minutes after that point in time), in which case the intercept is the gene expression at baseline, which is interpretable in the context of the experiment.
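To make the intercept point concrete, here is a sketch with hypothetical POSIXct draw times and made-up expression values. The slope is identical whether time is counted in minutes since 1970 or in minutes after the first draw, but only the second parameterisation gives an intercept equal to the expected expression at baseline.

```r
# Hypothetical draw times on one day (tz = "UTC" assumed), as POSIXct
offsets <- c(0, 70, 130, 435, 545, 655)   # minutes after the first draw
draws   <- as.POSIXct("2020-09-08 08:05:00", tz = "UTC") + offsets * 60
expr    <- 7 + 0.002 * offsets + c(0.03, -0.02, 0.01, -0.04, 0.02, 0.00)

# (a) Minutes since 1970-01-01 00:00 UTC: the intercept is an
#     extrapolation back to 1970 and is a huge, meaningless number
min_epoch <- as.numeric(draws) / 60
fit_epoch <- lm(expr ~ min_epoch)

# (b) Minutes after the first draw: the intercept is the expected
#     expression at baseline
min_base <- as.numeric(difftime(draws, min(draws), units = "mins"))
fit_base <- lm(expr ~ min_base)

coef(fit_epoch)
coef(fit_base)
```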
And I am back. The time variable in my dataset is formatted with the hms package from the tidyverse, which stores time as the number of seconds since 00:00:00, so that would help with choosing my baseline. I tried truncating the seconds in order to calculate the minutes since 00:00, but didn't have any luck. I spoke to our colleague who manages the database, and he said that he had deleted all the seconds values in the Excel sheet. This means that I still have the format of number of seconds since 00:00:00. I can't convert the time to a numeric variable in order to divide by 60 and get minutes, so I am not getting anywhere with library(hms). Are you familiar with this package?
I have my date and time variables in two different columns. Maybe I could combine them and then try POSIXct? Do you have any detailed steps I could follow?
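The combine-then-parse idea can be done in base R without hms. A sketch, assuming the two columns are plain text in YYYY-MM-DD and HH:MM:SS form; the column names draw_date and draw_time, the values, and the UTC time zone are all hypothetical:

```r
# Hypothetical sample sheet with date and time in separate text columns
targets <- data.frame(
  draw_date = c("2020-09-08", "2020-09-08", "2020-09-09"),
  draw_time = c("08:05:00", "10:15:00", "17:10:00")
)

# Paste the two columns together and parse as one POSIXct date-time.
# tz = "UTC" is an assumption; use the zone the samples were taken in.
targets$draw_dt <- as.POSIXct(paste(targets$draw_date, targets$draw_time),
                              format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
anyNA(targets$draw_dt)   # should be FALSE; parsing failures become NA

# Minutes after the earliest draw, usable as a numeric covariate
targets$draw_min <- as.numeric(difftime(targets$draw_dt, min(targets$draw_dt),
                                        units = "mins"))
targets$draw_min         # 0 130 1985
```

Note that this measures elapsed time across days; if only the time of day matters, minutes since midnight is the more natural covariate.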
I actually try to avoid the tidyverse if at all possible, so no, I don't know anything about the hms package. But it seems trivial to convert?
> library(hms)
> as.numeric(hms(56, 34, 12))/60   # 12:34:56 as minutes since midnight
[1] 754.9333
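For anyone who would rather skip hms entirely, base R's as.difftime() can turn clock-time strings directly into minutes since midnight. A sketch with hypothetical values; the format string has to match how the times are actually stored:

```r
# Times as text, as they might come out of the spreadsheet (hypothetical values)
draw_time <- c("08:05:00", "10:15:00", "17:10:00")

# as.difftime() parses clock times into a duration since midnight
mins_since_midnight <- as.numeric(as.difftime(draw_time, format = "%H:%M:%S",
                                              units = "mins"))
mins_since_midnight   # 485 615 1030

# If the seconds were stripped (e.g. "08:05"), use format = "%H:%M" instead
mins_hm <- as.numeric(as.difftime(c("08:05", "10:15"), format = "%H:%M",
                                  units = "mins"))
mins_hm               # 485 615
```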
Anyway, this conversation is getting off-topic. This support site is intended for help with Bioconductor tools. For general R questions you could try Biostars, Stack Overflow or R-help.
Thank you for the explanation. I understand better what I need to do.
Best greetings from Bergen.
Yes, of course. Thank you again.