Hi All,
I am trying to perform differential gene expression analysis on longitudinal data. In this longitudinal study, 62 consecutive patients were enrolled and followed up prospectively. Age and disease activity were documented for each patient at every visit. The main challenge is that the number of visits varies: some patients have 2 visits, some have 3, and some have more than three. Some patients also have missing visits; for example, one patient has data for visit 1 and visit 3 but nothing for visit 2. I would like to find genes that are differentially expressed in these patients over time. There are no controls.
Below is a sample of my data.
Title | Accession | Patient.ID | Visit | cohort | SLEDAI | A.DSD | Age |
C1PL10V1 | GSM1199504 | 1 | 1 | SLE | 2 | 4 | 26 |
C1PL10V2 | GSM1199472 | 1 | 2 | SLE | 2 | 4 | 27 |
C1PL11V1 | GSM1199537 | 2 | 1 | SLE | 2 | 29 | 25 |
C1PL11V2 | GSM1199358 | 2 | 2 | SLE | 2 | 20 | 25 |
C1PL11V3 | GSM1199519 | 2 | 3 | SLE | 0 | 14 | 26 |
C1PL12V1 | GSM1199513 | 3 | 1 | SLE | 10 | 86 | 59 |
C1PL12V3 | GSM1199452 | 3 | 3 | SLE | 10 | 74 | 60 |
C1PL13V1 | GSM1199540 | 4 | 1 | SLE | 10 | 196 | 54 |
C1PL13V2 | GSM1199493 | 4 | 2 | SLE | 4 | 228 | 54 |
C1PL14V1 | GSM1199480 | 5 | 1 | SLE | 4 | 110 | 39 |
C1PL14V2 | GSM1199539 | 5 | 2 | SLE | 6 | 258 | 40 |
C1PL15V1 | GSM1199498 | 6 | 1 | SLE | 8 | 69 | 67 |
C1PL16V1 | GSM1199403 | 7 | 1 | SLE | 4 | 50 | 32 |
C1PL16V2 | GSM1199365 | 7 | 2 | SLE | 4 | 52 | 32 |
C1PL1V1 | GSM1199435 | 8 | 1 | SLE | 12 | 14 | 26 |
C1PL1V2 | GSM1199442 | 8 | 2 | SLE | 8 | 13 | 26 |
C1PL1V3 | GSM1199406 | 8 | 3 | SLE | 12 | 7 | 27 |
C1PL2V2 | GSM1199461 | 9 | 2 | SLE | 0 | 11 | 31 |
C1PL2V3 | GSM1199373 | 9 | 3 | SLE | 2 | 10 | 31 |
C1PL3V1 | GSM1199482 | 10 | 1 | SLE | 4 | 3 | 38 |
C1PL3V2 | GSM1199477 | 10 | 2 | SLE | 6 | 5 | 38 |
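To make the visit structure explicit, here is a minimal Python sketch that tallies visits per patient and flags gaps. It assumes only the Patient.ID and Visit columns of the rows shown above; the names `rows`, `missing_visits`, `gaps` and `single` are illustrative.

```python
from collections import defaultdict

# (Patient.ID, Visit) pairs copied from the sample rows above
rows = [
    (1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (3, 3),
    (4, 1), (4, 2), (5, 1), (5, 2), (6, 1), (7, 1), (7, 2),
    (8, 1), (8, 2), (8, 3), (9, 2), (9, 3), (10, 1), (10, 2),
]

# Collect the set of recorded visits for each patient
visits = defaultdict(set)
for patient_id, visit in rows:
    visits[patient_id].add(visit)


def missing_visits(seen):
    """Return visit numbers absent before the patient's last recorded visit."""
    return sorted(set(range(1, max(seen) + 1)) - seen)


# Patients with a gap in their visit sequence
gaps = {pid: missing_visits(v) for pid, v in visits.items() if missing_visits(v)}
# Patients with only one recorded visit (no within-patient change to estimate)
single = [pid for pid, v in visits.items() if len(v) == 1]

print(gaps)    # {3: [2], 9: [1]}
print(single)  # [6]
```

In this sample, patients 3 and 9 each have a gap, and patient 6 has a single visit.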
I have looked at other threads on time-course and longitudinal data, but did not find anything on how to handle missing data. I am not sure whether ANOVA is appropriate or whether there are packages designed for this kind of data. I was considering ANOVA because the design resembles a pre-post analysis, although there is no drug treatment here. I would like to see how the patients change over time with respect to disease activity. In the table above, disease activity stays the same across visits for some patients and varies for others.
Any help or suggestions would be really appreciated. Thanks in advance.
Best,
Prathyusha
Is the second visit for every patient qualitatively the same? Why do some patients have three or more visits?
The samples are paired: each patient contributes several samples. The patients were enrolled between 2009 and 2011 and then had follow-up visits. For example, both rows with Patient.ID 1 come from the same patient, but the visits differ quantitatively, as the Age column shows. The reason why some patients have more than two visits was not stated. I assume that some patients missed follow-up visits. The major challenge is the missed follow-up visits (unequally spaced time points). I am not sure how to deal with this problem or which statistical test is appropriate for this kind of data.
Thanks in advance.
I think you could look into mixed models with repeated measures instead of ANOVA; as far as I know, they can handle missing visits without dropping the patient.
Another option is to impute the missing data before running a repeated-measures ANOVA, but I think the mixed-model option is usually preferred.
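The mixed-model suggestion can be sketched roughly as follows for a single gene. This is only an illustration under stated assumptions: the data are synthetic (20 hypothetical patients, simulated expression values, none of it from the study), the column names `patient`, `visit`, `sledai` and `expr` are made up for the example, and a random intercept per patient is just one possible model. It uses `mixedlm` from the statsmodels formula API.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study data): 20 hypothetical patients,
# up to 3 visits each, with follow-up visits dropped at random.
records = []
for patient in range(1, 21):
    baseline = rng.normal(0.0, 1.0)           # patient-specific offset
    for visit in (1, 2, 3):
        if visit > 1 and rng.random() < 0.3:  # roughly 30% of follow-ups missed
            continue
        sledai = int(rng.integers(0, 7)) * 2  # even scores from 0 to 12
        expr = 5.0 + 0.3 * sledai + baseline + rng.normal(0.0, 0.3)
        records.append(
            {"patient": patient, "visit": visit, "sledai": sledai, "expr": expr}
        )

df = pd.DataFrame(records)

# Random-intercept model: expression of one gene as a function of disease
# activity and visit, with rows grouped by patient. Patients with missed
# visits simply contribute fewer rows; nothing is imputed.
model = smf.mixedlm("expr ~ sledai + visit", data=df, groups=df["patient"])
result = model.fit()

# Estimated change in expression per unit of SLEDAI
print(result.params["sledai"])
```

For a real expression matrix the same fit would be repeated for each gene, followed by multiple-testing correction. If visits are unevenly spaced, a continuous time variable (for example, days since enrollment, if available) may be a better covariate than the visit number.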
Thank you, Chris. I will try mixed models with repeated measures.