variancePartition - testing for individuals
rina ▴ 30
@rina-16738
France

Hi all!

I am analyzing the sources of variance in TCGA expression data using variancePartition. Among other things, I want to check how much of the variance is attributable to individuals, but when I specify this in the formula, I get the following errors:

> form <- ~  submitter_id
> varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
Error in checkModelStatus(fit, showWarnings = showWarnings, colinearityCutoff) : 
  Colinear score = 1 > 0.999 
Covariates in the formula are so strongly correlated that the
parameter estimates from this model are not meaningful.
Dropping one or more of the covariates will fix this problem

> form <- ~ (1|submitter_id)
> varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
Error: number of levels of each grouping factor must be < number of observations

> form <- ~ (0|submitter_id)
> varPart <- fitExtractVarPartModel(Filt_EXP1, form, clin)
Error in (function (cl, name, valueClass)  : 
  assignment of an object of class “numeric” is not valid for @‘Dim’ in an object of class “dgTMatrix”; is(value, "integer") is not TRUE

I can see why each of the above returns an error, but the vignette shows that the effect of individuals can be tested. Is there a specific way I should specify it?

Thanks in advance.

R.

variance formula expression • 1.6k views
@mikhaelmanurung-17423
Netherlands

Dear Rina,

To identify the proportion of variance due to between-individual differences, you should provide the information in the metadata.

Your second version of the formula is the correct way to define it (the first formula will treat submitter_id as a continuous variable). Also, remember to define the variable submitter_id as a factor.
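As a minimal sketch of the factor conversion and the random-intercept formula described here, the following uses a small made-up data frame in place of clin. The column name submitter_id comes from the post; the name clin_demo, the name form_demo, and all values are hypothetical.

```r
# Hypothetical stand-in for the `clin` metadata; all values are made up.
clin_demo <- data.frame(
  submitter_id = c("P1", "P1", "P2", "P2", "P3", "P3"),
  tumor_stage  = c("stage i", "stage i", "stage iia",
                   "stage iia", "stage i", "stage i"),
  stringsAsFactors = FALSE
)

# Convert the individual identifier to a factor before modelling.
clin_demo$submitter_id <- factor(clin_demo$submitter_id)

# Random-intercept term for individual, written as in the second attempt.
form_demo <- ~ (1 | submitter_id)

is.factor(clin_demo$submitter_id)   # TRUE
nlevels(clin_demo$submitter_id)     # 3 individuals
nrow(clin_demo)                     # 6 samples
```

In this toy table each individual contributes two samples, so there are fewer levels than observations.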

Best,

Mikhael


Hi Mikhael,

Thank you for your response. I have defined submitter_id as a factor, but I keep getting the same error: "Error: number of levels of each grouping factor must be < number of observations".

Here is the structure of my data, in case it helps.

    > glimpse(clin[1:3,])
    Observations: 3
    Variables: 42
    $ submitter_id                      <fct> TCGA-3L-AA1B, TCGA-5M-AATE, TCGA-A6-2677
    $ classification_of_tumor           <chr> "not reported", "not reported", "not reported"
    $ last_known_disease_status         <chr> "not reported", "not reported", "not reported"
    $ updated_datetime                  <chr> "2018-09-06T16:20:48.972378-05:00", "2018-09-06T16:20:48.972378-05:00", "2018-09-06T16:20:...
    $ primary_diagnosis                 <chr> "Adenocarcinoma, NOS", "Adenocarcinoma, NOS", "Adenocarcinoma, NOS"
    $ tumor_stage                       <chr> "stage i", "stage iia", "stage iiic"
    $ age_at_diagnosis                  <int> 22379, 27870, 25143
    $ vital_status                      <chr> "alive", "alive", "dead"
    $ morphology                        <chr> "8140/3", "8140/3", "8140/3"
    $ days_to_death                     <dbl> NA, NA, 740
    $ days_to_last_known_disease_status <lgl> NA, NA, NA
    $ created_datetime                  <lgl> NA, NA, NA
    $ state                             <chr> "released", "released", "released"
    $ days_to_recurrence                <lgl> NA, NA, NA
    $ diagnosis_id                      <chr> "6eb0d5b6-cb00-519f-838e-119b548ac582", "77253688-4400-5836-886d-80d5758d41c6", "1b22285c-...
    $ tumor_grade                       <chr> "not reported", "not reported", "not reported"
    $ tissue_or_organ_of_origin         <chr> "Cecum", "Ascending colon", "Colon, NOS"
    $ days_to_birth                     <dbl> -22379, -27870, -25143
    $ progression_or_recurrence         <chr> "not reported", "not reported", "not reported"
    $ prior_malignancy                  <chr> "not reported", "not reported", "not reported"
    $ site_of_resection_or_biopsy       <chr> "Cecum", "Ascending colon", "Cecum"
    $ days_to_last_follow_up            <dbl> 475, 1200, 541
    $ cigarettes_per_day                <lgl> NA, NA, NA
    $ weight                            <dbl> 63.3, 75.4, 55.2
    $ alcohol_history                   <lgl> NA, NA, NA
    $ alcohol_intensity                 <lgl> NA, NA, NA
    $ bmi                               <dbl> 21.15006, 24.06716, 21.56250
    $ years_smoked                      <lgl> NA, NA, NA
    $ exposure_id                       <chr> "44b839cb-c3d7-5a99-9dea-90b839882b9a", "74316476-27f2-5d5f-b3fb-20f69e8a8960", "59056466-...
    $ height                            <dbl> 173, 177, 160
    $ gender                            <chr> "female", "male", "female"
    $ year_of_birth                     <int> 1952, 1935, 1941
    $ race                              <chr> "black or african american", "black or african american", "white"
    $ demographic_id                    <chr> "2a3b1379-9507-580d-9628-4b502a720cc4", "d2b2e4d7-419d-5370-94f3-2adeb4606b07", "0fa0b722-...
    $ ethnicity                         <chr> "not hispanic or latino", "hispanic or latino", "not hispanic or latino"
    $ year_of_death                     <int> NA, NA, NA
    $ treatment_id                      <chr> "08bfaf92-3b30-5724-ac84-dac862df44bc", "44ff7b56-834d-5e65-a072-0f8ca2e577fd", "9e66ff79-...
    $ therapeutic_agents                <lgl> NA, NA, NA
    $ treatment_intent_type             <lgl> NA, NA, NA
    $ treatment_or_therapy              <lgl> NA, NA, NA
    $ bcr_patient_barcode               <chr> "TCGA-3L-AA1B", "TCGA-5M-AATE", "TCGA-A6-2677"
    $ disease                           <chr> "COAD", "COAD", "COAD"

    > clin$submitter_id

    137 Levels: TCGA-3L-AA1B TCGA-5M-AATE TCGA-A6-2677 TCGA-A6-2680 TCGA-A6-2683 TCGA-A6-4107 TCGA-A6-5656 TCGA-A6-5659 ... TCGA-T9-A92H

    > glimpse(Filt_EXP1[1:5,1:3])
    Observations: 5
    Variables: 3
    $ `TCGA-G4-6317` <dbl> 10349, 30, 3331, 403, 333
    $ `TCGA-CM-6164` <dbl> 9263, 51, 3263, 364, 315
    $ `TCGA-CM-4750` <dbl> 11120, 457, 2937, 483, 265

Hope that helps!

Thank you.
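One thing visible in the output above: the first three column names of Filt_EXP1 (TCGA-G4-6317, ...) differ from the first three submitter_id values in clin (TCGA-3L-AA1B, ...). That may only reflect how the previews were taken, but if the two objects are passed to a model fit together, it could be worth confirming that metadata rows follow the same order as the expression columns. A rough sketch with hypothetical stand-in objects (expr_demo and meta_demo are made up for illustration):

```r
# Hypothetical stand-ins: expression matrix with samples in columns, and a
# metadata table whose rows are in a different order.
expr_demo <- matrix(1:12, nrow = 4,
                    dimnames = list(paste0("gene", 1:4), c("S3", "S1", "S2")))
meta_demo <- data.frame(
  submitter_id = factor(c("S1", "S2", "S3")),
  tumor_stage  = c("stage i", "stage iia", "stage iiic")
)

# Reorder metadata rows so they follow the expression matrix columns.
idx <- match(colnames(expr_demo), as.character(meta_demo$submitter_id))
stopifnot(!anyNA(idx))   # every sample needs a matching metadata row
meta_aligned <- meta_demo[idx, , drop = FALSE]

# TRUE once the two objects are in the same sample order.
identical(colnames(expr_demo), as.character(meta_aligned$submitter_id))
```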


How many samples do you have for each submitter_id? I suspect the error arises because you have as many unique submitter_id values as observations.
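A quick way to check this is to tabulate samples per individual. The sketch below uses three IDs copied from the glimpse output as a hypothetical stand-in for clin$submitter_id. The error text matches a check made by lme4, which requires fewer grouping levels than observations; with exactly one sample per individual, a per-individual intercept cannot be separated from the residual.

```r
# Hypothetical stand-in for clin$submitter_id: one sample per individual.
ids <- factor(c("TCGA-3L-AA1B", "TCGA-5M-AATE", "TCGA-A6-2677"))

# Number of samples contributed by each individual.
samples_per_id <- table(ids)

max(samples_per_id)           # 1: no individual has a repeated sample
nlevels(ids) < length(ids)    # FALSE: levels are not fewer than observations
```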


Exactly. There is one sample per patient. I also thought this was the problem, but I don't know how to solve it.


What if you include other variables as random effects (such as tumor_stage)?


With form <- ~ (1|tumor_stage), the analysis runs as normal.

But with form <- ~ (1|tumor_stage) + (1|submitter_id), I get the same error: "Error: number of levels of each grouping factor must be < number of observations".
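This behaviour is consistent with the one-sample-per-patient situation discussed earlier in the thread: tumor_stage has few levels relative to the number of samples, whereas submitter_id has one level per sample. One possible guard, sketched below, is to add the individual term only when there are fewer individuals than samples. The helper build_formula and the toy data frames are hypothetical; the condition only mirrors the one named in the error message and does not guarantee a well-estimated variance component.

```r
# Hypothetical helper: include (1 | id_col) only when there are fewer
# individuals than samples, which is the condition named in the error.
build_formula <- function(meta, id_col = "submitter_id",
                          other = "(1 | tumor_stage)") {
  ids <- factor(meta[[id_col]])          # factor() also drops unused levels
  terms <- other
  if (nlevels(ids) < length(ids)) {
    terms <- c(terms, sprintf("(1 | %s)", id_col))
  }
  as.formula(paste("~", paste(terms, collapse = " + ")))
}

# One sample per individual: the individual term is left out.
meta_single <- data.frame(submitter_id = c("P1", "P2", "P3"),
                          tumor_stage  = c("stage i", "stage iia", "stage i"))
f_single <- build_formula(meta_single)
all.vars(f_single)    # "tumor_stage"

# Repeated samples per individual: the individual term is kept.
meta_repeat <- data.frame(submitter_id = c("P1", "P1", "P2", "P2"),
                          tumor_stage  = c("stage i", "stage i",
                                           "stage iia", "stage iia"))
f_repeat <- build_formula(meta_repeat)
all.vars(f_repeat)    # "tumor_stage" "submitter_id"
```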
