Question

Merge column to SummarizedExperiment from other dataframe

s.w.vanderlaan ▴ 30

@swvanderlaan-12768

Hi,

I have a RangedSummarizedExperiment which looks like this:

class: RangedSummarizedExperiment

dim: 483731 485

metadata(4): creationDate author BBMRIomicsVersion note

assays(1): data

rownames(483731): cg01707559 cg02004872 ... ch.22.47579720R ch.22.48274842R

rowData names(10): addressA addressB ... probeEnd probeTarget

colnames(485): 200397860027_R01C01 200397860027_R02C02 ... 200556930046_R03C01 200556930046_R06C02

colData names(946): STUDY_NUMBER SampleID ... Basename ID

And I have a dataframe which looks like this:

STUDY_NUMBER    UPID    Testosterone    Estradiol    SHBG    Gender
1    1    NA    NA    NA    male
2    2    NA    NA    NA    male
3    3    10.02    62    49.6    male
4    4    NA    NA    NA    male
5    5    NA    NA    NA    female

I would like to merge this table (n rows = 3662) into the colData, based on STUDY_NUMBER. So I used the following code:

colData(aems450k1.MvaluesQCIMPplaqueSE) <- merge(colData(aems450k1.MvaluesQCIMPplaqueSE), AEDB_Q1_20180223_sex, by = "STUDY_NUMBER", all.x = TRUE)

Which results in the following RangedSummarizedExperiment object:

class: RangedSummarizedExperiment

dim: 483731 485

metadata(4): creationDate author BBMRIomicsVersion note

assays(1): data

rownames(483731): cg01707559 cg02004872 ... ch.22.47579720R ch.22.48274842R

rowData names(10): addressA addressB ... probeEnd probeTarget

colnames: NULL

colData names(952): STUDY_NUMBER SampleID ... Sex T_E2

You'll note that colnames is now NULL. My question therefore:

How can I prevent this from happening?

My second question:

Could this be happening because the two data frames are not in the same order (based on STUDY_NUMBER)?

In fact, could this result in the colData being 'uncoupled' from the assay data? The reason I ask is that an analysis of a variable X that was already in the dataset (not in the merged data) gave a significant result before merging. After merging (variable X itself has not changed!), the exact same analysis is no longer significant...

Many thanks,

Sander

summarizedexperiment rangedsummarizedexperiment illumina 450k methylation

score 0 · Answer 1 · 2018-10-17

James W. MacDonald 68k

@james-w-macdonald-5106

When you merge the colData and your data.frame you end up changing the rownames of the resulting DataFrame, which is where the colnames come from. You could just do

cn <- colnames(aems450k1.MvaluesQCIMPplaqueSE)

colData(aems450k1.MvaluesQCIMPplaqueSE) <- merge(colData(aems450k1.MvaluesQCIMPplaqueSE), AEDB_Q1_20180223_sex, by.x = "STUDY_NUMBER", by.y = "STUDY_NUMBER", all.x = TRUE)

colnames(aems450k1.MvaluesQCIMPplaqueSE) <- cn
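One caveat when restoring saved names this way: merge() sorts the result on the key column by default, and its documentation leaves the row order unspecified even with sort = FALSE, so the saved names only line up if the merged rows happen to still be in the original sample order. A minimal base R sketch of an order-preserving alternative using match() follows. The toy tables `pheno` and `extra` are hypothetical stand-ins for the colData and the external data frame, not the real objects from this thread.

```r
# Hypothetical toy data: `pheno` plays the role of colData(se),
# `extra` plays the role of the external table to be added.
pheno <- data.frame(
  STUDY_NUMBER = c(30, 10, 20),
  row.names = c("S_c", "S_a", "S_b")
)
extra <- data.frame(
  STUDY_NUMBER = c(10, 20, 40),
  Testosterone = c(10.02, NA, 7.5)
)

# merge() sorts on the key by default and discards the row names of `pheno`
merged <- merge(pheno, extra, by = "STUDY_NUMBER", all.x = TRUE)
merged$STUDY_NUMBER   # 10 20 30: no longer in sample order
rownames(merged)      # "1" "2" "3": the sample names are gone

# Order-preserving alternative: for each sample, look up its row in `extra`
# (NA where the sample has no entry), then bind the new columns on.
idx <- match(pheno$STUDY_NUMBER, extra$STUDY_NUMBER)
new_cols <- setdiff(names(extra), "STUDY_NUMBER")
aligned <- cbind(pheno, extra[idx, new_cols, drop = FALSE])
rownames(aligned) <- rownames(pheno)
```

Because the rows of `pheno` are never reordered, the sample names and the link to the assay columns stay intact. On a real SummarizedExperiment the same index could presumably be used column by column, for example `se$Testosterone <- extra$Testosterone[idx]`, where `se` is a placeholder name for the object.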

I think the issue is that the colData ends up in a different order than the assay data, which should not happen. But if I add sort = FALSE to the merge command everything is fine, and I can add the colnames afterwards. So:

dim(aems450k1.MvaluesQCIMPplaqueSE) 

aems450k1.MvaluesQCIMPplaqueSE 

colData(aems450k1.MvaluesQCIMPplaqueSE) <- merge(colData(aems450k1.MvaluesQCIMPplaqueSE), AEDB_Q1_20180223_sex, by = "STUDY_NUMBER", sort = FALSE) 

colnames(aems450k1.MvaluesQCIMPplaqueSE) <- aems450k1.MvaluesQCIMPplaqueSE$ID 

dim(aems450k1.MvaluesQCIMPplaqueSE)

Which results in:

class: RangedSummarizedExperiment 
dim: 483731 485 
metadata(4): creationDate author BBMRIomicsVersion note
assays(1): data
rownames(483731): cg01707559 cg02004872 ... ch.22.47579720R ch.22.48274842R
rowData names(10): addressA addressB ... probeEnd probeTarget
colnames(485): 8918692001_R01C01 8918692001_R02C01 ... 9221198166_R06C01 9221198166_R06C02
colData names(946): STUDY_NUMBER SampleID ... Basename ID

This is the correct order of the colnames. Without sort = FALSE, the order of the colnames would be something like colnames(485): 9221198166_R06C02 9221198166_R06C01 ... 8918692001_R02C01 8918692001_R01C01.

Does this make sense?
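Rather than judging the order from the printed summary, the alignment can be checked explicitly. Assigning colnames from the merged ID column makes the names agree with the colData, but it says nothing about whether they still agree with the assay columns, so the comparison has to be against the names saved before the merge. The sketch below uses hypothetical toy data in base R (`pheno` and `extra` are illustrative names), and assumes, as in the thread, that the ID column duplicates the sample names.

```r
# Hypothetical toy data: `pheno` stands in for colData(se) and carries an
# ID column that duplicates the sample names.
pheno <- data.frame(
  STUDY_NUMBER = c(30, 10, 20),
  ID = c("S_c", "S_a", "S_b"),
  row.names = c("S_c", "S_a", "S_b")
)
extra <- data.frame(
  STUDY_NUMBER = c(10, 20, 30),
  Sex = c("male", "female", "male")
)

# Save the sample order before merging (plays the role of colnames(se))
original_names <- rownames(pheno)

merged_sorted   <- merge(pheno, extra, by = "STUDY_NUMBER")
merged_unsorted <- merge(pheno, extra, by = "STUDY_NUMBER", sort = FALSE)

# TRUE only if the merged rows are still in the original sample order
sorted_ok   <- identical(as.character(merged_sorted$ID), original_names)
unsorted_ok <- identical(as.character(merged_unsorted$ID), original_names)

# If the check fails, put the merged rows back into the original order
realigned <- merged_sorted[match(original_names, merged_sorted$ID), ]
stopifnot(identical(as.character(realigned$ID), original_names))
rownames(realigned) <- as.character(realigned$ID)
```

Only a table that passes this check (or has been realigned like this) should be assigned back as colData. Otherwise the phenotype rows no longer describe the assay columns they sit next to, which would be one possible explanation for an analysis result changing after the merge.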

ADD REPLY • link 6.5 years ago s.w.vanderlaan ▴ 30
