How to obtain the two gene expression datasets from a dual channel GEO dataset?
@davidechicco-19845
Canada

Dear Bioconductor community,

I am trying to analyze a GEO dataset that is based on a dual-channel platform, but I don't know how to split the two channels into separate gene expression datasets. The dataset is GSE7339. As you can see, each sample has signals from two channels: LKN and LKT.

I downloaded this dataset with the following commands:

listOfBiocPackages <- c("annotate", "GEOquery")
library("easypackages")
libraries(listOfBiocPackages)

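GSE_code <- "GSE7339"
thisGEOplatform <- "GPL1293"  # platform_id reported in the sample annotation
gset <- getGEO(GSE_code, GSEMatrix = TRUE, getGPL = FALSE)

# If the series spans several platforms, keep only the one of interest
if (length(gset) > 1) idx <- grep(thisGEOplatform, attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]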

And now my gset@phenoData@data variable contains the following fields:

str(gset@phenoData@data)
'data.frame':   100 obs. of  57 variables:
 $ title                            : chr  "Lung cancer 01" "Lung cancer 02" "Lung cancer 03" "Lung cancer 04" ...
 $ geo_accession                    : chr  "GSM176905" "GSM176906" "GSM176907" "GSM176908" ...
 $ status                           : chr  "Public on Mar 30 2007" "Public on Mar 30 2007" "Public on Mar 30 2007" "Public on Mar 30 2007" ...
 $ submission_date                  : chr  "Mar 22 2007" "Mar 22 2007" "Mar 22 2007" "Mar 22 2007" ...
 $ last_update_date                 : chr  "Mar 30 2007" "Mar 30 2007" "Mar 30 2007" "Mar 30 2007" ...
 $ type                             : chr  "RNA" "RNA" "RNA" "RNA" ...
 $ channel_count                    : chr  "2" "2" "2" "2" ...
 $ source_name_ch1                  : chr  "LK01N" "LK02N" "LK03N" "LK04N" ...
 $ organism_ch1                     : chr  "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" ...
 $ characteristics_ch1              : chr  "Non-tumorus tissue" "Non-tumorus tissue" "Non-tumorus tissue" "Non-tumorus tissue" ...
 $ characteristics_ch1.1            : chr  "Age:71" "Age:71" "Age:73" "Age:62" ...
 $ characteristics_ch1.2            : chr  "Gender:Male" "Gender:Male" "Gender:Male" "Gender:Female" ...
 $ characteristics_ch1.3            : chr  "Right lung" "Right lung" "Left lung" "Right lung" ...
 $ characteristics_ch1.4            : chr  "Tissue: resected Adenocarcinoma" "Tissue: resected Adenocarcinoma" "Tissue: resected Squamous cell carcinoma" "Tissue: resected Adenocarcinoma" ...
 $ molecule_ch1                     : chr  "total RNA" "total RNA" "total RNA" "total RNA" ...
 $ extract_protocol_ch1             : chr  "TriZol procedure" "TriZol procedure" "TriZol procedure" "TriZol procedure" ...
 $ label_ch1                        : chr  "Cy3" "Cy3" "Cy3" "Cy3" ...
 $ label_protocol_ch1               : chr  "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ ...
 $ taxid_ch1                        : chr  "9606" "9606" "9606" "9606" ...
 $ source_name_ch2                  : chr  "LK01T" "LK02T" "LK03T" "LK04T" ...
 $ organism_ch2                     : chr  "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" ...
 $ characteristics_ch2              : chr  "tumor tissue" "tumor tissue" "tumor tissue" "tumor tissue" ...
 $ characteristics_ch2.1            : chr  "Histlogical Type:Adenocarcinoma" "Histlogical Type:Adenocarcinoma" "Histlogical Type:Squamous cell carcinoma" "Histlogical Type:Adenocarcinoma" ...
 $ characteristics_ch2.2            : chr  "Age:71" "Age:71" "Age:73" "Age:62" ...
 $ characteristics_ch2.3            : chr  "Gender:Male" "Gender:Male" "Gender:Male" "Gender:Female" ...
 $ characteristics_ch2.4            : chr  "Right lung cancer" "Right lung cancer" "Left lung cancer" "Right lung cancer" ...
 $ characteristics_ch2.5            : chr  "Stage:çV" "Stage:çV" "Stage:çU" "Stage:çV" ...
 $ characteristics_ch2.6            : chr  "LN Metastasis:Positive" "LN Metastasis:Positive" "LN Metastasis:Positive" "LN Metastasis:Positive" ...
 $ characteristics_ch2.7            : chr  "Postperative Tumor Recurrence:Recurrence" "Postperative Tumor Recurrence:Non-recurrence" "Postperative Tumor Recurrence:Non-recurrence" "Postperative Tumor Recurrence:Recurrence" ...
 $ molecule_ch2                     : chr  "total RNA" "total RNA" "total RNA" "total RNA" ...
 $ extract_protocol_ch2             : chr  "TriZol procedure" "TriZol procedure" "TriZol procedure" "TriZol procedure" ...
 $ label_ch2                        : chr  "Cy5" "Cy5" "Cy5" "Cy5" ...
 $ label_protocol_ch2               : chr  "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ "Amino Allyl aRNA was synthesis by Amino Allyl MessageAmp?  aRNA Amplification Kit (Ambion). CyeDye Coupling and"| __truncated__ ...
 $ taxid_ch2                        : chr  "9606" "9606" "9606" "9606" ...
 $ hyb_protocol                     : chr  "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ "Hybridized for 16 h at 42 C. Hybridization buffer and washing protocol was followed by the protocol supplied by"| __truncated__ ...
 $ scan_protocol                    : chr  "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ "ScanArray HT (PerkinElmer Japan Co., Ltd.) was used for scanning. Array images were analyzed with DNASIS Array "| __truncated__ ...
 $ description                      : chr  "LK01" "LK02" "LK03" "LK04" ...
 $ data_processing                  : chr  "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ "This data were analyzed by DNASIS array software(Hitachi Software Engineering), which converted the signal inte"| __truncated__ ...
 $ platform_id                      : chr  "GPL1293" "GPL1293" "GPL1293" "GPL1293" ...
 $ contact_name                     : chr  "YASUMITSU,,MORIYA" "YASUMITSU,,MORIYA" "YASUMITSU,,MORIYA" "YASUMITSU,,MORIYA" ...
 $ contact_department               : chr  "Thoracic surgery" "Thoracic surgery" "Thoracic surgery" "Thoracic surgery" ...
 $ contact_institute                : chr  "Chiba University" "Chiba University" "Chiba University" "Chiba University" ...
 $ contact_address                  : chr  "1-8-1, Inohana" "1-8-1, Inohana" "1-8-1, Inohana" "1-8-1, Inohana" ...
 $ contact_city                     : chr  "Chuo-ku, Chiba" "Chuo-ku, Chiba" "Chuo-ku, Chiba" "Chuo-ku, Chiba" ...
 $ contact_zip/postal_code          : chr  "2608670" "2608670" "2608670" "2608670" ...
 $ contact_country                  : chr  "Japan" "Japan" "Japan" "Japan" ...
 $ supplementary_file               : chr  "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176905/suppl/GSM176905.txt.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176906/suppl/GSM176906.txt.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176907/suppl/GSM176907.txt.gz" "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM176nnn/GSM176908/suppl/GSM176908.txt.gz" ...
 $ data_row_count                   : chr  "10368" "10368" "10368" "10368" ...
 $ Age:ch1                          : chr  "71" "71" "73" "62" ...
 $ Age:ch2                          : chr  "71" "71" "73" "62" ...
 $ Gender:ch1                       : chr  "Male" "Male" "Male" "Female" ...
 $ Gender:ch2                       : chr  "Male" "Male" "Male" "Female" ...
 $ Histlogical Type:ch2             : chr  "Adenocarcinoma" "Adenocarcinoma" "Squamous cell carcinoma" "Adenocarcinoma" ...
 $ LN Metastasis:ch2                : chr  "Positive" "Positive" "Positive" "Positive" ...
 $ Postperative Tumor Recurrence:ch2: chr  "Recurrence" "Non-recurrence" "Non-recurrence" "Recurrence" ...
 $ Stage:ch2                        : chr  "çV" "çV" "çU" "çV" ...
 $ Tissue:ch1                       : chr  "resected Adenocarcinoma" "resected Adenocarcinoma" "resected Squamous cell carcinoma" "resected Adenocarcinoma" ...

Do you have any suggestions on what commands I could use to split the dual-channel dataset into two datasets?

Thanks

@gordon-smyth
WEHI, Melbourne, Australia

The limma package provides extensive functionality for analysing two-color microarrays; see the limma User's Guide. Two-color arrays are analysed with both channels in place -- you don't explicitly split them into two datasets -- otherwise all the advantages of the two-color design would be lost and your data would become very noisy.

See https://support.bioconductor.org/p/128149 for a recent example of a limma two-color analysis on this forum.

limma even allows you to make comparisons between the treatment conditions as if the two channels were on separate arrays, see the limma User's Guide section on "Separate Channel Analysis of Two-Color Data". See

Smyth, GK, and Altman, NS (2013). Separate-channel analysis of two-channel microarrays: recovering inter-spot information. BMC Bioinformatics 14, 165. http://www.biomedcentral.com/1471-2105/14/165

for a discussion of the efficiency differences between one and two channel microarrays.
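
As a rough illustration of the separate-channel workflow from the User's Guide, here is a minimal sketch. The MAList is simulated and merely stands in for normalized two-color data; the target names "NonTumor" and "Tumor", the array names, and the dimensions are assumptions made for the example, not values taken from GSE7339. It uses limma's targetsA2C, intraspotCorrelation, and lmscFit, and needs the statmod package to be installed for intraspotCorrelation.

```r
library(limma)

set.seed(1)
n_genes  <- 200
n_arrays <- 6

# Simulated stand-in for normalized two-color data (NOT real GSE7339 values):
# M = log2(Cy5/Cy3) log-ratios, A = average log2 intensities.
M <- matrix(rnorm(n_genes * n_arrays, sd = 0.5), n_genes, n_arrays)
A <- matrix(rnorm(n_genes * n_arrays, mean = 10), n_genes, n_arrays)
colnames(M) <- colnames(A) <- paste0("array", seq_len(n_arrays))
MA <- new("MAList", list(M = M, A = A))

# One row per array: which RNA source went into each dye (hypothetical labels).
targets <- data.frame(Cy3 = rep("NonTumor", n_arrays),
                      Cy5 = rep("Tumor", n_arrays),
                      row.names = colnames(M),
                      stringsAsFactors = FALSE)

# Expand to one row per channel (2 rows per array, interleaved by array).
targets2 <- targetsA2C(targets)
f <- factor(targets2$Target, levels = c("NonTumor", "Tumor"))
design <- model.matrix(~ 0 + f)
colnames(design) <- levels(f)

# Estimate the correlation between the two channels of the same spot,
# then fit the separate-channel linear model using that correlation.
# On real data, the A-values would usually be normalized between arrays first.
corfit <- intraspotCorrelation(MA, design)
fit <- lmscFit(MA, design, correlation = corfit$consensus)

# Compare the two conditions while both channels stay in the model.
cont <- makeContrasts(TumorVsNonTumor = Tumor - NonTumor, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))
topTable(fit2, coef = "TumorVsNonTumor")
```

Note that the channels are never pulled apart into two independent expression matrices here; the intra-spot correlation keeps the pairing information in the model.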

The GSE7339 study design has taken advantage of the two-color microarrays to adjust each tumor for the baseline expression of non-tumor tissue from the same patient. That seems to me to be a well-motivated strategy to control for patient-to-patient variability. In this approach, you simply analyse the log-ratios (log2(Red/Green)) as if they were primary observations. GEO provides the normalized log-ratios for you; otherwise you can reproduce them from the raw data files using limma. You obviously have the right to make your own analysis decisions, but I would be very cautious about disassembling this experimental design.
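
To make "analyse the log-ratios as if they were primary observations" concrete, here is a minimal sketch. The matrix M is simulated and only stands in for what exprs(gset) would hold (one log2(Cy5/Cy3) value per probe and array); the probe names, array names, and the spiked-in effect are assumptions for illustration. Because the sample annotation shows tumor tissue on Cy5 and non-tumor tissue on Cy3, a positive log-ratio would mean higher expression in the tumor.

```r
library(limma)

set.seed(42)
n_genes  <- 500
n_arrays <- 10

# Simulated stand-in for exprs(gset): rows are probes, columns are arrays,
# each value is a normalized log2(Cy5/Cy3) = log2(tumor / non-tumor).
M <- matrix(rnorm(n_genes * n_arrays, sd = 0.4), n_genes, n_arrays,
            dimnames = list(paste0("probe", seq_len(n_genes)),
                            paste0("array", seq_len(n_arrays))))
M[1:10, ] <- M[1:10, ] + 2   # toy effect: first 10 probes are higher in tumor

# Intercept-only design: for each probe, test whether the mean log-ratio
# across patients differs from zero (a paired tumor vs non-tumor comparison).
design <- matrix(1, n_arrays, 1, dimnames = list(NULL, "TumorVsNonTumor"))
fit <- lmFit(M, design)
fit <- eBayes(fit)
top <- topTable(fit, coef = "TumorVsNonTumor", number = 10)
print(top)
```

With the real data, M would be replaced by exprs(gset), and any patient-level covariates of interest (for example recurrence status) could be added as further columns of the design matrix.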


Hi Gordon Smyth, thanks for your reply and your help. I briefly looked at the limma package but could not work out how to use it for my purpose, which is to obtain the gene expression for the controls and the gene expression for the tumor patients. Maybe you can help me understand.

Considering the GSE7339 study design, is there a way for me to do so? Is there a function that gives me one R variable with the gene expression of the controls (probeset IDs on the rows and samples on the columns, or vice versa) and another R variable with the gene expression of the tumor patients (in the same layout)?

If I run exprs(gset) now, I see only one table.

Thank you!


Have you read the limma User's Guide? There is a chapter that explains it.


You seem to be assuming that this dataset contains controls and patients, but actually it doesn't. All the samples are from tumor patients.
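
The sample annotation posted in the question already shows this pairing. The sketch below copies the first four arrays from that str() output into a small data frame and checks that both channels of each array carry the same patient label. Treating the trailing N/T in source_name as a tissue code and the rest as a patient label is an assumption based on how the names look; on the real object the same columns would come from pData(gset).

```r
# First four arrays, copied from the str() output of the sample annotation.
pheno <- data.frame(
  geo_accession       = c("GSM176905", "GSM176906", "GSM176907", "GSM176908"),
  source_name_ch1     = c("LK01N", "LK02N", "LK03N", "LK04N"),
  characteristics_ch1 = rep("Non-tumorus tissue", 4),
  label_ch1           = rep("Cy3", 4),
  source_name_ch2     = c("LK01T", "LK02T", "LK03T", "LK04T"),
  characteristics_ch2 = rep("tumor tissue", 4),
  label_ch2           = rep("Cy5", 4),
  stringsAsFactors    = FALSE
)

# Assumption: the trailing letter is the tissue code (N or T) and the
# remainder identifies the patient.
patient_ch1 <- sub("N$", "", pheno$source_name_ch1)
patient_ch2 <- sub("T$", "", pheno$source_name_ch2)
same_patient <- patient_ch1 == patient_ch2

pairing <- data.frame(array        = pheno$geo_accession,
                      patient      = patient_ch1,
                      Cy3          = pheno$characteristics_ch1,
                      Cy5          = pheno$characteristics_ch2,
                      same_patient = same_patient)
print(pairing)
```

Each array therefore holds a non-tumor and a tumor sample from one and the same patient, which is why the single table of log-ratios is already a within-patient comparison.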

Regarding your other questions, what you want to do is quite easy but unwise for the reasons I already explained. No one will provide code that produces the output you have asked for because it is such a bad thing to do. If you know better, then perhaps you can read a bit of documentation and extract out what you want yourself. You will need to use limma to read and process the raw data files. You cannot use getGEO.
