getProbeInfo and exprs return two different dimensions
Haoran • 0
@46f52cee

Hi, I am trying to get intensities and annotation for Affymetrix microarray probes. I extracted the probe-level intensities with exprs() and the probe annotation with getProbeInfo() from oligo, but the matrix returned by exprs() has fewer rows than the data frame returned by getProbeInfo().

I downloaded my CEL files from GEO (Series GSE147788): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147788

Please help.

##Probe intensity
> pro_int <- exprs(data.raw)
> head(pro_int)
  GSM4445944_9SR20982A_A01.CEL.gz GSM4445946_9SR20982A_A03.CEL.gz
1                            7097                            8199
2                             269                             226
3                            7148                            7855
4                             161                             162
5                             188                             197
6                             202                             252
> dim(pro_int)
[1] 6892960       2

> ##probe f-id
> pin <- oligo::getProbeInfo(data.raw)
> head(pin)
  fid         man_fsetid
1   6 PSR1700199794.hg.1
2   8           24657315
3   8 PSR1300152110.hg.1
4   9 PSR0200224250.hg.1
5  11           24587906
6  11 PSR0300183028.hg.1
> dim(pin)
[1] 8132393       2
CELfile MicroarrayData AffymetrixChip • 1.1k views
@james-w-macdonald-5106

It appears that you still have the probe-level data. If you do

dat <- rma(data.raw)

The result will then be at the transcript level and should match your annotation data.
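
One likely reason for the differing row counts, judging from the head(pin) output above where fid 8 and fid 11 each appear twice: getProbeInfo() seems to return one row per probe/probeset pair, so a probe assigned to more than one probeset is listed more than once, whereas exprs() has exactly one row per cell on the array. A minimal sketch on toy data shows how one might check this. The matrix, the data frame and the set names below are made up for illustration and are not taken from the CEL files.

```r
# Toy stand-in for exprs(data.raw): one row per cell on the array,
# with row names equal to the cell index (what oligo calls the fid).
pro_int <- matrix(c(7097, 269, 7148, 161, 188, 202),
                  ncol = 1,
                  dimnames = list(as.character(1:6), "sample1"))

# Toy stand-in for getProbeInfo(data.raw): one row per (fid, probeset) pair.
# Assumption for illustration: fid 5 belongs to two probesets,
# and fids 1, 2 and 4 have no annotation at all.
pin <- data.frame(fid = c(3, 5, 5, 6),
                  man_fsetid = c("setA", "setB", "setC", "setD"),
                  stringsAsFactors = FALSE)

n_rows   <- nrow(pin)                 # rows in the annotation table
n_unique <- length(unique(pin$fid))   # distinct annotated cells
n_dup    <- sum(duplicated(pin$fid))  # extra rows caused by repeated fids

# Cells that are on the array but absent from the annotation
n_unannotated <- sum(!(as.integer(rownames(pro_int)) %in% pin$fid))
```

Computing the same counts from pin$fid and rownames(exprs(data.raw)) on the real objects would show how much of the difference comes from repeated fids.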


Thanks James, but what I want to do is find probes that contain SNPs. I already have the probe_id for the probes that contain SNPs, for example:
probe_id  probe_sequence             probe_seqname  probe_strand  probe_start  probe_stop  probe_blocks  RSID1
2769426   CACACTGCTTTCTTTCTGGAGCTAA  chr3           -             23762        23786       23762-23786   rs139173350
765447    TTTTCACACTGCTTTCTTTCTGGAG  chr3           -             23766        23790       23766-23790   rs139173350
2664564   ATCTTTTTCACACTGCTTTCTTTCT  chr3           -             23770        23794       23770-23794   rs139173350
3187315   TTTCATCTTTTTCACACTGCTTTCT  chr3           -             23774        23798       23774-23798   rs139173350
65454     TGTGTTTCATCTTTTTCACACTGCT  chr3           -             23778        23802       23778-23802   rs139173350
364306    AATCTGTGTTTCATCTTTTTCACAC  chr3           -             23782        23806       23782-23806   rs139173350
5702188   AGTGTGTTGTCTGTTGCCAAGGGTT  chr3           -             24046        24070       24046-24070   rs559636578

Now I just want to link this data frame with the probe intensities from the CEL file. How should I do that? What I have is the probe_id column from the example, all the probe intensities from the CEL file, and an annotation whose row count does not match the number of probes in the CEL file.


The probe_id in the data you present above is the same thing as what oligo calls the 'fid'. It is also the row name of the raw data matrix. I downloaded one of the files you are using as an example.

> raw.data <- read.celfiles("GSM4445944_9SR20982A_A01.CEL.gz")
Platform design info loaded.
Reading in : GSM4445944_9SR20982A_A01.CEL.gz
> head(exprs(raw.data))
  GSM4445944_9SR20982A_A01.CEL.gz
1                            7097
2                             269
3                            7148
4                             161
5                             188
6                             202

## read in your data
> snp.dat <- matrix(scan("clipboard", "c"), ncol = 8, byrow = TRUE)
Read 56 items
> snp.dat
     [,1]      [,2]                        [,3]   [,4] [,5]    [,6]   
[1,] "2769426" "CACACTGCTTTCTTTCTGGAGCTAA" "chr3" "-"  "23762" "23786"
[2,] "765447"  "TTTTCACACTGCTTTCTTTCTGGAG" "chr3" "-"  "23766" "23790"
[3,] "2664564" "ATCTTTTTCACACTGCTTTCTTTCT" "chr3" "-"  "23770" "23794"
[4,] "3187315" "TTTCATCTTTTTCACACTGCTTTCT" "chr3" "-"  "23774" "23798"
[5,] "65454"   "TGTGTTTCATCTTTTTCACACTGCT" "chr3" "-"  "23778" "23802"
[6,] "364306"  "AATCTGTGTTTCATCTTTTTCACAC" "chr3" "-"  "23782" "23806"
[7,] "5702188" "AGTGTGTTGTCTGTTGCCAAGGGTT" "chr3" "-"  "24046" "24070"
     [,7]          [,8]         
[1,] "23762-23786" "rs139173350"
[2,] "23766-23790" "rs139173350"
[3,] "23770-23794" "rs139173350"
[4,] "23774-23798" "rs139173350"
[5,] "23778-23802" "rs139173350"
[6,] "23782-23806" "rs139173350"
[7,] "24046-24070" "rs559636578"

> exprs(raw.data)[as.numeric(snp.dat[,1]),,drop = FALSE]
        GSM4445944_9SR20982A_A01.CEL.gz
2769426                              34
765447                               44
2664564                              48
3187315                              31
65454                                26
364306                               35
5702188                              75
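
To keep the SNP annotation and the intensities together in one table, the same lookup can be combined with cbind(). The following is a minimal sketch on toy data, not the actual CEL contents: the object names ints, snp and snp_int, the fids and the rsIDs are all hypothetical. It matches on the character row names rather than on numeric position, which gives the same result when row names run 1..N as in exprs(raw.data) above, but also stays correct if the matrix has been subset or reordered.

```r
# Toy intensity matrix standing in for exprs(raw.data); row names are fids.
ints <- matrix(c(34, 44, 48, 31),
               ncol = 1,
               dimnames = list(c("10", "20", "30", "40"), "sample1"))

# Toy SNP table; probe_id plays the role of the fid.
snp <- data.frame(probe_id = c(30, 10, 40),
                  RSID1 = c("rs_a", "rs_b", "rs_a"),
                  stringsAsFactors = FALSE)

# Look up each probe_id among the row names of the intensity matrix.
# match() returns NA for any probe_id that is not present.
idx <- match(as.character(snp$probe_id), rownames(ints))

# Bind the matching intensity rows to the SNP table, one column per sample.
snp_int <- cbind(snp, ints[idx, , drop = FALSE])
```

With several CEL files, ints would have one column per sample and snp_int would gain one intensity column for each of them.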

Thanks, that helps!!
