How to use a BrainArray custom CDF with any Bioconductor package?
rishi.dasroy (Finland)

I am trying to analyse Affymetrix exon arrays with the latest release of the BrainArray custom CDF.

I have tried the 'affy' package with the following command:

> Data <- ReadAffy(cdfname ='moex10stmmrefseqcdf')
Error:

The affy package is not designed for this array type.
Please use either the oligo or xps package.

 

Next, I tried 'oligo' with the following command:

> affyExonFS <- read.celfiles(exonCELs,pkgname = "moex10stmmrefseqcdf")

Loading required package: moex10stmmrefseqcdf
Loading required package: AnnotationDbi
Loading required package: GenomeInfoDb

Attaching package: ‘AnnotationDbi’

The following object is masked from ‘package:GenomeInfoDb’:

    species

Platform design info loaded.
Reading in : 250313_1.CEL
Reading in : 250313_2.CEL
Reading in : 250313_3.CEL
Reading in : 250313_4.CEL

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘kind’ for signature ‘"environment"’

 

Could you please let me know what is wrong here?

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fi_FI.UTF-8    LC_NUMERIC=C            LC_TIME=en_GB           LC_COLLATE=en_GB        LC_MONETARY=fi_FI.UTF-8
 [6] LC_MESSAGES=en_GB       LC_PAPER=en_GB          LC_NAME=C               LC_ADDRESS=C            LC_TELEPHONE=C         
[11] LC_MEASUREMENT=en_GB    LC_IDENTIFICATION=C    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.moex10st.mm.aceviewg_0.0.1 affy_1.44.0                   BiocInstaller_1.16.1          pd.moex.1.0.st.v1_3.10.0     
 [5] RSQLite_1.0.0                 DBI_0.3.1                     moex10stmmrefseqcdf_19.0.0    AnnotationDbi_1.28.1         
 [9] GenomeInfoDb_1.2.3            oligo_1.30.0                  Biostrings_2.34.0             XVector_0.6.0                
[13] IRanges_2.0.0                 S4Vectors_0.4.0               Biobase_2.26.0                oligoClasses_1.28.0          
[17] BiocGenerics_0.12.1          

loaded via a namespace (and not attached):
 [1] affxparser_1.38.0     affyio_1.34.0         bit_1.1-12            codetools_0.2-8       ff_2.2-13             foreach_1.4.2        
 [7] GenomicRanges_1.18.3  iterators_1.0.7       preprocessCore_1.28.0 splines_3.1.2         tools_3.1.2           zlibbioc_1.12.0     

Stephen Piccolo (United States)

Rishi,

Sorry for the delay in replying. If you look at the code of the SCAN.UPC package (http://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html), you can find an example of how to use BrainArray mappings with exon arrays. Alternatively, you might want to normalize the data with this package (using the SCAN function) rather than writing something from scratch.

-Steve


Hi Steve,

Thanks for your answer. I have visited the SCAN.UPC page and it looks promising.

Could you explain in more detail, with examples, how to do DE analysis of genes and exons using this?

Thanks

rishi


SCAN.UPC is used to normalize and summarize data, not to test for differential expression. For that, you would want to use something like the limma package. But if you want examples of how to use SCAN.UPC, you should read the vignette, which can be accessed from the landing page:

http://bioconductor.org/packages/release/bioc/vignettes/SCAN.UPC/inst/doc/SCAN.vignette.pdf

Also note that your original post indicated that you want to summarize the data using the RefSeq mappings for the Mouse Exon 1.0 ST array. Since RefSeq is a transcript-based annotation database, you cannot use that to do DE of genes or exons. If you want genes, you should likely use the moex10stmmentrezgcdf package or the moex10stmmensgcdf package (for Entrez Gene and Ensembl gene mappings, respectively). If you care about exons, then I believe your only choice is moex10stmmensecdf.
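A minimal sketch of the limma step mentioned in this reply, not taken from the thread: it assumes the input is a features-by-samples matrix such as `Biobase::exprs(eset)`, where `eset` is the ExpressionSet returned by SCAN() with a gene-level BrainArray package, and it assumes a simple two-group comparison. The helper name `runTwoGroupDE` is hypothetical; the limma calls used are the standard lmFit(), eBayes() and topTable().

```r
# Hypothetical helper: two-group differential expression with limma.
# 'exprsMatrix': features x samples (e.g. Biobase::exprs(eset) from SCAN()).
# 'group': sample labels in column order; the first factor level is the baseline.
runTwoGroupDE <- function(exprsMatrix, group, n = 10) {
  group <- droplevels(as.factor(group))
  stopifnot(nlevels(group) == 2, length(group) == ncol(exprsMatrix))
  # Column 1 = baseline mean, column 2 = difference of the other group vs baseline
  design <- model.matrix(~ group)
  # Per-feature linear model, then empirical Bayes moderation of the variances
  fit <- limma::eBayes(limma::lmFit(exprsMatrix, design))
  # Top 'n' features ranked by evidence for the group effect (design column 2)
  limma::topTable(fit, coef = 2, number = n)
}
```

For example, with four hypothetical samples: `runTwoGroupDE(Biobase::exprs(eset), factor(c("control", "control", "treated", "treated"), levels = c("control", "treated")))`. This covers gene-level DE only.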


Hi James,

Thanks for your response. I have normalized the exons with the SCAN.UPC package, using moex10st_Mm_ENSE.cdf, to check for alternative splicing. Although there are 337485 exons in this CDF, after normalization with SCAN.UPC I only got normalized values for 228057 exons.

However, normalization with RMA (FIRMA) gave intensities for 337485 exons.

Why are there fewer exons when processing with SCAN.UPC?

I used the following command to normalize the exons:

normalized_exon_xprsn_set = SCAN(celFilePath, outFilePath="exon_normalize_moex10stmmenseprobe_19.0.0.txt", probeSummaryPackage = "moex10stmmenseprobe",exonArrayTarget = "probeset")

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fi_FI.UTF-8    LC_NUMERIC=C            LC_TIME=en_GB           LC_COLLATE=en_GB        LC_MONETARY=fi_FI.UTF-8
 [6] LC_MESSAGES=en_GB       LC_PAPER=en_GB          LC_NAME=C               LC_ADDRESS=C            LC_TELEPHONE=C         
[11] LC_MEASUREMENT=en_GB    LC_IDENTIFICATION=C    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] moex10stmmenseprobe_19.0.0 AnnotationDbi_1.28.1       GenomeInfoDb_1.2.4         IRanges_2.0.1              S4Vectors_0.4.0           
 [6] Biobase_2.26.0             BiocGenerics_0.12.1        data.table_1.9.4           plyr_1.8.1                 limma_3.22.3              
[11] biomaRt_2.22.0            

loaded via a namespace (and not attached):
 [1] bitops_1.0-6     chron_2.3-45     colorspace_1.2-4 DBI_0.3.1        digest_0.6.8     ggplot2_1.0.0    grid_3.1.2       gtable_0.1.2    
 [9] labeling_0.3     MASS_7.3-37      munsell_0.4.2    proto_0.3-10     Rcpp_0.11.3      RCurl_1.95-4.5   reshape2_1.4.1   RSQLite_1.0.0   
[17] scales_0.2.4     stringr_0.6.2    tools_3.1.2      XML_3.98-1.1   


When you used fRMA, did you somehow use BrainArray mappings?

The BrainArray mappings do not include all probes (some are considered to be low quality), so those would be excluded when you use SCAN.UPC.


Yes, I used the BrainArray gene and exon mappings for both FIRMA and SCAN.UPC. I got the same number of genes with both methods, but the numbers differ for exons.
 


Rishi,

Ah OK. Since these are exon arrays, you would also want to specify a value for the "exonArrayTarget" parameter. There are three types of probes within exon arrays: "core," "extended," and "full," and these are supported by varying levels of evidence. By default it will use just "core" probes because they are of the highest quality. That is probably why you are seeing a difference in the number of exons. If you want to use all probes, you would specify exonArrayTarget="probeset". The SCAN.UPC documentation provides more detail on this.

rishi.dasroy (Finland)

Stephen,

Thank you so much for the quick response. However, I already normalized them with the exonArrayTarget="probeset" parameter.

I have also checked the number of exons defined in the BrainArray mappings. It is 337485, which is more than what I am getting from SCAN.UPC:

 

> t <-as.data.frame(table(moex10stmmenseprobe$Probe.Set.Name))
> head(t)
                   Var1 Freq
1 ENSMUSE00000097910_at    4
2 ENSMUSE00000097912_at    8
3 ENSMUSE00000097938_at    4
4 ENSMUSE00000097939_at    4
5 ENSMUSE00000097942_at    4
6 ENSMUSE00000097957_at    4
> dim(t)
[1] 337485      2
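To see exactly which probe sets are absent from a SCAN result, one option is to compare the probe set names in the BrainArray probe package with the feature names of the normalized output. A minimal base-R sketch; the helper name `compareProbeSets` is hypothetical, and `eset` is assumed to be the ExpressionSet returned by SCAN().

```r
# Hypothetical helper: compare probe sets defined in a BrainArray probe package
# with those present in a normalized result.
compareProbeSets <- function(probeSetNames, normalizedNames) {
  # as.character() handles probe set names stored as either factor or character
  defined <- unique(as.character(probeSetNames))
  list(
    nDefined    = length(defined),                  # number of distinct probe set names
    nNormalized = length(unique(normalizedNames)),  # number of features in the output
    missing     = setdiff(defined, normalizedNames) # defined but absent from the output
  )
}
```

Usage, assuming Biobase is loaded: `res <- compareProbeSets(moex10stmmenseprobe$Probe.Set.Name, featureNames(eset))`, then inspect `res$nDefined`, `res$nNormalized` and `head(res$missing)`.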

Would you be willing to send me an email with a sample CEL file, the exact BrainArray version you are working with, and the R command you are using to normalize it? It would probably be best to send the CEL file via Dropbox or some other web-based service due to its size. Thanks.


I was trying to normalize 64 CEL files with a single command, and the system stopped responding (with 8 cores and 16 GB of RAM, it had been running for 56 hours). Even so, I found the output file given in "outFilePath". Now I understand that the process may not have been able to save all the exons to that file.

Following your suggestion, I processed a single CEL file and got the expected number of exons. Now I will process the files one by one.

Sorry for taking up your time, and thank you very much for your help.
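Processing the CEL files one at a time, as described in this reply, could look roughly like the sketch below. It assumes SCAN.UPC, Biobase and moex10stmmenseprobe are installed and that SCAN() returns an ExpressionSet for a single CEL file; the helper names, the output-file naming scheme and the directory name are hypothetical. Each file gets its own output file, and the single-sample columns are then joined on their shared probe set names.

```r
# Hypothetical helper: join single-sample matrices column-wise,
# keeping only the rows (probe sets) present in every matrix.
mergeByRownames <- function(mats) {
  common <- Reduce(intersect, lapply(mats, rownames))
  do.call(cbind, lapply(mats, function(m) m[common, , drop = FALSE]))
}

# Hypothetical helper: run SCAN() on each CEL file separately.
# Arguments mirror the SCAN() call used earlier in this thread.
scanOneByOne <- function(celFiles, probeSummaryPackage = "moex10stmmenseprobe",
                         exonArrayTarget = "probeset") {
  mats <- lapply(celFiles, function(celFile) {
    # Hypothetical naming scheme: sample.CEL -> sample_SCAN.txt
    outFile <- sub("\\.cel$", "_SCAN.txt", basename(celFile), ignore.case = TRUE)
    eset <- SCAN.UPC::SCAN(celFile, outFilePath = outFile,
                           probeSummaryPackage = probeSummaryPackage,
                           exonArrayTarget = exonArrayTarget)
    Biobase::exprs(eset)  # one-column matrix for this sample
  })
  mergeByRownames(mats)
}
```

Usage, with a hypothetical directory: `celFiles <- list.files("cel_dir", pattern = "\\.cel$", full.names = TRUE, ignore.case = TRUE)` and then `exonMatrix <- scanOneByOne(celFiles)`. Writing one file per sample also means a stalled run leaves the completed samples intact.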
