QC for Affymetrix miRNA 4.0 arrays: Error from qc/QCReport
@federicocomoglio-4524
Last seen 7.4 years ago
Switzerland

Hi,

I'm analyzing >30 Affymetrix miRNA 4.0 microarrays. As the corresponding miRNA 4.0 CDF is not available from BioC, I downloaded it from the Affymetrix website. I then created a CDF environment using make.cdf.env (makecdfenv package) and read in the data as:

rawData <- ReadAffy( )
rawData@cdfName <- 'mirna40'

The returned AffyBatch object seems perfectly fine to me. It has meaningful row.names and runs smoothly through rma (affy).

However, I would like to perform extensive QC on these data before proceeding with differential expression analysis. To this end, I understand that the QCReport (affyQCReport package) and qc (simpleaffy package) functions are valuable options. Unfortunately, a call to either function currently raises the error below:

QCReport( rawData, file = 'QC.pdf' )

Error in ans[[i]][, i.probes] : subscript out of bounds

qc( rawData )
Error in ans[[i]][, i.probes] : subscript out of bounds

Debugging suggests that the error is generated by the signalDist function, but I was unable to go further.

I would really appreciate your help on this. Thanks a lot in advance.

Federico

 

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] hgu95av2cdf_2.15.0   affydata_1.13.1      affyQCReport_1.44.0
 [4] lattice_0.20-29      BiocInstaller_1.16.1 makecdfenv_1.42.0
 [7] affyio_1.34.0        simpleaffy_2.42.0    gcrma_2.38.0
[10] genefilter_1.48.1    affy_1.44.0          Biobase_2.26.0
[13] BiocGenerics_0.12.1

loaded via a namespace (and not attached):
 [1] affyPLM_1.42.0        annotate_1.44.0       AnnotationDbi_1.28.1
 [4] Biostrings_2.34.1     DBI_0.3.1             GenomeInfoDb_1.2.4
 [7] grid_3.1.2            IRanges_2.0.1         preprocessCore_1.28.0
[10] RColorBrewer_1.1-2    RSQLite_1.0.0         S4Vectors_0.4.0
[13] splines_3.1.2         stats4_3.1.2          survival_2.37-7
[16] tools_3.1.2           XML_3.98-1.1          xtable_1.7-4
[19] XVector_0.6.0         zlibbioc_1.12.0
simpleaffy affy affyQCReport • 2.0k views
@james-w-macdonald-5106
Last seen 4 hours ago
United States

Both the simpleaffy and affyQCReport packages were designed with the original 3'-biased arrays in mind. The miRNA arrays don't have the same content, so both of these packages will tend to fail because the miRNA arrays do not fulfill the expectations that particular probesets will exist on the array.

In addition, the miRNA arrays are difficult to QC because most of the transcripts are either expressed at relatively low concentrations or not at all. There is also content on the array for any number of different species (and Affy may or may not re-use the same probes for different species, depending on conservation).

Add in the fact that miRNA transcripts are usually 21-23 nt long while the Affy probes are 25 nt long. Each probe is therefore usually longer than the transcript being measured, and a probeset is made up of the same probe, just distributed across the array, so things like the AffyRNAdeg() plot no longer make sense.

Long story short, you are pretty much on your own with these arrays.

@federicocomoglio-4524
Last seen 7.4 years ago
Switzerland

Hi Jim,

Thank you for your insightful answer. I agree that QC checks such as RNA degradation do not make sense for these arrays. However, spike-in controls should be meaningful. In addition, even a simple boxplot of raw intensity values fails in a call to

boxplot( rawData )

raising the same error as above.


I have never used the affy package with the (unsupported) CDF file for these arrays. I use oligo instead, which is much better suited.

> dat1 <- read.celfiles(filenames = samps$File[1:6])
Loading required package: pd.mirna.4.0
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : ../CEL/A12258.CEL
Reading in : ../CEL/A10033.CEL
Reading in : ../CEL/Z08140.CEL
Reading in : ../CEL/Z08062.CEL
Reading in : ../CEL/A12263.CEL
Reading in : ../CEL/A10016.CEL
> boxplot(dat1)
Warning message:
'isIdCurrent' is deprecated.
Use 'dbIsValid' instead.
See help("Deprecated")

## The warning above comes from changes to the RSQLite package; it will not affect the analysis and will go away in the next release

> dat1
ExpressionFeatureSet (storageMode: lockedEnvironment)
assayData: 292681 features, 6 samples
  element names: exprs
protocolData
  rowNames: A12258.CEL A10033.CEL ... A10016.CEL (6 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: A12258.CEL A10033.CEL ... A10016.CEL (6 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.mirna.4.0
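
Building on the oligo session above, a few basic checks on the raw intensities could be sketched as follows. This is only a minimal sketch, not an established QC procedure for these arrays: summarize_arrays() is a hypothetical helper written for this example, and the ../CEL directory is an assumption based on the paths printed above. The oligo part runs only if the package and that directory are available.

```r
# Hypothetical helper: per-array median and IQR of log2 raw intensities.
# 'intensities' is a numeric matrix with features in rows and arrays in columns,
# e.g. what exprs() returns for an ExpressionFeatureSet.
summarize_arrays <- function(intensities) {
  log_int <- log2(intensities)
  data.frame(
    array  = colnames(intensities),
    median = apply(log_int, 2, median, na.rm = TRUE),
    iqr    = apply(log_int, 2, IQR, na.rm = TRUE),
    row.names = NULL
  )
}

# Assumed usage with oligo; skipped when oligo or the CEL directory is missing.
if (requireNamespace("oligo", quietly = TRUE) && dir.exists("../CEL")) {
  library(oligo)
  cel_files <- list.celfiles("../CEL", full.names = TRUE)
  dat1 <- read.celfiles(filenames = cel_files)

  boxplot(dat1)  # distribution of raw log2 intensities per array
  hist(dat1)     # density of raw intensities per array

  # Arrays whose median or IQR sits far from the others deserve a closer look.
  print(summarize_arrays(exprs(dat1)))
}
```

The numeric summary is just a tabular companion to the boxplot; what counts as "far from the others" is a judgment call for the given experiment.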