I have a problem with normalisation in metagenomeSeq and would appreciate some help.
I have multifactorial data (root_type x genotype) and am looking for differential OTUs. I have created a biom table in QIIME and proceed as follows:
library(metagenomeSeq)
bt = load_biom("otu_table_json.biom")
pData(bt)
# remove some control samples, as these don't have a plant (and hence no root type or genotype!)
samplesToLose = is.element(pData(bt)$Description, c("NO_PLANT_NA"))
bt_subset = bt[, !samplesToLose]
# keep OTUs present in at least 5 samples, and samples with at least 30 reads in total
bt_subset = filterData(bt_subset, present = 5, depth = 30)
# and another round: drop OTUs present in fewer than 30 samples
rareFeatures = which(rowSums(MRcounts(bt_subset) > 0) < 30)
# guard: bt_subset[-integer(0), ] would drop every OTU
if (length(rareFeatures) > 0) bt_subset = bt_subset[-rareFeatures, ]
# find the appropriate normalisation percentile, then normalise
btp = cumNormStat(bt_subset, pFlag = TRUE, main = "BX data")
bt_subset = cumNorm(bt_subset, p = btp)
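One R detail worth guarding against in the second filtering round: if `which()` finds no rare OTUs, `-rareFeatures` is `-integer(0)`, and indexing with it selects zero rows rather than all of them. A minimal base-R sketch on a plain count matrix (`drop_rare` is a hypothetical helper, not a metagenomeSeq function):

```r
# Hypothetical helper (not part of metagenomeSeq): drop OTUs (rows)
# that are non-zero in fewer than min_samples samples (columns).
drop_rare <- function(counts, min_samples) {
  rare <- which(rowSums(counts > 0) < min_samples)
  # counts[-integer(0), ] would return zero rows, so return unchanged instead
  if (length(rare) == 0) return(counts)
  counts[-rare, , drop = FALSE]
}

# Toy matrix, 3 OTUs x 2 samples; columns are s1 = (5, 0, 3) and s2 = (1, 2, 4)
m <- matrix(c(5, 0, 3, 1, 2, 4), nrow = 3)
nrow(drop_rare(m, 2))                 # 2: the OTU seen in only one sample is dropped
nrow(drop_rare(m, 1))                 # 3: nothing is rare, nothing is dropped
nrow(m[-integer(0), , drop = FALSE])  # 0: the pitfall the guard avoids
```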
At this point cumNormStat falls back to the default percentile of 0.5.
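For context, cumNorm() scales each sample by a cumulative sum of its counts up to the chosen percentile (cumulative-sum scaling, CSS), and MRcounts(norm = TRUE, log = TRUE) then reports roughly log2 of the scaled counts plus one. The base-R sketch below approximates that calculation on a plain OTU x sample matrix so individual factors can be inspected. The helper names are hypothetical, the scale of 1000 is an assumption meant to mirror MRcounts()'s `sl` default, and the package's own implementation may differ in details.

```r
# Hypothetical helpers approximating cumulative-sum scaling (CSS).
# css_factor(): sum of a sample's counts at or below the p-th quantile
# of its non-zero counts.
css_factor <- function(counts, p = 0.5) {
  q <- quantile(counts[counts > 0], probs = p, names = FALSE)
  sum(counts[counts <= q])
}

# css_normalise(): divide each sample (column) by its factor, rescale,
# then take log2(x + 1).
css_normalise <- function(mat, p = 0.5, scale = 1000) {
  factors <- apply(mat, 2, css_factor, p = p)
  log2(sweep(mat, 2, factors, "/") * scale + 1)
}

m <- cbind(a = c(0, 1, 2, 3, 10, 100),
           b = c(0, 2, 4, 6, 20, 200))
css_factor(m[, "a"], p = 0.5)  # 6: median of non-zero counts is 3, and 1 + 2 + 3 = 6
css_factor(m[, "b"], p = 0.5)  # 12
css_normalise(m)               # columns a and b come out identical
```

Because each column is divided by its own factor, a sample whose factor is unusually small relative to its library size ends up with inflated normalised values, which is one way a handful of samples can dominate a differential test.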
When I go through the rest of the analysis I get lots of differential OTUs, but when I check the original data, these all turn out to be driven by the failure of 4 of my 44 samples to normalise.
# p is the index of a single sample
rawC <- MRcounts(bt[, p], norm = FALSE, log = TRUE)
normC <- MRcounts(bt[, p], norm = TRUE, log = TRUE)
plot(x = rawC, y = normC, main = p)
If p is a 'good' sample, I see a sensible normalisation curve (i.e. smooth, with a bend at low values).
If p is a 'bad' sample, I get the error
Error in if (x <= 0.5) { : missing value where TRUE/FALSE needed
and the graph is very noisy, with some large outliers. These outliers are what is skewing my final results.
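The error text itself is generic R rather than specific to metagenomeSeq: `if()` stops with "missing value where TRUE/FALSE needed" whenever its condition is NA. One common route there is taking the first element of an empty `which()` result, which is NA. A minimal sketch of the failure and a guarded alternative; `pick_percentile` is a hypothetical function for illustration, not the package's code.

```r
# Reproduce the error: NA <= 0.5 is NA, and if() cannot branch on NA.
msg <- tryCatch(if (NA <= 0.5) "low",
                error = function(e) conditionMessage(e))
msg  # "missing value where TRUE/FALSE needed"

# Hypothetical guarded version: position of the first jump larger than rel,
# as a fraction of the vector length, falling back to a default.
pick_percentile <- function(jumps, rel = 0.1, default = 0.5) {
  x <- which(jumps > rel)[1] / length(jumps)  # NA if no jump exceeds rel
  # check is.na() first; || short-circuits so NA never reaches x <= 0.5
  if (is.na(x) || x <= 0.5) return(default)
  x
}

pick_percentile(c(0.01, 0.02))             # 0.5: no qualifying jump, default used
pick_percentile(c(0.01, 0.02, 0.5, 0.03))  # 0.75
```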
There's nothing obviously different between 'good' and 'bad' samples. They have similar numbers of counts, and a plot of a good vs a bad replicate shows a typical relationship (i.e. a straight line with some noise on it, increasing at lower count numbers).
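To make the 'good' vs 'bad' comparison systematic, one option is to tabulate, for every sample, its total reads, number of non-zero OTUs and a CSS-style factor at the chosen percentile, and then see where the 4 problem samples sit relative to the rest. A base-R sketch, assuming `counts` is a plain OTU x sample matrix (for instance from `MRcounts(bt_subset)`); `sample_summary` is a hypothetical helper, and its factor is an approximation of what cumNorm() computes, not the package's code.

```r
# Hypothetical helper: per-sample summary for spotting samples that normalise oddly.
sample_summary <- function(counts, p = 0.5) {
  # approximate CSS factor: sum of counts at or below the p-th quantile
  # of each sample's non-zero counts
  factors <- apply(counts, 2, function(x) {
    q <- quantile(x[x > 0], probs = p, names = FALSE)
    sum(x[x <= q])
  })
  data.frame(
    sample = colnames(counts),
    total_reads = colSums(counts),
    nonzero_otus = colSums(counts > 0),
    css_factor = factors,
    # factor relative to the median across samples; extreme ratios stand out
    relative_factor = factors / median(factors),
    row.names = NULL
  )
}

counts <- cbind(s1 = c(0, 1, 2, 3, 10, 100),
                s2 = c(0, 2, 4, 6, 20, 200))
sample_summary(counts)
```

Samples with similar total reads but a very different relative_factor would be candidates to compare against the raw-vs-normalised plots.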
Any help would be much appreciated.
Thanks
Steve Rolfe
Here's a link to the images
http://imgur.com/Z2a5luv
Sample 2 is 'good' and sample 3 is 'bad', but a plot of the raw counts for 2 vs 3 looks just fine.