Question

Normalisation problem in metagenomeSeq

Stephen Rolfe

I have a problem with normalisation in metagenomeSeq and would appreciate some help.

I have multifactorial data (root_type x genotype) and am looking for differentially abundant OTUs. I created a biom table in QIIME and proceeded as follows:

library(metagenomeSeq)
bt=load_biom("otu_table_json.biom")
pData(bt)

#removing some control samples, as these don't have a plant (and hence no root type or genotype!)

samplesToLose= is.element(pData(bt)$Description,c("NO_PLANT_NA"))

bt_subset=bt[,!samplesToLose]
bt_subset=filterData(bt_subset,present=5,depth=30)  #keep OTUs present in at least 5 samples, and samples with at least 30 reads in total
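#second round: drop OTUs detected (non-zero) in fewer than 30 samples
#(logical indexing avoids dropping every OTU when none are rare, unlike -which())
keepFeatures=rowSums(MRcounts(bt_subset)>0)>=30
bt_subset=bt_subset[keepFeatures,]
#find the appropriate normalisation percentile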
btp=cumNormStat(bt_subset,pFlag=TRUE,main="BX data")
bt_subset=cumNorm(bt_subset,p=btp)
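cumNorm applies cumulative-sum scaling (CSS): for each sample it takes the p-th quantile of that sample's non-zero counts and sums the counts up to that quantile, and that sum becomes the sample's normalisation factor. Below is a minimal base-R sketch, assuming that definition; `css_factors` and the toy matrix are hypothetical, and metagenomeSeq's own code may differ in details such as the quantile method.

```r
# Hypothetical toy OTU count matrix: 6 OTUs (rows) x 3 samples (columns).
counts <- matrix(c(0, 3, 10, 25, 40, 200,
                   1, 0, 12, 30, 55, 150,
                   0, 0,  5,  8, 90, 400),
                 nrow = 6,
                 dimnames = list(paste0("OTU", 1:6), c("S1", "S2", "S3")))

# Hypothetical helper: CSS factor per sample at percentile p.
css_factors <- function(counts, p = 0.5) {
  apply(counts, 2, function(col) {
    nz <- col[col > 0]                           # quantile uses non-zero counts only
    q  <- quantile(nz, probs = p, names = FALSE) # p-th quantile for this sample
    sum(col[col <= q])                           # cumulative sum up to the quantile
  })
}

sf <- css_factors(counts, p = 0.5)
print(sf)  # named vector: 38 43 13

# Scale each column by its factor divided by an assumed constant of 1000.
normalised <- sweep(counts, 2, sf / 1000, "/")
print(round(normalised, 1))
```

With p = 0.5 (the default percentile mentioned below), sample S3 gets a much smaller factor than S1 and S2 because its non-zero counts are concentrated in a few large OTUs.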

At this point I get the default normalisation percentile of 0.5.

When I run the rest of the analysis I get many differential OTUs, but when I check the original data, all of them turn out to be driven by 4 of my 44 samples, which fail to normalise.

#compare raw and normalised log2 counts for sample p
rawC<-MRcounts(bt[,p],norm=FALSE,log=TRUE)
normC<-MRcounts(bt[,p],norm=TRUE,log=TRUE)
plot(x=rawC,y=normC,main=p)
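Why a 'good' sample gives a smooth curve: assuming MRcounts divides each sample's counts by its normalisation factor over a constant (1000 by default) and then takes log2(x + 1), a single factor per sample makes the normalised value a monotone function of the raw one. The counts and factor below are hypothetical.

```r
# Hypothetical raw counts for one sample and an assumed normalisation factor.
raw <- c(0, 1, 2, 5, 10, 50, 100, 1000, 5000)
s   <- 38      # hypothetical per-sample factor
sl  <- 1000    # assumed scaling constant

raw_log  <- log2(raw + 1)
norm_log <- log2(raw / (s / sl) + 1)

# One factor per sample: normalised values keep the ordering of raw values,
# so the raw-vs-normalised plot is a smooth curve rather than a noisy cloud.
print(!is.unsorted(norm_log[order(raw_log)]))

# At high counts the curve approaches a line offset by log2(sl / s);
# the +1 pseudocount bends it at low counts.
print(tail(norm_log - raw_log, 1))
print(log2(sl / s))
```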

If p is a 'good' sample, I see a sensible normalisation curve (smooth, with a bend at low values).

If p is a 'bad' sample, I get the error

Error in if (x <= 0.5) { : missing value where TRUE/FALSE needed

and the plot is very noisy, with some large outliers. These outliers are what skew my final results.
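The error text shows an `if (x <= 0.5)` test, and R raises exactly this message when `x` is `NA`. One way such an `NA` can arise is taking the first match of an empty `which()`. The helper below is hypothetical (not metagenomeSeq's API) and only illustrates that mechanism and an `is.na()` guard.

```r
# Hypothetical helper: pick a percentile as the first position where |diffs|
# exceeds 'rel', falling back to 0.5 otherwise.
choose_percentile <- function(diffs, rel = 0.1) {
  x <- which(abs(diffs) > rel)[1] / length(diffs)  # NA when no element exceeds rel
  if (is.na(x) || x <= 0.5) x <- 0.5               # is.na() check prevents the error
  x
}

# Without the is.na() guard, an NA reaching if() produces the error from the post.
x <- which(abs(c(0.01, 0.02, 0.03)) > 0.1)[1] / 3
msg <- tryCatch(if (x <= 0.5) "ok", error = function(e) conditionMessage(e))
print(msg)  # "missing value where TRUE/FALSE needed"

print(choose_percentile(c(0.01, 0.02, 0.3, 0.4)))  # 0.75
print(choose_percentile(c(0.01, 0.02, 0.03)))      # 0.5
```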

There's nothing obviously different between 'good' and 'bad' samples. They have similar numbers of counts, and a plot of a good vs a bad replicate shows the typical relationship (a straight line with some noise, which increases at lower counts).
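A small base-R sketch for tabulating the per-sample quantities compared here. The matrix is hypothetical; with metagenomeSeq the raw count matrix would come from `MRcounts(bt_subset)`.

```r
# Hypothetical OTU x sample count matrix (rows = OTUs, columns = samples).
counts <- matrix(c(0, 3, 10, 25, 40, 200,
                   1, 0, 12, 30, 55, 150,
                   0, 0,  5,  8, 90, 400),
                 nrow = 6,
                 dimnames = list(paste0("OTU", 1:6), c("S1", "S2", "S3")))

# Per-sample summaries to compare between 'good' and 'bad' samples.
sample_summary <- data.frame(
  total_reads    = colSums(counts),        # library size
  detected       = colSums(counts > 0),    # number of non-zero OTUs
  median_nonzero = apply(counts, 2, function(col) median(col[col > 0]))
)
print(sample_summary)
```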

Any help would be much appreciated.

Thanks

Steve Rolfe

Tags: normalization, metagenomeseq

Here's a link to the images

http://imgur.com/Z2a5luv

Sample 2 is 'good' and sample 3 is 'bad', but a plot of the raw counts for sample 2 vs sample 3 looks fine.

Comment by Stephen Rolfe

Answer (2016-04-19)

Joseph Nathaniel Paulson

Hi Steve,

Could you email me an MRE (the MRexperiment object, bt_subset), so that you don't have to post it on the public forum? I'd be happy to take a look.

[ jpaulson ]@jimmy dot harvard dot edu