Question

How to access normalized data in the NanoStringDiff package?

casey.rimland ▴ 170

@caseyrimland-14915

Last seen 6.6 years ago

University of Cambridge, National Insti…

I am using NanoStringDiff for differential expression analysis of a NanoString dataset with 506 endogenous genes. Is there a way to output the normalized data that NanoStringDiff uses when running the differential expression LRTs? I have been able to run the differential expression analyses correctly (I hope!), and would now like to access the normalized data for PCA plots, heatmaps, etc. I tried assay(exprs), but it just gave me the raw counts. Thanks!

    library(NanoStringDiff)

    # Load data (dir is the folder that holds the CSV file)
    path <- file.path(dir, "nanostring_R.csv")
    designs <- data.frame(group = c("WT_IL13", "WT_IL13", "WT_IL13", "WT_CTRL", "WT_CTRL", "WT_CTRL", "RA1_IL13", "RA1_IL13", "RA1_IL13", "RA1_CTRL", "RA1_CTRL", "RA1_CTRL"))

    # Create a NanoString dataset
    nanostringdata <- createNanoStringSetFromCsv(path = path, header = TRUE, designs = designs)

    # Run DE analysis
    pheno <- pData(nanostringdata)
    group <- pheno$group
    design.full <- model.matrix(~0 + group)
    design.full

    NanoStringData_Norm <- estNormalizationFactors(nanostringdata)

    # Get results for pairwise contrasts
    # (columns are ordered RA1_CTRL, RA1_IL13, WT_CTRL, WT_IL13, so this is WT_IL13 vs WT_CTRL)
    result_WT <- glm.LRT(NanoStringData_Norm, design.full, contrast = c(0, 0, -1, 1))

nanostringdiff nanostring NanoStringDiff • 2.4k views

ADD COMMENT • link updated 6.7 years ago by James W. MacDonald 68k • written 6.7 years ago by casey.rimland ▴ 170

score 1 · Answer 1 · 2018-06-07

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 15 hours ago

United States

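I don't think there is a direct accessor, but this is what is done to the data prior to fitting any model (here NanoStringData is the object returned by estNormalizationFactors):

    # per-sample scaling factor: positive control factor times housekeeping factor
    c = positiveFactor(NanoStringData)
    d = housekeepingFactor(NanoStringData)
    k = c * d
    # per-sample background estimated from the negative controls
    lamda_i = negativeFactor(NanoStringData)
    Y = exprs(NanoStringData)
    # subtract the background from each sample (column), then divide by the scaling factor
    Y_n = sweep(Y, 2, lamda_i, FUN = "-")
    Y_nph = sweep(Y_n, 2, k, FUN = "/")
    # floor non-positive values so the log below is defined
    Y_nph[Y_nph <= 0] = 0.1

And then

    Y_nph <- log(Y_nph)

will give you data that you can plot.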
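To make the arithmetic concrete, here is a minimal sketch of the same steps on a toy matrix, using base R only. The counts and the per-sample factors are made up for illustration; with real data they would come from the accessors above. The last line shows one way the result might be fed into a PCA.

```r
# Toy raw counts: 4 genes (rows) x 3 samples (columns); values are made up.
Y <- matrix(c(120,  80, 200,
               15,  10,  30,
              300, 260, 510,
                4,   9,   6),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Hypothetical per-sample factors, one value per column of Y.
k       <- c(1.0, 0.8, 2.0)  # stands in for positive * housekeeping factor
lamda_i <- c(5, 10, 8)       # stands in for the negative control background

# Subtract the background from every column, then divide by the scaling factor.
Y_n   <- sweep(Y,   2, lamda_i, FUN = "-")
Y_nph <- sweep(Y_n, 2, k,       FUN = "/")

# Floor non-positive values so the log is defined.
Y_nph[Y_nph <= 0] <- 0.1
Y_log <- log(Y_nph)

# prcomp expects samples in rows, hence the transpose.
pca <- prcomp(t(Y_log))
```

The sample scores in `pca$x` can then be passed to `plot()`, and `Y_log` can be handed to a heatmap function.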

ADD COMMENT • link 6.7 years ago James W. MacDonald 68k

Thank you!

I just gave the code a try and got a warning message at this step:

    Y_n = sweep(Y, 2, lamda_i, FUN = "-")

    Warning message:
    In max(cumDim[cumDim <= lstats]) :
    no non-missing arguments to max; returning -Inf

Is there anything I might be doing wrong? The code runs through, but the final log(Y_nph) contains only NA.

ADD REPLY • link 6.7 years ago casey.rimland ▴ 170

That warning comes from a check in sweep that makes sure the length of lamda_i is reasonable for the dimensions of the matrix you are sweeping over. So there appears to be a problem with either your Y matrix or whatever you are getting for lamda_i. You need to take a look at those data and see what's up.
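As a rough illustration, the same warning can be reproduced in base R by sweeping with a zero-length vector. Whether the factor really is empty in your case is an assumption here; `empty_factor` is just a stand-in for a factor that has not been estimated.

```r
Y <- matrix(1:6, nrow = 2)   # toy 2 x 3 count matrix
empty_factor <- numeric(0)   # assumed stand-in for a factor that was never estimated

# Capture the warning text instead of printing it.
msg <- NULL
res <- withCallingHandlers(
  sweep(Y, 2, empty_factor, FUN = "-"),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

msg              # the warning raised inside sweep's length check
all(is.na(res))  # the swept matrix contains only NA

# A quick check to run before sweeping: one factor value per column.
lengths_ok <- length(empty_factor) == ncol(Y)
```

Comparing `length()` of the factor with `ncol()` of the matrix before calling sweep catches this kind of problem early.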

ADD REPLY • link 6.7 years ago James W. MacDonald 68k

I was running it before calling estNormalizationFactors. I've fixed that now and have the output. Thank you bunches!

ADD REPLY • link 6.7 years ago casey.rimland ▴ 170

Hello,

I am in a similar situation to the one above.

To get normalized data for plotting, I tried NanoStringDataNormalization, but the normalized data it returns do not look consistent with the logFC reported by glm.LRT.

I found this comment and ran the code above (without the final log transformation) after estNormalizationFactors, then compared the resulting matrix with the output of NanoStringDataNormalization on the same raw data. The two are quite different.

Which one should I use?

P.S. I really appreciate your package, though.

ADD REPLY • link 5.9 years ago ysksuh • 0

You cannot generate the log fold changes you get from a generalized linear model 'by hand'. In other words, there is no formula that you can plug data into in order to get the results the GLM will provide. The parameters for the GLM are estimated using an iterative procedure that you won't be able to replicate, and the 'normalized' data we are talking about are just rough estimates that are useful for plotting.
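A small sketch of why the two need not agree, using a toy Poisson GLM from base R rather than the model NanoStringDiff actually fits. The counts and scaling factors are made up, and the choice of a Poisson family with an offset is an assumption made purely for illustration.

```r
# Toy counts for two groups of three samples; all numbers are made up.
y     <- c(5, 30, 40, 30, 90, 60)
group <- factor(rep(c("A", "B"), each = 3))
k     <- c(0.5, 1, 2, 0.5, 1.5, 1)  # hypothetical per-sample scaling factors

# Model-based estimate: scaling factors enter as an offset on the log scale.
fit       <- glm(y ~ group + offset(log(k)), family = poisson)
logfc_glm <- unname(coef(fit)["groupB"])

# Plot-style estimate: difference of group means of log-normalized values.
log_norm   <- log(y / k)
logfc_hand <- mean(log_norm[group == "B"]) - mean(log_norm[group == "A"])

c(glm = logfc_glm, by_hand = logfc_hand)
```

Even in this simple setting the two numbers differ, so a mismatch between plotted values and the reported logFC is not by itself a sign that something went wrong.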

ADD REPLY • link 5.9 years ago James W. MacDonald 68k