RUVg with differential transcript expression?
1
0
Entering edit mode
@brian-gudenas-14726
Last seen 6.8 years ago
Clemson

Hello all,

I am curious if the RUVg normalization method (ERCC spike in normalization) is applicable to transcript-level counts for differential transcript expression, I know RUVseq was developed for gene-level counts but it seems like it would still be OK for transcript level count analysis?

My  idea is to use Kallisto + Sleuth for DTE analysis coupled with RUVg spike in normalization since my samples are from different RNA library protocols (Poly A+, Poly A-, total RNA).

 so = sleuth_prep(s2c_tmp, transformation_function = function(x) log2(x+0.5))

 Expr = so$obs_norm %>%  dplyr::select(target_id, sample, est_counts)
 Emat = tidyr::spread(Expr, key  = sample, est_counts)

 ## need to round est_counts to integer for RUVseq
 Emat = round(Emat)

 # Filter transcripts
 filter <- apply(Emat, 1, function(x) length(x[x > 5]) >= 2) ##remove non-expressed transcripts
 filtered <- Emat[filter,]

 spikes <- rownames(filtered)[grep("^ERCC", rownames(filtered))]
 x <- as.factor(s2c_tmp$condition)

 set <- newSeqExpressionSet(as.matrix(filtered),
                             phenoData = data.frame(x, row.names=colnames(filtered)))

  ## USE RUVg to normalize samples based on control spikeins
  set1 <- RUVg(set, spikes, k=1)
  ## get normalization factors
  wts = pData(set1)[ , 2:ncol(pData(set1))]

  so$sample_to_covariates$W1 = wts

  ## Use W1 (RUVg factors) as batch
  so <- sleuth_fit(so, ~ W1 + condition, 'full')
  so <- sleuth_fit(so, ~ W1, 'reduced')
  ## Wald test for Log 2 Fold-change
  so = sleuth_wt(so, "conditionX")

 

Does this approach/ code seem reasonable?

ruvseq sleuth • 2.2k views
ADD COMMENT
2
Entering edit mode
davide risso ▴ 980
@davide-risso-5075
Last seen 8 months ago
University of Padova

Hi Brian,

I have no direct experience with applying RUVSeq factors to a Sleuth analysis, but your code looks reasonable to me. I would examine the distribution of p-values with and without RUV factors in the model to see if it helped (you want to see a uniform distribution plus a spike at 0 corresponding to DE genes).

Since you know the source of unwanted variation, another potentially useful approach is to include directly the indicator of the protocol as a covariate in the model. I would try both approaches and see which gives more reasonable results.

Hope this helps.

Davide

ADD COMMENT
0
Entering edit mode

Thanks for your help Davide, just wanted to post a quick follow-up if this may help others. I checked the p-value distributions and  sleuth + RUVg does appear uniform with a spike a 0, so it does appear to work.  I also tried your second approach, solely including the protocol as a covariate, similar p-value distribution, however i found about 25% more differentially expressed transcripts (qval < 0.05) than the Sleuth + RUVg method, so i went with this approach. 

ADD REPLY

Login before adding your answer.

Traffic: 636 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6