RUVseq, DESeq2, and different codesets for nanostring data
0
0
Entering edit mode
@xiaofeiwang18266-13498
Last seen 15 months ago
Singapore

If I have 2 different codesets from nanostring, how should I run "RUV_total" function?

"Different codesets" means the dimension of 2 or more sets is different, such as below. There are a set of Endogenous genes are in common between the 2 sets. The house keeping (hk), negative, and positive genes are same for 2 sets. I have 2 ways in my mind to run "RUV_total" function. But, I have no idea which one is more reasonable, or better ideas?

First, I can merge the codesets, and then run "RUV_total" function on the merged one. For example, there are 241 Endogenous + 20 (hk, neg, pos) genes for 80 samples after merging. In this way, the genes which are not in common were removed, my concern is that if the removal of uncommon genes will have effect on ruv normalization.

Second, I can do normalization on codeset1 and 2, respectively. Then, the RUV factors can be combined to be pData and passed to DESeq2. In this case, the "countData" is still same as the above way.

Not sure if the "codeset size" (like the library size for RNAseq) has effect on ruv normalization or counts(dds, normalized=TRUE) function.

Any ideas or suggestions would be appreciated!

enter image description here

RUVSeq DESeq2 • 2.0k views
ADD COMMENT
0
Entering edit mode

I don't have an answer from the DESeq2 side, I'll leave this for the RUV devels.

ADD REPLY
0
Entering edit mode

I played in more details. @Michael love, do you have any comments about my play? I also copied this thread to GitHub (CBCS_normalization). Thanks a lot!

I calculated the W_1 for those 2 ways, respectively. The plot is as below. "preMer_w1" means, RUV_total was run on each of codeset individually. "postMer_w1" means, RUV_total was run on the merging codesets. In this case, only the common genes were kept in the merging data frame. The W_1 factor is different, which is within my expectation because different datasets were used, although I have no idea if some correlation was expected between them.

Then, I passed the W_1 to DESeq2 and got that the normalized counts are exactly same (RLE plots were exactly same), which is surprising to me.

The codes is as blow:

# run RUV_total on merging data
RUVnorm.dat_codeset262_375_post <- RUV_total(raw_codeset262_375_merging, pData_codeset262_375, fData_codeset262_375, k=1)

dds_codeset262_375_post <- DESeqDataSetFromMatrix(countData = counts(RUVnorm.dat_codeset262_375_post$set[1:241,]),
                              colData = pData(RUVnorm.dat_codeset262_375_post$set),
                              design = ~ W_1)
dds_codeset262_375_post <- DESeq(dds_codeset262_375_post)

ruv_normal_sizefactor_codeset262_375_post <- counts(dds_codeset262_375_post, normalized=TRUE)


# run RUV_total on each codeset, respectively, and then the w_1 from 2 runs was combined (in pData_preMerge) and passed to DESeq2 with merging raw counts.

RUVnorm_codeset262 <- RUV_total(raw_trt_codeset262, pData_codeset262, fData_codeset262, k=1)
RUVnorm_codeset375<- RUV_total(raw_trt_codeset375, pData_codeset375, fData_codeset375, k=1)

pData_preMerge <- rbind(pData(RUVnorm_codeset262$set), pData(RUVnorm_codeset375$set))

dds_codeset262_375_preMerg <- DESeqDataSetFromMatrix(countData = counts(raw_codeset262_375_merging[1:241,]),
                              colData = pData_preMerge,
                              design = ~ W_1)
dds_codeset262_375_preMerg <- DESeq(dds_codeset262_375_preMerg)

ruv_normal_sizefactor_codeset262_375_preMerg <- counts(dds_codeset262_375_preMerg, normalized=TRUE)

# countData is same for dds_codeset262_375_post and dds_codeset262_375_preMerg.

# results: ruv_normal_sizefactor_codeset262_375_post and ruv_normal_sizefactor_codeset262_375_preMerg are exactly same.

enter image description here

ADD REPLY
0
Entering edit mode

counts(dds, normalized=TRUE) scales to account for size factor (sequencing depth) but doesn't remove variance associated with the design variables.

ADD REPLY
0
Entering edit mode

oh, I see the design variables are not used when estimating the size factors, and counts(dds, normalized=TRUE) is providing counts scaled by size or normalization factors.

How could I get the normalized counts accounting for both size factors and design variables? So, I can use those normalized counts for RLE plots. I tried to google, but didn't find an answer.

ADD REPLY
0
Entering edit mode

In the DESeq2 vignette, we suggest you can use the VST then followed by regressing out batch variables.

ADD REPLY
0
Entering edit mode

So, the code should be like this in the vignette as below. But, I have no idea what is the batch variable in my case. If it will be the different codeset, I think I need to combine pData(RUVnorm.dat_codeset262_375_post$set) and pData_preMerge, and set a new variable of "batch"? Then, run

DESeqDataSetFromMatrix(countData=rbind(counts(RUVnorm.dat_codeset262_375_post$set[1:241,]), counts(raw_codeset262_375_merging[1:241,])), colData = newCombColData, design=~ batch + W1)

vignette codes:

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design= ~ batch + condition)
dds <- DESeq(dds)

vsd <- vst(dds, blind=FALSE)

mat <- assay(vsd)
mm <- model.matrix(~condition, colData(vsd))
mat <- limma::removeBatchEffect(mat, batch=vsd$batch, design=mm)
assay(vsd) <- mat
ADD REPLY
1
Entering edit mode

It's up to you what you want to regress out (and for what analyses/purposes) I was just pointing to code that allows for "...accounting for both size factors and design variables".

ADD REPLY

Login before adding your answer.

Traffic: 600 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6