Question

edgeR subsetting DGEList by column/sample

mnaymik

I saw this post from a while ago regarding a similar issue: "edgeR: Problem with subsetting a DGEList in latest package version"

> d$samples[1:6,]
sample                       lib.size norm.factor type time
preExercise_TAGGCTGACTTGAG.1      856   1.1020236    B  pre
preExercise_TCCATCCTCGTTAG.1     1033   1.2198739    B  pre
pbmc001_TTGAGGACTTTCAC.1          703   1.2050717    B  pre
pbmc001_AGTCGCCTGCTTAG.1         1230   1.0304974    B post
pbmc001_TACTACACAGCACT.1         1053   0.9790636    C post
pbmc001_TAAACAACCCTTAT.1          895   1.1032946    D  pre

...

I am trying to run a differential expression analysis on only the samples of type 'B', using time (post vs pre) as the grouping. I thought the easiest way would be to just subset d via:

d.B = d[,grep('B',d$samples$type)]

But I get the error:

Error in `$<-.data.frame`(`*tmp*`, "group", value = integer(0)) :
replacement has 0 rows, data has 226
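The error message itself comes from base R: `$<-.data.frame` raises it whenever a zero-length value is assigned to a column of a data.frame that has rows. The sketch below reproduces it on a hypothetical 3-row data.frame standing in for d$samples (the values are made up), which shows that something during subsetting tried to write an empty 'group' column.

```r
# A small data.frame standing in for d$samples (hypothetical values)
samples <- data.frame(lib.size = c(856, 1033, 703),
                      type = c("B", "B", "C"))

# Assigning a zero-length vector to a column triggers the same base-R error
res <- tryCatch({
  samples$group <- integer(0)
  "no error"
}, error = function(e) conditionMessage(e))

print(res)
```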

Is there a proper way of doing differential expression on just a subset of the DGEList?

I got around this by using the method from the post I linked:

B=grep('B',d$samples$type)
test=DGEList(d$counts)
test=test[,B]

Then replacing test$samples with its proper subset from d:

test$samples=d$samples[B,]

This just seems sort of hacky...

Written 8.8 years ago by mnaymik; last updated 8.8 years ago by Gordon Smyth

Something seems strange.

Can you start a new R session, call library(edgeR), and then update your question by copy/pasting the output of sessionInfo()?

Reply by Steve Lianoglou, 8.8 years ago

> library(edgeR)
Loading required package: limma
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgeR_3.14.0 limma_3.28.14

Reply by mnaymik, 8.8 years ago

Can you post a minimal working example of this behaviour?

Reply by Aaron Lun, 8.8 years ago

score 3 · Answer 1 · 2016-07-22

Steve Lianoglou

Now that you've verified you're running the latest version of edgeR, I've looked a bit more closely at your example and error.

It seems that you have somehow constructed a DGEList (d) whose $samples data.frame doesn't have a group column. What commands did you use to construct d?

In any event, try adding a group column, like so:

d$samples <- transform(d$samples, group=paste(type, time, sep="_"))

Then try subsetting by columns again ...
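A self-contained sketch of that fix, run on simulated counts: the count matrix, cell names and type/time values below are hypothetical, and only DGEList() and its `[` subsetting method come from edgeR. It restores a group column and then subsets to the type-B samples.

```r
library(edgeR)

# Simulated counts: 10 genes x 6 cells (hypothetical data)
set.seed(1)
counts <- matrix(rpois(60, lambda = 10), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("cell", 1:6)))
d <- DGEList(counts)
d$samples$type <- c("B", "B", "B", "B", "C", "D")
d$samples$time <- c("pre", "pre", "pre", "post", "post", "pre")

# Keep (or restore) the compulsory 'group' column before subsetting
d$samples <- transform(d$samples, group = factor(paste(type, time, sep = "_")))

# Subset columns with an exact match on type
d.B <- d[, d$samples$type == "B"]
d.B$samples
```

The exact comparison `d$samples$type == "B"` is used instead of grep('B', ...) because grep() would also select any type label that merely contains a "B".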

Also, adding such a group column can be useful in your downstream analysis since you can now analyze your experiment as a one-way layout:

design <- model.matrix(~ 0 + group, d$samples)

You can then construct contrasts with makeContrasts that are easy-to-interpret arithmetic over the columns of design.
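To make the one-way layout concrete, here is a sketch on simulated data, assuming a hypothetical balanced design with B and C cells at both time points. It builds the design matrix from the combined group factor and a makeContrasts() contrast for post vs pre within type B; makeContrasts() comes from limma, which edgeR depends on.

```r
library(edgeR)
library(limma)  # makeContrasts()

# Simulated counts: 10 genes x 8 cells (hypothetical data)
set.seed(1)
counts <- matrix(rpois(80, lambda = 20), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("cell", 1:8)))
d <- DGEList(counts)
d$samples$type <- rep(c("B", "C"), each = 4)
d$samples$time <- rep(c("pre", "pre", "post", "post"), times = 2)
d$samples$group <- factor(paste(d$samples$type, d$samples$time, sep = "_"))

# One coefficient per group (no intercept); rename columns to the level names
design <- model.matrix(~ 0 + group, d$samples)
colnames(design) <- levels(d$samples$group)

# Post vs pre within type B, as arithmetic over the design columns
contr <- makeContrasts(B.postVsPre = B_post - B_pre, levels = design)
contr
```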

Answer by Steve Lianoglou, 8.8 years ago

Since I was using the time column as my group, I had set samples$group to NULL. Later I was setting group = time; if I do that before subsetting, it works just fine. I did not realize group was that sensitive. Thanks!

Reply by mnaymik, 8.8 years ago

Or just

d$samples$group <- paste(d$samples$type, d$samples$time, sep=".")

would also do the job.

Reply by Gordon Smyth, 8.8 years ago

score 0 · Answer 2 · 2016-07-23

A DGEList object needs to satisfy some minimum conditions to be a valid object. If you change a DGEList object so that it no longer satisfies these minimum conditions, then operations such as subsetting can no longer be guaranteed to work.

help("DGEList-class") explains what a DGEList object is assumed to contain. It explains that 'group', 'lib.size' and 'norm.factors' are compulsory columns for the d$samples data.frame, so you cannot remove them.
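Before subsetting, it can help to check that none of these compulsory columns has been dropped. The helper below, missingSampleCols(), is hypothetical (it is not part of edgeR); it simply reports which of 'group', 'lib.size' and 'norm.factors' are absent from a samples data.frame such as d$samples. The example data.frame is made up and mimics a samples table whose group column was set to NULL.

```r
# Hypothetical helper (not an edgeR function): which compulsory
# DGEList$samples columns are missing?
missingSampleCols <- function(samples,
                              required = c("group", "lib.size", "norm.factors")) {
  setdiff(required, colnames(samples))
}

# A samples table whose 'group' column has been removed (hypothetical values)
samples <- data.frame(lib.size = c(856, 1033),
                      norm.factors = c(1.1020236, 1.2198739),
                      type = c("B", "B"))
missing <- missingSampleCols(samples)
if (length(missing) > 0) {
  message("Missing compulsory columns: ", paste(missing, collapse = ", "))
}
```

The same call could be applied to d$samples; if it reports "group", adding a group column back (for example with one of the one-liners above) restores a valid DGEList.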