Question

read counts for each gene for edgeR

0

Entering edit mode

mictadlo ▴ 10

@mictadlo-10885

Last seen 5.3 years ago

Hi,

I received read counts for each gene for each BAM file which was generated by HTseq-count.

```

augustus_masked-lcl_ScwjSwM_1-processed-gene-0.2        0
augustus_masked-lcl_ScwjSwM_100-processed-gene-0.0      1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.1     0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-0.3     0
augustus_masked-lcl_ScwjSwM_1000-processed-gene-1.13    1
augustus_masked-lcl_ScwjSwM_1000-processed-gene-2.0     0

```

Looking at this https://f1000research.com/articles/5-1438/v2 they combined all BAM files and then ran `fc <- featureCounts(all.bam, annot.inbuilt="mm10")`

Is it possible to combine all the counts from each BAM file and create an compatible table for edgeR?

Thank you in advance

edger htseqcounts • 3.7k views

ADD COMMENT • link updated 7.6 years ago by Gordon Smyth 52k • written 7.6 years ago by mictadlo ▴ 10

score 3 · Accepted Answer · 2017-09-04

3

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 7 hours ago

WEHI, Melbourne, Australia

readDGE() will read the htseq-count output files into R and edgeR.

Just make a vector files containing the names of the htseq-count output files that contain the counts. Then

library(edgeR)
dge <- readDGE(files, header=FALSE)

ADD COMMENT • link 7.6 years ago Gordon Smyth 52k

0

Entering edit mode

This script htseq-combine_all.R script describes how to merge HTSeq results. Is it the best way to do it?

ADD REPLY • link 7.6 years ago mictadlo ▴ 10

0

Entering edit mode

The script seems very complicated. Have you tried the solution I suggested, which is just one line instead of a whole script?

ADD REPLY • link 7.6 years ago Gordon Smyth 52k

0

Entering edit mode

Hi, Sorry for not posting earlier you suggestion with readDGE but I got the same error:

Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
Length of 'group' must equal number of columns in 'counts'

# where are we?
> basedir <- "~/projects/mel/Sporobolus_pyramadalis/"
> setwd(basedir)
>
> cntdir <- paste(basedir, "Htseq_read_counts", sep="/")
> pat <- "_001.sorted.out"
> files <- list.files(path = cntdir,
+                             pattern = pat,
+                             all.files = TRUE,
+                             recursive = FALSE,
+                             ignore.case = FALSE,
+                             full.names = TRUE,
+                             include.dirs = FALSE)
>

> data <- readDGE(files, header=FALSE)
Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique

> targets <- read.delim("phenoSp.txt", stringsAsFactors=FALSE)

> ## ----group---------------------------------------------------------------
> group <- paste(targets$CellType, targets$Status, sep=".")
> group <- factor(group)
> table(group)
group
Leaves.1.5     Leaves.2     Leaves.3 Root_tip.1.5   Root_tip.2   Root_tip.3
           4            4            4            4            4            4
>
> library(edgeR)
> y <- DGEList(data[,-1], group=group,
+              genes=data[,1,drop=FALSE])
Error in DGEList(data[, -1], group = group, genes = data[, 1, drop = FALSE]) :
Length of 'group' must equal number of columns in 'counts'

What did I miss?

ADD REPLY • link 7.6 years ago mictadlo ▴ 10

0

Entering edit mode

Well, it's not actually a question of what you've missed, but what you're doing that you don't need to. readDGE() already created a DGEList for you, so you don't need to run DGEList(). Why are you throwing out the first sample by data[,-1]? Naturally that will leave you with one column short.

Just skip that -- you can start doing analysis directly on the DGEList that you got from readDGE().

ADD REPLY • link 7.6 years ago Gordon Smyth 52k

0

Entering edit mode

Thank you. I have tried to follow https://f1000research.com/articles/5-1438/v2 but I will try look for examples how to do it without DGEList.

ADD REPLY • link 7.6 years ago mictadlo ▴ 10

0

Entering edit mode

Thank you for your help it works like advertise:

y <- readDGE(files, header=FALSE, group = group)
keep <- rowSums(cpm(y)>1) >= 2

ADD REPLY • link 7.6 years ago mictadlo ▴ 10