Question

edgeR filtering: how to remove meta tags and lowly expressed reads (cpm)

lynski008 • 0

I'm attempting to use edgeR to analyse my RNA-seq data. I have managed to use readDGE to create a data object which I have called DG and it looks like this if I print it:

An object of class DGEList
$samples
                      files group lib.size norm.factors
counts3D7_1-1 counts3D7_1-1     1 13251744            1
counts3D7_2-1 counts3D7_2-1     1 13809955            1
counts3D7_3-1 counts3D7_3-1     1 12328705            1
counts3D7_1-2 counts3D7_1-2     1 12605616            1
counts3D7_2-2 counts3D7_2-2     1 13392599            1
22 more rows ...

$counts
                     Samples
Tags                  counts3D7_1-1 counts3D7_2-1 counts3D7_3-1 counts3D7_1-2 counts3D7_2-2 counts3D7_3-2 counts3D7_1-3 counts3D7_2-3 counts3D7_3-3 countsBLM_1-1 countsBLM_2-1 countsBLM_3-1 countsBLM_1-2 countsBLM_2-2 countsBLM_3-2 countsBLM_1-3
  rna_PF3D7_0100100-1            49            24            27            14             8             6            15             9            15            11             5             8             4             5             6             6
  rna_PF3D7_0100200-1            17            17            23            11            13            13             3             6             6            31            15            15             9             9             4             2
  rna_PF3D7_0100300-1            15            10             4             2             4             5             2             4             5             5            11             4             1             2             2             1
  rna_PF3D7_0100400-1            44            45            46            28            33            35            38            33            53           116            87            82            65            66            41            88
  rna_PF3D7_0100500-1             0             0             0             0             1             2             1             1             0             0             2             2             1             1             0             5

I now wish to move on to the filtering stage, but I am stuck. There are two things I wish to filter out.

First, when I created the DG data object a warning came up that said "Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique", so I wish to remove the rows for these meta tags, i.e. the entries that do not correspond to real features.

Second, in accordance with edgeR's recommendations, I wish to remove features that do not have at least 1 read per million in n samples, where n = 3 for my dataset.

Can anyone help me put together the code to achieve this? I've read through the edgeR manual, but I am new to R and edgeR and am not really sure where to start. Any help is greatly appreciated. Thanks,

edger filtering cpm • 2.4k views

updated 8.6 years ago by Gordon Smyth 52k • written 8.7 years ago by lynski008 • 0

score 1 · Answer 1 · 2016-08-31

Gordon Smyth 52k

WEHI, Melbourne, Australia

To remove the meta tags:

# Logical index, so no rows are lost if there happen to be no meta tags
IsMeta <- grepl("^__", rownames(y))
y <- y[!IsMeta, ]

Here I have assumed your DGEList object is called y.
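
As a minimal sketch of what this subsetting does, here is the same idea on a toy count matrix in base R, without edgeR. The matrix `counts`, its values and the sample names are made up for illustration. The sketch selects rows with a logical index from `grepl()`, which also leaves the data untouched when no row name starts with `__`.

```r
# Toy count matrix (hypothetical values): two genes and two meta tag rows
counts <- matrix(
  c( 49,  24,
     17,  17,
    900, 850,
    300, 310),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("rna_PF3D7_0100100-1", "rna_PF3D7_0100200-1",
      "__no_feature", "__ambiguous"),
    c("sampleA", "sampleB")
  )
)

# TRUE for rows whose name starts with two underscores
is_meta <- grepl("^__", rownames(counts))

# Keep every row that is not a meta tag
filtered <- counts[!is_meta, , drop = FALSE]
print(rownames(filtered))

# Running the same filter again, when no meta tags are left, keeps all rows.
# A negative index built from an empty grep() result would select zero rows.
clean <- filtered[!grepl("^__", rownames(filtered)), , drop = FALSE]
print(nrow(clean))
```

The same logical index can be applied to a DGEList, because a DGEList is subset by rows in the same way as a matrix.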

To filter on expression:

IsExpr <- rowSums(cpm(y) > 1) >= 3
y <- y[IsExpr, ]
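
To make the arithmetic behind this filter concrete, the following base-R sketch reproduces it on a toy matrix. It assumes that CPM is simply the count divided by the library size, times one million, which should agree with `cpm(y)` only when the normalization factors are all 1, as in the object printed in the question. The gene names, counts and library sizes are hypothetical.

```r
# Toy counts (hypothetical values): 4 genes x 4 samples
counts <- matrix(
  c(50, 60, 55, 70,
     0,  1,  0,  2,
    20,  0, 12, 15,
    30, 25,  0,  3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("s", 1:4))
)

# Assume every library contains 10 million reads (illustration only)
lib_size <- rep(1e7, ncol(counts))

# Counts per million: divide each column by its library size, scale by 1e6
cpm_manual <- t(t(counts) / lib_size) * 1e6

# Number of samples in which each gene exceeds 1 CPM
n_above <- rowSums(cpm_manual > 1)

# Keep genes that exceed 1 CPM in at least 3 samples
is_expr <- n_above >= 3
print(n_above)
print(names(which(is_expr)))
```

Here geneA and geneC pass, geneB is never above 1 CPM, and geneD is above 1 CPM in only two samples. In a real analysis the library sizes come from `y$samples$lib.size`. The edgeR documentation also describes subsetting with `keep.lib.sizes=FALSE` if the library sizes should be recomputed after rows are removed.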

For more detail about filtering, see the work flow paper: http://f1000research.com/articles/5-1438