I'm attempting to use edgeR to analyse my RNA-seq data. I have used readDGE to create a DGEList object, which I have called DG. Printing it gives:
    An object of class DGEList
    $samples
                          files group lib.size norm.factors
    counts3D7_1-1 counts3D7_1-1     1 13251744            1
    counts3D7_2-1 counts3D7_2-1     1 13809955            1
    counts3D7_3-1 counts3D7_3-1     1 12328705            1
    counts3D7_1-2 counts3D7_1-2     1 12605616            1
    counts3D7_2-2 counts3D7_2-2     1 13392599            1
    22 more rows ...

    $counts
                          Samples
    Tags                  counts3D7_1-1 counts3D7_2-1 counts3D7_3-1 counts3D7_1-2 counts3D7_2-2 counts3D7_3-2 counts3D7_1-3 counts3D7_2-3 counts3D7_3-3
      rna_PF3D7_0100100-1            49            24            27            14             8             6            15             9            15
      rna_PF3D7_0100200-1            17            17            23            11            13            13             3             6             6
      rna_PF3D7_0100300-1            15            10             4             2             4             5             2             4             5
      rna_PF3D7_0100400-1            44            45            46            28            33            35            38            33            53
      rna_PF3D7_0100500-1             0             0             0             0             1             2             1             1             0

                          Samples
    Tags                  countsBLM_1-1 countsBLM_2-1 countsBLM_3-1 countsBLM_1-2 countsBLM_2-2 countsBLM_3-2 countsBLM_1-3
      rna_PF3D7_0100100-1            11             5             8             4             5             6             6
      rna_PF3D7_0100200-1            31            15            15             9             9             4             2
      rna_PF3D7_0100300-1             5            11             4             1             2             2             1
      rna_PF3D7_0100400-1           116            87            82            65            66            41            88
      rna_PF3D7_0100500-1             0             2             2             1             1             0             5
I now wish to move on to the filtering stage, but I am stuck. There are two things I wish to filter out.

First, when I created DG a warning said "Meta tags detected: __no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique". I wish to remove the rows for these meta tags, i.e. the counts of reads that were not assigned to any feature.

Second, in accordance with edgeR's recommendations, I wish to remove features that do not have at least 1 count per million (CPM) in at least n samples, where n = 3 for my dataset.

Can anyone help me put together the code to do this? I've read through the edgeR manual, but I am new to R and edgeR and am not really sure where to start. Any help is greatly appreciated.

Thanks,
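A minimal base-R sketch of the two filtering steps, run on a small hypothetical count matrix rather than on DG itself. It assumes the meta tags are rows whose names start with "__" (as in the warning), and the helper names drop_meta_tags and filter_by_cpm are made up for illustration. CPM is computed by hand as count × 10^6 / library size, which is what edgeR's cpm() reports for a DGEList when all norm.factors are 1, as they are in the printout.

```r
# Hypothetical helper: remove HTSeq-style meta-tag rows (names beginning "__").
drop_meta_tags <- function(counts) {
  is_meta <- grepl("^__", rownames(counts))
  counts[!is_meta, , drop = FALSE]
}

# Hypothetical helper: keep rows with at least `min_cpm` counts per million
# in at least `min_samples` columns. `lib_size` defaults to column totals.
filter_by_cpm <- function(counts, lib_size = colSums(counts),
                          min_cpm = 1, min_samples = 3) {
  # Scale counts to per-million, dividing each column by its library size.
  cpm <- sweep(counts * 1e6, 2, lib_size, "/")
  keep <- rowSums(cpm >= min_cpm) >= min_samples
  counts[keep, , drop = FALSE]
}

# Toy input: hypothetical numbers, not taken from the dataset above.
counts <- matrix(
  c(5, 3, 2, 1,
    0, 1, 0, 2,
    1, 1, 1, 0,
    0, 0, 0, 0,
    100, 100, 100, 100,
    7, 8, 9, 10),
  nrow = 6, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "gene_c", "gene_d",
                    "__no_feature", "__ambiguous"),
                  paste0("s", 1:4))
)
lib_size <- rep(1e6, 4)  # hypothetical library sizes of one million reads each

filtered <- filter_by_cpm(drop_meta_tags(counts), lib_size = lib_size)
print(filtered)
```

On the DGEList itself, the corresponding edgeR steps would probably look something like `keep <- rowSums(cpm(DG) >= 1) >= 3` followed by `DG <- DG[keep, ]`; recent edgeR versions also accept `keep.lib.sizes = FALSE` in that subsetting call to recompute library sizes afterwards. Depending on the edgeR version, readDGE may already have taken the meta tags out of `DG$counts`, in which case the meta-tag step changes nothing.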
That worked perfectly. Thank you very much, Gordon.