Question

Adding a "symbol" column to my differentially expressed genes in edgeR

alihakimzadeh73

Hi,

I am trying to do a differential expression analysis with edgeR. I have a counts.csv file produced by HTSeq. I want to add a "symbol" column, giving the gene symbol for each gene ID, to a data frame in edgeR so that the symbols also appear in the tables of up- and downregulated genes. Here is the code I used for the differential expression analysis:

```
library(edgeR)
library(org.Hs.eg.db)

x <- read.csv("counts.csv")                   # column 1: gene IDs; columns 2-51: counts
y <- DGEList(counts = x[, 2:51], genes = x[, 1])
y <- calcNormFactors(y)

group <- factor(c(rep("high", 30), rep("low", 20)))
time  <- factor(rep(c("pre", "post"), 25))   # pre, post, pre, post, ... for 50 samples
data.frame(sample = colnames(y), group, time)

design <- model.matrix(~group + time)
y   <- estimateDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)

deg  <- topTags(lrt, n = Inf, p.value = 0.05)$table
up   <- deg[deg$logFC > 0, ]
down <- deg[deg$logFC < 0, ]
write.csv(up,   file = "up.csv")
write.csv(down, file = "down.csv")
```

I tried to use this code to add the symbols to my data frame, but it doesn't work. Can anyone help me get it working?

```
mp=gsub("\\..*","",row.names(y))
y$symbol<- mapIds(org.Hs.eg.db, keys= row.names(mp),
                  keytype="ENSEMBL", column="SYMBOL")
```

This is the results table I got:

```
> head(up)
                genes    logFC     logCPM        LR        PValue           FDR
5659  ENSG00000125207 4.383522  0.5658056 1139.1891 1.003436e-249 5.797650e-245
25588 ENSG00000222057 4.589772 -0.2246701  980.9821 2.444029e-215 7.060555e-211
50136 ENSG00000261177 5.207807 -0.1559902  810.7058 2.537797e-178 4.887627e-174
35996 ENSG00000236941 2.595311  2.0293394  790.5476 6.126318e-174 8.849161e-170
29288 ENSG00000227615 5.668960  0.4194818  767.6254 5.902889e-169 6.821142e-165
17466 ENSG00000196564 4.778656 -0.8531067  715.6777 1.165581e-157 1.122415e-153
```

The error message I received is:

```
Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : mapIds must have at least one key to match against.
```

Thanks

Written 5.0 years ago by alihakimzadeh73; last updated 5.0 years ago by James W. MacDonald

Yunshun Chen

You didn't show what your mp is. You set keys=row.names(mp) in mapIds(). But are those Ensembl IDs, as required?

Note that mapIds() is not an edgeR function. This question is more about how to use annotation packages such as org.Hs.eg.db.

Written 5.0 years ago by Yunshun Chen

mp is derived from row.names(y). My main problem is really the annotation: adding symbols so that I can report which genes are up- and downregulated in the experiment. Unfortunately, as you can see, I couldn't make it work.

Written 5.0 years ago by alihakimzadeh73

I can see how you defined mp, but have you ever checked what your mp actually is? You need to extract the Ensembl IDs from the genes column of y$genes and use them as keys in mapIds().

Written 5.0 years ago by Yunshun Chen
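To make this concrete, here is a base-R sketch that uses a small made-up count table in place of counts.csv (the IDs, counts, and the ".16" version suffix are hypothetical). read.csv() gives the data frame automatic row names, so row.names() returns row numbers rather than gene IDs; the row names 5659, 25588, ... in the results table above suggest the same holds for row.names(y). And because mp is a plain character vector, row.names(mp) is NULL, which is consistent with the "at least one key" error.

```r
# Made-up stand-in for counts.csv: column 1 holds gene IDs (one with a
# hypothetical version suffix), the remaining columns hold counts.
csv_text <- "gene_id,s1,s2
ENSG00000141510.16,10,12
ENSG00000012048,5,7
ENSG00000146648,0,3"
x <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# read.csv() assigns automatic row names: row numbers, not gene IDs.
print(row.names(x))
# [1] "1" "2" "3"

# The question's mp: stripping ".<anything>" from row numbers changes nothing.
mp <- gsub("\\..*", "", row.names(x))

# mp is a plain character vector, so it has no row names at all;
# passing row.names(mp) as keys therefore hands mapIds() nothing to match.
print(row.names(mp))
# NULL

# The Ensembl IDs live in the first column; apply the same gsub() there.
keys <- gsub("\\..*", "", x[, 1])
print(keys)
# [1] "ENSG00000141510" "ENSG00000012048" "ENSG00000146648"
```

With the real DGEList, the equivalent keys would be gsub("\\..*", "", y$genes$genes).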

Thanks, Yunshun. I understand why it doesn't work correctly.

Written 5.0 years ago by alihakimzadeh73

score 3 · Accepted Answer · 2019-11-18

You have an error in your code, and you are overlooking something. First the error. This code won't do what you apparently think it does:

```
y$symbol <- mapIds(org.Hs.eg.db, keys= row.names(mp),
                  keytype="ENSEMBL", column="SYMBOL")
```

A DGEList doesn't have a 'symbol' list item. Well, a DGEList is just a list, and you can add any list item to it that you like, but it won't be used for anything: none of the code in edgeR expects a 'symbol' list item, so it will be ignored.

```
> class(d)
[1] "DGEList"
attr(,"package")
[1] "edgeR"
> d$symbol <- "HERESASYMBOLFORYA"
> d
An object of class "DGEList"
$counts
       x0 x1 x2
gene.1  2  8 58
gene.2  3  2  5
gene.3  7  2  4
gene.4  5  3  4
gene.5  2  1  1
95 more rows ...

$samples
   group lib.size norm.factors
x0     1      379    0.9840312
x1     1      384    1.0295185
x2     1      473    0.9870905

$common.dispersion
[1] 0.09547519

$AveLogCPM
[1] 15.76788 13.64082 13.90450 13.82198 12.97364
95 more elements ...

$symbol
[1] "HERESASYMBOLFORYA"

> topTags(glmLRT(glmFit(d, design, dispersion = dispersion.true), 2))
Coefficient:  x
             logFC   logCPM        LR       PValue        FDR
gene.1   1.3630814 16.06110 11.868112 0.0005710326 0.05710326
gene.66 -1.7069713 14.11364  8.224193 0.0041335577 0.20667789
gene.5  -1.1682427 14.17484  4.740671 0.0294575861 0.69947257
gene.13 -1.1283106 14.23087  4.523089 0.0334403966 0.69947257
gene.11  1.4918801 13.53773  4.090369 0.0431282295 0.69947257
gene.77 -1.3691516 13.65080  3.995847 0.0456125163 0.69947257
gene.19  1.0616309 14.19212  3.876618 0.0489630801 0.69947257
gene.50 -1.1771528 13.66889  3.502580 0.0612733098 0.71180393
gene.39  0.9668069 14.01184  2.988546 0.0838554094 0.71180393
gene.91  1.1008979 13.71190  2.849609 0.0913961719 0.71180393

## put some random stuff in the 'genes' list item
> d$genes$whatevs <- sample(letters, 100, TRUE)
> topTags(glmLRT(glmFit(d, design, dispersion = dispersion.true), 2))
Coefficient:  x
        whatevs      logFC   logCPM        LR       PValue        FDR
gene.1        x  1.3630814 16.06110 11.868112 0.0005710326 0.05710326
gene.66       z -1.7069713 14.11364  8.224193 0.0041335577 0.20667789
gene.5        e -1.1682427 14.17484  4.740671 0.0294575861 0.69947257
gene.13       a -1.1283106 14.23087  4.523089 0.0334403966 0.69947257
gene.11       m  1.4918801 13.53773  4.090369 0.0431282295 0.69947257
gene.77       b -1.3691516 13.65080  3.995847 0.0456125163 0.69947257
gene.19       o  1.0616309 14.19212  3.876618 0.0489630801 0.69947257
gene.50       g -1.1771528 13.66889  3.502580 0.0612733098 0.71180393
gene.39       o  0.9668069 14.01184  2.988546 0.0838554094 0.71180393
gene.91       c  1.1008979 13.71190  2.849609 0.0913961719 0.71180393

## now we get the annotations added from the `genes` list item
```

From your topTags output we can infer that your DGEList already has a 'genes' list item. You can add whatever extra columns you need to it, and you can use its existing genes column as your keys, because those are obviously already Ensembl IDs.

So what you really want to do is

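```
# keys = the Ensembl IDs in the genes column; return the matching SYMBOL
y$genes$symbol <- mapIds(org.Hs.eg.db, keys = y$genes$genes, column = "SYMBOL", keytype = "ENSEMBL")
```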
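One caveat about ordering: the transcript above re-runs glmFit() after adding whatevs to d$genes, and only then does the column appear in topTags(). So the mapIds() line should run before glmFit() if you want symbol in your up and down tables. If you already have those tables, an alternative is to attach the symbols to the results table directly. mapIds() returns a character vector named by its keys, so it can be indexed by the table's genes column. The sketch below is base R only: sym is a hypothetical hand-written stand-in for the mapIds() result, and the IDs (including the unmatched ENSG00000999999), logFC and FDR values are made up.

```r
# Hypothetical stand-in for topTags(...)$table: Ensembl IDs in 'genes';
# logFC and FDR values are made up for illustration.
deg <- data.frame(genes = c("ENSG00000141510", "ENSG00000012048",
                            "ENSG00000146648", "ENSG00000999999"),
                  logFC = c(2.1, -1.4, 0.8, -3.0),
                  FDR   = c(0.001, 0.01, 0.03, 0.04),
                  stringsAsFactors = FALSE)

# Hypothetical stand-in for mapIds(org.Hs.eg.db, deg$genes, "SYMBOL", "ENSEMBL"):
# a character vector named by Ensembl ID. The made-up last ID has no entry.
sym <- c(ENSG00000141510 = "TP53",
         ENSG00000012048 = "BRCA1",
         ENSG00000146648 = "EGFR")

# Look up each row's symbol by ID; IDs missing from 'sym' become NA.
deg$symbol <- unname(sym[deg$genes])

# Split into up- and downregulated tables, as in the question.
up   <- deg[deg$logFC > 0, ]
down <- deg[deg$logFC < 0, ]
print(up)
print(down)
# write.csv(up, file = "up.csv"); write.csv(down, file = "down.csv")
```

With the real mapIds() output, IDs that have no SYMBOL in org.Hs.eg.db should likewise end up as NA in the symbol column.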