Adding gene names (or symbols) to my DESeq2 result
ytlin610

Hi, I'm working on the DE analysis of RNA-seq data from the green alga Chlamydomonas, and I can generate a standard DE result with DESeq2 like this:

  baseMean log2FoldChange lfcSE stat pvalue padj
  <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
Cre01.g000450.v5.5 256.1055 -0.2995 0.2954 -1.0140 0.3106 0.7465
Cre01.g000500.v5.5 44.3266 -0.7029 0.3880 -1.8114 0.0701 0.3764
Cre01.g000600.v5.5 2.3502 1.5752 1.8795 0.8381 0.4020 0.8108
Cre01.g000650.v5.5 5.7842 1.3050 0.8817 1.4802 0.1388 0.5241
Cre01.g000850.v5.5 4.7789 -0.0103 0.7810 -0.0132 0.9895 0.9999
... ... ... ... ... ... ...
Cre36.g759647.v5.5 10.3085 0.2125 1.1183 0.1900 0.8493 0.9771
Cre39.g760097.v5.5 2.7385 0.8043 1.6105 0.4994 0.6175 0.9069
Cre43.g760547.v5.5 2.9478 -2.4908 1.6740 -1.4879 0.1368 0.5233
Cre44.g760747.v5.5 633.6948 -0.0325 0.2354 -0.1380 0.8902 0.9846
Cre48.g761197.v5.5 5.6491 -0.3471 1.0296 -0.3371 0.7360 0.9423


I've also downloaded a text file of gene symbols and transcript IDs from JGI Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html):

Cre01.g000050.t1.1 RWP14
Cre01.g000150.t1.2 ZRT2
Cre01.g000650.t1.1 AMX2
Cre01.g000850.t1.2 CPLD38
Cre01.g000900.t1.2 CPLD20
Cre01.g001400.t1.1 ZMP1
Cre01.g001750.t1.2 TIG1
Cre01.g002200.t1.1 RPB6
Cre01.g002500.t1.2 COP2
Cre01.g003050.t1.2 SEC8
Cre01.g004250.t1.2 TCTEX1
Cre01.g004300.t1.2 ASN1
Cre01.g004450.t1.2 CPLD42
Cre01.g004500.t1.2 LEU1L
Cre01.g004550.t1.2 FAP190
Cre01.g004600.t1.1 RWP12
Cre01.g005150.t1.1 SGA1
Cre01.g005450.t1.2 RSP10
Cre01.g005550.t1.2 ARL2
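For reference, a whitespace-separated two-column file like this can be loaded with base R's read.table(). This is a minimal sketch: the inline text= string stands in for the downloaded file (for the real file you would pass its path instead, e.g. a hypothetical "jgi_symbols.txt"), and the column names transID and symbol are illustrative choices, not names defined by JGI.

```r
# Stand-in for the downloaded JGI file (three lines copied from above);
# for the real file, use read.table("jgi_symbols.txt", ...) with your own path
jgi_text <- "Cre01.g000050.t1.1 RWP14
Cre01.g000650.t1.1 AMX2
Cre01.g000850.t1.2 CPLD38"

# Two whitespace-separated columns, no header line
genelist <- read.table(text = jgi_text,
                       col.names = c("transID", "symbol"),
                       stringsAsFactors = FALSE)

# Note: these are transcript IDs (ending in ".t1.1", ".t1.2"), whereas the
# DESeq2 row names are gene IDs (ending in ".v5.5"), so they differ as-is
str(genelist)
```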


Is there a direct way to add a column of gene symbols to my DE result by mapping my IDs to the transcript IDs in the text file above? I've done some research, but I'm not sure whether the org.Hs.eg.db package can help me do this. Thanks!



Not an expert, but I had the same question recently and solved it with:

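library(fuzzyjoin)
# Each value in genelist$transID is used as a regular expression matched against dataframe$IDcol
regex_left_join(dataframe, genelist, by = c("IDcol" = "transID"))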

where IDcol is the ID column in your data frame and transID is the ID column in your gene list.
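One thing to watch: in the question, the DESeq2 row names are gene IDs such as Cre01.g000650.v5.5, while the JGI file lists transcript IDs such as Cre01.g000650.t1.1, so with the IDs exactly as shown, a pattern built from the full transcript ID will not match the gene ID. Below is a base-R sketch of the same regex-join idea without fuzzyjoin: each transcript ID is turned into an anchored pattern for its locus prefix, and each gene ID keeps its row (left-join semantics) with the first matching symbol or NA. The CreNN.gNNNNNN prefix rule is an assumption based on the IDs shown here, and the helper locus_pattern is hypothetical.

```r
# Toy versions of the two tables (IDs copied from the question)
res_df <- data.frame(
  IDcol    = c("Cre01.g000450.v5.5", "Cre01.g000650.v5.5", "Cre01.g000850.v5.5"),
  baseMean = c(256.1055, 5.7842, 4.7789),
  stringsAsFactors = FALSE
)
genelist <- data.frame(
  transID = c("Cre01.g000650.t1.1", "Cre01.g000850.t1.2"),
  symbol  = c("AMX2", "CPLD38"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "Cre01.g000650.t1.1" -> regex ^Cre01\.g000650\.
# (assumes every ID starts with a "CreNN.gNNNNNN" locus prefix)
locus_pattern <- function(trans_id) {
  locus <- sub("^(Cre[0-9]+\\.g[0-9]+)\\..*$", "\\1", trans_id)
  paste0("^", gsub("\\.", "\\\\.", locus), "\\.")  # escape literal dots
}

patterns <- locus_pattern(genelist$transID)

# Left join: every gene keeps its row; symbol is NA if no pattern matches
res_df$symbol <- vapply(res_df$IDcol, function(id) {
  hit <- which(vapply(patterns, grepl, logical(1), x = id))
  if (length(hit) == 0) NA_character_ else genelist$symbol[hit[1]]
}, character(1), USE.NAMES = FALSE)

res_df
```

Looping over patterns is fine for a sketch but scales as genes × transcripts; for a full genome, an exact join on an extracted locus key is usually faster.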

Hope that helps!


Hi Rina, thank you for the response. I've successfully joined the two data frames with this package. Thanks a lot!

@mikelove

There are numerous ways to do this in R, and Rina has provided one above.

I think the easiest approach is:

res <- results(dds, tidy=TRUE)

This puts the row names in a column called "row".

Then, after naming the first column of your gene-symbol table "row", you can just do:

names(new.table) <- c("row", "symbol")
m <- merge(res, new.table, all.x=TRUE)

The all.x=TRUE argument keeps every row of res, even when there is no matching row in new.table; those rows get NA in the symbol column.
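With the IDs shown in the question, merging on "row" directly would leave every symbol as NA, because the DESeq2 row names end in .v5.5 while the JGI IDs end in .t1.1 or .t1.2. A minimal sketch, assuming both ID types share the CreNN.gNNNNNN locus prefix: derive that prefix as a shared key in both tables, keep one symbol per locus so genes with several transcripts are not duplicated, then merge with all.x=TRUE as above. The toy res only mimics the "row" column of results(dds, tidy=TRUE); the locus column, the to_locus helper, and the second AMX2 transcript ID are hypothetical.

```r
# Toy stand-in for results(dds, tidy=TRUE): gene IDs are in the "row" column
res <- data.frame(
  row = c("Cre01.g000450.v5.5", "Cre01.g000650.v5.5", "Cre01.g000850.v5.5"),
  log2FoldChange = c(-0.2995, 1.3050, -0.0103),
  stringsAsFactors = FALSE
)
# Toy stand-in for the JGI table; "Cre01.g000650.t2.1" is a hypothetical
# second transcript, included only to show de-duplication
new.table <- data.frame(
  transID = c("Cre01.g000650.t1.1", "Cre01.g000650.t2.1", "Cre01.g000850.t1.2"),
  symbol  = c("AMX2", "AMX2", "CPLD38"),
  stringsAsFactors = FALSE
)

# Assumed ID structure: keep only the "CreNN.gNNNNNN" locus prefix
# (IDs that do not fit this pattern are returned unchanged and won't match)
to_locus <- function(id) sub("^(Cre[0-9]+\\.g[0-9]+)\\..*$", "\\1", id)

res$locus       <- to_locus(res$row)
new.table$locus <- to_locus(new.table$transID)

# One symbol per locus, so multi-transcript genes are not repeated
sym <- new.table[!duplicated(new.table$locus), c("locus", "symbol")]

# Left join on the shared key: unmatched genes get symbol = NA
m <- merge(res, sym, by = "locus", all.x = TRUE)
m <- m[order(m$row), ]
m
```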


Hi Michael, I've also tried your method and it worked great! I can finally label gene names on my volcano plots. Many thanks!
