Question

Adding gene names (or symbols) to my DESeq2 results

ytlin610


East Lansing, Michigan

Hi, I'm working on the DE analysis of RNA-seq data from the green alga Chlamydomonas, and I can generate a standard DE results table with DESeq2 like this:

	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
	<numeric>	<numeric>	<numeric>	<numeric>	<numeric>	<numeric>
Cre01.g000450.v5.5	256.1055	-0.2995	0.2954	-1.0140	0.3106	0.7465
Cre01.g000500.v5.5	44.3266	-0.7029	0.3880	-1.8114	0.0701	0.3764
Cre01.g000600.v5.5	2.3502	1.5752	1.8795	0.8381	0.4020	0.8108
Cre01.g000650.v5.5	5.7842	1.3050	0.8817	1.4802	0.1388	0.5241
Cre01.g000850.v5.5	4.7789	-0.0103	0.7810	-0.0132	0.9895	0.9999
...	...	...	...	...	...	...
Cre36.g759647.v5.5	10.3085	0.2125	1.1183	0.1900	0.8493	0.9771
Cre39.g760097.v5.5	2.7385	0.8043	1.6105	0.4994	0.6175	0.9069
Cre43.g760547.v5.5	2.9478	-2.4908	1.6740	-1.4879	0.1368	0.5233
Cre44.g760747.v5.5	633.6948	-0.0325	0.2354	-0.1380	0.8902	0.9846
Cre48.g761197.v5.5	5.6491	-0.3471	1.0296	-0.3371	0.7360	0.9423

I've also downloaded a text file from JGI Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html) that maps transcript IDs to gene symbols:

Cre01.g000050.t1.1	RWP14
Cre01.g000150.t1.2	ZRT2
Cre01.g000650.t1.1	AMX2
Cre01.g000850.t1.2	CPLD38
Cre01.g000900.t1.2	CPLD20
Cre01.g001400.t1.1	ZMP1
Cre01.g001750.t1.2	TIG1
Cre01.g002200.t1.1	RPB6
Cre01.g002500.t1.2	COP2
Cre01.g003050.t1.2	SEC8
Cre01.g004250.t1.2	TCTEX1
Cre01.g004300.t1.2	ASN1
Cre01.g004450.t1.2	CPLD42
Cre01.g004500.t1.2	LEU1L
Cre01.g004550.t1.2	FAP190
Cre01.g004600.t1.1	RWP12
Cre01.g005150.t1.1	SGA1
Cre01.g005450.t1.2	RSP10
Cre01.g005550.t1.2	ARL2
…	…

Is there a direct way to add a column of gene symbols to my DE results by looking up the IDs in the text file above? From what I've read, I'm not sure whether the org.Hs.eg.db package can help with this. Thanks!

I'm not an expert, but I had the same question recently and solved it with:

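library(fuzzyjoin)
# keep all rows of `dataframe`; each value of genelist$transID is used as a
# regular expression and matched against dataframe$IDcol
regex_left_join(dataframe, genelist, by = c("IDcol" = "transID"))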

where IDcol is the ID column in your data frame and transID is the ID column in your gene list.
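One thing worth checking with either approach: in the tables posted above, the DESeq2 row names are gene IDs with a version suffix (Cre01.g000650.v5.5), while the JGI file has transcript IDs (Cre01.g000650.t1.1), so the full strings never match as-is. Below is a minimal base-R sketch that swaps the regex join for sub() plus match(), assuming the locus prefix before the second dot is a valid shared key. res_df, genelist and locus_key() are hypothetical names, and the rows are a toy subset of the ones posted:

```r
# Hypothetical toy subsets of the two tables shown in the question
res_df <- data.frame(
  IDcol = c("Cre01.g000450.v5.5", "Cre01.g000650.v5.5", "Cre01.g000850.v5.5"),
  log2FoldChange = c(-0.2995, 1.3050, -0.0103),
  stringsAsFactors = FALSE
)
genelist <- data.frame(
  transID = c("Cre01.g000650.t1.1", "Cre01.g000850.t1.2", "Cre01.g000900.t1.2"),
  symbol = c("AMX2", "CPLD38", "CPLD20"),
  stringsAsFactors = FALSE
)

# Assumption: the shared key is everything before the second "."
# e.g. "Cre01.g000650.v5.5" -> "Cre01.g000650"
locus_key <- function(id) sub("^([^.]+\\.[^.]+)\\..*$", "\\1", id)

# match() gives the position of each gene's key in the symbol table,
# or NA when the gene has no entry there
idx <- match(locus_key(res_df$IDcol), locus_key(genelist$transID))
res_df$symbol <- genelist$symbol[idx]
print(res_df)
```

match() returns only the first hit, so a gene with several transcripts in the file simply takes the symbol of the first one listed.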

Hope that helps!

rina

Hi Rina, thank you for the response! I've successfully joined the two data frames with this package. Thanks a lot!

ytlin610

Answer · 2018-08-03

Michael Love

United States

There are numerous ways to do this in R, and Rina has provided one above.

I think the easiest approach is:

res <- results(dds, tidy=TRUE)

This puts the row names (your gene IDs) into a column called "row".
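If you want to see what tidy=TRUE changes without running DESeq2, the same reshaping can be done in base R on any data frame whose row names are gene IDs. A minimal sketch with a hypothetical two-row stand-in (res_df is not a real DESeq2 object, and this is not DESeq2's own code):

```r
# Hypothetical stand-in for a results table: gene IDs stored as row names
res_df <- data.frame(
  baseMean = c(256.1055, 5.7842),
  padj = c(0.7465, 0.5241),
  row.names = c("Cre01.g000450.v5.5", "Cre01.g000650.v5.5")
)

# Move the row names into a leading column called "row" and reset the
# row names to 1, 2, ... (the layout tidy=TRUE returns)
res_tidy <- data.frame(row = rownames(res_df), res_df,
                       row.names = NULL, stringsAsFactors = FALSE)
print(res_tidy)
```

Having the IDs in an ordinary column is what lets merge() use them as a join key.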

Then name the columns of your gene-symbol table so that the ID column is called "row", and merge the two tables:

names(new.table) <- c("row", "symbol")  # ID column first, then gene symbol
m <- merge(res, new.table, by = "row", all.x = TRUE)

The all.x=TRUE argument keeps every row of res even when there is no matching row in new.table; symbol is NA for those genes.
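Here is a toy end-to-end version of the merge. It assumes the JGI file is headerless and tab-separated (read here from an inline string rather than a file), and that gene and transcript IDs need to be cut back to their shared locus prefix before they can match. jgi_txt and strip_suffix() are hypothetical names, and res is a hand-built stand-in for results(dds, tidy=TRUE):

```r
# Hypothetical excerpt of the headerless, tab-separated JGI file
jgi_txt <- "Cre01.g000650.t1.1\tAMX2\nCre01.g000850.t1.2\tCPLD38\n"
new.table <- read.delim(text = jgi_txt, header = FALSE,
                        col.names = c("row", "symbol"),
                        stringsAsFactors = FALSE)

# Stand-in for results(dds, tidy=TRUE): gene IDs in a column named "row"
res <- data.frame(row = c("Cre01.g000450.v5.5", "Cre01.g000650.v5.5",
                          "Cre01.g000850.v5.5"),
                  log2FoldChange = c(-0.2995, 1.3050, -0.0103),
                  stringsAsFactors = FALSE)

# Assumption: ".v5.5" gene IDs and ".t1.1" transcript IDs share the prefix
# before the second "."; both "row" columns are overwritten with it
strip_suffix <- function(id) sub("^([^.]+\\.[^.]+)\\..*$", "\\1", id)
res$row <- strip_suffix(res$row)
new.table$row <- strip_suffix(new.table$row)

# Left join: genes without a symbol are kept, with symbol = NA
m <- merge(res, new.table, by = "row", all.x = TRUE)
print(m)
```

If the JGI file lists more than one transcript per gene, drop duplicates first (for example new.table[!duplicated(new.table$row), ]); otherwise merge repeats that gene's row once per matching transcript. merge also sorts the output by row unless you pass sort = FALSE.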

Hi Michael, I've also tried your method and it worked great! I can finally label genes by name on my volcano plots. Many thanks!

ytlin610