Cell-type specific RNAseq analysis
@merienne-nicolas-6729
Last seen 7.1 years ago
Switzerland

Dear all,

 

We are working with RNAseq data to characterize specific cell populations. We have extracted 4 distinct cell populations (A, B, C and D) and performed Illumina RNAseq on these samples. Reads were mapped with TopHat and counts were obtained with htseq-count. The sequencing platform advised us to use edgeR-voom for data normalization and transformation and the limma package for identification of differentially expressed genes. We compared the cell populations pairwise with these contrasts:

A vs B

A vs C

A vs D

B vs C

B vs D

C vs D

We obtained our lists of up- and down-regulated transcripts for each contrast. However, we are interested in identifying genes that are specifically expressed in one cell type and not in the others. We thought of 2 methods for this:

-first: take the 3 contrasts involving a given cell population (i.e. A vs B, A vs C and A vs D for cell population A) and extract the genes that are differentially expressed in all 3 contrasts. With this, we obtained a small number of "cell-type specific transcripts" (typically between 100 and 200).

-second: design new contrasts comparing each cell type with all the others (i.e. A vs (B+C+D)) and apply limma. With this method, the vast majority of the genes have significant adjusted p-values (but all have negative logFC, indicating they are not specific to cell population A...)

It seems evident to us that the second method is not suitable, but the reasons are not really clear (we think that pooling all the populations creates an imbalance in the analysis, as if we were comparing A with the mean of B+C+D). However, is our first method right, or is there another way to statistically identify cell-type specific mRNAs?

 

Please do not hesitate to tell me if my explanations are not clear.

Thank you in advance.

 

Best regards,

 

Nicolas 

@james-w-macdonald-5106
Last seen 2 days ago
United States
Hi Nicolas,

You can do it either way (well, the second way with a modification), but you are doing two different things.

In the first case you are finding genes that are consistently differentially expressed between A and the other three cell types. Think of a three-circle Venn diagram, where your genes are in the center. There will be some genes that are unique to each individual contrast, as well as those that are in the pairwise intersections.

For the second way, what you really want is the contrast (A vs (B+C+D)/3), where you are comparing the A cell type with the mean expression of the other three types. If you don't take the average of B+C+D, what you are testing for are genes where the expression in A is equal to the sum of the expression in B, C, and D (or equivalently, genes in A that are 3X the average expression of B, C and D). So for glmLRT, you would do something like contrast = c(1, -1/3, -1/3, -1/3), assuming that the columns of your design are A, B, C, D.

Does that make sense?

Best,

Jim
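To make the arithmetic behind the two contrasts concrete, here is a minimal Python sketch. It is not limma or edgeR code: the log2 expression values are invented for illustration, and `contrast_value` is a hypothetical helper that just forms the weighted sum a contrast vector defines. It may help explain why A vs (B+C+D) gave negative logFC for nearly every gene.

```python
# Hypothetical log2-expression group means for a single gene in cell
# types A, B, C and D. The numbers are invented for illustration only.
group_means = {"A": 6.0, "B": 5.0, "C": 5.5, "D": 4.5}


def contrast_value(means, weights):
    """Return the weighted sum of group means defined by a contrast vector."""
    return sum(weights[group] * means[group] for group in means)


# A vs (B+C+D): the weights sum to -2, not 0, so the result mostly
# reflects the overall expression level rather than a group difference.
sum_contrast = {"A": 1.0, "B": -1.0, "C": -1.0, "D": -1.0}

# A vs (B+C+D)/3: the weights sum to 0, so the result is the log2
# difference between A and the average of the other three groups.
mean_contrast = {"A": 1.0, "B": -1 / 3, "C": -1 / 3, "D": -1 / 3}

sum_logfc = contrast_value(group_means, sum_contrast)    # 6 - (5 + 5.5 + 4.5)
mean_logfc = contrast_value(group_means, mean_contrast)  # 6 - (5 + 5.5 + 4.5) / 3

print(f"A vs (B+C+D):   {sum_logfc:.2f}")
print(f"A vs (B+C+D)/3: {mean_logfc:.2f}")
```

In this toy example the gene is highest in A, yet the unaveraged contrast is strongly negative, while the averaged contrast is positive. Any gene with positive log-expression in all groups would tend to behave the same way under the unaveraged contrast.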

Hi Jim,

 

Thank you for your answer. Indeed, the results seem more correct using your contrast matrix. We have the impression that the first method is more stringent (fewer genes are consistently differentially expressed across contrasts). We will check whether these consistently differentially expressed genes are found in the significant gene set of the second method. We think that the second method is statistically more robust; do you think that is right?

 

Thank you.

Best,

 

Nicolas

Hi Nicolas,

I don't know if one is more robust than the other. They are just different. Certainly the second method involves an actual statistical test, whereas the first method is a simple grouping of 'like' genes, but people do both, routinely.

Best,

Jim
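As a rough illustration of how the two approaches can select different genes, here is a small Python sketch. Everything in it is an assumption made for the example: the three genes, their log2 group means, and the bare fold-change cutoff, which only stands in for the moderated tests and adjusted p-values a real limma or edgeR analysis would use.

```python
# Hypothetical log2 group means for three genes (invented values).
genes = {
    "gene1": {"A": 8.0, "B": 2.0, "C": 2.0, "D": 2.0},  # high in A only
    "gene2": {"A": 8.0, "B": 8.0, "C": 2.0, "D": 2.0},  # high in A and B
    "gene3": {"A": 5.0, "B": 5.0, "C": 5.0, "D": 5.0},  # flat across types
}

# Hypothetical log2 fold-change cutoff standing in for a real test.
CUTOFF = 1.0


def up_in_all_pairwise(means, target="A", cutoff=CUTOFF):
    """First method: target exceeds every other group individually."""
    others = [group for group in means if group != target]
    return all(means[target] - means[group] > cutoff for group in others)


def up_vs_average(means, target="A", cutoff=CUTOFF):
    """Second method: target exceeds the average of the other groups."""
    others = [means[group] for group in means if group != target]
    return means[target] - sum(others) / len(others) > cutoff


intersection_hits = {name for name, m in genes.items() if up_in_all_pairwise(m)}
average_hits = {name for name, m in genes.items() if up_vs_average(m)}

print("All pairwise contrasts:", sorted(intersection_hits))
print("A vs average of others:", sorted(average_hits))
```

Here gene2 passes the averaged contrast because C and D pull the average down, but it fails the pairwise intersection because A and B are equal. That is one way the second method can flag genes that are not specific to a single cell type, which fits the point that the two methods answer different questions.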