Question

Tissue specific genes with limma

Ron Ophir ▴ 270

@ron-ophir-1010

Dear Limma users,

In our study we would like to identify tissue-specific genes, i.e., genes that are differentially expressed in a specific tissue. For practical reasons, each RNA extract is a mixture of tissues. These RNA samples were hybridized to Affymetrix chips. I thought that a linear model would be a good way to extract the relative contribution of each tissue to each gene's expression (correct me if I am wrong up to here). Therefore I prepared a design matrix as follows:

          LP   ML   ADL  ABL  T    S    B
LER       1    1    1    1    1    1    1
LER       1    1    1    1    1    1    1
M7        0    1    1    1    1    0    0
M5        1    0    1    1    1    1    1
M7        0    1    1    1    1    0    0
M5        1    0    1    1    1    1    1
AD        1    0    1    0    1    0    0
M2        1    0    1    1    1    1    1
Trichom   1    0    1    1    0    1    1
Stipuls   1    0    1    1    1    0    1
Stipuls   1    0    1    1    1    0    1
AB        1    0    0    1    0    1    0
AB        1    0    0    1    0    1    0
AD        1    0    1    0    1    0    0
LER       1    1    1    1    1    1    1
M2        1    0    1    1    1    1    1

where LER, for example, is the RNA sample that contains a mixture of all tissues (LER = LP+ML+ADL+ABL+T+S), and each of the other rows is an RNA mixture of the set of tissues marked by 1. We also assume no interaction and that the tissues are present in equal amounts, so we expect the linear model to find the relative contribution of each tissue to the gene expression.

First, is the above matrix the right one, or should I set the entries to proportions in order not to violate the assumption that the tissues are present in equal amounts in all mixtures, like this:

          LP   ML   ADL  ABL  T    S    B
LER       0.3  0.3  0.3  0.3  0.3  0.3  0.3
LER       0.3  0.3  0.3  0.3  0.3  0.3  0.3
M7        0    0.5  0.5  0.5  0.5  0    0
M5        0.5  0    0.5  0.5  0.5  0.5  0.5
M7        0    0.5  0.5  0.5  0.5  0    0
M5        0.5  0    0.5  0.5  0.5  0.5  0.5
AD        0.5  0    0.5  0    0.5  0    0
M2        0.5  0    0.5  0.5  0.5  0.5  0.5
Trichom   1    0    1    1    0    1    1
Stipuls   0.5  0    0.5  0.5  0.5  0    0.5
Stipuls   0.5  0    0.5  0.5  0.5  0    0.5
AB        0.5  0    0    0.5  0    0.5  0
AB        0.5  0    0    0.5  0    0.5  0
AD        0.5  0    0.5  0    0.5  0    0
LER       0.3  0.3  0.3  0.3  0.3  0.3  0.3
M2        0.5  0    0.5  0.5  0.5  0.5  0.5

Second, to identify tissue-specific genes we would like to have the summation for a specific tissue over all mixtures. In detail, as a result of the linear model fit we expect to get, for each gene, a matrix of expression values in which, as in the design matrix, rows are RNA samples and columns are tissues. The observed value of the LER mixture, for example, equals the sum of the relative contributions of each tissue:

LER = 0.5 (from LP) + 4 (from ML) + 3 (from ADL) + 1.2 (from ABL) + 0.3 (from T) + 1 (from S) = 10

where 10 is the observed expression value for a given mixture and a given gene, and 0.5, 4, 3, 1.2, 0.3, 1 are the expression values deduced from the linear fit for each tissue. What we are interested in is the summation for each gene over the columns, i.e.,

LP = 0.5 (relative LP contribution in LER) + 0.6 (M2) + 1.2 (M5) + 0 (M7) + 1 (Trichom) + 3 (AB) + 2 (AD)

for each tissue. In limma, if we set one of the tissues in the design as a reference (a tissue that exists in all mixtures), we will get the differential expression of all other tissues relative to it; however, we are looking for the absolute expression. In other words, I am looking for the absolute expression of each gene in each tissue rather than the differential expression that is usually the final result in limma. Is it possible to do that?

Ron
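
The additive model described in the question can be tried out on a single gene. The block below is only a sketch under stated assumptions: it is plain Python rather than limma (whose `lmFit` fits a linear model per gene from a design matrix), the `least_squares` helper is written here for illustration, the per-tissue values are the ones from the LER example with B set to 0 as a hypothetical, and the chip values are simulated without noise on a scale where mixtures are additive. It checks that the 0/1 design has full column rank, so that one coefficient per tissue can be estimated without a reference tissue.

```python
from fractions import Fraction

TISSUES = ["LP", "ML", "ADL", "ABL", "T", "S", "B"]

# 0/1 design from the question: one row per chip, one column per tissue.
DESIGN = [
    ("LER",     [1, 1, 1, 1, 1, 1, 1]),
    ("LER",     [1, 1, 1, 1, 1, 1, 1]),
    ("M7",      [0, 1, 1, 1, 1, 0, 0]),
    ("M5",      [1, 0, 1, 1, 1, 1, 1]),
    ("M7",      [0, 1, 1, 1, 1, 0, 0]),
    ("M5",      [1, 0, 1, 1, 1, 1, 1]),
    ("AD",      [1, 0, 1, 0, 1, 0, 0]),
    ("M2",      [1, 0, 1, 1, 1, 1, 1]),
    ("Trichom", [1, 0, 1, 1, 0, 1, 1]),
    ("Stipuls", [1, 0, 1, 1, 1, 0, 1]),
    ("Stipuls", [1, 0, 1, 1, 1, 0, 1]),
    ("AB",      [1, 0, 0, 1, 0, 1, 0]),
    ("AB",      [1, 0, 0, 1, 0, 1, 0]),
    ("AD",      [1, 0, 1, 0, 1, 0, 0]),
    ("LER",     [1, 1, 1, 1, 1, 1, 1]),
    ("M2",      [1, 0, 1, 1, 1, 1, 1]),
]
X = [[Fraction(v) for v in row] for _, row in DESIGN]


def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by exact Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = next((r for r in range(col, p) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical per-tissue contributions for one gene (LER example, with B = 0).
beta_true = [Fraction(s) for s in ("0.5", "4", "3", "1.2", "0.3", "1", "0")]
# Simulated noise-free chip values: each chip is the sum of the tissues it contains.
y = [sum(x * b for x, b in zip(row, beta_true)) for row in X]
beta_hat = least_squares(X, y)

for name, b in zip(TISSUES, beta_hat):
    print(f"{name}: {float(b):.1f}")
```

Because the solve succeeds, every tissue coefficient is identifiable from these mixtures, at least in this idealized setting. With real chips the fit would also depend on noise and on whether the pre-processed values are actually additive across tissues.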

updated 19.9 years ago by Gordon Smyth 51k • written 19.9 years ago by Ron Ophir ▴ 270

score 0 · Answer 1 · 2005-01-13

>Date: Thu, 13 Jan 2005 10:49:20 +0200
>From: "Ron Ophir" <ron.ophir@weizmann.ac.il>
>Subject: [BioC] Tissue specific genes with limma
>To: <bioconductor@stat.math.ethz.ch>
>Message-ID: <s1e6524f.098@wisemail.weizmann.ac.il>
>Content-Type: text/plain; charset=US-ASCII
>
>Dear Limma users,
>In our study we would like to identify tissue specific genes, i.e.,
>genes that are differentially
>expressed in a specific tissue. From practical reason each RNA extract
>is a mixture of tissues. These RNA sample was hybridized to Affymetrix
>chips. I thought that linear model is a good algorithm to extract the
>relative contribution of tissue to each gene expression (correct if I
>wrong up to here). Therefore I prepared a design matrix as follow:
>
>          LP   ML   ADL  ABL  T    S    B
> LER      1    1    1    1    1    1    1
> LER      1    1    1    1    1    1    1
> M7       0    1    1    1    1    0    0
> M5       1    0    1    1    1    1    1
> M7       0    1    1    1    1    0    0
> M5       1    0    1    1    1    1    1
> AD       1    0    1    0    1    0    0
> M2       1    0    1    1    1    1    1
> Trichom  1    0    1    1    0    1    1
> Stipuls  1    0    1    1    1    0    1
> Stipuls  1    0    1    1    1    0    1
> AB       1    0    0    1    0    1    0
> AB       1    0    0    1    0    1    0
> AD       1    0    1    0    1    0    0
> LER      1    1    1    1    1    1    1
> M2       1    0    1    1    1    1    1
>Where LER for example is the RNA sample that has a mixture of all
>tissues LER= LP+ML+ADL+ABL+T+S and the rest of the row are the RNA
>mixtures of any set of tissues signed by 1. We also assume no
>interaction and that the tissues are in equal amount therefore we expect
>by linear models to find the relative contribution of each tissue to the
>gene expression.
>First is the above matrix is the right matrix or should I set the
>replicates to its proportion in order not to violate the assumption that
>the tissues are present in equal amount in all mixtures, like this:
>          LP   ML   ADL  ABL  T    S    B
> LER      0.3  0.3  0.3  0.3  0.3  0.3  0.3
> LER      0.3  0.3  0.3  0.3  0.3  0.3  0.3
> M7       0    0.5  0.5  0.5  0.5  0    0
> M5       0.5  0    0.5  0.5  0.5  0.5  0.5
> M7       0    0.5  0.5  0.5  0.5  0    0
> M5       0.5  0    0.5  0.5  0.5  0.5  0.5
> AD       0.5  0    0.5  0    0.5  0    0
> M2       0.5  0    0.5  0.5  0.5  0.5  0.5
> Trichom  1    0    1    1    0    1    1
> Stipuls  0.5  0    0.5  0.5  0.5  0    0.5
> Stipuls  0.5  0    0.5  0.5  0.5  0    0.5
> AB       0.5  0    0    0.5  0    0.5  0
> AB       0.5  0    0    0.5  0    0.5  0
> AD       0.5  0    0.5  0    0.5  0    0
> LER      0.3  0.3  0.3  0.3  0.3  0.3  0.3
> M2       0.5  0    0.5  0.5  0.5  0.5  0.5

I can't see how you've obtained the entries in this matrix.

>Second, to identify tissue specific genes we would like to have the
>summation of a specific tissue for all mixtures. In details,
>as a result of linear model fit we expect to get a matrix of expression
>values for each gene, which like design matrix rows are RNA samples and
>columns are tissues. Where the observed value of LER mixture, for
>example, equal for sum of the values of the relative contribution of
>each tissue: LER= 0.5(from LP)+4(from ML)+3(from ADL)+1.2(from
>ABL)+0.3(from T)+1(from S)=10 where 10 is the observed expression value
>for a given mixture for a given gene and 0.5,4,3,1.2,0.3,1 are the
>deduced expression values from the linear fit for each tiisues. What we
>are interesting is finding the summation for each gene over the columns,
>i.e., LP = 0.5(relative LP contribution in
>LER)+0.6(M2)+1.2(M5)+0(M7)+1(Trichom)+3(AB)+2(AD) for each tissue. In
>limma if we set in the design one of the tissues as a reference (tissue
>that exist in all mixture) we will get the differential expression of
>all other tissues relative to it, however we are looking to the absolute
>expression.
>In other words I am looking for the absolute expression of
>each gene for each tissue rather than having the differential expression
>which is the usually the final result in limma.
>Is it possible to do that?

In principle linear modeling can do this, but you need to ensure that you've pre-processed the data in an appropriate way and that the model you're fitting matches the data. I am not sure that is the case here.

The design matrices, together with the fact that expression can't be negative in any tissue, imply that the overall expression is higher in some target samples than in others. For example, 'M7' is the same as 'LER' but without the contributions of tissues LP, S and B. You are asserting that expression is lower in M7 than in LER for all genes expressed in LP, S or B, and that all other genes have equal expression in M7 and LER. Is this what you intend? If it is, then you can't use quantile normalization across chips, as done for example by rma().

You would need specialist assistance. Really, you should collaborate with someone who can look at your experiment in more detail.

Gordon

>Ron
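
The conflict between the additive design and quantile normalization can be illustrated with simulated numbers. This is a minimal sketch, not the rma() procedure itself: the per-tissue contributions are hypothetical random values, the `mix` and `quantile_normalize` helpers are written here for illustration, and only the LER and M7 rows of the 0/1 design are used.

```python
import random
from statistics import mean

random.seed(0)  # reproducible simulated data
TISSUES = ["LP", "ML", "ADL", "ABL", "T", "S", "B"]
LER = [1, 1, 1, 1, 1, 1, 1]  # mixture of all tissues
M7 = [0, 1, 1, 1, 1, 0, 0]   # LER without LP, S and B
N_GENES = 1000

# Hypothetical non-negative contribution of each tissue to each gene.
genes = [[random.expovariate(1.0) for _ in TISSUES] for _ in range(N_GENES)]


def mix(row, contributions):
    """Additive mixture: sum the contributions of the tissues that are present."""
    return sum(x * c for x, c in zip(row, contributions))


ler = [mix(LER, g) for g in genes]
m7 = [mix(M7, g) for g in genes]


def quantile_normalize(columns):
    """Give every column the same distribution: the mean of the sorted columns."""
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    target = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # gene indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = target[rank]
        normalized.append(out)
    return normalized


ler_qn, m7_qn = quantile_normalize([ler, m7])

lower = sum(a <= b for a, b in zip(m7, ler))
print(f"genes with M7 <= LER before normalization: {lower} of {N_GENES}")
print(f"mean before: LER {mean(ler):.2f}, M7 {mean(m7):.2f}")
print(f"mean after:  LER {mean(ler_qn):.2f}, M7 {mean(m7_qn):.2f}")
```

Under the additive model every gene is at most as high in M7 as in LER, but after quantile normalization the two chips have identical distributions, so the overall difference that the design matrix relies on has been removed.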