Hello everyone, I'm trying to find the best way to set up my analysis, specifically my contrasts. My samples are nested and grouped, so I'm afraid I have made some errors here.
My experiment is as follows: I have RNA-seq data with the following features for each patient: ID, Age, Sex, and whether he/she has any of a number of active phenotypes (marked 1..N). Two samples were taken from each patient, lesional and non-lesional (L and N respectively).
Here is a sample table; for simplicity, let's assume only two phenotypes are measured.
PatientID Age Sex SampleType Phenotype1 Phenotype2 ,..., PhenotypeN
1 10 M L FALSE TRUE
1 10 M N FALSE TRUE
2 20 F L TRUE TRUE
2 20 F N TRUE TRUE
3 30 M L TRUE FALSE
3 30 M N TRUE FALSE
...
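For anyone who wants to reproduce the design, the table above can be written as a data frame. This is only a minimal sketch: patient 4 is a hypothetical addition that is not in my table, and the explicit factor levels are an assumption, chosen so that N is the reference level (which is what a coefficient named SampleTypeL implies).

```r
# Rows for patients 1-3 mirror the table; patient 4 is hypothetical,
# added only so that every Phenotype1/Phenotype2 combination occurs.
sample_data <- data.frame(
  PatientID  = factor(rep(1:4, each = 2)),
  Age        = rep(c(10, 20, 30, 40), each = 2),
  Sex        = factor(rep(c("M", "F", "M", "F"), each = 2)),
  # Assumption: N is the reference level, so the coefficient is SampleTypeL
  SampleType = factor(rep(c("L", "N"), times = 4), levels = c("N", "L")),
  Phenotype1 = rep(c(FALSE, TRUE, TRUE, FALSE), each = 2),
  Phenotype2 = rep(c(TRUE, TRUE, FALSE, FALSE), each = 2)
)

# Same formula as in the question; make.names() turns "(Intercept)" and ":"
# into syntactically valid names that can be typed into a contrast formula.
design <- model.matrix(~ Age + Sex + SampleType * Phenotype1 + SampleType * Phenotype2,
                       data = sample_data)
colnames(design) <- make.names(colnames(design))
print(colnames(design))
```

With only a handful of toy rows this matrix is not meant for fitting; it is just a way to inspect which coefficients the formula produces.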
My research questions can be split into these four "prototypes":
- What are the DEGs between lesional and non-lesional samples for patients with Phenotype1=<FALSE> and Phenotype2=<TRUE>, or any other combination of the two (± adjusted for age & sex)?
- What is the "difference of differences" between Phenotype1 (TRUE vs FALSE) and SampleType (L vs N)?
- What are the DEGs between lesional and non-lesional samples for patients with Phenotype1=<FALSE> regardless of Phenotype2, and vice versa (± adjusted for age & sex)?
- What are the DEGs between Phenotype1=<TRUE> and Phenotype1=<FALSE>, within the lesional group and within the non-lesional group?
I have read some posts regarding these questions, but I'm still uncertain:
library(limma)
mydesign <- model.matrix(~ Age + Sex + SampleType*Phenotype1 + SampleType*Phenotype2, data = sample_data)
colnames(mydesign) <- make.names(colnames(mydesign))
# colnames(mydesign): "X.Intercept." "Age" "SexM" "SampleTypeL"
#   "Phenotype1TRUE" "Phenotype2TRUE" "SampleTypeL.Phenotype1TRUE" "SampleTypeL.Phenotype2TRUE"
makeContrasts(
  LvsN_P1FALSE_P2TRUE = SampleTypeL + SampleTypeL.Phenotype2TRUE,
  LvsN_vs_P1TRUEvsFALSE = SampleTypeL.Phenotype1TRUE,
  LvsN_P1TRUE_P2dontcare = SampleTypeL + (SampleTypeL.Phenotype2TRUE/2) + SampleTypeL.Phenotype1TRUE,
  P1TRUEvsFALSE_SampleTypeL = Phenotype1TRUE + SampleTypeL.Phenotype1TRUE,
  levels = mydesign)
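One way to sanity-check contrasts like these without fitting anything is to write down which coefficients make up the expected value of each group and subtract. The sketch below uses base R only; `group_mean` and `nonzero` are hypothetical helper names made up for this illustration, and Age and Sex are held fixed so that they cancel in every difference.

```r
coefs <- c("X.Intercept.", "Age", "SexM", "SampleTypeL",
           "Phenotype1TRUE", "Phenotype2TRUE",
           "SampleTypeL.Phenotype1TRUE", "SampleTypeL.Phenotype2TRUE")

# Weight of each coefficient in the expected value of one group.
# Age and Sex are set to 0 here; any fixed value cancels in the differences.
group_mean <- function(type, p1, p2) {
  l <- as.numeric(type == "L")
  setNames(c(1, 0, 0, l, p1, p2, l * p1, l * p2), coefs)
}

# Keep only the coefficients that survive, so the result reads like a contrast
nonzero <- function(w) w[w != 0]

# L vs N for Phenotype1 = FALSE, Phenotype2 = TRUE
q1 <- group_mean("L", FALSE, TRUE) - group_mean("N", FALSE, TRUE)

# Difference of differences: (L vs N | P1 = TRUE) - (L vs N | P1 = FALSE)
q2 <- (group_mean("L", TRUE, FALSE) - group_mean("N", TRUE, FALSE)) -
      (group_mean("L", FALSE, FALSE) - group_mean("N", FALSE, FALSE))

# Phenotype1 TRUE vs FALSE within L and within N (Phenotype2 held fixed)
q4_L <- group_mean("L", TRUE, FALSE) - group_mean("L", FALSE, FALSE)
q4_N <- group_mean("N", TRUE, FALSE) - group_mean("N", FALSE, FALSE)

print(nonzero(q1))
print(nonzero(q2))
print(nonzero(q4_L))
print(nonzero(q4_N))
```

Each printed vector lists the coefficients, with their weights, that the corresponding contrast should contain under this additive-plus-interaction model.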
Edited: I'll explain the rationale behind "LvsN_P1TRUE_P2dontcare":
LvsN_P1TRUE_P2dontcare =
(Mean of all samples where [SampleType=L] and [Phenotype1 = TRUE]) - (Mean of all samples where [SampleType=N] and [Phenotype1 = TRUE]) =
((Samples of [SampleType=L], [Phenotype1 = TRUE], [Phenotype2 = TRUE]) + (Samples of [SampleType=L], [Phenotype1 = TRUE], [Phenotype2 = FALSE])) / 2 -
((Samples of [SampleType=N], [Phenotype1 = TRUE], [Phenotype2 = TRUE]) + (Samples of [SampleType=N], [Phenotype1 = TRUE], [Phenotype2 = FALSE])) / 2 =
(SampleTypeL + SampleTypeL + SampleTypeL.Phenotype1TRUE + SampleTypeL.Phenotype1TRUE + SampleTypeL.Phenotype2TRUE)/2 =
(SampleTypeL + (SampleTypeL.Phenotype2TRUE/2) + SampleTypeL.Phenotype1TRUE)
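The averaging argument above can be checked numerically. This is a small sketch in base R, assuming the two Phenotype2 groups get equal weight (1/2 each) rather than being weighted by their sample counts; `group_mean` and `l_vs_n` are hypothetical helper names used only here.

```r
coefs <- c("X.Intercept.", "Age", "SexM", "SampleTypeL",
           "Phenotype1TRUE", "Phenotype2TRUE",
           "SampleTypeL.Phenotype1TRUE", "SampleTypeL.Phenotype2TRUE")

# Weight of each coefficient in the expected value of one group
# (Age and Sex fixed at 0; they cancel in the L vs N difference).
group_mean <- function(type, p1, p2) {
  l <- as.numeric(type == "L")
  setNames(c(1, 0, 0, l, p1, p2, l * p1, l * p2), coefs)
}

# L vs N within one Phenotype1/Phenotype2 combination
l_vs_n <- function(p1, p2) group_mean("L", p1, p2) - group_mean("N", p1, p2)

# Assumption: equal weights for Phenotype2 = TRUE and Phenotype2 = FALSE
avg <- (l_vs_n(TRUE, TRUE) + l_vs_n(TRUE, FALSE)) / 2

# Coefficients with non-zero weight in the averaged contrast
print(avg[avg != 0])
```

If the Phenotype2 groups should instead be weighted by how many patients fall into each, the 1/2 would be replaced by the corresponding proportion.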
Am I correct here? Is this the right way to create the contrasts?
Thank you! Jonathan
Cross-posted https://www.biostars.org/p/9562210/
Should I delete one of them? If so, I'm not sure which place is "more suited" for this type of question.
Bioconductor support is the primary help forum for Bioconductor packages, and for limma in particular.
I am the limma maintainer and senior author. I try to make sure that all questions about limma posted to Bioconductor Support receive an answer, either from me or from someone else. However, I don't feel any responsibility to answer questions that are cross-posted to multiple sites.
Thank you. The question was deleted from the other site to prevent cross-posting.