I am not sure how to model a microarray dataset (10226 obs. of 120 variables) in limma. The samples are divided into groups by disease state, with the following number of samples in each group:
Culture confirmed TB    18
Culture negative TB     35
healthy contact         12
healthy LTBI            11
Other diagnoses         44
I am trying to identify differentially expressed (DE) ncRNAs in the TB patients. Thus far, I have the actual microarray data and an information file read into a variable called "clinical". Clinical contains one column that gives each sample label ID, e.g. "35-801197", and another that gives the corresponding diagnostic category of the sample, e.g. "Culture confirmed TB".
Initially I considered labelling the columns of the microarray data variable with the sample label IDs as follows:
    ptr <- match( colnames(E.ncRNA), paste('X',clinical$ncRNA.array.ID,sep='') )
    colnames(E.ncRNA) <- clinical$Sample.label[ptr]
But then I considered that, since I want to distinguish DE ncRNAs based on the disease state, I could label the columns with the diagnostic category instead:
    ptr <- match( colnames(E.ncRNA), paste('X',clinical$ncRNA.array.ID,sep='') )
    colnames(E.ncRNA) <- clinical$Diagnostic.category[ptr]
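A minimal sketch of an alternative to renaming the columns by disease state, using invented toy data. The objects E_toy and clinical_toy, and every value except the sample label "35-801197", are made up for illustration. The idea is to reorder the clinical table to match the array columns, keep the unique sample labels as column names, and hold the disease state in a separate vector aligned to the columns:

```r
# Toy expression matrix: 3 probes x 4 arrays (values are random placeholders).
set.seed(1)
E_toy <- matrix(rnorm(12), nrow = 3,
                dimnames = list(NULL, c("X101", "X102", "X103", "X104")))

# Toy clinical table, deliberately in a different order from the arrays.
clinical_toy <- data.frame(
  ncRNA.array.ID      = c(103, 101, 104, 102),
  Sample.label        = c("35-801197", "35-000002", "35-000003", "35-000004"),
  Diagnostic.category = c("healthy LTBI", "Culture confirmed TB",
                          "Other diagnoses", "Culture confirmed TB"),
  stringsAsFactors = FALSE
)

# For each array column, find the matching row of the clinical table.
ptr <- match(colnames(E_toy), paste0("X", clinical_toy$ncRNA.array.ID))

# A missing match would silently produce NA labels, so stop early if any exist.
stopifnot(!anyNA(ptr))

# Reorder the clinical rows so that row i describes array column i.
clinical_ord <- clinical_toy[ptr, ]

# Column names stay unique (sample labels); the disease state lives alongside.
colnames(E_toy) <- clinical_ord$Sample.label
disease_state <- clinical_ord$Diagnostic.category
```

Keeping the disease state in its own vector avoids duplicated column names, and the same vector can be turned into the grouping factor for the design matrix.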
When I then tried to make a design matrix in order to fit a linear model to the relabelled dataset (as follows), I got the following error in either case:
    > grouping <- factor(sub(" ", "", clinical2$Diagnostic.category))
    > design <- model.matrix(~0 + grouping)
    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels
    > colnames(design) <- levels(grouping)
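A minimal sketch of how the grouping factor could be checked before building the design matrix, again on invented toy data (clinical_toy, E_toy and all values are hypothetical). The error message says the factor has fewer than two levels, which can happen, for example, when the column name is misspelt and `$` returns NULL. The sketch uses make.names() instead of sub(), since sub() only replaces the first space and, as far as I understand, limma's makeContrasts() expects syntactically valid level names. The limma step at the end is only run if the package is installed:

```r
# Toy clinical table: 3 groups x 2 samples (values invented for illustration).
clinical_toy <- data.frame(
  Sample.label        = c("S1", "S2", "S3", "S4", "S5", "S6"),
  Diagnostic.category = c("Culture confirmed TB", "Culture confirmed TB",
                          "healthy LTBI", "healthy LTBI",
                          "Other diagnoses", "Other diagnoses"),
  stringsAsFactors = FALSE
)

# make.names() converts every label into a valid R name (spaces become dots).
grouping <- factor(make.names(clinical_toy$Diagnostic.category))

# Inspect the factor first: fewer than two levels reproduces the error above.
print(table(grouping, useNA = "ifany"))
stopifnot(nlevels(grouping) >= 2)

# One column per group, no intercept.
design <- model.matrix(~ 0 + grouping)
colnames(design) <- levels(grouping)

# Optional limma fit on random placeholder data, skipped if limma is absent.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  E_toy <- matrix(rnorm(20 * 6), nrow = 20,
                  dimnames = list(paste0("ncRNA", 1:20),
                                  clinical_toy$Sample.label))
  fit  <- limma::lmFit(E_toy, design)
  cm   <- limma::makeContrasts(
    contrasts = "Culture.confirmed.TB - healthy.LTBI", levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cm))
  print(limma::topTable(fit2, number = 5))
}
```

With the real data, the toy table would be replaced by the clinical table reordered to match the columns of E.ncRNA, and the contrast string by whichever pair of disease states is of interest.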
I'm pretty inexperienced with limma but I was hoping someone could point me in the right direction in terms of experimental design and code.
Thanks