Dear all,
I realize most facets of my question have been discussed here, but I am having a hard time pulling all the answers together and generalizing.
I have 36 iPSC, 9 primary CM, and 18 biopsy samples. My quality control plots showed batch effects in the data, so I added 3 batches to the design matrix, following, e.g., the discussion "A: Limma, blocking batch effect" (and reviewing others on the topic).
After I added batches to the design matrix and got the "Coefficients not estimable" message, I found this answer from Gordon Smyth in one of the posts: "You can't expect to be able to estimate more things than your experiment contains information about. In this case, your batches are entirely confounded with patients+treatment, and hence you can't estimate coefficients for them." In my case, the differential expression results were the same with and without "+ batches" in the design matrix.
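To illustrate the quoted point, here is a minimal sketch on toy data (hypothetical groups A to D and batches b1/b2, not my real samples). It compares a design where each group sits entirely inside one batch with a design where each group is split across both batches. The helper `rank_deficit` is my own name; it only counts how many design columns cannot be estimated.

```r
# Toy data (hypothetical, not the real samples): 4 groups, 2 replicates each.
group <- factor(rep(c("A", "B", "C", "D"), each = 2))

# Confounded: groups A and B are only in batch b1, groups C and D only in b2.
batch_nested <- factor(rep(c("b1", "b1", "b2", "b2"), each = 2))

# Crossed: every group has one replicate in each batch.
batch_crossed <- factor(rep(c("b1", "b2"), times = 4))

# Number of design columns minus the rank = number of non-estimable coefficients.
rank_deficit <- function(design) ncol(design) - qr(design)$rank

deficit_nested  <- rank_deficit(model.matrix(~0 + group + batch_nested))
deficit_crossed <- rank_deficit(model.matrix(~0 + group + batch_crossed))

cat("nested:", deficit_nested, " crossed:", deficit_crossed, "\n")
```

In the nested case the batch column is just the sum of the group columns it contains, so it adds no information; in the crossed case the batch coefficient can be estimated alongside the groups.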
There are three questions I have here:
1) Does it mean that if I have ALL (36+9+18) samples in the design matrix, adding "batches" explicitly is introducing too many variables and there is no way to explicitly include batch effects?
2) In this case, does the regression on the samples implicitly take care of the batch effect?
3a) (if the answer to (2) is "Yes" ) Since there are a lot of discussions on batch effect inclusion, it should be needed for something. In what cases is it needed?
3b) (if the answer to (2) is "No") What can I do?
Thank you. Any help is greatly appreciated.
Slava
Thank you for the comment. Added upon request:
the sample table is the following:
Sample table: iPSC0 | iPSC3 | iPSC7 | iPSC10 | iPSC14 | iPSC20 | iPSC28 | iPSC35 | iPSC45 | iPSC60 | iPSC90 | iPSC120 | Prim | Fetal | LV | Heart1 |
Sorry, I lumped biological replicates together, but the table still did not fit. There should be 2 more columns: "Heart2" and "Heart diseased"
There are 3 biological replicates for all iPSC entries, 9 replicates for "Prim" entry, 6 replicates for "Fetal" and 3 replicates for "LV", "Heart1", "Heart2", and "Heart diseased".
The code that I used to generate the original (before attempting a batch correction) design matrix:
RNAs <- factor(c("iPSC0","iPSC0","iPSC0", "iPSC3","iPSC3","iPSC3", "iPSC7","iPSC7","iPSC7",
                 "iPSC10","iPSC10","iPSC10", "iPSC14","iPSC14","iPSC14", "iPSC20","iPSC20","iPSC20",
                 "iPSC28","iPSC28","iPSC28", "iPSC35","iPSC35","iPSC35", "iPSC45","iPSC45","iPSC45",
                 "iPSC60","iPSC60","iPSC60", "iPSC90","iPSC90","iPSC90", "iPSC120","iPSC120","iPSC120",
                 "Prim","Prim","Prim","Prim","Prim","Prim","Prim","Prim","Prim",
                 "Fetal","Fetal","Fetal","Fetal","Fetal","Fetal",
                 "Heart1","Heart1","Heart1", "LV","LV","LV",
                 "HeartDis","HeartDis","HeartDis", "Heart2","Heart2","Heart2"))
### print(RNAs)
design <- model.matrix(~0 + RNAs)
colnames(design) <- levels(RNAs)
My attempt to account for batches (gse is an eset read from the series_matrix file with getGEO()):
colstring <- substring(pData(gse)[,"source_name_ch1"],1,4)
batches <- factor(colstring, levels=unique(colstring))
# print(batches)
designBC <- model.matrix(~0 + RNAs + batches)
colnames(designBC)[1:nlevels(RNAs)] <- levels(RNAs)
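As a sanity check on the "Coefficients not estimable" message, the sketch below rebuilds the two factors without needing the GEO download and compares the number of columns in the batch design with its rank. The batch labels (iPSC, Prim, Comm) are assumed to be what the first four characters of source_name_ch1 give for my samples; the helper vectors `groups`, `reps`, and `batch_of_group` are names introduced only for this example.

```r
# Rebuild the group factor from the replicate counts (63 samples in total).
groups <- c(paste0("iPSC", c(0, 3, 7, 10, 14, 20, 28, 35, 45, 60, 90, 120)),
            "Prim", "Fetal", "Heart1", "LV", "HeartDis", "Heart2")
reps <- c(rep(3, 12), 9, 6, 3, 3, 3, 3)
RNAs <- factor(rep(groups, times = reps))

# Assumed batch label for each group (first four characters of the source name).
batch_of_group <- c(rep("iPSC", 12), "Prim", rep("Comm", 5))
batches <- factor(rep(batch_of_group, times = reps),
                  levels = c("iPSC", "Prim", "Comm"))

# Same formula as above: one column per group plus two batch columns.
designBC <- model.matrix(~0 + RNAs + batches)

n_cols      <- ncol(designBC)
design_rank <- qr(designBC)$rank

cat("columns:", n_cols, " rank:", design_rank, "\n")
```

If the rank is smaller than the number of columns, the difference is the number of coefficients that cannot be estimated, which is what I would expect here for the batch columns.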
Thank you!
Addition #2
Sorry about misunderstanding. I think the table below is what you requested:
RNAs | Batches |
iPSC0 | iPSC |
iPSC0 | iPSC |
iPSC0 | iPSC |
iPSC3 | iPSC |
iPSC3 | iPSC |
iPSC3 | iPSC |
iPSC7 | iPSC |
iPSC7 | iPSC |
iPSC7 | iPSC |
iPSC10 | iPSC |
iPSC10 | iPSC |
iPSC10 | iPSC |
iPSC14 | iPSC |
iPSC14 | iPSC |
iPSC14 | iPSC |
iPSC20 | iPSC |
iPSC20 | iPSC |
iPSC20 | iPSC |
iPSC28 | iPSC |
iPSC28 | iPSC |
iPSC28 | iPSC |
iPSC35 | iPSC |
iPSC35 | iPSC |
iPSC35 | iPSC |
iPSC45 | iPSC |
iPSC45 | iPSC |
iPSC45 | iPSC |
iPSC60 | iPSC |
iPSC60 | iPSC |
iPSC60 | iPSC |
iPSC90 | iPSC |
iPSC90 | iPSC |
iPSC90 | iPSC |
iPSC120 | iPSC |
iPSC120 | iPSC |
iPSC120 | iPSC |
Prim | Prim |
Prim | Prim |
Prim | Prim |
Prim | Prim |
Prim | Prim |
Prim | Prim |
Prim | Prim |
Prim | Prim |
Prim | Prim |
Fetal | Comm |
Fetal | Comm |
Fetal | Comm |
Fetal | Comm |
Fetal | Comm |
Fetal | Comm |
Heart1 | Comm |
Heart1 | Comm |
Heart1 | Comm |
LV | Comm |
LV | Comm |
LV | Comm |
HeartDis | Comm |
HeartDis | Comm |
HeartDis | Comm |
Heart2 | Comm |
Heart2 | Comm |
Heart2 | Comm |
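The same table can be built as a data frame with one row per sample and cross-tabulated to see how groups and batches overlap. This is only a sketch based on the table above; `groups`, `reps`, and `batch_of_group` are helper names introduced for the example.

```r
# One row per sample, with the group ("RNAs") and the batch ("Batches").
groups <- c(paste0("iPSC", c(0, 3, 7, 10, 14, 20, 28, 35, 45, 60, 90, 120)),
            "Prim", "Fetal", "Heart1", "LV", "HeartDis", "Heart2")
reps <- c(rep(3, 12), 9, 6, 3, 3, 3, 3)
batch_of_group <- c(rep("iPSC", 12), "Prim", rep("Comm", 5))

samples <- data.frame(
  RNAs    = factor(rep(groups, times = reps), levels = groups),
  Batches = factor(rep(batch_of_group, times = reps),
                   levels = c("iPSC", "Prim", "Comm"))
)

# Cross-tabulate groups against batches.
xtab <- table(samples$RNAs, samples$Batches)
print(xtab)

# A group is nested in a batch if all of its samples fall in a single batch.
batches_per_group <- rowSums(xtab > 0)
nested <- all(batches_per_group == 1)
cat("every group in exactly one batch:", nested, "\n")
```

Each row of the cross-tabulation has a single non-zero cell, so no group is observed in more than one batch.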
Thank you!
The answers to this kind of question depend on what your sample table looks like. Can you show the full table and the code that you're using to generate your design matrix? (Note that there is an "insert table" button in the edit bar when editing your question that you can use to add your sample table.)
Can you make a data frame with one row per sample, with a column for "RNAs" and a second column for "batches"? This is the table I'm asking you to include.