Bernard Lee Kok Bang
Dear all, I would like to ask a question regarding microarray data
normalization.
Scenario:
I have a collection of raw ".CEL" files for 300 cancer cell lines
(multiple cancer types), all from the same study/batch. My aim is to
obtain the gene expression values and use them downstream. However, I
am only interested in a subset of these .CEL files, for example only
the NON-blood cancer cell lines (n=250).
I'm wondering which of these two options is more appropriate for my
scenario:
Option 1:
1) Normalize all 300 .CEL files with RMA.
2) After normalization, manually remove the 50 blood samples I am
NOT interested in.
3) Use the normalized data of the 250 remaining samples for downstream
analysis.
Option 2:
1) Normalize ONLY the 250 .CEL files with RMA (as if the 50 blood
samples did not exist).
2) Use the normalized data of the 250 samples for downstream analysis.
My downstream analysis simply involves ranking the genes from highest
expression to lowest.
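To make the difference between the two options concrete, here is a minimal sketch of quantile normalization only, which is one step of RMA and not the full procedure (background correction and median-polish summarization are left out). The data are simulated toy values, and the names `quantile_normalize`, `simulate`, and `gene_ranking` are hypothetical helpers written for this illustration, not part of any Bioconductor package.

```python
import random


def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each value is replaced by the mean, across all samples, of the values
    that share its within-sample rank.
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all samples.
    reference = [
        sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n_genes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda g: col[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def simulate(n_solid=5, n_blood=2, n_genes=50, seed=0):
    """Toy log-scale intensities; blood samples get a different distribution."""
    rng = random.Random(seed)
    solid = [[rng.gauss(8.0, 1.0) for _ in range(n_genes)] for _ in range(n_solid)]
    blood = [[rng.gauss(9.0, 2.0) for _ in range(n_genes)] for _ in range(n_blood)]
    return solid, blood


def gene_ranking(column):
    """Gene indices ordered from highest to lowest value within one sample."""
    return sorted(range(len(column)), key=lambda g: column[g], reverse=True)


solid, blood = simulate()

# Option 1: normalize everything, then drop the blood samples.
option1 = quantile_normalize(solid + blood)[: len(solid)]
# Option 2: normalize the solid samples only.
option2 = quantile_normalize(solid)

max_abs_diff = max(
    abs(a - b) for c1, c2 in zip(option1, option2) for a, b in zip(c1, c2)
)
same_ranking = all(
    gene_ranking(c1) == gene_ranking(c2) for c1, c2 in zip(option1, option2)
)
print("largest change in a normalized value:", max_abs_diff)
print("within-sample gene ranking unchanged:", same_ranking)
```

In this sketch the normalized values differ between the two options because the blood samples shift the shared reference distribution, while the order of genes within each sample stays the same, since quantile normalization preserves ranks within a sample. This says nothing about the summarization step of RMA, which fits a model across arrays and could behave differently, so it should be read as an illustration rather than an answer for real .CEL data.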
From my point of view, I favor the first option. Since I have both the
solid tumor and the blood cell line data, I might as well normalize
them all together before manually excluding the blood cell lines, as
to my knowledge the purpose of normalization is to remove batch
effects? So the larger the sample size during RMA normalization, the
better?
Thanks in advance.
Bernard Lee
Research Assistant
Cancer Research Initiatives Foundation (CARIF)
University of Malaya (UM)