Adding Batch Factor in linear model...
0
0
Entering edit mode
@mdmamunur-rashid-3595
Last seen 10.3 years ago
-------- Original Message -------- Subject: Re: [BioC] Fitting arrayweights in linear model Date: Tue, 3 Nov 2009 14:23:52 +0000 From: Md.Mamunur Rashid <mamunur.rashid@kcl.ac.uk> To: Wei Shi <shi@wehi.edu.au>, bioconductor <bioconductor@stat.math.ethz.ch> Dear Wei, Sorry for replying late. I was on vacation and went back to home. thanks for your suggestion. I have tried reading some previous mails from bioc forum to find a way to add batch factor in the linear model. But did not clearly understand how to do that. In my linear model I am comparing three conditions. So I have created a contrast matrix: fit1<- lmfit(normalized_matrix, design, array_weights) contrast<- make.contrast(group2-gropu1,group3-group2,group3-group1,levels= design) confit<- contrasts.fit(fit1 ,contrast) efit<- eBayes() I have tried the same R script with a new batch of 96 samples from the same 3 groups. These 96 sample does not really have that much variation among them . Yet again the top list of genes have a very high adjusted p.value. Q1. Does this mean that there is not not enough evidence of differential expression. ?? Q2. Is it a very good idea to analyze 96 samples in one go, or should I subset the file( ExpressionSet) to work with smaller set of data. Cause I have got this impression that for this large data set variations in one of the slides (illumina HT12 containing 12 arrays) do a lot of damage when the whole data set is normalized. ...????? (I have done this part to check if my algorithm is ok or not ..) I have tested the R script for contrasting among male and female group. and found the evidence of differential expression among the genes found in Y chromosome. This finding gave me a idea that my R script is ok (anyway this is very straight forward) ****Please any suggestion about this assumption. contrast<- make.contrast(male-female, levels= design) Can you please suggest me few more gene filtering methods. I will start with the gene filter package. Thanks in advance. regards, Mamun On 09/30/2009 05:26 AM, Wei Shi wrote: > Dear Mamun: > > Sorry for replying to you late. Your boxplot of raw data shows > that there is a batch effect in your data. You might need to include > this batch effect in your linear model fitting. > > Arrays with low weights will not be dropped when you include array > weights in your linear model fitting. The idea of giving weights to > arrays is to keep all the arrays in the analysis. > > Hope this helps. > > Cheers, > Wei > > > Md.Mamunur Rashid wrote: >> Dear Wei, >> >> Thanks for your reply. I know it's a long email, but I tried to give details about the raw data >> Please have a look when you have some spare time. >> >> >> 1. About the raw data : >> ----------------------- >> Yes this arrays are from the same bead chip. The experiments were done by three individuals (each >> person conducted experiments with 3 beadchips each consisting 12 arrays, 96 intotal). I think this >> is the reason for high variance among 3 sets of arrays. 1-32, 33-64, 65-96. But after normalization >> ("quantile"-lumi package) the data looks quite normal. >> >> >> I have attached two snapshot of boxplot of the raw data and normalized data. >> >> A. Normalized data -http://www.esnips.com/doc/ad4c8a55-2166-49 11-8510-df81eaea63df/data_96_illumina_norm >> >> B. Raw data -http://www.esnips.com/doc/0a965715-b5ec-44da- b64c-4d025c63827b/96_illumina_raw >> >> If you look at the two snapshots above : Expression range for raw data in most cases are below 6. But >> after normalization it came slightly above 6. Is it possible at all or something is going wrong during >> the normalization. >> >> >> One question here. Does illumina data require intra-array normalization.?? To the extent of my little >> study I know that they do not need it as it is already done. Please let me know what do you think. >> >> >> 2. About Arrayweight >> -------------------- >> >> As you can see below the the results have not been improved much compare to the old topTable result. >> >> 1. Do you think this high variance in arrays are causing this problem.??? >> 2. Does fitting the arrayweight in the linear model along the normalized object drops the downweighted arrays >> or that needed to be done manually?? >> >> >> 3. Pre-processing with beadarray >> -------------------------------- >> >> I have the done the same analysis using beadArray package(data import and pre processing) and had the similar >> result. So there can be two possibilities which can lead to this problem. >> >> A. Problems in the raw data. >> B. Or I am doing a wrong linear modeling of the data.(as that is the common part in the both analysis) >> >> I have attached my code fragment here. please have a look: >> >> >> *** What are the other bio packages/tests I can use for identifying differentially expressed genes>? >> >> Thanks in advance >> >> best regards, >> Md.Mamunur Rashid >> >> >> Code Fragment >> ----------------------------------- >> >> About fitting the weights : I have done the following >> >> data_96_nuId_E holds the normalized object >> >> >> >> data_96_sampleType<- c("I","I","I","I","I","S","S","S","I","C","S", "S","C","I","C","I","I","I","I","S","I","S","I","S","C","C","S","I","C ","S","I","S","S","C","C","S","S","I","I","I","S","I","S","I","I","C", "I","C","S","C","I","I","S","C","S","S","S","C","C","I","S","S","C","I ","I","C","C","S","S","S","C","C","S","I","C","C","C","S","C","C","I", "I","C","S","C","C","C","S","C","S","C","I","I","S","I","C") >> data_96_design<- model.matrix(~0+data_96_sampleType) >> colnames(data_96_design)<- c('C','I','S') >> >> ## filtring out the non-expressed genes >> presentCount<- detectionCall(data_96_raw_nuID) >> data_96_Matrix<- data_96_Matrix_old[presentCount> 0, ] >> data_96_probeList<- rownames(data_96_Matrix) >> >> >> arraw_E_96<- arrayWeights(data_96_nuID_E,data_96_design) >> >> data_96_contrast<- makeContrasts (C-I,I-S,S-C,levels=data_96_design) >> >> fitw_nu_96<- lmFit(data_96_Matrix,data_96_design,weights=arraw_E_96) >> fitw_nu_con_96<- contrasts.fit(fitw_nu_96,data_96_contrast) >> fitw_e_nu_96<- eBayes(fitw_nu_con_96) >> topTable(fitw_e_nu_96,coef=1,adjust="BH") >> >> >> ID logFC AveExpr t P.Value adj.P.Val >> 3564 oEptU7gl5ULB7ghAf4 -0.11681240 6.341389 -4.749578 7.204301e-06 0.1484950 >> 3900 cUSJ2.kBKqPAk0eZPY 0.31230030 6.881320 4.561405 1.514923e-05 0.1561280 >> 10705 uHk1A0FNxKMV0k7a50 -0.13640964 6.616042 -4.197525 6.070562e-05 0.2895144 >> 14598 WokXUp_QIC4pd_s1.U -0.09998362 6.342947 -4.111694 8.338387e-05 0.2895144 >> 4324 xXLgIpJ_AeI7gRNJ_U 0.11230531 6.478843 4.072272 9.634276e-05 0.2895144 >> 6399 HHteoiUdSJz9wh.e64 0.16605833 7.322017 4.034305 1.106350e-04 0.2895144 >> 13446 x65aXRiHVTIIUQiv00 -0.23601528 6.487643 -3.986742 1.314198e-04 0.2895144 >> 13445 HrlpdGIdVMghRCK.TU -0.20278431 6.470915 -3.986291 1.316337e-04 0.2895144 >> 16912 BgFUhUUgIQu3JDl_sU 0.11902363 6.815447 3.978352 1.354541e-04 0.2895144 >> 9813 NAv6twbtRAh7QuCDro 0.09999394 6.511654 3.958318 1.455733e-04 0.2895144 >> B >> 3564 3.5478599 >> 3900 2.8790555 >> 10705 1.6341835 >> 14598 1.3504633 >> 4324 1.2214803 >> 6399 1.0980586 >> 13446 0.9445695 >> 13445 0.9431198 >> 16912 0.9176269 >> 9813 0.8534456 >> >> >> >> >> >> About fitting the arrayweights in the linear model : >> previously I have produced arrayweights >> >> On 09/21/2009 12:59 AM, Wei Shi wrote: >> >>> Dear Mamun: >>> >>> Sorry for replying to you late. I am just back from the holiday. >>> >>> Some of your arrays have pretty low weights. Arrays 37-42 have >>> weights of around 0.2. Are these arrays from the same bead chip? You >>> can try put array weights in the linear model fitting to see if you >>> can get DE genes. >>> >>> Cheers, >>> Wei >>> >>> Md.Mamunur Rashid wrote: >>> >>>> Dear Wei, >>>> >>>> Thanks again for your reply. I have measured the array-weights using limma's arrayweights() method. There seem to be >>>> a good amount of variation among the raw data. But I am not really sure whether it is acceptable or not??. >>>> >>>> But the normalized data have an average value of 1.022609 >>>> with min = 0.5641711 max =1.417277. >>>> >>>> Also 2 more question : >>>> >>>> 1. can you explain a bit more about how can I filter out the genes whose detection p-values >>>> are larger than cut-off (e.g. 0.01) in all samples?? >>>> 2. how can I check batch effect and chip effect in the raw data.?? >>>> >>>> Thanks in advance. Any help from anyone also is appreciated. >>>> >>>> regards >>>> Mamun >>>> >>>> >>>> Below is the result from arrayWeigts() method.(Table 1 (weights for raw data) Table 2(weights of normalized data)) >>>> *** Also attached a text file with the weights. >>>> >>>> ArrayWeight of Raw Data: >>>> >>>> arrayWeights(data_96_raw,data_96_design) >>>> >>>> >>>> 1 2 3 4 5 6 7 8 >>>> 3.2861102 0.6091478 2.6927503 2.2778201 2.4391841 0.7802437 1.1378995 0.5714981 >>>> 9 10 11 12 13 14 15 16 >>>> 0.8091165 0.5260511 0.7565424 0.8354660 0.2929977 0.4950789 0.4512856 0.1283279 >>>> 17 18 19 20 21 22 23 24 >>>> 0.6713553 0.6275333 0.5441217 0.2711778 0.7278274 0.3323277 0.5786381 0.7375983 >>>> 25 26 27 28 29 30 31 32 >>>> 0.5793944 0.4018671 0.6658730 1.6805341 0.7210032 1.2802383 0.8902326 0.6715769 >>>> 33 34 35 36 37 38 39 40 >>>> 0.2959048 0.8854763 0.7587111 1.5067705 0.1766820 0.1639351 0.1739882 0.1833592 >>>> 41 42 43 44 45 46 47 48 >>>> 0.2443871 0.2188281 0.2945594 0.2607380 0.3419603 0.5932775 0.3396003 0.7229979 >>>> 49 50 51 52 53 54 55 56 >>>> 0.8274153 1.7944800 0.8306203 1.0738534 1.5052515 2.9798751 2.0836544 1.5010941 >>>> 57 58 59 60 61 62 63 64 >>>> 2.3827591 2.7575819 1.5967670 3.3397084 2.0576009 2.9660262 3.8417611 1.3469622 >>>> 65 66 67 68 69 70 71 72 >>>> 1.7334743 2.7855044 4.5417142 3.2551286 3.9634064 3.8054377 3.8020479 4.2732205 >>>> 73 74 75 76 77 78 79 80 >>>> 0.2659556 0.3440704 0.6404601 0.8697107 1.0736648 0.8835502 0.9912724 0.4334270 >>>> 81 82 83 84 85 86 87 88 >>>> 0.8801461 1.0309162 2.5352693 1.8366521 2.5501944 4.5388081 4.2603196 1.7665255 >>>> 89 90 91 92 93 94 95 96 >>>> 2.5511084 2.3867529 4.0237262 2.1647555 3.3101943 2.5863558 1.8618118 0.2397050 >>>> >>>> >>>> >>>> >>>> Array Weights of normalized data: >>>> >>>> 1 2 3 4 5 6 7 8 >>>> 0.8908085 1.1978748 1.1454450 1.0215452 1.3605699 0.8141813 1.4172765 1.2958270 >>>> 9 10 11 12 13 14 15 16 >>>> 1.0522608 1.2562065 1.3531495 1.1489293 1.0596229 1.1217643 0.9156484 0.6705476 >>>> 17 18 19 20 21 22 23 24 >>>> 0.9643412 1.2120423 1.0672521 0.9983735 0.8096782 0.9379575 1.0493924 0.7614746 >>>> 25 26 27 28 29 30 31 32 >>>> 1.1573255 1.2735353 1.3795986 0.9499952 1.3602425 1.2549726 1.1772558 1.4158351 >>>> 33 34 35 36 37 38 39 40 >>>> 1.3659316 1.0658774 1.3185647 0.9017821 0.7915704 0.6326567 1.0325512 0.6812818 >>>> 41 42 43 44 45 46 47 48 >>>> 1.0142990 1.1921695 1.1476346 0.8255798 1.2012711 1.1893762 0.9367947 1.1594407 >>>> 49 50 51 52 53 54 55 56 >>>> 1.1072266 1.0561352 0.8488650 1.1756689 1.0554286 0.9326934 1.2836555 0.9665945 >>>> 57 58 59 60 61 62 63 64 >>>> 1.3293081 1.3027377 1.3459009 1.2834152 0.9657114 1.0041629 0.8823282 0.6843356 >>>> 65 66 67 68 69 70 71 72 >>>> 0.8883121 0.9374396 0.9799943 0.9733364 1.2045700 1.0693506 0.7730626 0.9408430 >>>> 73 74 75 76 77 78 79 80 >>>> 0.6685933 1.0492289 0.9952222 0.9690452 1.0150076 1.0148188 0.6687192 0.5641711 >>>> 81 82 83 84 85 86 87 88 >>>> 0.7872238 0.8793660 0.8553373 0.9662603 0.6143880 1.0340769 0.9842437 0.6626941 >>>> 89 90 91 92 93 94 95 96 >>>> 1.0270788 0.8590296 1.0807271 0.8162435 1.0398548 0.8993595 1.2094240 0.5715857 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives:http://news.gmane.org/gmane.science.biology.inf ormatics.conductor >> [[alternative HTML version deleted]]
Normalization GO Normalization GO • 1.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6