Could you share your code? I tested the function below, and it seems to work fine:
In the test below, myLoad$beta is the 450K dataset included in ChAMP, used here as an example. It contains no NAs, so I manually set 2000 values to NA and then ran champ.impute() to get rid of them. The result looks correct: there are no NAs in the final myImpute$beta.
> TestData <- myLoad$beta
> sum(is.na(TestData))
[1] 0
> TestData[sample(1:length(TestData),2000)] <- NA
> sum(is.na(TestData))
[1] 2000
> myImpute <- champ.impute(TestData)
[===========================]
[<<< ChAMP.IMPUTE START >>>>]
-----------------------------
<method>:Combine. (Suitable for Large Data Set)
(1): 2 Probes contain 0.2 or above NA will be removed.
(2): 0 Samples contain 0.1 or above NA will be removed.
Cluster size 404381 broken into 187865 216516
Cluster size 187865 broken into 136737 51128
Cluster size 136737 broken into 35195 101542
Cluster size 35195 broken into 30533 4662
Cluster size 30533 broken into 3146 27387
Cluster size 3146 broken into 1407 1739
Done cluster 1407
Cluster size 1739 broken into 988 751
Done cluster 988
Done cluster 751
...
> sum(is.na(myImpute$beta))
[1] 0
Thank you for your reply!
My code is:
champ.impute(beta=myLoad_Combine$beta,
pd=myLoad_Combine$pd,
SampleCutoff=0.5)
> traceback()
7: cat("Cluster size", p, "broken into", size, "\n")
6: knnimp.split(x, k, imiss, irmiss, p, n, maxp = maxp)
5: knnimp.internal(x[index, ], k, imiss[index, ], irmiss[index],
p, n, maxp)
4: knnimp.split(x, k, imiss, irmiss, p, n, maxp = maxp)
3: knnimp(x, k, maxmiss = rowmax, maxp = maxp)
2: impute.knn(beta)
1: champ.impute(beta = myLoad_Combine$beta, pd = myLoad_Combine$pd,
SampleCutoff = 0.5)
> sum(is.na(myLoad_Combine$beta))
[1] 1830898
And yes, I know I have a high number of NAs...
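For anyone who wants to see where the NAs are concentrated before imputing, here is a minimal base-R sketch. It uses a made-up toy matrix, not the data from this thread, and the 0.2 and 0.1 cutoffs are simply the values printed in the champ.impute() log above. It only computes per-probe and per-sample NA fractions; it does not try to reproduce the exact filtering order used inside ChAMP, and the variable names are hypothetical.

```r
# Toy beta matrix (made-up data): rows are probes, columns are samples.
set.seed(1)
beta <- matrix(runif(200), nrow = 20, ncol = 10,
               dimnames = list(paste0("probe", 1:20), paste0("sample", 1:10)))
beta["probe1", 1:5] <- NA     # probe1 is missing in 5 of 10 samples
beta[2:7, "sample10"] <- NA   # sample10 is missing in 6 of 20 probes

# Fraction of NA values per probe (row) and per sample (column).
probe_na  <- rowMeans(is.na(beta))
sample_na <- colMeans(is.na(beta))

# Cutoffs taken from the log output in this thread (assumed, not read from ChAMP).
probe_cutoff  <- 0.2
sample_cutoff <- 0.1

flagged_probes  <- names(probe_na)[probe_na >= probe_cutoff]
flagged_samples <- names(sample_na)[sample_na >= sample_cutoff]

print(sum(is.na(beta)))   # total number of NA values
print(flagged_probes)     # probes at or above the probe cutoff
print(flagged_samples)    # samples at or above the sample cutoff
```

On a real dataset, the same rowMeans(is.na(...)) and colMeans(is.na(...)) calls on the beta matrix would show whether the NAs sit in a few probes or samples or are spread evenly.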
Wait a minute, you write the results to myImpute? Wow... I feel stupid now. Of course I need to assign the result to a variable.
I just received an output with 0 NAs. Thank you for your help, as always!
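For later readers, the point about assignment can be shown with a small base-R sketch. R functions return a modified copy rather than changing their input, so the result has to be stored. The impute_row_mean() function below is a hypothetical stand-in that fills each NA with its row mean; it is not how champ.impute() works, and it returns a plain matrix, whereas the thread above reads the result from myImpute$beta.

```r
# Hypothetical stand-in for an imputation function (NOT ChAMP's method):
# replaces each NA with the mean of the non-missing values in its row.
impute_row_mean <- function(x) {
  row_means <- rowMeans(x, na.rm = TRUE)
  missing <- which(is.na(x), arr.ind = TRUE)   # matrix of (row, col) positions
  x[missing] <- row_means[missing[, "row"]]
  x                                            # return the modified copy
}

# Made-up 2 x 3 matrix with two missing values.
beta <- matrix(c(0.1, NA, 0.3,
                 0.4, 0.5, NA), nrow = 2, byrow = TRUE)

# Calling without assignment: the result is returned and then discarded.
invisible(impute_row_mean(beta))
na_before <- sum(is.na(beta))       # the original object is unchanged

# Calling with assignment: the imputed copy is kept.
imputed <- impute_row_mean(beta)
na_after <- sum(is.na(imputed))

print(c(original = na_before, imputed = na_after))
```

The same pattern applies to the call earlier in the thread: store what champ.impute() returns in a variable and check the NA count on that object rather than on the input.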