Hi.
I'm using this function to impute missing values in my arrays. After imputing, I fit a linear model for each gene, take the t-value from each model, and plot the t-values in a boxplot. Some of the outliers in the original data (without imputation) are genes with just two data points. I expected those genes to stop being outliers after using impute.knn, but the boxplots look exactly the same with and without imputation. That's not what I think should happen: I thought that, since those genes borrow information from similar genes, their t-values would shrink.
For some genes with no data points at all I do get values after imputation, but I think that is because it takes the average of the column.
Just in case it helps: I normalized first and then imputed, and I tried both K=10 and K=100.
Thanks in advance.
Well, I understand that, but if no K neighbors are found (for example, because a gene has no data at all), the average of the column is used to fill that empty point. What I do not understand is why the average is also used when a gene has two values and five NA values out of seven points. Can it not find K neighbors?
Example
After Knn
The average for each column is around 0.08574, as the data are normalized, which is close to what you get for some NA values.
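One possible explanation, sketched here under assumptions: as far as I can tell from the impute.knn help page, the `rowmax` argument (default 0.5) means that any row with more than that fraction of missing values is not imputed from neighbors at all and gets the column mean instead. A gene with five NA values out of seven points is about 71% missing, so it would fall into that case. The Python below is only a toy illustration of that rule, not the package's code: the function names, the distance and the small matrix are all made up for the example.

```python
import math

NA = None  # stand-in for a missing value


def column_means(rows):
    """Mean of the observed values in each column (NA if none observed)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not NA]
        means.append(sum(observed) / len(observed) if observed else NA)
    return means


def distance(a, b):
    """Mean squared difference over the columns observed in both rows."""
    shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not NA and y is not NA]
    return sum(shared) / len(shared) if shared else math.inf


def knn_impute(rows, k=2, rowmax=0.5):
    """Toy KNN imputation with a row-sparsity cutoff (assumed behaviour)."""
    means = column_means(rows)
    result = []
    for i, row in enumerate(rows):
        n_missing = sum(v is NA for v in row)
        if n_missing / len(row) > rowmax:
            # Too sparse: neighbors are skipped, column means are used.
            result.append([means[j] if v is NA else v for j, v in enumerate(row)])
            continue
        # Other rows that share at least one observed column, nearest first.
        others = [r for h, r in enumerate(rows) if h != i]
        others = [r for r in others if distance(row, r) < math.inf]
        neighbors = sorted(others, key=lambda r: distance(row, r))[:k]
        filled = []
        for j, v in enumerate(row):
            if v is not NA:
                filled.append(v)
                continue
            values = [r[j] for r in neighbors if r[j] is not NA]
            # Fall back to the column mean if no neighbor has this column.
            filled.append(sum(values) / len(values) if values else means[j])
        result.append(filled)
    return result


# Hypothetical toy matrix: 4 genes (rows) by 7 arrays (columns).
toy = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, NA],   # 1 of 7 missing
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    [5.0, 5.0, NA, NA, NA, NA, NA],       # 5 of 7 missing
]

default_fill = knn_impute(toy, k=1)              # sparse gene gets column means
relaxed_fill = knn_impute(toy, k=1, rowmax=1.0)  # sparse gene uses its neighbor
```

In this toy case the last gene has an obvious nearest neighbor (the third row), yet with the default cutoff it is filled with column means. If impute.knn really works this way, raising `rowmax` would be the thing to try, although imputing five of seven values from two observed points is probably not very reliable either way.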