Partial instance reduction for noise elimination

Journal Article
Mona Jamjoom, Khalil El Hindi. 2016
Journal: Pattern Recognition Letters
Volume Number: 74
Pages: 30-37
Publication Abstract: 

Real-world data are usually noisy, causing many machine-learning algorithms to overfit their data. Various Instance Reduction (IR) techniques have been proposed to filter out noisy instances and clean the data. This paper presents Partial Instance Reduction (PIR), or partial outlier elimination, techniques. Unlike IR techniques, which eliminate all suspicious instances, PIR techniques partially eliminate a suspicious instance by eliminating some of its attribute values. If this fails to change the status of an instance from outlier to normal, the entire instance is eliminated. The main advantage of partial elimination is that it allows us to retain significant parts of instances, which is particularly useful when the training data are scarce. This work compares PIR and IR techniques using 50 benchmark datasets, both with and without noise. Our empirical results show that PIR techniques significantly outperform IR techniques on many benchmark datasets. Whereas IR techniques eliminate a large number of instances that are not outliers, PIR techniques manage to save parts of these instances that are useful for classification.
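
The partial-elimination idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how outliers are detected or which attribute values are removed, so the sketch assumes (hypothetically) that an instance is an outlier when the majority of its k nearest neighbours carry a different label, and that a single attribute value is removed at a time. The names `MISSING`, `distance`, `is_outlier`, and `partial_instance_reduction` are introduced here for illustration only.

```python
from collections import Counter

MISSING = None  # marker for an eliminated attribute value (assumption)


def distance(a, b):
    """Squared Euclidean distance over attributes present in both instances."""
    return sum((x - y) ** 2 for x, y in zip(a, b)
               if x is not MISSING and y is not MISSING)


def is_outlier(index, X, y, k=3):
    """Assumed outlier test: the k nearest neighbours' majority label
    differs from the instance's own label."""
    others = [j for j in range(len(X)) if j != index]
    nearest = sorted(others, key=lambda j: distance(X[index], X[j]))[:k]
    majority = Counter(y[j] for j in nearest).most_common(1)[0][0]
    return majority != y[index]


def partial_instance_reduction(X, y, k=3):
    """Return a cleaned (X, y). A suspicious instance keeps its remaining
    values if removing one attribute value makes it normal; otherwise the
    entire instance is eliminated."""
    kept_X, kept_y = [], []
    for i, row in enumerate(X):
        if not is_outlier(i, X, y, k):
            kept_X.append(list(row))
            kept_y.append(y[i])
            continue
        # Partial elimination: try removing each attribute value in turn.
        for a in range(len(row)):
            trial = list(row)
            trial[a] = MISSING
            X_trial = X[:i] + [trial] + X[i + 1:]
            if not is_outlier(i, X_trial, y, k):
                kept_X.append(trial)
                kept_y.append(y[i])
                break
        # If no single removal helped, the instance is dropped entirely.
    return kept_X, kept_y


# Toy data: two clusters plus two suspicious instances labelled "a".
X = [[0, 0], [1, 0], [0, 1], [1, 1],
     [10, 10], [11, 10], [10, 11], [11, 11],
     [4, 10], [10.4, 10.4]]
y = ["a"] * 4 + ["b"] * 4 + ["a", "a"]
clean_X, clean_y = partial_instance_reduction(X, y, k=3)
print(clean_X)
```

In this toy data, the instance `[4, 10]` is retained once its second attribute value is removed, while `[10.4, 10.4]` is eliminated entirely because no single removal changes its status. A conventional IR filter under the same outlier test would discard both.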