Data Mining and Knowledge Discovery Handbook, 2nd Edition

880 Nitesh V. Chawla

Cleaning Rule (NCL) to remove the majority class examples. The author computes the three nearest neighbors for each example Ei in the training set. If Ei belongs to the majority class and is misclassified by its three nearest neighbors, then Ei is removed. If Ei belongs to the minority class and is misclassified by its three nearest neighbors, then the majority class examples among those three nearest neighbors are removed. This approach can become a computational bottleneck for very large datasets with a large majority class.

Japkowicz (2000a) discussed the effect of imbalance in a dataset. She evaluated three strategies: under-sampling, resampling, and a recognition-based induction scheme. She considered two sampling methods, random and focused, for both over-sampling and under-sampling. Random resampling consisted of oversampling the smaller class at random until it contained as many samples as the majority class; focused resampling consisted of oversampling only those minority examples that occurred on the boundary between the minority and majority classes.
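The NCL editing step described above can be sketched as follows. This is a minimal illustration, not the original implementation: the function name and the brute-force nearest-neighbor search are my own, and the quadratic distance computation makes concrete the bottleneck the text mentions for large majority classes.

```python
import numpy as np

def ncl_clean(X, y, majority_label, k=3):
    """Edit the training set with the NCL rule described above.

    For each example Ei, find its k nearest neighbors (k = 3 in the text):
    - if Ei is a majority-class example misclassified by those neighbors,
      remove Ei;
    - if Ei is a minority-class example misclassified by those neighbors,
      remove the majority-class examples among the neighbors instead.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    # Naive O(n^2) pairwise Euclidean distances -- this is where the
    # computational bottleneck arises for very large datasets.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbor
    to_remove = set()
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]
        labels, counts = np.unique(y[nbrs], return_counts=True)
        predicted = labels[np.argmax(counts)]  # majority vote of the k-NN
        if predicted != y[i]:  # Ei is misclassified by its neighbors
            if y[i] == majority_label:
                to_remove.add(i)
            else:
                to_remove.update(int(j) for j in nbrs
                                 if y[j] == majority_label)
    keep = [i for i in range(n) if i not in to_remove]
    return X[keep], y[keep]
```

On a toy dataset where a single majority-class example sits inside the minority cluster, its three nearest neighbors are all minority points, so the rule removes it while leaving both clusters intact.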
Random under-sampling involved under-sampling the majority class samples at random until their number matched the number of minority class samples; focused under-sampling involved under-sampling the majority class samples lying further away. She noted that both sampling approaches were effective, and she also observed that the more sophisticated sampling techniques did not give any clear advantage in the domain considered. However, her oversampling methodologies did not construct any new examples.

Ling and Li (1998) also combined over-sampling of the minority class with under-sampling of the majority class. They used lift analysis instead of accuracy to measure a classifier's performance. They proposed that the test examples be ranked by a confidence measure and that lift then be used as the evaluation criterion. In one experiment, they under-sampled the majority class and noted that the best lift index is obtained when