Data Mining: Classification and Prediction presents about Classification with decision trees; Artificial Neural Networks; Algorithm for decision tree induction; Attribute Selection Measure; Extracting Classification Rules from Trees . | Data Mining: Classification and Prediction Duong Tuan Anh HCMC University of Technology July 2011 Outline 1. Classification with decision trees 2. Artificial Neural Networks 1. CLASSIFICATION WITH DECISION TREES Classification is the process of learning a model that describes different classes of data. The classes are predetermined. Example: In a banking application, customers who apply for a credit card may be classify as a “good risk”, a “fair risk” or a “poor risk”. Hence, this type of activity is also called supervised learning. Once the model is built, then it can be used to classify new data. The first step, of learning the model, is accomplished by using a training set of data that has already been classified. Each record in the training data contains an attribute, called the class label, that indicates which class the record belongs to. The model that is produced is usually in the form of a decision tree or a set of rules. Some of the important issues with . | Data Mining: Classification and Prediction Duong Tuan Anh HCMC University of Technology July 2011 Outline 1. Classification with decision trees 2. Artificial Neural Networks 1. CLASSIFICATION WITH DECISION TREES Classification is the process of learning a model that describes different classes of data. The classes are predetermined. Example: In a banking application, customers who apply for a credit card may be classify as a “good risk”, a “fair risk” or a “poor risk”. Hence, this type of activity is also called supervised learning. Once the model is built, then it can be used to classify new data. The first step, of learning the model, is accomplished by using a training set of data that has already been classified. Each record in the training data contains an attribute, called the class label, that indicates which class the record belongs to. The model that is produced is usually in the form of a decision tree or a set of rules. Some of the important issues with regard to the model and the algorithm that produces the model include: the model’s ability to predict the correct class of the new data, the computational cost associated with the algorithm the scalability of the algorithm. Let examine the approach where the model is in the form of a decision tree. A decision tree is simply a graphical representation of the description of each class or in other words, a representation of the classification rules. Example Example : Suppose that we have a database of customers on the AllEletronics mailing list. The database describes attributes of the customers, such as their name, age, income, occupation, and credit rating. The customers can be classified as to whether or not they have purchased a computer at AllElectronics. Suppose that new customers are added to the database and that you would like to notify these customers of an upcoming computer sale. To send out promotional literature to every new customers in the database can be quite .