Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 22

Knowledge Discovery demonstrates intelligent computing at its best and is the most desirable and interesting end product of Information Technology. Discovering and extracting knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. The Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges, and applications of data mining (DM) and knowledge discovery.

Excerpt (p. 190, Paola Sebastiani, Maria M. Abad and Marco F. Ramoni):

By repeating this procedure for each case in the database, we compute fitted values for each variable $Y_i$ and then define the blanket residuals by $r_{ik} = y_{ik} - \hat{y}_{ik}$ for numerical variables and by $c_{ik} = \delta(y_{ik}, \hat{y}_{ik})$ for categorical variables, where the function $\delta(a, b)$ takes value 0 when $a = b$ and 1 when $a \neq b$. Lack of significant patterns in the residuals $r_{ik}$ and approximate symmetry about 0 provide evidence in favor of a good fit for the variable $Y_i$, while anomalies in the blanket residuals can help to identify weaknesses in the dependency structure that may be due to outliers or leverage points. Significance testing of the goodness of fit can be based on the standardized residuals $\tilde{r}_{ik} = r_{ik} / \sqrt{V(\hat{y}_{ik})}$, where the variance $V(\hat{y}_{ik})$ is computed from the fitted values. Under the hypothesis that the network fits the data well, we would expect approximately 95% of the standardized residuals to lie within the limits $(-2, 2)$. When the variable $Y_i$ is categorical, the residuals $c_{ik}$ identify the error in reproducing the data and can be summarized to compute the error rate of the fit. Because these residuals measure the difference between the observed and fitted values, anomalies in the residuals can identify inadequate dependencies in the network. However, residuals that are on average not significantly different from 0 do not necessarily prove that the model is good.

A better validation of the network should be done on an independent test set, to show that the model induced from one particular data set is reproducible and gives good predictions. Measures of predictive accuracy can be the monitors based on the logarithmic scoring function (Good, 1952). The basic intuition is to measure the degree of surprise in predicting that the variable $Y_i$ will take the value $y_{ih}$ in the $h$th case of an independent test set. The measure of surprise is defined by the score $S_{ih} = -\log p(y_{ih} \mid MB(y_i)_h)$, where $MB(y_i)_h$ is the configuration of the Markov blanket of $Y_i$ in the $h$th case.
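As a rough illustration of how these fit and prediction monitors might be computed, the sketch below assumes the fitted values, fitted-value variances, and Markov-blanket conditional probabilities have already been obtained from the network. The function names, the NumPy-based interface, and the toy numbers in the usage example are hypothetical, not taken from the handbook.

```python
import numpy as np

def standardized_blanket_residuals(observed, fitted, fitted_var):
    """Blanket residuals r_ik = y_ik - yhat_ik for a numerical variable,
    standardized by the variance of the fitted values (assumed given)."""
    residuals = observed - fitted
    return residuals / np.sqrt(fitted_var)

def residual_fit_check(std_residuals, limit=2.0):
    """Fraction of standardized residuals inside (-limit, limit).
    Under a good fit we expect roughly 95% inside (-2, 2)."""
    return np.mean(np.abs(std_residuals) < limit)

def categorical_error_rate(observed, fitted):
    """Error rate summarizing the 0/1 blanket residuals
    c_ik = delta(y_ik, yhat_ik) for a categorical variable."""
    return np.mean(observed != fitted)

def log_score(prob_of_observed):
    """Logarithmic scoring monitor S_ih = -log p(y_ih | MB(y_i)_h),
    given the probability the network assigns to each observed test value
    conditional on its Markov blanket configuration."""
    return -np.log(prob_of_observed)

# Toy usage on made-up numbers (illustrative only).
y = np.array([1.2, 0.8, 1.9, 2.4])
yhat = np.array([1.0, 1.0, 2.0, 2.5])
var = np.array([0.25, 0.25, 0.25, 0.25])
r_std = standardized_blanket_residuals(y, yhat, var)
print(residual_fit_check(r_std))          # share of residuals within (-2, 2)
print(log_score(np.array([0.7, 0.4])))    # surprise for two test-set predictions
```

Lower log scores indicate less surprise, so averaging $S_{ih}$ over an independent test set gives a simple predictive-accuracy monitor to complement the residual checks above.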
