Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 124

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Nissan Levin and Jacob Zahavi (p. 1210)

$U$: a random disturbance.

Assuming all other factors are equal, one can check whether a variable, say $X_k$, is significant by testing the hypothesis

$$H_0: \beta_k = 0, \qquad H_1: \beta_k \neq 0.$$

The test statistic for testing the hypothesis is given by

$$t = \frac{\hat{\beta}_k}{s_{\hat{\beta}_k}}$$

where $\hat{\beta}_k$ is the coefficient estimate of $\beta_k$ and $s_{\hat{\beta}_k}$ is the standard error of the coefficient estimate.

In small samples, the test statistic $t$ follows Student's t-distribution with $n - J - 1$ degrees of freedom. In Data Mining applications, where the sample size is very large, often containing as many as several hundred observations or more, the t-distribution may be approximated by the normal distribution.

Given the test statistic and its sampling distribution, one calculates the P-value, the minimum probability level at which $H_0$ is rejected when it is true:

$$P\text{-value} = 2P(T \geq |t|)$$

If the resulting P-value is smaller than or equal to a predefined level of significance, often denoted by $\alpha$, one rejects $H_0$; otherwise one does not reject $H_0$. The level of significance $\alpha$ is the upper bound on the probability of a Type-I error (rejecting $H_0$ when it is true). It is the proportion of times that we reject $H_0$ when it is true, out of all possible samples of size $n$ drawn from the population. In fact, the P-value is just one realization of this phenomenon.
It is the actual Type-I error probability for the given sample statistics. Now suppose that $X_k$ is an insignificant variable having no relation whatsoever to the dependent variable $Y$ (i.e., the correlation coefficient between $X_k$ and $Y$ is zero). Then, if we build the regression model based on a sample of observations, there is a probability of $\alpha$ that $X_k$ will turn out significant just by pure chance, thus making it into the model and resulting in a Type-I error, in contradiction to the fact that $X_k$ and $Y$ are not correlated. Extending the analysis to the case of multiple insignificant predictors, even a small Type-I error may result in several of those variables making it into the model as significant. Taking this to the ...
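
The significance test and the chance-significance effect described above can be illustrated with a minimal Python sketch. It is not the authors' procedure. It rests on several assumptions made only for illustration: each predictor is tested in a simple (one-predictor) regression rather than in the full multiple-regression model, the P-value uses the normal approximation that the text mentions for large samples, the data are simulated, and the function names `slope_t_test` and `count_false_positives` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def slope_t_test(x, y):
    """Return (t, p_value) for H0: slope = 0 in a simple regression of y on x.

    Assumption: the two-sided P-value uses the normal approximation to the
    t-distribution, which is reasonable only for large samples.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; degrees of freedom are n - J - 1 with J = 1.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    t = slope / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p_value


def count_false_positives(n_obs=500, n_noise=200, alpha=0.05, seed=0):
    """Count pure-noise predictors that test as significant at level alpha."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_noise):
        # Each x is drawn independently of y, so H0 is true by construction.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        _, p_value = slope_t_test(x, y)
        if p_value <= alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 200 noise predictors significant at alpha = 0.05")
```

Because every simulated predictor is unrelated to $Y$, each one counted by `count_false_positives` is a Type-I error. With 200 such predictors and $\alpha = 0.05$, roughly $200 \times 0.05 = 10$ are expected to appear significant by chance alone, although the exact count varies with the random seed.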
