Data Mining and Knowledge Discovery Handbook, 2nd Edition

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

530 Yoav Benjamini and Moshe Leshno

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{M1} & \cdots & x_{Mk} \end{pmatrix}$$

The estimates of the $\beta$'s are given in matrix form by $\hat{\beta} = (X^tX)^{-1}X^tY$. Note that in linear regression analysis we assume that, for given $x_{i1}, \ldots, x_{ik}$, $y_i$ is distributed as $N\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},\ \sigma^2\right)$. There is a large class of general regression models in which the relationship between the $y_i$'s and the vector $x$ is not assumed to be linear but which can be converted to a linear model; for example, $y = \beta_0 x^{\beta_1}$ becomes linear after taking logarithms, $\log y = \log \beta_0 + \beta_1 \log x$.

The machine learning approach, in contrast to regression analysis, aims to select a function $f \in F$ from a given set of functions $F$ that best approximates or fits the given data. Machine learning assumes that the given data $(x_i, y_i)$, $i = 1, \ldots, M$, are obtained by a data generator producing the data according to an unknown distribution $p(x, y) = p(x)\,p(y \mid x)$. Given a loss function $W(y, f(x))$, the quality of an approximation produced by the learning machine is measured by the expected loss, the expectation being taken under the unknown distribution $p(x, y)$. The subject of statistical machine learning is the following optimization problem:

$$\min_{f \in F} \int W(y, f(x))\, dp(x, y),$$

when the density function $p(x, y)$ is unknown but a random independent sample of $(x_i, y_i)$ is given.
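As a sketch of how the regression and machine learning views meet, the following Python snippet (assuming NumPy; the data are simulated and the helper names `design_matrix`, `ols_normal_equations`, `empirical_risk` and `empirical_risk_minimizer` are hypothetical) computes $\hat{\beta} = (X^tX)^{-1}X^tY$ directly, then minimizes the sample average of the squared loss over linear functions by gradient descent. Replacing the unknown expected loss with its sample average (empirical risk minimization) is a standard device assumed here, not something the text prescribes.

```python
import numpy as np

def design_matrix(x):
    """Prepend a column of ones (the intercept) to the M x k predictor array."""
    return np.column_stack([np.ones(x.shape[0]), x])

def ols_normal_equations(X, y):
    """beta_hat = (X^t X)^{-1} X^t y, computed by solving the normal equations
    instead of forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def empirical_risk(beta, X, y):
    """Sample average of the squared loss (y - f(x))^2 for f(x) = x^t beta."""
    return np.mean((y - X @ beta) ** 2)

def empirical_risk_minimizer(X, y, lr=0.1, n_steps=5000):
    """Minimize the empirical squared-loss risk over linear functions
    by plain gradient descent (hypothetical helper, illustration only)."""
    beta = np.zeros(X.shape[1])
    m = X.shape[0]
    for _ in range(n_steps):
        grad = -2.0 / m * X.T @ (y - X @ beta)  # gradient of the mean squared loss
        beta -= lr * grad
    return beta

rng = np.random.default_rng(0)
M, k = 200, 2
x = rng.normal(size=(M, k))
true_beta = np.array([1.0, 2.0, -0.5])  # simulated beta_0, beta_1, beta_2
X = design_matrix(x)
y = X @ true_beta + rng.normal(scale=0.3, size=M)  # y_i ~ N(x_i^t beta, sigma^2)

beta_ols = ols_normal_equations(X, y)
beta_erm = empirical_risk_minimizer(X, y)
print(np.round(beta_ols, 3), np.round(beta_erm, 3))
print(empirical_risk(beta_ols, X, y), empirical_risk(true_beta, X, y))
```

The two estimates agree to numerical precision, and the OLS fit has an empirical risk no larger than that of the true coefficients, since it is by construction the minimizer of the sample squared loss.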
If $F$ is the set of all linear functions of $x$, $W(y, f(x)) = (y - f(x))^2$, and $p(y \mid x)$ is normal, then the minimization problem above is equivalent to linear regression analysis.

Generalized Linear Models

Although in many cases the set of linear functions is good enough to model the stochastic response $y$ as a function of $x$, it may not always suffice to represent the relationship. The generalized linear model enlarges the family of functions $F$ that may represent the relationship between the response $y$ and $x$. The tradeoff is between having a simple model and a more complex one representing the relationship between $y$ and $x$. In the generalized linear model, the distribution of $y$ given $x$ does not have to be normal but can be any distribution in the exponential family.
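To make the idea concrete, here is a minimal sketch, assuming NumPy, of one member of this family: logistic regression, where $y$ given $x$ is Bernoulli (an exponential-family distribution) and the mean is tied to the linear predictor through the logit link. The fitting method, iteratively reweighted least squares (IRLS), is a common choice for generalized linear models but is our assumption here, as are the function name `fit_logistic_irls` and the simulated data.

```python
import numpy as np

def sigmoid(eta):
    """Inverse of the logit link: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Fit a Bernoulli GLM with logit link by iteratively reweighted least squares.

    X is the M x (k+1) design matrix (first column all ones), y holds 0/1 responses.
    Each iteration solves a weighted least-squares problem, which is a Newton step
    on the Bernoulli log-likelihood (hypothetical helper, illustration only).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta              # linear predictor
        mu = sigmoid(eta)           # fitted mean E[y | x]
        w = mu * (1.0 - mu)         # weights = Bernoulli variance at mu
        z = eta + (y - mu) / w      # working response
        XtW = X.T * w               # X^t W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

rng = np.random.default_rng(1)
M = 2000
x = rng.normal(size=(M, 2))
X = np.column_stack([np.ones(M), x])
true_beta = np.array([-0.5, 1.5, -1.0])  # simulated coefficients
y = rng.binomial(1, sigmoid(X @ true_beta)).astype(float)

beta_hat = fit_logistic_irls(X, y)
print(np.round(beta_hat, 3))
```

With a normal response and the identity link, the weights are constant and the working response is just $y$, so the same loop reduces to a single ordinary least-squares solve; this is one way to see linear regression as a special case of the generalized linear model.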