Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 9

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered – this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

The excerpt below, from the chapter by Christopher J.C. Burges, picks up partway through the derivation showing that projecting onto the principal eigenvectors minimizes the mean squared reconstruction error. Writing each projection direction $\mathbf{u}_a$, $a = 1, \dots, d'$, in the basis of eigenvectors $\mathbf{e}_p$ of the data covariance matrix $C$ (with eigenvalues $\lambda_p$, $p = 1, \dots, d$), so that $\mathbf{u}_a = \sum_{p=1}^{d} P_{ap}\mathbf{e}_p$, the reconstruction error contains the term

$$-\sum_{a=1}^{d'} \mathbf{u}_a' C\, \mathbf{u}_a \;=\; -\sum_{a=1}^{d'} \Big(\sum_{p=1}^{d} P_{ap}\mathbf{e}_p\Big)' C \Big(\sum_{q=1}^{d} P_{aq}\mathbf{e}_q\Big) \;=\; -\sum_{a=1}^{d'} \sum_{p=1}^{d} \lambda_p P_{ap}^2,$$

so minimizing the error amounts to maximizing $\sum_{a=1}^{d'}\sum_{p=1}^{d} \lambda_p P_{ap}^2$ subject to the $\mathbf{u}_a$ being orthonormal. Introducing Lagrange multipliers $\omega_{ab}$ to enforce the orthogonality constraints (Burges, 2004), the objective function becomes

$$F \;=\; \sum_{a=1}^{d'} \sum_{p=1}^{d} \lambda_p P_{ap}^2 \;-\; \sum_{a,b=1}^{d'} \omega_{ab} \Big(\sum_{p=1}^{d} P_{ap} P_{bp} - \delta_{ab}\Big).$$

Choosing $\omega_{ab} \equiv \omega_a \delta_{ab}$ and taking derivatives with respect to $P_{cq}$ gives $\lambda_q P_{cq} = \omega_c P_{cq}$. Both this and the constraints can be satisfied by choosing $P_{cq} = 0\ \forall q > d'$ and $P_{cq} = \delta_{cq}$ otherwise; the objective function is then maximized if the $d'$ largest $\lambda_p$ are chosen. Note that this also amounts to a proof that the greedy approach to PCA dimensional reduction (solve for a single optimal direction, which gives the principal eigenvector as first basis vector, then project your data into the subspace orthogonal to that, then repeat) also results in the globally optimal solution found by solving for all directions at once. The same is true for the directions that maximize the variance. Again, note that this argument holds however your data is distributed.

PCA Maximizes Mutual Information on Gaussian Data

Now consider some proposed set of projections $W \in \mathcal{M}_{d'd}$, where the rows of $W$ are orthonormal, so that the projected data is $\mathbf{y} \equiv W\mathbf{x}$, $\mathbf{y} \in \mathbf{R}^{d'}$, $\mathbf{x} \in \mathbf{R}^{d}$, $d' \le d$. Suppose that $\mathbf{x} \sim \mathcal{N}(0, C)$. Then since the $\mathbf{y}$'s are linear combinations of the $\mathbf{x}$'s, they are also normally distributed, with zero mean and covariance

$$C_y \;\equiv\; \frac{1}{m}\sum_{i=1}^{m} \mathbf{y}_i \mathbf{y}_i' \;=\; \frac{1}{m}\, W \Big(\sum_{i=1}^{m} \mathbf{x}_i \mathbf{x}_i'\Big) W' \;=\; W C W'.$$

It is interesting to ask how $W$ can be chosen so that the mutual information between the distribution of the $\mathbf{x}$'s and that of the $\mathbf{y}$'s is maximized (Baldi and Hornik, 1995; Diamantaras and Kung, 1996). Since the mapping $W$ is deterministic, the conditional entropy $H(\mathbf{y}\mid\mathbf{x})$ vanishes, and the mutual information is just $I(\mathbf{x},\mathbf{y}) = H(\mathbf{y}) - H(\mathbf{y}\mid\mathbf{x}) = H(\mathbf{y})$. Using a small, fixed bin size, we can approximate this by the differential entropy

$$H(\mathbf{y}) \;=\; -\int_{\mathbf{y}} p(\mathbf{y}) \log_2 p(\mathbf{y})\, d\mathbf{y} \;=\; \tfrac{1}{2}\log_2\big(e(2\pi)\big)^{d'} + \tfrac{1}{2}\log_2 \det(C_y).$$

This is maximized by maximizing $\det(C_y) = \det(WCW')$ over the choice of $W$, subject to the constraint that the rows of $W$ are orthonormal. The general...
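As a quick numerical illustration of the claim above that the greedy, one-direction-at-a-time approach recovers the same result as solving for all directions at once, here is a short NumPy sketch. This is illustrative code, not code from the handbook; the synthetic data, the dimensions, and names such as U_batch and U_greedy are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
d, d_prime, m = 5, 2, 1000

X = rng.standard_normal((m, d)) @ rng.standard_normal((d, d))  # toy data with correlated features
X -= X.mean(axis=0)                                            # center the data
C = X.T @ X / m                                                # sample covariance matrix

# "All directions at once": the d' principal eigenvectors of C.
eigvals, eigvecs = np.linalg.eigh(C)                 # eigenvalues in ascending order
U_batch = eigvecs[:, ::-1][:, :d_prime]              # columns = top-d' eigenvectors

# Greedy: find the single best direction, deflate, repeat.
U_greedy = []
C_defl = C.copy()
for _ in range(d_prime):
    _, v = np.linalg.eigh(C_defl)
    u = v[:, -1]                                     # current best (largest-eigenvalue) direction
    U_greedy.append(u)
    P = np.eye(d) - np.outer(u, u)                   # projector onto the orthogonal complement
    C_defl = P @ C_defl @ P                          # covariance of the deflated data
U_greedy = np.column_stack(U_greedy)

# Captured variance sum_a u_a' C u_a for each basis.
print(np.trace(U_batch.T @ C @ U_batch))
print(np.trace(U_greedy.T @ C @ U_greedy))
print(eigvals[::-1][:d_prime].sum())

Both traces agree up to round-off, and both equal the sum of the $d'$ largest eigenvalues of $C$, matching the result of the Lagrange-multiplier argument.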

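The mutual-information argument can be checked the same way. The sketch below, again illustrative rather than code from the chapter (gaussian_entropy_bits, W_pca, and W_rand are made-up names), evaluates the Gaussian differential entropy $H(\mathbf{y}) = \tfrac{d'}{2}\log_2(2\pi e) + \tfrac{1}{2}\log_2 \det(WCW')$ for a projection whose rows are the leading eigenvectors of $C$ and for a random orthonormal projection; the eigenvector rows give the larger value.

import numpy as np

rng = np.random.default_rng(1)
d, d_prime = 6, 2

A = rng.standard_normal((d, d))
C = A @ A.T                                          # a positive-definite covariance matrix

def gaussian_entropy_bits(W, C):
    # Differential entropy (in bits) of y = W x when x ~ N(0, C):
    # H(y) = (d'/2) log2(2*pi*e) + (1/2) log2 det(W C W').
    Cy = W @ C @ W.T
    d_p = W.shape[0]
    return 0.5 * d_p * np.log2(2 * np.pi * np.e) + 0.5 * np.log2(np.linalg.det(Cy))

# Rows of W taken from the d' principal eigenvectors of C.
eigvals, eigvecs = np.linalg.eigh(C)
W_pca = eigvecs[:, ::-1][:, :d_prime].T

# A random matrix with orthonormal rows, for comparison.
Q, _ = np.linalg.qr(rng.standard_normal((d, d_prime)))
W_rand = Q.T

print(gaussian_entropy_bits(W_pca, C))               # the maximum over orthonormal W
print(gaussian_entropy_bits(W_rand, C))              # never larger than the PCA value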