Data Mining and Knowledge Discovery Handbook, 2nd Edition part 39

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Steve Donoho

... is similar to paper citations in academia. A paper that is cited often is considered to contain important ideas. A paper that is seldom or never cited is considered to be less important. The following paragraphs present two algorithms for incorporating link information into search engines: PageRank (Page et al., 1998) and Kleinberg's Hubs and Authorities (Kleinberg, 1999).

The PageRank algorithm takes a set of interconnected pages and calculates a score for each. Intuitively, the score for a page is based on how many other pages point to that page and what their scores are. A page that is pointed to by a few other important pages is probably itself important. Similarly, a page that is pointed to by numerous other marginally important pages is probably itself important. But a page that is not pointed to by anything probably isn't important.

A more formal definition, taken from Page et al. (1998), is as follows. Let $u$ be a web page. Let $F_u$ be the set of pages $u$ points to and $B_u$ be the set of pages that point to $u$. Let $N_u = |F_u|$ be the number of links from $u$. Let $E(u)$ be an a priori score assigned to $u$. Then $R(u)$, the score for $u$, is calculated as

$$R(u) = \sum_{v \in B_u} \frac{R(v)}{N_v} + E(u)$$

So the score for a page is the constant $E(u)$ plus the sum of the scores carried by its incoming links.
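The formula above can be applied repeatedly until the scores settle. What follows is a minimal sketch, not the authors' implementation. The names `pagerank`, `links`, `prior`, and `iterations` are hypothetical, and the example graph is made up for illustration. Rescaling the scores to sum to 1 after each pass is also an assumption added here so that the iteration stays bounded. The excerpt's formula shows no such factor, although Page et al. (1998) do use a normalizing constant.

```python
def pagerank(links, prior, iterations=50):
    """Repeatedly apply R(u) = sum_{v in B_u} R(v) / N_v + E(u).

    links: dict mapping each page to the collection of pages it points to (F_u).
    prior: dict mapping each page to its a priori score E(u).
    Scores are rescaled to sum to 1 after each pass (an assumption of this sketch).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    scores = {u: 1.0 / len(pages) for u in pages}

    for _ in range(iterations):
        # Start each page at its a priori score E(u).
        new_scores = {u: prior.get(u, 0.0) for u in pages}
        for v, targets in links.items():
            targets = set(targets)
            if not targets:
                continue  # a sink: it has no outgoing links to pass score along
            share = scores[v] / len(targets)  # R(v) / N_v
            for u in targets:
                new_scores[u] += share
        total = sum(new_scores.values())
        if total == 0:
            break  # nothing to normalize (all priors zero and no score flowing)
        scores = {u: s / total for u, s in new_scores.items()}
    return scores


# Hypothetical four-page graph: D points in but nothing points to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
prior = {page: 0.15 for page in "ABCD"}
scores = pagerank(links, prior)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph, page C is pointed to by three pages and ends up with the highest score, while page D, which nothing points to, keeps only its share of the a priori score.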
Each incoming link carries the score of the page it comes from divided by the number of outgoing links from that page (so a page's score is divided evenly among its outgoing links). The constant $E(u)$ serves a couple of functions. First, it counterbalances the effect of sinks in the network. These are pages or groups of pages that are dead ends: they are pointed to, but they don't point out to any other pages. $E(u)$ provides a source of score that counterbalances the sinks in the network. Second, it provides a method of introducing a priori scores if certain pages are known to be authoritative.

The PageRank algorithm can be combined with other techniques to create a search engine. For example, PageRank is first used to assign a score to
