Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 85

Knowledge Discovery demonstrates intelligent computing at its best, and is among the most desirable and interesting end-products of Information Technology. Being able to discover and extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a great deal of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. The Data Mining and Knowledge Discovery Handbook, 2nd Edition, organizes the most current concepts, theories, standards, methodologies, trends, challenges, and applications of data mining (DM) and knowledge discovery.

820 | Moty Ben-Dov and Ronen Feldman

The above are examples of the research that has been done on applying HMMs to IE tasks. The results obtained for IE using HMMs are good compared to other techniques, but there are a few problems with using HMMs. The main disadvantage of using an HMM for information extraction is the need for a large amount of training data: the more training data we have, the better the results. Building such training data is a time-consuming task, requiring a great deal of manual tagging that must be done by experts in the specific domain we are working with. The second problem is that the HMM is a flat model, so the most it can do is assign a tag to each token in a sentence. This is suitable for tasks where the tagged sequences do not nest and where there are no explicit relations between the sequences. Part-of-speech tagging and entity extraction belong to this category, and indeed HMM-based PoS taggers and entity extractors are state-of-the-art. Extracting relationships is different, because the tagged sequences can and must nest, and there are relations between them that must be explicitly recognized.
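The "flat model" limitation described above can be made concrete with a small sketch of Viterbi decoding, the standard way an HMM assigns one tag per token. The tag set, transition, and emission probabilities below are hypothetical toy values for illustration only; they are not from the handbook.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p):
    """Assign the single most likely tag sequence to a flat token sequence.

    Each entry V[t][s] holds (probability, best path) for ending in state s
    at position t. Note the output is just one tag per token: nothing here
    can represent nested sequences or relations between them.
    """
    V = [{s: (start_p[s] * emit_p[s].get(tokens[0], 1e-9), [s])
          for s in states}]
    for t in range(1, len(tokens)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][prev][0] * trans_p[prev][s]
                 * emit_p[s].get(tokens[t], 1e-9),
                 V[t - 1][prev][1] + [s])
                for prev in states
            )
            V[t][s] = (prob, path)
    return max(V[-1].values())[1]


# Hypothetical two-tag entity-extraction example: PER (person) vs. O (other).
states = ["PER", "O"]
start_p = {"PER": 0.3, "O": 0.7}
trans_p = {"PER": {"PER": 0.6, "O": 0.4},
           "O":   {"PER": 0.2, "O": 0.8}}
emit_p = {"PER": {"John": 0.8, "met": 0.0},
          "O":   {"John": 0.05, "met": 0.6}}

tags = viterbi(["John", "met"], states, start_p, trans_p, emit_p)
# tags is ["PER", "O"]: one flat tag per token, as the text describes.
```

Because the output is a flat sequence of tags, a structure such as a relation between two tagged entities has no direct representation, which is exactly the motivation the text gives for moving to grammar-based models.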
Stochastic Context-Free Grammars

A stochastic context-free grammar (SCFG) (Lari and Young, 1990; Collins, 1996; Kammeyer and Belew, 1996; Keller and Lutz, 1997a; Keller and Lutz, 1997b; Osborne and Briscoe, 1998) is a quintuple G = (T, N, S, R, P), where T is the alphabet of terminal symbols (tokens), N is the set of nonterminals, S is the starting nonterminal, R is the set of rules, and P : R → [0, 1] defines their probabilities. The rules have the form n → s1 s2 ... sk, where n is a nonterminal and each si is either a token or another nonterminal. As can be seen, an SCFG is an ordinary context-free grammar with the addition of the P function. Similarly to a canonical non-stochastic grammar, an SCFG is said to generate (or accept) a given string (sequence of tokens) if the string can be produced by starting from a sequence containing just the starting symbol S and, one by one, expanding nonterminals using the rules of the grammar.
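The quintuple above can be sketched directly in code. The toy grammar below is a hypothetical illustration (not from the handbook); it shows rules stored with their P values, the constraint that each nonterminal's rule probabilities sum to 1, and the probability of a derivation computed as the product of the probabilities of the rules applied.

```python
from collections import defaultdict

# Rules R as (lhs, rhs) pairs mapped to their probabilities P(r).
# Uppercase symbols are nonterminals in N; lowercase strings are tokens in T.
# S is the starting nonterminal. This grammar is a hypothetical toy example.
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("john",)):    0.5,
    ("NP", ("mary",)):    0.5,
    ("VP", ("sleeps",)):  1.0,
}

# Sanity check: for each nonterminal, P over its rules must sum to 1,
# so that P defines a proper distribution over expansions.
totals = defaultdict(float)
for (lhs, _rhs), p in rules.items():
    totals[lhs] += p
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())


def derivation_prob(derivation):
    """Probability of a derivation: product of the rules' P values."""
    prob = 1.0
    for rule in derivation:
        prob *= rules[rule]
    return prob


# Leftmost derivation of "john sleeps":
#   S -> NP VP -> john VP -> john sleeps
p = derivation_prob([
    ("S",  ("NP", "VP")),
    ("NP", ("john",)),
    ("VP", ("sleeps",)),
])
# p = 1.0 * 0.5 * 1.0 = 0.5
```

Unlike the flat HMM output discussed earlier, a derivation is a tree: the NP and VP expansions nest inside the S rule, which is what makes grammar-based models suitable for extracting nested, related sequences.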
