Master's thesis in Information Technology: Enhancing the Quality of Machine Translation System Using Cross-Lingual Word Embedding Models

The purpose of this thesis is to propose two models that use cross-lingual word embedding models to address the lack of bilingual data. The first model enhances the quality of the phrase-table in SMT, and the second tackles the unknown word problem in NMT.

Enhancing the Quality of Machine Translation System Using Cross-Lingual Word Embedding Models

Nguyen Minh Thuan
Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi

Supervised by Associate Professor Nguyen Phuong Thai

A thesis submitted in fulfillment of the requirements for the degree of Master of Science in Computer Science

November 2018

ORIGINALITY STATEMENT

I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Hanoi, November 15th, 2018
Signed

ABSTRACT

In recent years, Machine Translation has shown promising results and attracted much interest from researchers. Two approaches that have been widely used for machine translation are Phrase-based Statistical Machine Translation (PBSMT) and Neural Machine Translation (NMT). During translation, both approaches rely heavily on large amounts of bilingual corpora, which require much effort and financial support to build. The lack of bilingual data leads to a poor phrase-table, which is one of the main components of PBSMT, and to the unknown word problem in NMT. In contrast, monolingual data are ...
