Integrating image features with convolutional sequence to sequence network for multilingual visual question answering

Visual question answering is a task that requires computers to give correct answers for the input questions based on the images. This task can be solved by humans with ease, but it is a challenge for computers. The VLSP2022-EVJVQA shared task carries the Visual question answering task in the multilingual domain on a newly released dataset UIT-EVJVQA, in which the questions and answers are written in three different languages: English, Vietnamese, and Japanese. | Journal of Computer Science and Cybernetics 2024 1- DOI no 1813-9663 18155 INTEGRATING IMAGE FEATURES WITH CONVOLUTIONAL SEQUENCE-TO-SEQUENCE NETWORK FOR MULTILINGUAL VISUAL QUESTION ANSWERING TRIET M. THAI SON T. LUU University of Information Technology Ho Chi Minh City Viet Nam Vietnam National University Ho Chi Minh City Viet Nam Abstract. Visual question answering is a task that requires computers to give correct answers for the input questions based on the images. This task can be solved by humans with ease but it is a challenge for computers. The VLSP2022-EVJVQA shared task carries the Visual question answering task in the multilingual domain on a newly released dataset UIT-EVJVQA in which the questions and answers are written in three different languages English Vietnamese and Japanese. We approached the challenge as a sequence-to-sequence learning task in which we integrated hints from pre-trained state-of-the-art VQA models and image features with a convolutional sequence-to-sequence network to generate the desired answers. Our results obtained up to by F1 score on the public test set and on the private test set. Keywords. Visual question answering Sequence-to-sequence learning Multilingual Multimodal. Abbreviations QA Question answering VQA Visual question answering VLSP Association for Vietnamese language and speech processing Seq2Seq Sequence-to-sequence ViT Vision transformer SOTA State-of-the-art GRU Gated recurrent unit GLU Gate linear unit LSTM Long short-term memory RNN Recurrent neural network API Application programming interface ConvS2S Convolutional sequence-to-sequence network Bi-RNN Bi-directional recurrent neural networks ConvS2S Convolutional sequence-to-sequence network BERT Bidirectional encoder representations from transformers Corresponding author. E-mail addresses 19522397@ sonlt@ . Luu . 2024 Vietnam Academy of Science amp Technology 2 TRIET M. THAI SON T. LUU 1. .

Không thể tạo bản xem trước, hãy bấm tải xuống
TÀI LIỆU MỚI ĐĂNG
77    140    3    01-06-2024
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.