VNU Journal of Foreign Studies (2020) 99-112

THE EFFECTIVENESS OF SPEAKING RATER TRAINING

Nguyen Thi Ngoc Quynh, Nguyen Thi Quynh Yen, Tran Thi Thu Hien, Nguyen Thi Phuong Thao, Bui Thien Sao, Nguyen Thi Chi, Nguyen Quynh Hoa

VNU University of Languages and International Studies, Pham Van Dong, Cau Giay, Hanoi, Vietnam

Received 09 May 2020; Revised 10 July 2020; Accepted 15 July 2020

Abstract: Playing a vital role in assuring reliability of language performance assessment, rater training has been a topic of interest in research on large-scale testing. Similarly, in the context of VSTEP, the effectiveness of the rater training program has been of great concern. Thus, this research was conducted to investigate the impact of the VSTEP speaking rating scale training session in the rater training program provided by University of Languages and International Studies - Vietnam National University, Hanoi. Data were collected from 37 rater trainees of the program. Their ratings before and after the training session on the speaking rating scales were then compared. Particularly, dimensions of score reliability, criterion difficulty, rater severity, rater fit, rater bias, and score band separation were analyzed. Positive results were detected when the post-training ratings were shown to be more reliable, consistent, and distinguishable.
Improvements were more noticeable for the score band separation and slighter in other aspects. Meaningful implications in terms of both future practices of rater training and rater training research methodology could be drawn from the study.

Keywords: rater training, speaking rating, speaking assessment, VSTEP, G theory, many-facet Rasch

1. Introduction

Rater training has been widely recognized as a way to assure the score reliability in language performance assessment, especially in ...

... language assessment were also framed into four main approaches, namely rater error training (RET), performance dimension training (PDT), frame-of-reference training ...