Keyword spotting (KWS) is an important component of speech applications such as data mining, call routing, call centers, customer-controlled smartphones, and smart home systems with voice control. To investigate factors that affect a Vietnamese keyword spotting system, we study a combined CNN (Convolutional Neural Network)-RNN (Recurrent Neural Network) architecture in both clean and noisy environments at two speaker-to-microphone distances: 1 m and 2 m. The results show that models trained on noisy data perform better than models trained on clean data in both clean and noisy test environments. The results of this far-field experiment also suggest how to choose the distance between the recording microphones and the speaker so that data recorded in contexts that are effectively the same is not collected redundantly. | An evaluation of some factors affecting accuracy of the Vietnamese keyword spotting system