Inspired by the idea of AVCSR, which has combined visual features with audio features to increase the accuracy in noisy environments, we use AdaBoost algorithm and Kalman filter for the face and lip detectors. Our result shows that the system can detect and track mouth motion in real time. It is a critical point that encourages more researches in the visual tracking and voice recognition fields.