This paper explores the relationship between discourse segmentation and coverbal gesture. Introducing the idea of gestural cohesion, we show that coherent topic segments are characterized by homogeneous gestural forms and that changes in the distribution of gestural features predict segment boundaries. Gestural features are extracted automatically from video and combined with lexical features in a Bayesian generative model. The resulting multimodal system outperforms text-only segmentation on both manual and automatically recognized speech transcripts.
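The abstract does not spell out the form of the Bayesian generative model, so the following is only a minimal illustrative sketch of the general idea: treat each candidate topic segment as a bag of lexical and gestural codeword counts drawn from a segment-specific multinomial with a symmetric Dirichlet prior, and pick the boundaries that maximize the summed marginal likelihood via dynamic programming. The function names (`dcm_log_likelihood`, `segment`), the symmetric prior `alpha`, and the toy data are all hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.special import gammaln

def dcm_log_likelihood(counts, alpha=0.1):
    """Marginal log-likelihood of a segment's lexical + gestural codeword
    counts under a multinomial with a symmetric Dirichlet(alpha) prior
    (the Dirichlet-compound-multinomial sequence probability)."""
    counts = np.asarray(counts, dtype=float)
    V, N = counts.size, counts.sum()
    return (gammaln(V * alpha) - gammaln(V * alpha + N)
            + np.sum(gammaln(counts + alpha)) - V * gammaln(alpha))

def segment(feature_counts, num_segments):
    """Exact dynamic-programming segmentation: choose boundaries that
    maximize the total segment log-likelihood.

    feature_counts: (T, V) array; row t holds the combined lexical and
    gestural codeword counts for sentence (or time step) t.
    Returns the sorted list of segment start indices."""
    T, V = feature_counts.shape
    # Prefix sums let us score any contiguous span in O(V).
    cum = np.vstack([np.zeros(V), np.cumsum(feature_counts, axis=0)])

    def seg_score(i, j):  # log-likelihood of sentences i..j-1 as one segment
        return dcm_log_likelihood(cum[j] - cum[i])

    best = np.full((num_segments + 1, T + 1), -np.inf)
    back = np.zeros((num_segments + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, T + 1):
            for i in range(k - 1, j):
                score = best[k - 1, i] + seg_score(i, j)
                if score > best[k, j]:
                    best[k, j], back[k, j] = score, i
    # Recover segment start indices by walking the backpointers.
    bounds, j = [], T
    for k in range(num_segments, 0, -1):
        j = back[k, j]
        bounds.append(j)
    return sorted(bounds)

# Toy usage: six "sentences" over a four-codeword vocabulary with a
# clear distributional shift after the third sentence.
X = np.array([[3, 1, 0, 0], [2, 2, 0, 0], [3, 0, 1, 0],
              [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 1]])
print(segment(X, num_segments=2))  # segment starts, e.g. [0, 3]
```

The sketch uses a single pooled vocabulary of lexical and gestural codewords; separate per-modality priors or weights would be one natural refinement, but that choice is not dictated by the abstract.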