This paper describes the development of a rule-based computational model that specifies how a feature-based representation of shared visual information combines with linguistic cues to enable effective reference resolution. This work explores a language-only model, a visual-only model, and an integrated model of reference resolution, and applies them to a corpus of transcribed, task-oriented spoken dialogues. Preliminary results from a corpus-based analysis suggest that integrating information from a shared visual environment can improve the performance and quality of existing discourse-based models of reference resolution.