Adventures in Statistical Pronoun Interpretation
Joint work with Doug Appelt, Lara Taylor, and Alex Simma
Thursday May 5, 2005, EE/CSci 6-212
State-of-the-art pronoun interpretation systems rely predominantly on morphosyntactic contextual features, with feature weights determined by either manual tuning or supervised learning. This talk will address two issues pertaining to such systems.
The first part of the talk addresses the common refrain that the performance of morphosyntactically driven systems is plateauing, and that further progress will require the use of world knowledge and inference. Since a suitably broad inferential capability is currently out of reach, it has been suggested that predicate-argument statistics mined from naturally occurring data could provide a useful approximation. We test this idea in several system configurations, and conclude from our results and subsequent error analysis that such statistics offer only modest predictive information beyond that provided by morphosyntax.
The second part of the talk addresses the fact that reliable estimation of the weights in both the manual tuning and supervised learning paradigms requires a substantial manually annotated corpus of examples. I will describe a system for pronoun interpretation that is entirely self-trained from raw data, that is, one that uses no annotated training data. The resulting system outperforms a Hobbsian baseline algorithm and is only marginally inferior to an essentially identical, state-of-the-art supervised model trained on a suitably sized manually annotated coreference corpus. This result suggests that self-trained systems using very specific feature sets over very large corpora could eventually outperform the best supervised systems, since the latter are constrained to general feature sets due to the inherent limitations on training data size. I will briefly survey ongoing work in this regard.
Dr. Kehler is an Associate Professor in the Department of Linguistics at UCSD. More information about his publications and research can be found at ling.ucsd.edu/~kehler/