
Collateral Media

Collateral media are different multimedia artefacts that communicate the same idea or concept. For example, a picture and a sound can both represent an explosion, one visually and the other acoustically, and different witnesses can verbally describe the same robbery using different words. Similarly, natural language descriptions can be collateral to a moving image, representing its content in words. Collateral text captures higher levels of semantic video content than video processing alone, since language can express more information than low-level features such as colour, shape and motion. Extracting information from collateral text can therefore enhance video indexing, retrieval and browsing. In addition, integrating multiple collateral texts that describe the same video data can, on the one hand, help model a more objective point of view and, on the other, enable applications such as information fusion and reuse, or the training of video feature detectors.

To find out more about computing narrative in multimedia systems, visit the EPSRC project GR/R67194/01, TIWO: Television in Words.

PhD thesis (February 2006): 'Cross-Document Coreference between Different Kinds of Collateral Texts for Films'

Keywords: Information Extraction, Cross-Document Coreference, Collateral Text, Narrative, Intelligent Multimedia Information Retrieval, Merging, Film Indexing

Abstract:
Recent systems merge information from texts describing video content in order to annotate video, using cross-document coreference techniques that have mostly been applied to texts of the same genre or to texts covering restricted sets of events. We introduce a new and challenging scenario: film, together with the variety of collateral text genres that narrate its content and cover unrestricted sets of events. Cross-document coreference between plot summaries and audio description is particularly challenging, as these two kinds of text differ significantly. The resulting cross-references can potentially enrich video annotation. We address three questions: how plot summaries and audio description refer to events depicted in films, whether the same events are expressed by lexical regularities in both texts, and how solutions to the cross-document coreference task can be extended to different text genres and unconstrained sets of events. This thesis introduces a new research domain for information extraction and cross-document coreference; reports a corpus-based analysis of the language used in plot summaries and audio description, focusing on how events are expressed; proposes and evaluates solutions to the cross-document coreference task for an unconstrained set of events in different text types; and provides two data sets for research related to information extraction. We make three claims. First, plot summaries and audio description describe film content using lexical regularities, such as open-class words that occur more frequently in them than in general language. Second, the two texts use similar terms to refer to entities but different terms, i.e. different frequent verbs, to refer to events. Frequent plot summary events are referred to by very few lexical regularities in audio description.
Third, cross-document coreference between plot summaries and audio description can be automated, achieving at least 50% Precision and 33% Recall, by matching nouns, functional roles and some verbs, and by taking into account the temporal aspect of events. Recall may be improved mainly by resolving all references to entities, while Precision may be increased by treating a restricted set of events.
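The first claim rests on comparing how often words occur in film-describing texts with how often they occur in general language. One common way to operationalise such a comparison, offered here only as an illustrative sketch and not as the measure used in the thesis, is the ratio of a word's relative frequency in the domain corpus to its relative frequency in a general-language reference corpus. The closed-class word list, the min_count cut-off and the add-one smoothing are assumptions made for this example; a real analysis would identify open-class words with a part-of-speech tagger and use a large reference corpus.

```python
from collections import Counter

# Hypothetical closed-class list for the toy example; a real system would
# identify open-class words (nouns, verbs, adjectives, adverbs) with a POS tagger.
CLOSED_CLASS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "at",
                "he", "she", "it", "his", "her", "is", "was"}


def frequency_ratios(domain_tokens, general_tokens, min_count=2):
    """Rank open-class words by how much more frequent they are in the
    domain corpus than in a general-language reference corpus."""
    domain = Counter(t.lower() for t in domain_tokens)
    general = Counter(t.lower() for t in general_tokens)
    n_domain = sum(domain.values())
    n_general = sum(general.values())

    ratios = {}
    for word, count in domain.items():
        # Skip closed-class words and words too rare to be regularities.
        if word in CLOSED_CLASS or count < min_count:
            continue
        domain_rel = count / n_domain
        # Add-one smoothing so words absent from the reference corpus
        # do not cause a division by zero.
        general_rel = (general[word] + 1) / (n_general + 1)
        ratios[word] = domain_rel / general_rel

    # Highest ratio first: candidate lexical regularities of the domain.
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    audio_description = "tom looks at the door tom looks away the door opens".split()
    reference = "the man said that the report was late and the door was open".split()
    for word, ratio in frequency_ratios(audio_description, reference):
        print(f"{word}\t{ratio:.2f}")
```

In this toy run, "tom" and "looks" rank above "door" because "door" also appears in the reference text, which lowers its ratio.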
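The third claim describes linking plot summary events to audio description events by matching nouns, functional roles and some verbs while taking the temporal aspect of events into account. The sketch below is one hypothetical way to combine those cues; it is not the system evaluated in the thesis. It assumes each event has already been extracted into sets of nouns, role labels (for example character names) and verb lemmas, uses made-up weights and a made-up threshold, and models the temporal aspect simply by requiring links to follow the film's chronology. Precision and recall are computed over predicted versus gold links in the standard way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event reduced to the cues named in the abstract (hypothetical schema)."""
    nouns: frozenset
    roles: frozenset  # e.g. character names or functional roles
    verbs: frozenset  # verb lemmas


def make_event(nouns, roles, verbs):
    return Event(frozenset(nouns), frozenset(roles), frozenset(verbs))


def score(ps, ad, w_noun=1.0, w_role=1.0, w_verb=0.5):
    """Weighted overlap of nouns, roles and verbs (weights are illustrative)."""
    return (w_noun * len(ps.nouns & ad.nouns)
            + w_role * len(ps.roles & ad.roles)
            + w_verb * len(ps.verbs & ad.verbs))


def align(plot_events, ad_events, threshold=1.5):
    """Greedy order-preserving alignment: link each plot summary event to the
    best-scoring audio description event that comes after the previous link."""
    links = []
    start = 0
    for i, ps in enumerate(plot_events):
        best_j, best_s = None, 0.0
        for j in range(start, len(ad_events)):
            s = score(ps, ad_events[j])
            if s >= threshold and s > best_s:
                best_j, best_s = j, s
        if best_j is not None:
            links.append((i, best_j))
            start = best_j + 1  # temporal constraint: links only move forward
    return links


def precision_recall(predicted, gold):
    """Precision and recall of predicted coreference links against gold links."""
    correct = len(set(predicted) & set(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    plot = [make_event({"gun"}, {"tom"}, {"shoot"}),
            make_event({"car"}, {"anna"}, {"drive"})]
    ad = [make_event({"car"}, {"anna"}, {"drive"}),
          make_event({"gun", "door"}, {"tom"}, {"fire"}),
          make_event({"car", "road"}, {"anna"}, {"drive"})]
    links = align(plot, ad)
    print(links)
    print(precision_recall(links, gold=[(0, 1), (1, 2)]))
```

In the toy data, the first audio description event matches the second plot summary event just as well as the third does, but the temporal constraint rules it out because it precedes the earlier link. A greedy forward scan is the simplest choice here; a global alignment (for example dynamic programming over all pairs) would avoid committing early to a poor link.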