

Collateral Media
Collateral media are different multimedia artefacts that communicate the same idea or concept. For example, a picture and a sound can both represent an explosion, one visually and the other acoustically, and different witnesses can describe the same robbery in different words. Similarly, natural language descriptions can be collateral to a moving image and represent its content in words. Collateral text addresses higher levels of semantic video content than video processing alone, as language can express more information than colours, shapes, motion, etc. Extracting information from collateral text can enhance video indexing, retrieval and browsing. In addition, integrating multiple collateral texts that describe the same video data can help model a more objective point of view and create the potential for applications such as information fusion and reuse, or training video feature detectors.
To find out more about computing narrative in multimedia systems, you can visit the EPSRC GR/R67194/01 project TIWO: Television in Words.
PhD thesis (February 2006): 'Cross-Document Coreference between Different Kinds of Collateral Texts for Films'
Keywords: Information Extraction, Cross-Document Coreference, Collateral Text, Narrative, Intelligent Multimedia Information Retrieval, Merging, Film Indexing
Abstract: Recent systems merge information from texts describing video content for video annotation by
employing cross-document coreference techniques, mostly realised between the same text genres
or in texts including restricted sets of events. We introduce a new, interesting and challenging
scenario - film and the variety of collateral text genres narrating its content, including unrestricted
sets of events. In particular, cross-document coreference between plot summaries and audio
description is challenging, as these two texts differ significantly. The resulting cross-referencing
can potentially enrich video annotation. We address the questions of how plot summaries and
audio description refer to events depicted in films, whether the same events are expressed by
lexical regularities in both texts and how solutions to the cross-document coreference task can be
extended to deal with different text genres and unconstrained sets of events.
This thesis introduces a new research domain for information extraction and cross-document
coreference, reports a corpus-based analysis of the language used in plot summaries and audio
description focusing on how events are expressed, proposes and evaluates solutions to the cross-document
coreference task for an unconstrained set of events in different text types and provides
two data sets for information extraction related research. We make three claims. First, plot
summaries and audio description use lexical regularities, such as frequent open class words
occurring more frequently than in general language, to describe film content. Second, these two
texts use similar terms in referring to entities, but different terms in referring to events, i.e.
different frequent verbs. Frequent plot summary events are referred to by very few lexical
regularities in audio description. Third, the task of cross-document coreference between plot summary and audio description can be automated, achieving at least 50% Precision and 33% Recall, by matching nouns, functional roles and some verbs, and taking into account the temporal aspect of events. The Recall may be improved mostly by resolving all references to entities, while the Precision may be increased when treating a restricted set of events.
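To make the Precision and Recall figures above concrete, the following is a minimal sketch of how such scores are typically computed for a set of proposed cross-document coreference links. It is not the thesis's evaluation code: the function name `precision_recall`, the event identifiers and the toy link sets are all hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of coreference links.

    Each link is a pair (plot_summary_event_id, audio_description_event_id).
    This is the standard set-based definition, assumed here for illustration.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # links proposed by the system that are also in the gold standard
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the thesis.
gold_links = {("ps1", "ad3"), ("ps2", "ad7"), ("ps4", "ad9")}
predicted_links = {("ps1", "ad3"), ("ps2", "ad5")}

precision, recall = precision_recall(predicted_links, gold_links)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

In this toy case one of the two proposed links is correct and one of the three gold links is found. Precision falls when the system proposes wrong links, and Recall falls when it misses links that exist.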