Lecture webcasts are readily available on the Internet. These include class lectures, research seminars and product demonstrations. Webcasts routinely combine presentation slides with either a synchronized audio stream (i.e., podcast) or an audio/video stream. Conventional web search engines retrieve this content if the search terms include "webcast" or "lecture", or if the search is run on a website that specifically organizes lecture content. But users, particularly students, want to find the point in a lecture where an instructor covers a specific topic. Answering such queries requires a search engine that can search within the webcast itself to identify important keywords.
TalkMiner's automatic video analysis identifies slide images within each lecture video. Because slide images underlie both our search index and browsing interface, automatic detection is a critical component of the system. The text-based search index organizes the videos according to the words extracted from the presentation slides. When a user issues a query, the recovered text from the entire lecture is used to identify relevant lecture videos. If the user selects a specific talk for detailed review, thumbnail images of all slides within that video are shown, and visual cues are displayed to indicate slides that contain one or more of the search terms. When the user locates a slide of interest, the video segment in which it appears may be directly accessed for playback by clicking on the corresponding slide thumbnail. The interface thus enables efficient content-based search and non-linear playback.
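The two-stage lookup described above (find relevant lectures from all recovered text, then flag the matching slides within one lecture and jump to their position in the video) can be illustrated with a small inverted index. This is a minimal sketch under stated assumptions, not TalkMiner's implementation: the `Slide` record, the `LectureIndex` class, the tokenizer and the ranking by number of matched terms are all hypothetical names and choices made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Slide:
    video_id: str     # hypothetical identifier for the lecture video
    start_sec: float  # time at which the slide first appears in the video
    text: str         # text recovered from the slide image


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LectureIndex:
    """Illustrative inverted index: term -> indices of slides containing it."""

    def __init__(self):
        self.slides = []
        self.postings = defaultdict(set)

    def add(self, slide):
        idx = len(self.slides)
        self.slides.append(slide)
        for term in tokenize(slide.text):
            self.postings[term].add(idx)

    def search_videos(self, query):
        """Rank videos by how many distinct query terms occur anywhere in them."""
        hits = defaultdict(set)  # video_id -> matched query terms
        for term in set(tokenize(query)):
            for idx in self.postings.get(term, ()):
                hits[self.slides[idx].video_id].add(term)
        return sorted(hits, key=lambda vid: (-len(hits[vid]), vid))

    def matching_slides(self, video_id, query):
        """Return (start_sec, has_match) for every slide of one video, in time order."""
        terms = set(tokenize(query))
        rows = []
        for idx, slide in enumerate(self.slides):
            if slide.video_id != video_id:
                continue
            has_match = any(idx in self.postings.get(t, ()) for t in terms)
            rows.append((slide.start_sec, has_match))
        return sorted(rows)
```

In this sketch, `search_videos` plays the role of the lecture-level query, while `matching_slides` supplies what the thumbnail view needs: a flag for each slide that contains a search term, and the start time to seek to when its thumbnail is clicked.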
TalkMiner builds its index and interface from commonly recorded video rather than using a custom-designed lecture-capture system or requiring careful post-capture authoring. Moreover, the system does not impose onerous constraints on the captured video. TalkMiner's slide detection algorithm was developed to handle common lecture webcast video production techniques. Such production techniques include shooting a slide screen from the back of the room, picture-in-picture compositing, and multiple-camera productions that intersperse full-frame shots of slides with shots of the speaker. Each of these cases presents challenges to assembling a set of distinct and useful slide keyframes for browsing an individual lecture video. Additionally, we developed processing for slides whose content is built up and revealed gradually (e.g., bullet lists). TalkMiner's specialized content analysis allows it to scale to include a greater volume and variety of content. At the same time, its minimal storage and processing requirements enable it to do so at a much lower cost than would otherwise be possible.
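To make the built-up slide problem concrete, the sketch below collapses a run of keyframes in which each frame's text contains all of the previous frame's text, as happens when bullets are revealed one at a time. It is only one plausible approach and is not taken from TalkMiner: the function name `collapse_builds`, the `(start_sec, text)` input format, and the decision to compare recovered words rather than image pixels are all assumptions made for illustration.

```python
def collapse_builds(keyframes):
    """Collapse gradually revealed slides into one keyframe each.

    keyframes: list of (start_sec, text) tuples in time order (assumed format).
    A frame whose words are a superset of the previous frame's words is
    treated as a later stage of the same slide: the fuller text is kept,
    together with the earliest start time of the run.
    """
    result = []
    for start_sec, text in keyframes:
        words = set(text.lower().split())
        if result:
            prev_start, prev_text = result[-1]
            prev_words = set(prev_text.lower().split())
            if prev_words and prev_words <= words:
                # Same slide with more content revealed: keep the fuller text.
                result[-1] = (prev_start, text)
                continue
        result.append((start_sec, text))
    return result
```

Keeping the final stage of each build gives the browsing view the most complete version of the slide, while the retained start time still points playback at the moment the slide first appeared.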
For technical details, see our 2010 ACM Multimedia paper or the FAQ.
MIT Technology Review: Video Content Search Gets a Boost
CBCnews: Wanted: Better ways to search online video
Review and description of TalkMiner at MakeUseOf.com