
Author: Anthony Vetro. File: Video Indexing and Summarization using Combinations of the MPEG-7 Motion Activity Descriptor with other MPEG-7 audio-visual descriptors.ppt.

Video Indexing and Summarization using Combinations of the MPEG-7 Motion Activity Descriptor with other MPEG-7 audio-visual descriptors

1 Video Indexing and Summarization using Combinations of the MPEG-7 Motion Activity Descriptor with other MPEG-7 audio-visual descriptors

Ajay Divakaran
MERL - Mitsubishi Electric Research Labs, Murray Hill, NJ

2 Outline

- Introduction
- MPEG-7 Standard
- Motivation for proposed techniques
- Video Summarization using Motion Activity
- Audio Assisted Video Summarization
- Principal Cast Detection with MPEG-7 Audio Features
- Automatic generation of Sports Highlights
- Target Applications
- Personal Video Recorder Demonstration
- Initial work on Video Mining
- Conclusion

3 Team

- Yours Truly
- Kadir A. Peker: Colleague and Ex-Doctoral Student
- Regunathan Radhakrishnan: Current Doctoral Student
- Romain Cabasson: Summer Intern
- Ziyou Xiong: Summer Intern and Current Collaborator
- Padma Akella: Initial Demo designer and developer
- Pradubkiat Bouklee: Initial Software developer

4 MPEG-7 Objectives

- To develop a standard to identify and describe multimedia content
- Formal name: Multimedia Content Description Interface
- Enable quick access to desired content, whether local or not

5 MPEG-7: Key Technologies and Scope

[Figure: scope diagram with labels "Description Production" and "Description Consumption"]

6 MPEG-7 and other Standards

[Figure: MPEG-7 positioned against other standards, with axes "Rate" and "Functionality". Labels: Emphasis on Subjective Representation; Emphasis on Semantic Conveyance; MPEG-2 Studio, DTV; Hybrid Content Interactive TV, Video Conferencing; Indexing Retrieving Browsing; MPEG-4 SNHC Object-Based; MPEG-7 Descriptors; MPEG-1 H.263; JPEG JPEG-2000; Visualization; Abstract Representation Virtual Reality]

7 MPEG-7 framework

MPEG-7 standardizes:
- Descriptors (Ds): representations of features
  - to describe various types of features of multimedia information
  - to define the syntax and the semantics of each feature representation
- Description Schemes (DSs)
  - to specify pre-defined structures and semantics of descriptors and their relationships
- Description Definition Language (DDL)
  - to allow the creation of new DSs and, possibly, Ds
  - to allow the extension and modification of existing DSs
  - XML MPEG-7 Schema

8 MPEG-7 Motion Activity Descriptor

Feature Extraction from Video
- Uncompressed Domain
  - Color Histograms - Zhang et al
  - Motion Estimation - Kanade et al
- Compressed Domain
  - DC Images - Yeo et al, Kobla et al
  - Motion Vector Based - Zhang et al
  - Bit Allocation - Feng et al, Divakaran et al

9 Motivation for Compressed Domain Extraction

- Compressed domain feature extraction is fast.
- Block-matched motion vectors are sufficient for gross description.
- Motion vector based calculation can be easily normalized w.r.t. encoding parameters.

10 Motivation for Descriptor

- Need to capture pace or intensity of activity
- For example, draw a distinction between
  - high-action segments such as chase scenes
  - low-action segments such as talking heads
- Emphasize simple extraction and matching
- Use gross motion characteristics, thus avoiding object segmentation, tracking, etc.
- Compressed domain extraction is important

11 Proposed Motion Activity Descriptor

Attributes of Motion Activity Descriptor
- Intensity/Magnitude - 3 bits
- Spatial Characteristics - 16 bits
- Temporal Characteristics - 30 bits
- Directional Characteristics - 3 bits

12 MPEG-7 Intensity of Motion Activity

- Expresses pace or intensity of action
- Uses a scale of 1-5: very low, low, medium, high, very high
- Extracted by suitably quantizing the variance of motion vector magnitude
- Motion vectors extracted from the compressed bitstream
- Successfully tested with subjectively constructed ground truth
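The quantization step described on this slide can be sketched as follows. This is a minimal illustration, not the normative MPEG-7 extraction: the threshold values in `THRESHOLDS` and the function name `motion_intensity` are hypothetical, since the slides do not give the actual quantization boundaries.

```python
from statistics import pvariance

# Hypothetical variance thresholds separating the five levels.
# The real MPEG-7 quantization boundaries are not given in these slides.
THRESHOLDS = (1.0, 4.0, 16.0, 64.0)

LABELS = {1: "very low", 2: "low", 3: "medium", 4: "high", 5: "very high"}


def motion_intensity(motion_vectors, thresholds=THRESHOLDS):
    """Map a frame's motion vectors to an intensity level from 1 to 5.

    motion_vectors: non-empty list of (dx, dy) pairs from the bitstream.
    """
    magnitudes = [(dx * dx + dy * dy) ** 0.5 for dx, dy in motion_vectors]
    variance = pvariance(magnitudes)  # population variance of the magnitudes
    level = 1
    for t in thresholds:
        if variance >= t:
            level += 1
    return level


if __name__ == "__main__":
    still = [(0, 0), (0, 0), (0, 0)]
    chase = [(0, 0), (30, 40), (0, 0), (30, 40)]
    print(LABELS[motion_intensity(still)], LABELS[motion_intensity(chase)])
```

In practice the thresholds would be tuned against subjectively constructed ground truth, as the slide indicates.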

13 Video Summarization using Motion Activity

- Video sequence V = {f1, f2, ..., fN}: a set of temporally ordered frames
- Any temporally ordered subset of V is a summary
- Previous work: color dominant
  - Cluster frames based on image similarity
  - Select representative frames from clusters

14 Motion Activity as Summarizability

- Hypothesis: motion activity measures intensity of motion, hence it measures change in the video. Therefore it indicates summarizability.
- Test of the hypothesis: examine the relationship between fidelity of summary and motion activity
- Results show close correlation and motivate a novel summarization strategy

15 Fidelity of a Summary


16 Test of Hypothesis

- Segment the test sequence into shots
- Use the first frame of each shot as its key-frame (KF)
- Compute the fidelity of each key-frame as described
- Compute the motion activity of each shot
- For each MPEG-7 motion activity threshold:
  - Identify shots that have the same or lower motion activity
  - Find the percentage p of shots with unacceptable fidelity (> 0.2)
- Plot p vs. the MPEG-7 motion activity thresholds
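The thresholding loop of this experiment can be sketched as below. It assumes each shot has already been reduced to a pair (motion activity level, fidelity distance of its key-frame), where a larger distance means poorer fidelity, matching the "> 0.2 is unacceptable" rule above. The function name and the sample data are hypothetical.

```python
def unacceptable_percentage_by_threshold(shots, fidelity_limit=0.2,
                                         levels=(1, 2, 3, 4, 5)):
    """Return {threshold: percentage p of shots with unacceptable fidelity}.

    shots: list of (activity_level, fidelity_distance) pairs, one per shot.
    For each threshold, only shots with activity <= threshold are considered.
    """
    result = {}
    for threshold in levels:
        selected = [dist for level, dist in shots if level <= threshold]
        if not selected:
            result[threshold] = None  # no shots at or below this level
            continue
        bad = sum(1 for dist in selected if dist > fidelity_limit)
        result[threshold] = 100.0 * bad / len(selected)
    return result


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    demo = [(1, 0.05), (2, 0.10), (3, 0.30), (5, 0.40)]
    for level, p in unacceptable_percentage_by_threshold(demo).items():
        print(level, p)
```

Plotting the returned percentages against the thresholds gives the curve the slide refers to.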

17 Motion Activity as a Measure of Summarizability


18 Conclusions from Experiment

- The percentage of shots with unacceptable fidelity grows monotonically with motion activity
- In other words, as motion activity grows, the shots become increasingly difficult to summarize
- Hence, motion activity is a direct indicator of summarizability
- Question: is the first frame the best choice as a key-frame?

19 Optimal Key-Frame Selection Using Motion Activity

- Summarizability is an indication of change in the shot
- The cumulative motion activity is therefore an indication of the cumulative change in the shot
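One way to turn cumulative motion activity into a key-frame choice is sketched below. The selection rule is an assumption, since the surviving slide text does not state it: the shot is split into k portions of equal cumulative activity and the frame at the midpoint of each portion is taken, so a single key-frame falls where half of the cumulative activity has been reached. The function name is hypothetical.

```python
from itertools import accumulate


def key_frames_by_cumulative_activity(activity, k=1):
    """Pick k key-frame indices from per-frame motion activity values.

    Assumed rule: frame i is chosen when the cumulative activity first
    reaches (2*m + 1) / (2*k) of the total, for m = 0 .. k-1.
    """
    if not activity:
        return []
    cumulative = list(accumulate(activity))
    total = cumulative[-1]
    frames = []
    idx = 0
    for m in range(k):
        target = (2 * m + 1) / (2 * k) * total
        # Advance until the cumulative activity reaches the target.
        while cumulative[idx] < target:
            idx += 1
        frames.append(idx)
    return frames


if __name__ == "__main__":
    # A shot that is static at first and active at the end:
    # the key-frame moves away from the first frame.
    print(key_frames_by_cumulative_activity([0, 0, 0, 0, 10, 10], k=1))
```

Under this rule a shot with uniform activity gets its key-frame near the middle, while a shot whose change is concentrated late gets a late key-frame.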

20 Optimal Key-Frame Extraction Using Motion Activity


21 Comparison with Opt. Fidelity KF

Mot. Activity | Ddsh First Frame | Ddsh proposed KF | Number of Shots
Very Low      | 0.0116           | 0.0080           | 25
Low           | 0.0197           | 0.0110           | 133
Medium        | 0.0406           | 0.0316           | 73
High          | 0.0950           | 0.0576           | 28
Very High     |                  |                  |
Overall avg.  | 0.0430           | 0.0216           |

22 Optimal Key-Frame Selection Based on Cumulative Motion Activity


23 Audio Assisted Video Browsing: Motivation

- Baseline MHL visual summarization works well only when semantic segment boundaries are well defined
- Semantic segment boundaries cannot be located easily using visual features alone
- Audio is a rich source of content semantics
- Should use audio features to locate semantic segment boundaries

24 Past Work

- Principal Cast Identification using Audio - Wang et al
- Topic Detection using Speech Recog. - Hanjalic et al
- Semantic Scene Segmentation using Audio - Sundaram et al
- Past work has emphasized classification of audio into crisp categories
- We would like both a crisp categorization and a feature vector that allows softer classification
- Generalized Sound Recognition Framework - Casey et al
- Casey's work provides a rich audio-semantic framework for our research

25 MPEG-7 Feature Extraction for Generalized Sound Recognition


26 Our approach to Principal Cast Detection

[Figure: block diagram with labels: MPEG-7 Generalized Sound Recognition; State Duration Histograms; Our Enhancement; Principal Cast]

27 Proposed Audio-Assisted Video Browsing Framework


28 Audio-Assisted Video Browsing Framework


29 MHL application of Casey's approach to News Video Browsing

- Classify the audio segments of the news video into speech and non-speech categories in a first pass
- Classify the speech segments into male and female speech
- Using K-means clustering, find the principal speakers in each category
- The occurrence of each of the principal speakers provides a natural semantic boundary
- Apply the baseline visual summarization technique to the semantic segments obtained above
- There is thus a two-level summarization of the news video
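The K-means step above can be sketched as follows, assuming each speech segment has already been reduced to a fixed-length feature vector (for example a state duration histogram). The initialization from the first k points, the `min_share` cutoff for calling a cluster a principal speaker, and both function names are assumptions made for illustration.

```python
from collections import Counter


def kmeans_labels(points, k, iterations=20):
    """Plain K-means; returns one cluster label per input point.

    Centroids are initialized from the first k points (a naive,
    deterministic choice made only to keep the sketch short).
    """
    centroids = [list(p) for p in points[:k]]

    def nearest(p):
        return min(range(k), key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(p, centroids[c])))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return [nearest(p) for p in points]


def principal_clusters(labels, min_share=0.5):
    """Clusters that account for at least min_share of all segments."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n / len(labels) >= min_share)


if __name__ == "__main__":
    segments = [(0, 0), (10, 10), (0.1, 0), (0, 0.2), (10, 10.1), (0.1, 0.1)]
    labels = kmeans_labels(segments, k=2)
    print(labels, principal_clusters(labels))
```

Each time the label sequence switches to a principal cluster, the corresponding segment start can be treated as a candidate semantic boundary.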

30 Clustering Results for Male Principal Cast


31 Results and Challenges

- Moderate accuracy so far; results are thus promising but not satisfactory
- Lack of noise robustness and content dependence of the training process represent a major hurdle
- Currently working on eliminating such problems through extensive training
- Feature extraction is too complex; currently investigating compressed domain audio feature extraction
- Also examining alternative architectures that preserve the basic spirit of the framework

32 Automatic Extraction of Sports Highlights

- Rapid sports highlights extraction is critical
- Past work has made use of color, camera motion, etc.
- The MPEG-7 Motion Activity Descriptor is simple
- Can use it to extract high-action segments, for example
- Should be useful in highlight extraction

33 Essential Strategy

- Sports are governed by a set of rules
- Key events lead to surges and dips in motion activity (perceived motion)
- Thus, for a given sport, we can look for certain temporal patterns of motion activity that would indicate an interesting event
- In sports highlights, the emphasis is on key-events and not on key-frames

34 Motion Activity Curve

- Shot detection is not meaningful for our purpose
- Compute motion activity (average magnitude of the motion vectors) for each P-frame
- Smooth the values using a 10-point moving average (MA) filter followed by a median filter
- Quantize into binary levels of high and low motion using a threshold
- Low threshold for golf, high for soccer
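The pipeline on this slide can be sketched as below. The 10-point moving average comes from the slide; the causal form of that filter, the median window length of 5, and all function names are assumptions, and the threshold is left as a parameter because the slides only say it is lower for golf than for soccer.

```python
from statistics import median


def frame_activity(motion_vectors):
    """Average motion vector magnitude of one P-frame."""
    if not motion_vectors:
        return 0.0
    return sum((dx * dx + dy * dy) ** 0.5
               for dx, dy in motion_vectors) / len(motion_vectors)


def moving_average(values, window=10):
    """Causal moving average; shorter windows are used at the start."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def median_filter(values, window=5):
    """Centered median filter; the window is truncated at the edges."""
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def quantized_activity_curve(per_frame_activity, threshold,
                             ma_window=10, median_window=5):
    """Smooth the per-frame activity, then quantize to 1 (high) or 0 (low)."""
    smoothed = median_filter(
        moving_average(per_frame_activity, ma_window), median_window)
    return [1 if v > threshold else 0 for v in smoothed]


if __name__ == "__main__":
    activity = [0.0] * 20 + [10.0] * 20  # quiet stretch, then a burst
    print(quantized_activity_curve(activity, threshold=5.0))
```

The smoothing delays the detected transition by a few frames, which is usually acceptable when the goal is to mark ten-second segments rather than exact frames.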

35 Activity Curves for Golf


36 Activity Curve for Soccer


37 Highlights Extraction: Golf

- Play consists of long stretches of low activity interspersed with bursts of interesting high activity
- Look for rising edges in the quantized motion activity curve
- Concatenate ten-second segments beginning at each of the points of interest marked above
- The concatenation forms the desired summary
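A minimal sketch of the golf strategy, assuming the quantized curve is a list of 0/1 values sampled at a known rate (`samples_per_second`, a hypothetical parameter). Merging overlapping segments is an added design choice so that the concatenated summary does not repeat footage.

```python
def rising_edges(curve):
    """Indices where the binary curve goes from low (0) to high (1)."""
    return [i for i in range(1, len(curve))
            if curve[i - 1] == 0 and curve[i] == 1]


def golf_highlight_segments(curve, samples_per_second, segment_seconds=10):
    """Return (start, end) index pairs, end exclusive, one per highlight.

    Each segment starts at a rising edge and lasts segment_seconds;
    overlapping segments are merged.
    """
    seg_len = int(segment_seconds * samples_per_second)
    segments = []
    for start in rising_edges(curve):
        end = min(start + seg_len, len(curve))
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous segment: extend it.
            segments[-1] = (segments[-1][0], max(end, segments[-1][1]))
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    curve = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
    # One sample per second and 3-second segments, to keep the demo small.
    print(golf_highlight_segments(curve, samples_per_second=1,
                                  segment_seconds=3))
```

Playing the returned segments back to back forms the summary.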

38 Highlights Extraction: Soccer

- Play consists of long stretches of high activity
- Interesting events lead to non-trivial stops in play, leading to a short stretch of low motion activity (MA)
- Thus we look for falling edges followed by a non-trivially long stretch of low motion activity
- We are able to find the interesting events this way but have many false alarms
- With our interface, false alarms are easy to skip
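The soccer rule can be sketched in the same style. The minimum length of a "non-trivially long" low stretch is not given in the slides, so `min_low_len` is a hypothetical tuning parameter, as is the function name.

```python
def soccer_candidate_events(curve, min_low_len):
    """Find falling edges followed by at least min_low_len low samples.

    curve: binary (0/1) quantized motion activity curve.
    Returns (start, end) index pairs of the low stretches, end exclusive.
    """
    events = []
    n = len(curve)
    i = 1
    while i < n:
        if curve[i - 1] == 1 and curve[i] == 0:  # falling edge at i
            j = i
            while j < n and curve[j] == 0:       # measure the low stretch
                j += 1
            if j - i >= min_low_len:
                events.append((i, j))
            i = j
        else:
            i += 1
    return events


if __name__ == "__main__":
    curve = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]
    print(soccer_candidate_events(curve, min_low_len=3))
```

Raising `min_low_len` trades missed events for fewer false alarms, which is the tension the slide describes.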

39 Strengths and Limitations of Our Approach

- The extraction is rapid and can be done in real time
- We use an adaptively computed threshold that is suited to the content
- An interface such as ours helps skip false alarms easily
- There are too many false alarms

40 Current Approach to Extraction of Soccer Highlights



42 Summary of Sports Highlights Generation

- Motion activity provides a quick way to generate sports highlights
- We use a different strategy with each sport
- The simplicity of the technique allows real-time tuning of thresholds to modify highlights
- Interactive interfaces enable effective use

43 PVR: Personal Video Recorder

With massive amounts of locally stored content, there is a need to locate and customize content according to the user.

[Figure: PVR block diagram with blocks: Local Storage; Feature Extraction & MPEG-7 Indexing; Video Codec; Browsing & Summarization; Enhanced User Interface]

44 Blind Summarization: A Video Mining Approach to Video Summarization

Ajay Divakaran and Kadir A. Peker
Mitsubishi Electric Research Laboratories, Murray Hill, NJ

45 Content Mining

- What is data mining?
  - It is the discovery of patterns and relationships in data
  - Makes heavy use of statistical learning techniques such as regression and classification
  - Has been successfully applied to numerical data
- Application to multimedia content is the next logical step
  - Most applicable to stored surveillance video and home video, since patterns are not known a priori
  - Should enable anomalous event detection leading to highlight generation
  - Not applicable at first glance to consumer video

46 Content Mining vs. Typical Data Mining

- Commonalities
  - Large data sets: video is well known to produce huge volumes of data
  - Amenable to statistical analysis
  - Many of the machine learning tools work well with both kinds of data, as can be seen in the literature and in our research as well
- Differences
  - Number of features not necessarily as large as in conventional data mining data sets
  - Size of dataset not necessarily as large as in conventional data mining data sets
  - Popular data mining techniques such as CART may not be directly applicable and may need modification
- In summary, new mining techniques that retain the basic philosophy while customizing the details will have to be developed

47 Summarization cast as a Content Mining Problem

- DVD auto-summarization mode inspires blind summarization
- Content summarization can be cast as follows:
  - Classify segments into common and uncommon events without necessarily knowing the domain
  - Common patterns: what this video is about
  - Rare patterns: possibly interesting events
  - May help to categorize video, detect style...
- The summary is then a combination of common and rare events
- Can hybridize with domain-dependent techniques

48 Data Mining Basics

- Associations
- Time series similarity
- Sequential patterns
- Clustering
- How do regions A and B differ? Is there any anomaly in A? What goes with item x?
- Marketing, molecular biology, etc.

49 Associations

- A set of items i1..im; a set of transactions, each containing a subset of the items; a database of transactions
- Rule X -> Y (X, Y items):
  - Support s: s% of transactions have X and Y together
  - Confidence c: c% of the time, buying X implies buying Y
  - Improvement: ratio of P(X,Y) to P(X)*P(Y)
- Find all rules with support, confidence and improvement larger than specified thresholds
- A continuous-valued extension exists
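The three rule measures defined above can be computed directly from a list of transactions. The sketch below returns them as fractions rather than percentages; the function name and the sample items are made up for illustration.

```python
def rule_metrics(transactions, x, y):
    """Support, confidence and improvement for the rule x -> y.

    transactions: list of sets of items.
    support     = P(x, y)
    confidence  = P(x, y) / P(x)
    improvement = P(x, y) / (P(x) * P(y))
    """
    n = len(transactions)
    count_x = sum(1 for t in transactions if x in t)
    count_y = sum(1 for t in transactions if y in t)
    count_xy = sum(1 for t in transactions if x in t and y in t)

    support = count_xy / n
    confidence = count_xy / count_x if count_x else 0.0
    if count_x and count_y:
        improvement = support / ((count_x / n) * (count_y / n))
    else:
        improvement = 0.0
    return support, confidence, improvement


if __name__ == "__main__":
    # Hypothetical "transactions": changes observed together in a segment.
    segments = [
        {"activity_change", "speaker_change"},
        {"activity_change", "speaker_change"},
        {"activity_change"},
        {"speaker_change"},
    ]
    print(rule_metrics(segments, "activity_change", "speaker_change"))
```

An improvement above 1 means x and y occur together more often than independence would predict; a value below 1 means less often.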

50 Some Basic Aspects

- Unsupervised learning
  - Similar to clustering vs. classification
- Estimation of joint probability density
  - Find values of (i1, i2, ..., in) where P(i1, i2, ..., in) is high

51 Current Direction

- As a starting point, try to discover the temporal patterns we used in detecting golf highlights
- Then generalize to patterns across multiple features
- Associations between changes, e.g. activity level change, speaker change, scene change, etc.

52 Previously observed pattern

Extended segments of very low activity followed by a jump in activity. This corresponds to a player preparing for a swing, then hitting the ball, with the camera following the ball.

53 Time sequence mining

- Find all similar sub-sequences in a given time sequence
  - E.g. motion activity of a video sequence
- Previous work mostly addresses querying for a given sub-sequence in a larger sequence

54 Mining for Temporal Patterns

- Given a sequence S(i) and window size w, construct the set of all subsequences of size w: S(1:w), S(2:w+1), ..., S(N-w+1:N)
- Find the cross-distances between each pair and cluster
- Problem: how can we search for similar sub-sequences for different window sizes?
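The fixed-window construction above can be sketched as follows. The squared Euclidean distance is an assumption (the slides only mention a normalized Euclidean measure later on), and the function names are hypothetical. The clustering step is left out.

```python
def subsequences(s, w):
    """All N - w + 1 contiguous subsequences of length w."""
    return [s[i:i + w] for i in range(len(s) - w + 1)]


def cross_distances(s, w):
    """Symmetric matrix of squared Euclidean distances between windows."""
    subs = subsequences(s, w)
    n = len(subs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(subs[i], subs[j]))
            dist[i][j] = dist[j][i] = float(d)
    return dist


if __name__ == "__main__":
    activity = [1, 2, 3, 1, 2, 3]
    for row in cross_distances(activity, w=3):
        print(row)
```

The matrix has to be recomputed for every window size w, which is the problem the slide raises and the point distance matrix is meant to address.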

55 Point Distance Matrix

- Let the distance between two sub-sequences of size w be: [equation missing in source]
- The distance between two points is: [equation missing in source]
- Then: [equation missing in source]
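The equations on this slide did not survive extraction. One plausible reading, assuming a squared Euclidean distance (the symbols D_w and d are introduced here for illustration and are not taken from the source), is that the window distance decomposes into a sum of point distances along a diagonal of the point distance matrix:

```latex
% Assumed form: squared Euclidean distance between two windows of size w
D_w(i, j) = \sum_{k=0}^{w-1} \left( x_{i+k} - x_{j+k} \right)^2

% Assumed point distance between two samples
d(i, j) = \left( x_i - x_j \right)^2

% Hence the window distance is a sum along a diagonal of the matrix d
D_w(i, j) = \sum_{k=0}^{w-1} d(i+k,\, j+k)
```

Under this reading, a single matrix of point distances is enough to evaluate window distances for any window size w.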

56 Point Distance Matrix

[Figure: point distance matrix, with the sub-sequences x_i ... x_{i+w} and x_j ... x_{j+w} marked on the two axes]

57 Advantages of Using Point Distance Matrix

- Search for diagonal lines of low point-distance
- Not limited to a given window size: look for the longest possible diagonal line of low point-distance values
- By allowing non-diagonal lines and curves, we can utilize time warping
  - Matching of sub-sequences of different lengths
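The diagonal search can be sketched as below. It assumes a squared-difference point distance and a fixed `threshold` for what counts as "low"; both, along with the function names, are assumptions. Only strictly diagonal runs are found, so the time-warping extension is not covered.

```python
def point_distance_matrix(x):
    """d[i][j] = squared difference between samples i and j (assumed form)."""
    return [[(a - b) ** 2 for b in x] for a in x]


def low_distance_diagonals(x, threshold, min_len=2):
    """Find maximal diagonal runs of low point-distance.

    Returns (i, j, length) triples: x[i:i+length] matches x[j:j+length].
    Only the upper triangle is scanned and the main diagonal is skipped.
    """
    n = len(x)
    d = point_distance_matrix(x)
    matches = []
    for offset in range(1, n):
        run_start = None
        # One extra step past the end of the diagonal closes an open run.
        for i in range(n - offset + 1):
            low = i < n - offset and d[i][i + offset] <= threshold
            if low and run_start is None:
                run_start = i
            elif not low and run_start is not None:
                length = i - run_start
                if length >= min_len:
                    matches.append((run_start, run_start + offset, length))
                run_start = None
    return matches


if __name__ == "__main__":
    activity = [1, 2, 3, 9, 1, 2, 3]
    print(low_distance_diagonals(activity, threshold=0, min_len=3))
```

On smooth data, small offsets produce trivial matches between a sub-sequence and a slightly shifted copy of itself, so a minimum offset may be worth adding in practice.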

58 Multi-resolution Pattern Discovery

Multi-resolution analysis:
- Smooth and sub-sample the time series (conventional multiscale, e.g. wavelets)
- Analysis with various window sizes, matching across different window sizes (our method automatically handles this)

59 Illustration: Segmenting Haiden Video

[Figure: repeating temporal patterns]

60 Other Issues

- Clustering segments after finding similarities
- Extend to other features, multiple dimensions
  - Currently using motion activity only
  - Extend to multi-dimensional feature vectors (e.g. color histogram)
  - Extend to multiple features, multiple modalities (e.g. video + audio)
- Using a normalized Euclidean distance measure
  - Normalization based on local variance of data

61 Block-diagram of time-series mining


62 Target Applications

- Surveillance video
  - Can detect unusual events through video mining in stored video
- Home video
  - Can use event detection and other pattern discovery to manage home video
- Entertainment quality video
  - Blind summarization
  - Genre-independent yet event-aware processing
- Content management for large video databases
  - All of the above at a very large scale

63 Future Extension - Model Based Matching

- Use more sophisticated statistical techniques to fuse label streams

64 Conclusion

- System features
  - Unique, simple and flexible summarization
  - Integrated player-browser
  - Enable rapid and convenient browsing
  - Video summarization using motion activity as summarizability
  - Audio-based principal cast detection
  - Audio-visual feature based sports highlights extraction
- Further possibilities
  - Refine audio-assisted browsing
  - Incorporate other visual features
  - Video mining
