Semantic Similarity Measurement: An Approach for Video Entities using IMDB
Yunfei Yu, Zhao Lu
Department of Computer Science and Technology, East China Normal University, Shanghai, China
hijacker1987@gmail.com, zlu@cs.ecnu.edu.cn

Abstract: Much research has addressed high-level semantic modeling and analysis of video through schemas and ontologies. However, little of it considers high-level video semantics at the individual level, which makes it hard to compare videos. Traditional semantic modeling approaches use one ontological model to represent one kind of video, and concrete videos cannot be described well this way. In this study, we propose a semantic modeling approach for videos at the individual level and use IMDB (Internet Movie Database) as the video source. We then introduce a semantic similarity measure between two video entities to compare the two videos at the high semantic level. Experimental results on a dataset collected from IMDB show similarity coefficients that are highly similar to the perceptions of users.

Keywords: Video Semantic, Entity Representation/Network, Entity Similarity Measure

I. INTRODUCTION

As a result of the rapid growth of broadband networks, digital video has become a major source of content on the Internet, and movies are among the most widespread kinds. IMDB (Internet Movie Database) is an online database covering movie actors, movies, television programs, electronic games and filmmaking [1]. It is a huge collection of movie information. As of October 2010, IMDB contained more than 1,692,400 video records and more than 3,797,000 person records. The system classifies every pertinent detail about a movie under attribute labels such as Director, Writers, Stars, Release Date, Storyline, Plot Keywords, Taglines, Genres and Motion Picture Rating. Users are easily attracted to the introduction, storyline and stars of a concrete movie. There are also plenty of reviews and recommendations that refer to popular movies. Through IMDB, users can conveniently access the information about a concrete movie. However, although there is a great number of recommendation results, there is little information that compares movies in detail.

Since the use of videos depends on understanding their semantics well, it is important that these semantics are well modeled. Compared with semantic research in information retrieval, video semantics is composite knowledge: (1) Various information sources other than text can support the semantics of videos. (2) The semantic relations among the components of a video are complicated, and it is important to represent these relations well. (3) Both the subtle features and the temporal features of videos make it important to keep a certain degree of ambiguity when describing the semantics of videos.

Generally, semantic extraction for a video can be performed at three different levels: layout, content and semantic [2]. Currently, much related research focuses on extracting semantics from the layout and content levels, which leaves semantic gaps. Recently, researchers have begun to focus on the semantic level of videos and to mine their intended meaning. MPEG-7 provides a standardized description for all types of multimedia, and this kind of description relates to the content itself. In order to represent the semantic relations of videos at various levels, video semantic models are often built on MPEG-7.

There are two issues to be discussed: (1) Apart from acquiring the semantics of videos with a suitable description scheme, there are various kinds of features that can be used to describe the semantics of videos and the relations between videos. It is important to model the relations between two videos well. Both require finding an appropriate way to select features and mine the various potential relations among videos. (2) How to use these features to measure similarity among videos.

To tackle the two issues, in this study we introduce the notion of a video entity and use it to represent the complex video content and relations. We first construct a video entity network, and then define various relations within it, such as similar genre and common director. Two videos may not share any common features, while one video still links to another through several relation transmission steps. Each of these steps may be one of the relations actedIn, directed and wrote. Afterwards, we compute similarity based on the common features and show some experimental results.

The remainder of this paper is organized as follows. Section II discusses related work. Section III provides both the representation of the video entity and the video entity network. Section IV presents the semantic similarity measure between two video entities. Section V presents the data set from IMDB, the experimental method, the experimental results and a discussion. Section VI provides the conclusion and some discussion of future work.

II. RELATED WORK

OWL (the Web Ontology Language) is intended to describe the definition and instantiation of web ontologies. As a kind of formal semantics, OWL specifies how to derive logical consequences, that is, facts not literally present in the ontology but entailed by the semantics [3]. In the video domain, it's unrealistic to derive all
the literal information, but we can draw on the entailments of OWL to record the significant contents.

An entity is a thing which has a separate existence and can be uniquely identified [4]. There are three main components in ER (Entity Relation) [5]: Entities (or Objects), Attributes, and Relationships. A simple ER model can be defined as a triad, ER = (E, A, R). The relationship indicates that each video entity in the model may have a relationship with one of the connected entities.

Entity networks help users mine various kinds of information from structured data instead of from unstructured text. Lin Lin and Mei-Ling Shyu [6] proposed a novel high-level feature mining/detection framework that uses the associations and correlations among the feature-value pairs and the target concepts. They developed a mining algorithm for multimedia retrieval and high-level feature (concept/event) detection, and applied rule selection and classification to the similarity calculation. High-level features from TRECVID 2007 and 2008 were used to compare the performance.

In the area of video semantics, a movie recommendation system based on YAGO and IMDB [7] was presented, which relies on semantic distance measurement and the features of movies. Its authors plan to take user feedback into consideration in future work. By comparison, we focus on the latent rules of similarity among movies instead of between movies and stars. The use of feature matching and information content to compare concepts is discussed in [8], together with an approach for computing semantic similarity between objects or entities.

III. VIDEO ENTITY AND VIDEO ENTITY NETWORK

A. Video Entity

A video entity is viewed as the description of a collection of related information, and it can be formally represented as

$$Entity(x) = \{D, F\} \tag{1}$$

where x represents the identifier of the concrete video entity, and the other two parameters D and F represent its description and its collection of features respectively.

Synonymy occurs when different titles denote the same video entity. For example, the two titles "The Shawshank Redemption" and "Die Verurteilten" denote the same movie, which ranks first in IMDB. In this study, we view the synonymy T_syn as the title tuple E_syn of a video entity e, whose members map to the same video entity, as described in Equ. (2). For the above example, the synonymy set can be represented as {The Shawshank Redemption, Die Verurteilten, …}. The title indexed first maps to the real video, and the others may be mapped to the video according to the synonymy set. We […] component for the representation of entity classes. The synonymy of a video entity e with the title mapping rules is described as follows:

$$T_{syn}(e) = \{t_1, t_2, \ldots, t_n\} \tag{2}$$

where t_1, t_2, …, t_n in the synonymy denote the different titles of the video entity e.

The different titles in the synonymy set share the same entity by Equ. (1). Polysemy also exists: different video entities are allowed to share the same title. This is normal in a video database such as IMDB. For example, from 1970 to 2004, the number of movies with the title "lost" is 23.

In order to distinguish different videos with the same title, the description of the videos must be consulted. A video is thus ensured to be unique because both its title and its description are provided in the structure.

B. Video Entity Representation using OWL

OWL can be used to express such semantic relations [9]. OWL provides additional formal semantics such as disjointWith, intersectionOf, unionOf, complementOf, oneOf and allValuesFrom in order to express more semantic relations and restrictions. It is feasible to apply these additional semantics to the domain of video entities.

In order to enable separate manipulation, we identify different types of distinguishing features (i.e., description, attributes, title). Most importantly, OWL offers new capabilities including inference, semantic joins, and eventually dynamic scenario construction [10]. We may therefore use OWL to redescribe the relevant information and make it more intelligible. Moreover, the movie information can be expanded: adding or removing an attribute or property programmatically becomes feasible. First, we initialize a class of movie which inherits from video, as follows:

[OWL class definition not preserved]

Our representation of the video entity class follows the definition of object-oriented classes. A formal syntax of a video entity class definition using OWL is presented in Table 1.

TABLE 1. FORMAL SYNTAX OF VIDEO ENTITY CLASS DEFINITION

Descriptor     [OWL markup not preserved]
               ...
Property       [OWL markup not preserved]
Subproperty    [OWL markup not preserved]
               ...
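As an illustration of Section III.A, the entity definition of Equ. (1) and the synonymy set of Equ. (2) can be sketched in Python. This is a minimal sketch and not the OWL representation used in this paper: the names VideoEntity and TitleIndex, the identifiers e1 to e3 and the placeholder descriptions are all hypothetical, and matching a description by exact string equality is an assumption made only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VideoEntity:
    """Entity(x) = {D, F} from Equ. (1), plus the synonymy set of Equ. (2)."""
    identifier: str                               # x
    description: str                              # D
    features: dict = field(default_factory=dict)  # F
    titles: list = field(default_factory=list)    # T_syn(e); first title is the primary one


class TitleIndex:
    """Hypothetical lookup table from titles to video entities."""

    def __init__(self):
        self._by_title = {}

    def add(self, entity):
        # Synonymy: every title in T_syn(e) maps to the same entity.
        for title in entity.titles:
            self._by_title.setdefault(title, []).append(entity)

    def lookup(self, title, description=None):
        # Polysemy: one title may return several entities;
        # the description narrows the result down.
        candidates = self._by_title.get(title, [])
        if description is None:
            return list(candidates)
        return [e for e in candidates if e.description == description]


# Hypothetical identifiers and placeholder descriptions, for illustration only.
index = TitleIndex()
shawshank = VideoEntity("e1", "placeholder description",
                        titles=["The Shawshank Redemption", "Die Verurteilten"])
lost_a = VideoEntity("e2", "placeholder description A", titles=["Lost"])
lost_b = VideoEntity("e3", "placeholder description B", titles=["Lost"])
for entity in (shawshank, lost_a, lost_b):
    index.add(entity)
```

Looking up either title of the first entity returns the same object, while looking up "Lost" returns two candidates until a description is supplied, which mirrors how the title and the description together make a video entity unique.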
From TABLE 1, we can see that MovieDescriptor properties relate movies to their genres and to other components. HasGenre is a subproperty of the hasMovieDescriptor property, with its range further restricted to MovieGenres.

The video entity network is a group of individuals. After the individuals are instantiated, we can use the set operations of OWL to process class extensions. For example, we can use the operator named "intersectionOf" to get the entities which share the genre named "drama". The rich set operations of OWL support the various set operations on the entity network. In addition, OWL can define data type properties, and different attributes can have their own associated data types. This suggested to us that we can rely on OWL to represent the video entity. The video entity network then facilitates the comparison of the individuals.

Our video entity network differs from a hierarchical structure. If we classified the entities into different categories, one entity might appear in several categories. In this circumstance, we would have to deal with the redundancy, and the relationships among the entities would become more complicated. In general, our current method for calculating the semantic similarity is based on mining the common features between connected and independent entities.

C. Video Entity Network

Several kinds of metadata can be used as attributes to describe movies in a structured manner, including director, genre, writer, country, plot keywords, release date, awards and so on. We choose IMDB as the external source in order to obtain adequate and authoritative information about movies. We can complete the attributes of a video entity according to the actual requirements. The result can be represented as an ontological model in that it emphasizes the semantic relationships among data elements [11].

Frequently, the major genre of different movies is the same if they share the same director. For example, Nuovo Cinema Paradiso and Malena share exactly the same director, genre and country. In this circumstance, we should take more attributes into account to discriminate between them.

Because every movie in IMDB has its own descriptive web page, extraction algorithms are designed for the semi-structured data in web pages. We should consider not only the inherent features of the HTML but also the self-defined ones. In short, the title and the description can be extracted directly by identifying tags, and other detailed attributes are extracted from the surrounding tag properties and the element containment relationships.

After the representative features FE are extracted, the next problem is how to establish the linkages among them. The detail is described by pseudo code in Algorithm 1.

Algorithm 1
1: Use title and description to generate the identifier I.
2: Create the new node N with identifier I.
3: for each f ∈ FE do
4:   if f is genre then
5:     for each … ∈ … do
6:       Find the intersection of genres;
7:       add …, … into SF;
8:     end for
9:   else
10:     for each … ∈ … do
11:       add … into SF;
12:     end for
13:   end if
14: end for
15: return node N

There are two stages in Algorithm 1: (1) The first stage includes step 1 and step 2. The title and the description are used to generate the identifier of the new video entity, and then the new video entity is added as a new node, which we denote by N in Algorithm 1. (2) The second stage runs from step 3 to step 14. Feature traversal is applied in this stage. In the outer loop, once a common feature F is found, it is added into SF (an array that records the common features). The traversal of genre differs slightly: we record not only the genre identifier but also the intersection of genres.

As shown in Table 1, the ObjectProperty named hasGenre is used to record detailed genre information, and other features are recorded in a similar way. A simple figure shows part of the defined video entity network.

Figure 1. Part of the Defined Video Entity Network

As described in Figure 1, the entity network consists of domains, entities and features, and entities can be shared among domains. The identifier is used to distinguish a video entity from others. Relationships among entities are created by mining the common features. It is desirable that a video e is represented by a single entity and that those with the same distinguishing features are linked to this entity. In Figure 1, the two video entities share the same director and one genre.

IV. SIMILARITY MEASURE FOR VIDEO ENTITIES

In our entity network environment, the general approaches to assessing semantic similarity are feature matching and information content analysis. The former relies on the common characteristics between entities to compute semantic similarity, while the latter uses information theory to define different degrees of informativeness of
entities. For example, two entities share the informativeness of a child concept if there exists an immediate super-concept that subsumes them.

Naturally, the genre and the director are critical features of a video, and both have a great influence on its style and target audience. Similarly, the country and the release date are considered the next most critical. Based on these considerations, we select four factors, namely genre, director, country and release date, as the major features for computing the similarity value between two video entities.

A. Genre Similarity Measure

Genre is a significant factor which mainly determines the performance style of a video. Generally a video belongs to more than one genre. For example, the genres of Godfather include Crime, Drama and Thriller, while its major genre is Crime. Genres are independent of each other, and it is difficult to quantify the correlation among genres directly.

Definition 1. Genre Set: The set of genres G is defined as an ordered array of dimension n as follows:

$$G = \{genre_1, genre_2, \ldots, genre_n\} \quad (n \ge 1) \tag{3}$$

Here, genre_1, genre_2, …, genre_n represent all genres a video belongs to, and the parameter n represents the number of members contained in the genre set G.

Definition 2. Genre Similarity: Given the two genre sets G(e1) and G(e2) of two videos e1 and e2, the genre similarity is measured as follows:

$$Sim_g(e_1, e_2) = \frac{\sum_{\sigma \in G(e_1) \cap G(e_2)} w_g}{|G(e_1) \cup G(e_2)|} \tag{4}$$

where σ ranges over the intersection of the two genre sets G(e1) and G(e2), and w_g is the weight, taken from the weight matrix, for the corresponding match case of genres.

To measure the similarity of genres, we assign a different weight to each match of genres using the weight matrix W. The elements of the matrix W stand for the strength of the relationships between the two genre sets G(e1) and G(e2) of two video entities e1 and e2. In this study, we initialize the weight matrix W as follows:

$$W = \begin{bmatrix} 1 & 0.8 & 0.6 \\ 0.8 & 0.5 & 0.3 \\ 0.6 & 0.3 & 0.2 \end{bmatrix}$$

The rows and columns represent the genres of the two video entities e1 and e2 respectively. The genres in the matrix W are sorted by importance: the genre in first place is the major one, the genre in second place is less important, and the genre in last place is the least important. The matrix W is symmetric because the similarity measure between two video entities is unordered. For example, if the first genre of the video entity e1 is the same as the second genre of the video entity e2, then the weight w_g is 0.8 according to the matrix W. For two video entities, if all genres of them are the same, the weight w_g is 1.7 (the maximum) according to the matrix W, that is, the sum of the diagonal 1 + 0.5 + 0.2. In this study, the values of the weight matrix are initialized based on our experiments, and we will study them further in future work.

B. Other Features Similarity Measure

The style of a video also depends heavily on its Director, Country and Release Date. For the two features Director and Country of two video entities, we can measure the director similarity and the country similarity using string comparison. Each comparison returns a Boolean value, and we save the two values to Sim_d and Sim_c respectively.

Definition 3. Similarity of Release Date of two video entities: Given the two release dates D1 and D2 of two videos e1 and e2, the similarity of release date of the two video entities is computed as

$$Sim_r = -\log \frac{1}{D_1 - D_2} \tag{5}$$

C. Compute Semantic Similarity Between Two Video Entities

After we compute the four similarities between two video entities, namely the genre similarity, the director similarity, the country similarity and the release date similarity, the semantic similarity value between two video entities e1 and e2, Sim(e1, e2), is computed using Equ. (6).

$$Sim(e_1, e_2) = w_g \cdot Sim_g(e_1, e_2) + w_d \cdot Sim_d(e_1, e_2) + w_c \cdot Sim_c(e_1, e_2) + w_r \cdot Sim_r(e_1, e_2) \tag{6}$$

where w_g + w_d + w_c + w_r = 1.

Sim_g(e1, e2), Sim_d(e1, e2), Sim_c(e1, e2) and Sim_r(e1, e2) represent the similarity values for the four aspects genre, director, country and release date. The four parameters w_g, w_d, w_c and w_r are the weights for the genre similarity, the director similarity, the country similarity and the release date similarity respectively. The four parameters can differ, but they all fall between zero and one and sum to one.

V. EXPERIMENTAL EVALUATION

A. Data preparation

We select movies from IMDB in different genres so that the styles are uniformly distributed. A website crawler is used to get the relevant information about the movies from the TOP 250 ranking page. The relevant information is stored as text and used as the external data source.

The data consists of 100 movies whose genres include Crime, Drama, Thriller, Adventure, Western, Mystery, Sci-Fi, Biography, History, War, Action and Romance. Considering the different preferences of the volunteers, we give higher marks to their preferences. Besides, to keep the variety of genres, we select movies evenly across the different genres. For each kind of movie, the associated data given by the volunteers is recorded. They may give different points for different ranks: 5 points for the first rank, 4 points for the second, and so forth.
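The similarity measure of Section IV can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments. The function names are hypothetical; genre lists are assumed to be ordered by importance and are truncated to the three ranks covered by W; the weights w_d = w_c = w_r = 0.1 are an assumption that pairs with w_g = 0.7, since the paper only calls them average impact factors; and the release-date term Sim_r is passed in as a number rather than computed, so the sketch does not commit to one reading of Equ. (5).

```python
# Weight matrix W from Section IV.A; rows index the genre rank in e1,
# columns index the genre rank in e2.
W = [
    [1.0, 0.8, 0.6],
    [0.8, 0.5, 0.3],
    [0.6, 0.3, 0.2],
]


def genre_similarity(g1, g2, w=W):
    """Equ. (4): summed match weights divided by the size of the genre union."""
    # Assumption: only as many ranked genres as W has rows are considered.
    g1, g2 = list(g1[:len(w)]), list(g2[:len(w)])
    union = set(g1) | set(g2)
    if not union:
        return 0.0
    shared = set(g1) & set(g2)
    total = sum(w[g1.index(g)][g2.index(g)] for g in shared)
    return total / len(union)


def exact_match(a, b):
    """Boolean string comparison for director or country, as 1.0 or 0.0."""
    return 1.0 if a == b else 0.0


def overall_similarity(sim_g, sim_d, sim_c, sim_r,
                       weights=(0.7, 0.1, 0.1, 0.1)):
    """Equ. (6): weighted sum of the four similarities; weights must sum to one."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    w_g, w_d, w_c, w_r = weights
    return w_g * sim_g + w_d * sim_d + w_c * sim_c + w_r * sim_r


# Two videos with identical ranked genres reach the maximum weight 1.7,
# which Equ. (4) divides by the three genres in the union.
same = genre_similarity(["Crime", "Drama", "Thriller"],
                        ["Crime", "Drama", "Thriller"])
# The first genre of e1 matches the second genre of e2: weight 0.8, union of size 2.
crossed = genre_similarity(["Crime"], ["Drama", "Crime"])
```

Under Equ. (4), a full genre match therefore yields 1.7 / 3 rather than 1, so the genre term is not normalized to the unit interval by the matrix alone.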
B. Evaluation results

First we ensure that all the volunteers have seen the test movies before. In each experiment, we draw 10 movies from the movie data at random; one of them is set as the test object and the remaining movies are set as the compared objects. Every volunteer is asked to give a similarity score for the test movie against each of the compared movies. Then we take the two most similar items for each test movie and store them as the correct ones.

If the correct items marked by the volunteers contain the most similar one calculated by our system, we count it as one hit. After that, we take the hit count, the miss count and the wrong count into the calculation of the results. We list the results in TABLE 2 as follows:

TABLE 2. THE PERFORMANCE OF THE SYSTEM WHEN USING DIFFERENT WEIGHT MATRICES WITH wg = 0.7

The weight matrix                              Precision    Recall
[1 0.8 0.6; 0.8 0.5 0.3; 0.6 0.3 0.2]          0.524        0.409
[1 0.75 0.7; 0.75 0.6 0.4; 0.7 0.4 0.2]        0.463        0.397

We use two different matrices to evaluate the accuracy and quality of the algorithm, and the results are shown in TABLE 2. As the experiment shows, the genre style significantly affects the performance of the calculation, and if we change the weight matrix to a less sensitive one, the precision drops. We treat the director, the country and the release date as average impact factors in the experiment. Since the range of the recall is comparatively narrow, we must keep testing the factor wg and adjusting the weight matrix in future work.

Compared with the algorithm used in [7], our method has relatively low precision. However, we have not taken actor factors into account, and we filtered out much less information than that algorithm does. As a result, the algorithmic complexity of our method is relatively small.

VI. CONCLUSIONS AND FUTURE WORK

Realizing video semantics has become one of the most important issues in the semantic research domain, and video entities can be used in various useful applications, especially in computing similarity. In this paper, we mine some chief features to compute the semantic relation while ignoring others, so further studies can address other combinations of features.

Through the experiments, we discover the latent rules in the similarity between movies, and we will adjust the weight matrix for the best performance in future work. On the other hand, with regard to the model presented in this paper, our further work can focus on incorporating person entities into the entity network. Many attribute values of video entities are person names, so we could use person entities to replace the person names as attribute values. We believe that the approach in this paper is in the right direction for processing similarity between video entities.

ACKNOWLEDGMENT

This work is supported by an Opening Project of Shanghai Key Laboratory of Integrate Administration Technologies for Information Security (No. AGK2010004) and a grant from the National High Technology Research and Development Program of China (863 Program) (No. 2009AA01A348).

REFERENCES

[1] http://www.imdb.com, Sep. 2010.
[2] Yalan Yan, Jinlong Zhang and Mi Yan, Ontology Modeling for Contract: Using OWL to Express Semantic Relations, Enterprise Distributed Object Computing Conference, 2006. EDOC '06. 10th IEEE International, pp. 409-412.
[3] http://www.w3.org/TR/2004/REC-owl-guide-20040210/, Sep. 2010.
[4] Stoermer, H. and Bouquet, P., A novel approach for entity linkage, Information Reuse & Integration, 2009. IRI '09. IEEE International Conference, pp. 151-156.
[5] Liquan Han, Jianchao Xu and Qing'an Yao, Entity-Relationship semantic meta-model based on ontology, Computer Application and System Modeling (ICCASM), 2010 International Conference, pp. V11-219 - V11-222.
[6] Lin Lin and Mei-Ling Shyu, Mining High-Level Features from Video Using Associations and Correlations, Semantic Computing, 2009. ICSC '09. IEEE International Conference, pp. 137-144.
[7] Yajie Hu, Ziqi Wang, Wei Wu, Jianzhong Guo and Ming Zhang, Recommendation for Movies and Stars Using YAGO and IMDB, Web Conference (APWEB), 2010 12th International Asia-Pacific, pp. 123-129.
[8] Rodriguez, M.A. and Egenhofer, M.J., Determining semantic similarity among entity classes from different ontologies, Knowledge and Data Engineering, IEEE Transactions, pp. 442-456, 2003.
[9] D. Kosmopoulos, S. Petridis, I. Pratikakis, V. Gatos, S. Perantonis, V. Karkaletsis and G. Paliouras, Knowledge Acquisition from Multimedia Content using an Evolution Framework, IFIP International Federation for Information Processing, pp. 557-565, 2006.
[10] Lacy, L. and Gerber, W., Potential modeling and simulation applications of the Web ontology language - OWL, Simulation Conference, 2004. Proceedings of the 2004 Winter.
[11] J. Jiang and D. Conrath, Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, Proc. Int'l Conf. Computational Linguistics (ROCLING X), 1997.