2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM)

ViTag: Automatic Video Tagging Using Segmentation and Conceptual Inference

Abhishek A. Patwardhan, Santanu Das, Sakshi Varshney, Maunendra Sankar Desarkar
Department of Computer Sc. & Engineering, IIT Hyderabad, Hyderabad, India
Email: {cs15mtech11015, cs15mtech11018, cs16resch01002, maunendra}@iith.ac.in

Debi Prosad Dogra
School of Electrical Sciences, IIT Bhubaneswar, Bhubaneswar, India
Email: dpdogra@iitbbs.ac.in

Abstract—The massive increase in multimedia data has created a need for an effective organization strategy. Multimedia collections are organized based on attributes such as domain, index terms, content description, owners, etc. Typically, the index term is a prominent attribute for effective video retrieval systems. In this paper, we present a new approach to automatic video tagging referred to as ViTag. Our analysis relies upon various image similarity metrics to automatically extract key-frames. For each key-frame, raw tags are generated by performing reverse image tagging. The final step analyzes the raw tags in order to discover hidden semantic information. On a dataset of 103 videos belonging to 13 domains derived from various YouTube categories, we are able to generate tags with 65.51% accuracy. We also rank the generated tags based upon the number of proper nouns present in them. The geometric mean of Reciprocal Rank estimated over the entire collection has been found to be 0.873.

Keywords—video content analysis, video tagging, video organization, video information retrieval

Figure 1: Overview of the proposed ViTag architecture.

I. INTRODUCTION

Finding a match for a user-submitted query is challenging on large multimedia data. To reduce the empirical search, video hosting websites often allow users to attach a description with the video. However, descriptions or index terms can be ambiguous, irrelevant, insufficient, or even empty. This creates the necessity for an automatic video tagger. In this paper, we present an automatic video tagging tool referred to as ViTag. It involves a video segmentation step that extracts distinct, representative frames from the input video through a hierarchical combination of various image similarity metrics. In the next step, the raw tags obtained from the segmented video frames are investigated to estimate semantic similarity information. Finally, we annotate the input video by combining the raw tags with the inferred tags.

In accomplishing this, we make the following contributions: (i) a hierarchical combination of three image similarity metrics to design a video segmentation algorithm, (ii) a conceptual inference heuristic to automatically infer generic tags from raw tags, and (iii) a fully automatic, end-to-end, and open-source tool that outputs tags solely by analyzing the input video. The approach implemented within the ViTag framework is outlined in Figure 1.

A. Related work

Automatic video tagging research is growing. Siersdorfer et al. [1] have devised a technique based on content redundancy of videos. However, their approach requires querying an external video collection to generate tags for the video in question. Our approach exploits semantic similarity information [2]. Moxley et al. [3] perform a search using three attributes (frames, text, and concepts) to find matching videos out of a collection of videos. The approach needs automatic speech recognition, and therefore it seems difficult to apply to generic videos from challenging domains like animation, songs, games, etc. Toderici et al. [4] have trained a classifier that learns the association of audio-visual features of a video with its tags. Machine learning based approaches are also promising, but they come with higher training and tuning overheads. Borth et al. [5] have extracted key-frames for video summarization using k-means clustering to group similar frames into a cluster. Yao et al. [6] have tagged videos by mining user search behavior. Their method requires dynamic information about users' behavior. The probabilistic model-based method proposed in [7] involves a two-step process, i.e., video analysis followed by querying a classification framework to generate tags.
The rest of the paper is organized as follows. Section II presents the overall methodology and implementation details. In Section III, we present the results. Finally, in Section IV, we provide conclusions and future work.

II. PROPOSED VITAG FRAMEWORK

A. Video Feature Extraction

ViTag first extracts the key-frames and feeds them as inputs to the reverse image tagger that generates raw tags. The process of reading dissimilar frames from an input video is outlined in Algorithm 1. The threshold value can be set empirically.

Algorithm 1 Selection of dissimilar frames
Require: Video V
 1: Output frame sequence K = ∅
 2: prev ← first frame in V
 3: for all frames f ∈ V do
 4:   score ← compute_mean_square_error(f, prev)
 5:   if score > threshold then
 6:     K = K ∪ f
 7:     prev ← f
 8:   end if
 9: end for
10: return K

The algorithm consists of two stages. The input sequence of frames is partitioned into fixed-size, non-overlapping windows. Within each window, we estimate the similarity of two successive frames using features such as Mean Square Error (MSE), SIFT, and the Structural Similarity Index (SSIM). This results in a similarity vector (V) for each window. A value V_i in the similarity vector depicts the similarity score between two adjacent frames (F_i, F_{i+1}) within the window. The input to the video segmentation process, as depicted in Figure 2, is the set of frames selected by Algorithm 1. We analyze the similarity vector of each window so as to select a single representative frame for that window.

Figure 2: Complete key-frame extraction module.

Intuitively, we wish to select the one frame that contains maximum information. Note that a frame can be considered to contain maximum information in two cases: when it matches both of its neighbors with the highest matching score (it is representative of the window), or when its matching scores with the adjacent frames are low (it contains unique information). To capture both cases, the heuristic discussed in Algorithm 2 is used. The algorithm picks the frame contributing to the maximum score. If the minimum score turns out to be less than a threshold value, the heuristic assumes the existence of a frame containing unique information; hence we select the frame contributing to the minimum score.

Algorithm 2 Selecting a representative frame for a given window
Require: Frames[1..N]: window
Require: Scores[1..N-1]: similarity scores for adjacent frames within the window
 1: maxVal, maxInd ← MAX(Scores[1..N-1])
 2: minVal, minInd ← MIN(Scores[1..N-1])
 3: if minVal > threshold then
 4:   if Scores[maxInd-1] < Scores[maxInd+1] then
 5:     select ← maxInd+1
 6:   else
 7:     select ← maxInd
 8:   end if
 9: else
10:   if Scores[minInd-1] < Scores[minInd+1] then
11:     select ← minInd+1
12:   else
13:     select ← minInd
14:   end if
15: end if
16: return Frames[select]
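To make the two heuristics concrete, the following is a minimal Python sketch of Algorithms 1 and 2, assuming an OpenCV/NumPy environment and an MSE-only similarity score for brevity (the actual system combines MSE, SIFT, and SSIM hierarchically). The function names, default thresholds, and boundary handling are illustrative assumptions, not the released ViTag code.

```python
# Illustrative sketch of Algorithms 1 and 2 (not the authors' released code).
# Only an MSE-based score is used here; the paper combines MSE, SIFT and SSIM.
import cv2
import numpy as np

def mse(a, b):
    """Mean squared error between two equally sized grayscale frames."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def select_dissimilar_frames(video_path, threshold=500.0):
    """Algorithm 1: keep a frame only if it differs enough from the last kept frame."""
    cap = cv2.VideoCapture(video_path)
    kept, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is None or mse(gray, prev) > threshold:
            kept.append(frame)
            prev = gray
    cap.release()
    return kept

def pick_representative(frames, scores, threshold=0.5):
    """Algorithm 2: choose one frame per window.

    scores[i] is the similarity between frames[i] and frames[i+1],
    so len(frames) == len(scores) + 1.
    """
    if min(scores) > threshold:
        idx = int(np.argmax(scores))   # every pair matches well: take the best-matching edge
    else:
        idx = int(np.argmin(scores))   # a unique frame exists: take the worst-matching edge
    # An edge joins two frames; compare the neighbouring edges (where they exist)
    # to decide which endpoint of the selected edge to keep, as in the paper.
    left = scores[idx - 1] if idx - 1 >= 0 else float("inf")
    right = scores[idx + 1] if idx + 1 < len(scores) else float("inf")
    return frames[idx + 1] if left < right else frames[idx]
```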
B. Raw Tag Generation

The second phase of ViTag receives the key-frames and obtains raw tags by querying the reverse image tagger. The reverse image search engine provides a list of web pages and the key terms associated with the query image. Such a search technique is discussed in [8]. Typically, such algorithms employ techniques including maximally stable extremal regions (MSER) detection [9], object detection [10], vocabulary trees [11], etc. We initially encode the frame within a query and fire it to the search engine. The responses are then parsed to extract tags. The steps are shown in Fig. 3.

Figure 3: Reverse image tagging methodology used in our work.

C. Conceptual Inference

After performing a reverse image search for the key-frames, we post-process the obtained tags in order to infer more generic tags. We achieve this by adding an extra module of conceptual inference that refers to an external knowledge source built on top of various concepts on the web. Such a representation is referred to as a concept graph. Formally, a concept graph [12] is a knowledge representation structure storing a weighted association of natural language words with (abstract) concept terms.

D. Semantic Similarity using Bipartite Graph

Let T be the set of (unique) raw tags obtained and C be the set of (unique) concept terms obtained by querying each raw tag from T against the concept graph engine. A directed bipartite graph G(T, C) with edges E: T → C represents a mapping of raw tags to various concept terms. We label each edge e(t, c) with a score w such that w: E → [0, 1]. The score w on an edge represents how likely a concept c is to be associated with a tag t. Each score w is obtained by querying the concept graph engine. We need to identify a set K ⊆ C such that each c ∈ K is associated with a large number of incoming edges from T. To find K, it is important to obtain the relative importance of each c ∈ C. Thus, we need to find a score vector (say V) of length equal to the cardinality of C. Once we obtain V, it is easy to select the top r entries for some r ∈ N by simply sorting the vector V. In order to compute the value V_i for i ∈ C, we sum up the weights of the incoming edges for node i. Formally,

    V_i = \sum_{u \in T,\; e(u,i) \in E} W_{ui}.    (1)

Furthermore, it is expected that many of the tags in T will be semantically similar to others. We model this situation within graph G by inserting extra edges between pairs of nodes from the set T. Due to this construction, G is no longer a bipartite graph. We refer to these newly inserted edges as semantic edges to distinguish them from the edges originally present in G. We insert semantic edges into G in the following two cases: (i) for each pair of tags (t1, t2) ∈ T, we compute a semantic similarity score, add the semantic edges E(t1, t2) and E(t2, t1), and label them with the similarity score obtained; (ii) for every multi-word tag m ∈ T, we check for the presence of each individual word w within the set T; if it exists, we add an edge E(m, w) labelled with a score equal to the reciprocal of the total number of words in m. This allows us to capture the semantic similarity of a multi-word tag. After augmenting graph G with the semantic edges, we revise the score vector to reflect the changes made in G. To compute the revised value of V, we use (2):

    V_i = V_i + \sum_{u \in T,\; e(u,i) \in E} \;\sum_{x \in T,\; e(x,u) \in E} S_{xu} \cdot W_{ui}.    (2)

We revise each entry in V with the product of two weights, i.e., (i) the weight W_{ui} connecting node u ∈ T to node i ∈ C, and (ii) the weight S_{xu} of the semantic edge connecting node u to node x ∈ T. After revising V, we sort it in descending order and select the top r entries. The semantic similarity metric used in frameworks such as NLTK [13] fails to capture the semantic similarity between commonly occurring words like iPhone and gadget. We fix this issue by referring to the concept graph engine.
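The scoring step of equations (1) and (2) can be illustrated with a short sketch. The dictionary-based edge representation, variable names, and toy weights below are assumptions made for illustration; they are not taken from the paper's implementation.

```python
# Illustrative sketch of the concept-scoring step, Eqs. (1) and (2).
# tag_concept_w[(t, c)] holds W_tc (tag -> concept edge weight) and
# semantic_s[(x, u)] holds S_xu (tag -> tag semantic-edge weight);
# both structures are hypothetical.
from collections import defaultdict

def score_concepts(tag_concept_w, semantic_s, top_r=5):
    # Eq. (1): V_i = sum of incoming edge weights W_ui for each concept i.
    V = defaultdict(float)
    for (u, i), w_ui in tag_concept_w.items():
        V[i] += w_ui

    # Eq. (2): add S_xu * W_ui for every semantic edge (x, u) feeding tag u.
    for (u, i), w_ui in tag_concept_w.items():
        for (x, uu), s_xu in semantic_s.items():
            if uu == u:
                V[i] += s_xu * w_ui

    # Keep the r highest-scoring concepts as inferred generic tags.
    return sorted(V.items(), key=lambda kv: kv[1], reverse=True)[:top_r]

# Toy example with made-up weights:
tag_concept_w = {("iPhone", "gadget"): 0.9, ("screenshot", "gadget"): 0.4,
                 ("iPhone", "brand"): 0.6}
semantic_s = {("screenshot", "iPhone"): 0.5}
print(score_concepts(tag_concept_w, semantic_s, top_r=2))
```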
E. Implementation Details

The video segmentation algorithm can be accelerated by parallelizing its computations. We achieve this by applying the classic loop transformation known as loop tiling. We query the host architecture to obtain the total number of processing cores (denoted as p) available on the system, tile the iterations of the parallel loop by a factor of p, and run all the iterations within each tile in parallel. Python 3 has been used to implement ViTag. For computing the SIFT and SSIM scores, we have used the OpenCV library. Our implementation uses the Google Reverse Image Search engine [14] to obtain the raw tags of the key-frames. The conceptual inference heuristic is based on the Microsoft Concept Graph utility [12], [15]. The implementation and datasets are available at [16], [17].
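One possible reading of this loop-tiling parallelization is sketched below with Python's multiprocessing module: the per-window loop is split into tiles of p iterations, and the iterations inside each tile run in parallel on p cores. The helper score_window is a hypothetical stand-in for the MSE/SIFT/SSIM computation; the released implementation may organize this differently.

```python
# Sketch of tiling the per-window loop by the core count p and running the
# iterations of each tile in parallel (assumed structure, not the paper's code).
import os
from multiprocessing import Pool

def score_window(window):
    # Hypothetical stand-in for the per-window MSE/SIFT/SSIM similarity computation.
    return sum(window) / len(window)

def process_windows(windows):
    p = os.cpu_count() or 1                                 # number of processing cores
    results = []
    with Pool(processes=p) as pool:
        for start in range(0, len(windows), p):             # tile the loop by a factor of p
            tile = windows[start:start + p]
            results.extend(pool.map(score_window, tile))    # a tile's iterations run in parallel
    return results

if __name__ == "__main__":
    print(process_windows([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]))
```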
Domain                | No. of videos | Description                                           | Examples
Tourism               | 8             | Diverse tourist places, Seven wonders                 | Statue of Liberty, Gateway of India
Products              | 7             | Product reviews, advertisements                       | iPhone review, Nike shoes ad
Ceremony              | 8             | Popular events, ceremonies, major disasters           | Oscar award functions, Japan tsunami
Famous persons        | 10            | Documentaries on famous persons, artists in concerts  | Indian leaders, Jennifer Lopez
Entertainment         | 9             | Songs from popular movies, TV shows                   | Abraham Lincoln, Mr. Bean
Speech                | 7             | Recent speeches by prominent personalities            | Kofi Annan, Barack Obama
Animations            | 8             | Popular animation movies, cartoon series              | Tom and Jerry, Kung Fu Panda, Frozen
Wildlife              | 7             | Videos/documentaries on animal species                | Peacock, Butterfly, Kangaroo, Desert
Geography and Climate | 8             | Weather forecasting videos, videos covering maps      | Continents of the world, Weather forecast
Vehicles              | 8             | Famous bikes, cars and automobiles                    | Sports bike, Lamborghini car
Science and Education | 8             | Lecture series, videos on general awareness           | FPGA, Social media effects
Video Games           | 8             | Popular computer and mobile games                     | Counter Strike, Dangerous Dave
Sports                | 7             | Videos of popular tournaments                         | Tennis, Cricket, Football, Chess
Total                 | 103           |                                                       |

Table I: Details of the video dataset used in the evaluation of ViTag

III. RESULTS AND EVALUATION

We have used a video collection created from YouTube.com for the experiments. While creating the collection, we studied the various video categories available on YouTube.com. It organizes videos into 31 different categories, many of which we merged, finally arriving at 13 distinct domains. For each domain, we selected videos based upon the inclusive opinion of each person in the group. For each popular content item, we selected a random video obtained by searching the website. We selected videos with lengths between 50 seconds and 4 minutes. A total of 103 videos have been collected, with each domain consisting of approximately 8 videos. Table I describes the details. We have tagged each video using ViTag. Furthermore, by using a natural language processing package (NLTK), we are able to reason whether a tag contains proper nouns or not. This information enables us to rank the tags. We have also used reciprocal rank as another metric for evaluation.
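The paper does not spell out the exact ranking rule, but one plausible NLTK-based realization of "rank tags by the number of proper nouns they contain" is sketched below; the POS-tag choice and tie-breaking are assumptions.

```python
# Hypothetical sketch: rank tags by how many proper nouns NLTK finds in them.
# May require one-time downloads, e.g. nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger").
import nltk

def proper_noun_count(tag_text):
    tokens = nltk.word_tokenize(tag_text)
    return sum(1 for _, pos in nltk.pos_tag(tokens) if pos in ("NNP", "NNPS"))

def rank_tags(tags):
    # Tags containing more proper nouns are ranked higher; ties keep input order.
    return sorted(tags, key=proper_noun_count, reverse=True)

print(rank_tags(["Statue of Liberty", "famous landmark", "Eiffel Tower France"]))
```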
Fig. 4 shows the mean precision attained for each video domain. The mean has been computed as the geometric mean of the precision values attained by all videos belonging to a particular domain. For the Animations category, almost 77% of the generated tags are precise. For videos belonging to product reviews and advertisements, ViTag obtains a minimum precision of 57.36%. For videos belonging to domains like tourism, wildlife, animation, and events/ceremonies, ViTag attains more than 70% precision. A summary of the tag precision is presented in Table III.

Figure 4: Tag precision recorded across various domains.

It is also important to investigate how many videos lie in a particular precision interval. Fig. 5 shows the number of videos attaining each precision interval. ViTag attains full (100%) precision for 11.5% of the videos. For 70 out of 103 videos, it is able to generate more than 60% relevant tags. Fig. 6 summarizes the accuracy of ViTag. The plot is obtained by sorting the precision of all the videos in descending order. The rectangle with a dashed line shows the ideal accuracy, i.e., a precision of one for all the videos in the collection. Around 55% of the ideal accuracy is achieved by ViTag.

We estimate the effectiveness of the conceptual inference heuristic using a second metric of binary relevance for inferred tags. For 4 out of the 103 videos, the conceptual inference heuristic could not infer any extra tag. For 43 of the remaining 99 videos, the conceptual inference heuristic inferred meaningful tags; it inferred vague tags for the remaining 56 videos. Table II lists videos for which conceptual inference generates meaningful tags. The table also depicts cases where conceptual inference fails to infer relevant tags.
Outcome            | Domain         | Sample video description | Some raw tags                                   | Auto-inferred tags
Positive inference | Famous Persons | Indian freedom fighters  | Mahatma Gandhi, Bhagat Singh, freedom fighters  | leader, person
Positive inference | Wildlife       | Dancing peacock          | peacock, peafowl                                | bird
Positive inference | Tourism        | Eiffel Tower             | france eiffel tower, eiffel tower night view    | famous landmark, sight
Negative inference | Ceremony       | Christian wedding        | woman, event, female, facial expression         | group, information
Negative inference | Geography      | Weather forecast         | map, planet, world, earth                       | object, material
Negative inference | Product        | iPhone review            | gadget, iPhone, Screenshot                      | item, factor

Table II: Effectiveness of the conceptual inference heuristic

Figure 5: Number of videos vs. tag-precision interval.

Figure 6: Tag precision for the entire video collection in sorted order. The bounding rectangle depicts the ideal accuracy.

                 | Tag Precision | Reciprocal Rank
Geometric mean   | 0.6467        | 0.873
Arithmetic mean  | 0.6491        | 0.905
Median           | 0.6389        | 1.0

Table III: Summary with Tag Precision and Reciprocal Rank as the evaluation metrics

We have also evaluated ViTag using the reciprocal rank metric. Fig. 7 shows the reciprocal rank for all the videos in the collection, sorted in decreasing order. As seen from the figure, ViTag covers 85% of the ideal-case scenario. Table III summarizes the statistical results with reciprocal rank as the metric of evaluation.

Figure 7: Reciprocal rank for all videos of the dataset shown in sorted order. The bounding rectangle depicts the ideal case.

In summary, our observations are as follows. For the entire collection comprising 103 videos, ViTag has generated 696 tags, out of which 456 tags are precise. It attains about 65.51% accuracy using precision as a metric. In 43.4% of the cases, the conceptual inference heuristic has inferred valuable generic tags. ViTag obtains 87.3% accuracy as per the reciprocal rank metric. We believe the evaluation results are interesting enough to reflect the potential of our approach; however, we think there is scope for improvement, as discussed in the future work. A snapshot of the user interface of ViTag is presented in Fig. 8.

Figure 8: A snapshot of the user interface of ViTag.
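The summary statistics reported above and in Table III (geometric mean, arithmetic mean, and median of the per-video precision and reciprocal-rank values) can be reproduced with a short sketch like the following; the per-video scores in the example are made up for illustration and are not values from the dataset.

```python
# Sketch of the Table III summary statistics over hypothetical per-video scores.
import numpy as np

def summarize(scores):
    s = np.asarray(scores, dtype=float)   # geometric mean assumes strictly positive scores
    return {"geometric mean": float(np.exp(np.log(s).mean())),
            "arithmetic mean": float(s.mean()),
            "median": float(np.median(s))}

precision = [0.75, 0.60, 0.55, 0.80]      # fraction of relevant tags per video (made up)
reciprocal_rank = [1.0, 0.5, 1.0, 1.0]    # 1 / rank of the first relevant tag (made up)
print(summarize(precision))
print(summarize(reciprocal_rank))
```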
IV. CONCLUSIONS AND FUTURE WORK

We propose an analytical, end-to-end, and fully automatic approach to the problem of automatic video tagging. Our approach exploits a combination of various image similarity metrics to select key-frames containing dissimilar information from the input video. We then use a reverse image tagging engine to generate raw tags for the input video, and infer generic tags using a conceptual inference heuristic that leverages the semantic similarity among tags. We have evaluated our implementation on an open collection comprising 103 videos belonging to 13 domains derived from various YouTube categories. Our implementation has obtained 65.51% precision and 87% accuracy using reciprocal rank as a metric. Our approach is not video-domain specific, and it does not need any pre-tagged video dataset or training. This makes it practical and complementary to state-of-the-art approaches.

We would like to develop a deep neural network driven reverse image tagger to improve the accuracy of tag generation. We would also like to explore various natural language processing techniques to detect and eliminate non-relevant tags. For the conceptual inference heuristic, we would like to introduce a scoring mechanism to reason about the profitability of adding extra generic tags. We would also like to explore parameter tuning, which may have a positive impact on the existing accuracy. In addition, we would like to make ViTag run on a real-time multimedia video collection (such as www.YouTube.com). We think the current implementation stands as a good starting point to explore the above aspects.

REFERENCES

[1] Stefan Siersdorfer, Jose San Pedro, and Mark Sanderson, "Automatic video tagging using content redundancy," in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2009, pp. 395–402.

[2] Jose San Pedro, Stefan Siersdorfer, and Mark Sanderson, "Content redundancy in YouTube and its application to video tagging," ACM Transactions on Information Systems, vol. 29, no. 3, p. 13, 2011.

[3] Emily Moxley, Tao Mei, Xian-Sheng Hua, Wei-Ying Ma, and B. S. Manjunath, "Automatic video annotation through search and mining," in Proceedings of the IEEE International Conference on Multimedia and Expo. IEEE, 2008, pp. 685–688.

[4] George Toderici, Hrishikesh Aradhye, Marius Pasca, Luciano Sbaiz, and Jay Yagnik, "Finding meaning on YouTube: Tag recommendation and category discovery," in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 3447–3454.

[5] Damian Borth, Adrian Ulges, Christian Schulze, and Thomas Breuel, "Keyframe extraction for video tagging & summarization," pp. 45–48, 2008.

[6] Ting Yao, Tao Mei, Chong-Wah Ngo, and Shipeng Li, "Annotation for free: Video tagging by mining user search behavior," in Proceedings of the 21st ACM International Conference on Multimedia. ACM, 2013, pp. 977–986.

[7] Jialie Shen, Meng Wang, and Tat-Seng Chua, "Accurate online video tagging via probabilistic hybrid modeling," Multimedia Systems, vol. 22, no. 1, pp. 99–113, 2016.

[8] Yushi Jing, David Liu, Dmitry Kislyuk, Andrew Zhai, Jiajing Xu, Jeff Donahue, and Sarah Tavel, "Visual search at Pinterest," 2015.

[9] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," in Proceedings of the British Machine Vision Conference. BMVA Press, 2002, pp. 36.1–36.10, doi:10.5244/C.16.36.

[10] Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, and William T. Freeman, "Discovering objects and their location in images," in Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2005, vol. 1, pp. 370–377.

[11] David Nister and Henrik Stewenius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2006, vol. 2, pp. 2161–2168.

[12] Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, and Yanghua Xiao, "An inference approach to basic level of categorization," in Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 2015, pp. 653–662.

[13] Edward Loper and Steven Bird, "NLTK: The natural language toolkit," in Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1. Association for Computational Linguistics, Stroudsburg, PA, USA, 2002, pp. 63–70.

[14] Google Inc., Google Reverse Image Search.

[15] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q. Zhu, "Probase: A probabilistic taxonomy for text understanding," in Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 2012, pp. 481–492.

[16] ViTag automatic video tagger.

[17] ViTag evaluation: video collection.