Tracking Viral Videos on YouTube: Tool Set Construction
David Hehir
u4402855
November 3, 2011

Abstract

This paper documents the implementation of a tool set that calculates a set of near matches of an input set of images, using the colour correlogram as a feature extractor. This tool set is a subset of the tools that can be used to extract common short segments of video, or visual memes, from a set of videos. The software described in this paper is able to extract features and group similar images into sets, and sits within a larger, more complicated process for visual meme tracking. This paper serves as a design and performance testing document, illustrating that the two command line tools created work with an extraction rate of up to 94% precision and 87% recall, matching the performance of the base algorithm this tool set is based on.

1 Introduction

YouTube is currently the largest social video platform on the Internet, containing a huge number of videos, with over 6.9 billion video views in the U.S. alone (comScore 2011). Along with this, 48 hours of new content is uploaded every minute (Search Engine Watch 2011). With all this content existing in the one place, much of it is rehashes of existing content. Currently, there is no easy way to track the reuse of content within the YouTube ecosystem, for two main reasons. First, comparing two videos to check whether they share near-matches of sections of content is non-trivial to do reliably. Second, there is a huge amount of data to process: in order to run any checks, the data must be subsetted in some way, and the only way to do this currently is using text searches, which have the potential to miss large amounts of reuse, in particular if a video is untagged or tagged in a different language. Thus, how much content is reused is not known, and measuring it is the basis of this project.

This project's aim is to investigate how reuse can be measured between videos without actually watching them. Work on this has already been undertaken to track small sections of remixed videos throughout YouTube (Xie, Natsev, Hill, Kender & Smith 2011). That paper discussed a methodology that has been tested on YouTube with success, but it only investigates videos of current affairs, leaving a gap for further research into other genres. However, the tools to research other genres of videos are not readily available, leading to the main topic of this project.
In order to fit this project into a single semester, the work focused mainly on the construction of the tools to allow others to further research the propagation of remixed videos throughout YouTube. This paper covers the design of the tools (section 3.2.1) as well as their implementation into functioning tools (sections 3.3 and 3.4). The tools were then benchmarked against the same test data set as in (Xie et al. 2011) to compare implementation results. A set of correct matches allows the calculation of precision and recall values, which in turn allow tuning of the tools as well as evaluation of the overall performance, and thus suitability for further research.

2 Prior Work

In Xie's paper (Xie et al. 2011), there is discussion of the need for a cultural unit for videos, as there is for still-image and text based mediums. Websites such as Know Your Meme (Cheezburger Inc 2011) index many known memes, including video, but have limited actual measures of a meme's spread other than video views, Google Insights or observation. Other sites offer ways of visualising the spread of particular news stories or search terms (Leskovec, Backstrom & Kleinberg 2011); however, this is only for text and not for video.

This project focuses more on the problem of comparing sections of video, represented by key frames, and finding videos with near-matches. The idea of content-based image querying is not new, with several implementations being commercially used, such as image search engines (Bloore 2008) and Google Goggles (Google 2010).

3 Method

This section describes the methodology used to create the tools that solve the problem outlined in section 1.

3.1 Overview

The basis for the solution to the problem specified is given in (Xie et al. 2011). That paper outlines a process for finding short segments of video that are replicated in many videos within YouTube, and defines the term visual meme as "a short segment of video that is frequently remixed and reposted by more than one author" (Xie et al. 2011). In order to trace the propagation of visual memes throughout YouTube, the process shown in figure 1 is followed.

[Figure 1: High Level Process of Visual Meme Matcher. Pipeline: Set of Videos → (A) Separate Videos into Keyframes → (B) Robust Feature Extraction → (C) Build FLANN Index → (D) Approximate NN Lookup → (E) Distance Verification → (F) Transitive Closure → Meme Clusters]

(A) Separate Videos into Keyframes
The first step within the process is to convert the set of videos into a set of still images representing scenes within the video. This process is known as keyframe extraction.
For the sake of simplicity, keyframe extraction is not incorporated in the tool set construction for this project, since other research is currently underway to complete this stage. For this project, the starting point is therefore the output of step (A): a set of images that forms the input for the rest of the tool set.

(B) Robust Feature Extraction
The first thing that the tool set needs to calculate is the feature vector for each image. A feature vector is essentially an n-dimensional vector containing information about the image in each entry. The colour correlogram used here is discussed in more detail in section 3.3. The output of this step is a set of feature vectors, one representing each image.

(C) Build FLANN Index
Once the feature vectors have been generated, clusters of matches need to be generated. In this project, the Fast Library for Approximate Nearest Neighbours (FLANN) is used to calculate which images are near neighbours of each other. This in turn allows grouping of near-match images together to form groups of memes.

FLANN performs an approximate nearest neighbour search by clustering the images into trees, storing images that are 'close' in a similar section of the tree. By storing the images in trees and only searching a subset of each tree (e.g. only specific branches, selected based on a query image), a faster but approximate set of nearest neighbours can be found. Further detail on how FLANN works is given in section 3.4.

The first step involving FLANN in this project is the indexing step. This step builds the trees based on a given input data set of feature vectors. The output of this step is a FLANN matcher that is ready to be used to calculate matches.

(D) Approximate NN Lookup
Step (D) in the process is to calculate the matches. This step is where all the feature vectors produced in step (B) are matched, producing a large number of sets of pairs.

(E) Distance Verification
In Xie's paper, a match is only considered valid if the distance between the two feature vectors is less than a specific threshold. This threshold is defined as

    $T_q \triangleq \tau \, \frac{\|f_q\|_2}{\|f_{\max}\|_2}$    (1)

All matches that satisfy this requirement are kept. The result from this step is a subset of the pairs found in step (D).
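As a concrete illustration of step (E), the following minimal C++ sketch filters candidate pairs against the form of equation (1) reconstructed above; the names (FeaturePair, l2norm, distanceVerify) and the surrounding structure are illustrative assumptions, not code from the tool set.

    #include <cmath>
    #include <vector>

    // Hypothetical candidate pair: the query's feature vector and the
    // distance between the two matched vectors.
    struct FeaturePair {
        std::vector<double> fq;  // query feature vector
        double distance;         // distance between the two vectors
    };

    static double l2norm(const std::vector<double>& v) {
        double s = 0.0;
        for (double x : v) s += x * x;
        return std::sqrt(s);
    }

    // Keep only pairs whose distance is below T_q = tau * ||f_q||_2 / ||f_max||_2,
    // the per-query threshold of equation (1).
    std::vector<FeaturePair> distanceVerify(const std::vector<FeaturePair>& pairs,
                                            double tau, double fmaxNorm) {
        std::vector<FeaturePair> kept;
        for (const FeaturePair& p : pairs) {
            double tq = tau * l2norm(p.fq) / fmaxNorm;
            if (p.distance < tq) kept.push_back(p);
        }
        return kept;
    }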
(F) Transitive Closure
The final step in the process is to group the pairs together, as shown in figure 2. This grouping is run over every pair, creating a set of clusters of similar images.

[Figure 2: Transitive Closure Example. a) the pairs (1, 2) and (2, 3) are found separately; b) because the pairs share image 2, images 1, 2 and 3 are merged into a single cluster]

3.2 Tool set Design

3.2.1 Problem Decomposition

Figure 1 outlines the six main steps involved in calculating sets of meme clusters from YouTube videos. In this project, only a subset of this flow chart has been considered. The main assumption is that the videos have already been extracted from YouTube and have subsequently had their keyframes extracted.

From the highest level, the tool set ideally operates as a black box, as illustrated in figure 3. From the diagram, all that is required is a set of images, along with some optional tuning parameters.

[Figure 3: High level block diagram of developed tool set. Input Keyframes and Tuning Parameters feed the Meme Matcher Toolset, which produces Output Meme Clusters and Performance Metrics]

The assumption that keyframes are already extracted allows the overall process to be simplified and split into two main processing sections: calculating the colour correlogram feature descriptors, and matching based on these descriptors, as illustrated in figure 4.

3.2.2 Design Criteria

In the initial design stages of the tool set, a set of criteria was written:

• Memory Efficiency
• Modular
• Tunable

Memory Efficiency
The given problem deals with a large number of images that require processing. One of the biggest challenges when dealing with images at this scale is memory usage. After a certain number of images it is no longer possible to keep them all in memory, thus any processing over the whole data set needs to be done in stages, keeping the minimal amount of information about each image in memory at any period of time.

Tunable
In order to get the best performance out of the tool set, several tuning parameters for both the colour correlogram and the feature matcher need to be exposed. The two obvious parameters are the search distance k for the colour correlogram and τ, the global threshold distance value.

Modular
The constraint that the tool set should be modular was chosen since it is likely that this tool set may only be used in part. That is, a different feature descriptor extractor may be used to compare against the results of the colour correlogram. Alternatively, the colour correlogram may be used as a feature extractor for purposes other than meme clusters. Therefore, the tool set was split into two separate executables, one for the colour correlogram and another for the feature matching.
[Figure 4: Proposed tool set decomposition. Input Keyframes and Tuning Parameters feed Calculate Correlogram, which passes Feature Descriptors to Match Features, producing Output Meme Clusters and Performance Metrics]

This design choice allows faster tuning of the feature matcher, because the colour correlogram can be executed only once (for a set distance value), with the feature matcher able to reuse the correlogram values.

3.3 Colour Correlogram

3.3.1 Background

The colour correlogram can be defined as follows: given a pixel of colour $c_i$ in an image, the colour correlogram gives the probability that a pixel within a distance $k$ is of colour $c_j$ (Huang, Kumar, Mitra, Zhu & Zabih 1997). The autocorrelogram is defined as a correlogram with $c_i = c_j$; that is, the probability that a pixel within distance $k$ shares the same colour as the reference pixel. This can be expressed mathematically as

    $\gamma^{(k)}_{c_i,c_j} \triangleq \Pr_{p_1 \in I_{c_i},\, p_2 \in I} \left[\, p_2 \in I_{c_j} \;\middle|\; |p_1 - p_2| = k \,\right]$    (2)

with the autocorrelogram defined as

    $\alpha^{(k)}_c \triangleq \gamma^{(k)}_{c,c}$    (3)

Thus, the feature vector calculated from the colour correlogram is a vector containing a probability value $\alpha^{(k)}_c$ at index $c$; the feature vector therefore has the same dimension as the number of quantisation bins.

Figure 5 illustrates a small image section consisting of 9 pixels and two colours, $c_1$ and $c_2$. When examining the middle pixel with $k = 1$, two of its eight neighbours at distance 1 share its colour, so the autocorrelogram is $\alpha^{(1)}_{c_1} = \frac{2}{8} = \frac{1}{4}$.

[Figure 5: Small subset of pixels with representative colour values:
    c1 c2 c2
    c2 c1 c2
    c2 c1 c2 ]
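To make the definition concrete, the following sketch computes a naive autocorrelogram over a quantised image. It is a minimal illustration of equations (2)-(3), assuming the chessboard (L∞) distance used by Huang et al.; it normalises by the actual number of valid neighbours rather than the fixed 8k of equation (5), so boundary pixels simply contribute fewer neighbours. It is not the optimised algorithm used by the tool (see the O(n²d) method below).

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // image: quantised bin index per pixel, row-major, size w*h.
    // Returns alpha[c] = Pr(neighbour at chessboard distance k has colour c,
    // given the reference pixel has colour c), i.e. eq. (3).
    std::vector<double> autocorrelogram(const std::vector<int>& image,
                                        int w, int h, int k, int numBins) {
        std::vector<double> same(numBins, 0.0), total(numBins, 0.0);
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                int c = image[y * w + x];
                for (int dy = -k; dy <= k; ++dy) {
                    for (int dx = -k; dx <= k; ++dx) {
                        // Only pixels exactly at chessboard distance k.
                        if (std::max(std::abs(dx), std::abs(dy)) != k) continue;
                        int nx = x + dx, ny = y + dy;
                        if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
                        total[c] += 1.0;
                        if (image[ny * w + nx] == c) same[c] += 1.0;
                    }
                }
            }
        }
        std::vector<double> alpha(numBins, 0.0);
        for (int c = 0; c < numBins; ++c)
            if (total[c] > 0.0) alpha[c] = same[c] / total[c];
        return alpha;
    }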
The autocorrelogram was selected as the feature descriptor since it can be computed quickly and can tolerate changes in zoom and viewing position, as well as being rotation invariant (by definition). These properties make the colour correlogram well suited for use in visual meme matching, since it is fairly common for small transforms to be applied by the remix author (e.g. cropping or resizing).

To illustrate the autocorrelogram, figure 6 shows three images with their correlograms underneath. The left and centre images are considered matches, while the image on the right is not a match; the correlograms plotted underneath illustrate this.

[Figure 6: Three images and their respective colour correlograms]

The fact that the correlogram is computationally simple, with a runtime of $O(n^2 k)$ (Huang et al. 1997), makes it suitable for use with large data sets, such as the ones used in this project and in (Xie et al. 2011).

Cross Pattern Colour Correlogram
A modification of the original colour correlogram feature vector is to calculate a cross pattern colour correlogram, where the only pixels considered are those within a cross mask, as shown in figure 7. The reason for doing so is to remove any borders or logos that may be present (e.g. news feeds along the bottom) and that are common across many different images. The colour correlogram used in the reference paper (Xie et al. 2011) utilised the cross colour correlogram; however, for simplicity, the regular correlogram was used in this project.

[Figure 7: Area taken into account for correlogram calculation]

3.3.2 Design

At the core of this tool is the use of OpenCV (Bradski 2000). This library handles the image loading and the simple manipulations involved with this step, and was chosen because it is free and provides the required image functions (see section 3.3.3). Although there exist bindings for C, C++ and Python, the C bindings were chosen due to language familiarity and potential performance gains over a Python implementation.

The process the program goes through to calculate the correlogram is illustrated in the flow chart in figure 8. The process outlined in the figure allows much of the work to be delegated to OpenCV or other libraries. OpenCV provides facilities to read an image (step A) into a form such that direct access to the pixel values is possible.

Quantisation (step B) is achieved by reading each individual pixel value and, based on its value, setting it to a new value inside a quantisation bin. The following formulas implement this:

    $bin = \lfloor pixel\_value \,/\, interval \rfloor$
    $p_{new\_value} = interval/2 + interval \times bin$

where the value of $interval$ is defined as $interval = Max_{channel} / N_{channel\ bins}$. This definition allows a different number of quantisation bins in each channel. In this project, 162-bin quantisation is used, with 18 hue bins, 3 saturation bins and 3 value bins ($18 \times 3 \times 3 = 162$); a sketch of this quantisation step is given below.
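The following is a minimal C++ sketch of the per-channel quantisation formulas above, assuming 8-bit HSV pixels as stored by OpenCV (hue in [0, 180), saturation and value in [0, 256)); the struct and the hue-major bin ordering are illustrative assumptions rather than taken from the actual tool.

    // Quantise one HSV pixel into a combined bin index in [0, 162).
    struct HsvBins {
        int hBins = 18, sBins = 3, vBins = 3;
        int maxH = 180, maxS = 256, maxV = 256;  // 8-bit HSV ranges in OpenCV

        int channelBin(int value, int maxValue, int nBins) const {
            int interval = maxValue / nBins;    // interval = Max_channel / N_bins
            int bin = value / interval;         // bin = floor(value / interval)
            if (bin >= nBins) bin = nBins - 1;  // guard the top edge
            return bin;
        }

        // Representative (re-quantised) value for a bin:
        // p_new = interval/2 + interval * bin.
        int representative(int bin, int maxValue, int nBins) const {
            int interval = maxValue / nBins;
            return interval / 2 + interval * bin;
        }

        // Combined index, hue-major ordering (an assumption for illustration).
        int binIndex(int h, int s, int v) const {
            int hb = channelBin(h, maxH, hBins);
            int sb = channelBin(s, maxS, sBins);
            int vb = channelBin(v, maxV, vBins);
            return (hb * sBins + sb) * vBins + vb;  // 18 * 3 * 3 = 162 bins
        }
    };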
In Huang et al. (Huang et al. 1997), an algorithm is proposed for calculating the correlogram in $O(n^2 d)$ for small values of the search radius $d$. The algorithm uses the pair count

    $\Gamma^{(k)}_{c_i,c_j}(I) \triangleq \left|\{\, p_1 \in I_{c_i},\ p_2 \in I_{c_j} : |p_1 - p_2| = k \,\}\right|$    (4)

    $\gamma^{(k)}_{c_i,c_j} = \frac{\Gamma^{(k)}_{c_i,c_j}}{h_{c_i}(I) \cdot 8k}$    (5)

where $h_{c_i}(I)$ is the image histogram. For the $O(n^2 d)$ solution, $\Gamma^{(k)}_{c_i,c_j}$ is computed as

    $\Gamma^{(k)}_{c_i,c_j}(I) = \sum_{(x,y) \in I_{c_i}} \left[ \lambda^{c_j,h}_{(x-k,\,y+k)}(2k) + \lambda^{c_j,h}_{(x-k,\,y-k)}(2k) + \lambda^{c_j,v}_{(x-k,\,y-k+1)}(2k-2) + \lambda^{c_j,v}_{(x+k,\,y-k+1)}(2k-2) \right]$

where $\lambda^{c,h}_{(x,y)}(k)$ is defined by the recurrence

    $\lambda^{c,h}_{(x,y)}(k) = \lambda^{c,h}_{(x,y)}(k-1) + \lambda^{c,h}_{(x+k,\,y)}(0)$

with the initial condition

    $\lambda^{c,h}_{(x,y)}(0) = \begin{cases} 1 & \text{if } p_{(x,y)} \in I_c \\ 0 & \text{otherwise} \end{cases}$

The calculate correlogram stage (step C) is implemented using the $O(n^2 d)$ method explained above. This process is repeated for each of the input images, with each value of $c_i$ representing a quantisation bin, thus giving a feature vector with 162 bins in this project. It should be noted that as each bin is calculated, the max feature vector is updated if the result is larger than the previously stored value.

The final step in this process is to serialize the feature vectors (step D). The feature vectors were chosen to be serialized as JSON, since it is human readable (i.e. nicer to debug) and more lightweight than XML, with the potential for significantly smaller files.
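As an illustration of step (D), a minimal sketch of serializing one feature vector with Jansson is shown below. The JSON field names ("file", "features") are assumptions for illustration, since the actual schema used by the tool is not reproduced in this report.

    #include <jansson.h>

    /* Serialize one image's feature vector as a JSON object of the form
     * { "file": "<name>", "features": [f0, f1, ...] }
     * and append it to an existing JSON array. Returns 0 on success. */
    static int appendCorrelogram(json_t *array, const char *fileName,
                                 const double *features, int numBins) {
        json_t *obj = json_object();
        json_t *vec = json_array();
        for (int i = 0; i < numBins; ++i)
            json_array_append_new(vec, json_real(features[i]));
        json_object_set_new(obj, "file", json_string(fileName));
        json_object_set_new(obj, "features", vec);
        return json_array_append_new(array, obj);
    }

    /* Writing all vectors into one file (as in the modified tool of
     * section 3.4.3):
     *   json_t *all = json_array();
     *   ... appendCorrelogram(all, name, fv, 162) for each image ...
     *   json_dump_file(all, "features.json", JSON_COMPACT);
     *   json_decref(all);
     */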
[Figure 8: Process for calculating the correlogram for a single image. (A) Read Image from Disk → (B) Quantise Image → (C) Calculate Correlogram → (D) Serialize Feature Vector]

3.3.3 Implementation

The Colour Correlogram tool was the first of the two tools to be constructed. It was primarily written in C, since this is a language with bindings to OpenCV. OpenCV was chosen as the image library since it is not only free but also has simple functions to read images and calculate histograms. The JSON serialization library, Jansson (Jansson 2011), was chosen since it has no external library dependencies and is simple to use. With OpenCV and Jansson, the implementation of the Colour Correlogram feature extractor was relatively straightforward.

Throughout the tool, there are three main C structs used to store the set of feature vectors, shown below.

    typedef struct Correlogram
    {
        char   *fileName;       /* source image file */
        double *FeatureVector;  /* one probability per quantisation bin */
    } Correlogram;

    typedef struct CorrelogramArray
    {
        Correlogram **array;    /* all per-image correlograms */
        int           elements; /* number of entries in array */
    } CorrelogramArray;

    typedef struct CorrelogramCollection
    {
        CorrelogramArray *correlograms;
        double           *MaxFeatureVector; /* per-bin maxima, used in eq. (1) */
        int               NumBins;
        int               searchDistance;   /* the correlogram radius k */
    } CorrelogramCollection;

The proposed design meets the design criteria specified in section 3.2.2. The program has been written with memory use in mind: for example, it only ever has one image in memory at any point in time. This is essential, since images are large, and keeping more than one in memory is unnecessary and reduces the scalability of the program. The program also satisfies the tunable design criterion, exposing the correlogram search distance as a parameter.

3.4 Fast Library for Approximate Nearest Neighbours

3.4.1 Background

Approximate nearest neighbour matching is a method used to approximately find matches of points (in our case, vectors) in high dimensional space. A linear search is typically not used, since it takes too long to execute with large datasets of high dimensionality; the time complexity of a linear search is $O(n^2 d)$, where $n$ is the number of images and $d$ is the dimensionality.
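For comparison, the following sketch shows the brute-force all-pairs scan that this complexity refers to: $n(n-1)/2$ distance evaluations, each costing $O(d)$. It is only a baseline illustration of why an approximate index is needed, not code from the tool set.

    #include <cmath>
    #include <utility>
    #include <vector>

    // Exhaustively find all pairs (i, j) of feature vectors whose L2
    // distance is within `threshold` of each other.
    std::vector<std::pair<int, int>>
    bruteForcePairs(const std::vector<std::vector<double>>& features,
                    double threshold) {
        std::vector<std::pair<int, int>> pairs;
        for (size_t i = 0; i < features.size(); ++i) {
            for (size_t j = i + 1; j < features.size(); ++j) {
                double s = 0.0;
                for (size_t k = 0; k < features[i].size(); ++k) {
                    double diff = features[i][k] - features[j][k];
                    s += diff * diff;
                }
                if (std::sqrt(s) <= threshold)
                    pairs.emplace_back(static_cast<int>(i),
                                       static_cast<int>(j));
            }
        }
        return pairs;
    }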
As mentioned previously, approximate nearest neighbour solutions reduce this complexity, allowing the matching process to be done in less than $O(n^2 d)$ time. The FLANN library, which uses randomised k-d trees, can be built in $O(n \log^2 n)$ and queried in $O(n^{1-1/k} + m)$ (Cormen, Leiserson, Rivest & Stein 2009b). FLANN has been found to be not only faster than linear search but also an order of magnitude faster than other approximate nearest neighbour match algorithms (Muja & Lowe 2009), making it ideal for the purposes of this project. The actual FLANN algorithm is not discussed in depth here; more information can be found in Muja et al. (Muja & Lowe 2009). Instead, this project just interacts with libraries that implement the algorithm. There are two main libraries that implement FLANN: OpenCV (Bradski 2000) and the Point Cloud Library (Rusu & Cousins 2011).

3.4.2 Design

The Feature Matcher design is far simpler than the Colour Correlogram design in section 3.3 above, because the majority of the calculation is done within external libraries, reducing this second executable to just interacting with those libraries. The overall process is illustrated in figure 9.

[Figure 9: Feature Matcher process. (A) Read JSON File from Disk → (B) Deserialize File → (C) Calculate Nearest Neighbours → (D) Group Sets into Clusters]

The first major section in the feature matcher tool is to read in the file (step A) and deserialize it into feature vectors (step B). The deserialization process is tightly bound to the serialization process of the Colour Correlogram tool (section 3.3): the objects must be extracted from the JSON file in the same order in which they were packed.

Step C is where the bulk of the work is done in this tool. Each individual image is matched with a set of other images within a specified search radius, given by equation 1. This is done using a FLANN based matcher that matches a query image, $img_q$, to other images within the image database whose distance to $img_q$ is less than or equal to the threshold value calculated from equation 1.

Each iteration of the matching process produces a set containing the image filenames that are considered matches. With the set of match pairs calculated, the transitive closure step needs to be executed. It was decided, for simplicity, to do this within the same application as the feature matching, so that the amount of intermediate data serialized to disk is minimised.

Thus, after matching has occurred for each query image (i.e. all the images in the database), the sets of images need to be merged. The merging step is done such that if image $A \in S_a$ and $A \in S_b$, then all images $I \in S_a \cup S_b$ are considered a match. The one disadvantage of this method of joining is that the worst case distance between two images is $n T_q$, where $n$ is the number of images in $S_a \cup S_b$, which may lead to some false positives. However, this method of joining has the advantage of being able to use the disjoint set data structure, with each match result being a set off the parent represented by its query image. This allows the merging operations to take only $O(m \alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function and $m$ is the number of disjoint set operations (join, find, etc.) (Cormen, Leiserson, Rivest & Stein 2009a).

The matches are then serialized to a simple JSON file. This allows the results of the matching to be checked at a later date.
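A minimal sketch of this merging step, using the Boost disjoint set structure actually chosen for the tool (see section 3.4.3), is shown below; the function and variable names are illustrative.

    #include <boost/pending/disjoint_sets.hpp>
    #include <utility>
    #include <vector>

    // Merge match pairs (indices into the image list) into clusters:
    // the transitive closure of the pair relation.
    std::vector<int> clusterPairs(int numImages,
                                  const std::vector<std::pair<int, int>>& pairs) {
        std::vector<int> rank(numImages), parent(numImages);
        boost::disjoint_sets<int*, int*> ds(rank.data(), parent.data());
        for (int i = 0; i < numImages; ++i)
            ds.make_set(i);                   // each image starts in its own set
        for (const auto& p : pairs)
            ds.union_set(p.first, p.second);  // union by rank + path compression
        std::vector<int> clusterOf(numImages);
        for (int i = 0; i < numImages; ++i)
            clusterOf[i] = ds.find_set(i);    // representative id per image
        return clusterOf;
    }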
3.4.3 Implementation

The feature matcher was required to be written in C++, since the interfaces of the various descriptor matchers (including the FLANN based matcher) only exist in the C++ OpenCV bindings and not in C. OpenCV was chosen over the Point Cloud Library since the colour correlogram tool was already written using OpenCV, thus not adding a new system dependency. The Point Cloud Library is focused more on robotics, while OpenCV has more of an image processing focus, giving extra helper functions that made this tool easier to implement.

The initial step of deserialization was again completed with Jansson. With the feature vectors deserialized, they needed to be converted to a format suitable for the OpenCV FlannBasedMatcher object: the feature vectors (originally double[]) had to be converted to an OpenCV specific format, cv::Mat.

This deserialization step was found to take a significant amount of time when each feature vector was serialized into its own file (approximately 10 minutes with the test data set discussed in section 4). A modification was made to the Colour Correlogram tool such that all feature vectors are serialized into a single file, which reduced the time taken 20-fold, to approximately 30 seconds on the same data set.

The FLANN based matcher gives three different methods of querying the dataset: match, knnMatch and radiusMatch. The first, match, returns only the single best result and is thus inapplicable for the tool set's needs. knnMatch returns the best k matches; this method could be used by selecting the best 50 results and then culling them based on their distance to the query vector, as in Xie (Xie et al. 2011). This method would be preferable if there are too many results within the search radius $T_q$, as it limits the output to a maximum of 50 results. However, for simplicity, the radiusMatch method was used, since it returns only the results that are within the search distance of the query image.

At this stage, a disjoint set is built from the results of the FlannBasedMatcher.radiusMatch method call. The C++ standard library does not provide a disjoint set structure, so the Boost implementation was used (Siek, Lee & Lumsdaine 2001). The Boost implementation uses both path compression and union by rank, so that the $O(m \alpha(n))$ bound is achieved.
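The following sketch shows the cv::Mat conversion and a radiusMatch query as described above, assuming the OpenCV C++ API; the function names and the self-matching setup are illustrative rather than taken from the tool's source.

    #include <opencv2/features2d.hpp>  // header path varies with OpenCV version
    #include <vector>

    // Build one row per image: the FLANN matcher requires CV_32F descriptors.
    cv::Mat toDescriptorMat(const std::vector<std::vector<double>>& features) {
        cv::Mat m(static_cast<int>(features.size()),
                  static_cast<int>(features[0].size()), CV_32F);
        for (int r = 0; r < m.rows; ++r)
            for (int c = 0; c < m.cols; ++c)
                m.at<float>(r, c) = static_cast<float>(features[r][c]);
        return m;
    }

    // All matches within radius tq of each query row.
    std::vector<std::vector<cv::DMatch>>
    radiusMatches(const cv::Mat& descriptors, float tq) {
        cv::FlannBasedMatcher matcher;  // defaults to randomised k-d trees
        std::vector<std::vector<cv::DMatch>> matches;
        // Query the set against itself; matches[i] holds every DMatch whose
        // distance from row i is within tq (row i itself will also appear).
        // Note: depending on the FLANN index and metric, reported distances
        // may be squared L2, in which case tq should be squared to match.
        matcher.radiusMatch(descriptors, descriptors, matches, tq);
        return matches;
    }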
4 Results

Once the tools were constructed as described in section 3.2.1, they needed to be verified to work as expected.

4.1 Methodology

The testing methodology was to use the same tuning dataset as in Xie (Xie et al. 2011) and compare the recall and precision results. Along with the test dataset, a set of true/false pairs was given in order to calculate the recall and precision values.

A Python script was used to piece together the two tools and batch run them with different parameters. The Python script also calculates the precision and recall values, essentially as follows:

    numerator = 0
    recall_denominator = 0
    precision_denominator = 0
    for pair in pairs:
        # Which output cluster did each image of the pair land in?
        file1 = FindPairIdInFile(pair[0])
        file2 = FindPairIdInFile(pair[1])
        if pair in true_pairs:
            recall_denominator += 1
            if file1 == file2:
                numerator += 1
                precision_denominator += 1
        elif file1 == file2:  # false pair placed in the same cluster
            precision_denominator += 1

Recall is then calculated as

    $recall = \frac{numerator}{recall\_denominator}$

and precision as

    $precision = \frac{numerator}{precision\_denominator}$

4.2 Observed Results

Using the methodology presented above, the recall and precision values were found for a range of τ and search radius r values.

    τ    Recall  Precision  N_matches
    1    0.744   0.995      1416
    2    0.739   0.994      1595
    3    0.553   0.982      2394
    4    0.818   0.970      4172
    5    0.875   0.941      6289
    6    0.915   0.511      4504
    7    0.978   0.358      1201
    8    0.991   0.342       282
    9    0.996   0.339        84
    10   1.0     0.339        41
    11   1.0     0.339        23
    12   1.0     0.339        15
    13   1.0     0.339        10
    14   1.0     0.339         3
    15   1.0     0.339         3

    Table 1: Results for r = 1

From table 1, results similar to the reference material were found. The operating point chosen in (Xie et al. 2011) was τ = 11.5, with recall R = 80.1% and precision P = 98.2%. The closest value here occurs somewhere between τ = 4 and τ = 5. The tool was run again using a finer interval of τ values (0.1), giving the results shown in table 2.

From the tables, there is no direct match for the operating point found in the reference material. Thus, the maximum F1 score is taken instead ($F_1 = \frac{2RP}{R+P}$); the maximum is found at τ = 5 (R = 0.875, P = 0.941, giving F1 ≈ 0.907).

Compared to (Xie et al. 2011), the results found using the tools built in this project are comparable, but ultimately less precise than the equivalent values in the reference paper. However, it should be noted that this tool set produces slightly higher recall values.
One explanation for this is that this tool set makes larger groups than the equivalent in the reference paper, which is likely to result in higher recall and lower precision, as found here. The reason for this can be attributed to differences in the preprocessing and in the implementation of the colour correlogram. For the results found here, no preprocessing was undertaken; in the reference results, the extra preprocessing steps included the removal of blank frames, removal of borders, normalization of the aspect ratio, de-noising, and contrast and gamma correction. This means two different feature vectors are produced, since the inputs are effectively two different images. This may also explain the differences between operating points.

The tool set used in the reference paper also utilised the cross-correlogram, as explained in section 3.3. This again changes the feature vector, resulting in different matches when compared to those found above.

    τ     Recall  Precision
    4.1   0.826   0.970
    4.2   0.826   0.965
    4.3   0.828   0.963
    4.4   0.821   0.960
    4.5   0.831   0.960
    4.6   0.841   0.954
    4.7   0.851   0.950
    4.8   0.850   0.946
    4.9   0.859   0.944
    5.1   0.879   0.93
    5.2   0.878   0.925
    5.3   0.875   0.905
    5.4   0.875   0.881
    5.5   0.884   0.804
    5.6   0.896   0.76
    5.7   0.909   0.708
    5.8   0.909   0.651
    5.9   0.911   0.54

    Table 2: Results for r = 1, continued (finer τ interval)

[Figure 10: Plot of Recall against Precision for r = 1]

Other than τ, it is also possible to tune the colour correlogram stage with a larger search radius. When the same process was run with r = 2, the precision and recall values were found to be as shown in table 3. The results found using this new radius value are similar, though with a different operating point. When the values are plotted over figure 10, they form a similar pattern, as shown in figure 11.

    τ    Recall  Precision
    1    0.744   0.995
    2    0.739   0.994
    3    0.738   0.994
    4    0.582   0.987
    5    0.624   0.977
    6    0.822   0.971
    7    0.853   0.953
    8    0.864   0.924
    9    0.914   0.766
    10   0.940   0.450
    11   0.978   0.361
    12   0.984   0.344
    13   0.996   0.342
    14   1.0     0.339
    15   1.0     0.339

    Table 3: Results for r = 2

[Figure 11: Plot of Recall against Precision for r = 1 and r = 2]

4.3 Performance

Performance was not a major criterion for the design of this tool set. However, since it may be used on potentially very large datasets, some idea of timing is required.
Table 4 shows the wall time taken to execute the individual tools at different tuning values. The tests were run on a laptop with an i5-2410M at 2.30 GHz and 4 GB of RAM.

    Colour Correlogram   Time (s)
    r = 1                3618
    r = 2                7964
    r = 3                14046

    Feature Matcher      Time (s)
    τ = 1                35
    τ = 2                35
    τ = 3                35

    Table 4: Timing of individual tools

Further values of τ are not shown, since the feature matcher took 35 seconds regardless of the value of τ or r. From the timing results and the recall and precision results shown above, varying the search radius r provides no benefit.

5 Conclusion

The original aim of this project was not only to develop tools that allow feature extraction and matching of key frames in videos, but also to extend the investigation initially started in (Xie et al. 2011) into finding and observing how visual memes propagate through the YouTube ecosystem. This goal was not achieved, for various reasons but mainly due to time constraints. However, this project report has illustrated that the tools required to do so have been constructed, and that only extra preprocessing of images or implementation of the cross colour correlogram is required to begin the originally intended investigation.
5.1 Further Work

From here, there are several obvious extensions to the work outlined in this report. One is to rewrite the colour correlogram tool such that it extends the OpenCV abstract class cv::FeatureDetector. This would allow easier interaction with the rest of the OpenCV library, making the first tool, which calculates the colour correlogram, much simpler.

A second obvious extension of this work is to begin the investigation into visual memes on YouTube. In (Xie et al. 2011), only current affairs videos have been investigated as visual memes. However, many other genres of videos exist on YouTube that would provide interesting results. For example, music videos and gaming videos could be used to derive another measure of popularity, rather than just video views. This tracking could be used in conjunction with traditional measures (views) to document the popularity of video memes in a similar way to that used for text and other mediums.

6 Code

The code has been kept on GitHub, available at https://github.com/davidhehir/Comp3750Project. The code, installation guide and user guide can all be downloaded from this repository.

Acronyms

ANN    Approximate Nearest Neighbour
FLANN  Fast Library for Approximate Nearest Neighbours

References

Bloore, P. (2008), 'TinEye'. Accessed 20/10/2011. Available online: www.tineye.com.

Bradski, G. (2000), 'The OpenCV Library', Dr. Dobb's Journal of Software Tools.

Cheezburger Inc (2011), 'Know Your Meme'. Accessed 20/10/2011. Available online: http://knowyourmeme.com.

comScore (2011), 'comScore releases August 2011 U.S. online video rankings'. Accessed 20/10/2011. Available online: http://www.comscore.com/Press_Events/Press_Releases/2011/9/comScore_Releases_August_2011_U.S._Online_Video_Rankings.

Cormen, T., Leiserson, C., Rivest, R. & Stein, C. (2009a), Introduction to Algorithms, MIT Press, chapter 21.4, Analysis of union by rank with path compression.

Cormen, T., Leiserson, C., Rivest, R. & Stein, C. (2009b), Introduction to Algorithms, MIT Press, chapter 10.

Google (2010), 'Google Goggles'. Accessed 20/10/2011. Available online: http://www.google.com/mobile/goggles/.

Huang, J., Kumar, S. R., Mitra, M., Zhu, W.-J. & Zabih, R. (1997), Image indexing using color correlograms, in 'Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97)', IEEE Computer Society, Washington, DC, USA, pp. 762–.
Jansson (2011), 'Jansson'. Accessed 18/10/2011. Available online: http://www.digip.org/jansson/.

Leskovec, J., Backstrom, L. & Kleinberg, J. (2011), 'Meme-tracking and the Dynamics of the News Cycle'. Accessed 20/10/2011. Available online: http://memetracker.org/quotes-kdd09.pdf.

Muja, M. & Lowe, D. G. (2009), Fast approximate nearest neighbors with automatic algorithm configuration, in 'International Conference on Computer Vision Theory and Application (VISSAPP '09)', INSTICC Press, pp. 331-340.

Rusu, R. B. & Cousins, S. (2011), 3D is here: Point Cloud Library (PCL), in 'International Conference on Robotics and Automation', Shanghai, China.

Search Engine Watch (2011), 'New YouTube Statistics: 48 hours of video uploaded per minute'. Accessed 20/10/2011. Available online: http://searchenginewatch.com/article/2073962/New-YouTube-Statistics-48-Hours-of-Video-Uploaded-Per-Minute-3-Billion-Views-Per-Day.

Siek, J. G., Lee, L.-Q. & Lumsdaine, A. (2001), The Boost Graph Library: User Guide and Reference Manual (C++ In-Depth Series), Addison-Wesley Professional.

Xie, L., Natsev, A., Hill, M., Kender, J. & Smith, J. R. (2011), 'Visual Memes in Social Media: Tracking Real-world News in YouTube Videos', ACM Multimedia. To appear.