A Smart Assistant for Visual Recognition of Painted Scenes
Federico Concone (a,b), Roberto Giaconia (b), Giuseppe Lo Re (a,b) and Marco Morana (a,b)
(a) University of Palermo, Department of Engineering, Viale delle Scienze, ed. 6, 90128, Palermo, Italy
(b) Smart Cities and Communities National Lab CINI - Consorzio Interuniversitario Nazionale per l'Informatica

Joint Proceedings of the ACM IUI 2021 Workshops, April 13-17, 2021, College Station, USA. federico.concone@unipa.it (F. Concone); roberto.giaconia@gmail.com (R. Giaconia); giuseppe.lore@unipa.it (G. Lo Re); marco.morana@unipa.it (M. Morana). © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract
Nowadays, smart devices allow people to easily interact with the surrounding environment thanks to existing communication infrastructures, i.e., 3G/4G/5G or WiFi. In the context of a smart museum, data shared by visitors can be used to provide innovative services aimed at improving their cultural experience. In this paper, we consider as a case study the painted wooden ceiling of the Sala Magna of Palazzo Chiaramonte in Palermo, Italy, and we present an intelligent system that visitors can use to automatically get a description of the scenes they are interested in by simply pointing their smartphones at them. Compared to traditional applications, this system completely eliminates the need for indoor positioning technologies, which are unfeasible in many scenarios as they can only be employed when museum items are physically distinguishable. An experimental analysis was carried out to evaluate the accuracy of the recognition process, and the obtained results show the system's effectiveness in a real-world application scenario.

Keywords
Machine Learning, Human-Computer Interaction (HCI), Cultural Heritage

1. Introduction

Smart personal devices, such as smartphones and tablets, have totally changed the way people live. In addition to the traditional calling and messaging capabilities, these devices come with heterogeneous sensors that allow users to collect and share information with the surrounding environment, thus paving the way to a new generation of applications that would not otherwise be possible [1, 2]. For instance, people with visual and hearing impairments may rely on specific services provided by their smartphones to move in public spaces, helping them to live in a more independent way [3].

In this context, social and cultural inclusion represents a primary goal for many innovative IT systems. A smart museum, for example, is a suitable scenario for this type of solution because a wide range of services can be provided to users in order to create a more inclusive and informative cultural experience, both physically and virtually. In an ideal smart museum, visitors should be able to get suggestions about the items to visit, as well as personalized descriptions of the works according to their individual knowledge (like a guided tour would do). Artificial Intelligence (AI) methods provide invaluable help in realizing these services, while also being non-invasive and ensuring the visitor's freedom to follow or not the suggestions provided.
This last aspect is fundamental to any system, since it certainly makes the exposition more appealing and captivating to visitors.

For these reasons, modern museums have recently started a process of deep transformation by developing new interactive interfaces and public spaces to meet the challenges raised by the technological revolution of the last years. Moreover, it should not be ignored that in cultural sites a broad range of visitors can be found, from the youngest to the oldest. The younger generations are accustomed to new technologies, while others might feel alienated in a smart environment that encourages them to use their personal devices. Hence, an inclusive smart museum should allow visitors to access all of its services, working towards making them easily accessible to everyone.

In this paper, we address a scenario in which users of a smart museum can exploit ad-hoc intelligent services aimed at improving their visit experience by providing personalized descriptions of the artworks of interest. The way in which the specific contents for different users are selected [4, 5] is out of the scope of this work, which instead focuses on the intelligent system responsible for supporting the visit through the use of smart mobile devices. Our case study is the Sala Magna of Palazzo Chiaramonte (also known as Steri) in Palermo, Italy, which is characterized by a unique wooden ceiling from the 14th century containing a variety of paintings. Visitors can use our system to take a picture of any painting they are interested in and get the corresponding description, completely eliminating the need for indoor positioning technologies, which are unfeasible in our case study as they can only be used when museum items are physically separated.

Our solution exploits visual information of the various illustrated scenes and AI techniques. In particular, a Convolutional Neural Network (CNN) is employed to synthesize a set of features from a given input picture, and a distance-based classification algorithm is used for the final inference. Such a method normally has low accuracy on single images, so the proposed solution relies on the contextual classification of multiple images. This expedient exploits the overall characteristics of the paintings and has not been researched by other works in the literature, since most image recognition algorithms usually try to recognize generic objects inside artworks.

The remainder of the paper is organized as follows: related work is outlined in Section 2. Section 3 introduces the case study, while Section 4 describes the proposed system as well as the algorithms behind the AI recognition modules. The experimental results are shown and discussed in Section 5. Finally, conclusions and future works follow in Section 6.

2. Related Work

Technologies for smart environments [6, 7], and smart museums in particular, have been deeply researched in recent years. As a result, a variety of solutions have been proposed to address specific challenges. In this paper, we focus on an intelligent IT system capable of supporting computer-assisted guides. Early technologies employed in museums usually consisted of recorded descriptions, played by a small sound player through headphones, which required the visitors to follow a predetermined tour or to manually input a code for every work. More recently, researchers suggested replacing these systems with applications directly usable through the users' smart devices.

In this context, in order to make the apps easier to use, several works focused on specific issues such as activity recognition [8] or indoor positioning, which allow the system to detect the user's movements and position within the museum so as to provide him/her with ad-hoc information (e.g., a description of the nearby items).

Indoor positioning systems typically exploit visitors' smart devices in order to interact with Bluetooth Low Energy (BLE) beacons, sensor networks [9], or WiFi infrastructure. Although these technologies are becoming increasingly accurate, energy efficient, and affordable [10], there are some types of exhibitions in which different items are necessarily placed close to each other, making the positioning system unable to identify the real interests of the user. In this paper, we address this scenario and present a different approach that can be deployed in a variety of exhibitions, allowing smart touring where indoor positioning is not feasible.

The issue of uniquely referring to items placed close to each other is frequently addressed by means of Quick Response (QR) codes that identify each single work of art. The approach described in [11], for instance, exploits mobile phone apps and QR codes to enable smart visits of a museum. Once an item has been identified by means of its QR code, the visitors are provided with an augmented description, including text, images, sounds, and videos. This system, as well as many others in the literature [12, 13], is extremely easy to use, although users generally prefer to recognize the artwork itself rather than QR codes or numbered codes, as discussed in [14, 15]. To this aim, various image processing algorithms and methods have been proposed for this kind of task, including the Scale-Invariant Feature Transform (SIFT) and its faster but less accurate counterpart, Speeded Up Robust Features (SURF) [16]. For example, [17] describes a SIFT-based artwork recognition subsystem that operates in two steps: first, the images captured from the camera are pre-processed to remove blurred frames; then, the SIFT features are extracted and classified. Moreover, the authors exploit the visitor's location in order to select the nearby artworks, greatly reducing the computational effort of the matching process while at the same time increasing the recognition accuracy.

Despite having similar performance to SIFT, Convolutional Neural Networks are more suited to large-scale Content Based Image Retrieval (CBIR), and they are generally faster at extracting features from an input [18]. An enhanced museum guidance system based on mobile phones and on-device object recognition was proposed in [19]; here, given the limited performance of the devices [20], the recognition was performed by a single-layer perceptron neural network. Such a solution can be used together with indoor positioning, but it is outdated, as smart devices are now capable of running larger neural networks, and CNNs should be preferred for large-scale CBIR. In [21], the authors rely on a CNN to extract relevant features of a specific artwork, while the classification is treated as a regression problem. CNNs are particularly suitable for synthesizing a small set of values (features) from a query image, which can then be compared to a database of examples in order to find images with similar contents. This enables fast image classification within a restricted dataset, even using a pre-trained network for inference.

More recent works are exploring the possibility of using CNNs in combination with other well-known techniques. For example, [22] discusses two novel hybrid recognition systems that combine Bags of Visual Words and CNNs in a cooperative and synergistic way in order to achieve better content recognition performance in a real museum.
Figure 1: Part of the wooden ceiling of Palazzo Chiaramonte.

3. Application Scenario

The system presented here has been designed with the aim of providing a non-invasive solution to enjoy Palazzo Chiaramonte in Palermo, Italy.

One of the places most worth visiting in Palazzo Chiaramonte is the Sala Magna with its 14th-century wooden ceiling (see Fig. 1), which measures 23 × 8 meters and is composed of 24 beams, parallel to the short sides of the hall, perpendicularly divided by a long fake beam. On all sides of the beams and ceiling coffers, various scenes are illustrated, some telling mythological and hunting stories, others depicting plants and patterns.

The large number of visitors the Sala Magna receives every year, each of whom surely owns a smart device, and the many frescoes present in the hall make this place a perfect scenario to test intelligent applications aiming to improve the visitors' experience [23] during the tour. For example, information gathered from personal devices may be used to alert the visitors about the influx of crowds for a specific artwork, allowing them to plan their tours according to their personal needs.

In the scenario addressed in this paper, visitors are interested in knowing the stories painted on the various parts of the ceiling, stories that are very close to each other and hardly distinguishable. Even with the assistance of the tour guides or other tools (such as QR codes), this characteristic makes it very difficult for the visitors to find the scene of interest, unless they manually count the beams. Our idea is to let visitors exploit their smart devices to locate a specific scene, select it, and obtain an augmented description, e.g., by means of 3D images, stories or videos telling of the scene. Moreover, once an item has been recognized, descriptions can be easily transferred to a "totem" located in the Sala Magna, that is, a smart touchscreen enabling users to enjoy the scenes of interest in a more comfortable way. This is very important, for instance, for people who are not accustomed to prolonged interactions with small smart devices [14]. Such an alternative is also very useful for visiting groups, as it allows both individual touring on smartphones and tour guides' presentations at the multimedia totems.
Figure 2: Frescoes classification procedure. The user points the device at a painting, touching the screen to precisely locate it. The client application creates a set of crops and sends it to the server, where they are then classified.

4. System Overview

The recognition system is based on a client-server architecture in which two different kinds of clients are available. The first is the totem (i.e., the touch-screen monitor), where visitors can browse the artworks and their descriptions in a very clear way. The second client is a mobile app that runs on the visitors' personal smart devices, as well as on other smartphones and tablets provided by the museum. In addition to the functionalities provided by the monitor, the mobile software application includes a scene recognition service that can also be used in combination with the other kind of client, i.e., sharing the identifier of the recognized scene with the touch-screen totem. This enables visitors to select an object of interest through the mobile devices and then show the results on the totem, thus enhancing their experience within the museum. This architecture is much lighter than three-level solutions based, for instance, on the fog paradigm [24, 25], and it is adequate for the purposes of the proposed application.

In the remainder of this section we present the main phases of the recognition procedure, summarized in Fig. 2.

4.1. Crops Generation

Visitors select a scene of interest by pointing at a specific region (e.g., an object/item) using the touchscreen of the smart device. The picture is then decomposed into two sets of crops of different resolutions, namely 512 × 512 and 336 × 336 pixels. Each set consists of five regions, $\{C, L, R, T, B\}$: the first is centered (C) on the pixel chosen by the user, while the others are selected at the left (L), right (R), top (T), and bottom (B) of the central one.

While the center crops might seem sufficient to classify the location selected by the visitor, using adjacent crops and different resolutions increases the recognition performance, as will be discussed in Section 5.2. The obtained crops are sent to the server for the last three steps, i.e., triplets extraction, feature extraction, and classification.
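A minimal sketch of how this client-side crop generation could be implemented is shown below (Python with Pillow). The two crop sizes come from the paper; the lateral offset of the L/R/T/B crops, the clamping at the image borders, and all function and variable names are our own assumptions, not the authors' implementation.

```python
# Sketch of the client-side crop generation (Section 4.1).
# Crop sizes follow the paper; offsets and names are assumptions.
from PIL import Image

CROP_SIZES = (512, 336)

def make_crops(image_path, x, y):
    """Return ten crops (two resolutions x five regions C, L, R, T, B)
    around the pixel (x, y) selected by the visitor."""
    image = Image.open(image_path)
    w, h = image.size
    crops = {}
    for size in CROP_SIZES:
        half = size // 2
        offset = size // 2  # lateral shift of the adjacent crops (assumption)
        centers = {"C": (x, y),
                   "L": (x - offset, y), "R": (x + offset, y),
                   "T": (x, y - offset), "B": (x, y + offset)}
        for name, (cx, cy) in centers.items():
            # Clamp the crop window so it stays inside the picture.
            left = min(max(cx - half, 0), w - size)
            top = min(max(cy - half, 0), h - size)
            crops[(size, name)] = image.crop((left, top, left + size, top + size))
    return crops
```

The ten resulting crops (five regions at each of the two resolutions) are what the client sends to the server.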
4.2. Triplets and Feature Extraction

In order to obtain a compact representation of the input data, the server groups the crops into two horizontal/vertical triplets, $T_1 = (L, C, R)$ and $T_2 = (T, C, B)$, and builds feature vectors by using a convolutional neural network. The adoption of a CNN in our system is justified by the intrinsic nature of this category of neural networks, which are specialized in processing data that has a grid-like topology, such as an image. A CNN is typically made up of three different kinds of layers, named convolutional, pooling, and fully-connected layers [26].

The convolutional layer aims to synthesize the spatial relationships between pixels of the input image, without losing features which are critical for getting a good prediction. This layer uses a combination of linear and non-linear operations, i.e., the convolution operation and an activation function. The convolution is an element-wise product between the input image and a kernel, and its output is a feature map containing different characteristics of the input image. The more kernels are used during the analysis, the more feature maps are generated. The feature maps are then evaluated by means of a nonlinear activation function, such as the sigmoid, hyperbolic tangent, or rectified linear unit (ReLU) functions.

The pooling layer performs a downsampling operation with the aim of reducing the spatial dimensionality of the feature maps generated at the previous layer and, at the same time, extracting dominant features invariant to rotation and position. One of the most adopted operations at this stage is max pooling, which applies a filter of size n × n to the feature maps and extracts the maximum value for each of them.

Finally, the pooling layer output is transformed into a one-dimensional array and mapped by a subset of fully-connected layers to the final outputs of the network. Hence, these layers return class scores for classification and regression purposes, using the same principles as the traditional Multi-Layer Perceptron neural network (MLP). If such a layer is not included in the neural network, then the CNN can be used to extract a set of features from the input image [27].

As our goal is only to extract the features, the system leverages a Convolutional Neural Network in which the last two dense layers and the soft-max function were discarded. The underlying idea is to describe the content and shapes of the graphical content of the crops with a high level of abstraction, so that it is possible to make a comparison by only using the feature vectors. To be more specific, the network model we adopted (refer to Fig. 3) is made of 13 convolution layers, with 3-by-3 kernel size filters, each one using a ReLU activation function. It also employs 5 pooling layers, specifically 2-by-2 max-pooling, to achieve downsampling. The original network [28] uses 3 dense layers and a final soft-max function, specifically targeted towards the 1000 classes of the ImageNet dataset; our network only uses the first dense layer, so its output is a set of 4096 features.

Figure 3: The CNN used for the feature extraction: five stacks of convolutional layers (with 64, 128, 256, 512, and 512 kernels of size 3×3, respectively), each followed by 2×2 max pooling, and a fully-connected layer of 4096 neurons.

4.3. Classification

The complete classification process consists of a two-step procedure. First, the 4096-element feature vectors obtained from each crop in the triplets are classified according to a minimum-distance approach [29]. Generally, given a training set $X = \{x_1, x_2, \dots, x_n\}$ and the corresponding label set $Y = \{y_1, y_2, \dots, y_n\}$, a new point $x^*$ of unknown class $y^*$ is assigned to the class $y_i \in Y$ if the distance measure $d(x^*, x_i)$, $x_i \in X$, is smaller than the distances to all the other points in the training set:

$y^* = y_i \quad \text{if} \quad d(x^*, x_i) < d(x^*, x_j), \;\; \forall j \neq i, \; j = 1, \dots, n. \qquad (1)$

Here, each feature vector associated with a crop in a triplet is classified as belonging to a class $y \in Y$ by using the Frobenius distance [30]; thus, the output of the first classification step is represented by two new triplets containing the predicted class for each crop.
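The following sketch illustrates the feature extraction and the first, minimum-distance classification step described above. It assumes a pre-trained VGG-16 from torchvision truncated after its first 4096-unit dense layer; the exact weights, the preprocessing, and all names are assumptions rather than the authors' implementation, and the Frobenius distance of Eq. (1) reduces to the Euclidean norm when applied to feature vectors.

```python
# Sketch of the server-side feature extraction (Section 4.2) and
# minimum-distance classification (Section 4.3, first step).
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pre-trained VGG-16 kept only up to its first dense layer, so each crop
# is mapped to a 4096-dimensional feature vector (an assumption of how
# the truncation described in the paper could be realized).
vgg = models.vgg16(pretrained=True).eval()
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:2])

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def extract_features(pil_crop):
    """Return the 4096-element feature vector of one crop."""
    with torch.no_grad():
        return vgg(preprocess(pil_crop).unsqueeze(0)).squeeze(0)

def classify(query, training_set):
    """Eq. (1): assign the label of the closest training vector.
    `training_set` is a list of (feature_vector, label) pairs; for
    vectors the Frobenius distance equals the Euclidean norm."""
    distances = [(torch.norm(query - x).item(), y) for x, y in training_set]
    return min(distances, key=lambda d: d[0])[1]
```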
The second phase aims to evaluate whether the crops in a triplet were classified as depicting the same object. To accomplish this task, we introduced the concepts of strong confidence and weak confidence for classification. The former is achieved when every element of the triplet is associated with the same object; the latter occurs when only two elements are associated with the same class.

Firstly, the system checks for a strong confidence in any of the triplets and, if none is found, it tries for a weak confidence. If none of the triplets achieves strong or weak confidence, the visitor is asked to take a new picture of the artwork. This process is performed in near real-time, therefore it does not slow down the visiting experience, but rather improves the system precision.
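A possible reading of this strong/weak confidence rule is sketched below; the function names and the convention of returning None when a new picture must be requested are assumptions, not the authors' code.

```python
# Sketch of the triplet confidence check (Section 4.3, second step).
from collections import Counter

def triplet_confidence(labels):
    """labels: the three predicted classes of one triplet.
    Returns (confidence, label): 'strong' when all three crops agree,
    'weak' when exactly two agree, (None, None) otherwise."""
    label, count = Counter(labels).most_common(1)[0]
    if count == 3:
        return "strong", label
    if count == 2:
        return "weak", label
    return None, None

def classify_triplets(triplets):
    """Check every triplet for strong confidence first, then weak;
    return None if no triplet reaches either, so that the visitor is
    asked to take a new picture."""
    results = [triplet_confidence(t) for t in triplets]
    for wanted in ("strong", "weak"):
        for confidence, label in results:
            if confidence == wanted:
                return label
    return None
```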
5. Experimental Evaluation

The effectiveness of the proposed solution was evaluated through several experiments focused on the case study described in Section 3.

Table 1: Devices used for the experiments.
Model                       Camera                  CPU                         RAM
Nokia 7 Plus                12 Mp + 13 Mp           Qualcomm Snapdragon 660     4 GB
Apple iPhone 7              12 Mp + 12 Mp           Apple A10 Fusion            3 GB
Samsung Galaxy A50          25 Mp + 8 Mp + 5 Mp     Exynos 9610                 6 GB
Samsung Galaxy Tab A10.1    8 Mp                    Exynos 7904                 3 GB

5.1. Experimental Setup

The experiments were carried out using three different models of smartphones and one tablet (see Table 1) provided with the client software application. The mobile app (supporting both Android and iOS systems) supports visitors during all the recognition phases described so far, i.e., it allows them to observe and select a scene in the wooden ceiling, automatically extracts the crops from it, and sends them to the classification server. Moreover, the software application is also able to manage the information received by the server, thus enabling visitors to read the description of the scene on the device itself or on the touch-screen monitor.

Fig. 4 shows three examples of the smartphone-side application. The leftmost image represents the interface visitors can use to log in to the service. The image in the center is the main interface of the application, which allows visitors to zoom in/click on a detail of the scene of interest and starts the remote recognition process. Finally, the rightmost image shows the information provided to users for the recognized element; in particular, in addition to the title and the description of the scene, the visitors are provided with high-resolution pictures of the details of interest that are difficult to distinguish when standing at a great distance from the ceiling.

Figure 4: Interface of the smartphone-side application. From left to right: the log-in screen, the camera screen, and an example page describing a scene.

While the CNN is pre-trained on ImageNet, the dataset used to train the classification algorithm and perform the experiments was captured using the devices listed in Table 1.
Each class in the dataset corresponds to a part of the ceiling, captured in three pictures taken from different positions (Fig. 5), for each of which five regions of interest have been manually selected (Fig. 6). We considered 100 different relevant locations within the ceiling, thus the number of images obtained from each device is 1500.

Figure 5: Images of the same scene at different distances.

Figure 6: Example of manual cropping of the photo to create the dataset.

The number of locations is calculated by taking into account the specific structure of our case study. The ceiling is made of 24 beams, each of which is divided in two parts by a central beam; on each side, we defined two locations: one for the side of the beam facing East, and one for that facing West (i.e., 4 locations for each beam). In addition to these 96 classes, the East and West walls of the hall also have paintings similar to the sides of the beams, so 4 further locations were considered.

Early experiments were conducted by randomly splitting such a dataset into training and testing sets; we will refer to this case as the mixed dataset. Then, other tests were performed by dividing the dataset so that the training and testing sets contained images acquired from different cameras. This separate dataset is closer to the application scenario because every visitor will use devices with camera settings and characteristics that might differ from the ones used to train the system.

5.2. Recognition Results
The first set of experiments aimed to assess the system performance when considering the recognition of a single (central) crop from the mixed dataset. The mean accuracy achieved in this case is shown in Fig. 7-a, where each bar represents a different ratio of training and testing samples. Since our classifier has to be able to distinguish between 100 classes (scenes), the results indicate that a larger number of training samples is required in order to obtain satisfactory accuracy values. In this case, images for both the training and the testing sets were captured using the same devices, which is not representative of a real scenario that involves hundreds of testing devices equipped with different cameras.

For this reason, we also evaluated the single crop classification by using the separate dataset. Results in Fig. 7-b show a similar trend as the train-to-test ratio increases, but also highlight a significantly lower mean accuracy, thus demonstrating the inadequacy of a single crop to drive the classification process.

Figure 7: Single crop classification accuracy on (a) mixed and (b) separate datasets.

The next set of experiments concerns the evaluation of the proposed three-crop classification system, in which two different classification confidence settings are introduced, namely weak and strong. Performance was evaluated both in terms of accuracy and percentage of crops discarded because they did not reach a weak or strong classification confidence. It is worth noting that discarded images imply that visitors would be asked to take the photos again; thus, the lower this value, the higher the usability of the system.

Figure 8: (a) Accuracy and (b) percentage of discarded crops for weak (solid bars) and strong (hatched bars) classifications when using the separate dataset.

Fig. 8 shows the results obtained on the separate dataset. By observing the mean accuracy values (Fig. 8-a), we can notice a significant improvement as compared to the single crop classification (Fig. 7-b). Unfortunately, Fig. 8-b indicates that as the number of samples in the training set varies, the number of discarded crops remains stably high. This is mainly due to not having enough images in the dataset. For this reason, the next set of experiments aimed to evaluate the impact of data augmentation.
Data augmentation is used to artificially increase the number of samples in the training set in order to extract additional information [31]. In our system, data augmentation is performed by creating crops of different resolutions of the original locations of interest so as to obtain new samples.
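As an illustration only, this augmentation step could be sketched as follows: each labelled location is re-cropped at several resolutions to produce extra training samples. The specific crop sizes, the small random jitter around the selected pixel, and the number of samples per size are assumptions, not the parameters used by the authors.

```python
# Sketch of the crop-based data augmentation: extra training crops are
# generated around each labelled location at several resolutions.
import random
from PIL import Image

def augment_location(image, x, y, sizes=(512, 448, 384, 336),
                     jitter=32, n_per_size=3):
    """Return a list of additional crops centred near the labelled pixel
    (x, y); sizes, jitter and counts are illustrative assumptions."""
    w, h = image.size
    samples = []
    for size in sizes:
        for _ in range(n_per_size):
            cx = x + random.randint(-jitter, jitter)
            cy = y + random.randint(-jitter, jitter)
            left = min(max(cx - size // 2, 0), w - size)
            top = min(max(cy - size // 2, 0), h - size)
            samples.append(image.crop((left, top, left + size, top + size)))
    return samples
```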
100 100 80 80 Discarded crops (%) Accuracy (%) 60 60 40 40 20 20 0 0 2:1 3:1 4:1 5:1 6:1 7:1 8:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 Train-to-test ratio Train-to-test ratio (a) (b) Figure 9: (a) Accuracy and (b) percentage of discarded crops for weak (solid bars) and strong (hatched bars) classifications when using the data augmentation on the separate dataset. 100 6. Conclusions Accuracy and Discarded (%) 80 In this paper, we presented an intelligent sys- 60 tem that exploits personal smart devices to 40 support visitors during their tours within a museum. Differently from other works in the 20 literature, the system completely eliminates 0 the need for indoor positioning technologies, which might not be suitable for some kinds of 5:1 :1 :1 :1 :1 :1 :1 :1 10 15 20 30 35 40 45 Train-to-test ratio expositions, such as our case study. In fact, the stories represented in the ceiling of the Figure 10: Accuracy (blue) and percentage of dis- carded crops (white) of the final classification pro- Sala Magna of Palazzo Steri are composed of cedure. a single artwork containing multiple scenes that cannot be physically distinguished, thus making indoor technologies unfeasible. triplets. These are evaluated in terms of Thanks to the camera of their smart de- strong and weak confidence, and the input vice, visitors are able to get the description is discarded if neither is obtained. Results for anything they are interested in by sim- in Fig. 10 indicate that by applying this ply pointing the device and selecting a re- criterion, the number of discarded samples is gion of interest. Features are extracted and significantly lower than in previous experi- sent to a remote server in order to recognize ments; in particular, considering train-to-test the specific scene, and send back the related ratios in the same range as Fig. 9, we can information. Moreover, the mobile app run- notice that only 10% − 20% are rejected while ning on the client is capable of communicat- achieving an accuracy of 80% − 90%. ing with a touch-screen totem, so visitors can read and listen artworks’ information in the way they prefer basing on their technology background. As regards the recognition process, the
As regards the recognition process, the adoption of CNNs allows the system to extract features from 10 different regions of the photo taken by a visitor, taking advantage of the shape of the items in our specific scenario. Experimental results showed the performance of the recognition system in terms of accuracy and percentage of discarded crops, thus proving its effectiveness in a real-world application scenario.

The system will soon be deployed to support the visitors of Palazzo Chiaramonte. This will enable us to collect a greater number of query examples (captured from a wide range of devices), which could also be exploited to further refine the model.

7. Acknowledgments

This research is partially funded by the Project VASARI of Italian MIUR (PNR 2015-2020, DD MIUR n. 2511).

References

[1] V. Agate, F. Concone, P. Ferraro, Wip: Smart services for an augmented campus, in: 2018 IEEE International Conference on Smart Computing (SMARTCOMP), IEEE, Piscataway, NJ, USA, 2018, pp. 276–278. doi:10.1109/SMARTCOMP.2018.00056.
[2] S. Gaglio, G. Lo Re, M. Morana, C. Ruocco, Smart assistance for students and people living in a campus, in: 2019 IEEE International Conference on Smart Computing (SMARTCOMP), Piscataway, NJ, USA, 2019, pp. 132–137. doi:10.1109/SMARTCOMP.2019.00042.
[3] I. Apostolopoulos, N. Fallah, E. Folmer, K. E. Bekris, Integrated online localization and navigation for people with visual impairments using smartphones, ACM Trans. Interact. Intell. Syst. 3 (2014). doi:10.1145/2499669.
[4] V. Agate, P. Ferraro, S. Gaglio, G. Lo Re, M. Morana, Vasari project: a recommendation system for cultural heritage, Proc. of the 5th Italian Conference on ICT for Smart Cities And Communities (I-CiTies 2019) (2019) 1–3.
[5] A. De Paola, A. Giammanco, G. Lo Re, M. Morana, Vasari project: Blended recommendation for cultural heritage, Proc. of the 6th Italian Conference on ICT for Smart Cities And Communities (I-CiTies 2020) (2020) 1–3.
[6] A. De Paola, P. Ferraro, S. Gaglio, G. Lo Re, M. Morana, M. Ortolani, D. Peri, An ambient intelligence system for assisted living, in: 2017 AEIT International Annual Conference, 2017, pp. 1–6. doi:10.23919/AEIT.2017.8240559.
[7] A. De Paola, P. Ferraro, G. Lo Re, M. Morana, M. Ortolani, A fog-based hybrid intelligent system for energy saving in smart buildings, Journal of Ambient Intelligence and Humanized Computing 11 (2020) 2793–2807. doi:10.1007/s12652-019-01375-2.
[8] F. Concone, S. Gaglio, G. Lo Re, M. Morana, Smartphone data analysis for human activity recognition, Lecture Notes in Computer Science 10640 LNAI (2017) 58–71. doi:10.1007/978-3-319-70169-1_5.
[9] A. De Paola, A. Farruggia, S. Gaglio, G. Lo Re, M. Ortolani, Exploiting the human factor in a wsn-based system for ambient intelligence, in: Complex, Intelligent and Software Intensive Systems (CISIS '09), International Conference on, 2009, pp. 748–753. doi:10.1109/CISIS.2009.48.
[10] M. Majd, R. Safabakhsh, Impact of machine learning on improvement of user experience in museums, in: 2017 Artificial Intelligence and Signal Processing Conference (AISP), IEEE, Piscataway, NJ, USA, 2017, pp. 195–200. doi:10.1109/AISP.2017.8324080.
[11] T. Octavia, A. Handojo, W. T. Kusuma, T. C. Yunanto, R. L. Thiosdor, et al., Museum interactive edutainment using mobile phone and QR code, volume 15-17 June, 2019, pp. 815–819.
[12] M. S. Patil, M. S. Limbekar, M. A. Mane, M. N. Potnis, Smart guide – an approach to the smart museum using android, International Research Journal of Engineering and Technology 5 (2018).
[13] S. Ali, B. Koleva, B. Bedwell, S. Benford, Deepening visitor engagement with museum exhibits through handcrafted visual markers, in: Proceedings of the 2018 Designing Interactive Systems Conference, DIS '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 523–534. doi:10.1145/3196709.3196786.
[14] L. Wein, Visual recognition in museum guide apps: Do visitors want it?, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '14, Association for Computing Machinery, New York, NY, USA, 2014, pp. 635–638. doi:10.1145/2556288.2557270.
[15] M. K. Schultz, A case study on the appropriateness of using quick response (QR) codes in libraries and museums, Library & Information Science Research 35 (2013) 207–215. doi:10.1016/j.lisr.2013.03.002.
[16] B. Ruf, E. Kokiopoulou, M. Detyniecki, Mobile museum guide based on fast SIFT recognition, in: M. Detyniecki, U. Leiner, A. Nürnberger (Eds.), Adaptive Multimedia Retrieval. Identifying, Summarizing, and Recommending Image and Music, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 170–183.
[17] S. Alletto, R. Cucchiara, G. Del Fiore, L. Mainetti, V. Mighali, L. Patrono, G. Serra, An indoor location-aware system for an IoT-based smart museum, IEEE Internet of Things Journal 3 (2016) 244–253. doi:10.1109/JIOT.2015.2506258.
[18] V. D. Sachdeva, J. Baber, M. Bakhtyar, I. Ullah, W. Noor, A. Basit, Performance evaluation of SIFT and convolutional neural network for image retrieval, Performance Evaluation 8 (2017).
[19] P. Föckler, T. Zeidler, B. Brombach, E. Bruns, O. Bimber, PhoneGuide: Museum guidance supported by on-device object recognition on mobile phones, in: Proceedings of the 4th International Conference on Mobile and Ubiquitous Multimedia, MUM '05, Association for Computing Machinery, New York, NY, USA, 2005, pp. 3–10. doi:10.1145/1149488.1149490.
[20] S. Gaglio, G. Lo Re, G. Martorella, D. Peri, DC4CD: A platform for distributed computing on constrained devices, ACM Transactions on Embedded Computing Systems 17 (2017). doi:10.1145/3105923.
[21] G. Taverriti, S. Lombini, L. Seidenari, M. Bertini, A. Del Bimbo, Real-time wearable computer vision system for improved museum experience, in: Proceedings of the 24th ACM International Conference on Multimedia, MM '16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 703–704. doi:10.1145/2964284.2973813.
[22] G. Ioannakis, L. Bampis, A. Koutsoudis, Exploiting artificial intelligence for digitally enriched museum visits, Journal of Cultural Heritage 42 (2020) 171–180. doi:10.1016/j.culher.2019.07.019.
[23] G. Lo Re, M. Morana, M. Ortolani, Improving user experience via motion sensors in an ambient intelligence scenario, 2013, pp. 29–34.
[24] F. Concone, G. Lo Re, M. Morana, A fog-based application for human activity recognition using personal smart devices, ACM Transactions on Internet Technology 19 (2019). doi:10.1145/3266142.
[25] F. Concone, G. Lo Re, M. Morana, SMCP: a secure mobile crowdsensing protocol for fog-based applications, Human-centric Computing and Information Sciences 10 (2020). doi:10.1186/s13673-020-00232-y.
[26] R. Yamashita, M. Nishio, R. K. G. Do, K. Togashi, Convolutional neural networks: an overview and application in radiology, Insights into Imaging 9 (2018) 611–629.
[27] T. Bluche, H. Ney, C. Kermorvant, Feature extraction with convolutional neural networks for handwritten word recognition, in: 2013 12th International Conference on Document Analysis and Recognition, IEEE, Piscataway, NJ, USA, 2013, pp. 285–289.
[28] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).
[29] P. Kamavisdar, S. Saluja, S. Agrawal, A survey on image classification approaches and techniques, International Journal of Advanced Research in Computer and Communication Engineering 2 (2013) 1005–1009.
[30] G. H. Golub, C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1996.
[31] A. Mikołajczyk, M. Grochowski, Data augmentation for improving deep learning in image classification problem, in: 2018 International Interdisciplinary PhD Workshop (IIPhDW), IEEE, Piscataway, NJ, USA, 2018, pp. 117–122. doi:10.1109/IIPHDW.2018.8388338.