Adaptive User Profiling in E-Commerce and Administration of Public Services
Future Internet

Article

Kleanthis G. Gatziolis, Nikolaos D. Tselikas and Ioannis D. Moscholios *

Department of Informatics and Telecommunications, University of Peloponnese, 221 00 Tripoli, Greece; kgatziol@uop.gr (K.G.G.); ntsel@uop.gr (N.D.T.)
* Correspondence: idm@uop.gr

Abstract: The World Wide Web is evolving rapidly, and the Internet is now accessible to millions of users, providing them with the means to access a wealth of information, entertainment and e-commerce opportunities. Web browsing is largely impersonal and anonymous, and because of the large population that uses it, it is difficult to separate and categorize users according to their preferences. One solution to this problem is to create a web-platform that acts as a middleware between end users and the web, in order to analyze the data that is available to them. The method by which user information is collected and sorted according to preference is called ‘user profiling’. These profiles could be enriched using neural networks. In this article, we present our implementation of an online profiling mechanism in a virtual e-shop and how neural networks could be used to predict the characteristics of new users. The major contribution of this article is to outline the way our online profiles could be beneficial both to customers and stores. When shopping at a traditional physical store, real time targeted “personalized” advertisements can be delivered directly to the mobile devices of consumers while they are walking around the stores next to specific products, which match their buying habits.

Keywords: user profiling; e-commerce; retailing; e-shopping; mobile shopping; analytics; neural networks; public e-governance

Citation: Gatziolis, K.G.; Tselikas, N.D.; Moscholios, I.D. Adaptive User Profiling in E-Commerce and Administration of Public Services. Future Internet 2022, 14, 144. https://doi.org/10.3390/fi14050144

Academic Editors: Incheon Paik and B. T. G. Samantha Kumara

Received: 6 April 2022
Accepted: 4 May 2022
Published: 9 May 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The Internet today is a technological and social phenomenon. It affects everyone’s daily life and has had significant social impacts. Huge amounts of data and information are being uploaded to the internet every day. Businesses want to maximize their profits by advertising their services or products to targeted customers, while Internet users want to avoid receiving irrelevant information from Internet search results. It is necessary to predict users’ needs to improve their browsing experience and provide them with valuable data. The solution to both problems described above is web personalization via user profiling [1–3].

A User Profile is a group of items and/or patterns used to describe the user briefly. User Profiling is an especially critical procedure for e-business systems that captures online users’ attributes, knows online users, provides tailor-made goods and services, and therefore improves user satisfaction.

To conduct our research, we contacted the major superstores in Greece, asking for information on the way they have created their online user profiles. Our results show that while stores do allow users to register and create new profiles, there are times when customers provide false data. This problem can occur when no online verification process is in place. So, a question we must investigate is: which registered customers are supplying accurate online information?

“User profiling techniques have widely been applied in various e-business applications, e.g., online customer segmentation, web user identification, adaptive web site, fraud/intrusion detection, personalization, e-market analysis, recommendation, as well as personalized information retrieval and filtering” [4]. User Profiling can be defined as the course of pinpointing the data about a user interest domain [5,6]. This data can be used by the system to grasp more about the user and be further utilized to better meet the user’s needs.

In this article, we propose the implementation of an online profiling mechanism in a virtual e-shop, its success rates, and how neural networks could be used to predict the characteristics of new users. We also indicate the way our online profiles could be of benefit both to customers and stores through real time “personalized” advertisements targeted at customers shopping in physical stores.

The proposal of this article is significant since it could redefine the way we shop at physical stores. If the real online profiles of the consumers are known, then we could use them to promote in real time, specific products to certain customers while shopping. A lot of research has already been conducted both on the techniques of user profiling in online shops and the techniques of user profiling in physical shopping, so the main objective of this article is to fill in this research gap by joining these approaches in order to increase the profits of businesses and the affordability for customers through personalized price offers.

The rest of this paper is organized as follows. Section 2 reviews some related work and introduces the theoretical basis. Section 3 describes our proposed model, and Section 4 describes the experimental setup as well as the results. Finally, Section 5 concludes the paper.

2. Related Work

2.1. User Profiling

A user profile is a visual representation of the personal data associated with a particular user, or a customized interface [7]. That is, a profile is the digital representation of an individual’s identity.
However, it can also be considered as the representation of a user model. A profile stores the description and characteristics of the individual it represents. These facts can be utilized by various systems that take into account people’s attributes and preferences. This is why profiles are essential for a modern system: the information found in a profile is personalized, which enables us to distinguish users and group them.

There are two phases which allow us to acquire the user profile. In the first phase, the user is asked explicitly to enter his/her initial profile. He/she can also amend the profile by hand. Users may not be able to enumerate all their interests at once, so their browsing history is used to update their profile. The second phase (user profile acquisition) monitors the browsing behavior of the user, and through content analysis, the data on the user’s interests are successively acquired.

The information contained in a profile can be either dynamic or static. In the first case, the profile is called dynamic, and this means that the information can change over time [8,9]. These changes usually occur depending on the actions that the user takes in the system, and the user usually cannot access or change this information directly. In contrast, in the second case, where the profile is called static, the information in the profile remains constant for a long period of time and it rarely changes [8,9]. Such a profile will contain mainly demographic notes about the user, such as name, age, height, etc. In many systems, a combination of the advantages of static and dynamic can be observed, thus making the profile hybrid [5,10]. Profiles can be found in operating systems, computer programs, recommendation systems, computer games, etc. [11].

2.2. Profile Structure

According to the previous description referring to the characteristics of the user’s profile, we can divide the profile into subcategories, namely, the basic and the extended profile, respectively [12]. The virtual identity is the first thing that the user selects, and it refers to the user’s ID. This identity is permanent and does not change, although the user chooses whether to use a pseudonym or a real identity. The basic profile is the one containing the user’s very basic information (demographic data) and can usually be altered, although rarely, in accordance with the user’s needs. The extended profile contains information that changes over time and is not specified when the profile is created. The information can be changed, or new information can be entered, making the profile dynamic. Interaction with third-party profiles and policies requires settings related to data security and user privacy as to who can use this information.

As all these features form the structure of an integrated profile, there are also different profile design patterns or often a mixture of these patterns.

Static models are the basic types of user profiles. In them, the main data are collected once and do not change afterwards, i.e., they are static. Changes in the user’s choices are not registered in the system and no algorithms are used to parameterize the profile.

Dynamic models allow a more up-to-date representation of users. Changes are often made to them over time and through the user’s interaction with the system. These profiles are particularly useful in adaptive hypermedia as they are updated to take into account the current needs and goals of the user.

Hybrid models are those that combine static and dynamic models according to the needs of the system.

2.2.1. Profile Monitoring

In order to analyze a profile, it must first be extensively monitored and all the user’s actions over time must be recorded [13]. Monitoring a profile consists of three processes:
- Direct monitoring of the use of the application by keeping a history of the usage pattern.
- Storing the history by the system to avoid failures.
- Immediate feedback on the performance of the service.

Because this information is particularly valuable, the risk of violating user privacy is high, which raises ethical and legal issues regarding monitoring [14].

2.2.2. Data Collection

After having created a user profile, the next step is to collect information about the user so that it can eventually be analyzed. There are several ways to collect information about users, with some of them discussed below [15].

The easiest and quickest way to collect information is through direct user interaction with the system, where the user is asked to answer a series of questions that will help the system “learn” about him/her. This process usually takes place during registration with the system, at which point the user is asked to fill in forms or other interfaces that serve this purpose. Usually, this type of information is optional, as users may not be willing to fill out lengthy forms, and it rarely changes over time. In general, this information is comprised of demographic details, such as the user’s age, marital status or sex.

However, there are several problems with collecting information in the first way, as users may not want to provide much data, and this has led to the creation of a second way which learns the user’s preferences by observing the user interacting with the system. In this case, the system does not explicitly request information about preferences from the user. Instead, the information is obtained as the user navigates through the system and implicitly makes decisions. Thus, the system learns dynamically from observing their interactions. For this reason, for the system to learn about a profile, the user’s behavior should be repetitive, i.e., the user’s actions should be performed under similar conditions at different points in time.

There is also a third hybrid mode which is a combination of the two above [16,17]. That is, data are collected not only by asking the user to answer questions directly, but also during the user’s interaction with the system. This mode combines the advantages of the two previous ones, thus making it ideal for most profiling systems.
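The hybrid collection mode described above can be illustrated with a minimal sketch. It is not the implementation used in this article; the class name `UserProfile`, its fields and the `min_views` threshold are hypothetical choices made only for this example. Explicit answers are stored as given at registration, while interests are inferred only from repeated browsing behavior.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical hybrid profile: explicit answers plus observed behavior."""
    user_id: str
    explicit: dict = field(default_factory=dict)  # filled in by the user at registration
    category_views: Counter = field(default_factory=Counter)  # collected implicitly

    def record_view(self, category: str) -> None:
        """Implicit collection: count each product-category view."""
        self.category_views[category] += 1

    def inferred_interests(self, min_views: int = 3) -> list:
        """Treat only repeated behavior as an interest (assumed threshold)."""
        return sorted(c for c, n in self.category_views.items() if n >= min_views)


# Explicit part: answers given directly by the user.
profile = UserProfile("u1", explicit={"age": 34, "marital_status": "single"})

# Implicit part: the system observes browsing events over time.
for category in ["toys", "toys", "books", "toys", "books"]:
    profile.record_view(category)

print(profile.inferred_interests())  # ['toys']
```

The threshold reflects the point made in the text that a pattern can only be learned when the user's behavior is repetitive; a single view of a category is not treated as an interest.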
Each method has its advantages and disadvantages. The first method is usually the best when data need to be collected quickly, but there are several problems. First, it lacks the ability to adapt to changes and user preferences. Secondly, it is highly dependent on the user’s willingness to provide the information and it is likely to become invalid after a period of time. Third, users may not write true information on the forms and those who are willing to provide true information may not know how to express their interests. However, users have full control over the information collected and it is their decision what they want to share with the system.

In the second method, the information is gathered by observing the user’s movements in the system, so it takes more time to gather information, and this information cannot be changed or seen by the users. Moreover, if there is no repetition in the user’s actions, the pattern cannot be discovered. However, this information can be easily and automatically updated so that the system is always aware of, and more accurate regarding, the user’s preferences. This could be a simple case of using cookies to store and track visits from particular users, including the pages and products viewed, or it could be something more advanced such as eye movements, or even motion detection [18].

Cookies could be used to save some basic information and preferences about users, such as their individual login information or favorite sports or politics. They could also be used for personalization. As customers are browsing in e-shops and viewing certain items or parts of a site, cookies could be used to help build targeted ads. Finally, cookies could be used to track items users previously viewed, allowing the e-shop sites to suggest similar goods they might like and keep items in shopping carts for future reference.

However, we must keep in mind that cookies have some negative aspects as well. Many users regularly delete cookies from their browsers. Others will not allow cookies to be stored on their machines for security reasons. There are some privacy aspects to be taken into consideration too. Third-party cookies are generated by websites other than the web pages users are currently surfing, because they are linked to ads shown on that page. An e-shop with 20 banners/advertisements may generate 20 cookies, even if users never click on those ads. These cookies could let advertisers or analytics companies track and analyze an individual’s browsing history. Finally, as mentioned in the above paragraph, we cannot store advanced information in cookies about customers such as eye movements, or even motion detection. Consequently, it is better and more secure to store users’ profiling details in a server-side recommendation system.

For all the above reasons, we chose for our implemented recommendation system to use cookies to store only some basic information about users such as their login data, and we keep all the important details and the analysis of the customers such as parenthood, gender, interests, etc., in our system.

The hybrid method attempts to combine the advantages of the first two methods by directly asking users to provide as much information as possible, and then the system, observing their interaction, adjusts the user’s profile according to their preferences.

In Table 1, a comparative list of profile types in relation to the researched literature is presented.
Table 1. A comparative table of user profile types, in relation to the researched literature.

| User Profile Type | Description | Advantages | Disadvantages |
|---|---|---|---|
| Explicit user profile | Direct user interaction with the system. Users manually create and fill in main data. | Data are collected quickly. Data gathered are of high quality. Usually, users enter real information when they enroll. Users have full control over the information collected. Users decide what they want to share with the system. | Users may not want to provide much data. It lacks the ability to adapt to changes and user preferences. It is highly dependent on the user’s willingness to provide the information. Users may not write true information on the forms. Users who are willing to provide true information may not know how to express their interests. |
| Implicit user profile | The system learns dynamically from observing user interactions. | User’s information can be easily and automatically updated so that the system is always aware and more accurate about their preferences. Minimal user effort is required. | It takes more time to gather valuable information about users. If there is no repetition in the user’s actions the pattern cannot be discovered. The information cannot be changed or seen by the users. |
| Hybrid user profile | Combine the previous methods and adjust the user’s profile according to their preferences. | Advantages of both techniques. | Disadvantages of both techniques. |

2.2.3. Data Analysis

Data analysis is a process for inspecting, cleaning, transforming and modeling data in order to discover information that is useful for decision making by users. Data analysis can be distinguished into several phases as shown below [19].

Data collection, as presented above, follows the phase in which the requirements that guide the data analysis are determined. Data processing includes the phases where raw information is processed and converted into information which is ready to be analyzed.
This may involve entering data into rows and columns in a tabular format, such as a spreadsheet or database. Data modeling is the process wherein mathematical formulas or algorithms are applied to the data to display the relationships between variables so that the information can be ultimately visualized to be understood by the user.

However, all of the above depends on the initial phase of data analysis which consists of four questions. These questions have to do with the quality of the data, the quality of the measurements, data transformation and whether the collected information meets the requirements of the survey design [20].

2.3. User Modeling

User modeling is a part of human–computer interaction and describes the process of creating and modifying a user model [21]. The main goal of user modeling is to adapt systems to the specific needs of the user. The system must appear to be built for each individual user, even though it is built for hundreds of millions of users. That is, it should say “the right thing, at the right time, in the right way” [22].

User modeling consists of two main categories. The first is the user model, which is the set of information that makes up the user profile, and the second is data collection. The set of information that makes up the profile is all the data that make the profile distinct from the rest. Data collection is a separate topic in itself, as through it we can extend the information we have about a user either by asking the user to provide it or by tracking the user’s actions in the system. The latter is extremely important for a system that can adapt to the user’s needs [23].

A very simple example of user modeling is e-commerce websites that use all the information about a user’s browsing and shopping and combine it with information from other users in order to better understand their shopping preferences. Thus, the system can easily suggest possible products that may be of interest to users.

Types of Data in User Models

User data includes data about users’ interaction with the system [24]. Each user model is built from these data, which also distinguish it from the rest. The following are the types of data that can be incorporated into user models.

Demographic data has information about the first name, last name, age, height, weight, gender, nationality, place of residence, etc. These data can be expanded and modified to a huge extent depending on the requirements of the application. Usually, they form the static part of the profiles as this information changes very rarely or never. By looking at these elements, we can group the users of the system according to their profile and look at their actions individually. This, again, could be useful in an e-shop system as, for example, we could look at the shopping preferences of the two genders separately.

Knowledge or background data is perhaps one of the most important in user models. These data are usually subject to frequent changes, and they are determined in the short term, thus forcing systems to be dynamic. This means that the system should understand the changes in knowledge acquired by the user by observing the user’s movement and choices in the system and adjust the data to make it more useful to the user.

Interest and preference data are the most important pieces of information in systems that filter information, such as recommendation systems.
However, they usually differ from demographic information, as the user does not need to be asked about them. Instead, by observing the recurring patterns in users’ actions, an ideal system could infer the user’s interests on its own.

The user’s individual traits are the set of user characteristics (extrovert, reactive, etc.) that are not subject to any change or that change only over a long period of time. That is why many systems with this kind of information can be static. Examples of such systems are specially designed psychological tests. As before, this information differs from demographic information, as here too it is particularly important to observe recurring patterns in the actions of users.

2.4. Uses of User Model Data

We have analyzed the profiles and the information that populates them. A modern profile should have information that has been gathered either dynamically or statically and this information should form a personalized profile of the user. Once a system has gathered information about users, it can begin to present the data or even use it to its advantage. Profiling can be used, with many important benefits, in several applications, some of which are presented below.

2.4.1. Expert Systems

Expert systems are computer systems that can mimic human decision-making to help solve a problem in a particular area. These systems work by asking questions step by step to pin down the issues that come up and find solutions [25]. User models can be used to adapt to the user’s current knowledge and differentiate between experienced and novice users. The system is able to conclude that skillful users are in a better position to understand more complex queries than someone who is new to the domain. Thus, it adapts its vocabulary and the queries it uses to find a solution.
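As a rough illustration of the adaptation described above, the sketch below selects the wording of a diagnostic question according to the user's level. Everything in it is an assumption made for the example: the question bank, the troubleshooting domain, the function names and the rule that a user counts as experienced after five resolved sessions. A real expert system would derive the level from a much richer user model.

```python
# Hypothetical question bank: the same diagnostic step worded for two audiences.
QUESTIONS = {
    "connectivity": {
        "novice": "Can you open any website in your browser?",
        "experienced": "Is the default gateway reachable and does DNS resolve?",
    },
    "power": {
        "novice": "Is the light on the front of the router on?",
        "experienced": "Does the router complete its power-on self-test?",
    },
}


def expertise_level(resolved_sessions: int, threshold: int = 5) -> str:
    """Classify the user from the user model (assumed rule: a simple count threshold)."""
    return "experienced" if resolved_sessions >= threshold else "novice"


def next_question(step: str, resolved_sessions: int) -> str:
    """Pick the wording of a diagnostic step that matches the user's level."""
    return QUESTIONS[step][expertise_level(resolved_sessions)]


print(next_question("connectivity", resolved_sessions=1))
print(next_question("connectivity", resolved_sessions=12))
```

The first call prints the novice wording and the second the experienced one, so the same reasoning step is presented with a vocabulary suited to each user.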
2.4.2. Recommendation Systems

Recommendation systems are application tools and techniques that give suggestions for objects that a user might want to use. These recommendations may be decisions that the user wants to make, such as: which is the best purchase, what kind of music he/she would like to listen to, or what news to read [26].

The basic idea is to present a selection of items that best fits the user’s needs, which are determined based on analysis of the user’s profile during profile creation or while navigating the application.

Recommendation systems have become prevalent nowadays and are widely used in a variety of applications. The most popular applications are probably movies, music, news, books, research articles, search engine queries, products, etc. A typical example of a recommendation system is the www.stumbleupon.com (accessed on 5 April 2022) website system, which uses the web ratings gathered by a collaborative rating system that can match users with interesting websites based on their preferences. For example, for two users with the same preferences, a recommendation system is capable of suggesting something that may be of interest to the second user, depending on the data provided from the first one. Figure 1 shows two people with the same preferences (they look almost the same, they have similar ages, they are of the same gender, they probably like similar clothes) and how a recommendation system is capable of suggesting something that may be of interest to User B based on the data provided from User A.

Figure 1. Recommendation system.

2.4.3. User Simulation

Since modeling a user lets the system perform an internal representation of a particular user, user simulation allows us to perform usability testing. These tests involve a process used to evaluate a product by testing it on these users, thereby providing the basic idea of how real users would use the system, and the tests focus on measuring the ability of a product to satisfy someone [27]. A few striking examples of goods that profit from these tests are websites, food, consumer products, computer interfaces, etc.

2.5. Knowledge Extraction

Knowledge mining in Computer Science (also called knowledge discovery in databases), is the process of detecting interesting and useful patterns and pertinence in great numbers of data [28]. The field of knowledge mining combines artificial intelligence tools and techniques with database management and is widely used by businesses (insurance, banking, etc.), in scientific research (medicine, physics etc.) and in government security systems (criminality and terrorism actions). Thus, using clustering or categorization algorithms, data are extracted to help humans make appropriate decisions.
gorithms, data are extracted to help humans make appropriate decisions. Companies’ transactional data have significantly increased; thus, the deman more sophisticated systems capable of discovering the knowledge contained withi data has come to the foreground. A successful application of data mining was the Future Internet 2022, 14, 144 8 of 24 tion of credit card fraud. The system studied the consumer’s buying behavior an played a pattern for them. Any purchase made outside this pattern led to an inve tion. Companies’ transactional data have significantly increased; thus, the demand for more The complete sophisticated systemsdata mining capable processtheinvolves of discovering knowledge multiple containedstages, within thatwhich are inform data has gathering come to theand pre-processing, foreground. A successfulinapplication which, before the data of data mining wasmining algorithms the detection of credit are ap card fraud. The system studied the consumer’s buying behavior the surveyed set of information is assembled. Then, the data are processed, and displayed a pattern which en for them. Any purchase made outside this pattern led to an investigation. data mining and results in the interpretation of the database. To achieve the afor The complete data mining process involves multiple stages, which are information tioned process, gathering there are some and pre-processing, techniques in which, before thewhich are discussed data mining algorithms below. are applied, the Predictive surveyed modelingis is set of information used when assembled. Then, we aimare the data atprocessed, estimating which the valuedata enables of a part miningand feature and results we know in thesome interpretation of the database. of the values To achieveAn of the attribute. 
the aforementioned example is data clas process, tion, whichthere are somea techniques gathers group of which are discussed data that have been below.sorted into predefined sets and Predictive modeling is used when we aim at estimating the value of a particular for patterns feature and we inknow the some data ofthat differentiate the values these An of the attribute. groups. exampleThese is datadiscovered classification, pattern then whichbegathers reuseda to classify group other of data data been that have when the name sorted for the group into predefined sets andattribute looks for is unkn For example, patterns in theadata manufacturer maythese that differentiate develop groups.predictive modelspatterns These discovered to distinguish can then which be reused to classify other data when fail in extremely hot or cold temperatures. the name for the group attribute is unknown. For example, a manufacturer may develop predictive models to distinguish which parts fail in A second technique is descriptive modeling or clustering, which also subdivid extremely hot or cold temperatures. items A into groups. second Withisarraying, technique descriptivethe appropriate modeling sets may or clustering, whichnotalsobe known in subdivides its advanc they itemsare discovered into groups. With after analysis arraying, of the data. the appropriate setsFor mayinstance, not be knownan advertiser in advance, but may inter they are discovered after analysis of the data. For instance, an advertiser general population in order to categorize plausible consumers into many kinds of g may interpret a general population in order to categorize plausible consumers into many kinds of groups and then develop separate advertising campaigns [28]. Figure 2 shows the clusterin and then develop separate advertising campaigns [28]. Figure 2 shows the clustering groups. into groups. Figure 2. Clustering. Figure 2. Clustering. The next data mining technique worth mentioning is pattern mining. 
This technique focuses on establishing models that capture specific patterns within the data. It is often used by stores trying to find out which products are commonly purchased together. Although such insights can be tested without the help of an application, data mining has facilitated the discovery of associations in less obvious datasets. Figure 3 illustrates in a simple way how the pattern mining technique is used in the data.

Figure 3. Pattern mining. Available online: https://borgelt.net/teach/fpm/ (accessed on 5 April 2022).

2.6. Similar Systems

2.6.1. The WEST System

When analyzing user analysis systems, it is important to refer to early systems that became pioneers in their field. One of these was the WEST system [22].

The WEST system was a tutorial for a game called HowTheWestWasWon. In this game, players spin three spinners and have to create numerical expressions with the numbers spun, using +, −, ×, / and appropriate parentheses, to determine what the final value will be. So, if, for example, the player rolled 2, 3 and 4 with the spinners, they could create the numerical expression (2 + 3) × 4 = 20 and advance 20 places. If a player reaches a city (i.e., every 10 places), he automatically advances to the next city, and if he lands on an opponent, then that opponent is sent back two cities.
Thus, the optimal strategy requires the user to calculate all possible moves and choose the one that puts him furthest ahead of his opponents. By analyzing the players' moves, the system discovered that the most popular strategy was to add the two smallest numbers and multiply the sum by the largest.

Although the WEST system explored only some of the basic concepts of user modeling and produced limited results, it worked very well at analyzing player behaviors so that they could be understood by users.

2.6.2. The Gumsaws System

The Gumsaws system was created to support the construction of adaptive web pages [29]. This system was able to meet the scalability, replaceability and adaptability needs of a website by modeling users. It did this by using knowledge mining techniques to learn the user's navigation history.

The Gumsaws system had features to create a profile or group of profiles and to store, retrieve, update and delete entries. These functions were performed by the system using various sources of information, such as direct information, which came directly from users, and group information, which came from users' navigation history and the correlations between them. Thus, the system could be used by news systems to serve their users according to their preferences.

2.6.3. The CATS System

The Collaborative Advisory Travel System (CATS) was proposed as a solution for suggesting a ski holiday plan to a group of friends [30]. It allowed a group of users to work together at the same time in order to choose a ski vacation package that satisfied
the whole group. The system revolved around the interactive DiamondTouch tabletop, which allowed group recommendations to be developed and shared virtually among up to four users. The proposals relied on a group profile, which was a mix of personal inclinations.

2.6.4. The PCAHTRS System

The PCAHTRS system is a Personalized Context-Aware Hybrid Travel Recommender System proposed by R. Logesh and V. Subramaniyaswamy [31]. With this system, they tried to propose a way to achieve better personalized recommendations in the e-tourism domain. The main purpose of this model was to design a hybrid collaborative filtering travel recommender system that provides personalized tourist venues based on ratings and desires. The authors show that modeling the implicit and explicit preferences of users, extended with semantic models, is the key to resolving the uncertainty issues that come up in the recommendation process. PCAHTRS was based on the user's contextual information and on an opinion mining technique to improve prediction accuracy.

2.6.5. The Hootle System

Hootle was a group recommender system (GRS) proposed by J. O. Álvarez Márquez and J. Ziegler [32]. In this system, user preferences and needs were modified in group discussions and users could interact with the desired features of the items. All group members should therefore accept or reject the proposed features and manage group choices according to their importance.

3. Our Proposed Implementation

Artificial intelligence has been around for a long time and is radically changing our lives. Through the COVID-19 pandemic, it has been given a new impetus, since public and private lives are now largely played out online. Any registration system primarily aims at collecting information on site visitors, not only to determine who is coming to the site, but also to facilitate informed decisions concerning the site design and content.
Marketers pay critical attention to customer profile data, which are used to better understand their audience, how they use the website, what products they like, their offline interests, and who is on their social media. The value of the database depends on the quality of the data it contains, and 88% of customers admit to providing incomplete or incorrect information on traditional registration forms, so the database does not contain data of the required quality. Poor data quality can result in lost sales, ineffective direct marketing, administrative costs and a loss of 10–20% of annual revenue in avoidable distribution errors [33].

Users need a platform that checks and verifies data provided upon signing up. This will boost the profitability of the business and give consumers a sense of uniqueness by receiving targeted advertising, discounts and recommended products on the site's specially designed "personal" page. Generally, users who are already registered do not bother to update their profile, since they have already received access to the platform. Additionally, many users who are concerned about their personal information do not enter their real personal data online. They give incorrect information, in most cases intentionally. These fake profiles can be modified or updated with more data using the methods for unregistered users.

Given the above, we created a "user profile extraction engine" called Profiler for a virtual web shop. Through this implementation, we can track users' movements and create their profiles accordingly. Our primary goal was to create and edit a profile for e-commerce purposes.

3.1. The Database

The database is used for the static data of the users entered during registration, the dynamic data entered during their navigation and for the products. The database consists of four tables: members (users), products (products), tracking (tracking) and items bought (purchases).
The users table consists of only three elements: the username, the password and an ID for each user. This ID is unique for each user and is the key that connects this table to the tracking table.

The tracking table contains data that attempt to determine whether the user is male or female, whether they have children and what their hobbies are. It also keeps a record of when they last logged in, how many times they have shopped at the store, how much money they have spent and other personal information, if any.

The product table contains item-by-item information and images of the products, as well as information that helps the system to categorize the products and answer the queries received from the user during the shopping process.

Finally, the shopping table (items bought) contains information about the purchases made by each user.
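As an illustration only, the four tables described above could be laid out as in the following sketch, which uses Python's built-in sqlite3 module. Only the username, password and user ID come from the text; every other table or column name (for example `male_views` or `items_bought`) is a hypothetical choice made for this example and should not be read as the actual Profiler schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative guesses based on the prose
# description, not the authors' actual database definition.
SCHEMA = """
CREATE TABLE members (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL          -- a real system would store a hash
);
CREATE TABLE tracking (
    member_id    INTEGER PRIMARY KEY REFERENCES members(id),
    male_views   INTEGER NOT NULL DEFAULT 0,
    female_views INTEGER NOT NULL DEFAULT 0,
    child_views  INTEGER NOT NULL DEFAULT 0,
    last_login   TEXT,
    purchases    INTEGER NOT NULL DEFAULT 0,
    total_spent  REAL NOT NULL DEFAULT 0
);
CREATE TABLE products (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    category TEXT NOT NULL,
    image    TEXT
);
CREATE TABLE items_bought (
    id         INTEGER PRIMARY KEY,
    member_id  INTEGER NOT NULL REFERENCES members(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    bought_at  TEXT
);
"""


def create_database(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
# Register one user; the member ID then keys that user's tracking row.
conn.execute(
    "INSERT INTO members (username, password) VALUES (?, ?)",
    ("alice", "not-a-real-hash"),
)
member_id = conn.execute(
    "SELECT id FROM members WHERE username = ?", ("alice",)
).fetchone()[0]
conn.execute("INSERT INTO tracking (member_id) VALUES (?)", (member_id,))
```

In this sketch the user ID is the only link between the members and tracking tables, which mirrors the role the text assigns to it.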
Figure 4 shows the tables and some of the elements and keys that make up the system's database.

Figure 4. Database tables.

3.2. User Tracking Technique

The process of user tracking is also the point where profiles are dynamically 'built'. Every time a user makes a query in the database, the database displays the appropriate products and at the same time notes, by editing the user's profile, the categories of interest.

PHP was used for server-side scripting and database communication. The dynamic editing of the profile is not visible to the ordinary user but only to the administrator of the website and cannot be edited unless the information in the database is 'tampered with'.

We mentioned in Section 2.2.1 the ways in which it is possible to monitor profiles. In this application, the ideal way is the second one, i.e., monitoring through the user's actions. In this way, by observing the recurring patterns of users, the system can adapt to changes in the user's interests, likes, routines and targets. The only downside is that "building" a complete profile can take some time, and if the user is not given enough time to create some recurring patterns, the data may appear incomplete.

More specifically, the way a profile is tracked has to do with the pages visited in the application. That is, if a user visits men's products very often, the system will know this and will increase the number of times this user has visited men's products. All this information is stored and tracked in our system's databases and not in cookies for various reasons as we showed in Section 2.2.2. By observing the user for some time, the system will have enough information about him/her so that the administrator can distinguish him/her from the others. Similarly, if users are browsing and constantly searching for products or information on pages of our online store that contain items for infants or children, our
system also classifies them as potential parents. Thus, our system creates a profile for each registered user, constantly updating it with information related to gender, age, and financial and family status.

3.3. Data Analysis and Display Technique

The final stage is to calculate and display statistics according to the preferences of each individual user. This option is only visible to the application administrator and allows the administrator to search for a user. The application, in turn, searches the database for the user and for all the data that make up the user's profile. It then processes the data and displays them so that they can be understood by the administrator.

The analysis is the process in which the system takes the records of where the user was looking at men's, women's or parents' products and their categories and converts them into percentages according to the user's choices. The data are displayed through tables where all the categories are shown, and the administrator can clearly see the demographics and interests of the user.

More specifically, as is shown in Figure 5, the system administrator can see detailed information for each user, such as their username, statistical data on the user's gender, his/her likes and much more personal information. For example, the user in this example, based on his/her statistical analysis, is 10% male and 90% female, so the user is probably female. There is also a prediction regarding whether this user has a child or not. According to the user's navigation and the percentage of traffic for each sport activity, the administrator can see in percentages whether he/she likes running, football, basketball, gymnastics, tennis, hiking, swimming or cycling.
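The counting step of Section 3.2 and the percentage step of Section 3.3 can be sketched as follows. This is a minimal illustration in Python, assuming that a profile is a set of per-category visit counters and that each percentage is simply a category's share of the visits within its group. The category names, function names and sample visits are hypothetical, and the article's own implementation was written in PHP.

```python
from collections import Counter

# Hypothetical page categories; the article does not list exact page names.
GENDER_PAGES = ("men", "women")
SPORT_PAGES = ("running", "football", "basketball", "gymnastics",
               "tennis", "hiking", "swimming", "cycling")


def record_visit(profile, category):
    """Increment the visit counter for one page category."""
    profile[category] += 1


def percentages(profile, categories):
    """Return each category's share of visits in percent (all zeros if none)."""
    total = sum(profile[c] for c in categories)
    if total == 0:
        return {c: 0.0 for c in categories}
    return {c: 100.0 * profile[c] / total for c in categories}


# Simulated browsing session: nine visits to women's pages, one to men's,
# three to running pages and one to tennis pages.
profile = Counter()
for page in ["women"] * 9 + ["men"] + ["running"] * 3 + ["tennis"]:
    record_visit(profile, page)

gender = percentages(profile, GENDER_PAGES)
sports = percentages(profile, SPORT_PAGES)
likely_gender = max(gender, key=gender.get)

print(gender)         # {'men': 10.0, 'women': 90.0}
print(likely_gender)  # women
```

With these sample visits the gender split reproduces the 10% male and 90% female example given in the text. Normalizing within each group separately keeps the gender estimate independent of how many sport pages the user opened.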
The system administrator also has access to additional information about each user, such as the date the account was created, when the user last logged in, how many times he/she has logged in to the online store since creating the account, how many times he/she has shopped in the store and how much money he/she has spent in total. The personal details of each user are also presented, for example, in which city he/she lives, at which address, his/her e-mail address, telephone number and other address details. Additionally, the administrator can see if there are any discount coupons in his/her profile and a table of all the products he/she has bought in the past. So, the administrator has a complete overview of each user.

Figure 5. Data analysis and display technique of a user.

4. Results and Discussion

4.1. Testing of the Application with Real Users, Analysis of the Results through Questionnaires and SPSS
As mentioned in Section 3, a profiler prototype has been designed and implemented that takes information and interprets it as logical clusters, which can be interpreted by humans and by other appropriate programs that monitor them. The application represents an online store (e-shop) of sporting goods. Users log into the system and make their purchases. As users navigate through the e-shop, the system tracks their movements and records them individually. In this way, we are able to infer some preferences of each user and even some personal data, such as their age, their gender or whether they are parents.

At the end of their visit to the online shop, the users or potential buyers are asked to fill in a questionnaire. The questionnaire contains the same questions for all users and helps us to verify and check the validity of the information and data extracted by the user analysis system.

4.2. European Data Protection Regulation

The information collected is very personal and there is a risk of violation of the user's privacy. There are legal and ethical issues regarding the surveillance of people's privacy. The Data Protection Authority, which supervises the application of the General Data Protection Regulation (GDPR), is a constitutionally independent administrative authority. It was established by a law for the protection of every person from the processing of their personal data, which incorporates a European Directive into Greek law [34]. This directive sets certain rules for the protection of personal data in all member countries of the European Union. In our developed system, we respect and protect the privacy and the free development of the personality of each user, since this is a primary objective of any democratic society.
Any electronic application should maintain and establish a level of security and protection that is on a par with that of existing services, but at the same time capable of ensuring that personal data is used in a lawful and transparent manner in the interest of citizens–consumers. Due to the provision of electronic services, citizens who use them disclose personal data; thus, there is electronic collection and processing of important information about each citizen, which can be used to create an extensive profile or help unauthorized persons to access all the information. As Lopes H, Pires IM, Sánchez San Blas H, García-Ovejero R, Leithard write in their article, “Data privacy has had a vast prominence in society. Several approaches are taken to realize the dream of one day. There could be a world in which there is a real state of privacy for the individual” [35].

All online applications of any institution must inspire security during transactions, as it is vital that citizens/business users have confidence in the systems used by the public. Trust is consolidated by the existence of appropriate mechanisms for user identification, security and protection of personal data. Users should be made aware of how their personal data are protected and how risks arising from malicious actions by third parties are addressed, such as in cases of hacking of personal data, unauthorized use of services, unauthorized access to data, etc.

Directly intertwined with the security of Public Websites is their reliability and their acceptance by visitors–users. They should provide satisfactory security and reliability, ensuring the following parameters:

- Integrity: which refers to ensuring that the information that is handled, published, stored and processed remains unchanged.
- Identification: which refers to the identification of the user's identity.
- Confidentiality: which refers to access to information only by those who have the appropriate authorization.
- Authentication: which refers to the specific action that ensures that the identity declared by the user actually corresponds to the user.
- Authorization: which refers to ensuring that each entity has access to those system resources to which it has been granted access.
- Availability: which refers to the availability of information whenever an authorized user attempts to access it.
- Non-repudiation: which refers to the inability of a user to deny that he/she has performed an action related to accessing, entering and processing information.

The security of public websites consists of a complex set of guidelines and rules relating to the organization of the website operator and the hosting provider, the procedures it applies, the services it provides, the technical infrastructure at its disposal and, finally, the legal framework for the protection of personal data and the security of communications.

Unfortunately, however, the preceding analysis has shown that, from a legal point of view, there are many different issues that need to be addressed immediately and specifically. Among the most important issues are undoubtedly those relating to data security and, more specifically, the issues relating to the authentication of the identity of the communicating parties, the integrity of the data transmitted, the confidentiality of the data from possible unwanted disclosure to third parties and the non-repudiation of the data. In order for any public or private agency to proceed with lawful processing of citizens' personal data, it should, for example, have collected the data in a fair and lawful manner, for clear and defined purposes; the data should not be more than necessary and should be accurate and up to date.

In conclusion, we must point out that if the challenges relating to data security and its legal aspects are overcome, the World Wide Web will evolve into a Web with many new possibilities that will greatly affect many of the activities of our daily lives.

4.3. Statistical Analysis of Data

For the purposes of this article, the statistical program SPSS was used to group, compare and draw conclusions about the quality and reliability of the information produced by the user analysis system.

In our sports e-shop, the adaptive profiling system that we created holds information and analyzes and makes predictions regarding the following categories:

- Hiking
- Swimming
- Running
- Cycling
- Football
- Basketball
- Gym
- Tennis
- Sex (Male or Female?)
- Parent (Is this user a parent?)

Accordingly, variables for the same categories were used for the "real" data provided to us through the questionnaires.

One hundred adults from all educational levels completed the questionnaires after having made some virtual purchases in our online store. The questionnaire consists of 11 questions, and provides data about respondents from different points of view, such as sex, age, interests, parenthood, education, etc. The selection of these individuals was random.

The purpose of this survey was to collect, per user, his/her personal data and his/her interests and to subsequently compare these data with those recorded and predicted by our online profiling system. The results of the survey were very encouraging and showed that our system worked well in most cases. Detailed examples are presented below.

More specifically, the questions they were asked to answer were:

Question: Which username did you use when you registered?
This question was asked to know exactly which username the respondent used when he/she created the account in our system, so that we can compare our findings for that specific user.

Question: What is your gender?

According to the replies to the questionnaires, 57 respondents were male and 43 were female. Our online profiling system successfully predicted the gender of 84 of those users (47 males and 37 females). This means that the success rate of our system for gender was 84%. In Table 2, the success rate of the gender prediction is presented.

Question: Are you a parent?

Of the participants, 32 replied that they were parents and 68 replied that they were not. Our system predicted parenthood correctly for 49 of those users (a success rate of 49%). In Table 3, the success rate of the Parenthood prediction is presented.

Question: What are your interests? Choose the ones that interest you (Running, Football, Basketball, Gymnastics, Tennis, Hiking, Swimming, Cycling)

In this question, users could pick any activities that they really like. For each one of these activities and for every user, we analyzed the findings of our profiling system. It turned out that the system worked very well and made accurate predictions. In the following tables, the success rates for each activity are presented.

Table 2. Gender analysis.

| Gender | Real Data from Questionnaires | Profiling System Accurate Predictions | Success Rate |
|---|---|---|---|
| Male | 57 | 47 | 82% |
| Female | 43 | 37 | 86% |
| Total | 100 | 84 | 84% |

Table 3. Parenthood analysis.

| Parent | Real Data from Questionnaires | Profiling System Accurate Predictions | Success Rate |
|---|---|---|---|
| Yes | 32 | 12 | 37.5% |
| No | 68 | 37 | 54.4% |
| Total | 100 | 49 | 49% |

In Table 4, the success rate of the Running activity prediction is presented.

Table 4. Running activity.

| Running | Real Data from Questionnaires | Profiling System Accurate Predictions | Success Rate |
|---|---|---|---|
| No | 78 | 63 | 81% |
| Yes | 22 | 11 | 50% |
| Total | 100 | 74 | 74% ¹ |

¹ The success rate of our system for the running activity is 74%.

In Table 5, the success rate of the Football activity prediction is presented.