ePaper - the Personalized Mobile Newspaper
Bracha Shapira, Peretz Shoval, Joachim Meyer, Noam Tractinsky, Dudu Mimran
Deutsche Telekom Laboratories at Ben-Gurion University
P.O.B. 653, Beer-Sheva, Israel
bshapira@bgu.ac.il, shoval@bgu.ac.il, Joachim@bgu.ac.il, noamt@bgu.ac.il, dudu@strategicboard.com

ABSTRACT
This paper provides an overview of the ePaper project. The project aims to provide an end-to-end solution for the future mobile personalized newspaper. The ePaper aggregates content (i.e., news items) from various news providers and delivers personalized newspapers on dedicated mobile, electronic newspaper-like devices. The ePaper can provide each subscribed user with a personalized newspaper, according to the user's preferences, as well as a "standard digital edition" of a selected newspaper. The layout of the newspaper is adapted to the device's specifications and the user's preferences. The ePaper is expected to change the experience of reading newspapers and magazines, coupling an innovative paper-like display with novel personalization algorithms, an intuitive interface and new methods for adapting content to the device.

Author Keywords
Electronic newspaper, collaborative filtering, ontology-content-based filtering, personalization.

ACM Classification Keywords
H.5.2. Information interfaces and presentation; H.3.3. Information Storage and Retrieval: information filtering; H.3.1. Content Analysis and Indexing.

1. INTRODUCTION
The publishing world is undergoing a digital revolution. After decades in the laboratory, electronic paper technologies seem poised for commercialization starting as early as 2006. Electronic paper based on e-ink technology offers a visual impression close to print on paper: it is very thin and readable, and it consumes power only when the screen is updated. The result is a reading experience that is similar to paper: high contrast, high resolution, and viewable in direct sunlight and at a nearly 180-degree angle.

The ePaper project, performed at the Deutsche Telekom Laboratories at Ben-Gurion University, aims at providing an end-to-end solution for the future newspaper, targeting the electronic paper devices mentioned above. The ePaper is not meant to be another application on a PDA or mobile phone. Rather, it is projected as a substitute for the standard newspaper, running on a medium-format portable device, with several notable advantages over a paper newspaper, such as up-to-date information, aggregation of news items from many news providers, easy browsing and navigation, and a personalized edition that best fits each user's preferences.

The ePaper system is a client-server application that provides an end-to-end solution for newspaper reading. On the server side, the ePaper includes aggregation and classification of news from multiple sources, flexible delivery services, personalization and content adaptation. On the client side, readers enjoy an intuitive interface that enables easy navigation and browsing, and advanced content adaptation capabilities that enable the reader to switch and configure layouts.

The rest of this paper is structured as follows: Section 2 describes some related projects; Section 3 presents the general architecture of ePaper; Section 4 elaborates on the novel personalization and content adaptation algorithms; Section 5 concludes with the status of the project and future issues.

2. RELATED WORK
The digital revolution of the publishing industry, and specifically the digitization of newspapers, has gained a lot of research attention and is the subject of many recent studies and projects. We briefly mention here some of the main recent projects.

The Electronic Newspaper Initiative (ELIN) [1, 2] is aimed at producing an advanced multimedia electronic newspaper. It provides personalization of news and allows interactive features such as multimedia news on demand.
One of the main goals of ELIN was to develop an authoring tool, based on the MPEG-4 and MPEG-7 standards, for news journalists and editors. The scope of the ePaper system does not include authoring tools.
However, ePaper is flexible in terms of the format of news items. The ePaper design consists of interpreters that can be added for every known format that a publisher wishes to use, without forcing publishers to adopt a specific standard (such as the MPEG framework). ELIN is aimed mainly at users at home who use PC-based devices; in the ELIN context, mobile devices are mainly used to send news-related SMS and MMS messages. The ePaper, in contrast, is targeted at dedicated e-ink mobile devices, taking their limited capabilities into account. Personalization of content in the ELIN project is based mainly on rather basic, collaborative, memory-based algorithms that are known to have scalability disadvantages; the ELIN developers consider the enhancement and improvement of the filtering and personalization algorithms an issue for future development. The ePaper personalization engine integrates ontology-driven content-based filtering and novel collaborative filtering algorithms to provide high-quality personalization of content.

The MINDS project (Mobile Information and News Data Services) for 3G (http://www.minds-project.net) is aimed at optimizing processes in the value chains of mobile services. The group developed innovative mobile media services and defined European metadata standards for news. The MINDS project concentrates on promoting the mobile channel for news delivery, including business, metadata, alerting and technological issues. Our ePaper project aims to deliver a whole newspaper product/service, rather than specific stand-alone news services or alerts about these services.

DigiNews [5] is a European research and development project in the electronic media domain. It aims at finding new ways of distributing and consuming the future electronic newspaper. More specifically, it aims at combining useful features of printed newspapers, such as simplicity of use, high accessibility and high mobility, with important features of electronic media, such as the ability to update news continually and the option of multimedia news. The possibilities of personalizing the contents of the delivered newspaper are also examined. The DigiNews project deals with personalization too, but with regard to adapting the user interface to users' preferences, while content personalization is done only as part of the augmented uses, and therefore only on a limited scale. Also, DigiNews public articles do not appear to mention any search engine for users. Our ePaper project, in contrast, pays much attention to users' ability to browse and search for relevant news.

The CoMet project [11] was carried out as a successor of SmartPush, which built a personalized delivery system for economic news items. Four kinds of services have been defined in CoMet:
- Personalizing: filtering incoming information according to users' content-based profiles.
- Content reuse: finding information needed for re-publishing content, e.g., in a second publishing channel.
- Information augmentation: finding new information that can serve as background for the news to be published.
- Story chain management: managing the relations between different stories in the news.
While CoMet deals with matching news-item metadata and user-profile metadata, it did not concentrate on newspaper delivery to mobile devices in general, or to e-paper devices in particular, though it mentions that delivery to mobile devices may benefit from its results. In contrast, our ePaper project concentrates, as far as output devices are concerned, on delivery to mobile devices, especially e-paper-like devices.

3. GENERAL ARCHITECTURE OF EPAPER SYSTEM
Figure 1 presents an overview of the ePaper architecture.

Figure 1. ePaper architecture (diagram labels: Content Providers; ePaper System - Server, comprising the Aggregator, Content Manager, Personalization, Content Delivery Services and System Management Tools; ePaper Client)

The ePaper system was implemented based on a client-server architecture. The server side consists of four layers. The first, the Content layer, includes the Aggregator and the Content Manager. The Aggregator interacts with content (news) providers and collects news items into local storage. The Content Manager processes the content received and prepares it for delivery to users. The system maintains a hierarchical news ontology based on the NewsML subject codes defined by IPTC (www.iptc.org). The Content Manager classifies each news item to the relevant ontology concepts.

The Personalization layer consists of a novel personalization engine, which prepares ranked lists of news items to be delivered to users. The personalization engine combines an ontology-driven content-based filtering algorithm with a time-aware collaborative filtering algorithm.

The Content Delivery Services layer orchestrates the processes of the system. It interacts with the Personalization layer, submits requests for personalized news and sends the ranked news items it receives to the client. It also receives feedback from the client (tracking data on the user's behavior) and sends this data to the Personalization layer, which updates the user's profile to reflect the user's recent reading preferences.

The System Management Tools layer provides standard system tools such as logging and reporting, as well as special tools for the ePaper application, including the Ontology Editor, which enables maintenance of the ontology. Also included is a registration system where a user can register and define the ePaper services to which he subscribes. The user provides information about the content providers from whom to receive content, as well as demographic and billing information. The user can also opt to define his areas of interest explicitly, by choosing concepts from the news ontology.

The Client subsystem interacts with the Content Delivery Services layer to receive data. The data includes information on the profiles registered to the device and a ranked list of news items that suits the user's preferences. The client is in charge of rendering the content, adapting it to the preferred layout and presenting it to the user. To manage the variety and constraints of different mobile devices, the system supports dynamic content adaptation mechanisms based on the device that the user owns, the user's preferences and the local customizations made by each user. Thus, the presentation functionality is loosely coupled with the content preparation process, which makes it easy to scale the number and variety of devices supporting this service. In the following sections we provide more detail about the main components, namely content management, the client system and content delivery. The personalization layer is detailed in Section 4.

3.1 Content Management
Figure 2 presents the Content Management layer of the ePaper system. It receives pre-processed content from the Aggregator and stores processed content in the repository layer. The major responsibility of the Content Management layer is to classify (map) each news item to ontology concepts.

Figure 2. Content Management layer (diagram labels: Content Manager, Interpreter Manager, NewsML Interpreter, Interpreter 2, Classifier, Similarity Computational Component with similarity to items in other sources and temporal similarity, Media Manager, Identifier, Ontology Manager, News Editor, repository)

3.1.1 Content Manager
The Content Manager orchestrates the processes of the Content Management layer. It receives raw content items from the Aggregator and sends them to the Interpreter Manager. The Content Manager also receives interpreted and classified data back from the Classifier and sends it to the other functional units. After a content item passes through all the functional services, it is ready to be sent to the repository and used by the Personalization layer.

3.1.2 Interpreter Manager
The ePaper system is able to handle news items coming from multiple sources and in multiple formats. The Interpreter Manager is responsible for identifying and activating the appropriate interpreter for each content item, i.e., the interpreter that is able to "understand" the specific content item's format and extract the relevant fields. The input for the Interpreter Manager is a content item received from the Content Manager. The result of the Interpreter Manager's activity is the execution of the proper interpreter.

3.1.3 NewsML Interpreter
The NewsML Interpreter is an example of an interpreter implemented in the ePaper system. It receives content items in NewsML format, extracts relevant metadata fields (e.g., the newspaper, language, authors), and passes the parsed content back to the Classifier. The ePaper can handle other standard formats by developing a dedicated interpreter for each standard.

3.1.4 Classifier
The ePaper system uses an ontology, which is a small, limited hierarchy of NewsCodes concepts. The Classifier component is responsible for determining the ontology concepts that will represent each news item, i.e., for defining the content-based profile of each item. For this purpose, the Classifier component utilizes a hierarchical multi-label classification algorithm.

The hierarchical multi-label classification algorithm implemented in ePaper uses the flat multi-class classification provided by the LingPipe open source software [http://www.alias-i.com/lingpipe/index.html]. The LingPipe classification method is based on statistical language modeling techniques and uses Bayesian decision theory. We apply a top-down, level-based approach to hierarchical classification. According to this approach, separate classification models are constructed at each level of the category tree: there is a separate classification model for each concept of the ontology at every level of the hierarchy. Hence, the number of generated models is identical to the number of concepts in the hierarchy.

The classification process is performed in a top-down, level-based manner. First, the content is classified into one or more high-level categories. Then, it is further classified into one or more child concepts of the categories assigned at the previous stage. Then, if one or more second-level concepts were assigned to the content, it is further classified into their child concepts. The classification process stops when classification into a more detailed concept is not confident enough. The confidence thresholds are configuration parameters set through empirical runs.
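The top-down, level-based descent described above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the ePaper implementation: it does not use LingPipe, and the toy ontology `TOY_ONTOLOGY`, the keyword-based `keyword_scorer` (a stand-in for the trained per-concept models) and the single `confidence_threshold` value are all assumptions. It returns every concept assigned on the way down, together with its score; choosing the final concept from these scores is a separate step.

```python
from typing import Callable, Dict, List

# An ontology maps each concept to its child concepts; "ROOT" is an artificial top node.
Ontology = Dict[str, List[str]]

# A scorer stands in for the per-concept models: given the item text and a parent
# concept, it returns a confidence score in [0, 1] for each child of that parent.
Scorer = Callable[[str, str], Dict[str, float]]


def classify_top_down(text: str, ontology: Ontology, scorer: Scorer,
                      confidence_threshold: float = 0.5) -> Dict[str, float]:
    """Assign concepts level by level, descending only into confidently assigned children."""
    assigned: Dict[str, float] = {}
    frontier = ["ROOT"]  # concepts assigned at the previous level
    while frontier:
        next_frontier: List[str] = []
        for parent in frontier:
            children = ontology.get(parent, [])
            if not children:
                continue  # leaf concept: no more specific concept to try
            for child, score in scorer(text, parent).items():
                # A branch stops as soon as the model is not confident enough.
                if child in children and score >= confidence_threshold:
                    assigned[child] = score
                    next_frontier.append(child)
        frontier = next_frontier
    return assigned


# Hypothetical toy ontology (a real one would be derived from IPTC NewsCodes).
TOY_ONTOLOGY: Ontology = {
    "ROOT": ["politics", "economy"],
    "economy": ["markets", "employment"],
}


def keyword_scorer(text: str, parent: str) -> Dict[str, float]:
    """Toy scorer: high confidence if the concept name occurs as a word in the text."""
    words = set(text.lower().split())
    return {c: (0.9 if c in words else 0.1) for c in TOY_ONTOLOGY.get(parent, [])}


print(classify_top_down("Economy update: markets rally", TOY_ONTOLOGY, keyword_scorer))
# {'economy': 0.9, 'markets': 0.9}
```

Keeping a frontier of the concepts assigned at the previous level lets each branch of the hierarchy stop independently, which matches the rule that classification halts once a more detailed concept is not confident enough.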
Once the results of the classifications at each level are obtained, the final classification is determined according to the weights of the received concepts and the configuration parameters. The most specific concept is assigned if its score is above the predefined multi-label threshold parameter; otherwise, the concept with the highest score is assigned to the content.

3.1.5 Similarity Computation
This component computes the similarity between a new incoming content item and the other "active" items in the repository. If the new item is deemed "very similar" to an existing item, two situations are distinguished: a) the new item is very similar to an existing item that came from another source; b) the new item is very similar to an existing item that came from the same source. The objective is to avoid sending a user a news item that is very similar to an item the user has already read, unless the user opted to receive such "redundant" news. If, however, the new item came from the same source, it is assumed to contain more recent or updated information and will therefore be delivered to the user. To identify similarities between items, we use the vector space model [8].

3.1.6 Media Manager
The Media Manager is responsible for managing the processing of all media that arrive in the ePaper system, including conversion to the ePaper format and generation of a new item instance.

3.2 Client Subsystem
The client subsystem encompasses every functional unit that supports the activity of the mobile application on the device, as well as the mobile application itself. This subsystem includes the following functionality areas:
- Local servers receiving breaking news, alerts and software upgrades
- Newspaper runtime environment: the platform on which the mobile applications are designed to run
- Infrastructure services, such as an offline proxy for maintaining a connectionless environment, remote communication services and a local XML persistence layer
- Client application services, including favorites management, multimedia viewing, user settings management and remote ontology browsing services
Figure 3 presents the client subsystem architecture.

Figure 3. Client subsystem architecture (diagram layers, top to bottom: local servers for application (layout/API) upgrades, push-based content delivery and breaking news/alerts; the XML-based newspaper application; the Newspaper Runtime API; ontology directory, content delivery, favorites, archives, local multimedia viewer and settings services; XML-based local storage, an offline proxy with a "read later" and clicks cache, and communication services; the operating system with a portable embedded virtual machine)

3.3 Content Delivery
The Content Delivery subsystem intervenes before content is delivered to the client side and mediates between the client side and the Personalization subsystem. In order to send the relevant content to a specific user, Content Delivery interacts with the Personalization subsystem and requests the personalized ranked items for specific categories, or retrieves requested items without personalization.

A client may send several kinds of requests to the Content Delivery subsystem: a request for news items, a request for ranking of items, a request for a "standard edition" of a newspaper, and a request for the user's profiles according to the device. A request for news items returns to the client a set of the requested items without their ranking. A request for ranking returns to the client a ranked set of items, based on the number of items requested and the requested categories to which those items belong, to be presented in the client. Another process in the Content Delivery module is the click-setting process: it receives from the client the ID of the user and the clicked item, and updates the user's profiles accordingly.

4. PERSONALIZATION AND CONTENT ADAPTATION
In this section, some of the innovative ideas of the ePaper project are described, namely the personalization and content adaptation algorithms.

4.1 Personalization
The Personalization engine of the ePaper system should consider the special characteristics of a mobile newspaper environment:
- Item relevancy over time: the relevancy of different types of news items decreases differently over time.
- Items are presented to a user through a hierarchical navigation scheme: the engine should provide ranked lists of relevant items at any level of the concept hierarchy.
- New news items continuously arrive in the system and stay active for only a short period of time: the cold-start (new item) and sparsity problems should be addressed well.

To address these challenges, we developed a hybrid filtering method that combines ontology-driven content-based filtering with time-aware collaborative filtering. The decrease in an item's relevancy is addressed by a time-aware collaborative process. The use of the ontology enables representing the items and the users with concepts from the same vocabulary, and measuring the similarity between item and user profiles while considering the hierarchical distance between concepts in the two profiles. The combination of collaborative and content-based filtering techniques makes it possible to overcome the "cold start" and sparsity problems: the content-based filter is used for new items, which still have no reading history, and the weight of the collaborative filter increases dynamically as an item accumulates more "clicks" (i.e., is read by more users).

A few ontology-based profiling models exist in which user profiles, as well as item profiles, are represented with ontology concepts [7, 11]. However, in those studies the computation of similarity between the user and the item does not consider the concept hierarchy; the ontology hierarchy is used naively, only for profile updates via feedback, e.g., fractional interest in a higher-level concept is inferred when a specific topic is added. In ePaper, the content-based filter considers the distance between concepts in the item's profile and in the user's profile, according to their location in the hierarchical ontology. Details of the content-based filtering method can be found in [9].

The collaborative filter of ePaper includes a dynamic time-decay factor, which is determined according to the age of the item. The intuition is that news items lose relevancy over time. We plan to learn and use different decay factors for different concepts (e.g., political news might lose relevancy faster than technology news). Some collaborative systems use a decay factor, but it usually decreases the user's interest level in a concept rather than the item's weight, and different decay factors are not considered [10].

Here is a brief overview of the main steps of the filtering process:

Step 1: Get a request to provide a ranked list of relevant news items for a user.

Step 2: Use the content-based filter to rank the "active" items in the repository for the user. In essence, the relevancy estimation function measures the similarity (distance) between each concept in the item's profile and the respective concepts in the user's profile, considering not only co-occurring concepts but also occurrences of neighboring (parent and child) concepts, according to the hierarchical ontology. Based on the number of co-occurring and similar concepts, a similarity score is computed for each item.

Step 3: Use the collaborative filter to rank the "active" items in the repository for the user. We adjusted the k-NN algorithm to consider a decay factor of the item's relevancy over time, with different decay factors for different ontological concepts. We compute the following:
- Find the user neighborhood: compute the user similarity score (USS) between the user and each of the other users.
- Compute a time-discount weight for each click on the item to be ranked.
- Compute the weighted average of all the clicks on the item, considering how similar the "clicking user" is (USS) and when he clicked the item (the time discount).
The result so far is two lists of ranked items, one according to the content-based filter and the other according to the collaborative filter.

Step 4: Use a weighted combination scheme that considers the "maturity" of each item, i.e., how many rates (clicks) it has. The more rates, the more weight is given to the collaborative filter. Hence, a new item is ranked based on the content-based filter only; as time passes and the item gets more clicks, the weight of the collaborative filter increases.

4.2 Content Adaptation
The content adaptation challenge of the ePaper project deals with the question of how to adjust the news content collected in the ePaper database for presentation to the individual reader. It assumes that readers differ in their preferences regarding the density and style of information presentation, as well as in their interests. It also aims to develop an automated system that can generate content adaptations and screen layouts without the intervention of a human editor.

To address this challenge, we first conducted empirical research to study the various aspects of the problem. The empirical part consists of three interrelated lines of research. One series of experiments dealt with the arrangement of pages in a newspaper, focusing mainly on the comparison of serial and hierarchical navigation. This research demonstrates the advantages of each type of structure and develops a model to determine the relative benefits of each. The second line of research looked at users' preferences regarding the layout of news sites as a function of information density and layout structure. The third line of research consists of a series of experiments that aimed to generate a function for predicting the apparent importance of an item on a page as a function of its visual properties. The experiments showed that importance perception is a very rapid process (even after a 0.5-second exposure to a page, people generate stable assessments of the relative importance of items). We can now predict with a high degree of accuracy the perceived importance of an item according to its dimensions relative to other items and its location on the screen.

We developed an algorithm for the automatic generation of screen layouts. The purpose of the algorithm, and its main innovation compared to previous work on automated layout generation, is to create layouts, based on a very limited set of parameters, for a wide range of devices that differ in display resolution, screen size and screen dimensions. Existing approaches to this goal originate in different methodologies, such as 'stock cutting' problems (Elmaghraby, Abdelhafiz & Hassan, 2000) and the 'floorplan area' of circuits in VLSI manufacturing (Knog, Hong & Qiao, 1997). In most cases these approaches aim to generate a layout that minimizes area consumption for a predefined number of items. In addition, these items usually have a set of positioning constraints among themselves (e.g., order or adjacency of items).

Figure 4a. Teenager layout

The goal of our algorithm is to populate an entire given area with an undefined number of items that have no display constraints between them, only individual display constraints relating to the user. The proposed algorithm uses an iterative division method to create smaller and smaller areas of the screen. The system stops the division process when the generated areas reach the minimal area size. Following the division process, areas with one dimension smaller than the corresponding dimension of the minimal area are merged according to different policies. The layout generated by the division and merging processes is then populated with items according to their importance. The output of the algorithm is an XML-based description of the layout, which the client system then uses as the basis for generating actual layouts.
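The division, merging and population steps described above can be sketched in Python as follows. This is a minimal sketch under assumptions the text does not specify: each area is split along its longer side at a fixed `ratio`; the merging policy is to undo a split when either part would have a side shorter than `min_side` (one of many possible policies); items are matched to areas by size only; and the XML element and attribute names are hypothetical, not the ePaper layout format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Area:
    x: float
    y: float
    w: float
    h: float

    @property
    def size(self) -> float:
        return self.w * self.h


def divide(area: Area, min_area: float, min_side: float, ratio: float = 0.6) -> List[Area]:
    """Split the screen into smaller and smaller areas (recursive form of the division)."""
    if area.size <= min_area:
        return [area]  # the minimal area size has been reached: stop dividing
    if area.w >= area.h:  # split along the longer side
        first = Area(area.x, area.y, area.w * ratio, area.h)
        second = Area(area.x + first.w, area.y, area.w - first.w, area.h)
    else:
        first = Area(area.x, area.y, area.w, area.h * ratio)
        second = Area(area.x, area.y + first.h, area.w, area.h - first.h)
    # Assumed merging policy: a part with a side below min_side is merged back
    # with its sibling, i.e. the undivided parent area is kept.
    if min(first.w, first.h, second.w, second.h) < min_side:
        return [area]
    return (divide(first, min_area, min_side, ratio)
            + divide(second, min_area, min_side, ratio))


def populate(areas: List[Area], importance: Dict[str, float]) -> List[Tuple[str, Area]]:
    """Give the most important items the largest areas (top-left first on ties)."""
    items = sorted(importance, key=importance.get, reverse=True)
    slots = sorted(areas, key=lambda a: (-a.size, a.y, a.x))
    return list(zip(items, slots))


def to_xml(placements: List[Tuple[str, Area]]) -> str:
    """Serialize the layout; element and attribute names are hypothetical."""
    root = ET.Element("layout")
    for item_id, a in placements:
        ET.SubElement(root, "area", item=item_id,
                      x=f"{a.x:g}", y=f"{a.y:g}", w=f"{a.w:g}", h=f"{a.h:g}")
    return ET.tostring(root, encoding="unicode")


areas = divide(Area(0, 0, 600, 800), min_area=60_000, min_side=150)
print(len(areas))  # 6
print(to_xml(populate(areas, {"lead": 0.9, "brief": 0.4, "feature": 0.7})))
```

Because this sketch ranks areas by size alone, it ignores the effect of on-screen location on perceived importance measured in the third line of research; a fuller version would fold position into the ranking of areas.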
The user can switch between layouts, and the user's layout selection is saved in his profile. Figures 4a and 4b present different layouts generated for the ePaper system.

Figure 4b. Business layout

5. SUMMARY AND FUTURE ISSUES
The full version of the ePaper prototype system is now undergoing usability tests aimed at examining users' reactions to the service and testing its navigation and browsing capabilities. Concurrently, we are evaluating the personalization and content adaptation algorithms and conducting research related to the intuitive interface. For the personalization algorithms, we examine the effect of various parameters on filtering performance, e.g.:
- optimal scores of partially similar concepts, according to their hierarchical distance, and their marginal contribution
- optimal number of concepts to consider in the user's profile and the items' profiles
- schemes to analyze user feedback (e.g., clicks, time of reading, ranking of the clicked item)
- optimal decay factor (the impact of time on the collaborative and combined filters)

We are running controlled experiments with users to evaluate the relevancy of news items, compared to the system's ranking of those items based on the filtering methods. We are also running simulations that manipulate various parameters, and we calculate standard and novel filtering measures, e.g., MAE, precision, recall and PIP (Hyung, 2008).

We are currently in the midst of conducting laboratory experiments on the design of the ePaper to understand:
- How the aesthetic design of online news sites affects users' emotions and attitudes towards the product
- The effects of typical vs. novel designs of the ePaper (relative to other news sources) on users' preferences for those designs
The results of these experiments will provide further guidelines regarding the design of the ePaper and similar products.

ACKNOWLEDGMENTS
The ePaper project is sponsored by Deutsche Telekom Co. and is performed at Deutsche Telekom Laboratories at Ben-Gurion University.

REFERENCES
1. Casademont, J., Perdrix, F., Einhoff, M., Dummer, G., and Boyer, A. (2005). ELIN: A Framework to deliver media content in an efficient way based in MPEG standards. IEEE International Conference on Web Services (ICWS'05), pp. 841-842.
2. Dummer, G., Casademont, J., Einhoff, M., Boyer, A., and Perdrix, F. (2005). ELIN: A MPEG based news delivery framework. In: Cunningham, P. (ed.), Innovation and the Knowledge Economy, Part 2: Issues, Applications, Case Studies. Amsterdam: IOS Press, pp. 959-966.
3. Elmaghraby, A. S., Abdelhafiz, E., and Hassan, M. F. (2000). An intelligent approach to stock cutting optimization. University of Louisville Multimedia Research Lab, Louisville, KY.
4. Hyung Jun A. (2008). A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Information Sciences, 178(1), pp. 37-51.
5. Ihlström, C., Sabelström Möller, K., and Åkesson, M. (2005). Diginews - The challenge of production in e-paper publishing - from new consumption to new workflows. Presented at TAGA 2005.
6. Knog, T., Hong, X., and Qiao, C. (1997). VEAP: Global Optimization based Efficient Algorithm for VLSI Placement. Asia and South Pacific Design Automation Conference (ASP-DAC'97), Chiba, Japan, pp. 277-280.
7. Middleton, S.E., Alani, H., Shadbolt, N.R., and De Roure, D.C. (2002). Exploiting synergy between ontologies and recommender systems. The Eleventh International World Wide Web Conference (WWW2002).
8. Salton, G., Wong, A., and Yang, C. S. (1975). A Vector Space Model for Automatic Indexing. Communications of the ACM, 18(11), pp. 613-620.
9. Shoval, P., Maidel, V., and Shapira, B. (2008). An ontology content based filtering method. Int'l Journal of Information Theories and Applications, pp. 51-63.
10. Tong-Queue L., Young P. (2006). A Time-Based Recommender System Using Implicit Feedback. CSREA IEEE 2006, pp. 309-315.
11. Yli-Koivisto, J., and Puustjarvi, J. (2001). Using Ontologies in CoMet. Proceedings of the ONTO-2001 Workshop on Ontologies, Vienna, Austria, September 18, 2001, pp. 1-15.