HERMES: A Software Architecture for Visibility and Control in Wireless Sensor Network Deployments
Nupur Kothari†, Kiran Nagaraja, Vijay Raghunathan‡, Florin Sultan, Srimat Chakradhar
NEC Laboratories America, Princeton, NJ
† University of Southern California, Los Angeles, CA
‡ Purdue University, West Lafayette, IN

Abstract

Designing reliable software for sensor networks is challenging because application developers have little visibility into, and understanding of, the post-deployment behavior of code executing on resource constrained nodes in remote and ill-reproducible environments. To address this problem, this paper presents HERMES, a lightweight framework and prototype tool that provides fine-grained visibility and control of a sensor node’s software at run-time. HERMES’s architecture is based on the notion of interposition, which enables it to provide these properties in a minimally intrusive manner, without requiring any modification to software applications being observed and controlled. HERMES provides a general, extensible, and easy-to-use framework for specifying which software components to observe and control as well as when and how this observation and control is done. We have implemented and tested a fully functional prototype of HERMES for the SOS sensor operating system. Our performance evaluation, using real sensor nodes as well as cycle-accurate simulation, shows that HERMES successfully achieves its objective of providing fine-grained and dynamic visibility and control without incurring significant resource overheads. We demonstrate the utility and flexibility of HERMES by using our prototype to design, implement, and evaluate three case-studies: debugging and testing deployed sensor network applications, performing transparent software updates in sensor nodes, and implementing network traffic shaping and resource policing.

1. Introduction

As wireless sensor systems transition from research prototypes to commercial deployments, providing reliable and dependable system operation becomes crucial to ensure widespread adoption and commercial success. Unreliable sensor network operation is usually the result of one or more of the following: (a) hardware faults (e.g., failure of hardware components such as sensors), (b) software problems (e.g., bugs, incorrect program logic, unsafe operations), or (c) networking issues (e.g., interference, collisions). This paper addresses the problem of software reliability in sensor networks, although the proposed techniques can be used to alleviate some hardware- and network-related problems as well (as described in Section 4).

Ensuring reliable software operation in sensor networks is extremely challenging. A combination of severe resource constraints, lack of architectural safety features such as memory protection, and operation in unpredictable environments leads to uncommon and unexpected failure modes that often manifest only at run-time through complex trigger mechanisms [1, 2]. As a result, pre-deployment testing using conventional quality assurance tools such as simulators is no longer sufficient (as it does not accurately reflect the system’s post-deployment behavior) and in-field testing and validation become a must. During post-deployment testing, the more visibility a software designer can obtain into program behavior as it executes in-field, the easier the program will be to test, analyze, validate, and if needed, debug. Unfortunately, obtaining fine-grained visibility into a running software system is hard in any embedded system and even harder in sensor networks where the nodes under test are several wireless hops away. In a recent survey, embedded software developers listed limited visibility into the system (65%), limited ability to trace (54%), limited ability to control execution (42%), and excessive intrusiveness of debugging techniques (40%) as the four biggest problems faced during embedded software testing [3].

To address the above concerns, this paper proposes HERMES, a lightweight framework and prototype tool that provides sensor network developers a high degree of visibility and control over the in-field execution of software in a minimally intrusive manner. Prior work to address this issue, e.g., Marionette [4], Clairvoyant [5], has focused on providing gdb-like debugging capability in sensor networks. HERMES complements, and differs from, these approaches in two significant ways:

• Prior approaches follow an interactive debugging style, ferrying debugging information between the node under test and a developer, thereby requiring continuous developer participation during testing. In contrast, HERMES embeds developer-defined correctness tests and validation logic onto the sensor node itself, making in-field software testing autonomous without continuous developer participation. More importantly, HERMES also allows developers to push corrective actions onto the node under test, which automatically get invoked when anomalous software behavior occurs.

• HERMES does not focus on debugging individual lines of source code, but operates at a higher level of abstraction to provide run-time visibility and control over the interactions of larger units of functionality (e.g., tasks, modules, threads). Doing so allows us to perform high-level functionality testing and answer questions that are meaningful in the context of the application (e.g., whether the sensor driver returns sensed data when requested and what the observed range of the sampled values is).

A key feature of HERMES is that visibility and control are provided in a non-intrusive manner. No change is required to the source code of the software being tested and debugged. In fact, the target software as well as other software components that it interacts with are oblivious to the testing and continue to operate normally. HERMES achieves this by interposing all the target software’s data-flow interactions (such as messages) and control-flow interactions (such as inter-process communication calls, system calls, and calls to event handlers) with the rest of the system. We have implemented HERMES in the context of the SOS operating system and have evaluated it through cycle-accurate simulations as well as on MicaZ motes. To demonstrate the flexibility and utility of HERMES, we present three specific case-studies that use it to perform various crucial sensor network tasks such as post-deployment analysis and debugging of software, transparent software updates in sensor networks, and network resource management. These case studies demonstrate that, using HERMES, sensor network designers can not only analyze and verify the behavior of remotely deployed nodes, but also easily detect (and often even prevent) incorrect and unreliable operation.

2. Related Work

To provide a proper context for our work, we briefly discuss other approaches that provide system support for developing reliable software in general-purpose as well as sensor systems. Developers on conventional computing platforms enjoy the luxury of a wide choice of tools that aid in software development and testing. For example, tools such as gdb, trace, gprof, and the VALGRIND framework represent mature debugging options that provide visibility into program state and control over program execution through the use of debugging hooks (e.g., traps, breakpoints, watchpoints) introduced into program binaries and often implemented using hardware assistance. JTAG, a debugging standard commonly used to test embedded software, provides a back-door into the system by providing access to an on-chip debug module via designated test pins on the micro-controller. Visibility and control similar to gdb can be obtained. However, gdb and JTAG both enforce an interactive debugging style and require software execution to be paused to gather and observe execution state.

In sensor networks, the prevalent approach to development and testing has been through simulation or emulation in a controlled environment [6, 7, 8]. Some recent work has addressed the problem of providing post-deployment visibility and control. Sympathy [9] is a tool that detects and localizes node and link failures using periodic application-generated reports. SNMS [10], a network monitoring system, provides an API for applications to export certain monitoring information and receive simple debugging commands from developers. Both Sympathy and SNMS rely on modifying the application to explicitly export debugging information and are geared towards network monitoring as opposed to monitoring the functionality of software components in depth. Marionette [4] builds upon SNMS and uses a lightweight-RPC mechanism to provide developers with a remote-shell interface into a sensor node. In addition to querying or modifying variables within a running binary, developers can also invoke functions compiled into the binary. However, advanced debugging techniques such as conditional breakpoints are not supported. Clairvoyant [5] is a comprehensive source-level debugger for sensor networks that provides remote-gdb functionality for sensor networks. Using Clairvoyant, developers can connect to a remote sensor node and issue standard debugging commands as well as some sensor network specific ones. However, Clairvoyant’s model of remote access and control is still developer-centric and requires interactive debugging. In contrast, using HERMES, a developer can simply state the visibility and control requirements either a priori or dynamically and implement detached functionality testing without developer intervention or interrupting execution.

Our implementation of HERMES uses interposition as a primitive to achieve visibility and control in a non-intrusive manner. As a general technique, interposition has been well studied [11] and has been used for diverse goals such as system call tracing (e.g., SunOS trace utility), emulating software and hardware platforms (e.g., virtual machines), extending OS functionality [12, 13], and for protected execution of untrusted [14, 15] or buggy code [16].

3. The HERMES Architecture

This section presents an overview and the design principles of the HERMES framework. We intentionally keep the
description generic (to the extent possible) and postpone the discussion of implementation specifics to Section 5.

[Figure 1. Overview of interposition as provided by HERMES. Left panel (at compile-time): automatic synthesis of the interposition stub from the target module’s code using the CIL framework. Right panel (post deployment): dynamic linking and loading of the interposition module, with interposition inserted or removed.]

3.1. Overview

For purposes of illustration, consider the software running on a sensor node to be composed of one or more software modules, i.e., segments of code componentized by functionality. Several sensor operating systems have embraced such modularity in their run-time software architecture (e.g., modules in SOS [17], processes in Contiki [18], and threads in MANTIS [19]). By definition, software modules use a set of well-defined interfaces to interact with the rest of the system. These interfaces, which represent the boundary of a software module with its environment, present a natural opportunity for our interposition approach.

Figure 1 presents an overview of module interposition as enabled by HERMES. As shown in the figure, during normal execution, software modules link and interact directly with the runtime. The basic technique underlying HERMES is to introduce an additional software component (referred to as the interposition module) between the module to be interposed (referred to as the target module) and the sensor runtime. The interposition module can then observe and control the interactions to and from the target module.

Interposing a target module is a two-step process. The first step, shown in Figure 1 by the dashed box on the left, is done off-line and involves the generation of a customized interposition stub based on the interfaces used and provided by the target module. We use a compiler framework that parses the target module’s source code, determines the specific interfaces that it uses to interact with the runtime, and automatically synthesizes a template interposition stub customized for the target module. We refer to it as a template since its functionality can be extended unlimitedly by the user to support advanced interposition tasks. The second step in the interposition process, shown in Figure 1 by the dashed box on the right, involves the insertion of the interposition module into the original running system, to interpose the interfaces between the target module and the runtime. For most sensor operating systems, the interfaces that need to be interposed can be categorized as (i) functions provided by the target module, (ii) handlers for events and messages received from other modules and the runtime, and (iii) functions provided by the sensor runtime and other modules that are invoked by the target module. Upon insertion, the interposition module provides a handler or function to the runtime corresponding to each handler or function provided by the target module. Similarly, for every function invoked by the target module into the runtime or another module, the interposition module presents a corresponding interface to the target module. The interposition module thus mimics the runtime from the target module’s perspective and the target module from the runtime’s perspective.

3.2. Design Principles

We next present the principles that drove our design of HERMES, starting from the high-level goal of providing a powerful, flexible, and lightweight mechanism to observe and control the in-field behavior of sensor software.

Dynamic extensibility: The true potential of a framework to observe and control post-deployment behavior can be realized only if it allows users to easily introduce, change, and remove interposition functionality in an incremental manner as and when needed. Unforeseen failures and other scenarios encountered in sensor deployments demand such dynamic extensions to functionality. Moreover, requiring the interposition functionality to be entirely incorporated into the sensor software prior to deployment would either restrict
the type of changes possible to the interposition functionality or require the node’s entire binary image to be recompiled and redistributed for each change.

As mentioned earlier, there are several sensor runtimes that support dynamic software extensibility [17, 18, 19]. Further, even runtimes with monolithic binaries are often amenable to dynamic extensibility with some effort. For example, add-ons such as FlexCup [20] or application-specific virtual machines [21] provide techniques to achieve dynamic extensibility in TinyOS. Our prototype implementation of HERMES was carried out on the SOS operating system that supports dynamic extensibility [17].

Flexibility in interposition granularity: Since the level at which software behavior is observed may determine both the degree of understanding and the type of control that can be exercised over the behavior, it is necessary to allow for several vantage points to suit varied requirements. For example, a developer interested in tapping outgoing packets from a node would prefer to just tap into a node’s radio interface rather than the messaging interface of each module.

In the design of HERMES, the module interface was a natural choice that allowed such latitude. HERMES can interpose a subset of the module’s interfaces (one function at its minimum), several modules at once, the communication interface (network driver module) out of a sensor node, or even several nodes at once, thus providing several choices in granularity. Further, in operating systems such as SOS, Contiki, and MANTIS, modules can represent applications, middleware, or even extensible kernel components (e.g., sensor drivers), making HERMES even more powerful.

Non-intrusiveness: Non-intrusiveness refers to the extent to which the original behavior of the system is affected due to providing visibility and control. While interposition naturally carries overhead, it must not significantly alter the execution of the software. Further, developers should have the ability to turn up or turn down the extent of interposition based on the permissible overhead.

HERMES supports selective interposition both temporally, i.e., can be turned on and off dynamically, and spatially, i.e., can be applied to a subset of interfaces. Second, unlike over-the-network debuggers such as Marionette, HERMES does not interrupt execution to gather information, providing visibility and control while the software continues to execute normally. Third, HERMES does not require any modification to the source code of the module being interposed. In fact, the target module is completely oblivious to the interposition itself.

Ease of use: Finally, HERMES makes it easy for developers to write the code that enables visibility and control by automatically synthesizing code stubs that handle the mechanical details of the interposition. Developers only need to fill these stubs with the logic needed to operate on the interposed observations. In other words, we provide ready-made hooks for visibility and control of the target module, but give the developer complete freedom to use these hooks as desired. For example, a designer may choose to just log a target module’s interactions with the rest of the system, or choose to trap, modify, or even suppress them.

4. Usage Scenarios for HERMES

The HERMES interposition framework can be used to implement a variety of observe and control functions desired for deployed sensor networks. Below, we briefly describe some envisioned uses of HERMES. In Section 7, we detail three specific case studies implemented using HERMES.

4.1. Observing In-Field Execution

In its most basic form, the HERMES framework can be used to enable a high degree of visibility into software execution on remote sensor nodes. In addition to the ability to record and timestamp inter-module interactions, the interposition code also has limited visibility into the state of the module being interposed, i.e., as allowed by the state-access hooks available within the runtime. Traces of the functions of a module that were invoked, when they were invoked, the parameters passed with these calls, and how the module’s internal state changed as a result of these calls are obtainable using HERMES. Further, HERMES also enables the following more advanced forms of visibility:

Conditional watchpoints based on in-field events: Since developers can explicitly specify which interactions to interpose, when to interpose, and what to do with the interposed interaction, debugging tasks can be dynamically triggered in response to specific in-field events. For example, receiving a packet from a new neighbor node or a message with a certain payload can be used to trigger a detailed execution trace. Such conditional visibility is attractive from an overhead point of view, and is a useful contrast over interactive, over-the-network debuggers.

Synthetic event generation and in-field testing: Since HERMES allows observing and controlling a module’s interactions, one can present an alternate behavior of the module to its surroundings - a useful feature for applications. For example, in a network deployed for wildfire monitoring, interposition modules inserted at specific nodes in the network can synthesize sensor traces to mimic fire events. The subsequent response of the network can help verify the readiness of the deployment - similar to real-world fire drills.

4.2. Controlling In-Field Execution

A broad set of network management and maintenance operations can be easily implemented through HERMES. Below, we outline uses that emphasize HERMES’s ability to control the interactions of a target module.
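As a concrete aside on the conditional watchpoints of Section 4.1, the following is a minimal C sketch of logic a developer might place in an interposition stub's incoming-message hook: a message from a previously unseen neighbor arms a short, bounded trace, while all other traffic takes the cheap forwarding path. All names here (watch_state_t, note_neighbor, on_incoming, MAX_NEIGHBORS, TRACE_BUDGET) are hypothetical and are not part of SOS or of the HERMES prototype; the sketch is plain C so that it can be compiled and exercised off-node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NEIGHBORS 8  /* hypothetical bound on tracked neighbor ids */
#define TRACE_BUDGET  4  /* hypothetical: messages traced per trigger  */

/* State kept by the (hypothetical) interposition stub. */
typedef struct {
    uint16_t known[MAX_NEIGHBORS]; /* neighbor ids seen so far            */
    size_t   num_known;
    unsigned trace_left;           /* > 0 while detailed tracing is armed */
    unsigned traced;               /* total number of messages traced     */
} watch_state_t;

/* Returns true if src was already known; otherwise records it when there
 * is room. If the table is full, an unseen id is reported as new on every
 * message, so the trace is re-armed each time. */
static bool note_neighbor(watch_state_t *w, uint16_t src)
{
    for (size_t i = 0; i < w->num_known; i++) {
        if (w->known[i] == src)
            return true;
    }
    if (w->num_known < MAX_NEIGHBORS)
        w->known[w->num_known++] = src;
    return false;
}

/* Called once per interposed incoming message. Returns true if this
 * message falls inside an armed trace window. A real stub would log the
 * interaction here and then forward the message to the target module. */
static bool on_incoming(watch_state_t *w, uint16_t src)
{
    if (!note_neighbor(w, src))
        w->trace_left = TRACE_BUDGET; /* in-field event: arm the trace */
    if (w->trace_left == 0)
        return false;                 /* cheap path: forward only      */
    w->trace_left--;
    w->traced++;
    return true;
}
```

With a zero-initialized watch_state_t, the first message from a given neighbor arms the trace and the next TRACE_BUDGET messages (including that first one) are traced; later messages from known neighbors are forwarded untraced until another unseen neighbor appears. Bounding the trace window is one way to keep the overhead of such conditional visibility predictable.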
Dynamic access control policies: Since the functionality of a deployment can be compromised or disrupted due to faulty or malicious nodes, it is necessary to have the ability to quarantine specific nodes and limit the disruption. Clearly, such measures need to be both dynamic and ad hoc to handle security emergencies unforeseen at the time of deployment. HERMES can be used to interpose and hence tap into the network stack, to dynamically add appropriate firewall rules at the nodes neighboring the rogue nodes.

Traffic shaping to manage shared network resources: Resources such as the limited wireless bandwidth must be carefully allocated to suitably address metrics such as fairness and network longevity [22]. Several factors must be considered and these factors could vary drastically over time, driving a need for dynamic adoption of allocation policies. By interposing the communication path, HERMES can be used to dynamically (and with little overhead and disruption) introduce resource allocation policies to adapt to changing network conditions.

Fixing isolated in-field failures: Sensor network deployments will continue to be marred by failures due to hostile environments, unreliable hardware, and buggy software. HERMES can be used to architect both preventive mechanisms (e.g., forcing a fail-safe operation upon seeing an illegal interaction) and, at the other end of the spectrum, recovery mechanisms (e.g., stateful rollback using interaction traces). However, we see HERMES being used expressly for emergency recovery measures under isolated failures, especially in critical deployments. Certain hardware failures of a sensor node can be detected and masked to keep the sensor node as a whole operational, until a more comprehensive corrective action can be taken. For example, on detecting a bad reading from a temperature sensor (say the sensor becomes stuck at zero, which is a commonly reported failure), a stub interposing the sensor driver could substitute the reading with an acceptable one estimated either from prior readings or from those of its neighbor, as well as report the error to the network operator.

5. Implementation

Implementing HERMES for any sensor runtime poses a number of challenges. First, we require the interposition to be transparent to the module being interposed, as well as to other modules in the system. This implies that no changes should be made to the source code of the target module or other modules while also providing transparency at runtime. Second, to allow flexible use of interposition capability one should be allowed to dynamically configure interposition not only for any module in the system, but also for a subset of interfaces of each module. This configuration needs to be atomic with respect to the modules in the system. Finally, the HERMES implementation should have a low memory and computational footprint, so that it does not affect the performance of the system. Our prototype implementation of HERMES for the SOS operating system, described below, addresses these specific challenges.

5.1. SOS - A Modular Sensor Network OS

SOS is a sensor operating system with a structured architecture based on a small kernel that is installed on all the nodes in the network. The rest of the system and the application functionality are implemented as a set of dynamically loadable binary modules. This modularity forms the basis of the SOS architecture, as it defines distinct boundaries and allows modules to be loaded and unloaded at runtime.

SOS provides an event-driven execution model, with each module implementing a handler that is invoked by the OS scheduler to dispatch messages to destination modules. Modules interact with one another and with the kernel through both synchronous function calls and asynchronous messages. Figure 2(a) shows the basic architecture of SOS and the well-defined communication paths for modules to interact with each other and with the kernel. These paths provide definite points for the SOS kernel to track execution context, e.g., when messages cross module boundaries.

Synchronous communication between modules is implemented by SOS using dynamic linking. A module’s binary encodes the set of functions it provides and those it subscribes to. At load time, the dynamic linker tracks down and links all the provide-subscribe function pairs. Modules can also send asynchronous messages to each other by posting them to a queue managed by the scheduler, which invokes the message handler of the destination module. The module-kernel interaction takes place via API system calls (to kernel) and asynchronous messages (from kernel).

5.2. HERMES Implementation Overview

Our HERMES prototype exploits the clean module-to-module and module-to-kernel communication paths in SOS to provide the network designer with the capability of trapping all interactions of a user module with the rest of the system (function calls and messages both directed to and from a target module) at the module boundaries. HERMES redirects trapped interactions to an interposition module specific to the target module (Figure 2(b)). A target module can have its own dedicated interposition module, or multiple target modules can be interposed by the same interposition module (therefore HERMES increases the number of modules present in the system only marginally). Interposition is completely transparent: (i) no changes are required to a target module’s source code to enable interposition, and (ii) no other module the target interacts with is aware of the interposition. In addition, interposition is also dynamic and selective, i.e., it can be turned on or off at runtime, and a programmer can choose which interactions of the target
to interpose. In summary, HERMES provides flexible interposition at the granularity of individual function calls and messages, with minimal footprint on the system.

[Figure 2. HERMES implementation in SOS. (a) Interactions of module A in SOS. (b) Interposition of module A by module IA in SOS with HERMES.]

HERMES achieves transparent interposition by adding light-weight support for interaction redirection in the SOS kernel and by leveraging the dynamic linking mechanism used in SOS. For dynamic interposition control, HERMES provides a new SOS kernel function that can be used to switch interposition on or off for a given target module. HERMES ensures that transitions of the target module between the interposed and non-interposed states are atomic.

To simplify a programmer’s task in using HERMES, we have developed tools that automatically generate a skeleton of the interposition module from the code of a target module. The actual functionality of the interposition module is left to be filled in by the developer, thus granting him/her the flexibility of choosing what to do with the interposed interactions.

5.3. Runtime Interposition System

HERMES interposes three types of interactions of a module with the rest of the system: (i) kernel API calls (with the kernel); (ii) subscribed/provided user function calls (with other modules); (iii) messages (with kernel, other modules, and network). We next describe the mechanisms used in our modified SOS kernel to redirect these interactions, as well as to enable dynamic interposition control.

Kernel Call Redirection
To intercept and redirect all kernel API calls made by the interposed module to functions provided by the interposition module, we augment all kernel functions with a prologue consisting of a few lines of redirection code. The redirection code checks if the calling module is interposed and if its interposition module provides an alternate function to substitute for the kernel call. If so, it calls the alternate function provided by the interposition module, otherwise it falls through to the default kernel call implementation.

The alternate implementation of a kernel call provided by the interposition module may in turn make kernel calls, including the redirected one, e.g., after logging it, changing its parameters, etc. This could result in loopback redirection (thus infinite recursion). To avoid it, we track the context from which a kernel call is made, and distinguish between calls made from within an interposed module (to be redirected) and calls made after control crosses module’s boundaries (to fall through).

Cross-module Call Redirection
The kernel redirects cross-module calls issued by and to an interposed module to their corresponding implementations provided by its interposition module, using the dynamic linking facility provided by SOS. This redirection is performed when either a new module is inserted into the system or when interposition is turned on for an existing module. Non-interposed functions of the target module are linked directly to their real implementations, with no additional call overhead.

The kernel performs the following steps when loading and linking a new module M: (i) if an interposition module for M is already present in the system and interposition is turned on for M, link all of M’s subscribed and provided functions to the interposition module; (ii) if M subscribes to functions provided by an already interposed module, link M to the respective interposition module; (iii) if an interposed module subscribed to M’s functions, do not link that module to M (since it is already linked to the function provided by its interposition module).

The kernel performs the following steps when interposition is turned on dynamically for a module M: (i) re-link the functions subscribed to by M to the corresponding functions provided by its interposition module; (ii) re-link the subscribers of every function provided by M to the corresponding function provided by M’s interposition module. When interposition is dynamically turned off for M, the kernel simply uses the default linking mechanism of SOS to re-link M into the system. In our implementation, the
above steps are guaranteed to be atomic with respect to a target's interactions with other modules since they are either executed in the nonpreemptible message handler of the kernel loader or as a result of a system call made from the nonpreemptible message handler of a user module.

Incoming Message Redirection
The kernel redirects a message sent to an interposed module to the corresponding interposition module by checking if the destination module is interposed, and, if so, diverting the message to the handler of the interposition module. The kernel also transfers memory ownership of the diverted message's payload to the interposition module. Upon receiving the message, the interposition module can use the (unmodified) destination field of the message to discriminate between redirected messages and those actually intended for it.

Outgoing Message Redirection
In SOS, messages are sent by a user module using one of the post_* kernel API calls. Since all kernel functions are redirected to the interposition module of the caller module (if any), all messages originating in an interposed module are automatically redirected to the interposition module.

Dynamic Interposition Control
The kernel provides dynamic interposition control at run-time. A field in the kernel module descriptor stores a pointer to the module's interposition module, if any. This field is used to control the module's interposition status (on/off) and can be set/unset using a kernel API function provided by our modified SOS kernel. The interposition module also stores a duplicate of the interposition status in a reserved field in its module-specific state. This copy acts as a backup in case the target module is removed from the system while interposition is still turned on, or if interposition is turned on before insertion of the target module. Upon loading a module whose interposition module is already present in the system, this field is checked to determine whether or not the new module's interactions need to be redirected. This enables per-module dynamic control of the kernel redirection mechanisms without removing the interposition module from the system or restarting it, even when the target is absent from the system.

5.4. Interposition-stub Synthesis

As described in the previous section, the HERMES run-time interposition system redirects, to the interposition module, all the cross-module calls to and from its target module, the kernel calls that the target makes, and the messages that the target sends or receives. Consequently, an interposition module's code will have a structure specific to the module it interposes. HERMES provides a preprocessor to automatically generate a customized stub of the interposition module from the source code of its target module. This tool is built over the CIL compiler framework for C [23].

The preprocessor takes as input the target module to be interposed and generates an interposition module containing stubs for certain types of functions to which the kernel redirects calls to/from the target: (i) functions provided by the interposed module (to which the kernel redirects calls made by other modules), (ii) functions subscribed to by the interposed module (to which the kernel redirects calls made by the interposed module), and (iii) kernel API functions used by the interposed module (to which the kernel redirects kernel calls made by the interposed module).

To further ease the programmer's burden, the preprocessor builds in default "null" functionality into the generated interposition module, such that directly running it causes interposed interactions to be simply redirected to their original intended target. With this default functionality in place, the programmer needs only modify code to handle those interactions that she specifically wants to interpose.

Left (interposition module):

    /* Interposition module for Surge */
    mod_header_t mod_header = {
        .mod_id = SURGE_INTER_PID,   /* module id */
        ...
        .num_prov_func = 1,          /* provided functions */
        .funct = {
            [0] = {inter_get_hdr_size, "Cvv0", SURGE_INTER_PID, GET_HDR_SIZE_FID}
        }
    };
    ...
    /* interposed get_hdr_size implementation */
    uint8_t inter_get_hdr_size(func_cb_ptr p) {
        return 0;
    }

Right (Surge and TreeRouting):

    /* Surge module (interposed) */
    ...
    /* call to get_hdr_size in TreeRouting */
    size = SOS_CALL(s->get_hdr_size, get_hdr_size_proto);

    /* TreeRouting module */
    ...
    /* native get_hdr_size implementation */
    uint8_t tr_get_hdr_size(func_cb_ptr p) {
        return sizeof(tr_hdr_t);
    }

Figure 3. Left: Code for interposing a cross-module function call from Surge to TreeRouting. Right: Surge and TreeRouting code snippets showing the call site and the native function implementation.

Figure 3 shows an example of coding cross-module call interposition for the Surge application module. The interposition module (left) implements the inter_get_hdr_size function, whose stub had been generated by our preprocessor since the Surge module calls the tr_get_hdr_size function provided by the TreeRouting module (right). The programmer has filled
in code in the stub to return a header size of zero. If she chooses not to interpose this function, she can simply remove its entry from the list of provided functions in the module's header shown at the top. Note that HERMES imposes no restriction on how the original application (right) is to be written, and requires only a simple and logical structure of the interposition code.

5.5. Discussion

We described in this section our implementation of HERMES for SOS. The architecture of HERMES is, however, general and can be implemented over a variety of other operating systems. For multi-threaded operating systems such as Contiki [18] and MANTIS [19], which have dynamic linking capabilities, one could anticipate simply extending our technique for modifying the dynamic linking mechanism, and allowing for interposition of individual threads. Our technique could also be applied to TinyOS [24] using a capability such as FlexCup [20], which provides dynamic linking for TinyOS. For plain TinyOS, which does not provide for dynamic linking, interposition can be applied either at compile-time, or at runtime using binary rewriting in an approach similar to the one taken by Clairvoyant [5].

6. Evaluation

We performed an evaluation of our implementation of HERMES for SOS running Surge [25], a sensor data collection application. A distributed tree routing protocol (implemented by a TreeRouting module running on every node) builds a tree rooted at the base station that is used by every Surge node to send collected data towards the base station. We micro-benchmark the overhead introduced by HERMES using Avrora [8], a cycle-accurate simulator for the Atmel AVR instruction set architecture. We also present performance statistics of HERMES on the MicaZ sensor platform.

In the first experiment, we simulated Surge in Avrora, running it on two systems: over the plain SOS, and over SOS with our HERMES implementation (SOS+HERMES). In the SOS+HERMES case, we introduced an interposition module for Surge and ran the simulation twice, once with interposition for Surge turned off, and then with interposition turned on. We used the "null-interposition" module generated by the HERMES preprocessor that simply redirects all function calls and messages to their original destinations, without performing any computation or buffering them. We ran simulations for 500 seconds using three nodes located within one hop of each other and collected statistics using the profiling facilities in Avrora.

We first evaluate the absolute overheads introduced by HERMES in cross-module and kernel call redirection. Table 1 presents call latencies for three functions, for SOS and for SOS+HERMES in the two cases (interposition on/off for Surge), respectively. The first two functions are representative of typical module interactions via kernel and inter-module calls. ker_id is a kernel function that returns a module's ID. tr_get_hdr_size is a function provided by TreeRouting that is subscribed to and called by Surge upon sending a packet. The third function, inter_get_ker_func, is a lookup function added by HERMES to the SOS kernel and called from all kernel functions. If the module where the kernel call originated is interposed, it returns a pointer to the alternate implementation of the kernel function provided by the interposition module.

    Function call         SOS    Surge on SOS+HERMES
                                 Interp. off    Interp. on
    ker_id                8      40             467 (Surge)
    tr_get_hdr_size       6      6              112
    inter_get_ker_func    N/A    19             23 (Kernel)
                                 23             31 (Non-interposed module)
                                                142 (Interposition module)
                                                347 (Interposed module)

Table 1. Function call latencies (cycles).

As shown in Table 1, for cross-module tr_get_hdr_size calls, the latency increases to 112 cycles with interposition on, due to a lookup of the interposition module's header that the module itself must perform in order to find the target function. inter_get_ker_func takes 23 cycles with interposition off. With interposition on, it takes a variable number of cycles depending on the call site (listed in parentheses in Table 1), with a maximum of about 350 cycles when called from within Surge.

ker_id takes 40 cycles in SOS+HERMES even when interposition is off, due to interposition checks introduced by HERMES - a baseline overhead. When interposition is on, it takes 467 cycles. This steep hike is due to a call to our suboptimal implementation of the inter_get_ker_func lookup function, the rest being due to ker_id needing to be redirected twice, once from the kernel to the interposition module, and then from the interposition module to the kernel. Although these numbers may seem high when compared to the plain SOS, we next show that their effect on the overall system performance of Surge is negligible.

In our next experiment, we repeated the previous runs on a real sensor testbed of ten MicaZ motes, out of which one was the base station and the others were simple Surge nodes, up to two hops away from it. The execution runs took about 1,000 seconds. We used the Rate Adaptive Time Synchronization (RATS) protocol [26] to time-synchronize the nodes and collected statistics on packet latency and number of packets delivered to the base station. Table 2 presents memory usage and performance statistics for Surge on plain SOS, and on SOS+HERMES in four cases: no interposition module, Surge interposition module loaded with interposition off and on, respectively, and both Surge and TreeRouting interposed. The last four columns show the performance metrics, demonstrating that interposing Surge and TreeRouting does not impact their operation, as the number of packets delivered remains practically the same for all the scenarios (small fluctuations are due to packet losses caused by wireless link quality). Moreover, the increase in packet latency for a non-base station node in the SOS+HERMES case, expected due to the overhead introduced by HERMES, is only about 3% in the worst case (both modules interposed). For the base station, the relative increase is higher because of the much smaller delivery latency in the base case. We also measured that it takes 2,223 cycles to turn interposition on, which is negligible as compared to around 22 ms taken to load a new module into the system in SOS. We hence conclude that interposition does not significantly impact the performance of the Surge application.

    OS configuration                                        Memory usage [bytes]     Packets received             Average packet latency [ms]
                                                            RAM    ROM      Heap     Base Station   Surge Node    Base Station   1-hop Node
    Plain SOS                                               100    38,248   684      124            110           0.6            28.8
    SOS+HERMES: No inter. module                            100    46,580   697      124            109           0.8            28.9
    SOS+HERMES: Inter. module present, interposition off    100    47,572   723      124            110           0.8            29.0
    SOS+HERMES: Surge interposed                            100    47,572   723      124            107           1.2            29.4
    SOS+HERMES: Surge, Tree Routing interposed              100    47,572   723      124            109           1.3            29.8

Table 2. Memory usage and performance of Surge on the MicaZ mote platform.

In terms of memory usage, both SOS and SOS+HERMES have the same static RAM footprint, while HERMES causes only a marginal increase in dynamically allocated memory (heap). Note that more than one module can be interposed with no increase in memory footprint. The stack size exhibits a small increase with interposition on, due to extra calls redirected through the interposition module. HERMES adds about 8 KB to the SOS code size (ROM usage). The interposition module that we used further increases the code size by about 1 KB.

7. Case Studies

7.1. Debugging and Verification

Section 4 described the utility of HERMES as a tool for debugging and monitoring software functionality post-deployment. This case study explores this aspect of HERMES further by using it to debug and verify the functionality of a specific software component, namely the RATS time-synchronization protocol [26].

RATS provides pairwise time synchronization between sensor nodes. A client node that wishes to synchronize its time with a server node receives periodic time-stamped messages from the server node, which it time-stamps upon reception with its current clock value. The client thus maintains a sequence of tuples comprising its own and the server's time-stamps. When queried to convert a given local time into the server's time, the client uses regression to compute an estimate from these tuples. We design an interposition module to provide visibility into the functioning of RATS. The interposition module intercepts all incoming time-stamp messages for the RATS module at the client. When a time-stamp message arrives from the server, the interposition module extracts the time-stamp values for the server and the client from the message. It then queries the RATS module for an estimated time at the server matching the time-stamp at the client. It compares the value RATS returns (which is an estimate) with the real server time-stamp to compute the actual error after factoring in transmission delay. The interposition module then copies a snapshot of the state of the RATS module, along with this actual error value, into a packet, and sends it to the base station. It then passes the received time-stamp message through to the RATS module, which continues to function normally.

Figure 4. Time synchronization error in RATS. (Plot of RATS synchronization error (ms) against time (s), with an annotation marking where RATS is not synchronized.)

Even this simple interposition module provides us a lot of visibility into the RATS protocol. We are able to observe exactly when time-stamp messages are received by the client and how its state changes as a result. We are also able to gather insight into the protocol's performance through online computation of the actual error in time synchronization. Note that it is possible to code more sophisticated interposition functionality to get even more insight into the operation of RATS. For instance, one may use the
interposition module to model network/node failures or corrupted time-stamps and observe how RATS responds.

7.1.1 Evaluation

We implemented the above described interposition module and evaluated it on two MicaZ motes. We instrumented Surge to use RATS and ran it on both motes for 200 minutes. The base station acted as the RATS server and the other node as the client that tries to synchronize its time with the base station to within a preset error limit of 1 ms. The interposition module at the client sends back snapshots of the state of the RATS module, along with the computed error, in response to the arrival of new time-stamped packets from the base station.

Figure 4 plots the actual error calculated by the interposition module versus the time at which the client node received the time-stamped packets. The data verifies the functionality of the RATS protocol in several ways: (i) it validates the way in which RATS adapts its rate of time-stamped packets to the error in time synchronization (according to [26], the rate decreases exponentially if the estimated error goes down, and it is increased in response to increases in the error); (ii) it verifies that the estimated error used by RATS to adapt its rate is a good approximation of the actual error: when the actual error calculated independently by the interposition module increased above the acceptable limit of 1 ms set by the Surge module (at 8,000 seconds into the run), RATS doubled its rate of sending time-stamped messages.

Note that, while the interposition module was running, Surge packets were also being sent to the base station. With interposition on, the measured average latency of Surge packets increased from 27 ms to 29 ms, compared with plain SOS, while the number of packets received at the base station remained the same. Thus, the Surge module was negligibly affected by our testing of RATS and the extra burden on the routing module.

7.2. Transparent Software Updates

In a functional sensor network deployment, it may become necessary to update a software module on some or all of the sensor nodes. Dynamic updates might be required in order to fix software bugs, introduce additional features, or tune operational parameters. At the same time, the module being updated may be critical to the functionality of the deployment, requiring the update process to be transparent. Routing is one such critical service. An interruption to update the routing module would not only disrupt communication temporarily, but may also result in sub-par performance upon service resumption due to loss of routing state.

HERMES can be used to eliminate the outage caused by updates to such critical modules. Instead of replacing the old version of the module by the updated copy and taking a service disruption, we propose to run the two versions simultaneously for the duration required by the new version to warm up, i.e., build its service state. During the warm-up phase, we interpose both versions of the module to: (i) hide the presence of the updated copy from the rest of the system; (ii) keep the old version online and continue to use it to answer service requests; and (iii) fork messages sent to the old version over to the new module to help it build service state. Thus, while the updated copy is building state, the old version of the module is ensuring that the sensor network remains operational. Once the updated copy warms up, the old version of the module and the interposition module are removed and the updated module continues servicing requests without any interruption.

We implemented the transparent update feature for the tree-routing module using our HERMES prototype as a case study. In order to deal with the issue that SOS does not allow multiple modules with the same process ID, we introduce a "back-up" module - identical in functionality to the original tree routing module but with a different process ID, to which the original module's state is copied over, and which behaves as a substitute for it (via interposition), in the above process. It should be noted here that no changes were required to the HERMES implementation for SOS to implement the transparent update feature. We only needed to implement appropriate interposition modules to provide transparent update support.

7.2.1 Evaluation

We evaluated the impact of an update to the routing module with the Surge application running on a 5-hop 21-node network in Avrora. We ran Surge on two configurations: (i) plain SOS, and (ii) SOS+HERMES implementing the transparent update support for the routing module.

For the plain SOS configuration, the update was emulated by first removing the old module and immediately inserting an updated copy. For the configuration with HERMES and transparent update support, the steps followed the sequence described above with the SOS process ID workaround. Each run was 1,500 seconds long, and included an update to the tree routing module midway through the run. The results reported are averaged across five such runs. From the plots in Figure 5, which show the average delivery latency for Surge packets, it is clear that when the tree routing module is removed from the system, Surge on plain SOS sees complete disruption in packets delivered. Surge with the transparent update functionality runs with no apparent disruption, but has higher packet delivery latencies consistent with the overhead of interposition.

For this experiment, we also instrumented the SOS kernel and the Surge application to collect per-node statistics for packet drops due to the update. None are reported for the configuration with HERMES and transparent update support, while plain Surge suffers packet losses throughout the
network, ranging from 24 at the base station, to 50 at a node 5 hops away. Losses increase for nodes farther away, consistent with the longer duration taken to rebuild routing state at those nodes.

Figure 5. Delivery latency of Surge packets from base station (left) and a node 4 hops away (right). Note that the two plots have different scales. (Both panels plot latency (ms) against time (s) for Surge on plain SOS and Surge with Transparent Updates, and mark the start of the routing module update, the duration of interposition, and the point at which the routing module is back online or its state is rebuilt.)

7.3. Traffic Shaping and Rate Control

Section 4 described how HERMES can be used to perform various network management tasks including access control, traffic shaping, etc. In this case study, we design and evaluate an application-specific rate-control scheme using HERMES, to illustrate this capability of our framework. We implement our rate-control scheme by interposing the application's (i.e., Surge's) communication-related I/O calls that are used to send and receive network messages. The interposition module simply enforces a variable, developer-specified upper limit by dropping packets if the current rate exceeds the limit. If the current sending rate is below the limit, the interposition module merely passes the packet through to the network interface, and the corresponding response is returned to the application. Note that while this case study is a simple illustration, HERMES offers the flexibility for users to define more powerful protocol-aware rate-control schemes.

7.3.1 Evaluation

We evaluated the rate-control scheme on a network of nine MicaZ motes set in a 3x3 grid (1.5 feet apart). Besides running Surge and a TreeRouting module on each mote, we also ran a time synchronization protocol (RATS) to measure the latencies seen by packets during the experiments. One of the motes was designated to be a rogue node, and emulated a haywire Surge module that, once triggered, sent data packets at eight times the normal rate. In the base set of experiments, we ran Surge over plain SOS without HERMES. We then ran Surge with our rate-control scheme implemented using HERMES, with the rest of the experimental setup unchanged. Both experiments were run for 2,000 seconds, with Surge sending one packet every 8 seconds.

Figure 6 shows the number of packets received at the base station from each node, for both cases. It is easy to see that for plain Surge without rate-control, the rogue node ended up successfully sending almost three times the normal number of data packets. Due to this, nodes 2 and 3, which were close to node 4, were starved of bandwidth, which caused the time synchronization module on them to fail, crashing the nodes in the process. As a result, nodes 2 and 3 report a much lower packet count. For Surge with the interposed rate-control scheme, each node successfully delivered approximately the same number of packets to the base station, as seen in the figure. The rate-control scheme was thus able to limit the rogue node's ability to disrupt network operation and ensure fair use of network resources.

Figure 6. Packets delivered to base station. (Number of packets delivered to the base station per node address, for Surge without Rate Control and Surge with Rate Control.)

8. Conclusion

Ensuring reliable software operation in sensor networks is a crucial problem that cannot be solved by testing in controlled environments using simulation and emulation tools alone; testing must also be done in the real environment. Run-time visibility and control over program execution are two fundamental characteristics that will significantly ease the job of reliable software development in sensor networks. Towards this, we have proposed HERMES, a minimally-intrusive framework based on interposition that enables visibility and control of the in-field execution of sensor systems. HERMES is lightweight and requires no changes to the application software whose execution is to be observed or controlled. Through a prototype implementation using a popular sensor operating system and three realistic case studies, we have demonstrated the flexibility and utility of HERMES in providing support for various operations involved in ensuring reliability in sensor systems.

Acknowledgments

We would like to thank Ramesh Govindan (USC), our paper shepherd Koen Langendoen (TU Delft), and the anonymous reviewers for their insightful comments, suggestions, and feedback. Nupur Kothari was supported in part by the USC Annenberg Graduate Fellowship. Vijay Raghunathan was supported in part by a Purdue Research Foundation grant.

References

[1] G. Tolle, et al. A Macroscope in the Redwoods. In Proc. of ACM Sensys, 2005.
[2] G. Werner-Allen, K. Lorincz, J. Johnson, J. Lees, and M. Welsh. Fidelity and Yield in a Volcano Monitoring Sensor Network. In Proc. of USENIX OSDI, 2006.
[3] Embedded Software Development Issues. http://www.embeddedforecast.com.
[4] K. Whitehouse, et al. Marionette: Using RPC for Interactive Development and Debugging of Wireless Embedded Networks. In Proc. of IEEE IPSN, 2006.
[5] J. Yang, M. L. Soffa, L. Selavo, and K. Whitehouse. Clairvoyant: A Comprehensive Source-Level Debugger for Wireless Sensor Networks. In Proc. of ACM Sensys, 2007.
[6] L. Girod, et al. A System for Simulation, Emulation, and Deployment of Heterogeneous Sensor Networks. In Proc. of ACM Sensys, 2004.
[7] P. Levis, N. Lee, M. Welsh, and D. Culler. TOSSIM: Accurate and Scalable Simulation of Entire TinyOS Applications. In Proc. of ACM Sensys, 2003.
[8] B. L. Titzer, D. K. Lee, and J. Palsberg. Avrora: Scalable Sensor Network Simulation with Precise Timing. In Proc. of IEEE IPSN, 2005.
[9] N. Ramanathan, et al. Sympathy for the Sensor Network Debugger. In Proc. of ACM Sensys, 2005.
[10] G. Tolle and D. Culler. Design of an Application-Cooperative Management System for Wireless Sensor Networks. In Proc. of EWSN, 2005.
[11] M. B. Jones. Interposition Agents: Transparently Interposing User Code at the System Interface. In Proc. of USENIX OSDI, 1993.
[12] D. P. Ghormley, S. H. Rodrigues, D. Petrou, and T. E. Anderson. SLIC: An Extensibility System for Commodity Operating Systems. In Proc. of the USENIX Technical Conference, 1998.
[13] Y. Endo, J. Gwertzman, M. Seltzer, C. Small, K. A. Smith, and D. Tang. VINO: The 1994 Fall Harvest. Technical Report TR-34-94, Harvard University, 1994.
[14] I. Goldberg, D. Wagner, R. Thomas, and E. A. Brewer. A Secure Environment for Untrusted Helper Applications. In Proc. of the USENIX Security Symposium, 1996.
[15] D. S. Wallach, D. Balfanz, D. Dean, and E. W. Felten. Extensible Security Architectures for Java. In Proc. of ACM SOSP, 1997.
[16] M. Swift, B. N. Bershad, and H. M. Levy. Improving the Reliability of Commodity Operating Systems. ACM Transactions on Computer Systems, 23(1), 2005.
[17] C. C. Han, R. Kumar, R. Shea, E. Kohler, and M. Srivastava. SOS: A Dynamic Operating System For Sensor Networks. In Proc. of ACM Mobisys, 2005.
[18] A. Dunkels, B. Grönvall, and T. Voigt. Contiki - A Lightweight and Flexible Operating System for Tiny Networked Sensors. In Proc. of ACM EmNets, 2004.
[19] S. Bhatti et al. MANTIS OS: An Embedded Multithreaded Operating System for Wireless Micro Sensor Platforms. ACM MONET (Special Issue on Wireless Sensor Networks), 10(4), 2005.
[20] P. J. Marrón, M. Gauger, A. Lachenmann, D. Minder, O. Saukh, and K. Rothermel. FlexCup: A Flexible and Efficient Code Update Mechanism for Sensor Networks. In Proc. of EWSN, 2006.
[21] P. Levis et al. Active Sensor Networks. In Proc. of ACM NSDI, 2005.
[22] S. Rangwala, R. Gummadi, R. Govindan, and K. Psounis. Interference-Aware Fair Rate Control in Wireless Sensor Networks. In Proc. of ACM SIGCOMM, 2006.
[23] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. In Proc. of Intl. Conf. on Compiler Construction, 2002.
[24] P. Levis, et al. T2: A Second Generation OS for Embedded Sensor Networks. Technical Report TKN-05-007, Telecommunication Networks Group, TU Berlin, 2005.
[25] A. Woo, T. Tong, and D. Culler. Taming the Underlying Challenges of Reliable Multihop Routing in Sensor Networks. In Proc. of ACM Sensys, 2003.
[26] S. Ganeriwal, D. Ganesan, H. Shim, V. Tsiatsis, and M. B. Srivastava. Estimating clock uncertainty for efficient duty-cycling in sensor networks. In Proc. of ACM Sensys, 2005.
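
As a supplementary illustration of the kernel call redirection of Section 5 and the rate-control case study of Section 7.3, the following C sketch shows one way a redirection prologue, a "null" interposition function, and a packet-dropping rate limiter could fit together. It is a minimal sketch under stated assumptions, not the HERMES or SOS implementation: the module_t layout, ker_post, null_post, rate_limited_post, interpose_set, rate_window_reset, and packets_transmitted are all hypothetical names introduced here, only a single kernel call (a post-style send) is modeled, and the rate is counted as packets per window, with the window assumed to be reset by some external timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct module module_t;
/* Alternate implementation of the (hypothetical) post kernel call. */
typedef int (*post_fn)(module_t *self, int payload);

/* Hypothetical module descriptor; not the real SOS structure. */
struct module {
    uint8_t   id;
    module_t *interposer; /* interposition module, NULL = interposition off */
    post_fn   alt_post;   /* alternate post offered when acting as interposer */
    unsigned  limit;      /* packets allowed per window (interposer state) */
    unsigned  sent;       /* packets passed through in the current window */
};

static unsigned transmitted; /* packets that reached the default kernel path */

/* Default kernel implementation: here it only counts transmissions. */
static int default_post(int payload)
{
    (void)payload;
    transmitted++;
    return 0;
}

/* Kernel API entry point with a redirection prologue: if the caller is
 * interposed and its interposer offers an alternate, call that instead;
 * otherwise fall through to the default implementation. */
int ker_post(module_t *caller, int payload)
{
    assert(caller != NULL);
    module_t *inter = caller->interposer;
    if (inter != NULL && inter->alt_post != NULL)
        return inter->alt_post(inter, payload);
    return default_post(payload);
}

/* "Null" interposition: forward the call unchanged. The interposer itself
 * is not interposed, so this reaches the default implementation. */
int null_post(module_t *self, int payload)
{
    return ker_post(self, payload);
}

/* Rate control: drop the packet (return -1) once the per-window limit is
 * reached, otherwise count it and pass it through. */
int rate_limited_post(module_t *self, int payload)
{
    if (self->sent >= self->limit)
        return -1;
    self->sent++;
    return ker_post(self, payload);
}

/* Dynamic interposition control: set (on) or clear (off) the pointer. */
void interpose_set(module_t *target, module_t *inter)
{
    target->interposer = inter;
}

/* Start a new rate window; assumed to be driven by a periodic timer. */
void rate_window_reset(module_t *inter)
{
    inter->sent = 0;
}

unsigned packets_transmitted(void)
{
    return transmitted;
}
```

In this sketch, a target module is left untouched: pointing its interposer field at a module whose alt_post is rate_limited_post with a limit of 2 lets two posts through per window and drops the third, while setting the field back to NULL restores the default path. Swapping in null_post instead gives pass-through behavior comparable to the default stub described in Section 5.4.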