Similarity Search for Web Services
Xin Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang
{lunadong, alon, jayant, enemes, junzhang}@cs.washington.edu
University of Washington, Seattle

Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

Abstract

Web services are loosely coupled software components, published, located, and invoked across the web. The growing number of web services available within an organization and on the Web raises a new and challenging search problem: locating desired web services. Traditional keyword search is insufficient in this context: the specific types of queries users require are not captured, the very small text fragments in web services are unsuitable for keyword search, and the underlying structure and semantics of the web services are not exploited.

We describe the algorithms underlying the Woogle search engine for web services. Woogle supports similarity search for web services, such as finding similar web-service operations and finding operations that compose with a given one. We describe novel techniques to support these types of searches, and an experimental study on a collection of over 1500 web-service operations that shows the high recall and precision of our algorithms.

1 Introduction

Web services are loosely coupled software components, published, located, and invoked across the web. A web service comprises several operations (see examples in Figure 1). Each operation takes a SOAP package containing a list of input parameters, fulfills a certain task, and returns the result in an output SOAP package. Large enterprises are increasingly relying on web services as a methodology for large-scale software development and sharing of services within an organization. If current trends continue, then in the future many applications will be built by piecing together web services published by third-party producers.

The growing number of web services available within an organization and on the Web raises a new and challenging search problem: locating desired web services. In fact, to address this problem, several simple search engines have recently sprung up [1, 2, 3, 4]. Currently, these engines provide only simple keyword search on web service descriptions. As one considers search for web services in more detail, it becomes apparent that the keyword search paradigm is insufficient for two reasons.

First, keywords do not capture the underlying semantics of web services. Current web service search engines return a particular service if its functionality description contains the keywords in the query; such search may miss results. For example, when searching for zipcode, web services whose descriptions contain the terms zip or postal code but not zipcode will not be returned.

Second, keywords do not suffice for accurately specifying users' information needs. Since a web-service operation is going to be used as part of an application, users would like to specify their search criteria more precisely than by keywords. Current web-service search engines often enable a user to explore the details of a particular web-service operation, and in some cases to try it out by entering an input value. Nevertheless, investigating a single web-service operation often requires several browsing steps. Once users drill down all the way and find the operation inappropriate for some reason, they want to be able to find similar operations to the ones just considered, as opposed to laboriously following parallel browsing patterns. Similarly, users may want to find operations that take similar inputs
(respectively, outputs), or that can compose with the current operation being browsed.

To address the challenges involved in searching for web services, we built Woogle (see http://www.cs.washington.edu/woogle), a web-service search engine. In addition to simple keyword searches, Woogle supports similarity search for web services. A user can ask for web-service operations similar to a given one, those that take similar inputs (or
outputs), and those that compose with a given one. This paper describes the novel techniques we have developed to support these types of searches, and experimental evidence that shows the high accuracy of our algorithms. In particular, our contributions are the following:

1. We propose a basic set of search functionalities that an effective web-service search engine should support.

2. We describe algorithms for supporting similarity search. Our algorithms combine multiple sources of evidence in order to determine similarity between a pair of web-service operations. The key ingredient of our algorithm is a novel clustering algorithm that groups names of parameters of web-service operations into semantically meaningful concepts. These concepts are then leveraged to determine similarity of inputs (or outputs) of web-service operations.

3. We describe a detailed experimental evaluation on a set of over 1500 web-service operations. The evaluation shows that we can provide both high precision and recall for similarity search, and that our techniques substantially improve on naive keyword search.

The paper is organized as follows. Section 2 begins by placing our search problem in the context of the related work. Section 3 formally defines the similarity search problem for web services. Section 4 describes the algorithm for clustering parameter names, and Section 5 describes the similarity search algorithm. Section 6 describes our experimental evaluation. Section 7 discusses other types of search that Woogle supports, and Section 8 concludes.

W1: Web Service: GlobalWeather
    Operation: GetTemperature
    Input: Zip
    Output: Return
W2: Web Service: WeatherFetcher
    Operation: GetWeather
    Input: PostCode
    Output: TemperatureF, WindChill, Humidity
W3: Web Service: GetLocalTime
    Operation: LocalTimeByZipCode
    Input: Zipcode
    Output: LocalTimeByZipCodeResult
W4: Web Service: PlaceLookup
    Operation1: CityStateToZipCode
    Input: City, State
    Output: ZipCode
    Operation2: ZipCodeToCityState
    Input: ZipCode
    Output: City, State

Figure 1: Several example web services (not including their textual descriptions). Note that each web service includes a set of operations, each with input and output parameters. For example, web services W1 and W2 provide weather information.

2 Related Work

Finding similar web-service operations is closely related to three other matching problems: text document matching, schema matching, and software component matching.

Text document matching: Document matching and classification is a long-standing problem in information retrieval (IR). Most solutions to this problem (e.g., [10, 20, 27, 19]) are based on term frequency analysis. However, these approaches are insufficient in the web service context because text documentations for web-service operations are highly compact, and they ignore structure information that aids capturing the underlying semantics of the operations.

Schema matching: The database community has considered the problem of automatically matching schemas [24, 12, 13, 22]. The work in this area has developed several methods that try to capture clues about the semantics of the schemas, and suggest matches based on them. Such methods include linguistic analysis, structural analysis, the use of domain knowledge and previous matching experience. However, the search for similar web-service operations differs from schema matching in two significant ways. First, the granularity of the search is different: operation matching can be compared to finding a similar schema, while schema matching looks for similar components in two given schemas that are assumed to be related. Second, the operations in a web service are typically much more loosely related to each other than are tables in a schema, and each web service in isolation has much less information than a schema. Hence, we are unable to adapt techniques for schema matching to this context.

Software component matching: Software component matching is considered important for software reuse. [28] formally defines the problem by examining signature (data type) matching and specification (program behavior) matching. The techniques employed there require analysis of data types and post-conditions, which are not available for web services.

Some recent work (e.g., [9, 23]) has proposed annotating web services manually with additional semantic information, and then using these annotations to compose services [8, 26]. In our context, annotating the collection of web services is infeasible, and we rely on only the information provided in the WSDL file and the UDDI entry.

In [15] the authors studied the supervised classification and unsupervised clustering of web services.
Our work differs in that we are doing unsupervised matching at the operation level, rather than supervised classification at the entire web service level. Hence, we face the challenge of understanding operations in a web service from a very limited amount of information.

3 Web Service Similarity Search

We begin by briefly describing the structure of web services, and then we motivate and define the search problem we address.

3.1 The Structure of Web Services

Each web service has an associated WSDL file describing its functionality and interface. A web service is typically (though not necessarily) published by registering its WSDL file and a brief description in UDDI business registries. Each web service consists of a set of operations. For each web service, we have access to the following information:

• Name and text description: A web service is described by a name, a text description in the WSDL file, and a description that is put in the UDDI registry.

• Operation descriptions: Each operation is described by a name and a text description in the WSDL file.

• Input/Output descriptions: Each input and output of an operation contains a set of parameters. For each parameter, the WSDL file describes the name, data type and arity (if the parameter is of array type). Parameters may be organized in a hierarchy by using complex types.

3.2 Searching for Web Services

To motivate similarity search for web services, consider the following typical scenario. Users begin a search for web services by entering keywords relevant to the search goal. They then start inspecting some of the returned web services. Since the result of the search is rather complex, the users need to drill down in several steps. They first decide which web service to explore in detail, and then consider which specific operations in that service to look at. Given a particular operation, they will look at each of its inputs and outputs, and if the engine provides a try-it feature, they will try entering some value for the inputs.

At this point, the users may find that the web service is inappropriate for some reason, but not want to have to repeat the same process for each of the other potentially relevant services. Hence, our goal is to provide a more direct method for searching, given that the users have already explored a web service in detail. Suppose they explored the operation GetTemperature in W1. We identify the following important similarity search queries they may want to pose:

Similar operations: Find operations with similar functionalities. For example, the web-service operation GetWeather in W2 is similar to the operation GetTemperature in W1. Note that we are searching for specific operations that are similar, rather than similar web services. The latter type of search is typically too coarse for our needs. There is no formal definition for operation similarity, because, just like in other types of search, similarity depends on the specific goal in the user's mind. Intuitively, we consider operations to be similar if they take similar inputs, produce similar outputs, and the relationships between the inputs and outputs are similar.

Similar inputs/outputs: Find operations with similar inputs. As a motivating example for such a search, suppose our goal is to collect a variety of information about locations. While W1 provides the weather, operations LocalTimeByZipCode in W3 and ZipCodeToCityState in W4 provide other information about locations, and thereby may be of interest to the user.

Alternatively, we may want to search for operations with similar outputs, but different inputs. For example, we may be looking for temperature, but the operation we are considering takes zipcode as input, while we need one that takes city and state as input.

Composable operations: Find operations that can be composed with the current one. One of the key promises of building applications with web services is that one should be able to compose a set of given services to create ones that are specific to the application's needs. In our example, there are two opportunities for composition. In the first case, the output of an operation is similar to the input of the given operation, such as CityStateToZipCode in W4. Composing CityStateToZipCode with GetTemperature in W1 offers another option for getting the weather when the zipcode is not known. In the second case, the output of the given operation may be similar to the input of another operation; e.g., one that converts between Centigrade and Fahrenheit and thereby produces results in the desired scale.

In this paper we focus on the following two problems, from which we can easily build up the above search capabilities.

Operation matching: Given a web-service operation, return a list of similar operations.

Input/output matching: Given the input (respectively, output) of a web-service operation, return a list of web-service operations with similar inputs (respectively, outputs).
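The information enumerated in Section 3.1 is all that the matching algorithms can rely on. As a concrete illustration only (not the paper's actual implementation), the records below sketch one way to hold this information in memory; the field and class names are our own assumptions.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Parameter:
        name: str                # e.g. "ZipCode"
        data_type: str = "string"
        is_array: bool = False   # arity information from the WSDL

    @dataclass
    class Operation:
        name: str                # e.g. "GetTemperature"
        description: str         # text documentation from the WSDL file
        inputs: List[Parameter] = field(default_factory=list)
        outputs: List[Parameter] = field(default_factory=list)

    @dataclass
    class WebService:
        name: str
        wsdl_description: str    # description in the WSDL file
        uddi_description: str    # description registered in UDDI
        operations: List[Operation] = field(default_factory=list)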
We note that these two problems are also at the core of two other types of search that Woogle supports (see Section 7): template search and composition search. Template search goes beyond keyword search by specifying the functionality, input and output of a desired operation. Composition search returns not only single operations, but also compositions of operations that fulfill the user's need.

3.3 Overview of Our Approach

Similarity search for web services is challenging because neither the textual descriptions of web services and their operations nor the names of the input and output parameters completely convey the underlying semantics of the operation. Nevertheless, knowledge of the semantics is important to determining similarity between operations.

Broadly speaking, our algorithm combines multiple sources of evidence to determine similarity. In particular, it will consider similarity between the textual descriptions of the operations and of the entire web services, and similarity between the parameter names of the operations. The key ingredient of the algorithm is a technique that clusters parameter names in the collection of web services into semantically meaningful concepts. By comparing the concepts that input or output parameters belong to, we are able to achieve good similarity measures. Section 4 describes the clustering algorithm, and Section 5 describes how we combine the multiple sources of evidence.

4 Clustering Parameter Names

To effectively match inputs/outputs of web-service operations, it is crucial to get at their underlying semantics. However, this is hard for two reasons. First, parameter naming is dependent on the developers' whim. Parameter names tend to be highly varied given the use of synonyms, hypernyms, and different naming rules. They might not even be composed of proper English words—there may be misspellings, abbreviations, etc. Therefore, lexical references, such as WordNet [5], are hard to apply. Second, inputs/outputs typically have few parameters, and the associated WSDL files rarely provide rich descriptions for parameters. Traditional IR techniques, such as TF/IDF [25] and LSI [11], rely on word frequencies to capture the underlying semantics and thus do not apply well.

A parameter name is typically a sequence of concatenated words (not necessarily proper English words), with the first letter of every word capitalized (e.g., LocalTimeByZipCodeResult). Such words are referred to as terms. We exploit the co-occurrence of terms in web service inputs and outputs to cluster terms into meaningful concepts. As we shall see later, using these concepts, in addition to the original terms, greatly improves our ability to identify similar inputs/outputs and hence find similar web-service operations.

Applying an off-the-shelf text clustering algorithm directly to our context does not perform well because the web service inputs/outputs are sparse. For example, whereas synonyms tend to occur in the same document in an IR application, they seldom occur in the same operation input/output; therefore, they will not get clustered. Our clustering algorithm is a refinement of agglomerative clustering. We begin by describing a particular kind of association rules that captures our notion of term co-occurrence and then describe the clustering algorithm.

4.1 Clustering Parameters by Association

We base our clustering on the following heuristic: parameters tend to express the same concept if they occur together often. This heuristic is validated by our experimental results. We use it to cluster parameters by exploiting their conditional probabilities of occurrence in inputs and outputs of web-service operations. Specifically, we are interested in association rules of the form:

    t1 → t2 (s, c)

In this rule, t1 and t2 are two terms. The support, s, is the probability that t1 occurs in an input/output; i.e., s = P(t1) = ||IOt1|| / ||IO||, where ||IO|| is the total number of inputs and outputs of operations, and ||IOt1|| is the number of inputs and outputs that contain t1. The confidence, c, is the probability that t2 occurs in an input or output, given that t1 is known to occur in it; i.e., c = P(t2 | t1) = ||IOt1,t2|| / ||IOt1||, where ||IOt1,t2|| is the number of inputs and outputs that contain both t1 and t2. Note that the rule t1 → t2 (s12, c12) and the rule t2 → t1 (s21, c21) may have different support and confidence values. These rules can be efficiently computed using the A-Priori algorithm [7].
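To make the rule definitions concrete, here is a minimal sketch of our own (not the paper's code) that splits capitalized parameter names into terms and computes support and confidence for every ordered term pair; the function names and the min_support default are assumptions for illustration.

    import re
    from itertools import permutations

    def parameter_terms(name):
        # Split a concatenated parameter name such as "LocalTimeByZipCodeResult"
        # into lower-cased terms: ["local", "time", "by", "zip", "code", "result"].
        return [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]

    def association_rules(io_term_sets, min_support=0.0):
        # io_term_sets: one set of terms per operation input or output.
        # Returns {(t1, t2): (support, confidence)}, following the definitions above:
        #   support(t1)        = ||IO_t1||    / ||IO||
        #   confidence(t1, t2) = ||IO_t1,t2|| / ||IO_t1||
        n = len(io_term_sets)
        count = {}       # ||IO_t||
        pair_count = {}  # ||IO_t1,t2||
        for terms in io_term_sets:
            for t in terms:
                count[t] = count.get(t, 0) + 1
            for t1, t2 in permutations(terms, 2):
                pair_count[(t1, t2)] = pair_count.get((t1, t2), 0) + 1
        rules = {}
        for (t1, t2), c12 in pair_count.items():
            support = count[t1] / n
            if support >= min_support:
                rules[(t1, t2)] = (support, c12 / count[t1])
        return rules

On a large collection one would prune candidate pairs A-Priori-style rather than enumerating them all, as the paper does with the algorithm of [7].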
4.2 Criteria for Ideal Clustering

Ideally, parameter clustering results should have the following two features:

1. Frequent and rare parameters should be left unclustered; strongly connected parameters in between are clustered into concepts. First, not clustering frequent parameters is consistent with the IR community's observation that such a technique leads to the best performance in automatic query expansion [16]. Second, leaving rare parameters unclustered avoids over-fitting.

2. The cohesion of a concept—the connections between parameters inside the concept—should be strong; the correlation between concepts—the connections between parameters in different concepts—should be weak.

Traditionally, cohesion is defined as the sum of squares of Euclidean distances from each point to the center of the cluster it belongs to; correlation is defined as the sum of squares of distances between cluster centers [14]. This definition does not apply well in our context because of "the curse of dimensionality": our feature sets are so large that a Euclidean distance measure is no longer meaningful. We hence quantify the cohesion and correlation of clusters based on our association rules.

We say that t1 is closely associated with t2 if the rule t1 → t2 has a confidence greater than a threshold tc. The threshold tc is chosen manually to be the value that best separates correlated and uncorrelated pairs of terms.

Given a cluster I, we define the cohesion of I as the percentage of closely associated term pairs over all term pairs. Formally,

    cohI = ||{(i, j) | i, j ∈ I, i ≠ j, i → j(c > tc)}|| / (||I|| · (||I|| − 1))

where i → j(c > tc) is the association rule for terms i and j. As a special case, the cohesion of a single-term cluster is 1.

Given clusters I and J, we define the correlation between I and J as the percentage of closely associated cross-cluster term pairs. Formally,

    corIJ = (C(I, J) + C(J, I)) / (2 · ||I|| · ||J||)

where C(I, J) = ||{(i, j) | i ∈ I, j ∈ J, i → j(c > tc)}||.

To measure the overall quality of a clustering C, we define the cohesion/correlation score as

    scoreC = (ΣI∈C cohI / ||C||) / (ΣI,J∈C, I≠J corIJ / (||C||(||C|| − 1)/2))
           = ((||C|| − 1) · ΣI∈C cohI) / (2 · ΣI,J∈C, I≠J corIJ)

The cohesion/correlation score captures the trade-off between having a high cohesion score and a low correlation score. Our goal is to obtain a high scoreC that will indicate tight connections inside clusters and loose connections between clusters.

4.3 Clustering Algorithm

We can now describe our clustering algorithm as a series of refinements to the classical agglomerative clustering [18].

4.3.1 The basic agglomerative algorithm

Agglomerative clustering is a bottom-up version of hierarchical clustering. Each object is initialized to be a cluster of its own. In general, at each iteration the two most similar clusters are merged, until no more clusters can be merged.

In our context, each term is initialized to be a cluster of its own; i.e., there are as many clusters as terms. The algorithm proceeds in a greedy fashion. It sorts the association rules in descending order, first by confidence and then by support. Infrequent rules with less than a minimum support ts are discarded. At every step, the algorithm chooses the highest-ranked rule that has not been considered previously. If the two terms in the rule belong to different clusters, the algorithm merges the clusters. Formally, the condition that triggers merging clusters I and J is

    ∃i ∈ I, j ∈ J . i → j(s > ts, c > tc)

where i and j are terms. The threshold ts is chosen to control the clustering of terms that do not occur frequently. We note that in our experiments the results of operation and input/output matching are not sensitive to the values of ts and tc.

4.3.2 Increasing cluster cohesion

The basic agglomerative algorithm merges two clusters together when any two terms in the two clusters are closely associated. The merge condition is very loose and can easily result in low cohesion of clusters. To illustrate, suppose there is a concept for weather, containing temperature as a term, and a concept for address, containing zip as a term. If, when operations report temperature, they often report the area zipcode as well, then the confidence of the rule temperature → zip is high. As a result, the basic algorithm will inappropriately combine the weather concept and the address concept.

The cohesion of a cluster is decided by the association of each pair of terms in the cluster. To ensure that we obtain clusters with high cohesion, we merge two clusters only if they satisfy a stricter condition, called the cohesion condition.

Given a cluster C, a term is called a kernel term if it is closely associated with at least half of the remaining terms in C (we tried different values for this fraction and found 1/2 yielded the best results). Our cohesion condition requires that all the terms in the merged cluster be kernel terms. Formally, we merge two clusters I and J only if they satisfy the cohesion condition:

    ∀i ∈ I ∪ J . ||{j | j ∈ I ∪ J, i ≠ j, i → j(c > tc)}|| ≥ (1/2)(||I|| + ||J|| − 1)
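The cohesion and correlation measures, the clustering score, and the cohesion condition translate directly into code. The sketch below is our own illustration, assuming `rules` is the support/confidence map computed earlier and tc = 0.8 is an illustrative threshold; it is not the paper's implementation.

    def closely_associated(rules, i, j, tc=0.8):
        # True if rule i -> j has confidence above the threshold tc.
        s_c = rules.get((i, j))
        return s_c is not None and s_c[1] > tc

    def cohesion(rules, cluster, tc=0.8):
        # coh_I = ||{(i,j) | i,j in I, i != j, i -> j (c > tc)}|| / (||I|| (||I|| - 1))
        if len(cluster) <= 1:
            return 1.0
        pairs = [(i, j) for i in cluster for j in cluster if i != j]
        return sum(closely_associated(rules, i, j, tc) for i, j in pairs) / len(pairs)

    def correlation(rules, I, J, tc=0.8):
        # cor_IJ = (C(I,J) + C(J,I)) / (2 ||I|| ||J||)
        c_ij = sum(closely_associated(rules, i, j, tc) for i in I for j in J)
        c_ji = sum(closely_associated(rules, j, i, tc) for i in I for j in J)
        return (c_ij + c_ji) / (2 * len(I) * len(J))

    def score(rules, clustering, tc=0.8):
        # Average cohesion divided by average pairwise correlation.
        clusters = list(clustering)
        coh = sum(cohesion(rules, I, tc) for I in clusters) / len(clusters)
        pairs = [(I, J) for a, I in enumerate(clusters) for J in clusters[a + 1:]]
        cor = sum(correlation(rules, I, J, tc) for I, J in pairs) / max(len(pairs), 1)
        return coh / cor if cor > 0 else float("inf")

    def satisfies_cohesion_condition(rules, I, J, tc=0.8):
        # Every term of the merged cluster must be a kernel term, i.e. closely
        # associated with at least half of the other terms in the merged cluster.
        union = list(I) + list(J)
        for i in union:
            links = sum(closely_associated(rules, i, j, tc) for j in union if j != i)
            if links < 0.5 * (len(union) - 1):
                return False
        return True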
4.3.3 Splitting and Merging

A greedy algorithm pursues locally optimal solutions at each step, but usually cannot obtain the global optimal solution. In parameter clustering, an inappropriate clustering decision at an early stage may prevent subsequent appropriate clustering. Consider the case where there is a cluster for zipcode {zip, code}, formed because of the frequent occurrences of the parameter ZipCode. Later we need to decide whether to merge this cluster with another cluster for address {state, city, street}. The term zip is closely associated with state, city and street, but code is not, because it also occurs often in other parameters such as TeamCode and ProxyCode, which typically do not co-occur with state, city or street. Consequently, the two clusters cannot merge; the clustering result contrasts with the ideal one: {state, city, street, zip} and {code}.

The solution to this problem is to split already-formed clusters so as to obtain a better set of clusters with a higher cohesion/correlation score. Formally, given clusters I and J, we denote

    I′ = {i | i ∈ I, ||{j | j ∈ I ∪ J, i → j(c > tc)}|| ≥ (1/2)(||I|| + ||J|| − 1)}
    J′ = {j | j ∈ J, ||{i | i ∈ I ∪ J, j → i(c > tc)}|| ≥ (1/2)(||I|| + ||J|| − 1)}        (1)

Intuitively, I′ (respectively, J′) denotes the set of terms in I (respectively, J) that are closely associated with terms in the union of I and J. Our algorithm makes its splitting decision depending on which of the four following cases occurs:

• If I′ = I, J′ = J, then I and J can be merged directly (see Figure 2(a)).

• If I′ ≠ I, J′ = J, then merging I and J directly disobeys the cohesion condition. There are two options: one is to split I into I′ and I − I′, and then merge I′ with J (see Figure 2(b)); the other is not to split or merge. We decide in two steps: the first step checks whether the merged result in the first option satisfies the cohesion condition; if so, the second step computes the cohesion/correlation score for each option, and chooses the option with the higher score. The decision is similar for the case where J′ ≠ J, I′ = I.

• If I′ ≠ I, J′ ≠ J, then again, merging I and J directly disobeys the cohesion condition. There are two options: one is to split I into I′ and I − I′, split J into J′ and J − J′, and then merge I′ with J′ (see Figure 2(c)); the other is not to split or merge. We choose an option in two steps: the first step checks whether, in the first option, the merged result satisfies the cohesion condition; if so, the second step computes the cohesion/correlation score for each option, and chooses the option with the higher score.

[Figure 2: Splitting and merging clusters. Panels (a), (b) and (c) illustrate the merge/split cases above.]

After the above processing, the merged cluster necessarily satisfies the cohesion condition. However, the clusters that are split from the original clusters may not. To ensure cohesion, we further split such clusters: each time, we split the cluster into two, one containing all kernel terms, and the other containing the rest. We repeat splitting until eventually all result clusters satisfy the cohesion condition. Note that applying such a splitting strategy on an arbitrary cluster may generate clusters of small size. Therefore, we do not merge two clusters directly (without applying the above judgment) and then split the merged cluster.

Remark 4.1. Our splitting-and-clustering technique is different from the dynamic modeling in the Chameleon algorithm [17], which also first splits and then merges. We do splitting and clustering at each step of the greedy algorithm. The Chameleon algorithm first considers the whole set of parameters as a big cluster and splits it into relatively small sub-clusters, and then repeatedly merges these sub-clusters.

4.3.4 Removing noise

Even with splitting, the results may still have terms that do not express the same concept as the other terms in their cluster. We call such terms noise terms. To illustrate how noise terms can be formed, we continue with the zipcode example. Suppose there is a cluster for address {city, state, street, zip, code}, where code is a noise term. The cluster is formed because the rules zip → city, zip → state, and zip → street all have very high confidence, e.g., 90%; even if the rule code → zip has a lower confidence, e.g., 50%, the rules code → city, code → state, and code → street can still have high confidence.

We use the following heuristic to detect noise terms. A term is considered to be noise if in half of its occurrences there are no other terms from the same concept. After one pass of the greedy algorithm (considering all association rules above a given threshold), we scan the resulting concepts to remove noise terms. Formally, for a term t, denote by ||IOt|| the number of inputs/outputs that contain t, and by ||SIOt|| the number of inputs/outputs that contain t but no other terms in the same concept as t. We remove t from the concept if ||SIOt|| ≥ (1/2)||IOt||.
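The noise-detection heuristic is easy to state in code. The following sketch is illustrative only; it assumes each input/output is available as a set of terms, and names such as remove_noise_terms are our own.

    def remove_noise_terms(concepts, io_term_sets):
        # concepts: list of sets of terms produced by one clustering pass.
        # A term t is dropped from its concept if, in at least half of the
        # inputs/outputs containing t, no other term of the concept appears
        # (||SIO_t|| >= 1/2 * ||IO_t||).
        cleaned = []
        for concept in concepts:
            kept = set()
            for t in concept:
                io_t = [terms for terms in io_term_sets if t in terms]
                if not io_t:
                    continue
                solo = sum(1 for terms in io_t if not (terms & (concept - {t})))
                if solo < 0.5 * len(io_t):
                    kept.add(t)
            if kept:
                cleaned.append(kept)
        return cleaned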
4.3.5 Putting it all together

Figure 3 puts all the pieces together, and shows the details of a single pass of the clustering algorithm.

    procedure MergeParameters(T, R) returns (C)
    // T is the term set, R is the association rule set
    // C is the result concept set
    for (i = 1, n) Ci = {ti};    // initialize clusters
    sort R in descending order of confidence, breaking ties
        by descending order of support;
    for each (r : t1 → t2 (s > ts, c > tc) in R)
        if t1 and t2 are in different clusters I and J
            compute I′ and J′ according to formula (1);
            if (I′ = I ∧ J′ = J) merge I and J;
            else if (splitting and merging satisfies the
                cohesion condition and has a higher scoreC)
                split and merge;
            if (I″ = I − I′ and/or J″ = J − J′
                does not observe the cohesion condition)
                split I″ and/or J″ iteratively;
    scan inputs/outputs and remove noise terms;
    return result clusters;

    Figure 3: Algorithm for parameter clustering

The above algorithm still has two problems. First, the cohesion condition is too strict for large clusters, so it may prevent closely associated large clusters from merging. Second, early inappropriate merging may prevent later appropriate merging. Although we do splitting, the terms taken off from the original clusters may have already missed the chance to merge with other closely associated terms. We solve these problems by running the clustering algorithm iteratively. After each pass, we replace each term with its corresponding concept, re-collect association rules, and then re-run the clustering algorithm. This process continues until no more clusters can be merged.

We illustrate with an example that the iteration of clustering does not sharply loosen the clustering condition. Consider the case where {zip} is not clustered with {temperature, windchill, humidity}, because zip is closely associated with only temperature, but not the other two. Another iteration of clustering will replace each occurrence of temperature, windchill and humidity with a single concept, say weather. The term zip will be closely associated with weather; however, the term weather is not necessarily closely associated with zip, because that requires zip to occur often when any of temperature, windchill, or humidity occurs. Thus, the iteration will (correctly) keep the two clusters.

4.4 Clustering Results

We now briefly outline the results of our clustering algorithm. Our dataset, which we will describe in detail in Section 6, contains 431 web services and 3148 inputs/outputs. There are a total of 1599 terms. The clustering algorithm converges after the seventh run. It clusters 943 terms into 182 concepts. The remaining 656 terms, including 387 infrequent terms (each occurring in at most 3 inputs/outputs) and 54 frequent terms (each occurring in at least 30 of the inputs/outputs), are left unclustered. There are 59 dense clusters, each with at least 5 terms. Some of them correspond roughly to the concepts of address, contact, geology, maps, weather, finance, commerce, statistics, and baseball. The overall cohesion is 0.96, the correlation is 0.003, and the average cohesion of the dense clusters is 0.76. This result exhibits the two features of an ideal clustering.

5 Finding Similar Operations

In this section we describe how to predict similarity of input/output sets and of web-service operations. We will determine similarity by combining multiple sources of evidence. The intuition behind our matching algorithm is that the similarity of a pair of inputs (or outputs) is related to the similarity of the parameter names, that of the concepts represented by the parameter names, and that of the operations they belong to. Note that parameter name similarity compares inputs/outputs on a fine-grained level, and concept similarity compares inputs/outputs on a coarse-grained level. The similarity between two web-service operations is related to the similarity of their descriptions, that of their inputs and outputs, and that of their host web services.

Input/output similarity: We identify the input i of a web-service operation op with a vector i = (pi, ci, op), where pi is the set of input parameter names, and ci is the set of concepts associated with the parameter names (as determined by the clustering algorithm described in Section 4). When comparing a pair of inputs, we determine the similarity on each of the three components separately, and then combine them. We treat op's output o as a vector o = (po, co, op), and process it analogously.

Web-service operation similarity: We identify a web-service operation op with a vector op = (w, f, i, o), where w is the text description of the web service to which op belongs, f is the textual description of op, and i and o denote the input and output parameters. Here too, we determine similarity by combining the similarities of the individual components of the vector.
Observe that there is a recursive relationship between the similarity of inputs/outputs and the similarity of web-service operations. Intuitively, this relationship holds because each one depends on the other, and any decision on how to break this recursive relationship would be arbitrary. In Section 5.2 we show that with sufficient care for the choice of the combination weights, we can guarantee that the recursive computation converges.

5.1 Computing Individual Similarities

We now describe how we compute similarities for each one of the components of the vectors.

Input/output parameter name similarity: We consider the terms in an input/output as a bag of words and use the TF/IDF (Term Frequency/Inverse Document Frequency) measure [25] to compute the similarity of two such bags. To improve our accuracy, we pre-process the terms as follows.

1. Perform word stemming and remove stopwords. Stemming improves recall by removing term suffixes and reducing all forms of a term to a single stemmed form. Stopword removal improves precision by eliminating words with little substantive meaning.

2. Group terms with close edit distance [21] and replace terms in a group with a normalized form. This step helps normalize misspelled and abbreviated terms.

3. Remove from the output bag the terms that refer to the inputs. For example, in the output parameter LocalTimeByZipCodeResult, the term By indicates that the following terms describe inputs; thus, the terms Zip and Code can be removed.

4. Extract additional information from names of web-service operations. Most operations are named after the output (e.g., GetWeather), and some include input information (e.g., ZipCodeToCityState). We put such terms into the corresponding input/output bag.

Input/output concept similarity: To compute the similarity of the concepts represented by the inputs/outputs, we replace each term in the bag of words described above with its corresponding concept, and then use the TF/IDF measure. Note that the clustering algorithm is applied on the input/output terms after preprocessing.

Operation description similarity: To compute the similarity of operation descriptions, we consider the tokenized operation name and WSDL documentation as a bag of words, and use the TF/IDF measure. Furthermore, we supplement this information by adding the terms in the inputs and outputs to the bag of words.

Web service description similarity: To compute the similarity of web service descriptions, we create a bag of words from the following: the tokenized web service name, WSDL documentation and UDDI description, the tokenized names of the operations in the web service, and their input and output terms. We again apply TF/IDF on the bag of words.

5.2 Combining Individual Similarities

We use a linear combination to combine the similarity of each component of the operation. Each type of similarity is assigned a weight that is dependent on its relevance to the overall similarity. Currently we set the weights manually based on our analysis of the results from different trials. Learning these weights based on direct or indirect user feedback is a subject of future work.

As noted earlier, there is a recursive dependency between the similarity of operations and that of inputs/outputs. We prove that computing the recursive similarities ultimately converges.

Proposition 1. Computing operation similarity and input/output similarity converges.

Proof (Sketch): Let Sop, Si and So be the similarity of operations, of inputs, and of outputs. Let wi and wo be the weights for input similarity and output similarity in computing operation similarity, and wop be the weight for operation similarity in computing input/output similarity.

We start by assigning zero to the operation similarity, and based upon it compute input/output similarity and operation similarity iteratively. We can prove that if z = wop(wi + wo) < 1, the computation converges and the results are:

    Sop(∞) = Sop(0) · 1/(1 − z)
    Si(∞)  = Si(0) + Sop(0) · wop/(1 − z)
    So(∞)  = So(0) + Sop(0) · wop/(1 − z)

where Sop(0), Si(0) and So(0) are the results of the first round, and Sop(∞), Si(∞) and So(∞) are the converged results.
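The fixed point in Proposition 1 can be checked numerically. The sketch below is our own illustration; the weight values and the exact form of the base similarities are assumptions. It iterates the mutually recursive combination and gives the closed form above for comparison.

    def combined_similarities(s_op_base, s_i_base, s_o_base,
                              w_i=0.2, w_o=0.2, w_op=0.3, iterations=50):
        # s_op_base: operation similarity from descriptions only.
        # s_i_base, s_o_base: input/output similarity from names and concepts only.
        # Requires z = w_op * (w_i + w_o) < 1 for convergence.
        s_op = 0.0
        for _ in range(iterations):
            s_i = s_i_base + w_op * s_op
            s_o = s_o_base + w_op * s_op
            s_op = s_op_base + w_i * s_i + w_o * s_o
        return s_op, s_i, s_o

    def closed_form(s_op_base, s_i_base, s_o_base, w_i=0.2, w_o=0.2, w_op=0.3):
        # First-round values: s_i(0) = s_i_base, s_o(0) = s_o_base, and
        # s_op(0) is the operation similarity computed with s_op = 0.
        z = w_op * (w_i + w_o)
        s_op0 = s_op_base + w_i * s_i_base + w_o * s_o_base
        s_op_inf = s_op0 / (1 - z)
        return s_op_inf, s_i_base + w_op * s_op_inf, s_o_base + w_op * s_op_inf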
6 Experimental Evaluation

We now describe a set of experiments that validate the performance of our matching algorithms. Our goal is to show that we produce high precision and recall on similarity queries, and to investigate the contribution of the different components of our method.

6.1 Experimental Setup

We implemented a web-service search engine, called Woogle, that has access to 790 web services from the main authoritative UDDI repositories. The coverage of Woogle is comparable to that of the other web-service search engines [1, 2, 3, 4]. We ran our experiments on the subset of web services whose associated WSDL files are accessible from the web, so we can extract information about their functionality descriptions, inputs and outputs. This set contains 431 web services, and 1574 operations in total.

Woogle performs parameter clustering, operation matching and input/output matching offline, and stores the results in a database. TF/IDF was implemented using the publicly available Rainbow [6] classification tool.

Our experiments compared our method, which we refer to as Woogle, with a couple of naive algorithms, Func and Comb. The Func method matches operations by comparing only the words in the operation names and text documentation. The Comb method considers the words mentioned in the web service names, descriptions and parameter names as well; in contrast to Woogle, these words are all put into a single bag of words.

Performance Measure: We measured overall performance using recall (r), precision (p), R-precision (pr) and top-k precision (pk). Consider these measures for operation matching. Let Rel be the set of relevant operations, Ret be the set of returned operations, Retrel be the set of returned relevant operations, and Retrelk be the set of relevant operations in the top k returned operations. We define

    p = |Retrel| / |Ret|
    r = |Retrel| / |Rel|
    pk = |Retrelk| / k
    pr = p|Rel| = |Retrel|Rel|| / |Rel|   (i.e., the top-k precision at k = |Rel|)

Among the above measures, pr is considered to most precisely capture the precision and ranking quality of a system. We also plotted the recall/precision curve (R-P curve). In an R-P curve figure, the X-axis represents recall, and the Y-axis represents precision. An ideal search engine has a horizontal curve with a high precision value; a bad search engine has a horizontal curve with a low precision value. The R-P curve is considered by the IR community as the most informative graph showing the effectiveness of a search engine.
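These measures can be computed directly from a ranked result list and a set of relevant items; the sketch below is our own illustration of the definitions above, not Woogle's evaluation code.

    def precision_recall(ranked, relevant):
        retrel = [x for x in ranked if x in relevant]
        p = len(retrel) / len(ranked) if ranked else 0.0
        r = len(retrel) / len(relevant) if relevant else 0.0
        return p, r

    def top_k_precision(ranked, relevant, k):
        top = ranked[:k]
        return sum(1 for x in top if x in relevant) / k

    def r_precision(ranked, relevant):
        # R-precision is the top-k precision at k = |Rel|.
        return top_k_precision(ranked, relevant, len(relevant)) if relevant else 0.0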
6.2 Measuring Precision

Given a web service, Woogle generates five lists: similar operations, operations with similar inputs, operations with similar outputs, operations that compose with the output of the given operation, and operations that compose with the input of the given operation. We evaluated the precision of these returned lists, and report the average top-2, top-5 and top-10 precision.

We selected a benchmark of 25 web-service operations for which we tried to obtain similar operations from our entire collection. When selecting these, we ensured that they are from a variety of domains and that they have different input/output sizes and description sizes. To ensure the top-10 precision is meaningful, we selected only operations for which Woogle and Comb both returned more than 10 relevant operations. (Func may return less than 10 relevant operations because typically it obtains result sets of very small size.)

[Figure 4: Top-k precision for Woogle similarity search. Panel (a): top-2, top-5 and top-10 precision of Func, Comb and Woogle for operation matching. Panel (b): top-2 and top-5 precision for the similar-input, similar-output, compose-with-input and compose-with-output lists.]

Figure 4(a) shows the results for top-k precision on operation matching. The top-2, top-5, and top-10 precisions of Woogle are 98%, 83% and 68% respectively, higher than those of the two naive methods by 10 to 30 percentage points. This demonstrates that considering different sources of evidence, and considering them separately, will increase the precision. We also observe that Comb has a higher top-2 and top-5 precision than Func, but its top-10 precision is lower. This demonstrates that considering more evidence by simple combination does not greatly enhance performance.

Figure 4(b) shows the precision for the four other returned lists. Note that we only report the top-2 and top-5 precision, as these lists are much smaller in size. From the 25-operation test set, we selected 20 where both input and output parameters are not empty, and the sizes of the returned lists are not too short. Figure 4(b) shows that for the majority of the four lists, the top-2 and top-5 precisions are between 80% and 90%.

6.3 Measuring Recall

In order to measure recall of similarity search, we need to know the set of all operations that are relevant to a given operation in the collection. For this purpose, we created a benchmark of 8 operations from six different domains: weather (2), address (2), stock (1), sports (1), finance (1), and time (1) (weather and address are two major domains in the web service corpus). We chose operations with different popularity: four of them have more than 30 similar operations each, and the other four each have about 10 similar operations. Among the 8 operations, one has an empty input, so we have 15 inputs/outputs in total. When choosing the operations, we ensured that their inputs/outputs convey different numbers of concepts, and the concepts involved vary in popularity.

For each of the 8 operations, we hand-labeled other operations in our collection as relevant or irrelevant. We began by inspecting a set of operations that had similar web service descriptions, or similar operation descriptions, or similar inputs or outputs. From that list we chose the set of similar operations and labeled them as relevant. The rest are labeled as irrelevant. In a similar fashion, we label relevant inputs and outputs.

In this experiment we also wanted to test the contributions of the different components of Woogle. To do that, we also considered the following stripped-down variations of Woogle:

• FuncWS: consider only operation descriptions and web service descriptions;

• FuncIO: consider only operation descriptions, inputs and outputs;

• ParOnly: consider all of the four components, but compare inputs/outputs based on only parameter names;

• ConOnly: consider all of the four components, but compare inputs/outputs based on only the concepts they express.

[Figure 5: Performance for different operation matchers. Panel (a): average precision, recall and R-precision for Func, Comb, FuncWS, FuncIO, ParOnly, ConOnly and Woogle; panel (b): average R-P curves.]

Figure 5(a) plots the average precision, recall and R-precision on the eight operations in the benchmark for each of the above matchers and also for Func, Comb, and Woogle. Figure 5(b) plots the average R-P curves. We observe the following.

First, Woogle generally beats all other matchers. Its recall and R-precision are 88% and 78% respectively, much higher than those of the two naive methods. Second, considering evidence from different sources by simply putting it into a big bag of words (Comb) does not help much. This strategy only beats Func, which considers evidence from a single source. Even FuncWS, which discards all input and output information, has a better performance than Comb. Third, FuncIO performs better than FuncWS. It shows that in operation matching, the semantics of input and output provides stronger evidence than the web service description. This observation agrees with the intuition that operation similarity depends more on input and output similarity. Fourth, Woogle performs better than ParOnly, and also slightly better than ConOnly. ParOnly has a higher precision, but a lower recall; ConOnly has a higher recall, but a lower precision. By considering parameter matching (fine-grained matching) and concept matching (coarse-grained matching) together, Woogle obtains a recall as high as ConOnly, and a precision as high as ParOnly.

An interesting observation is that Woogle beats FuncIO in precision up to the point where the recall reaches 80%. Also, the recall of Woogle is 8 percentage points lower than that of FuncIO. This is not surprising because verbose textual descriptions of web services have two-fold effects: on the one hand, they provide additional evidence, which helps significantly in the top returned operations, where the input and output already provide strong evidence; on the other hand, they contain noise that dilutes the high-quality evidence, especially at the end of the returned list where real evidence is not very strong.

In our experiments, we also observe that compared with the benefits of our clustering technique and that of the structure-aware matching, tuning the parameters in a reasonable range and pre-processing the input/output terms improve the performance only slightly.

6.3.1 Input/output matching

We performed an additional experiment focusing on the performance of input/output matching. This experiment considered the following matchers:

• Woogle: matches inputs/outputs by considering parameter names, their corresponding concepts, and the operations they belong to.

• ParConIO: considers both parameter names and concepts, but not the operations.

• ConIO: considers only concepts.

• ParIO: considers only parameter names.

[Figure 6: Performance of different input/output matchers. Panel (a): average precision, recall and R-precision for ParIO, ConIO, ParConIO and Woogle; panel (b): average R-P curves.]

Figure 6(a) shows the average recall, precision and R-precision on the fifteen inputs/outputs in the benchmark for each of the above matchers. We also plotted the average R-P curves in Figure 6(b). We observe the following. Matching inputs/outputs by comparing the expressed concepts significantly improves the performance: the three concept-aware matchers obtain a recall 25 percentage points higher than that of ParIO. Based on concept comparison, the performance of input/output matching can be further improved by considering parameter name similarity and host operation similarity.

7 Searching with Woogle

Similarity search supplements keyword search for web services. Besides, its core techniques power other search methods in the Woogle search engine, namely, template search and composition search. These two methods go beyond keyword search by directly exploring the semantics of web-service operations. Because of lack of space, we describe them only briefly.

Template search: The user can specify the functionality, input and output of the desired web-service operation, and Woogle returns a list of operations that fulfill the requirements. It is distinguished from keyword search in that (1) it explores the underlying structure of operations; and (2) the parameters of the returned operations are relevant to the user's requirement, but do not necessarily contain the specific words that the user uses. For example, the user can ask for operations that take the zipcode of an area and return its nine-day forecast by specifying the input as zipcode, the output as forecast, and the description as the weather in the next nine days. The inputs of the returned operations can be named zip, zipcode, or postcode. The outputs can be forecast, weather, or even temperature, humidity at the end of the list of the returned operations.

Template search is implemented by considering a user-specified template as an operation and applying the similarity search algorithm. A key challenge is to perform the operation matching efficiently on-the-fly.

Composition search: Much of the promise of web services is the ability to build complex services by composition. Composition search in Woogle returns not only single operations, but also operation compositions that achieve the desired functionality. The composition can be of any length. For example, when an operation satisfying the above search requirement is not available, it will be valuable to return a composition of an operation with zipcode as input and city and state as output, and an operation with city and state as input and nine-day forecast as output.

Based on the machinery that we have already built for matching operation inputs and outputs, we can discover compositions automatically. The challenge lies in avoiding redundancy and loops in the composition. Another challenge is to discover the compositions efficiently on-the-fly.
8 Conclusions and Future Work

As the use of web services grows, the problem of searching for relevant services and operations will get more acute. We proposed a set of similarity search primitives for web-service operations, and described algorithms for effectively implementing these searches. Our algorithm exploits the structure of the web services and employs a novel clustering mechanism that groups parameter names into meaningful concepts. We implemented our algorithms in Woogle, a web-service search engine, and experimented on a set of over 1500 operations. The experimental results show that our techniques significantly improve the precision and recall compared with two naive methods, and perform well overall.

In future work, we plan to expand Woogle to include automatic web-service invocation; i.e., after finding the potential operations, Woogle should be able to fill in the input parameters and invoke the operations automatically for the user. This search is particularly promising because it will, in the end, be able to answer questions such as "what is the weather of an area with zipcode 98195."

While this paper focuses exclusively on searches for web services, the search strategy we have developed applies to other important domains. As a prime example, if we model web forms as web-service operations, a deep-web search can be performed by first searching for appropriate web forms with a desired functionality, and then automatically filling in the inputs and displaying the results. As another example, applying template search and composition search to class libraries (considering each class as a web service, and each of its methods as a web-service operation) would be a valuable tool for software component reuse.

Acknowledgments

We would like to thank Pedro Domingos, Oren Etzioni and Zack Ives for many helpful discussions, and thank the reviewers of this paper for their insightful comments. This work was supported by NSF ITR grant IIS-0205635 and NSF CAREER grant IIS-9985114.

References

[1] Binding Point. http://www.bindingpoint.com/.
[2] Grand Central. http://www.grandcentral.com/directory/.
[3] Salcentral. http://www.salcentral.com/.
[4] Web Service List. http://www.webservicelist.com/.
[5] WordNet. http://www.cogsci.princeton.edu/~wn/.
[6] Rainbow. http://www.cs.cmu.edu/~mccallum/bow, 2003.
[7] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo. Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, 1996.
[8] J. Cardoso. Quality of Service and Semantic Composition of Workflows. PhD thesis, University of Georgia, 2002.
[9] DAML-S Coalition. DAML-S: Web service description for the semantic web. In ISWC, 2002.
[10] S. Cost and S. Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10:57–78, 1993.
[11] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391–407, 1990.
[12] H.-H. Do and E. Rahm. COMA - a system for flexible combination of schema matching approaches. In Proc. of VLDB, 2002.
[13] A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: a machine learning approach. In Proc. of SIGMOD, 2001.
[14] D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. The MIT Press, 2001.
[15] A. Hess and N. Kushmerick. Learning to attach semantic metadata to web services. In ISWC, 2003.
[16] K. S. Jones. Automatic Keyword Classification for Information Retrieval. Archon Books, 1971.
[17] G. Karypis, E. H. Han, and V. Kumar. Chameleon: A hierarchical clustering algorithm using dynamic modeling. COMPUTER, 32, 1999.
[18] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York, 1990.
[19] L. S. Larkey. Automatic essay grading using text classification techniques. In Proc. of ACM SIGIR, 1998.
[20] L. S. Larkey and W. Croft. Combining classifiers in text categorization. In Proc. of ACM SIGIR, 1996.
[21] V. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10:707–710, 1966.
[22] S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity Flooding: a versatile graph matching algorithm. In Proc. of ICDE, 2002.
[23] M. Paolucci, T. Kawamura, T. Payne, and K. Sycara. Semantic matching of web services capabilities. In Proc. of the International Semantic Web Conference (ISWC), 2002.
[24] E. Rahm and P. A. Bernstein. A survey on approaches to automatic schema matching. VLDB Journal, 10(4), 2001.
[25] G. Salton, editor. The SMART Retrieval System—Experiments in Automatic Document Retrieval. Prentice Hall Inc., Englewood Cliffs, NJ, 1971.
[26] E. Sirin, J. Hendler, and B. Parsia. Semi-automatic composition of web services using semantic descriptions. In WSMAI-2003, 2003.
[27] Y. Yang and J. Pedersen. A comparative study on feature selection in text categorization. In International Conference on Machine Learning, 1997.
[28] A. M. Zaremski and J. M. Wing. Specification matching of software components. TOSEM, 6:333–369, 1997.