Automated Identification of Semantic Similarity between Concepts of Textual Business Rules

Abdellatif Haj1*, Youssef Balouki1, Taoufiq Gadi1
1 Laboratory of Mathematics, Computer and Engineering Sciences, Department of Maths, Faculty of Sciences and Techniques, Hassan I University, Settat, Morocco
* Corresponding author's Email: haj.abdel@gmail.com

International Journal of Intelligent Engineering and Systems, Vol. 14, No. 1, 2021. DOI: 10.22266/ijies2021.0228.15
Received: August 14, 2020. Revised: October 19, 2020.

Abstract: Business Rules (BR) are usually written by different stakeholders, which makes them vulnerable to containing different designations for the same concept. Such a problem can be the source of poorly orchestrated behaviours, yet the identification of synonyms is manual or totally neglected in most approaches dealing with natural language Business Rules. In this paper, we present an automated approach to identify semantic similarity between terms in textual BR using Natural Language Processing and a knowledge-based algorithm refined using heuristics. Our method is unique in that it also identifies abbreviations/expansions (as a special case of synonymy), which is not possible using a dictionary. The results are then saved in a standard format (SBVR) for reusability purposes. Our approach was applied to more than 160 BR statements divided into three cases, with an accuracy between 69% and 87%, which suggests it to be an indispensable enhancement for other methods dealing with textual BR.

Keywords: Business rules, Semantic similarity, Synonym extraction, Abbreviation identification, NLP.

1 Introduction

Business Rules are expected to assert the structure or control the behaviour of the business [1]. They should be separated from other layers and presented in an unambiguous language to all stakeholders [2]. The heterogeneity of the different intervenors, coupled with the divergence of their interests, makes natural language the most appropriate format to express business rules. IT people then tend to transform them into more formal languages for analysis and design purposes. Doing such a transformation manually is time consuming and error prone, which explains the great interest over the last decade in making it automatic.

Approaches dealing with textual business needs largely rely on Natural Language Processing (NLP), or adopt a Controlled Natural Language (CNL) [3] easily understood by machines.

Existing approaches focus on extracting knowledge (semantics) from text, but ignore other issues leading to inconsistencies, resulting especially from the fact that BR are usually written by different stakeholders. Using more than one designation for the same concept is one of them.
BRs are generally neither few enough statements to be processed manually, nor numerous enough to be processed using machine learning methods. This probably justifies the fact that approaches tend to ignore the existence of synonyms [4], provide a user interface (UI) to add them manually [5], or assume the prior existence of a domain knowledge base containing the business vocabulary [6], while other approaches try to identify synonyms at advanced steps of the development process [7]. However, the identification of semantic similarity between terms at early steps will - without a doubt - improve the quality of the models produced in the next steps, and thus decrease the number of iterations as well as the process time.

In this paper, we present an approach to enhance existing approaches dealing with NL BR statements by automatically identifying synonyms using NLP, a knowledge-based algorithm, and heuristics. Our approach was tested over more than 160 BR statements divided into 3 different cases, with an accuracy between 69% and 87%.
This paper should be of interest to all software engineering people who work with natural language, especially those interested in transforming text to formal specifications. This approach makes the following contributions:
1. Enhance methods dealing with Text to Model transformation, by identifying terms with the same meaning.
2. Improve business vocabulary formulation.
3. Preserve BR integrity.

Automatic extraction of synonym terms is largely related to the identification of semantic similarity (relatedness) between entities, namely: terms, phrases, paragraphs, or documents [8-9]. To identify word-to-word similarity, 3 main methods are used in the literature [10]: (i) string-based methods, in which the similarity between two character strings is measured; (ii) corpus-based methods, which use statistics collected from a collection of texts (a corpus); and (iii) knowledge-based methods, which depend on predefined linguistic information derived from semantic networks such as WordNet [11]. Certainly, these methods are not all suitable to our objective. For example, string-based methods will surely identify entities having close sequences of characters, but results such as "renter" and "rental" are considered false positives in our case. As well, knowledge-based methods alone are not sufficient, and must be supported by the context in which the term was found, as clearly shown in the definition of the "synonym" concept: "words that denote the same concept and are interchangeable in many contexts" (WordNet [11]). Regarding corpus-based approaches, the context plays an important role. For example, sentences may be transformed into Vector Space Models [12] to extract their meaning. The Word2Vec method [13] is widely used at this level to establish semantic similarity via word statistics which rely on the context in which terms are used. The accuracy of methods relying on this solution is related to the amount of data available at the input.
As a solution, we tried to put ourselves in the shoes of business writers, to recognize the different possible sources of synonyms, before making the following hypotheses:
H1) Terms of the same statement are considered neighbours of each other. In other words, all terms of the same statement participate somehow in constructing the context of each term of this statement.
H2) Terms having no common neighbour have different meanings, even if they are synonyms according to dictionaries.
H3) Terms which are neighbours are not synonyms. This means that terms appearing in at least one common statement have two different meanings.

BR statements should be atomic, well-formed, and written in business terms [1] instead of paragraphs. Thus, limiting a term's context to the terms surrounding it (as in machine learning methods) is not a good choice in a BR environment (H1 & H2). Furthermore, one BR statement is often formulated by one person; thus, the probability of finding synonym terms in the same statement is very low compared to other statements written by other people (H3). Finally, we consider that the writer of a statement intentionally chooses proper terms for proper expressions; thus hypernyms (hyponyms) are not considered as synonyms. Altogether, in our work we decided to identify terms having the same dictionary-based sense, before refining the results using heuristics. In addition, a solution to relate abbreviations with their expansions (definitions) is also proposed.

The next section looks at existing approaches dealing with synonym identification from natural language specifications; Section 3 details our approach to identify synonym terms from textual BR as well as abbreviation-expansion pairs; Section 4 evaluates our method and discusses the obtained results; and Section 5 draws the limits, coupled with directions for future work, and concludes the paper.

2 Related works

Despite the problems that may arise from the existence of more than one designation with the same meaning, synonym identification is neglected in most approaches dealing with NL BRs, as can be seen in Table 1. In this section we give an overview of existing approaches dealing with synonym identification from natural language specifications.
Table 1. A comparative table of approaches dealing with synonyms in textual specifications
| Approach | Input | Output | Synonym identification |
| P. K. Chittimalli [4] | Textual Business Rules | SBVR: entities, facts, and rules | No |
| S. Roychoudhury [5] | Legal English text | SBVR model | Manual |
| I. S. Bajwa [6] | Textual specification | OCL constraints | Manual |
| M. Selway [14] | Textual business specification | Formal model | Manual |
| V. Gruhn [15] | Event-Driven Process Chains | Error patterns | Manual |
| P. Danenas [17] | Use case diagram | SBVR business vocabularies and rules | Based entirely on dictionary results (method not cited) |
| T. Skersys [18] | Use case diagram | SBVR business vocabularies and rules | Based entirely on dictionary results (method not cited) |
| F. Friedrich [19] | Natural language text | BPMN models | Based entirely on dictionary results (method not cited) |
| Y. Zhang [20] | English requirements | Requirement-to-code links | Based entirely on dictionary results (method not cited) |
| A. Awad [16] | Business process model query | List of process models | Based entirely on dictionary results (most common sense) |
| M. Ehrig [21] | Business process models | Similarity between business process models | Semi-automatic: requires a pre-existing ontology-based vocabulary |
| F. Pittke [7] | Textual process models | Homonym and synonym terms | Automatic: combines scores computed from translations in different languages using BabelNet |
| F. Dalpiaz [24] | User stories | Near-synonym terms | Automatic: calculates the similarity between terms, then between their contexts, using Cortical.io |
| Our method | Natural language Business Rules | Synonym terms & abbreviation/expansion pairs saved according to the SBVR standard | Automatic: uses NLP and a knowledge-based algorithm refined using heuristics |

There is an important difference in the way semantic similarity between terms is handled across approaches. V. Gruhn [15] created a catalogue of synonyms and antonyms to help in detecting error patterns that can be found in Event-Driven Process Chains. A. Awad [16] proposed an automated approach for querying a business process model repository, in which the most common sense was selected from the WordNet [11] dictionary. Other approaches tried to extract synonyms using a dictionary (like WordNet) without explaining which algorithm was used, or whether the context was taken into consideration: such as [17, 18], in which SBVR business vocabularies and rules are extracted from UML use case diagrams; [19], which presented an automatic approach to generate BPMN models from natural language text; or [20], which combined various features to recover requirement-to-code links. M. Ehrig [21] proposed an approach for (semi-)automatic detection of synonyms and homonyms of process element names using a pre-existing ontology-based vocabulary obtained from a UML Profile. They combined three similarity measures: syntactic (comparing the number of common characters), linguistic (the number of WordNet senses), and structural (the places of concepts, with their attributes and transitions). F. Pittke [7] proposed a technique to detect and resolve lexical ambiguities in textual process models caused by homonyms and synonyms, by combining scores computed from translations in different languages. To find words with similar senses, they used a graph-based multilingual joint approach based on the multilingual knowledge base BabelNet [22] with X-Means clustering [23]. F. Dalpiaz [24] tried to detect terminological ambiguity in user stories by calculating the similarity between terms, and then between their contexts. For this purpose, Cortical.io [25] is used, which is considered a very powerful AI-based tool with a high processing speed. Cortical.io relies on matrices created from a large collection of websites to calculate semantic similarity.
It can be concluded that there is no single agreed method to extract synonyms from text that is valid for all goals; rather, we are facing a non-trivial task that differs depending on the objective and context.

In the end, despite the divergence in their areas of application, the last approach [24] remains the closest to ours, as it extracts nouns automatically from a set of sentences before calculating their similarity, taking the context into consideration. Our approach, in contrast, deals with textual BR statements having no standard format, and uses heuristics to improve the accuracy of the identified synonyms. It is also unique in that it identifies abbreviations/expansions (as a special case of synonymy), which is not possible using a dictionary, before saving the obtained results in a standard format (SBVR) for reusability purposes.
3 Our approach

Our aim is to enhance approaches dealing with BR written in natural language by identifying terms having the same meaning. Initially, our approach begins with a linguistic analysis using an NLP pipeline to tokenize, tag, and generate the parse tree of each statement. After that, single-word as well as multi-word terms are extracted using Algorithm 1. Next, synonym terms are identified using Algorithm 2. Then, abbreviations are identified using Algorithm 3. Finally, an SBVR-based XMI file [26] is generated containing the concepts and their synonyms (Fig. 1). SBVR, or "Semantics of Business Vocabulary and Rules" [26], is an Object Management Group (OMG) standard [27] that helps organizations interchange business vocabularies and rules in a totally technology-independent format.

Figure. 1 An overview of our approach

3.1 Step 1: Linguistic analysis

In this step, each BR statement is split into tokens. Then, Stanford CoreNLP [28] is used to annotate the tokens of each statement with a part-of-speech (POS) tag (e.g. noun, verb, adjective). Finally, a dependency parse tree is generated for each statement. Fig. 2 shows an example of the obtained result. The output is saved in an XML file and used as input in the next steps.

Figure. 2 Example of our NLP pipeline output
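To make Step 1 concrete, the following minimal sketch reproduces an equivalent pipeline in Python. The paper itself uses Stanford CoreNLP [28]; stanza, the Stanford NLP Group's Python package, is substituted here as an assumption, and the sample statement is invented for illustration.

```python
# Minimal sketch of Step 1 (tokenize, POS-tag, dependency-parse a BR
# statement). The paper uses Stanford CoreNLP; stanza is a stand-in here.
import stanza

# stanza.download('en')  # fetch the English models once
nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse')

statement = "Each rental request must be confirmed by the branch manager."
doc = nlp(statement)

for sent in doc.sentences:
    for word in sent.words:
        # word.head is the 1-based index of the governing word (0 = root)
        print(word.id, word.text, word.upos, word.deprel, word.head)
```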
3.2 Step 2: Terms extraction

First, for each BR statement, we extract all nouns and mark each of them as a single-word noun. Next, each noun found in a compound relation with other nouns, or having modifiers, is rectified to create a multi-word noun, which is added as a new entry according to Algorithm 1.
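The following sketch interprets Algorithm 1 from the description above; anything beyond that description (the exact dependency relations, the lowercasing) is an assumption. It reuses the stanza sentence objects from the previous sketch.

```python
# Interpretation of Algorithm 1: single-word noun terms, plus multi-word
# terms built from 'compound' and 'amod' dependents of each noun.
def extract_terms(sentence):
    terms = set()
    for word in sentence.words:
        if word.upos != 'NOUN':
            continue
        terms.add(word.text.lower())  # single-word term
        modifiers = [w for w in sentence.words
                     if w.head == word.id and w.deprel in ('compound', 'amod')]
        if modifiers:
            span = sorted(modifiers + [word], key=lambda w: w.id)
            terms.add(' '.join(w.text.lower() for w in span))  # multi-word term
    return terms

# extract_terms(doc.sentences[0])
# -> e.g. {'request', 'rental request', 'manager', 'branch', 'branch manager'}
```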
3.3 Step 3: Similarity identification

In this step, the extracted terms are compared with each other to identify those having the same meaning. At the same time, terms having an abbreviation format are identified before seeking their expansions (definitions), if any exist. Finally, the frequency of each term is calculated to distinguish the preferred designation.

3.3.1. Synonym identification

To label terms as candidates to be semantically similar, they should have a distance greater than 0.5 regarding the dictionary senses. In addition, according to the previously cited hypotheses (H1, H2, H3), similar concepts should have at least one common neighbour, but no common statement (see Algorithm 2). The distance of 0.5 was adopted after a set of experiments (see subsection 4.1).
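A minimal sketch of this candidate test, combining the three published criteria with NLTK's WordNet interface, is given below; Algorithm 2's internals are not shown in the text, so the Brown information-content file, the restriction to single-word noun senses, and the pairing loop are all assumptions.

```python
from itertools import combinations
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # IC data for the LIN measure

def lin_score(t1, t2):
    """Best LIN score over the noun senses of two terms (the paper calls
    this score a 'distance')."""
    best = 0.0
    for s1 in wn.synsets(t1, pos=wn.NOUN):
        for s2 in wn.synsets(t2, pos=wn.NOUN):
            try:
                best = max(best, s1.lin_similarity(s2, brown_ic))
            except Exception:  # sense pairs without usable IC data
                pass
    return best

def candidate_synonyms(statements):
    """statements: one set of extracted terms per BR statement."""
    occurs_in, neighbours = {}, {}
    for i, terms in enumerate(statements):
        for t in terms:
            occurs_in.setdefault(t, set()).add(i)                 # for H3
            neighbours.setdefault(t, set()).update(terms - {t})   # H1
    pairs = []
    for t1, t2 in combinations(occurs_in, 2):
        if occurs_in[t1] & occurs_in[t2]:        # co-occur, so not synonyms (H3)
            continue
        if not neighbours[t1] & neighbours[t2]:  # no shared context (H2)
            continue
        if lin_score(t1, t2) > 0.5:              # dictionary-based sense test
            pairs.append((t1, t2))
    return pairs
```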
3.3.2. Abbreviation-expansion identification

In this subsection we relate terms having an abbreviation format to their expansion, if one exists. There are three main approaches to identify abbreviation-expansion pairs [29]: (i) heuristic approaches (NLP and pattern-based), in which rules are applied [30-35]; (ii) machine learning based approaches [36], in which labelled examples are used for training; and (iii) Web-based approaches, in which abbreviations/definitions are identified based on Web resources [37]. There are also hybrid approaches, such as [38], which mixes hand-coded constraints with supervised learning.

Web-based approaches would be a good choice for well-known abbreviations, but business rules usually use a domain-specific terminology in which uncommon abbreviations appear. On the other hand, machine learning based approaches require a lot of data for good training, which is not possible with a limited number of business rule statements. Since abbreviations usually follow the standard format given in their dictionary-based definition ("An abbreviation consisting of the first letters of each word in the name of something, pronounced as a word" - Cambridge dictionary), we decided to use heuristics, each of which can reject any candidate abbreviation.

Strong constraints can reject true positive abbreviation/expansion pairs, such as: an abbreviation must be a single word between parentheses [39-40], must be in capital letters [35], must be adjacent to parentheses [31], or expansions always surround their abbreviations [41]. For this reason, we relied on the following algorithm (Algorithm 3), which does not go beyond the dictionary-based definition of an abbreviation.

Algorithm 3 is based on the following heuristics:
- Letters of an abbreviation should match the first letter (or letters) of each word in the full form.
- A potential abbreviation is compared with multi-word terms (composed of more than one word).
- An abbreviation is not a known (dictionary) word. For this reason, abbreviations are generally tagged as proper nouns (NNP).

Some special cases are taken into consideration, such as:
- "X" can be related to a word starting with "EX" (e.g. the "X" of "XML" can match "Exchange").
- A lowercase "s" at the end of the abbreviation can be ignored, as it is usually used for the plural.
- "2" and "4" can mean "to" and "for" respectively.
- A candidate abbreviation is compared with its potential expansion twice: with and without considering prepositions.
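The core matching rule can be sketched as follows; the one-letter-per-word matching, the preposition list, and the exact handling of "x", digits, and a trailing plural "s" are interpretations of the heuristics above rather than the authors' exact Algorithm 3.

```python
PREPOSITIONS = {'of', 'per', 'to', 'for', 'in', 'by'}  # illustrative list
DIGIT_WORDS = {'2': 'to', '4': 'for'}

def letters_match(abbrev, words):
    """Each abbreviation character must start the corresponding word."""
    if len(abbrev) != len(words):
        return False
    for letter, word in zip(abbrev.lower(), (w.lower() for w in words)):
        if letter in DIGIT_WORDS:
            if word != DIGIT_WORDS[letter]:
                return False
        elif letter == 'x' and word.startswith('ex'):
            continue  # special case: "X" may stand for a word in "EX..."
        elif not word.startswith(letter):
            return False
    return True

def is_expansion(abbrev, multi_word_term):
    """Compare twice, with and without prepositions; ignore a plural 's'."""
    candidates = {abbrev, abbrev[:-1] if abbrev.endswith('s') else abbrev}
    words = multi_word_term.split()
    content = [w for w in words if w.lower() not in PREPOSITIONS]
    return any(letters_match(c, ws)
               for c in candidates for ws in (words, content))

# is_expansion('BC', 'booking confirmation')  -> True
```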
3.4 Step 4: SBVR XMI output file

After extracting all terms and identifying synonyms and abbreviations with their preferred designations, the result is saved in an XMI file that respects the SBVR XMI Schema [26]. Fig. 3 shows an excerpt.

Figure. 3 An excerpt of the SBVR XMI file in which two synonym terms are saved

4 Experiments and evaluation

Our experiments were carried out in two steps. First, we tested some knowledge-based algorithms to select the most accurate one to adopt in our approach. Then, the results obtained from the selected algorithm were refined using our heuristics.

4.1 Selecting the knowledge-based algorithm

To choose the most accurate knowledge-based algorithm, we injected 20 pairs of random synonym terms into a set of 157 BR statements randomly selected from the EU-Rent case study found in the SBVR documentation [26]. Next, we used 8 algorithms to identify the injected synonyms, namely: WUP [42], RES [43], JCN [44], LIN [45], LCH [46], LESK [47], HSO [48] and PATH. These algorithms were tested using different distances between terms each time.

Figure. 4 The precision and F1-score obtained from different knowledge-based algorithms

As can be seen in Fig. 4, the LIN algorithm was adopted in our approach, as it gives more accurate results compared to the others. The adopted algorithm was then tested using different distances between terms, and it was concluded that by selecting terms having a distance greater than 0.5 we obtain the most accurate results, as can be seen in Fig. 5.

Figure. 5 The accuracy of the LIN algorithm by distances for synonym detection
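The paper does not say which implementations of the eight measures were used; as one possibility, six of them (WUP, LCH, PATH and the information-content based RES, JCN, LIN) are available in NLTK's WordNet interface, as sketched below with an illustrative word pair (LESK and HSO require other tooling).

```python
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
s1 = wn.synsets('renter', pos=wn.NOUN)[0]
s2 = wn.synsets('customer', pos=wn.NOUN)[0]

print({
    'WUP':  s1.wup_similarity(s2),
    'LCH':  s1.lch_similarity(s2),
    'PATH': s1.path_similarity(s2),
    'RES':  s1.res_similarity(s2, brown_ic),
    'JCN':  s1.jcn_similarity(s2, brown_ic),
    'LIN':  s1.lin_similarity(s2, brown_ic),
})
```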
4.2 Validating our approach

Our approach was tested over more than 160 BR statements taken from 3 different sources: T. Morgan [2] (case 1), G. C. Witt [49] (case 2) and I. Graham [50] (case 3).

To remove all disturbances that could be caused by the ambiguity of natural language, we reformulated some statements to follow the quality criteria cited in [49]. However, a few statements (3%) were incorrectly tagged by Stanford CoreNLP [28]. The source of error at this level was either a confusion between noun and verb (as in "references" and "requests"), or a lack of punctuation marks, which affected the parsing of some statements. Nevertheless, a rate of 97% of correctly analysed statements is considered very satisfying. Table 2 summarises the results obtained at this first step of our approach, which concerns the linguistic analysis.

Table 2. Number of statements and extracted terms for each case
| Source | BR statements | Terms | Terms without repetition | Incorrectly analysed terms |
| Case 1 | 48 | 141 | 83 | 2.9% |
| Case 2 | 56 | 216 | 91 | 5.5% |
| Case 3 | 58 | 310 | 66 | 0.6% |

To simulate the BR writing context, we divided each of the three cases into 3 parts; then we sent one part of each case to two different engineers and asked them to reformulate some BR statements by introducing some synonyms and abbreviations. Next, our approach tried to identify the injected synonyms and abbreviations.

4.2.1. Synonym identification

To evaluate synonym identification, we calculated the number of True Positives (TP), False Positives (FP) and False Negatives (FN), leading to the Precision (Eq. (1)), Recall (Eq. (2)) and F1-Score (Eq. (3)):

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

F1-Score = (2 x Precision x Recall) / (Precision + Recall)    (3)
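As a small worked example of Eqs. (1)-(3), the counts below are illustrative, not the paper's actual counts:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 10 correct pairs, 5 spurious, 4 missed:
# precision_recall_f1(10, 5, 4) -> (0.67, 0.71, 0.69), the Case 1 profile.
```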
Fig. 6 shows to what extent our heuristics were able to improve the precision of the synonyms identified by the adopted knowledge-based algorithm (the LIN measure). There was a precision improvement of between 29% and 42% when we selected synonym terms having at least one common neighbour term.

Figure. 6 The precision of synonym identification using the knowledge-based algorithm with and without heuristics

The Precision, Recall and F1-score for each case are summarized in Table 3 and Fig. 7.

Table 3. Our approach's results for synonym identification, for each case
| Source | Precision | Recall | F1-Score | Repeated terms |
| Case 1 | 67% | 71% | 69% | 41% |
| Case 2 | 73% | 79% | 76% | 58% |
| Case 3 | 87% | 87% | 87% | 78% |

Figure. 7 The relation between the precision of our approach and term frequency

According to Fig. 7, it is clearly recognized that the precision is proportional to the frequency rate of each term. This could be attributed to the fact that the more frequently terms appear in different statements, the more they reflect a comprehensive meaning, and the more likely their similar terms are to be identified. This conclusion is supported by the fact that most synonym pairs that were not identified appear in fewer than 3 different statements.

Taken together, these results suggest that terms having at least one common neighbour term but no common statement, and having a distance greater than 0.5 according to the LIN algorithm, are most likely to be synonyms. Nevertheless, a few terms that were synonyms according to their context, and that had common terms as neighbours, were missed because their distance was less than 0.5 (e.g. "confirmation" and "response" have 0.43). As well, some terms with a distance greater than 0.5 that did not have a common neighbour were also missed.

4.2.2. Abbreviation identification

Regarding abbreviation-expansion, all pairs that respected the standard format defined by the abbreviation definition cited above were correctly identified with a high precision. Standard abbreviations such as "second = 2nd" and "centimeter = cm" can easily be identified using dictionaries, so they were neglected, as they may affect our results. The Precision, Recall and F1-score for each case are summarized in Table 4.

Table 4. Our approach's results for abbreviation-expansion identification, for each case
| Source | Precision | Recall | F1 |
| Case 1 | 86% | 86% | 92% |
| Case 2 | 86% | 86% | 92% |
| Case 3 | 100% | 100% | 100% |

However, a few abbreviations were related to more than one expansion, caused especially by abbreviations composed of two letters, such as "BC", which is related to both "Business Class" and "Booking Confirmation". As well, some abbreviations that did not respect the definition were not identified, such as "Average Hours of work per Week = Ah/w". The latter should be written as "Ahw/w" to fit the definition, and then it could be identified by our approach.

5 Conclusion

Our contribution is to enhance existing approaches dealing with natural language business rules by automatically identifying synonym terms as well as abbreviation-expansion pairs. Our aim is to extract terms having the same meaning but different designations. For this reason, heuristics were used together with Natural Language Processing, without human intervention. The obtained results are stored in an SBVR-based XMI file to facilitate their integration into existing approaches.

Our experiments were carried out on three different groups of business rules from three different domains. Amongst a total of 160 natural language statements, just 3% were incorrectly tagged by the NLP tool. Regarding the accuracy, it ranged between 69% and 87% for synonym identification, and more than 90% for abbreviation/expansion relations. On the other hand, our heuristics proved their effectiveness with regard to precision, with an improvement ranging between +29% and +42%.

The efficiency of our approach relies principally on two factors: (i) term frequency, as the approach depends on a term's neighbours to reflect its real meaning; and (ii) the knowledge-based measure, which gives the degree of similarity between words using information derived from semantic networks.

The present method has only examined abbreviation-expansion pairs, as well as synonym terms composed of the same number of words and having a dictionary-based relation. Other types of synonyms going beyond the scope of this approach should also be identified, especially those composed of terms having different senses. Machine learning methods are undoubtedly the most adequate solution to identify complex terms that have no dictionary-based relation, but they require far more training data than is usually available in a BR context. Thus, we believe that our method could be the most efficient alternative, given the relatively small number of BR statements, and could be usefully employed to improve approaches dealing with NL BR.

Conflicts of Interest

The authors declare no conflict of interest.

Author Contributions

All authors analysed the data, discussed the results, provided critical feedback, helped shape the research and contributed to the final manuscript.

References
[1] "Business Rules Group", http://www.businessrulesgroup.org/ (accessed Aug. 02, 2020).
[2] T. Morgan, Business Rules and Information Systems: Aligning IT with Business Goals. Boston: Addison-Wesley, 2002.
[3] T. Kuhn, "A Survey and Classification of Controlled Natural Languages", Comput. Linguist., Vol. 40, No. 1, pp. 121-170, 2014, doi: 10.1162/COLI_a_00168.
[4] P. K. Chittimalli, C. Prakash, R. Naik, and A. Bhattacharyya, "An Approach to Mine SBVR Vocabularies and Rules from Business Documents", In: Proc. of the 13th Innovations in Software Engineering Conf., Jabalpur, India, pp. 1-11, 2020, doi: 10.1145/3385032.3385046.
[5] S. Roychoudhury, S. Sunkle, D. Kholkar, and V. Kulkarni, "From Natural Language to SBVR Model Authoring Using Structured English for Compliance Checking", In: Proc. of the 2017 IEEE 21st International Enterprise Distributed Object Computing Conf. (EDOC), pp. 73-78, 2017, doi: 10.1109/EDOC.2017.19.
[6] I. S. Bajwa and M. A. Shahzada, "Automated Generation of OCL Constraints: NL based Approach vs Pattern Based Approach", Mehran Univ. Res. J. Eng. Technol., Vol. 36, No. 2, 2017, doi: 10.22581/muet1982.1702.04.
[7] F. Pittke, H. Leopold, and J. Mendling, "Automatic Detection and Resolution of Lexical Ambiguity in Process Models", IEEE Trans. Softw. Eng., Vol. 41, No. 6, pp. 526-544, 2015, doi: 10.1109/TSE.2015.2396895.
[8] D. R. Kubal and A. V. Nimkar, "A survey on word embedding techniques and semantic similarity for paraphrase identification", Int. J. Comput. Syst. Eng., Vol. 5, No. 1, p. 36, 2019, doi: 10.1504/IJCSYSE.2019.098417.
[9] M. Farouk, "Measuring Sentences Similarity: A Survey", Indian J. Sci. Technol., Vol. 12, No. 25, pp. 1-11, 2019, doi: 10.17485/ijst/2019/v12i25/143977.
[10] W. H. Gomaa and A. A. Fahmy, "A Survey of Text Similarity Approaches", Int. J. Comput. Appl., Vol. 68, No. 13, pp. 13-18, 2013.
[11] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database. Cambridge, Mass.: MIT Press, 1998.
[12] P. D. Turney and P. Pantel, "From Frequency to Meaning: Vector Space Models of Semantics", J. Artif. Intell. Res., Vol. 37, pp. 141-188, 2010, doi: 10.1613/jair.2934.
[13] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space", arXiv preprint arXiv:1301.3781, 2013.
[14] M. Selway, G. Grossmann, W. Mayer, and M. Stumptner, "Formalising natural language specifications using a cognitive linguistic/configuration based approach", Inf. Syst., Vol. 54, pp. 191-208, 2015, doi: 10.1016/j.is.2015.04.003.
[15] V. Gruhn and R. Laue, "Detecting Common Errors in Event-Driven Process Chains by Label Analysis", Enterp. Model. Inf. Syst. Archit. (EMISAJ), Vol. 6, No. 1, 2011, doi: 10.18417/emisa.6.1.1.
[16] A. Awad, A. Polyvyanyy, and M. Weske, "Semantic Querying of Business Process Models", In: Proc. of the 2008 12th International IEEE Enterprise Distributed Object Computing Conf., pp. 85-94, 2008, doi: 10.1109/EDOC.2008.11.
[17] P. Danenas, T. Skersys, and R. Butleris, "Natural language processing-enhanced extraction of SBVR business vocabularies and business rules from UML use case diagrams", Data Knowl. Eng., p. 101822, 2020, doi: 10.1016/j.datak.2020.101822.
[18] T. Skersys, P. Danenas, and R. Butleris, "Extracting SBVR business vocabularies and business rules from UML use case diagrams", J. Syst. Softw., Vol. 141, pp. 111-130, 2018, doi: 10.1016/j.jss.2018.03.061.
[19] F. Friedrich, J. Mendling, and F. Puhlmann, "Process Model Generation from Natural Language Text", In: Advanced Information Systems Engineering, pp. 482-496, 2011, doi: 10.1007/978-3-642-21640-4_36.
[20] Y. Zhang, C. Wan, and B. Jin, "An empirical study on recovering requirement-to-code links", In: Proc. of the 2016 17th IEEE/ACIS International Conf. on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 121-126, 2016, doi: 10.1109/SNPD.2016.7515889.
[21] M. Ehrig, A. Koschmider, and A. Oberweis, "Measuring similarity between semantic business process models", In: Proc. of the Fourth Asia-Pacific Conf. on Conceptual Modelling - Volume 67, Ballarat, Australia, pp. 71-80, 2007.
[22] R. Navigli and S. P. Ponzetto, "BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network", Artif. Intell., Vol. 193, pp. 217-250, 2012, doi: 10.1016/j.artint.2012.07.001.
[23] T. Ishioka, "Extended K-means with an Efficient Estimation of the Number of Clusters", In: Intelligent Data Engineering and Automated Learning - IDEAL 2000, Vol. 1983, K. S. Leung, L.-W. Chan, and H. Meng, Eds. Berlin, Heidelberg: Springer, pp. 17-22, 2000.
[24] F. Dalpiaz, I. van der Schalk, S. Brinkkemper, F. B. Aydemir, and G. Lucassen, "Detecting terminological ambiguity in user stories: Tool and experimentation", Inf. Softw. Technol., Vol. 110, pp. 3-16, 2019, doi: 10.1016/j.infsof.2018.12.007.
[25] "Cortical.io - Next generation of AI business applications", https://www.cortical.io/ (accessed Aug. 03, 2020).
[26] "Semantics of Business Vocabulary and Rules", https://www.omg.org/spec/SBVR/About-SBVR/ (accessed Aug. 03, 2020).
[27] "Model Driven Architecture (MDA) | Object Management Group", https://www.omg.org/mda/ (accessed May 02, 2020).
[28] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit", In: Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60, 2014.
[29] R. Menaha and V. E. Jayanthi, "A Survey on Acronym-Expansion Mining Approaches from Text and Web", In: Smart Intelligent Computing and Applications, Singapore, pp. 121-133, 2019, doi: 10.1007/978-981-13-1921-1_12.
[30] J. Pustejovsky, J. Castano, B. Cochran, M. Kotecki, M. Morrell, and A. Rumshisky, "Extraction and disambiguation of acronym-meaning pairs in Medline", Medinfo, Vol. 10, pp. 371-375, 2001.
[31] A. S. Schwartz and M. A. Hearst, "A simple algorithm for identifying abbreviation definitions in biomedical text", In: Biocomputing 2003, World Scientific, pp. 451-462, 2002.
[32] K. Taghva and J. Gilbreth, "Recognizing acronyms and their definitions", Int. J. Doc. Anal. Recognit., Vol. 1, No. 4, pp. 191-198, 1999.
[33] S. Yeates, "Automatic Extraction of Acronyms from Text", In: New Zealand Computer Science Research Students' Conf., pp. 117-124, 1999.
[34] L. S. Larkey, P. Ogilvie, M. A. Price, and B. Tamilio, "Acrophile: an automated acronym extractor and server", In: Proc. of the Fifth ACM Conference on Digital Libraries, pp. 205-214, 2002.
[35] Y. Park and R. J. Byrd, "Hybrid text mining for finding abbreviations and their definitions", 2001.
[36] J. Liu, C. Liu, and Y. Huang, "Multi-granularity sequence labeling model for acronym expansion identification", Inf. Sci., Vol. 378, pp. 462-474, 2017, doi: 10.1016/j.ins.2016.06.045.
[37] D. Sánchez and D. Isern, "Automatic extraction of acronym definitions from the Web", Appl. Intell., Vol. 34, No. 2, pp. 311-327, 2011, doi: 10.1007/s10489-009-0197-4.
[38] D. Nadeau and P. D. Turney, "A Supervised Learning Approach to Acronym Identification", In: Advances in Artificial Intelligence, Berlin, Heidelberg, pp. 319-329, 2005, doi: 10.1007/11424918_34.
[39] E. Adar, "SaRAD: A simple and robust abbreviation dictionary", Bioinformatics, Vol. 20, No. 4, pp. 527-533, 2004.
[40] J. T. Chang, H. Schütze, and R. B. Altman, "Creating an online dictionary of abbreviations from MEDLINE", J. Am. Med. Inform. Assoc., Vol. 9, No. 6, pp. 612-620, 2002.
[41] J. Xu and Y. Huang, "Using SVM to extract acronyms from text", Soft Comput., Vol. 11, No. 4, pp. 369-373, 2007.
[42] Z. Wu and M. Palmer, "Verb semantics and lexical selection", In: Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics, pp. 133-138, 1994.
[43] P. Resnik, "Using information content to evaluate semantic similarity in a taxonomy", arXiv preprint cmp-lg/9511007, 1995.
[44] J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy", arXiv preprint cmp-lg/9709008, 1997.
[45] D. Lin, "Extracting collocations from text corpora", In: First Workshop on Computational Terminology, pp. 57-63, 1998.
[46] C. Leacock and M. Chodorow, "Combining local context and WordNet sense similarity for word sense identification", In: WordNet: An Electronic Lexical Database, MIT Press, 1998.
[47] S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet", In: Proc. of the International Conf. on Intelligent Text Processing and Computational Linguistics, pp. 136-145, 2002.
[48] G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms", In: WordNet: An Electronic Lexical Database, pp. 305-332, 1998.
[49] G. C. Witt, Writing Effective Business Rules: A Practical Method. 2012.
[50] I. Graham, Business Rules Management and Service Oriented Architecture: A Pattern Language. Chichester, England; Hoboken, NJ: John Wiley, 2006.