Design, Prototypical Implementation, and Evaluation of an Active Machine Learning Service in the Context of Legal Text Classification
Johannes Muhr, July 10th 2017, Munich
Chair of Software Engineering for Business Information Systems (sebis)
Faculty of Informatics
Technische Universität München
www.matthes.in.tum.de
Outline
1. Motivation
2. Research Questions
3. Research Approach & Objectives
4. Knowledge Base
5. LexML
6. Evaluation
Motivation I
• Processes of legal experts (scientists and lawyers) are
  • time-intensive
  • knowledge-intensive
  • data-intensive
• Legal documents
  • have grown strongly in number in recent years
  • change over time
  • are becoming more complex
→ As a consequence, searching for relevant information is
  • expensive [1]
  • difficult [2]
[1] Roitblat, H. L., et al. [2] Paul, G. L., and J. R. Baron.
Motivation II
"There seems to be little question that machine learning will be applied to the legal sector. It is likely to fundamentally change the legal landscape." [3]
• Legal data science incorporating machine learning is gaining more and more attention, because
  • processing time and memory space are cheap
  • algorithms can process textual data fast and accurately
  • suitable frameworks are available
[3] Liu, B.
Research Questions
1. What are common concepts, strategies, and technologies used for text classification?
2. How can (active) machine learning support the classification of legal documents and their content (norms)?
3. What form does the concept and design of an active machine learning service take?
4. How well does the active machine learning service perform in classifying legal documents and their content (norms)?
Research Approach & Objective
The research follows the design science framework of Hevner et al. (2004):
• Environment: people and organizations working with legal content, and the existing Lexia platform; they supply the business needs.
• IS research: develop and build the LexML service, then assess and refine it; the design is justified and evaluated through document classification and norm classification experiments.
• Knowledge base: foundations and methodologies on (legal) text classification, (active) machine learning, and machine learning frameworks provide the applicable knowledge.
• Results are applied in the appropriate environment and contribute additions to the knowledge base.
Knowledge Base
• Active learning: an iterative and interactive machine learning concept utilizing the strategy that the learning algorithm can select the data from which it learns.
• Learning pipeline: preprocessing → feature vector → learning algorithm (e.g. Naïve Bayes, Logistic Regression).
• Query strategy: an algorithm making use of the predictions of the learning algorithm (classifier) to find the most informative instances.
• Learning cycle: (1) train the classifier on the labeled instances, (2) predict labels for the unlabeled instances, (3) let the query strategy select the most informative unlabeled instances for manual labeling, (4) add the newly labeled instances to the labeled pool and repeat.
• Spark MLlib serves as the ML framework; its ML pipeline is suitable for text classification.
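A minimal sketch of this cycle, using scikit-learn as a compact stand-in for the Spark MLlib pipeline the thesis uses (all function and parameter names here are our own, and the human oracle is simulated by pre-existing labels):

```python
# Illustrative pool-based active learning loop with uncertainty sampling.
# Assumes the seed set covers at least two classes.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def active_learning_loop(texts, oracle_labels, seed_size=10, batch_size=10, rounds=5):
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)
    labeled = list(range(seed_size))               # (1) initially labeled pool
    unlabeled = list(range(seed_size, len(texts)))
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X[labeled], [oracle_labels[i] for i in labeled])
        probs = clf.predict_proba(X[unlabeled])    # (2) predict the unlabeled pool
        # (3) query strategy: pick the instances with the most uncertain prediction
        uncertainty = 1.0 - probs.max(axis=1)
        query = np.argsort(uncertainty)[-batch_size:]
        # (4) the oracle labels the queried instances; move them to the labeled pool
        newly_labeled = [unlabeled[i] for i in query]
        labeled.extend(newly_labeled)
        unlabeled = [i for i in unlabeled if i not in set(newly_labeled)]
    return clf
```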
LexML – Basics
• Independent service using Apache Spark MLlib as its machine learning framework
• Implements the logic for active learning to perform multiclass (legal) text classification
• Accessible via a REST API (see the hypothetical request below)
• Configurable via Lexia:
  • Label
  • Classifier: Naïve Bayes, Logistic Regression, Multilayer Perceptron
  • Data
  • Query strategy
• Export of evaluation results as an xlsx file
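The slides do not show the API itself; the following request is purely hypothetical (endpoint, field names, and values are invented for illustration) and only indicates what such a configuration call might look like:

```python
# Hypothetical LexML configuration request; the real endpoint and JSON
# schema are not shown in the slides, so every name here is invented.
import requests

config = {
    "classifier": "LogisticRegression",   # or "NaiveBayes", "MultilayerPerceptron"
    "queryStrategy": "marginSampling",    # one of the active learning strategies
    "labels": ["Recht", "Pflicht", "Einwendung"],
}
response = requests.post("http://localhost:8080/lexml/configuration", json=config)
print(response.status_code)
```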
Evaluation – Experimental Setup
• Two experiments:
  • Document classification
  • Norm classification
• Active learning is simulated using the available labeled dataset
• Evaluation of twelve different machine learning combinations:
  • Nine active learning strategies
  • Three passive learning strategies
• Evaluation process (sketched below):
  • After each learning round, the resulting model is applied to the test data
  • Export of common evaluation measures (F1, accuracy, precision & recall)
  • The average result of five rounds per combination is used as the reference value
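A minimal sketch of this averaging protocol, assuming a hypothetical helper `run_simulation()` that wraps the loop shown earlier and returns one accuracy curve per round:

```python
# Run each learner/strategy combination five times and average the
# accuracy point-wise per fraction of labeled instances.
# run_simulation() is assumed to return {labeled_fraction: accuracy}.
import numpy as np

def evaluate_combination(run_simulation, rounds=5):
    per_round = [run_simulation() for _ in range(rounds)]
    fractions = sorted(per_round[0].keys())
    return {f: float(np.mean([r[f] for r in per_round])) for f in fractions}
```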
Evaluation – Document Classification I
• Classifying 1000 random documents from the DATEV corpus into one of three classes, using a subset of the words of each document
• Random, but even split into training and test dataset (sketched below)

| Class | Sub-classes | Support |
| --- | --- | --- |
| Law ("Gesetz") | Gesetzestext (Gesamttext), Richtlinie (Gesamttext), EU-Richtlinie (Gesamttext) | 12.8% |
| Judgment ("Urteil") | Urteil, Beschluss, Gerichtsbescheid | 65.5% |
| Generic legal document ("Sonstige Dokumente") | Aufsatz, Anmerkung, Verfügung, Verfügung (koordinierter Ländererlass), Kurzbeitrag, Erlass, Erlass (koordinierter Ländererlass), Schreiben, Schreiben (koordinierter Ländererlass), Übersicht, Mitteilung | 21.6% |
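A minimal sketch of such a split, assuming "even" means equally sized training and test halves (function and parameter names are our own):

```python
# Randomly split documents into equally sized training and test halves.
import random

def even_split(documents, seed=42):
    docs = documents[:]
    random.Random(seed).shuffle(docs)
    half = len(docs) // 2
    return docs[:half], docs[half:]  # (training set, test set)
```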
Evaluation – Document Classification II
Figure: Average accuracy (%) of classifiers using active learning vs. passive (random) learning, plotted over the percentage of labeled instances (3-99%); compared are Naive Bayes, Logistic Regression, and Perceptron, each with and without random sampling.
• Naïve Bayes is the predominant classifier, followed by the Multilayer Perceptron
• Active learning is clearly superior to passive learning
Evaluation – Norm Classification I
• Classifying 504 sentences (norms) from the law of tenancy section of the German Civil Code (BGB) into one of eight possible classes
• Random, but even split into training and test dataset

| Class | Sub-classes | Support |
| --- | --- | --- |
| Recht | Gebot (positiv/negativ/Soll-Pflicht (H)), Verbot (/Duldung (U)) | 25.0% |
| Pflicht | Erlaubnis (/Erlaubnis beschränkt), Ermächtigung | 21.6% |
| Einwendung (Einw rh) | Unwirksamkeit (/Wirksamkeit beschränkt (sachlich)/(persönlich)), Unzulässigkeit (/Zulässigkeit beschränkt), Ausschluss (/Ausschluss beschränkt), Nichtigkeit | 18.3% |
| Rechtsfolge (RF vAw) | Rechtzuweisung (/Rechtsübergang), Pflichtzuweisung (/Pflichterweiterung), Rechtseigenschaft, Freistellung | 9.9% |
| Verfahren | Form, Frist (/Fälligkeit), Maßstab (inkl. Berücksichtigung), Entscheidungskompetenz | 9.7% |
| Verweisung | Direkt § (/direkt Vorschriftenkomplex), Analog § (/analog Vorschriftenkomplex), Negativ | 9.1% |
| Fortführungsnorm | Ausnahme, Erweiterung, Einschränkung, Gleichstellung | 3.8% |
| Definition | Direkt, Indirekt, Negativ | 2.6% |
Evaluation – Norm Classification II
Figure: Average accuracy (%) of classifiers using active learning vs. passive (random) learning, plotted over the percentage of labeled instances (5-100%); compared are Naive Bayes, Logistic Regression, and Perceptron, each with and without random sampling.
• Active learning is clearly superior to passive learning
• Logistic Regression is the best classifier
Summary
Conclusions
• Active learning is a promising approach for legal text classification
• It achieves better results using fewer labeled instances
• The choice of parameters is a very complex issue
Limitations
• Evaluation with "only" one default setting
• Reusability of the trained model
• Black-box classifier
Future Work
• Conduct additional evaluation rounds with varying learning settings
• Implement additional features:
  • Query strategies
  • Learning algorithms
• Combine (active) machine learning with rule-based learning
Johannes Muhr
Advisor: Bernhard Waltl
Technische Universität München
Faculty of Informatics
Chair of Software Engineering for Business Information Systems
Boltzmannstraße 3
85748 Garching bei München
Tel +49.89.289.17132
Fax +49.89.289.17136
matthes@in.tum.de
www.matthes.in.tum.de
Bibliography I
§ Busse, D. (2000). Textsorten des Bereichs Rechtswesen und Justiz. In G. Antos, K. Brinker, W. Heineman, & S. F. Sager (Eds.), Text- und Gesprächslinguistik. Ein internationales Handbuch zeitgenössischer Forschung (Handbücher zur Sprach- und Kommunikationswissenschaft) (pp. 658-675). Berlin/New York: de Gruyter.
§ Cardellino, C., Villata, S., Alemany, L. A., & Cabrio, E. (2015). Information Extraction with Active Learning: A Case Study in Legal Text. Paper presented at the International Conference on Intelligent Text Processing and Computational Linguistics.
§ de Maat, E., Krabben, K., & Winkels, R. (2010). Machine Learning versus Knowledge Based Classification of Legal Texts. In JURIX.
§ Francesconi, E., & Passerini, A. (2007). Automatic classification of provisions in legislative texts. Artificial Intelligence and Law, 15(1), 1-17.
§ Gruner, R. H. (2008). Anatomy of a Lawsuit - A Client's Analysis and Discussion of a Multi-Million Dollar Federal Lawsuit. Retrieved from http://www.gruner.com/writings/AnatomyLawsuit.pdf
§ Landset, S., Khoshgoftaar, T. M., Richter, A. N., & Hasanin, T. (2015). A survey of open source tools for machine learning with big data in the Hadoop ecosystem. Journal of Big Data, 2(1), 24. doi:10.1186/s40537-015-0032-1
§ Liu, B. (2015). Machine Learning and the Future of Law. L.T. Blog.
§ Novak, B., Mladenič, D., & Grobelnik, M. (2006). Text Classification with Active Learning. In M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nürnberger, & W. Gaul (Eds.), From Data and Information Analysis to Knowledge Engineering: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Magdeburg, March 9-11, 2005 (pp. 398-405). Berlin, Heidelberg: Springer.
§ Paul, G. L., & Baron, J. R. (2006). Information inflation: Can the legal system adapt? Richmond Journal of Law & Technology, 13, 1.
§ Ratner, A. Leveraging Document Structure for Better Classification of Complex Legal Documents.
Bibliography II
§ Roitblat, H. L., Kershaw, A., & Oot, P. (2010). Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1), 70-80.
§ Šavelka, J., Trivedi, G., & Ashley, K. D. (2015). Applying an Interactive Machine Learning Approach to Statutory Analysis.
§ Segal, R., Markowitz, T., & Arnold, W. (2006). Fast Uncertainty Sampling for Labeling Large E-mail Corpora. Paper presented at the CEAS.
§ Settles, B. (2010). Active learning literature survey. University of Wisconsin, Madison, 52(55-66), 11.
§ Sunkle, S., Kholkar, D., & Kulkarni, V. (2016). Informed Active Learning to Aid Domain Experts in Modeling Compliance. Paper presented at the 2016 IEEE 20th International Enterprise Distributed Object Computing Conference (EDOC), 5-9 Sept. 2016.
§ Tong, S. (2001). Active learning: theory and applications. Citeseer.
§ Tong, S., & Koller, D. (2002). Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, 2(1), 45-66.
§ Walker, V. R., Han, J. H., Ni, X., & Yoseda, K. (2017). Semantic Types for Computational Legal Reasoning: Propositional Connectives and Sentence Roles in the Veterans' Claims Dataset. Paper presented at the ICAIL.
§ Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Quarterly, 28(1), 75-105.
Backup – Related Work
• Ratner performed an experiment in which he classified legal contract documents into categories such as "Arbitration Agreements" or "Manufacturing Contract"
• Šavelka, Trivedi and Ashley (2015) examined the relevance and applicability of individual statutory provisions
• Roitblat, Kershaw and Oot (2010) compared the classification accuracy of computers to traditional human manual review
• De Maat, Krabben and Winkels (2010) conducted a norm classification experiment, classifying sentences of Dutch law
• Walker, Han, Ni and Yoseda (2017) examined semantic types for argument mining in legal texts to provide a common basis for machine learning
Backup – Document Classification I
Figure: Average F1 (%) of Naive Bayes per class (Law, Judgment, Generic), plotted over the percentage of labeled instances (3-99%).
• The diversity of the Generic Legal Documents class is too large
• The uneven distribution of instances is recognizable at the beginning
Backup – Document Classification II
Figure: Comparison of query strategies for Naive Bayes; accuracy (%) over the percentage of labeled instances (3-99%) for vote entropy, margin sampling, QBC vote entropy, QBC soft vote entropy, and random sampling.
Backup – Document Classification III
Figure: Average precision (%) of Naive Bayes per class (Law, Judgment, Generic), plotted over the percentage of labeled instances (3-99%).

Backup – Document Classification IV
Figure: Average recall (%) of Naive Bayes per class (Law, Judgment, Generic), plotted over the percentage of labeled instances (3-99%).
Backup – Norm Classification I
Figure: Average F1 (%) of Logistic Regression per class (Recht, Pflicht, Einw rh, RF vAw, Verfahren, Verweisung, Fortführungsnorm, Definition), plotted over the percentage of labeled instances (5-100%).
• The recognition rate is highly diversified across classes
• Overfitting is visible
Backup – Norm Classification II
Figure: Comparison of query strategies for Logistic Regression; accuracy (%) over the percentage of labeled instances (5-100%) for vote entropy, margin sampling, QBC vote entropy, QBC soft vote entropy, and random sampling.

Backup – Norm Classification III
Figure: Average precision (%) of Logistic Regression per class, plotted over the percentage of labeled instances (5-100%).

Backup – Norm Classification IV
Figure: Average recall (%) of Logistic Regression per class, plotted over the percentage of labeled instances (5-100%).

Backup – Norm Classification V
Figure: Accuracy (%) of Logistic Regression applying the vote entropy strategy; rounds 1-5 and their average, plotted over the percentage of labeled instances (5-100%).
Backup – Evaluation Measures
• Confusion Matrix

| | True class A | True class B |
| --- | --- | --- |
| Predicted A | TP | FP |
| Predicted B | FN | TN |

• $\text{Precision} = \frac{TP}{TP + FP}$
• $\text{Recall} = \frac{TP}{TP + FN}$
• $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
• $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
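A minimal sketch computing these measures directly from raw confusion-matrix counts (function names are our own):

```python
# Evaluation measures from confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Example: 80 true positives, 10 false positives, 20 false negatives
print(f1(80, 10, 20))  # ~0.842
```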
Backup – Data Model

Backup – Active Learning Engine

Backup – Workflow Norm Classification
Backup – Query Strategies I
Uncertainty Sampling
§ Margin Sampling
§ $x^*_M = \underset{x}{\operatorname{argmin}} \left( P_\theta(\hat{y}_1 \mid x) - P_\theta(\hat{y}_2 \mid x) \right)$, where $\hat{y}_1$ and $\hat{y}_2$ are the first and second most probable labels under the model $\theta$
§ Ambiguous instances with small margins should help the model to discriminate between them.
§ Vote Entropy
§ $x^*_H = \underset{x}{\operatorname{argmax}} \left( -\sum_i P_\theta(y_i \mid x) \log P_\theta(y_i \mid x) \right)$
§ Approach based on the Shannon entropy, measuring the variable's average information content.
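A minimal sketch of both scores, assuming a matrix of class probabilities per unlabeled instance (function names are our own):

```python
# Margin and entropy query scores over a (n_instances, n_classes) matrix.
import numpy as np

def margin_query(probs):
    # Smallest gap between the two most probable labels = most informative.
    part = np.sort(probs, axis=1)
    margins = part[:, -1] - part[:, -2]
    return int(np.argmin(margins))

def entropy_query(probs):
    # Highest Shannon entropy of the predicted distribution = most informative.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return int(np.argmax(entropy))
```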
Backup – Query Strategies II
Query By Committee
§ Vote Entropy
§ $x^*_{VE} = \underset{x}{\operatorname{argmax}} \left( -\sum_y \frac{V(y, x)}{|C|} \log \frac{V(y, x)}{|C|} \right)$
§ y ranges over all possible labelings, V(y, x) is the number of committee members that "voted" for label y for instance x, and |C| is the committee size.
§ Soft Vote Entropy
§ $x^*_{SVE} = \underset{x}{\operatorname{argmax}} \left( -\sum_y P_C(y \mid x) \log P_C(y \mid x) \right)$
§ Taking the confidence of the decision into account: $P_C(y \mid x) = \frac{1}{|C|} \sum_{\theta \in C} P_\theta(y \mid x)$ is the average ("consensus") probability that y is the correct label according to the committee.
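A minimal sketch, assuming each committee member supplies a probability matrix of shape (n_instances, n_classes); function names are our own:

```python
# Query-by-committee scores from a list of per-member probability matrices.
import numpy as np

def qbc_vote_entropy(member_probs):
    votes = np.stack([p.argmax(axis=1) for p in member_probs])  # (|C|, n)
    n_classes = member_probs[0].shape[1]
    c = len(member_probs)
    # Fraction of committee members voting for each label, per instance.
    counts = np.stack([(votes == y).sum(axis=0) for y in range(n_classes)], axis=1) / c
    entropy = -np.sum(counts * np.log(counts + 1e-12), axis=1)
    return int(np.argmax(entropy))

def qbc_soft_vote_entropy(member_probs):
    consensus = np.mean(member_probs, axis=0)  # average ("consensus") distribution
    entropy = -np.sum(consensus * np.log(consensus + 1e-12), axis=1)
    return int(np.argmax(entropy))
```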
Backup – Classifier I
Naïve Bayes
• Feature d, class c
• P(d) is constant for a given document (the same for every class), so it can be ignored when comparing classes
• Assumes strong independence between the individual features
• $P(c \mid d) = \frac{P(d \mid c) \, P(c)}{P(d)}$
Logistic Regression
• Uses the characteristic that the log-odds of observing the true label (logit transformation) can be modelled as a linear combination of the attributes
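Spelled out in our own notation (not taken from the slides), that statement for a binary label $y$ with feature vector $x$ and weights $w$ reads:

$$
\operatorname{logit}\big(P(y{=}1 \mid x)\big)
  = \ln \frac{P(y{=}1 \mid x)}{1 - P(y{=}1 \mid x)}
  = w_0 + w_1 x_1 + \dots + w_n x_n
\;\Longleftrightarrow\;
P(y{=}1 \mid x) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + \dots + w_n x_n)}}
$$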
Backup – Classifier II
Multilayer Perceptron
• Consists of several layers with units (nodes) in each layer; in the depicted network, an input layer (x1, x2, x3) feeds a hidden layer (h1, h2, h3), which feeds the output layer (y)
• A weight is assigned to each edge
• Backpropagation is used as the learning algorithm:
  • Forward phase: the weights are fixed and training instances are propagated through the network
  • Backward phase: the desired output is compared to the actual output at each output neuron to calculate the error
  • For each neuron in the network, all weights are adjusted so that the actual output better approximates the desired output
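In our notation (not from the slides), the per-edge adjustment in the backward phase is the standard gradient descent step, with learning rate $\eta$ and error $E$:

$$
w_{ij} \;\leftarrow\; w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}
$$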
Backup – Supervised Machine Learning

Backup – Text Classification Process
Backup – Front-end (Configuration)
Import of norms from selected legal documents
Configuration of active learning settings

Backup – Front-end (Training)
Labeling of legal documents
Backup – Comparison of Machine Learning Frameworks I

| | MLlib | Mahout | Weka | Scikit-learn | Mallet |
| --- | --- | --- | --- | --- | --- |
| Current version (2nd Feb. 2016) | 2.1 | 0.12.2 | 3.8 | 0.18 | 2.0.8 |
| License | Apache Software Foundation (ASF) | Apache Software Foundation (ASF) | General Public License (GPL) | Berkeley Software Distribution (BSD) | Common Public License (CPL) |
| Open source | yes | yes | yes | yes | yes |
| Popular users | OpenTable, Verizon | ? | Pentaho | Evernote, Spotify | ? |
| Processing platform | Spark | MapReduce (deprecated), Spark | wrapper for Spark available | none | none |
| Interface language | mainly Scala; Java, Python, R | Scala, Java (for older versions) | Java, R | Python | Java |
| Suitable for large datasets | yes | yes | partially | yes | partially |
| Community support | good | moderate | good | good | moderate |
| Documentation | very good | moderate | good | very good | good |
| Support for NLP/textual data/AL | moderate | good (via Lucene) | good | good | very good |
| Configuration | simple | difficult | simple | simple | very simple |
Backup – Comparison of Machine Learning Frameworks II

| | MLlib | Mahout | Weka | Scikit-learn | Mallet |
| --- | --- | --- | --- | --- | --- |
| Classification algorithms | | | | | |
| NB | ✓ (a) | ✓ (Spark optimized) | ✓ (batch, incremental) | ✓ | ✓ |
| SVM | ✓ (b) | only via MLlib | ✓ (batch) | ✓ | / |
| MLP | ✓ (a) | only via MLlib | ✓ (batch) | ✓ | / |
| Multiclass classification | one vs. all | only via MLlib | one vs. all, one vs. one | one vs. all, one vs. one | / |
| Multilabel classification | one vs. all | / | by extensions (e.g. Mulan, Meka) | one vs. all | / |
| Pipeline | ✓ (a) | / | / | ✓ | ✓ |
| Total algorithm coverage | very high | high | very high | very high | moderate |
| Preprocessing | | | | | |
| Stemming | / | ✓ | ✓ (by extensions, Snowball) | ✓ | ✓ |
| Stop words | ✓ | / | ✓ | ✓ | ✓ |
| Coverage of feature selection methods | very high | low | very high | very high | moderate |
| Text representation | | | | | |
| Word vector | ✓ | ✓ | ✓ | ✓ | ✓ |
| TF-IDF | ✓ | – | ✓ | ✓ | – |
| Evaluation | | | | | |
| Confusion matrix | ✓ | ✓ | ✓ | ✓ | ✓ |
| ROC/AUC curve | ✓ | / | ✓ | ✓ | ✓ |
Backup – Literature Study
• Use of online platforms such as
  • Google Scholar,
  • Web of Science,
  • Institute of Electrical and Electronics Engineers (IEEE),
  • Online Public Access Catalogue (OPAC), and Google Books
• Backward search
Backup – Hourly Billings by Task and Individual
• Manual document classification is very expensive and time-consuming
• $13.5 million was spent on classifying 1.6 million items over four months (≈ $8.50 per document) [1]
• A lot of time is wasted on (document) discovery [2]
• Hours: 1,446 / 5,486 = 26.4%
• Dollars: $483,986 / $1,697,322 = 28.5%
[1] Roitblat, H. L., et al. (2010). Document categorization in legal electronic discovery: computer classification vs. manual review.
[2] Gruner (2008). A Client's Analysis and Discussion of a Multi-Million Dollar Federal Lawsuit.