Applying Complementary Credit Scores to Calculate Aggregate Ranking
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 DOI: 10.17323/j.jcfr.2073-0438.15.3.2021.5-13. JEL classification: C38, C44, G24, G34 Applying Complementary Credit Scores to Calculate Aggregate Ranking Zinaida Seleznyova Lecturer, Research Assistant, HSE School of Finance, HSE Financial Engineering & Risk Management Lab, National Research University Higher School of Economics, Moscow, Russia, zseleznyova@hse.ru, ORCID The journal is an open access journal which means that everybody can read, download, copy, distribute, print, search, or link to the full texts of these articles in accordance with CC Licence type: Attribution 4.0 International (CC BY 4.0 http://creativecommons.org/licenses/by/4.0/). 5 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 Abstract Researchers have been improving credit scoring models for decades, as an increase in the predictive ability of scoring even by a small amount can allow financial institutions to avoid significant losses. Many researchers believe that ensembles of classifiers or aggregated scorings are the most effective. However, ensembles outperform base classifiers by thousandths of a percent on unbalanced samples. This article proposes an aggregated scoring model. In contrast to previous models, its base classifiers are focused on identifying different types of borrowers. We illustrate the effectiveness of such scoring aggregation on real unbalanced data. As the effectiveness indicator we use the performance measure of the area under the ROC curve. The DeLong, DeLong and Clarke-Pearson test is used to measure the statistical difference between two or more areas. In addition, we apply a logistic model of defaults (logistic regression) to the data of company financial statements. This model is usually used to identify default borrowers. To obtain a scoring aimed at non-default borrowers, we employ a modified Kemeny median, which was initially developed to rank companies with credit ratings. Both scores are aggregated by logistic regression. Our data Russian banks that existed or defaulted between July 1, 2010, and July 1, 2015. This sample of banks is highly unbalanced, with a concentration of defaults of about 5%. The aggregation was carried out for banks with several ratings. We show that aggregated classifiers based on different types of information significantly improve the discriminatory power of scoring even on an unbalanced sample. Moreover, the absolute value of this improvement surpasses all the values previously obtained from unbalanced samples. The aggregated scoring and the approach to its construction can be applied by financial institutions to credit risk assessment and as an auxiliary tool in the decision-making process thanks to the relatively high interpretability of the scores. Keywords: forward stepwise selection, logistic regression, Kemeny median, credit scoring, statistical learning For citation: Seleznyova, Z. (2021) “Applying Complementary Credit Scores to Calculate Aggregate Ranking”, Journal of Corporate Finance Research | ISSN: 2073-0438, 15(3), pp. 5-13. doi: 10.17323/j.jcfr.2073-0438.15.3.2021.5-13. 6 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 cial statements. So, this ranking is potentially aimed at Introduction non-default companies. Scoring models have been developing for decades. Re- We propose to use logistic regression as the strong learner. searchers have proposed and compared different ap- It should be noted that logistic regressions, including ridge proaches to data preparation for model construction and and lasso, were used as ensemble models in [1; 9] and proved approaches to selecting factors which influence credit to be superior to other methods considered in these papers. quality and their generation. They have also studied the best approaches to assessing credit score capability/accu- Our study is based on a sample of banks during the pe- racy and the credit score methods themselves. This was riod between July 1, 2010, and July 1, 2015. This sample done to improve scoring accuracy, insofar as a gain or is characterized by a low default concentration of 5.76%. loss of a percentage point in accuracy can lead to mul- Financial performance indicators, identifiers of external or timillion profits or losses for banks and other financial government support, and ratings of credit rating agencies institutions [1]. were used to create rankings. It was shown that the aggre- gation of two base classifiers focused on the identification Over the past ten years, scholars have believed that the of different types of borrowers results in an improvement best practice is to use machine learning models [2] and of the predictive power of aggregated credit scoring in so-called “ensembles” [3] to construct credit scores. The comparison to base classifiers. basic idea of an ensemble lies in the aggregation of mod- elled base classifiers (scores) with the help of a model/al- The interpretability of weak and strong learners makes it gorithm. There exist different classifications of ensembles possible to use aggregate rankings not only as an addition- [3–5]; however, their division into bagging, boosting, and al parameter for decision making in financial institutions stacking ensembles is the most common. Bagging is the but also to evaluate default probability in risk management combination of several independent scorings (base classi- [10; 11]. The proposed weak learners constitute the scien- fiers, weak learners) constructed in a parallel way on the tific novelty of this paper: they were trained using poten- basis of independent random samples. Random forests are tially complementary information (ratings and financial a well-known example of bagging. Boosting is the aggre- statements). We know of only one similar study [12] that gation of several successively constructed base scorings. trained weak learners using market indicators and finan- Stacking is the combination of different base classifiers (for cial statements. However, the ensemble did not outperform example, logistic regression and a decision tree) that are the base classifier in discriminatory power [12]. trained simultaneously. They are combined in the ensem- ble model (strong learner), which includes different voting rules, statistical models and machine learning methods. Literature Review The ensemble paradigm makes ensembles relevant: several The number of papers devoted to credit scoring methods aggregated classifiers usually show greater discriminatory has grown exponentially over the past 30 years [3]. In the power/accuracy than a single classifier [5]. Nevertheless, last five years, researchers have continued their attempts some researchers have shown that ensemble models some- to improve credit scoring for legal entities [13–15]and times fail to surpass machine learning methods in regard even more so for financial institutions involved in lending to certain criteria [4; 6]. Also, their practical applicability is to SMEs. The importance of credit scoring has increased usually limited: in most cases, they are “black boxes” which recently because of the financial crisis and increased cap- are difficult to interpret because machine learning meth- ital requirements for banks. There are, however, only few ods and other ensembles are often used as the base clas- studies that develop credit coring models for SME lending. sifiers. Therefore, some researchers [7] have attempted to The objective of this study is to introduce a novel, more simplify the interpretability of ensemble models, including accurate credit risk estimation approach for SMEs business machine learning methods. lending. Based on traditional statistical methods and re- cent artificial intelligence (AI). However, the majority of In this paper, we focus on using complementary weak papers make use of databases of natural persons [16]. The learners to calculate aggregate rankings. We chose the reason is that such databases are in open access and avail- logistic model of defaults and a modified Kemeny median able for parsing. These samples have been used to compare [8] as two weak classifiers of this type due to their relatively well-known approaches to credit scoring calculation [17] high interpretability. We consider them to be complemen- the volume of databases that financial companies manage is tary for our purposes for the following reasons: so great that it has become necessary to address this prob- The logistic model (regression) is usually trained for defin- lem, and the solution to this can be found in Big Data tech- ing default borrowers using corporate financial statements. niques applied to massive financial datasets for segmenting In other words, the first weak learner is focused on default risk groups. In this paper, the presence of large datasets is borrowers. approached through the development of some Monte Car- The modified Kemeny median has been proposed as a tool lo experiments using known techniques and algorithms. In for credit rating aggregation. Usually, companies which addition, a linear mixed model (LMM) and propose new have a better-than-average creditworthiness want to have ones [18]. Different ensembles [18; 19] and logistic regres- credit ratings in particular because they are ready to dis- sions [20] have been identified as the best scoring methods. close to a rating agency more information than just finan- In addition, papers dedicated to the comparison of well- 7 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 known methods often consider neural networks [21] and In order to calculate base classifiers, a preliminary prepara- decision trees [22] to be the best. tion of data is carried out. One of the stages of preliminary Such a diversity of best methods is partially explained by preparation is parameter selection by means of forward the wide range of simultaneously applied classification feature selection. Nevertheless, it is necessary to describe quality criteria. Many authors [4; 9] agree that it is better the data sample before we explain the methodology in de- to use several model performance measures at once. Nev- tail. This is due to the fact that the choice of methods de- ertheless, other researchers [23; 24] continue to apply only pends on the data. conventional methods calculated on the basis of an error matrix. Data In this paper, we propose looking at credit scoring aggrega- The main data pool comprises publicly available infor- tion from a slightly different perspective. Usually, only one mation on 958 banks for the period between July 1, 2010, type of data is used to create base scorings: financial state- and July 1, 2015, which represents approximately 80% of ments or characteristics of natural persons [25]normally all banks operating in the Russian Federation during this taking between 50% and 80% of the total project time. It period. 134 of these banks had two or more ratings calcu- is in this stage that data in a relational database are trans- lated by seven credit rating agencies: AK&M, Expert RA formed for applying a data mining technique. This stage (EXP), National Rating Agency (NRA), RusRating (RUS), is a complex task that demands from database designers Fitch Ratings, Moody’s Analytics, and Standard & Poor’s. a strong interaction with experts having a broad knowl- This data pool was formally divided into three parts: data edge about the application domain. Frameworks aiming on banks up to July 2014, data on banks after July 2014, and to systemize this stage have significant limitations when data on banks with two and more credit ratings. applied to Credit Behavioral Scoring solutions. This paper proposes a framework based on the Model Driven Devel- Data on banks up to and including July 2014 comprises opment approach to systemize the mentioned stage. This 70% of the observations of the main pool or 13,570 ob- work has three main contributions: 1 or company market servations. The default concentration is 4.6%. In terms of indicators [13]and even more so for financial institutions default/non-default observations, this sample is highly un- involved in lending to SMEs. The importance of cred- balanced. It comprises indicators from bank report forms it scoring has increased recently because of the financial 101 and 102 and statutory requirements information (form crisis and increased capital requirements for banks. There 135) posted on the website of the Bank of Russia1 and in- are, however, only few studies that develop credit coring formation on support from the Russian government or models for SME lending. The objective of this study is to foreign banks. This sample was used to train the logistic introduce a novel, more accurate credit risk estimation model of defaults. approach for SMEs business lending. Based on tradition- Data on banks after July 2014 consists of 4,261 observa- al statistical methods and recent artificial intelligence (AI. tions with a default concentration of 9.25%. The list of in- Indicator categories from financial statements complement dicators was the same as in the sample described above. each other, and machine learning methods can be applied This sample was used to test the logistic model of defaults. to assess the nonlinear relations between them. However, Data on banks with two and more credit ratings is part the creditworthiness of a company may be characterized of the two samples described above. This sample consists by factors that are recorded only partially or not at all in of observations on 134 banks. The sample size is 1,700 statements. These ratings may potentially complement the observations, 17 of which are defaults. This sample is also indicators of corporate financial statements: companies unbalanced and has a default concentration of 2.72%. In disclose more information to credit rating agencies (CRAs) addition to the indicators described above, it includes CRA than one can find in the public domain [26]. In addition, ratings. For the purposes of creating scoring ratings, cat- companies with a better creditworthiness, all other things egories were assigned numerical values, where 0 was at- being equal, tend to resort to CRAs: such companies are tributed to the higher rating category of each credit rating developing and need external ratings to expand into new agency (CRA). Then, the numerical value of each lower markets, for example. Thus, one may conjecture that the category was increased by 1. As the last two columns of Ta- collective opinion of credit rating agencies may comple- ble 1 show, the number of assigned rating categories varied ment information from financial statements. greatly from agency to agency. In this paper, we will use classical logistic regression as the base classifier and as the aggregated model. This practice was applied in the sample is class imbalanced [9; 27]. Class imbalance may affect the accuracy of default predictions, as classifiers tend to be biased towards the majority class (good borrowers, which showed the advantage of this ap- proach over base classifiers. 1 URL: https://www.cbr.ru/credit/ 8 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 Table 1. Descriptive statistics of 7 CRA ratings Variable Number of Average Mode Standard Min. Max. observations deviation AK&M 209 1.92345 2 0.67502 1 4 EXP 652 1.63497 2 0.87912 0 6 FCH 609 4.92939 0 3.78709 0 14 MDS 1108 6.22563 9 3.12457 0 15 NRA 627 3.70973 3 1.73995 0 13 RUS 246 4.23577 6 2.57644 0 10 SNP 511 5.04305 6 2.8694 0 21 Source: author’s calculations. scorings [23; 24], insofar as it does not generate a bias of If we consider previous papers that, in one way or anoth- estimators due to an inappropriately chosen way of impu- er, studied CRA ratings using Russian data (for example, tation of missing values [30]. The forward stepwise selec- [28]), we see that the general distribution of agency ratings tion method was used for features selection for the logis- has changed little. The most frequent ratings are low rat- tic model. This approach adds a relevant variable to the ings in the investment grade or best ratings in the specu- defined significant variables. If this variable is significant lative grade. The data on ratings is taken from the RUData and significantly improves the model, it is also included. In system2. Consensus and aggregate rankings are calculated spite of its simplicity, this approach is still widely used to using this sample. select parameters [16]. Multicollinearity was controlled by means of a correlation matrix. It was controlled both at the The low default concentration and small size of the sample intermediate stage of model building and at the final stage. of banks with several ratings is insufficient for dividing it into training and test samples to create a logistic model. Logistic regression. In credit scoring problems, the logis- This is why samples of banks with one or no ratings are tic model may be formulated as follows: bank i has rating used in this study. yi , which is equal to 0 if there is no bank default and 1 if there is bank default. This rating depends on the latent var- * iable yi , which represents the bank credit quality: Methodology This chapter consists of several parts. “Logistic Regression” 1 if 0 ( default ) yi = , describes the preliminary preparation of data for making 0 otherwise ( no default ) a scoring using the logistic model of defaults, the logistic (1) model itself, and ways of validating it. “Modified Kemeny 1 if y*i ⩾ 0 (default). Median” has a similar structure. “Aggregation” describes 0 otherwise (no default) the mechanism for aggregating the two rankings obtained from the logistic model and the modified Kemeny median. * In turn, yi linearly depends on X– factors that may pre- “Model Power Indicator” describes the tool applied to ver- dict the bank creditworthiness. They may be continuous ify the efficiency (power) of obtained rankings. and categorical quantities that represent relevant finan- cial, macroeconomic and other indicators. In this case, the Logistic Regression probability of a bank being default or non-default is as fol- Linear prediction of the logistic model of defaults or the lows, respectively: “continuous” rating of the defaults prediction model is used as the first baseline ranking (classifier) [29]. Due to ( ) ( ) P ( yi = 1) = P y*i ≥ 0 = P Xi' β + ε ≥ 0 ; its simplicity, transparency, interpretability and a relatively (2) high discriminatory power, this scoring model continues P ( yi = 0 ) = 1− F(X β ) ' i to be the industry standard [3; 28]. , ' Data preparation. In this paper, observations with miss- where Xi is the transposed matrix of factors describing ing data were not used for building the logistic regression. the bank’s creditworthiness, ε is an unobservable random Such an approach is frequently used for calculating credit component with logistic distribution, and F is a logistic 2 URL: https://rudata.info/ 9 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 distribution function. The linear predictions are calculated Validation. It is impossible to apply common validation as follows: measures such as cross-validation types to this method. n The reason is that the modified Kemeny median is a result Ricontin = ∑X β . j =1 ' i of a non-parametric approach that cannot be used for an- other sample directly without mapping. contin The Kemeny median was originally a voting method that In this paper, R is used as one of the base scorings was subsequently used as an aggregator of credit ratings focused on default borrowers. for banks. The collective opinion of credit rating agencies Validation. The complete sample of banks is used to build may complement information from financial statements: the logistic model, regardless of whether they have a rating companies disclose to credit rating agencies information or not. This sample is divided into training and test sub- which may be absent from publicly available data. Thus, it samples. This is done on an out-of-time basis and it’s no is expected that the combination of the logistic model of coincidence. Such a validation method is used in credit defaults built on publicly available data and the ranking ob- scoring studies [31; 32]. tained from CRA ratings will surpass these base classifiers. Modified Kemeny Median Aggregation Data preparation. Unlike the previous method, observa- Logistic regression is applied as a strong classifier in this tions with missing data for certain variables were used for paper. The binary default/non-default variable yi , ar- building a modified Kemeny median (consensus ranking). To create a consensus ranking, we used the ratings of seven ranged in the same way as in function (1), still serves as the rating agencies operating in Russia from July 2010 to July interpretable factor. However, to create an aggregated scor- 2015. ing, yi is predicted using the following two factors: Modified Kemeny median. Another base classifier is rep- resented by the Kemeny median [8], whose application ( ) P ( yi = 1) = F γ 0 + γ 1Ricontin + γ 2 Ricons + ε ≥ 0 , (4) results in the so-called “consensus ranking”. This method where γ j is a coefficient obtained from assessing the logis- is based on the interpretation of credit rating as a relative tic regression with the help of the maximum likelihood ranking of objects in accordance with a CRA’s opinion on method and F is the logistic distribution function. The the credit quality of each object. On the basis of the rat- aggregated scoring itself is calculated as follows: ings specific nature as expert information, we modified the concept of Kemeni distance between rankings. This made Riagregated = γ 0 + γ 1Ricontin + γ 2 Ricons . (5) it possible to find a unique solution that least contradicts the opinions of rating agencies with an acceptable accuracy Model Power Indicator within an acceptable time: We use the indicator of the area under the ROC curve (here- m after, AUCROC) as a measure of the discriminatory power R cons argmin ∑ϕ k =1 k d ( R, Rk ) + λδ 2 ( R, Rk ) , (3) of all scorings. This indicator is appropriate for unbalanced samples – in particular, because it takes different errors into account [1, p. 2]. In addition, this indicator does not under- cons rate or overrate its values due to erroneous classification or where R is the resulting (aggregated) rating, m is default distribution [7, p. 38]. The resulting indicator values the number of aggregated ratings, Rk is the kth rating, should be interpreted as follows: the closer the AUCROC d ( R′, R '') is the rank measure of distance between ratings value to 1, the greater the discriminatory power of the credit R ' and R '' (number of contradictory rankings for all pairs indicator. This indicator is described in more detail in [33]. of companies), λ is the regularization parameter (relative The statistical significance of differences between the AU- significance of the secondary criterion), δ ( R ', R '') is the 2 CROC of base classifiers and the aggregated model is de- additional (secondary) criterion (shows the extent of con- fined by means of the DeLong, DeLong and Clarke-Pear- tradiction significance), son test [34] at a 10% significance level. m and ϕk > 0, ∑ϕ k =1 k 1 = Results are weights representing the degree of confidence in the This section deals with the discriminatory powers of credit ratings of a given agency. scorings made with the help of base classifiers and through the aggregation of scorings. R cons is a non-strict bank ranking. Each combination of ratings is assigned its own numerical value, and so the cons Logistic Model of Defaults granularity degree of R depends on the number of The model was trained on a sample of Russian banks from such combinations, and the order of each combination in the period July 1, 2010 – July 1, 2014. The following factors R cons depends on its inconsistency with other ratings. were selected: 10 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 1) Ratio of the deposits of a legal entity to its bank reason for this is that this aggregated rating is based on assets. information about banks which basically have a rating. 2) Regulatory requirement of “the biggest possible credit This is a positive signal for the market: the bank is not risks” Н7. afraid of its creditworthiness assessment and can afford it in practice. 3) Regulatory requirement of long-term liquidity Н4. 4) Amount of granted short-term credits. Figure 2. ROC and AUCROC of the consensus ranking on a sample of banks with two or more ratings The AUCROC of the obtained logistic model is equal to 68.26%. In the test sample, the AUCROC is 70.03%. The 1 AUCROC consistency in these two non-overlapping sam- 0.9 ples indicates that the model has not been retrained. In the sample of banks with two and more ratings, the AUCROC 0.8 is equal to 71.4% (Figure 1). The quicker growth of ROC 0.7 diagram at the origin means that the default model defines 0.6 Sensitivity default banks better. Figure 1. ROC and AUCROC of the logistic model on a 0.5 sample of banks with two or more ratings 0.4 1 0.3 0.9 0.2 0.8 0.1 0.7 0 0 0.2 0.4 0.6 0.8 1 0.6 Sensitivity 100%-specifity 0.5 Perfect classification 0.4 Random classification Consensus ranking, AUCROC=71.28% 0.3 0.2 This ranking also has high discriminatory power from a 0.1 practical standpoint and is as good as statistical models and 0 machine learning methods in a low-default environment 0 0.2 0.4 0.6 0.8 1 [36; 37]. The consensus ranking is statistically indiscerni- 100%-specifity ble at a 10% significance level with a logistic model of de- faults according to the DeLong, DeLong and Clarke-Pear- Perfect classification son test (p-value = 99.3%). Random classification Logit model, AUCROC=71.24% Aggregate Ranking According to the quality criterion of scoring models from The aggregated ranking was built from the two previous [35], this model shows a good discriminatory power from rankings. Logistic regression was the aggregated model. a practical standpoint. This is confirmed by the results of We obtained a scoring with the AUCROC equal to 76.16% [9; 21], which built logistic regressions using unbalanced (Figure 3). samples. In such a case, the AUCROC of the logistic model Statistically, the ranking surpasses the two base scorings usually lies in the range 60–74%. at a 10% significance level3, showing the relevance of ag- gregating several baseline rankings and ensembling. In ad- Consensus Ranking dition, one should note that aggregated scoring includes The consensus ranking was calculated on the basis of the best characteristics of both baseline rankings. It defines a sample consisting of banks with two or more ratings. default and non-default borrowers with similar precision. The consensus AUCROC is equal to 71.28% (Figure 2). Moreover, previously proposed versions of aggregate clas- This ranking defines trustworthy borrowers better, as the sifiers showed a growth in the AUCROC not exceeding 3% right part of the ROC diagram is almost horizontal. The [38; 39] on unbalanced samples. contin 3 H 0 : AUCROC = AUCROC aggregated = , p − value 1.79%. H0 : =AUCROC cons AUCROC aggregated = , p − value 9.22% . contin cons H 0 : AUCROC = AUCROC = AUCROC aggregated , p −= value 0% . 11 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 Figure 3. ROC and AUCROC of the aggregated model 2. de Castro Vieira JR, Barboza F, Sobreiro VA, Kimura and two base classifiers on a sample of banks with two or H. Machine learning models for credit analysis more ratings improvements: Predicting low-income families’ 1 default. Appl Soft Comput J. 2019 Oct 1;83:105640. 0.9 3. Louzada F, Ara A, Fernandes GB. Classification 0.8 methods applied to credit scoring: Systematic 0.7 Sensitivity review and overall comparison. Vol. 21, Surveys 0.6 in Operations Research and Management Science. 0.5 Elsevier Science B.V.; 2016. p. 117–34. 0.4 0.3 4. Xia Y, Liu C, Li YY, Liu N. A boosted decision 0.2 tree approach using Bayesian hyper-parameter 0.1 optimization for credit scoring. Expert Syst Appl. 0 2017 Jul 15;78:225–41. 0 0.2 0.4 0.6 0.8 1 5. Ensemble methods: bagging, boosting and 100%-specifity stacking | by Joseph Rocca | Towards Data Science Perfect classification [Internet]. [cited 2020 Oct 4]. Available from: Random classification https://towardsdatascience.com/ensemble-methods- Consensus ranking, AUCROC=71.28% bagging-boosting-and-stacking-c9214a10a205 Aggregate ranking, AUCROC=76.16% 6. Ma X, Sha J, Wang D, Yu Y, Yang Q, Niu X. Study Logit model, AUCROC=71.24% on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost Conclusion algorithms according to different high dimensional data cleaning. Electron Commer Res Appl. 2018 Financial institutions need to identify both default and Sep 1;31:24–39. Available from: https://linkinghub. non-default contractors or customers in order to enable elsevier.com/retrieve/pii/S156742231830070X their management to take informed decisions when solv- 7. Hayashi Y. Application of a rule extraction algorithm ing risk management problems. In this paper, we propose family based on the Re-RX algorithm to financial the aggregation of credit scorings made with methods fo- credit risk assessment from a Pareto optimal cused on different types of borrowers: the logistic model of perspective. Oper Res Perspect. 2016 Jan 1;3:32–42. defaults and the modified Kemeny median. Logistic regres- sion is used as the strong learner. 8. Buzdalin AV, Zanochkin AU, Kurbangaleev MZ, Smirnov SN. Сredit ratings aggregation as a task Our data sample consists of Russian banks from the pe- of building consensus in the system of expert riod July 2010 – July 2015, including credit ratings. From assessments. Globalnie rinki i finansoviy engineering. a practical standpoint, the discriminatory power of base- 2017;4(3):181–207. (In Russ.). line rankings is high and typical for credit scorings in a low-default environment. However, their aggregation us- 9. Zanin L. Combining multiple probability predictions ing logistic regression resulted in a significant growth in in the presence of class imbalance to discriminate the discriminatory power of scoring. Moreover, this incre- between potential bad and good borrowers in the ment surpassed the increments of ensembles or aggregated peer-to-peer lending market. J Behav Exp Financ. rankings on unbalanced samples described in earlier liter- 2020 Mar 1;25:100272. ature. As long as the applied classifiers demonstrate a rela- 10. Committee on Banking Supervision B. Basel tively high interpretability, such a model can be also used Committee on Banking Supervision International by financial institutions for risk management. Convergence of Capital Measurement and Capital In further research, feature engineering techniques (for Standards A Revised Framework Comprehensive example, principle component analysis) may be applied Version [Internet]. 2006 [cited 2020 Oct 4]. Available as explanatory factors, provided the obtained index is from: https://www.bis.org/basel_framework/ interpretable. It is also possible to expand the set of base 11. Carta S, Ferreira A, Reforgiato Recupero D, Saia M, scorings by adding market scorings and some other inter- Saia R. A combined entropy-based approach for a pretable scorings obtained, for example, from discriminant proactive credit scoring. Eng Appl Artif Intell. 2020 analysis, decision trees, etc. Jan 1;87:103292. 12. Tserng HP, Chen PC, Huang WH, Lei MC, Tran QH. References Prediction of default probability for construction firms using the logit model. J Civ Eng Manag. 1. Abellán J, Castellano JG. A comparative study on base 2014;20(2):247–55. classifiers in ensemble methods for credit scoring. Expert Syst Appl. 2017 May 1;73:1–10. Available 13. Li K, Niskanen J, Kolehmainen M, Niskanen M. from: https://linkinghub.elsevier.com/retrieve/pii/ Financial innovation: Credit default hybrid model for S0957417416306947 SME lending. Expert Syst Appl. 2016;61. 12 Higher School of Economics
Journal of Corporate Finance Research / Methods Vol. 15 | № 3 | 2021 14. Barboza F, Kimura H, Altman E. Machine learning 28. Karminsky A.М. Credit rating and its modelling. models and bankruptcy prediction. Expert Syst Appl. Moscow: HSE Publishing House; 2015. 304 p. (In 2017 Oct 15;83:405–17. Russ.). 15. Maldonado S, Peters G, Weber R. Credit scoring 29. Aivazyan S, Golovan S, Karminsky A, Peresetckiy A. using three-way decisions with probabilistic rough Approaches to comparing rating scales. Prikladnaya sets. Inf Sci (Ny). 2020 Jan 1;507:700–14. ekonometrika. 2011;3(23):13–40. (In Russ.). 16. Dastile X, Celik T, Potsane M. Statistical and machine 30. Fitzmaurice G, Kenward MG. Handbooks of Modern learning models in credit scoring: A systematic Statistical Methods Handbook of Missing Data literature survey. Appl Soft Comput J. 2020 Jun Methodology. 1st ed. New York: Chapman and Hall/ 1;91:106263. CRC; 2014. 600 p. 17. Pérez-Martín A, Pérez-Torregrosa A, Vaca M. Big 31. Mushava J, Murray M. An experimental comparison Data techniques to measure credit banking risk in of classification techniques in debt recoveries scoring: home equity loans. J Bus Res. 2018 Aug 1;89:448–54. Evidence from South Africa’s unsecured lending 18. Ala’raj M, Abbod MF. A new hybrid ensemble credit market. Expert Syst Appl. 2018 Nov 30;111:35–50. scoring model based on classifiers consensus system 32. Garrido F, Verbeke W, Bravo C. A Robust profit approach. Expert Syst Appl. 2016;64. measure for binary classification model evaluation. 19. Ala’Raj M, Abbod MF. Classifiers consensus system Expert Syst Appl. 2018 Feb 1;92:154–60. approach for credit scoring. Knowledge-Based Syst. 33. Tasche D. Estimating discriminatory power and PD 2016 Jul 15;104:89–105. curves when the number of defaults is small. 2009 20. Fang F, Chen Y. A new approach for credit scoring May 24 [cited 2018 Dec 5]; Available from: http:// by directly maximizing the Kolmogorov–Smirnov arxiv.org/abs/0905.3928 statistic. Comput Stat Data Anal. 2019 May 34. DeLong ER, DeLong DM, Clarke-Pearson DL. 1;133:180–94. Comparing the Areas under Two or More Correlated 21. Teply P, Polena M. Best classification algorithms in Receiver Operating Characteristic Curves: A peer-to-peer lending. North Am J Econ Financ. 2020 Nonparametric Approach. Biometrics. 1988 Jan 1;51:100904. Sep;44(3):837. 22. Butaru F, Chen Q, Clark B, Das S, Lo AW, Siddique A. 35. Pomazanov M.V. Credit risk-management in a bank: Risk and risk management in the credit card industry. internal ratings-based approach (IRB) [Internet]. 1st J Bank Financ. 2016 Nov 1;72:218–39. ed. Penikas G.I, editor. Moscow: Urait; 2019 [cited 2020 Oct 4]. 265 p. Available from: https://urait. 23. Sousa MR, Gama J, Brandão E. A new dynamic ru/book/upravlenie-kreditnym-riskom-v-banke- modeling framework for credit risk assessment. podhod-vnutrennih-reytingov-pvr-437044. (In Expert Syst Appl. 2016 Mar 1;45:341–51. Russ.). 24. Maldonado S, Pérez J, Bravo C. Cost-based 36. Fitzpatrick T, Mues C. An empirical comparison feature selection for Support Vector Machines: An of classification algorithms for mortgage default application in credit scoring. Eur J Oper Res. 2017 prediction: Evidence from a distressed mortgage Sep 1;261(2):656–65. market. In: European Journal of Operational 25. Neto R, Jorge Adeodato P, Carolina Salgado A. Research. Elsevier; 2016. p. 427–39. A framework for data transformation in Credit 37. Shen F, Zhao X, Li Z, Li K, Meng Z. A novel ensemble Behavioral Scoring applications based on Model classification model based on neural networks and Driven Development. Expert Syst Appl. 2017 Apr a classifier optimisation technique for imbalanced 15;72:293–305. credit risk evaluation. Phys A Stat Mech its Appl. 26. Karminsky A, Polozov A. Handbook of Ratings: 2019 Jul 15;526:121073. Approaches to Ratings in the Economy, Sports, 38. Xia Y, Liu C, Da B, Xie F. A novel heterogeneous and Society. Handbook of Ratings: Approaches to ensemble credit scoring model based on bstacking Ratings in the Economy, Sports, and Society. Springer approach. Expert Syst Appl. 2018 Mar 1;93:182–99. International Publishing; 2016. 1–356 p. 39. He H, Zhang W, Zhang S. A novel ensemble method 27. 27. Li K, Niskanen J, Kolehmainen M, Niskanen M. for credit scoring: Adaption of different imbalance Financial innovation: Credit default hybrid model for ratios. Expert Syst Appl. 2018 May 15;98:105–17. SME lending. Expert Syst Appl. 2016 Nov 1;61:343–55. Contribution of the authors: the authors contributed equally to this article. The authors declare no conflicts of interests. The article was submitted 06.07.2021; approved after reviewing 08.08.2021; accepted for publication 14.08.2021. 13 Higher School of Economics
You can also read