With the adoption of the IFRS 9 accounting standard into EU law, it is full steam ahead for banks to deploy credit models that estimate Expected Credit Loss (ECL) accounting values. The standard requires firms to account for lifetime ECL on loans that have experienced a “significant increase in credit risk” (SICR), but allows firms to reach their own conclusions as to just how much credit risk ought to be viewed as “significant”.
Most firms have developed their first suite of ECL models, and are able to estimate 12-month and lifetime ECL under various input assumptions such as forward economic scenarios. However, many firms have held off finalising their definition of SICR until additional forward economic scenarios have been developed. The wait-and-see approach also allows some degree of consensus in interpretation to emerge.
Credit risk, and movements in credit risk, are of course quantifiable using well-established techniques. But what should count as a “significant” increase? The IFRS 9 standard strongly suggests in paragraph 5.5.9 that SICR should be a function of movements in remaining lifetime PD as well as other qualitative factors, but gives little detail on how or where to calibrate the PD threshold.
In this article, we explore binary classifier techniques borrowed from machine learning applications, and explain how they can be used to select the optimal definition of SICR.
Receiver Operating Characteristic
Pioneers of radar faced a similar problem: choosing the threshold at which to call the blip on the screen a big flock of birds, or a bomber. The Receiver Operating Characteristic (ROC) curve, which measures the discriminatory power of a binary classifier and is familiar to scorecard modellers, has its roots in radar engineering.
With the application of machine learning techniques for automated decision-making, it became necessary to set a decision rule which optimises the trade-off between identifying all bombers (the true positive result), all birds (the true negative result) and falsely calling bombers birds (false negative) or birds bombers (false positive).
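The trade-off can be made concrete by sweeping a decision threshold over classifier scores and recording the rates that result; each threshold gives one point on the ROC curve. The scores and labels in the sketch below are invented purely for illustration, with label 1 playing the role of the "positive" class (the bomber, or the defaulting account).

```python
# Illustrative sketch: tracing an ROC curve by sweeping the decision threshold
# over classifier scores. Scores and labels are invented for illustration.
scores = [0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0,   0,   1,    0,   1,    0,   1,   1]

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct threshold, highest first."""
    pos = sum(labels)            # number of true positives available
    neg = len(labels) - pos      # number of true negatives available
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

Lowering the threshold moves the operating point from (0, 0) towards (1, 1): more bombers are caught, but more birds are misclassified along the way.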
Other examples of automatic classifiers include sorting of post, recognising hand-written numbers on cheques, and medical imaging. Many applications introduce the concept of a “cost function” – in the postal example the cost of misclassification is empirically measurable in terms of additional time spent resolving errors as well as compensation to customers.
IFRS 9 features and cost
The science of machine learning places considerable emphasis on first identifying which characteristics are of interest – termed the “feature vector” – as well as the relative cost of correct and incorrect classification. The IFRS 9 standard allows us to circumvent this first step by providing a handy (and concise) non-exhaustive list of factors which may be included in the feature vector (paragraph B5.5.17) including changes in credit spread, credit rating, economic conditions, borrower’s financial results, and pricing.
In addition, the IFRS 9 standard disallows losses (i.e. the inclusion of collateral cash flows) as an input into stage allocation (paragraph B5.5.9), leading us to conclude that expected credit losses (i.e. the obvious choice of cost function) should not be introduced to weight the classification to stage 1 or 2. Based on this information, standard machine learning techniques can be used to set the optimal boundary for classifying accounts into those likely to default (which should be classed as stage 2) and those not (which belong in stage 1).
Linear Discriminant Functions
In the case where the classes are linearly separable and assumed multivariate Normal, the straight line through the origin which gives maximum class separation can be found with relative ease: it is a linear combination of the feature vectors on the margin or, equivalently, a function of the class means and covariances. The use of a discriminant function handily avoids the problem of having to estimate probability densities, and the approach can be extended to:
- Situations where the classes are not separable; and
- Non-linear boundaries between classes, as alluded to by the IFRS 9 standard (paragraph B5.5.9) permitting a larger absolute movement for the best-rated customers at initial recognition.
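As a concrete (if much simplified) illustration of the separable, multivariate Normal case, the sketch below computes Fisher's discriminant direction w = S^-1 (m2 - m1) from pooled class statistics, using two features and entirely invented values; a real SICR feature vector would be drawn from the factors listed in paragraph B5.5.17.

```python
# A minimal sketch of Fisher's linear discriminant for two 2-D classes,
# assuming (as in the text) roughly Normal classes with a shared covariance.
# All feature values are invented for illustration.

def mean(xs):
    return [sum(col) / len(xs) for col in zip(*xs)]

def pooled_cov(a, b):
    """Pooled 2x2 sample covariance of the two classes."""
    ma, mb = mean(a), mean(b)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for xs, m in ((a, ma), (b, mb)):
        for x in xs:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    n = len(a) + len(b) - 2
    return [[v / n for v in row] for row in s]

def fisher_direction(a, b):
    """w = S^-1 (mean_b - mean_a): the direction of maximum class separation."""
    s = pooled_cov(a, b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[ s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det,  s[0][0] / det]]
    dm = [mb - ma for ma, mb in zip(mean(a), mean(b))]
    return [inv[0][0] * dm[0] + inv[0][1] * dm[1],
            inv[1][0] * dm[0] + inv[1][1] * dm[1]]

# Hypothetical features, e.g. (lifetime PD ratio, credit spread change):
stage1 = [[0.2, 1.0], [0.3, 1.2], [0.25, 0.9]]
stage2 = [[0.8, 2.0], [0.9, 2.3], [0.85, 1.9]]
w = fisher_direction(stage1, stage2)
```

Projecting each account onto w collapses the feature vector to a single score, and the stage boundary becomes a simple threshold on that score.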
The four possible results of a binary classifier are:
- True positive;
- True negative;
- False positive; and
- False negative.
These can be described using a 2x2 matrix, often referred to as a confusion matrix. Note that when weighted by a cost function, the term “cost matrix” is also used. Many market participants have adopted approaches which evaluate variations in true positive rate (TPR), true negative rate (TNR), false positive rate (FPR) and false negative rate (FNR). Definitions of “positive” and “negative” are typically articulated using business rules which often include progression to arrears and default from stages 1 and 2, as well as rules around accounts’ propensity to flip-flop between stages.
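Given a candidate SICR rule and a set of observed outcomes, the four rates fall straight out of the confusion matrix. In the sketch below, "positive" means the rule places the account in stage 2, and the outcome labels (whether the account subsequently defaulted) are invented for illustration.

```python
# Sketch: confusion matrix and derived rates for a candidate SICR rule.
# Both the staging decisions and the default outcomes are invented.
predicted_stage2 = [True, True, False, False, True, False, False, True]
defaulted        = [True, False, False, False, True, True, False, False]

tp = sum(p and d for p, d in zip(predicted_stage2, defaulted))
fp = sum(p and not d for p, d in zip(predicted_stage2, defaulted))
fn = sum(not p and d for p, d in zip(predicted_stage2, defaulted))
tn = sum(not p and not d for p, d in zip(predicted_stage2, defaulted))

tpr = tp / (tp + fn)   # defaults correctly flagged to stage 2
tnr = tn / (tn + fp)   # non-defaults correctly left in stage 1
fpr = fp / (fp + tn)   # non-defaults needlessly moved to stage 2
fnr = fn / (fn + tp)   # defaults missed by the rule
```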
Unsurprisingly, stages 1 and 2 are not separable (an expected result – with perfect separation, banks could choose to lend money only to borrowers who are known never to default) and some trade-off between the four metrics described above is therefore required. In the sciences, the Matthews Correlation Coefficient and Mean Square Error are often used to combine the metrics into a single criterion to be optimised. However, few banks have combined the four metrics in this way, relying instead on judgement to select their preferred trade-off between business rules, i.e. a semi-informed guess at the optimal boundary between stage 1 and stage 2.
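For reference, the Matthews Correlation Coefficient collapses the four confusion-matrix entries into a single figure of merit: +1 for perfect classification, 0 for a random one, and -1 for perfect misclassification. The counts in the example call are invented.

```python
# Sketch: Matthews Correlation Coefficient as a single criterion combining
# the four confusion-matrix entries.
import math

def mcc(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

score = mcc(tp=40, tn=35, fp=15, fn=10)   # counts invented for illustration
```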
Perceptron and Multi-Layer Networks
With a linear decision boundary, this approach is consistent with analysis of linear discriminants. However, the decision boundary is often non-linear and the Normality assumption mentioned above is usually violated. Linear discriminant analysis can be extended to cope with non-linear boundaries by transforming the explanatory variables before taking the weighted combination. This is the "perceptron". The perceptron permits a much wider range of decision boundaries, and is trained using gradient descent. Because criteria counted directly from the confusion matrix are piecewise constant (their gradient is zero almost everywhere), perceptron training instead introduces a smooth error function whose gradient is non-zero.
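A minimal sketch of this training loop, assuming a logistic activation and a cross-entropy error (one common choice of smooth error function; the data and learning rate are invented):

```python
# Sketch: a single unit trained by gradient descent on a smooth cross-entropy
# error, rather than on the piecewise-constant misclassification count.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, epochs=2000):
    """Stochastic gradient descent on a two-input logistic unit."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = p - y   # gradient of cross-entropy w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Invented, linearly separable training data:
xs = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
```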
The perceptron is not, however, able to solve every separation problem, SICR included. A perhaps extreme example of a problem which cannot be solved using a single perceptron is the "exclusive or" problem. A generalised decision boundary can be achieved by introducing successive transformations, or "layers". This is the "multi-layer perceptron", the cornerstone of modern artificial intelligence. A two-layer network is sufficient to approximate any continuous decision boundary.
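The "exclusive or" problem makes the point concisely: no single linear threshold separates its classes, yet a two-layer network does. The weights below are hand-picked for illustration rather than trained; the hidden units compute OR and NAND, and the output unit ANDs them together.

```python
# Sketch: a two-layer network solving XOR with hand-picked weights.
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # hidden unit 1: OR
    h2 = step(-x1 - x2 + 1.5)     # hidden unit 2: NAND
    return step(h1 + h2 - 1.5)    # output unit: AND

# Truth table: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
```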
The general problem of how to find the boundary between stages 1 and 2 can be addressed by introducing “features” described in the IFRS 9 standard as a vector of information, and training a multi-layer perceptron to discriminate between accounts which subsequently default, and accounts which do not.
It is, however, important to recognise that:
- When a machine “learns” to mimic observed patterns of behaviour, it has not learned to articulate the rationale for its results. Therefore, the outputs of a multi-layer perceptron (and non-linear decision boundaries) can be difficult or impossible for management to explain.
- Monotonicity seems likely to be a desirable feature of decision boundaries in credit risk applications. Therefore, it is likely that sufficient SICR classification performance can be achieved using a single-layer perceptron.
- It may be desirable to enforce an intuitive functional form on the decision boundary, such as a "PD multiple" (straight line through the origin) or "rating notch difference" (probit or logit). This can in fact be achieved without the perceptron: linear discriminant techniques also work if features are pre-mapped into a higher-dimensional space using the desired functional form.
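To illustrate the last point: a "PD multiple" rule becomes linear once PDs are mapped into log space, since lifetime PD exceeding k times the origination PD is equivalent to log(pd_now) - log(pd_orig) > log(k). The function name and the multiple of 3 below are purely illustrative, not a recommended calibration.

```python
# Sketch: enforcing an intuitive functional form by pre-mapping features
# before applying a linear rule. A "PD multiple" threshold is linear in
# log-PD space. The multiple of 3 is invented for illustration only.
import math

def sicr_pd_multiple(pd_orig, pd_now, multiple=3.0):
    """Stage 2 if remaining lifetime PD exceeds `multiple` x origination PD."""
    return math.log(pd_now) - math.log(pd_orig) > math.log(multiple)
```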
A final word of caution: This kind of analysis is data-intensive and should be performed across the economic cycle. After all, the radar operator who calibrates the receiver in December may be in for a shock when birds return from their winter migration.