We get asked a lot by banks and lenders if AI or machine learning will predict the next recession, or at least help them weather a turn in the economy. Our answer: Not really, but we think properly monitored AI can tell you that an economic turn is happening as soon as it starts so you can fix your credit models quickly.
Let’s just talk a little about machine learning monitoring. Not the sexiest subject, we know, but it’s super important if you want to stay in business and stay compliant with the Federal Reserve’s guidance on model risk management. The Fed, the FDIC, and the OCC are clear: to do model risk management properly, you have to monitor your models, and the monitoring method should reflect the modeling method you’re using. We couldn’t agree more. If you use the wrong monitor to watch your ML models, you’ll miss really important stuff that could land you in hot water.
An ML underwriting model is a bit like a Formula 1 car: an amazing feat of engineering that requires a good bit of monitoring for optimal race-day performance. F1 vehicles are dripping with up to 150 sensors per car and often generate 3 terabytes of data per race that engineers track both in real time and post-race. That data also drives live decisions, like setting tire pressures at the next pit stop. ML lending models similarly need continuous monitoring and iteration to ensure they’re working efficiently in a world where economic and lending conditions are in constant flux.
Models aren’t designed to deal with lots of external change: they make predictions based on the data they were trained on. They’ll return safe credit decisions only as long as the profiles of new loan applicants match the model’s training population. In the real world, applicant pools shift over time, and an algorithm trained on one population will start to mischaracterize applicants from different or new populations, making the model unpredictable and unsafe.
Lenders have a few methods in their toolkits for monitoring models. They can monitor inputs for out-of-range values (really high incomes, for instance, could indicate fraud). They can also monitor model scores using measures like the Population Stability Index (PSI), which quantifies how much a variable’s distribution has shifted between two samples or over time; shifts in score distributions can indicate a change in the underlying applicant pool. These methods work well on linear models, where each input contributes to the score independently. But machine learning models are different: they scrutinize subtle interactions among inputs to arrive at a more accurate prediction. Something different is needed to adequately monitor ML models and the interactions they capture.
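To make the PSI idea concrete, here’s a minimal sketch in Python. The decile-based binning and the synthetic score samples are our own illustrative assumptions, not any lender’s actual data or methodology:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a new sample."""
    # Bin edges come from the baseline distribution's quantiles (decile bins)
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover any out-of-range values
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    e_pct = np.clip(e_counts / len(expected), 1e-6, None)  # avoid log(0)
    a_pct = np.clip(a_counts / len(actual), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(650, 50, 10_000)  # hypothetical historical score sample
shifted = rng.normal(600, 60, 10_000)   # scores after a population shift
print(psi(baseline, baseline[:5_000]))  # near 0: same population
print(psi(baseline, shifted))           # well above 0.25: major shift
```

A common rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift worth investigating, and above 0.25 as a major shift.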
Our view is that you need to use ML to monitor ML. Our real-time model monitoring, a built-in feature of our ZAML suite of tools, runs a monitoring model in parallel with the scoring model. The monitoring model spots anomalies and surfaces insights in real time, catching problems before they catch the lender. The software uses advanced statistical and ML techniques to watch incoming applicant data and check how it corresponds to the hundreds of input variables the scoring model was trained on. The ZAML approach doesn’t just check input ranges; it monitors multivariate interactions, ensuring the ML models you put into production keep working properly.
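ZAML’s internals are proprietary, but the core idea of a multivariate monitor can be sketched with something as simple as a Mahalanobis-distance check: learn the correlations among the training inputs, then flag new applicants whose combination of values breaks those correlations, even when every value looks fine on its own. The income/debt relationship below is an invented example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Training-era applicants: income and debt are correlated (a multivariate pattern)
n = 5_000
income = rng.normal(60_000, 15_000, n)
debt = 0.4 * income + rng.normal(0, 4_000, n)
train = np.column_stack([income, debt])

# Fit the monitor: mean vector and inverse covariance of the training inputs
mu = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis(x):
    """Distance of each row from the training population, correlation-aware."""
    d = x - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

# Flag anything beyond the 95th percentile of training-data distances
threshold = np.quantile(mahalanobis(train), 0.95)

# A new applicant whose income and debt are each in range individually,
# but whose combination breaks the relationship the model was trained on
new_applicant = np.array([[90_000.0, 5_000.0]])  # high income, oddly low debt
print(mahalanobis(new_applicant)[0] > threshold)  # True: flagged as anomalous
```

A univariate range check would pass this applicant, since both values fall inside the training ranges; only the joint view catches the anomaly.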
Checking inputs and their interactions is key; if you check only outputs, you might miss something. Here’s an example from a Zest customer. In the first chart, things look stable: the distribution of model outputs shows little movement according to the PSI metric. But in the second chart, you can see that ZAML’s monitoring tool has flagged that the population has shifted. Its multivariate analysis has captured increasing anomalies in input data from new applicants, patterns the scoring model was never trained to handle. That’s insight a lender can use to decide it’s time to refit or retrain the scoring model. The method works by measuring the error rate of the monitoring model and calling out big errors (e.g., those at or above the 95th percentile).
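To illustrate the error-rate mechanic (this is a sketch of the general technique, not ZAML’s actual method), a tiny monitoring model can learn a relationship in the training inputs, set a cutoff at the 95th percentile of its training errors, and raise a flag when the share of new applicants exceeding that cutoff jumps well past the expected 5%:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_population(n, slope):
    """Synthetic applicants where debt tracks income with a given slope."""
    income = rng.normal(60_000, 15_000, n)
    debt = slope * income + rng.normal(0, 4_000, n)
    return income, debt

# Monitoring model: predict debt from income on the training population
inc_tr, debt_tr = make_population(10_000, slope=0.4)
X = np.column_stack([np.ones_like(inc_tr), inc_tr])
coef, *_ = np.linalg.lstsq(X, debt_tr, rcond=None)

# By construction, ~5% of training errors exceed the 95th-percentile cutoff
cutoff = np.quantile(np.abs(debt_tr - X @ coef), 0.95)

def error_rate(income, debt):
    """Share of applicants whose monitoring-model error exceeds the cutoff."""
    pred = coef[0] + coef[1] * income
    return float(np.mean(np.abs(debt - pred) > cutoff))

inc_ok, debt_ok = make_population(2_000, slope=0.4)     # same population
inc_new, debt_new = make_population(2_000, slope=0.25)  # relationship shifted
print(error_rate(inc_ok, debt_ok))    # close to 0.05: stable
print(error_rate(inc_new, debt_new))  # far above 0.05: time to retrain
```

When the income-to-debt relationship drifts, the monitoring model’s error rate spikes well above its 5% baseline, which is the signal that the scoring model is now seeing a population it wasn’t trained on.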
Interestingly, the above charts were generated by a Zest customer in a rapidly deteriorating economy. Without ZAML tools, this customer would have assumed no changes to its model were necessary. The chart on the left, the PSI score monitor, shows no change in the model’s output distributions. The chart on the right shows the change in input distributions you’d expect from a major shift in the economy like the one our customer was experiencing. It’s sobering that the chart on the left reflects recommended practice in the industry.
Without ZAML tools, a lender might have missed this impact to the model’s predictive power. In some lending businesses defaults take years to materialize (e.g., in 30-year fixed mortgages), so it’s really important to catch these things before defaults start to pile up. Because of ZAML, this lender was able to see the risk to its loan book early, analyze it further, and correct it by building a new model to address a problematic segment (customers new to the bank).
The right monitoring can help financial organizations weather political and economic volatility. Real-time ML monitoring can drill down into what is causing outliers and recommend a retrain based on updated information.
World champion F1 driver Lewis Hamilton knows that when he steps into his car he’s got a team of engineers monitoring every valve and connector, and a meteorologist advising him on weather conditions just before they change. That’s where lending is going, too.