Lenders looking to switch from traditional underwriting to machine learning often run into a tough question: ML models are incredibly powerful but also incredibly complex, so how do I know why I'm getting the results I'm getting?
Great question. Much of ML is black-box decision-making: the model gleans insight from a vast number of correlations among many pieces of information that may seem completely unrelated. That's not a big deal if you're using ML to optimize a digital ad campaign or to guess whether a photo on Airbnb shows a living room or a bedroom. There, you only need to know that it works, not how it reaches its decisions.
But for regulated uses such as credit underwriting, the government requires lenders to explain how they made certain decisions, to ensure there was no bias against people based on characteristics including ethnicity, age, and gender. And regardless of regulatory requirements, it's downright dangerous to run a lending business when you don't know how decisions are being made. That's why we built real explainability into Zest Automated Machine Learning (ZAML), so lenders can operate ML models safely and profitably. Let's walk through a few ways ZAML does this.
Finding Out What Matters Amid Hidden Connections
What we call real explainability means being able to understand the significance of every single piece of data that goes into your model and how it interacts with the other variables, whether your model uses a dozen inputs or a thousand. It's why Discover Financial Services chose ZAML to help optimize its credit risk modeling, which sometimes surfaces insight from unexpected sources. As The Wall Street Journal reported, Discover found that customers who use a landline or mobile phone are safer bets than customers with the same credit score who use Voice-over-IP calling services.
Real explainability is essential during model development, when it's time to figure out which variables most influence the prediction and how they influence it. The chart below, generated by our ZAML Explain software, shows how an applicant's traditional credit score, in combination with other pieces of information, affects that applicant's model score (a higher score means a lower likelihood of default). Each dot in the chart represents a single credit applicant. The horizontal axis is the traditional credit score (from below 500 to over 700). The vertical axis measures the impact that credit score has on an applicant's model score.
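The post doesn't say how ZAML Explain computes the values in this chart. One common way to produce a chart of this kind is to compute a Shapley-value attribution for every applicant and plot the credit score's attribution against the credit score itself. The sketch below assumes that approach, using exact Shapley values with an interventional baseline (features outside a coalition are filled in from background rows). The toy model, the feature names (`credit_score`, `dti`, `recent_delinquencies`) and the applicant data are all hypothetical, chosen only to mimic a plateau above 625 and heavier reliance on other signals below 550.

```python
from itertools import combinations
from math import factorial

# Hypothetical inputs; not the features of any real ZAML model.
FEATURES = ["credit_score", "dti", "recent_delinquencies"]


def model_score(x):
    """Toy model: a higher output means a lower likelihood of default."""
    cs = x["credit_score"]
    # Credit score helps until about 625, then its effect plateaus.
    cs_term = (min(cs, 625) - 575) / 50.0
    # Below 550 the toy model leans harder on other signals.
    if cs < 550:
        other = -2.0 * x["dti"] - 0.8 * x["recent_delinquencies"]
    else:
        other = -0.5 * x["dti"] - 0.2 * x["recent_delinquencies"]
    return cs_term + other


def coalition_value(x, coalition, background):
    """Average score when features in `coalition` come from x and the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = {f: (x[f] if f in coalition else row[f]) for f in FEATURES}
        total += model_score(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley value of every feature for one applicant (enumerates all coalitions)."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        value = 0.0
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = coalition_value(x, set(subset) | {feature}, background)
                without_f = coalition_value(x, set(subset), background)
                value += weight * (with_f - without_f)
        phi[feature] = value
    return phi


# Hypothetical applicants, also used as the background sample.
applicants = [
    {"credit_score": 520, "dti": 0.45, "recent_delinquencies": 2},
    {"credit_score": 540, "dti": 0.20, "recent_delinquencies": 0},
    {"credit_score": 600, "dti": 0.35, "recent_delinquencies": 1},
    {"credit_score": 650, "dti": 0.30, "recent_delinquencies": 0},
    {"credit_score": 720, "dti": 0.25, "recent_delinquencies": 0},
]

# One (credit score, credit-score attribution) point per applicant, as in a dependence chart.
points = [(a["credit_score"], shapley_values(a, applicants)["credit_score"]) for a in applicants]
```

Exact enumeration evaluates every coalition of features, so its cost grows exponentially with the number of inputs; a model with hundreds of variables would need a faster estimator. A useful sanity check is the efficiency property: each applicant's attributions sum to that applicant's score minus the average background score.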
Now the lender can see the variable's true impact on risk prediction. For example, above a credit score of about 575, the variable has a positive impact on the applicant's model score. But above 625 there is very little marginal improvement in the model score, since the line becomes almost completely horizontal. All the work borrowers do to get their credit scores into the 700s makes little difference to this lender. We see that dynamic a lot in our modeling work.
The spreading cloud at the upper left edge of the chart tells a different story. Below 550, applicants with the same credit score end up with widely different model scores, because the model has learned to use other variables to differentiate borrowers in this credit score bucket. A traditional model that lumped all of these applicants into the same risk bucket could cost a lender a lot of money.
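The two patterns described above, the plateau above 625 and the wide spread below 550, can also be summarized numerically by bucketing applicants by credit score. This sketch assumes you already have, for each applicant, a credit-score attribution (for example from a Shapley-style method) and a final model score; the bucket edges, the function names and every value are hypothetical.

```python
from statistics import mean, pstdev


def bucket_summary(credit_scores, attributions, model_scores, edges):
    """Summarize applicants in credit-score buckets [lo, hi)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, cs in enumerate(credit_scores) if lo <= cs < hi]
        if not idx:
            continue  # skip empty buckets
        rows.append({
            "bucket": (lo, hi),
            "n": len(idx),
            # Average contribution of credit score to the model score.
            "mean_attribution": mean(attributions[i] for i in idx),
            # How differently the model scores applicants within the bucket.
            "model_score_spread": pstdev([model_scores[i] for i in idx]),
        })
    return rows


def marginal_gains(rows):
    """Change in mean credit-score attribution from each bucket to the next."""
    return [(b["bucket"], b["mean_attribution"] - a["mean_attribution"])
            for a, b in zip(rows, rows[1:])]


# Hypothetical values for illustration only.
credit_scores = [510, 530, 545, 580, 600, 640, 660, 710, 740]
attributions = [-1.4, -1.1, -0.9, 0.1, 0.5, 1.0, 1.0, 1.05, 1.05]
model_scores = [-2.5, 0.3, -1.2, 0.4, 0.6, 1.1, 1.2, 1.3, 1.25]
edges = [500, 550, 625, 700, 800]

rows = bucket_summary(credit_scores, attributions, model_scores, edges)
gains = marginal_gains(rows)
```

A small gain between adjacent buckets corresponds to the flattening line, and a large standard deviation of model scores within a bucket corresponds to the spreading cloud.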
Producing Key Factors
Explainability is necessary for a machine learning model to meet the demands of compliance and legal departments, especially around questions of bias. At the model level, you need to know the importance of each piece of data to the final output, ranked from most impactful to least. At the individual level, you need to know which variables played the largest role in the decision to approve or deny a loan. That helps modelers determine whether a specific variable is causing problems and whether it needs to be adjusted or can simply be left out of the model going forward.
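For the model-level ranking, one widely used summary is the mean absolute attribution of each feature across all applicants. That choice is an assumption here, not necessarily how ZAML ranks variables. The sketch takes per-applicant attributions as plain dictionaries; the function name, feature names and numbers are hypothetical.

```python
def global_importance(attributions):
    """Rank features by mean absolute attribution across all applicants."""
    features = list(attributions[0])
    n = len(attributions)
    importance = {f: sum(abs(row[f]) for row in attributions) / n for f in features}
    # Most impactful first.
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-applicant attributions (one dict per applicant).
attributions = [
    {"bureau_score": -0.8, "past_due_amount": -0.3, "collections": 0.0, "utilization": 0.1},
    {"bureau_score": 0.6, "past_due_amount": 0.0, "collections": -0.5, "utilization": -0.2},
    {"bureau_score": 0.4, "past_due_amount": -0.1, "collections": 0.0, "utilization": 0.05},
]

ranking = global_importance(attributions)
```

Taking absolute values keeps a feature that helps some applicants and hurts others from appearing unimportant because its effects cancel out on average.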
Here we show graphically the top reasons a borrower was declined in one recent model, along with the impact of each reason on their model score. In this example, ZAML weighed items including credit bureau scores, past-due amounts, collections actions, and "derogatory" fields such as tax non-payments, credit liens, and bankruptcies. The top ten reasons will often be the same for most borrowers; it's the sum of the contributions from variables 20 through 400 that matters.
Other machine learning explainability techniques can't do this. Instead, they approximate likely causes using simplified versions of the models, a risky proposition if that information is scrutinized by a regulator or in a lawsuit.
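For a single declined applicant, like the one described above, candidate decline reasons can be read off that applicant's attributions by sorting the negative contributions. The sketch also totals everything outside the top k, one way to check how much the long tail of variables adds beyond the headline reasons. The feature names echo the categories listed above, but the values, the cutoff k and the function name `decline_reasons` are hypothetical.

```python
def decline_reasons(contributions, k=3):
    """Return the k most negative contributions and the net effect of all other features."""
    # Sort ascending so the features that pushed the score down most come first.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1])
    top = [(f, v) for f, v in ranked[:k] if v < 0]
    top_names = {f for f, _ in top}
    # Net contribution of every feature not reported as a top reason.
    remainder = sum(v for f, v in contributions.items() if f not in top_names)
    return top, remainder


# Hypothetical attributions for one declined applicant.
contributions = {
    "bureau_score": -0.9,
    "past_due_amount": -0.6,
    "collections": -0.4,
    "tax_liens": -0.2,
    "bankruptcies": 0.0,
    "utilization": -0.15,
    "inquiries": -0.1,
    "tenure": 0.3,
}

top, remainder = decline_reasons(contributions)
```

If the remainder is comparable in size to the listed reasons, the many small contributions matter to the decision, and reporting only the top few would understate them.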
Disparate Impact Analysis
A good explainability solution should provide consistent and accurate analysis of what's driving bias in a credit model. The table below shows how far apart three different explainability techniques can be when ranking a model's variables by their impact on racial bias. This is dangerous stuff to get wrong. How do we know our technique is the most accurate one? We've written about the reasons other techniques can't always get it right.
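One way to rank variables by their contribution to a score gap between groups relies on the additivity of Shapley-style attributions: when every applicant's attributions sum to their score minus a shared base value, the gap in average score between two groups splits exactly into per-feature gaps in average attribution. The sketch assumes such attributions and group labels are available for analysis. It is not the method behind the table, and its ranking is only as reliable as the attribution method feeding it, which is exactly the concern raised here. The function name, group labels, feature names and numbers are hypothetical.

```python
from statistics import mean


def disparity_by_feature(attributions, groups, reference, protected):
    """Per-feature gap in mean attribution, reference group minus protected group.

    With higher scores meaning lower risk, a positive gap means the feature
    pushes the protected group's average score down relative to the reference group.
    """
    ref = [row for row, g in zip(attributions, groups) if g == reference]
    prot = [row for row, g in zip(attributions, groups) if g == protected]
    gaps = {f: mean(r[f] for r in ref) - mean(r[f] for r in prot) for f in attributions[0]}
    # Largest contributors to the gap first.
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical attributions and group labels.
attributions = [
    {"bureau_score": 0.5, "dti": -0.1, "utilization": 0.2},
    {"bureau_score": 0.3, "dti": 0.0, "utilization": 0.1},
    {"bureau_score": 0.2, "dti": -0.2, "utilization": -0.4},
    {"bureau_score": 0.1, "dti": -0.1, "utilization": -0.3},
]
groups = ["A", "A", "B", "B"]

ranking = disparity_by_feature(attributions, groups, reference="A", protected="B")
```

Because the per-feature gaps add up to the overall gap in average score, two attribution methods that disagree on the ranking cannot both be describing the model correctly.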
Monitoring For Success
Another reason lenders need total transparency into their ML models? The world changes, and with it so do customer habits and market forces. A risk factor that doesn't affect model outcomes today may prove to be important in the months to come. To ensure a model performs as designed once it's in production, a lender needs constant and reliable model monitoring, a transparency feature built into ZAML software. Here's an example of one lender's ZAML monitoring dashboard, zeroing in on the input for disposable income.
As you can see, in early summer 2015 there was a shift in the average customer's disposable income. Left unnoticed, that could have huge consequences: the shift might cause applicants who would otherwise be great customers to be rejected, or lead a bank to believe its loans are less risky than they actually are. ZAML can alert banks to the change, help them understand what caused it (in this example, the jump came from a change in how the Zest client defined disposable income), and analyze the effect on model performance.
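A simple version of this kind of alert compares each month's average input value with a baseline period and flags months that sit many standard errors away. The sketch assumes roughly independent observations and a hypothetical threshold of four standard errors; the function name, month labels and income figures are illustrative, not taken from ZAML or the dashboard above.

```python
from math import sqrt
from statistics import mean, stdev


def mean_shift_alerts(baseline, monthly_batches, z_threshold=4.0):
    """Flag months whose average is more than z_threshold standard errors from the baseline average."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    alerts = []
    for month, values in monthly_batches.items():
        # Standard error of a monthly mean, estimated from baseline variability.
        se = sigma / sqrt(len(values))
        z = (mean(values) - mu) / se
        if abs(z) > z_threshold:
            alerts.append((month, z))
    return alerts


# Hypothetical monthly disposable-income values (dollars).
baseline = [2100, 2300, 1900, 2500, 2200, 2000, 2400, 2150]
monthly_batches = {
    "month_1": [2200, 2250, 2100, 2300],
    "month_2": [3400, 3600, 3500, 3300],
}

alerts = mean_shift_alerts(baseline, monthly_batches)
```

An alert like this only says that the input moved. As the disposable-income example shows, finding out why (there, a change in how the field was defined) still takes investigation.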
Monitoring can also tell you when it's time to revise the model to reflect changed real-world conditions. In other machine learning environments, and in traditionally built finance models, a single shift like this might not be caught for a quarter or a year. In machine learning models that use thousands of variables, shifts like this could be happening in many inputs at once. With ZAML, you'll catch them all in real time.
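To watch many inputs at once, a common industry metric is the Population Stability Index (PSI), which compares a feature's distribution in a recent window with its distribution in a baseline window. The sketch bins each feature at the baseline's quantiles and flags features above 0.25, a frequently quoted rule-of-thumb threshold. The bin count, the small floor `eps` that avoids taking the log of zero, the function names and the data are all assumptions for illustration.

```python
from bisect import bisect_right
from math import log


def psi(expected, actual, n_bins=5, eps=1e-6):
    """Population Stability Index of `actual` against `expected`, binned at the expected quantiles."""
    ordered = sorted(expected)
    # Interior cut points at the baseline's quantiles.
    cuts = [ordered[int(len(ordered) * k / n_bins)] for k in range(1, n_bins)]

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(cuts, v)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))


def drifted_features(baseline, current, threshold=0.25):
    """Return (feature, psi) pairs above the threshold, worst first."""
    scores = {f: psi(baseline[f], current[f]) for f in baseline}
    flagged = [(f, s) for f, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda kv: kv[1], reverse=True)


# Hypothetical baseline and current windows for two inputs.
baseline = {
    "disposable_income": list(range(1000, 3000, 20)),
    "utilization": [i / 100 for i in range(100)],
}
current = {
    "disposable_income": list(range(2000, 4000, 20)),
    "utilization": [i / 100 for i in range(100)],
}

flagged = drifted_features(baseline, current)
```

When a bin empties out completely, the exact PSI value depends heavily on `eps`, so the ranking of drifted features is usually more informative than the precise number.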
Having real transparency or explainability is essential to doing ML for underwriting. Without it, you don’t know what you’ve got. Curious to know more? Drop us a note at firstname.lastname@example.org.