If you’ve considered using machine learning (ML) in your credit or lending business, chances are pretty good that you have questions. Banks are coming around to the idea that ML can do a better job of finding more good borrowers and fewer bad ones. But once you start digging a little deeper, things can get a little confusing.
We’re here to help.
Participants in a recent Zest webinar with our CTO Jay Budzik had the following questions about implementing machine learning. Here are Jay's answers.
Watch the original webinar, "How To Make More Money With Machine Learning," and be sure to sign up for our next Zest webinar in the series, "How To Build Transparent ML Models With Zest."
Do you have any credit unions as customers?
Yes, in fact, we do. They're actually some of our favorite customers because they have a charter to help their communities.
Can you implement machine learning in a live interview with the applicant?
Absolutely. We have folks who use a machine learning-based score in their manual underwriting process, either via a call center that's filling out an application over the phone or in an auto dealer setting, for example.
What has been your most surprising experience with implementing machine learning in the financial sector?
That's a great question. Frankly, just the scale of the impact that you can achieve using machine learning was a surprise to me. Simply by moving from a logistic regression framework to a machine learning framework, you can make such huge gains.
What alternative data sources would you recommend? Which portions of bureau or application data help to increase model effectiveness?
We've actually seen quite a lot of good results from alternative data. I can't recommend particular vendors, and the right choice depends primarily on the product that you're offering and the segment that you're after. But we've seen very good results from public records data, ID verification data and the like, which have broader coverage and allow you to underwrite those folks who might have no hit from the credit bureaus.
Can you discuss further the need for enhanced model monitoring on the input data side, as in doing it on a more frequent basis compared to the standard regression model?
ML models look at interactions between the variables. Those interactions mean that, in effect, every combination of every variable with every other variable can be a term in the model. If you think of a regular logistic regression model, it's a simple linear equation. There are no interactions. Each of the variables stands on its own. But if each of those variables is combined with all the other variables, you have a lot of combinations to keep track of.
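To get a feel for how quickly those combinations pile up, here is a small sketch that counts only the pairwise ones. It is a simplified illustration, not a description of how any particular ML model stores its terms, and the feature names beyond income and length of employment are made up for the example.

```python
from itertools import combinations
from math import comb

def pairwise_interactions(features):
    """Return every unordered pair of features (two-way interactions only)."""
    return list(combinations(features, 2))

# Hypothetical feature names, for illustration only.
features = ["income", "length_of_employment", "debt_to_income", "credit_utilization"]

pairs = pairwise_interactions(features)
for a, b in pairs:
    print(f"{a} x {b}")

# A model with no interactions has one term per variable.
# Pairwise interactions alone add n * (n - 1) / 2 more.
print(len(features), "single-variable terms,", len(pairs), "pairwise combinations")
print("With 50 variables:", comb(50, 2), "pairwise combinations")
```

Three-way and higher combinations grow faster still, which is why checking each variable on its own covers less and less of what the model actually uses.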
So a standard monitoring approach would be to look at each variable. If you've got your income normally falling in a certain range and it stays within that certain range, then you might think everything's OK. But what happens if your income is combined in the machine learning model with another factor, like your length of employment? The model actually relies pretty heavily on that combination. Well, if my length of employment stays in the same range and my income stays in the same range but the ratios don't and I only used a univariate approach, I'd miss the fact that the combination of these two changed, because I was considering each only in isolation.
So that's the kind of thing that you need to watch out for with machine learning models. You need an approach to monitor not just the univariate distributions, but the combinations, the multivariate distributions. That's kind of the approach that we provide with our monitoring methods.
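Here is a minimal sketch of the income and length-of-employment example. It is not Zest's monitoring method: it uses a plain correlation coefficient as a stand-in for a multivariate check, and all of the numbers are invented so that each variable keeps exactly the same range and average while the relationship between the two flips.

```python
from statistics import mean, pstdev

def univariate_summary(values):
    """The kind of per-variable check a standard monitor might run."""
    return (min(values), max(values), mean(values))

def pearson(xs, ys):
    """Pearson correlation, used here as a simple measure of joint behavior."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

# Invented data: incomes are identical in both periods.
incomes = [30_000, 45_000, 60_000, 75_000, 90_000]
tenure_baseline = [1, 3, 5, 7, 9]   # years employed when the model was built
tenure_current = [9, 7, 5, 3, 1]    # same values, paired with different incomes

# Univariate view: nothing appears to have changed.
tenure_unchanged = univariate_summary(tenure_baseline) == univariate_summary(tenure_current)

# Multivariate view: the combination has changed completely.
baseline_corr = pearson(incomes, tenure_baseline)
current_corr = pearson(incomes, tenure_current)
joint_shift = abs(baseline_corr - current_corr)

print("Tenure looks unchanged on its own:", tenure_unchanged)
print("Income/tenure correlation before:", round(baseline_corr, 2))
print("Income/tenure correlation now:", round(current_corr, 2))
```

A univariate monitor would pass this data, while even this crude pairwise check shows that the population the model is scoring no longer looks like the one it was built on.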
Can you implement machine learning in a database less than a year old and with less than 100,000 records?
Absolutely. In fact, we've helped some pretty large companies develop their first lending products, so they had zero data to start with other than maybe some behavioral data. The way that you go about that is that you do some testing to gather more data. But even with a database of 100,000 records, as long as it contains enough defaults, we should be fine to do a project.
How do you think a lender can leverage machine learning for pricing?
That's a great question. Most risk-based pricing schemes are based on cuts on a score. If you have a FICO score of, say, 700 and above you might have one interest rate, another interest rate for 600 to 700, and another interest rate for 500 to 600, for example. Now, those kinds of pricing rules are pretty easy to understand. But they're based on a FICO score. That credit score has a certain accuracy associated with it. It's rank ordering the risk of these applicants. And a credit score is sort of a general-purpose thing. It doesn't know what products you're applying for, it doesn't know what normally happens with your lending product in your community with your database of consumers, and so it's limited in its ability to have predictive accuracy.
If you replace the credit score with a more accurate machine learning model and then do the same type of thing you'll end up with more accurate pricing that actually reflects the underlying risk. What that means is that the folks you're giving a great deal to deserve that great deal. They're not going to go bad, and so you're going to be more profitable. And the folks that you're giving maybe a little bit of a higher interest rate to are actually going to default at a higher rate. You can be more sure of their default rate walking into that pricing decision, and so your yield is going to be better on that as well.
And because you're offering people the right price, and maybe your competitors aren't because they're using a traditional model like a FICO score, those consumers are going to now be more likely to accept your better pricing. So there's a really pretty interesting competitive advantage you can create by using risk-based pricing in conjunction with a machine learning model. In fact, our customer Prestige Financial Services leveraged its machine learning model to do risk-based pricing and saw a doubling in their lending business as a result. One of the factors was an increased take rate as a result of doing better pricing.
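The score-cut pricing described in this answer can be sketched in a few lines. The cut points of 500, 600 and 700 come from the example above; the interest rates and the decision to return no rate below 500 are assumptions made up for illustration, not anyone's actual pricing.

```python
from bisect import bisect_right

# Lower bounds of each pricing tier, taken from the example in the text.
SCORE_CUTS = [500, 600, 700]

# Hypothetical rates, one per tier. None means "no offer" below the lowest cut.
RATES = [None, 0.18, 0.12, 0.07]

def rate_for_score(score):
    """Look up the interest rate for a score using simple cuts."""
    return RATES[bisect_right(SCORE_CUTS, score)]

for score in (480, 550, 650, 720):
    print(score, "->", rate_for_score(score))
```

Nothing in the lookup cares where the score came from. Swapping a general-purpose credit score for a more accurate model score leaves the pricing rules just as easy to read, while the applicants landing in each tier are a better match for the risk that tier was priced for.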
Can unsupervised learning be used to create underwriting models?
Sort of. Machine learning can be classified into two categories: supervised and unsupervised. Supervised means you have a labeled outcome (good or bad, say, or cat or dog) and the model has to assign the right label. Unsupervised machine learning says, "I don't have a labeled outcome. I'm just going to take a look at trends in the data generally and get something out of it." It sort of automatically generates its own outcomes.
Typically, unsupervised machine learning involves clustering: finding the things that are like each other and separating them into groups. Those models are really great at working with unstructured data like text, voice data, things like that, that don't have labels associated with them but that you want to use in a supervised setting later on. You can sometimes use unsupervised learning as a pre-processing step if you're looking at bringing in some of that data, like call center data. Oftentimes we find that it's just not necessary to move into unsupervised machine learning approaches to get that big initial first win out of your adoption of this new technology.
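As a toy illustration of clustering used as a pre-processing step, the sketch below groups unlabeled values with a bare-bones one-dimensional k-means and turns the group number into a feature that a supervised model could use later. The call lengths are invented, and a real project would use a tested library and far richer data than this.

```python
def nearest_center(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

def kmeans_1d(values, k=2, iterations=20):
    """Very small 1-D k-means (k >= 2), with centers spread evenly at the start."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest_center(v, centers)].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

# Invented, unlabeled data: length of each call-center call in minutes.
call_minutes = [2, 3, 4, 3, 25, 30, 28]

centers = kmeans_1d(call_minutes)

# The cluster number becomes a new input column for a supervised model.
cluster_feature = [nearest_center(v, centers) for v in call_minutes]
print(cluster_feature)
```

No labels were needed to form the groups, which is the point: the unsupervised step only prepares an input, and the underwriting model itself is still trained on labeled outcomes.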
The other thing that some people mean by unsupervised machine learning is that you kind of set up your algorithm to automatically retrain itself. In the machine-learning academic literature, that's called online learning or reinforcement learning. That type of technique we think is really cool. But I would be cautious about using that in a financial services setting because you want to review the model every time, make sure that it's right and not introduce something into your consumer business that's going to create harm. Instead of automatically refreshing your models, especially in underwriting, we often recommend that you set up a sort of automated pipeline so that you can refresh them quickly, but that you put in a manual step to review the results before they go out in front of consumers.
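One way to picture that recommendation is a pipeline where retraining is automatic but deployment refuses to run until a person has signed off. This is only a sketch: the class, function names, version label and validation number are all hypothetical, and the retraining step is a placeholder rather than a real training job.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CandidateModel:
    version: str
    validation_auc: float
    approved_by: Optional[str] = None  # set only by a human reviewer

def retrain(version, validation_auc):
    """Automated step: placeholder for the real retraining job."""
    return CandidateModel(version=version, validation_auc=validation_auc)

def approve(candidate, reviewer):
    """Manual step: a named person reviews the results and signs off."""
    candidate.approved_by = reviewer
    return candidate

def deploy(candidate, production):
    """Refuse to put a model in front of consumers without a sign-off."""
    if candidate.approved_by is None:
        raise PermissionError(f"Model {candidate.version} has not been reviewed")
    production["model"] = candidate
    return production

production = {"model": None}
candidate = retrain("v2", validation_auc=0.81)  # hypothetical numbers

try:
    deploy(candidate, production)
except PermissionError as err:
    print("Blocked:", err)

deploy(approve(candidate, reviewer="model-risk-team"), production)
print("Live model:", production["model"].version)
```

The automation keeps the refresh fast, and the gate keeps the review from being skipped when someone is in a hurry.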