Alternative data has gotten a bad rap. In the world of credit risk, alternative data is everything not included in a credit bureau report. But to many people outside that world, alternative data includes information gleaned from social media sites and anywhere else customers may share information about themselves — information that we’ve found brings little value and lots of headline risk to institutions that touch it.
So let’s distinguish between extra data that’s useful and extra data that isn’t. Let’s call the former inclusive data. Inclusive data is non-credit-bureau data that is highly predictive, appropriate for financial services use and, really, what any modern credit model worth the effort should be using.
Folding in information beyond the credit bureau's summary data sets is imperative if we’re going to make the credit system fairer and more accessible to everyone. Consider that the credit industry has remarkably little information on hand about people who don’t fit into the neat credit box. Some 30% of Americans don’t have credit cards, and 10% don’t even have a credit score.
Credit professionals know there is a wide variety of reasons for that. Someone without a credit card may be a young person who prefers to use Venmo and debit cards. Or they could be a Ph.D. who has just immigrated to the U.S. to join the staff of a university and has yet to establish credit locally.
Assuming that these people are bad risks just because they aren’t typical credit-bureau types limits the pool financial institutions can pull from when offering loans. Additional data would be a way to find worthy borrowers while at the same time excluding riskier ones by painting a more well-rounded image of each potential customer. Using more data points is one way that Zest Automated Machine Learning (ZAML) consistently helps our clients both raise revenue without raising risk and lower risk without cutting revenue.
The bad rap alternative data gets fades away when you restrict its usage to inclusive data. Lenders don’t even have to go very far to find it. A major U.S. credit card issuer we worked with increased its loan approval rate by 10% with no added risk by using the transaction and CRM data it already had. An analysis we did with an auto lender found that alternative data contribute little to the model’s performance. Often just switching from a simple logistic regression model to an ML model built off the much bigger raw data set in a person’s credit file — not the summaries — is enough to improve loan portfolio profitability.
But widening the search for sources that are adjacent to a credit file yields some valuable options. One type of inclusive data point is rent history, which usually is missing from consumer credit files, especially when the renter has a smaller-scale landlord. (Some larger apartment companies do license their rental records.) A steady history of making the rent is a very positive indicator that would help a lot of people if it were included in credit files. And there are lots of other inclusive data types that can be predictive as well. People can live full financial lives outside credit bureau data by, for example, buying a car at an independent, self-financing dealer or preferring to use prepaid debit cards for shopping.
Any expanded data usage should start with common sense. Consider the headline risk of using quixotic and odd data sources and correlations. Consider too how you’ll explain to regulators or attorneys why a credit model used a variable based on a borrower’s Instagram posts or Netflix watchlist. Your customer base will be the ultimate judge of what they consider fair game when you’re evaluating their credit request. More data is useful, but not all data is equally important, so don’t worry so much about alternative data. Instead, focus on inclusive data.