Algorithms and Data Could Determine Creditworthiness

Leonid Bershidsky

14 May 2018, 09:00 PM IST

(Bloomberg) -- I have great respect for Apple, but I refuse to buy its $1,000 phones. Instead, I use a $250 Android device with a long battery life. In the emerging big data-based economy, however, that could cost me in ways I can’t even predict.

A recent paper by Tobias Berg of the Frankfurt School of Finance and Management in Germany and his collaborators showed that a person’s digital footprint can be as predictive of financial behavior as a credit score. One of their findings, for example, is that the difference in loan default rates between iPhone and Android owners “is equivalent to the difference in default rates between a median FICO score and the 80th percentile of the FICO score.” iPhone ownership, apparently, is a reliable proxy for higher income and thus for creditworthiness.

Fintech companies already trust big data more than traditional scoring methods. The financial industry will increasingly make judgments about us from the minutest, most innocuous traces we leave on the internet. And algorithmic decisions based on the statistical analysis of these traces will likely often be wrong. That means we should seek out research like Berg’s and give the algorithms based on it much more thought.

Here are some of the other important variables mentioned in the Berg paper, which analyzed data that a German e-commerce furniture retailer collected while processing 270,399 purchases. (It ships the furniture first and gets paid later, so defaults are observable; the annualized default rate is around 3 percent, roughly in line with the statistics for consumer loans issued by German banks and comparable with U.S. rates.)

Those who order from mobile phones are three times as likely to default as those who order from desktops. A customer who arrives at a shopping site from a comparison engine is half as likely to default as one who clicks on a search engine ad.

A customer who uses her name in her email address is 30 percent less likely to default than one who doesn’t. But it’s better if the email address is linked to a paid internet or cable package than if it’s from a free service, especially an outdated one like hotmail.com or yahoo.com. And it’s better if the address contains no numbers.

Those who shop between noon and 6 p.m. are half as likely to default as those who shop between midnight and 6 a.m. Businesses can also expect more trouble from those who make an error when typing in their email address or enter their name and address in all lowercase letters.

These findings seem intuitive. People with regular habits and better self-control are relatively more reliable than those who lack those qualities. People who pay for services (and expensive devices) are likely more affluent than people who don’t. According to the Berg paper, the model based on these parameters — the most rudimentary data we provide to any site on which we have to register — is slightly more predictive of default than the German equivalent of a FICO score. A model that uses both the digital footprint and the credit score is even more predictive.
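To make the idea of such a model concrete, here is a minimal sketch in Python of how footprint signals and a credit score might be folded into a single default estimate. It assumes a logistic form, and every feature name, weight and the intercept below is hypothetical, invented purely for illustration; none of them are the coefficients estimated in the Berg paper.

```python
import math

# Hypothetical weights, chosen only for illustration. They are NOT the
# coefficients from the Berg paper. Positive values raise the estimated
# default risk, negative values lower it.
WEIGHTS = {
    "android_device": 0.4,
    "mobile_order": 0.6,
    "free_email": 0.3,
    "digits_in_email": 0.3,
    "night_purchase": 0.5,
    "typo_in_email": 0.7,
    "all_lowercase": 0.7,
    "name_in_email": -0.3,
}
INTERCEPT = -4.0       # hypothetical baseline log-odds of default
CREDIT_WEIGHT = -1.0   # hypothetical weight per standardized credit-score unit


def default_probability(flags, credit_z=None):
    """Return a toy default probability for one customer.

    flags: set of feature names from WEIGHTS that apply to the customer.
    credit_z: optional standardized credit score (higher means better).
    """
    log_odds = INTERCEPT + sum(WEIGHTS[name] for name in flags)
    if credit_z is not None:
        log_odds += CREDIT_WEIGHT * credit_z
    # Logistic function maps log-odds to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two made-up customers: one with a single reassuring signal, one with
# several of the warning signs listed above.
careful = {"name_in_email"}
sloppy = {"android_device", "free_email", "digits_in_email",
          "night_purchase", "typo_in_email"}

print(default_probability(careful))
print(default_probability(sloppy))
print(default_probability(sloppy, credit_z=2.0))  # same footprint, strong credit score
```

With these made-up weights, each additional warning sign pushes the estimate up, while a strong credit score pulls it back down, which is the sense in which a combined model can draw on both sources of information.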

There are, however, multiple problems with this kind of modeling, even apart from the widespread worry that black-box scoring algorithms could end up making decisions on the basis of race, gender or other equally sensitive variables.

Consider this hypothetical case:

I’ve paid off two mortgages and never defaulted on a loan. But not only do I own a cheap Android device, I also give e-commerce sites a free email address with numbers in it, so they don’t spam my main address. Making matters worse, I often make purchases late at night because I’m too busy to surf shopping sites during the working day. I’m a fat-fingered typist. That pretty much rings all the default bells in the Berg model, and I’m clearly not the only person with a high credit rating who would ring them: The Berg paper says the model’s results are only weakly correlated with credit scores.

I’d be lucky if a financial services provider using a model similar to Berg’s also looked at my credit score and relied on it more than on the big data signals. But, given the hype around big data, that’s not guaranteed.

Imagine now that you’re being judged on the basis of far more information you have, in one way or another, provided to various data harvesters (who likely have shared it with one another). Are you single? Do you use an outdated browser? Do you own more than one cat? Have you been drunk more than once in the past year? What could an algorithm working with vast amounts of statistical data deduce about you on the basis of this information? Let your imagination run wild.

I can probably change my online behavior to cover all the Berg model bases. But, as he and his collaborators point out, “some of the digital footprint variables are clearly costly to manipulate, but, more importantly, such a change in behavior can lead to a situation where the use of digital footprints has a considerable impact on everyday life, with consumers constantly considering their digital footprints which are so far usually left without any further thought.”

Living in a surveillance society could have an upside, too. For example, as the Berg paper points out, the use of big data could improve access to credit for those without a credit score, even the unbanked. Soon, however, they, too, would be squirming uncomfortably in their glass houses.

It’s not enough to worry about the data being collected from us by Google, Facebook, Amazon and every site that places a cookie on our devices. We need to start asking how the data are being analyzed and to what conclusions the analysis leads. The transparency of algorithms could be more important than our right to control our personal data; even if we’re extremely careful, we all leave digital traces.

To contact the author of this story: Leonid Bershidsky at lbershidsky@bloomberg.net.

To contact the editor responsible for this story: Max Berley at mberley@bloomberg.net.