30 October 2008

Words and Credit Scores

I find statistical evidence that borrowers who use words like "bill," "bills," and "need" in their loan applications are up to about twice as likely to fall behind or default. This post uses freely available data from the P2P lending site LendingClub.

LendingClub is a P2P lending site much like Prosper. What makes it special is that it has released a full data set of all 4,564 past loans and their current status. As a data source this is extraordinary, since most of the credit-scoring literature uses proprietary data. For the LendingClub data, can we beat the FICO score at default prediction by incorporating additional clues?

This post focuses on the borrower's "Loan Description," which I use along with FICO scores to predict defaults. The loan description is written by the borrower and usually pitches his qualifications and reasons for needing the money. Here's a randomly chosen example from someone who is current on his payments.

I have some credit card debt that I would like to pay-off. It makes sense to pay one lender as opposed to 5 credit card companies. I'd rather pay interest to one payee rather than split between 5 or 6.

This is a relatively short one -- the average description is 58 words long. Perhaps there are keywords in the description that impact the probability of default after controlling for the FICO score. Here's what I did to test for these keywords:

  1. Find the 300 most common words in all loan descriptions.
  2. For each word w, test the hypothesis that use of w is conditionally independent of delinquency given the FICO score range (six ranges from 640 up). I apply the Mantel-Haenszel test. For simplicity I ignore the survival-analysis aspect of the problem (i.e., some loans are newer than others), since all loans are relatively new anyway (Lending Club started in January of 2007).
  3. Order all the words by the test's p-value. Check that the distribution of p-values is non-uniform, so that the smallest p-values are not merely what multiple comparisons would produce by chance.
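
The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code behind this post: the function names, the tokenizing regex, and the layout of the per-stratum 2x2 tables are all assumptions, the Mantel-Haenszel statistic is computed without a continuity correction, and the uniformity check is a plain Kolmogorov-Smirnov distance, which is only one of several ways to do step 3.

```python
import math
import re
from collections import Counter


def top_words(descriptions, n=300):
    """Step 1: the n most common lowercase words across all descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return [word for word, _ in counts.most_common(n)]


def mantel_haenszel(strata):
    """Step 2: Mantel-Haenszel chi-square test (1 df, no continuity correction).

    strata: one 2x2 table (a, b, c, d) per FICO range (hypothetical layout):
      a = word used & delinquent,   b = word used & not delinquent,
      c = word absent & delinquent, d = word absent & not delinquent.
    Returns (chi-square statistic, p-value).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # stratum too small to contribute
        row1, col1 = a + b, a + c
        observed += a
        expected += row1 * col1 / n
        variance += row1 * (c + d) * col1 * (b + d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def rank_words(tables_by_word):
    """Step 3a: sort words by ascending Mantel-Haenszel p-value."""
    scored = [(w, mantel_haenszel(s)[1]) for w, s in tables_by_word.items()]
    return sorted(scored, key=lambda pair: pair[1])


def ks_uniform_statistic(pvalues):
    """Step 3b: Kolmogorov-Smirnov distance between p-values and Uniform(0, 1)."""
    ps = sorted(pvalues)
    n = len(ps)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))
```

With 300 p-values, a distance well above about 1.36 / sqrt(300) (the usual large-sample 5% critical value) would suggest the p-values are not uniform, i.e. that at least some words carry real signal beyond what multiple comparisons alone would produce.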

Now, the fun stuff. For our purposes, define a delinquency as either being late on payments or having defaulted completely. The words with the smallest p-values are below. I report marginal delinquency probabilities, not broken out by FICO score, simply for brevity; the actual M-H test controlled for the FICO scores.


Word      Loans with word  P(Delinquency | No word)  P(Delinquency | Word)  p-value
also      215              0.067                     0.140                  0.0004
need      608              0.062                     0.105                  0.0015
business  233              0.069                     0.116                  0.0038
live      91               0.070                     0.154                  0.0057
already   64               0.071                     0.156                  0.0059
other     285              0.068                     0.112                  0.0081
bills     223              0.067                     0.135                  0.0082
bill      279              0.066                     0.125                  0.0117
interest  660              0.081                     0.053                  0.0136

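
To put the table in "how many times as likely" terms, the marginal probabilities can be turned into simple risk ratios. The sketch below just divides the two probability columns copied from the table; the names `rows`, `risk_ratio` and `ratios` are illustrative, and because these are marginal figures the ratios are not adjusted for FICO score the way the M-H test is.

```python
# (word, P(delinquency | no word), P(delinquency | word)), copied from the table.
rows = [
    ("also", 0.067, 0.140),
    ("need", 0.062, 0.105),
    ("business", 0.069, 0.116),
    ("live", 0.070, 0.154),
    ("already", 0.071, 0.156),
    ("other", 0.068, 0.112),
    ("bills", 0.067, 0.135),
    ("bill", 0.066, 0.125),
    ("interest", 0.081, 0.053),
]


def risk_ratio(p_without, p_with):
    """Marginal risk ratio: delinquency rate with the word over the rate without."""
    return p_with / p_without


# Ratios above 1 mean the word goes with more delinquency, below 1 with less.
ratios = {word: round(risk_ratio(p0, p1), 2) for word, p0, p1 in rows}

for word, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{word:<9} {ratio:.2f}")
```

On these numbers "bills" and "also" come out at about 2, "need" at about 1.7, and "interest" is the only word below 1.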
Some speculative reasoning: A word like "also" implies that the loan will be used for more than one purpose, which points to a heightened risk. Here's a randomly chosen delinquent borrower who used "also." It's clear that he has multiple goals in mind for the money and has obviously racked up quite a bit of debt.

I have good credit and am looking to consolidate all my debt into one easy payment. I am looking to get married soon so the less multiple bills we have to keep track of the better. I have two credit cards with low balances that I would like to pay off. I have a furniture debt that I would also like to consolidate and I need to overhaul the commuter vehicle my fiance will begin driving. I have no recorded late or delinquent payments on my credit. I have worked for my current employer for 5 1/2 yrs and have good standing. I am excited to join hands in marriage with my lovely fiance and the remainder balance after consolidation will be used for marraige documentation purposes. I appreciate your consideration. Thank you.

As for the other words, "need" implies that the borrower is in straits of some kind, while "live," "bill" and "bills" suggest that the money will be used for day-to-day expenses rather than a targeted goal, implying a systemic negative cash flow. "Already" suggests an existing outstanding loan. Every word on the list except "interest" is associated with higher delinquency risk. "Business" is somewhat surprising -- apparently people who want money to start businesses are greater risks. Here's an example:

i am trying to buy a residential Land in emerging and booming market like new delhi where building cost is very cheap and return of investment is 150% in just six months. I intend to purchase the land build the house with my friends help who is in building house business and make a six flats/3 floor house. and sale it each one of them under USD 12, 000.00.

I'm stunned something like this got funded! All in all, such keywords look like a good building block for a credit-scoring model that goes beyond FICO scores. In a saner credit market, a viable strategy would be to use such an enhanced model to pick which P2P loans to fund, minimizing default risk. Right now, however, I'd be worried that the credit crisis could wipe out all these sites at the drop of a hat.

Posted by Kevin Bartz at October 30, 2008 2:16 PM