30 October 2008
LendingClub is a P2P lending site much like Prosper. What makes it special is that it has released a full data set of all 4,564 past loans and their current status. As a data source this is extraordinary, since most literature on credit scoring uses proprietary data. For the LendingClub data, can we beat the FICO score at default prediction by incorporating additional clues?
This post focuses on the borrower's "Loan Description," which I use along with FICO scores to predict defaults. The loan description is written by the borrower and usually pitches his qualifications and reasons for needing the money. Here's a randomly chosen example from someone who is current on his payments.
I have some credit card debt that I would like to pay-off. It makes sense to pay one lender as opposed to 5 credit card companies. I'd rather pay interest to one payee rather than split between 5 or 6.
This is a relatively short one -- the average description is 58 words long. Perhaps there are keywords in the description that impact the probability of default after controlling for the FICO score. Here's what I did to test for these keywords:
Now, the fun stuff. For our purposes, define a delinquency as either being late on payments or having defaulted completely. The 10 words with the smallest p-values are below. I report marginal delinquency probabilities, not broken out by FICO score, simply for brevity; the actual M-H test controlled for the FICO scores.
| Word | Loans With | P(Delinquency \| No word) | P(Delinquency \| Word) | p-value |
|---|---|---|---|---|
| also | 215 | 0.067 | 0.140 | 0.0004 |
| need | 608 | 0.062 | 0.105 | 0.0015 |
| business | 233 | 0.069 | 0.116 | 0.0038 |
| live | 91 | 0.070 | 0.154 | 0.0057 |
| already | 64 | 0.071 | 0.156 | 0.0059 |
| other | 285 | 0.068 | 0.112 | 0.0081 |
| bills | 223 | 0.067 | 0.135 | 0.0082 |
| bill | 279 | 0.066 | 0.125 | 0.0117 |
| interest | 660 | 0.081 | 0.053 | 0.0136 |
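A test of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the table: it assumes "M-H" refers to the Mantel-Haenszel chi-square test, that loans are stratified by FICO band, and that each loan is a hypothetical `(description, fico_band, delinquent)` tuple. The function names, the tokenization rule, and the toy loans are all made up for the example.

```python
import math
import re
from collections import defaultdict


def has_word(description, word):
    """True if `word` appears as a whole token (case-insensitive)."""
    return word in re.findall(r"[a-z']+", description.lower())


def build_strata(loans, word):
    """Group loans by FICO band into 2x2 counts (a, b, c, d).

    a: word present, delinquent      b: word present, not delinquent
    c: word absent, delinquent       d: word absent, not delinquent
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for description, fico_band, delinquent in loans:
        idx = (0 if has_word(description, word) else 2) + (0 if delinquent else 1)
        tables[fico_band][idx] += 1
    return [tuple(t) for t in tables.values()]


def mantel_haenszel(strata, correct=True):
    """Mantel-Haenszel chi-square test (1 df) over a list of 2x2 tables.

    Returns (statistic, p_value). `correct` applies a 0.5 continuity
    correction when the observed-minus-expected total is at least 0.5.
    """
    delta = 0.0      # sum over strata of observed a minus expected a
    variance = 0.0   # sum over strata of the hypergeometric variance of a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        delta += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    yates = 0.5 if correct and abs(delta) >= 0.5 else 0.0
    stat = (abs(delta) - yates) ** 2 / variance
    # Upper tail of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy data, invented purely to show the call pattern
toy_loans = [
    ("I need to pay some bills", "660-699", True),
    ("Consolidating at a lower interest rate", "660-699", False),
    ("I need cash for my business", "700-739", True),
    ("Paying off one card", "700-739", False),
    ("Need a little help this month", "700-739", False),
]
stat, p_value = mantel_haenszel(build_strata(toy_loans, "need"))
print(f"statistic={stat:.3f}, p-value={p_value:.3f}")
```

Stratifying by FICO band is what "controlling for the FICO score" amounts to here: the word only earns a small p-value if it is associated with delinquency among borrowers with similar scores.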
I have good credit and am looking to consolidate all my debt into one easy payment. I am looking to get married soon so the less multiple bills we have to keep track of the better. I have two credit cards with low balances that I would like to pay off. I have a furniture debt that I would also like to consolidate and I need to overhaul the commuter vehicle my fiance will begin driving. I have no recorded late or delinquent payments on my credit. I have worked for my current employer for 5 1/2 yrs and have good standing. I am excited to join hands in marriage with my lovely fiance and the remainder balance after consolidation will be used for marraige documentation purposes. I appreciate your consideration. Thank you.
As for the other words, "need" implies that the borrower is in straits of some kind, while "live," "bill" and "bills" suggest that the money will be used for day-to-day expenses rather than a targeted goal, implying a systemic negative cash flow. "Already" suggests an existing outstanding loan. Every word on the list except "interest" is associated with a higher delinquency risk. "Business" is somewhat surprising -- people who want money to start businesses must be greater risks. Here's an example:
i am trying to buy a residential Land in emerging and booming market like new delhi where building cost is very cheap and return of investment is 150% in just six months. I intend to purchase the land build the house with my friends help who is in building house business and make a six flats/3 floor house. and sale it each one of them under USD 12, 000.00.
I'm stunned something like this got funded! All in all, such keywords look like a good building block for enhancing a credit score model that goes beyond FICO scores. In a saner credit market, a viable strategy would be to fund P2P loans judged by an enhanced model to minimize default risk. Right now, however, I'd be worried that the credit crisis could wipe out all these sites at the drop of a hat.
Posted by Kevin Bartz at October 30, 2008 2:16 PM