Applying Natural Language Processing to Customer Feedback for Mortgage Banks: Uncovering the Theme of Each Narrative. A Sample Working Session Between a Banker and a Data Scientist

This article studies customer narratives about home loan origination and servicing in the mortgage banking process, collected for several financial institutions. Federal laws mandate that agencies share these consumer narratives in the open domain.

The results displayed here are a subset of the insights that can be discovered and the actions that can be recommended to business stakeholders. Note that the choice of techniques depends on the problem of interest and on data and infrastructure limitations. For example, the depth of the Exploratory Data Analysis (EDA) is limited by how much is known about loan-level attributes, and Natural Language Processing (NLP) techniques for determining underlying themes may be more accurate with more powerful infrastructure.

We may deduce groups of similar feedback that talk about a similar problem (clustering, topic modelling), or build a regression model to gauge how much customer feedback arrives over time (time series analysis, regression), or build a model that evaluates the sentiment of incoming feedback in real time. What we do here, and how the business wants to visualize all these aspects (Tableau or simple graphs), is subjective.

For this article, I focus on a detailed EDA followed by topic modelling of consumer narratives, told through a conversation between an imaginary banker and a data scientist :)

The study starts with picking a data set of interest, say the customer narrative data sets for mortgages across a few banks we are interested in.

In a typical business/consulting setup, let us see how this project can be viewed:

Mr. Banker: the perspective of a mortgage banker, product owner or C-level executive

Mr. Data Scientist: the perspective of a data scientist

Mr. Banker to Mr. Data Scientist:

“I am a servicing lead looking to improve customer experience in the mortgage space. At this point I don’t really know what I can do with the data you shared. In fact, all this time I have relied on pure numbers shared by the business reporting and sales teams. Now that you say you can also share insights from descriptive data, or consumer narratives, maybe I’d like to see what we can find.”

Mr. Banker’s goals, for example, could be:

  1. Understand the underlying themes that make customers unhappy during the loan origination process, or servicing practices that come as revelations and should be fixed for my bank
  2. Gather insights comparing my firm’s practices with those of others (e.g. a monolithic bank vs. a nimble, data-driven start-up) that may help refine my marketing and sales strategies, or anything else, including my product development roadmap

Mr. Data Scientist to Mr. Banker:

“Sure, I think we can start by taking the data set from the Consumer Financial Protection Bureau (CFPB) specific to your bank. I will show the summary stats for starters. You, being a domain expert, may already be aware of some of the stats; anything you find interesting will be a bonus.

You should, however, tell me which specific areas you want me to delve into next, to help me start feature engineering and then maybe apply actual techniques to give you insights unknown so far.”

“And oh, there is a lot of data cleansing and understanding of data attributes involved, so we need to start meeting iteratively for me to start feature engineering on the data set.”

Mr. Data Scientist’s tasks:

  1. Explore the given data set and find immediate inferences that can be summarized for the business
  2. Apply various data cleansing and feature engineering techniques
  3. Talk to the business iteratively while performing EDA, to arrive at the data sets the business may want to dive deeper into (when the problem is not known)
  4. Apply Natural Language Processing (this article runs on real-world, genuine data sets)
  5. Even without a domain expert to make suggestions, use the data set to try a host of unsupervised learning techniques for clustering, prediction, entity recognition, etc.
  6. Visualize the outcomes for the business users of interest
  7. Identify problems unknown to the business that were revealed during the prior steps

Exploratory Data Analysis

Let us look, at a high level, at where most complaints are registered across all firms.

List of the top 30 companies with the most registered customer feedback
Number of customer feedback entries per product type across all firms

Mr. Banker: “That’s good enough info. Since I belong to the mortgage space, let us see how the stats look across all complaints for the players I am interested in.”


Mr. Data Scientist: “OK, here is how the stats look for the companies of your interest.”

Distribution of (all types of) complaints reported against Chase, BOA, Wells, Citi and Quicken

Inference #1:

89% of all complaints against Quicken Loans are about mortgages.

50% of the feedback that Bank of America and Wells Fargo received from customers is about mortgages.

33% of all feedback received by JPMorgan Chase is about mortgages, and 30% of all feedback received by Citibank is about credit cards.
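
A share table like the one behind Inference #1 can be sketched with pandas. This is only a minimal sketch on a made-up frame: the column names `Company` and `Product` are assumptions modeled on the public CFPB export, and the rows are illustrative, not real counts.

```python
import pandas as pd

# Toy stand-in for the complaints table; column names and rows are assumptions.
complaints = pd.DataFrame({
    "Company": ["Quicken", "Quicken", "Quicken", "Citi", "Citi", "Citi", "Citi"],
    "Product": ["Mortgage", "Mortgage", "Credit card",
                "Credit card", "Credit card", "Mortgage", "Bank account"],
})

# Row-normalized cross-tabulation: each row sums to 100,
# i.e. the % of that company's complaints falling under each product.
product_share = (
    pd.crosstab(complaints["Company"], complaints["Product"], normalize="index") * 100
).round(1)

print(product_share)
```

Normalizing by row (`normalize="index"`) answers "what share of this company's complaints is about mortgages"; normalizing by column would instead answer "what share of all mortgage complaints does this company account for", which is the view used for Inference #2.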

Mr. Banker: “Hmm, makes sense, since Quicken Loans is primarily a mortgage firm. I want to see the firms outside these that are seeing a lot of complaints on mortgages.”

Distribution of Mortgage complaints reported in CFPB (Top 30 reported companies)

Inference #2:

Ocwen Financial contributes 11% of total customer feedback for mortgages. Mr. Banker did not expect this, and he may want to add it to his list for analysis.

Mr. Banker: “For my bank, I want to see the spread of consumer disputes that resulted in monetary relief and/or were just closed with an explanation. And oh, I want to see which issues are causing them the most.”

How Complaints are Closed For Mr.Banker’s firm
The main issues resulting in customer complaints needing monetary relief during mortgage process

Inference #3:

For Mr.Banker’s firm, 3.2% of all consumer disputed complaints resulted in closures with monetary relief, while 5.3% of non-disputed items resulted in monetary relief.

Issues related to loan servicing, payments and escrow accounts, followed by issues related to loan modification, collection and foreclosure, stand out as the main contributors to closures with monetary relief.
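
The disputed vs. non-disputed comparison in Inference #3 boils down to a grouped rate. Below is a hedged sketch of that calculation; the column names and response labels are assumptions modeled on the CFPB export, and the eight rows are invented purely to make the arithmetic visible.

```python
import pandas as pd

# Toy rows; column names and values are assumptions, not the author's data.
df = pd.DataFrame({
    "Consumer disputed?": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "Company response to consumer": [
        "Closed with explanation", "Closed with explanation",
        "Closed with explanation", "Closed with monetary relief",
        "Closed with explanation", "Closed with monetary relief",
        "Closed with monetary relief", "Closed with non-monetary relief",
    ],
})

# Flag monetary-relief closures, then average the flag within each dispute group.
df["monetary_relief"] = df["Company response to consumer"].eq("Closed with monetary relief")
relief_rate = df.groupby("Consumer disputed?")["monetary_relief"].mean().mul(100)

print(relief_rate)  # % of complaints closed with monetary relief, per group
```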

Mr. Banker: “Fine, can we see which sub-products within mortgage products customers are talking about the most?”

What are the major issues recorded against each mortgage product type?

Inference #4:

Loan modification and servicing processes are reported as the top contributors for ARMs, conventional fixed mortgages and reverse mortgages.

80% of all reverse mortgage complaints report issues related to loan servicing processes.

Notably, for HELOCs, conventional home mortgages and other mortgage types, 50% of all complaints are about trouble during the payment process.

Listing Top-Down, major states where complaints are reported

Mr. Banker: “Can we see the states in which the loan servicing or modification process is really problematic?”

Distribution of complaints per issue type reported in states contributing at least 3% of total complaints

Inference #5:

16% of all complaints are reported from the state of California (CA), followed by TX, NY and FL. Across all these states, 40–50% of complaints are again related to loan servicing and to loan modification, collection and foreclosure issues.

Note that trouble during the payment process is also a significant factor in all states.

Which sub-products do the most reported (significant) issue types come from?

Inference #6:

42% of issues during loan origination phase and 31% of the issues reported during loan servicing are reported for Conventional Fixed Mortgages

52% of issues during loan modification/foreclosure process are reported for ‘Other Mortgage’ types

Mr. Banker: “This level of information is good. It gives me some insight into which major areas we should focus on at a broader level.”

Mr. Data Scientist: “Yes, so far we just drew summary stats and dug one level deep based on our area of interest. That said, we haven’t yet done anything with the actual complaints, right? Let us go there.”

This section focuses on mining consumer comments and feedback. Remember my starting note that we can do multiple things with our data set. Since we know Mr. Banker is interested in specific areas related to servicing and in what is causing the problems there, and the information we want to mine is unstructured (in simple words, not in a rows-and-columns format), we have to do a LOT of work to deduce meaningful information.

When dealing with textual data, there are a few basic use cases and related techniques I will present here and, of course, real-world outcomes that Mr. Banker will use. For starters, here are some:

  1. Topic Modelling
  2. Sentiment Analysis
  3. Topic Prediction
  4. Entity Recognition

... and others, based on need.

Problem Statement

For our use case, Mr. Banker would benefit from knowing the major underlying topics or themes in the areas where customers complain the most. So, let’s do topic modelling of our CFPB data set per Mr. Banker’s interest. The steps are:

Understand the data structure in detail, and cleanse the data set.

Perform stop-word removal and lemmatization. The stop-word list is not just language-specific: we need to talk to Mr. Banker to make sure words that could skew the generated themes do not make it into the corpus.
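
To make the stop-word and lemmatization step concrete, here is a minimal, dependency-free sketch. Everything in it is an assumption for illustration: the stop-word sets are tiny samples, the domain stop words (including the `xxxx` redaction mask) would be agreed with Mr. Banker, and the small `LEMMAS` lookup is a toy stand-in for a real lemmatizer from a library such as NLTK or spaCy.

```python
import re

# Generic English stop words (a tiny illustrative subset, not a full list).
GENERIC_STOPWORDS = {"the", "a", "an", "my", "i", "was", "were", "and",
                     "to", "of", "on", "for", "is", "it", "that", "they"}
# Domain stop words: assumed examples, to be agreed with the domain expert.
DOMAIN_STOPWORDS = {"bank", "xxxx", "mortgage", "loan"}
# Toy lemma lookup standing in for a real lemmatizer (hypothetical entries).
LEMMAS = {"payments": "payment", "paid": "pay", "charged": "charge", "fees": "fee"}


def preprocess(narrative):
    """Lowercase, tokenize on letters, drop stop words, map tokens to lemmas."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    stop = GENERIC_STOPWORDS | DOMAIN_STOPWORDS
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in stop]


print(preprocess("The bank charged late fees on my XXXX mortgage payments."))
# ['charge', 'late', 'fee', 'payment']
```

Words such as "mortgage" appear in almost every narrative of a mortgage corpus, so they carry little signal for separating themes; that is the kind of domain-specific choice the conversation with Mr. Banker is meant to settle.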

Generate our own corpus from the cleansed and lemmatized documents.

Create a count vectorizer and a TF-IDF vectorizer, using scikit-learn (or any other package).

Filter the feature list from the vectorizer to create the ‘Dictionary’ and corpus objects that are later used by Gensim and LDA Mallet: draw the retained features for each row, and pass them as the list of input tokens per document when creating the dictionary for the Gensim model.

Try a first model, say an LDA model: use the Mallet implementation and visualize the topics.

Plot the coherence values of the LDA model over a range of learning decay values and a varying number of topics. Try this for various implementations, including scikit-learn.
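
The grid over topic count and learning decay can be sketched with scikit-learn's LDA. The article does not say which coherence measure was used, so this sketch assumes the UMass variant and computes it by hand from document co-occurrence counts; the corpus, the grid values and the helper name `umass_coherence` are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy, already-lemmatized narratives (illustrative only).
docs = [
    "escrow payment increase escrow shortage tax",
    "escrow account tax insurance payment",
    "late payment fee credit report",
    "payment late fee charge credit",
    "modification application deny foreclosure",
    "foreclosure sale modification document",
    "escrow shortage insurance payment increase",
    "modification document application foreclosure",
]

X = CountVectorizer().fit_transform(docs)
X_bin = (X > 0).astype(int)                      # word present in document: 1, else 0
doc_freq = np.asarray(X_bin.sum(axis=0)).ravel() # documents containing each word
co_doc_freq = (X_bin.T @ X_bin).toarray()        # documents containing each word pair


def umass_coherence(lda, top_n=5):
    """Mean UMass coherence over topics, using each topic's top_n words."""
    scores = []
    for topic in lda.components_:
        top = np.argsort(topic)[::-1][:top_n]    # word ids, most probable first
        score = 0.0
        for m in range(1, len(top)):
            for l in range(m):
                score += np.log((co_doc_freq[top[m], top[l]] + 1) / doc_freq[top[l]])
        scores.append(score)
    return float(np.mean(scores))


results = {}
for n_topics in (2, 3, 4):
    for decay in (0.6, 0.8):
        lda = LatentDirichletAllocation(
            n_components=n_topics,
            learning_method="online",   # learning_decay only matters for online updates
            learning_decay=decay,
            random_state=0,
        ).fit(X)
        results[(n_topics, decay)] = umass_coherence(lda)

best = max(results, key=results.get)    # higher (closer to 0) is better for UMass
print(results)
print("best (n_topics, learning_decay):", best)
```

The `results` dictionary holds one coherence value per (topic count, decay) pair, which is exactly what a line plot of coherence against number of topics, one line per decay value or per implementation, would draw.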

Select the optimal model. While doing so, consider how easily Mr. Banker will be able to interpret it later.

Visualize the generated themes for review with Mr. Banker, who will now get a picture of what customers are ‘really’ talking about in their complaints.

Let me first demonstrate the outcome of the above process, the underlying themes, which Mr. Banker might be interested to look at.

Of course, I will add content of interest to data scientists and ML engineers in the following sections: how to translate the above steps into code, sample code output at each step, justification of the choice of algorithms, etc.

Optimal (Topic) Model

Through the above steps, we programmatically discover that there are several underlying themes (or points, concerns or areas) that customers talk about. How we choose the right number of topics, and how good the outcome is, can be evaluated by various means:

Eyeball validation: A domain expert can quickly validate whether the topics our algorithm detected make sense. In their field, they would be aware of what customers mostly give feedback about, and these should at least come out as themes.

Model interpretability: The themes our ML model reports should be interpretable. Too many topics should not be fit into a model, as that makes it harder for a human to tell them apart, which hurts further decision making. Each topic or theme should be as coherent as possible.

Coherence score: An indication of the semantic similarity of the highest-occurring words in a topic.

While there are other aspects, let’s visualize through a graph how the optimal model and number of topics are identified:

Determining optimal model based on coherence score comparing 2 varying Implementations of LDA

Mr. Banker: “OK, so we started with the large data set, ran some summary stats and found that loan servicing and payments, and loan modification and foreclosure, had the highest number of customer feedback entries or complaints. We zeroed in on those narratives, as I asked, since I want to know what customers are actually talking about. And then you applied some ML techniques and uncovered that there are 5 underlying categories that all the feedback boils down to?”

Mr. Data Scientist: “Yes.”

Mr. Banker: “How can I know what each of these refers to? Is there a name your program automatically assigns to it? And how do I know that it’s accurate?”

Mr. Data Scientist: “The accuracy part can be explained programmatically via the graphs above. That said, let’s not lean on accuracy alone; for this kind of problem, interpretability from the domain side is equally important. As for the name, it is you who has to look one level deeper at each topic, interpret what it is talking about, and just give it a name.”

Mr. Banker: “OK, please show me the accuracy measures first, and then I will look at the topics your code discovered.”

Themes discovered by our model based on best coherence score
Top Occurring Words in Themes Uncovered
Underlying themes in our corpus
Visualization: Clusters of themes in the corpus of interest

Mr. Data Scientist: “Theme 0 is a mixture of top-occurring words: 6.5% of this theme is made up of the word ‘escrow’, 5.9% of ‘interest’ and 5.1% of ‘pay’. The composition of theme 1 is 13.6% ‘payment’, 4.6% ‘pay’, 4.4% ‘credit’, and so on. Looking at this distribution, can you name each theme now?”

Mr. Banker: “Yes, it makes sense now. I can almost name each theme. Now that I also see the visualization of how far apart the themes are, I can tell that these topics do not overlap, meaning I can separate them and investigate each individually.”

Now that our banker has been shown the underlying themes in the unstructured text data (our customer feedback) related to servicing and modification issues, our next activity is to go a level deeper and show him how each piece of customer feedback in our data set relates to each theme.

We will now compute and determine:

  1. Document-topic proportion: identify the dominant theme for each customer narrative, i.e. statistically, which of the 5 themes each narrative is closest to.
  2. The % distribution of the other themes that the given narrative is aligned to.
Topic proportion distribution for each consumer narrative (also called a document)

We have thus categorized all complaints against servicing and modification process into 5 groups.
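
The document-topic table described above can be sketched as follows, again assuming scikit-learn's LDA rather than the Mallet model used for the article. The narratives are invented and only two themes are fit to keep the toy example readable; the article's model has five.

```python
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy, already-lemmatized narratives (illustrative only).
narratives = [
    "escrow interest pay escrow tax",
    "escrow insurance pay interest",
    "payment credit late payment fee",
    "payment credit report late",
    "escrow tax insurance shortage",
    "late fee payment credit",
]

X = CountVectorizer().fit_transform(narratives)
n_themes = 2  # assumption for the toy corpus; the article settles on 5
lda = LatentDirichletAllocation(n_components=n_themes, random_state=0).fit(X)

# transform() returns one row per narrative; each row sums to 1 and gives
# the share of every theme in that narrative.
doc_topic = lda.transform(X)

table = pd.DataFrame(
    doc_topic, columns=[f"theme_{i}" for i in range(n_themes)]
).round(3)
table["dominant_theme"] = doc_topic.argmax(axis=1)  # theme with the largest share

print(table)
```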

Mr. Banker: “Great so far, I can see how close each narrative is to a theme. Can you set a threshold and split the data set so that only those documents/complaints that align with their major theme by more than 40% are investigated further?”
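
The threshold split Mr. Banker asks for is a simple filter on the dominant share. A minimal sketch with pandas, using a hand-written document-topic table (the narrative labels and proportions are hypothetical):

```python
import pandas as pd

# Hypothetical document-topic proportions; each row sums to 1.
doc_topic = pd.DataFrame(
    {
        "theme_0": [0.70, 0.25, 0.10, 0.35],
        "theme_1": [0.20, 0.30, 0.85, 0.33],
        "theme_2": [0.10, 0.45, 0.05, 0.32],
    },
    index=["narrative_a", "narrative_b", "narrative_c", "narrative_d"],
)

THRESHOLD = 0.40  # the 40% cut-off from the conversation

dominant_theme = doc_topic.idxmax(axis=1)   # name of the strongest theme per narrative
dominant_share = doc_topic.max(axis=1)      # its proportion

# Narratives clearly aligned with one theme, kept for further investigation.
to_investigate = doc_topic[dominant_share > THRESHOLD].assign(
    dominant_theme=dominant_theme
)
# Narratives spread too evenly across themes to be assigned confidently.
ambiguous = doc_topic[dominant_share <= THRESHOLD]

print(to_investigate)
print(ambiguous)
```

Here `narrative_d` is set aside because no single theme accounts for more than 40% of it.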

Mr. Data Scientist: “Sure, here it is. And let me tell you, we have only seen one facet of how NLP algorithms can help. Based on your interest, please get back to me; we can still do a lot more. Here are some more algorithms and techniques I can apply, which could be your next use cases:

Forecasting: the count of feedback entries we may receive over time, looking at trend and seasonality effects.

Classification of an incoming new complaint: predict the topic for a new, unseen complaint from outside the trained corpus.

Sentiment analysis: on existing complaints, so that we can segregate our data sets based on it.

Entity recognition: identify parts of speech, entities, tags and actors in the given corpus, and determine the relationships between them.”

Mr. Banker: “Definitely. I think I will want to know more about how identifying entities and patterns in text will help me isolate deeper issues and improve my servicing and loan modification processes. Let’s catch up sometime.”

To be continued.

This article will be updated so that programmers and data scientists can check the steps in detail and look at sample code, the evaluation of the algorithms implemented, and the alternatives.