How an engineer turned a data problem into a technical solution
An interview with a sentiment and intent analysis expert
Our team recently published a piece on how we analyzed 200,000 messages in a live chat service to design a chatbot. We’ve been getting questions about how we gathered the data, what algorithm we used to identify intents, and how we translated them into business insights. So we’re doing a multi-part series to answer all these questions, with insights from our very own machine learning engineer Marcial Puchi.
Here’s Part 1: How we discovered the business problem with data and came up with a technical solution.
Business Problem Discovery
One of our clients is a Fortune 500 real estate and homebuilding company. While we were working on an internal chatbot for their employees, we were also brainstorming an external chatbot for their customers: homeowners and homebuyers.
In order to design the chatbot, we needed to know: why are people visiting their website?
We originally intended to do interviews with the customer care team to see what customers asked about the most.
But qualitative data gathered from interviews is usually supplementary rather than comprehensive. The results are biased because people tend to remember only the highlights and the lowlights.
As such, we asked if we could analyze data from their live chat service instead, looking at conversations between website visitors and customer service representatives.
Defining useful data
We started by examining the data format so we could restructure it into a form the algorithm could run through with minimal processing time. Here, we defined a conversation as a group of messages exchanged by two or more people.
Once we limited the dataset to conversations, we defined which data was useful. For example, what agents said to customers was not useful because agents follow a script. Messages initiated by customers, however, tell us what they’re interested in. This second type of data is what we put through an algorithm to group similar messages together, a practice known as intent clustering.
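To make the filtering step concrete, here is a minimal sketch in Python. The record layout (`conversation_id`, `sender_id`, `role`, `text`) and the sample messages are assumptions made up for illustration; the client's actual export format is not described in this article.

```python
from collections import defaultdict

# Hypothetical message records; field names and contents are invented.
messages = [
    {"conversation_id": 1, "sender_id": "c1", "role": "customer", "text": "Do you have homes near Austin?"},
    {"conversation_id": 1, "sender_id": "a7", "role": "agent", "text": "Thanks for reaching out! How can I help?"},
    {"conversation_id": 2, "sender_id": "c2", "role": "customer", "text": "hello?"},
    {"conversation_id": 3, "sender_id": "a7", "role": "agent", "text": "Thanks for reaching out! How can I help?"},
    {"conversation_id": 3, "sender_id": "c3", "role": "customer", "text": "How do I file a warranty request?"},
]

def customer_messages(records):
    """Keep customer text from conversations with two or more participants."""
    by_conversation = defaultdict(list)
    for record in records:
        by_conversation[record["conversation_id"]].append(record)

    kept = []
    for conversation in by_conversation.values():
        participants = {m["sender_id"] for m in conversation}
        if len(participants) < 2:
            continue  # a lone message is not a conversation under our definition
        # Agent messages follow a script, so only customer messages are kept.
        kept.extend(m["text"] for m in conversation if m["role"] == "customer")
    return kept

customer_texts = customer_messages(messages)
```

In this toy data, conversation 2 has a single participant, so it is dropped, and only the customer side of conversations 1 and 3 moves on to clustering.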
Setting a plan
Marcial first ran all the messages through a standard set of natural language processing tasks. We use a standard preprocessing step because it removes words that do not affect the overall meaning of a sentence, such as stopwords. We don’t need to analyze stopwords like prepositions because their presence or absence has little impact on what the sentence means.
However, we do need to sort words into categories such as verbs and nouns to disambiguate them, a process known as part-of-speech tagging. Some words, like “model,” can be either a noun or a verb, so we need to know the context in which a word is used to assign it to the correct group of words with similar meanings.
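As a rough illustration of stopword removal, the sketch below uses a tiny hand-written stopword set. Both the set and the `remove_stopwords` helper are assumptions for this example; NLP libraries ship much longer lists, and the article does not say which one Marcial used.

```python
import re

# A deliberately tiny, hand-picked stopword set (illustrative only).
STOPWORDS = {"a", "an", "the", "in", "on", "at", "of", "to", "for",
             "is", "are", "do", "i", "you", "have"}

def remove_stopwords(sentence):
    """Lowercase the sentence, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [token for token in tokens if token not in STOPWORDS]

content_words = remove_stopwords("Do you have homes for sale in Austin?")
```

Here `content_words` ends up as `['homes', 'sale', 'austin']`: the words that carry the customer's intent.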
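The idea of using context to tag an ambiguous word can be sketched with a toy rule: look at the word just before it. This is not a real part-of-speech tagger (those are trained on annotated text), and the cue lists and the `tag_ambiguous` helper are invented for illustration.

```python
# Toy context cues, invented for this example.
DETERMINERS = {"a", "an", "the", "this", "that", "our", "your"}
VERB_CUES = {"to", "can", "will", "we", "i", "you", "they"}

def tag_ambiguous(tokens, word="model"):
    """Tag each occurrence of `word` as NOUN or VERB from the preceding token."""
    tags = []
    for index, token in enumerate(tokens):
        if token != word:
            continue
        previous = tokens[index - 1] if index > 0 else ""
        if previous in DETERMINERS:
            tags.append((index, "NOUN"))     # "the model" names a thing
        elif previous in VERB_CUES:
            tags.append((index, "VERB"))     # "we model" describes an action
        else:
            tags.append((index, "UNKNOWN"))  # the toy rule cannot decide
    return tags

noun_case = tag_ambiguous("can i tour the model home".split())
verb_case = tag_ambiguous("we model the floor plan".split())
```

A trained tagger considers far more context than one neighboring word, but the principle is the same: the surrounding words decide which group "model" belongs to.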
A machine learning model created by Google allowed him to transform a given sentence into a 300-dimensional vector-space representation.
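To show what a vector representation buys us, here is a minimal sketch that averages word vectors into a sentence vector and compares sentences with cosine similarity. Everything in it is an assumption: the three-dimensional `TOY_VECTORS` are made up (the real model produces 300 dimensions), and averaging is just one common way to combine word vectors; the article does not say how the sentence vectors were actually built.

```python
import math

# Made-up 3-dimensional word vectors standing in for a real embedding model.
TOY_VECTORS = {
    "homes": [0.9, 0.1, 0.0],
    "houses": [0.8, 0.2, 0.0],
    "sale": [0.7, 0.0, 0.3],
    "warranty": [0.0, 0.9, 0.1],
    "repair": [0.1, 0.8, 0.2],
}

def sentence_vector(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

homes_for_sale = sentence_vector(["homes", "sale"])
houses_for_sale = sentence_vector(["houses", "sale"])
warranty_repair = sentence_vector(["warranty", "repair"])

similar = cosine(homes_for_sale, houses_for_sale)
different = cosine(homes_for_sale, warranty_repair)
```

With these toy numbers, `similar` comes out much higher than `different`, which is the property clustering relies on: sentences with similar meanings sit close together in vector space.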
So what did we find out about our client’s data?
Once the stopwords were removed from the data, he could take the remaining words in each sentence and use a clustering algorithm to group sentences with similar meanings.
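The article does not name the clustering algorithm, so the sketch below swaps in k-means as a familiar stand-in. The sentence vectors are the same kind of made-up three-dimensional toy values (not real model output), and the farthest-first initialization is a choice made here only to keep the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(column) / len(points) for column in zip(*points)]

def kmeans(points, k, iterations=10):
    """Plain k-means; returns one cluster label per point."""
    # Farthest-first initialization: start from the first point, then keep
    # adding the point that is farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels

# Toy sentence vectors (invented): two about listings, two about repairs.
sentence_vectors = [
    [0.80, 0.05, 0.15],  # "homes for sale"
    [0.75, 0.10, 0.15],  # "houses for sale"
    [0.05, 0.85, 0.15],  # "warranty repair"
    [0.10, 0.80, 0.20],  # "repair request"
]
labels = kmeans(sentence_vectors, k=2)
```

On this toy input the two listing sentences share one label and the two repair sentences share the other. Each resulting cluster is a candidate intent that a person then reviews and names.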
What did we do with these identified topics?
Based on these topics, we designed the conversations needed to address them. By knowing customers’ needs and wants, we design a product around human behavior, rather than forcing human behavior to fit the mold of the product.
Look out for Part 2 on how Marcial’s sentiment analysis work on social media content inspired the intent analysis of customer inquiries.
P.S. We’re currently offering companies the chance to get their own data sample analyzed. Check it out here.
How an engineer turned a data problem into a technical solution was originally published in Chatbots Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.