When my kids' school sent me a survey with 15 open-ended questions, I volunteered to help them analyze the data. I was curious what other parents said, but I also wanted to test ChatGPT on this feedback. In this blog post, I'll share my approach and the prompts I used for the analysis. At the end, I'll summarize the pros and cons of using ChatGPT for analyzing customer feedback, and our approach to using Large Language Models at Thematic.
The dataset had 100 responses to 35 survey questions, 10 of which were open-ended. Most parents answered at least two open-ended questions. Some questions were tied to a score. For example, this 5-point rating question:
“How strongly do you agree with this statement: Our School allows children to find their place and experience a sense of wellbeing” was followed by a “Why do you say that?” open-text field.
Other questions were independent: “What do you see as areas for improvement at our School?”.
Traditionally, this data is analyzed manually in huge spreadsheets: you create a code frame (a taxonomy of themes) and then categorize each answer against it. At first I considered Thematic, an automated solution for discovering themes in feedback. But it seemed like overkill for this school's needs, for two main reasons:
1. The dataset was small: 100 responses. Thematic becomes useful from 1,000 comments or more, where you can only review a subset.
2. It was a one-off analysis: there was no need to track themes consistently over time.
Enter ChatGPT!
I decided to work on a prompt to analyze this feedback, to see how well ChatGPT could create a code frame automatically and to understand any limitations. Below are step-by-step instructions for this task.
Finding the best prompt takes a few iterations. The general idea is that the more context you provide, the more accurate the results. But you need to keep the prompt and the data within the model's context window. Here are some helpful tips on how to write a feedback analysis prompt:
Here is the main part of my prompt.
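In essence, it ran along these lines (an illustrative reconstruction; the exact framing and output format are assumptions based on the steps described in this post):

```
You are analyzing open-ended responses from a parent survey at a school.
Read the responses below and identify the main themes. For each theme,
give a short name and the number of responses that mention it.

Responses:
<paste the open-ended responses here>
```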
Simply copy and paste the open-text data “as is” into the prompt. You don't need to clean it. However, if it doesn't fit into a single context window, you will need to process it in batches.
Obviously, make sure that there is no private data included.
You might get satisfactory results straight away! But more often than not, you will need to fix errors; unfortunately, ChatGPT tends to create duplicates of the same themes.
If you have to split your data into batches, the themes will be named differently across batches. You might also want shorter names for your themes than the ones I received in my analysis; in that case, add that requirement to your prompt.
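If you do have to batch, the splitting itself is simple. A minimal sketch in Python, where the character budget is an assumption you would tune to your model's context window:

```python
# Split responses into batches that fit a context window.
# max_chars is a rough stand-in for the real token limit (an assumption).
def make_batches(responses, max_chars=8000):
    batches, current, size = [], [], 0
    for response in responses:
        if current and size + len(response) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(response)
        size += len(response)
    if current:
        batches.append(current)
    return batches

batches = make_batches(["Great teachers!", "More sports please."] * 500)
print(len(batches), "batches")
```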
For my final report, I manually merged themes that meant the same thing.
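In code form, the merge is just a mapping from duplicate names to one canonical name. A sketch, with hypothetical theme names apart from the conference example discussed below:

```python
# Map duplicate theme names to a single canonical name.
# All names here are hypothetical except the conference example.
merge_map = {
    "Parent-teacher interviews": "Learner-led conferences",
    "Communication from teachers": "Communication",
    "Emails from the school": "Communication",
}

def canonical_theme(name: str) -> str:
    # Themes without an entry keep their original name.
    return merge_map.get(name, name)

print(canonical_theme("Parent-teacher interviews"))  # Learner-led conferences
print(canonical_theme("Sports"))                     # Sports
```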
I also reviewed the data to make sure no themes were missed. This step is critical if the accuracy of the analysis matters to you. Most commonly, ChatGPT made two kinds of mistakes: it created duplicate themes under different names, and it missed themes entirely.
For example, in our school's feedback, “Learner-led conferences” and “Parent-teacher interviews” were the same thing: in both cases, the child updates the parent on their progress in front of a teacher. I don't expect ChatGPT to know this, but I somehow needed to teach it. If a person from outside our school were analyzing the data, I would need to teach them this, too.
My favorite part of using ChatGPT to analyze feedback was verifying whether the themes were correct. Some themes seemed incorrect because I had not picked them up when scanning the data myself, so I wanted to see evidence! I simply asked ChatGPT to list the relevant comments for a theme. ChatGPT obliged, and it made me aware of my own bias: I had missed those themes on my own.
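A follow-up prompt along these lines does the job (the wording is illustrative; the theme name is the one from the example above):

```
For the theme "Learner-led conferences", list all the comments that
support this theme, quoting each one verbatim.
```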
But I also noticed occasional mistakes, mostly themes that ChatGPT missed. Thankfully, I could verify the results by reading the data. The more data you have, though, the harder the analysis becomes for ChatGPT, and it would have been more difficult still if the feedback had been split across multiple prompts.
Here’s a summary of the things I liked and didn’t like about using ChatGPT for analyzing feedback:
✔ ChatGPT was fast! Analyzing the same data manually would have taken me an extra 1 to 2 hours.
✔ What I enjoyed most was that I did not need to clean the data. ChatGPT handled typos and spelling mistakes with grace.
✔ I could use the ChatGPT interface to work “with the AI” to validate themes.
However…
✖ ChatGPT did not create any charts. That was all manual work, and it took up the bulk of the two hours I spent preparing the report for the school. And if I wanted to change what I was reporting on, I would need to redo the whole process.
✖ I could not segment the results by other survey responses or by metadata about respondents (e.g. child's gender, ethnicity, family income). For this survey it wasn't needed, but for company feedback at scale you often want to segment by customer value, location, etc. (see the sketch after this list).
✖ Finally, when we experimented with larger datasets, we found that ChatGPT could handle a maximum of about 20 themes, and it became less accurate the more themes we wanted to discover.
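To illustrate the segmentation point from the list above, here is a toy example of the kind of cross-tab a purpose-built tool produces. The data and column names are made up; it assumes pandas:

```python
import pandas as pd

# Made-up data: one row per comment, a theme flag plus respondent metadata.
df = pd.DataFrame({
    "mentions_communication": [1, 0, 1, 1, 0, 1],
    "region": ["North", "North", "South", "South", "South", "North"],
})

# Share of comments mentioning the theme, per segment.
print(df.groupby("region")["mentions_communication"].mean())
```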
So! If you need to analyze 1000s of feedback comments or more, if you need consistent analysis over time, if you need segment-specific insights, or if you need to share the results with others in the company, ChatGPT will quickly fall short of your needs.
Conclusion: ChatGPT is a great tool for analyzing feedback in small one-off surveys, but it hits many limits for large-scale analysis and reporting.
At Thematic, we use large language models (LLMs) such as GPT-4 together with our own algorithms, to make it easier and faster to get specific and reliable answers. This way our customers get the combined benefits of our own AI, of LLMs, and of our intuitive platform.
For a quick overview of how Thematic - now infused with LLMs - meets an organization’s needs when it comes to analyzing larger volumes of customer feedback, check out the table below:
| Analysis needs | Why LLMs can't deliver | How Thematic delivers |
| --- | --- | --- |
| Transparency is key. I need to trust the analysis quality. | Probabilistic black box. | Our AI and user interface make it easy to view what makes up a theme's quality. |
| Flexibility is key. I need to guide the AI on our company's point of view and refine the analysis. | Prompt engineering is complex and specific. | It's simple to fine-tune themes with a drag-and-drop UI. |
| Consistency is key. I need to deliver reliable and trusted results. | Delivers different output each time; as data increases, inaccuracies increase. | Results are consistent, unless you ask the AI to deliver a new lens. Only relevant feedback data is sent into LLMs, for cost-effective and accurate results. |
| Granularity is key. I need to manage 100s of themes. | Theme range is limited to about 20. | Theme range can go beyond 1000s. |
| Analytics tools are key. I need to analyze the data to inform business decisions. | Analyzes text feedback to deliver a flat list of tags describing the themes. | Analyzes text feedback to deliver a taxonomy of relevant themes, along with sentiment analysis. Shows the feedback, themes, and sentiment in context. Analyzes themes against other customer variables (such as customer region or product usage). |
To summarize, there are many gotchas when implementing Generative AI for feedback analysis in-house. The biggest hurdles are hallucinations and inconsistent, varying results. Beyond leading to decisions based on incorrect data, these issues can undermine the value of unstructured feedback in your company.
You've just read some of the key insights about using GPT to analyze feedback data, but this post has only scratched the surface. We have more insights that can help you in your role, whether you want details on how to improve insights for CX teams or more commentary on how Thematic uses LLMs.
Generative AI describes a broad class of algorithms that can generate new forms of creative content. The technology builds on large language models.
An LLM is a powerful machine learning model that can process and identify complex relationships in natural language, understand user questions, and generate text. These natural-language-processing models rely on techniques like deep learning and neural networks, and are trained on massive amounts of text data.
We fine-tune the output of LLMs for our clients by passing only high-quality context into the prompt. This context comes from Thematic's own AI, guided by a human expert, often with input from the customer's organization.
We prioritize responsible use of Generative AI with transparency, empowering users to verify results easily. Every summary includes links to the original data, where users can review how our foundational AI reached the result and get context. Users should check the original data for accuracy and relevance before using a summary to influence a decision.
We use Azure's service because it provides enterprise-grade security and data protection. In addition, we review agreements and data flows to ensure that the provider does not use our data for training, logging, and the like. We also log and monitor all uses of Generative AI at Thematic.
You can easily analyze small subsets of feedback using Generative AI. Just make sure that no private data is passed to the model and that your use is compliant with your company's policy.
For larger datasets, and for tracking progress over time, things become tricky. An AI specialist could design an in-house solution that sends all data to a Generative AI model in batches. In addition to compliance and data privacy, you'll need to consider aspects like consistency of themes across batches, accuracy as data volume grows, and cost.
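A minimal sketch of what such a solution could look like, assuming the openai Python package and an API key in the environment; the model name and prompt wording are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def themes_for_batch(comments: list[str]) -> str:
    # Ask the model for a code frame over one batch of comments.
    prompt = (
        "Identify the main themes in the feedback below. For each theme, "
        "give a short name and the number of comments that mention it.\n\n"
        + "\n".join(f"- {c}" for c in comments)
    )
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; use whichever model you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Each batch is analyzed independently, so theme names still have to be
# reconciled across batches afterwards (one of the tricky parts).
```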