Last week’s post explored the differences between polarity and topic-based sentiment analysis (have a look here if you missed it). This post goes a step further, drilling down into a real data set to demonstrate the true value of topic-based sentiment analysis.
The data set analyzed consists of 8,400 TripAdvisor reviews of a high-end hotel in Hawaii that belongs to a popular hotel chain. Within the 8,400 reviews are 57,000 opinions with sentiment. Each review may contain multiple opinions, which is why there are many more opinions than there are reviews.
Have a look at this example:
One of the best hotels on the island. The food is as good as it gets.
One of the best hotels on the island.
- Sentiment topic: hotels
- Sentiment expression: best
The food is as good as it gets.
- Sentiment topic: food
- Sentiment expression: good
This particular data set contains an average of nearly 7 opinions per review.
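To make the structure concrete, the example above can be written down as a small data model. This is only a minimal sketch in Python: the `Opinion` and `Review` classes, their field names and the review date are assumptions made for illustration, not the format of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Opinion:
    sentence: str
    topic: str        # what the sentiment is about
    expression: str   # the words that carry the sentiment
    polarity: str     # "positive" or "negative"


@dataclass
class Review:
    posted: date
    opinions: List[Opinion] = field(default_factory=list)


# The example review above, split into its two opinions.
# The date is a made-up placeholder.
review = Review(
    posted=date(2015, 8, 3),
    opinions=[
        Opinion("One of the best hotels on the island.", "hotels", "best", "positive"),
        Opinion("The food is as good as it gets.", "food", "good", "positive"),
    ],
)


def average_opinions_per_review(reviews: List[Review]) -> float:
    """Total number of opinions divided by the number of reviews."""
    return sum(len(r.opinions) for r in reviews) / len(reviews)


print(average_opinions_per_review([review]))  # 2.0 for this single review
print(round(57_000 / 8_400, 1))               # 6.8 for the full data set
```

The last line reproduces the figure quoted above: 57,000 opinions spread over 8,400 reviews is roughly 6.8 opinions per review.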
Since we are trying to imitate a real use case, we want to leverage the metadata attached to the reviews. For the sake of this example, we will filter by date of the review, but this kind of analysis can be enriched by any kind of reviewer information, such as gender, location, age, Net Promoter Score, etc.
We will first filter by month and polarity. For this example we will search for negative polarity during the month of August. The objective is to discover pain points during one of the busiest months of the year, so that problem areas can be identified and improvement plans implemented.
After filtering by month and polarity we are left with about 1,300 sentences that must be read if we want to extract insights from them. Our segmentation has decreased our work somewhat, but we are left with the same challenge: manual reading and understanding of volumes of text-based data.
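The filtering step described above might look something like the following sketch. The record layout (one dictionary per opinion, with `date`, `polarity` and `sentence` keys), the `filter_opinions` helper and the sample sentences are all invented for illustration.

```python
from datetime import date

# Invented sample: one record per opinion, carrying the date of its review.
opinions = [
    {"date": date(2015, 8, 3), "polarity": "negative",
     "sentence": "The front desk staff were slow to respond."},
    {"date": date(2015, 8, 9), "polarity": "positive",
     "sentence": "The food is as good as it gets."},
    {"date": date(2015, 7, 21), "polarity": "negative",
     "sentence": "Parking was far too expensive."},
    {"date": date(2015, 8, 17), "polarity": "negative",
     "sentence": "Drinks at the pool bar are overpriced."},
]


def filter_opinions(rows, month, polarity):
    """Keep only the opinions from the given month with the given polarity."""
    return [
        row for row in rows
        if row["date"].month == month and row["polarity"] == polarity
    ]


august_negative = filter_opinions(opinions, month=8, polarity="negative")

# These are the sentences someone would still have to read by hand.
for row in august_negative:
    print(row["sentence"])
```

Note that the result is still just a list of sentences: the filter narrows the pile, but it does not say what each complaint is about.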
And this is where the power of topic-based sentiment analysis becomes very clear.
When each sentiment is automatically linked to a specific topic, we can see exactly what it refers to, and this significantly decreases our workload.
Again, we will filter by August and negative polarity.
The sentiments are broken down into topics, so we get clear and immediate visibility of exactly what customers are talking about – without reading a single comment! We can pinpoint whether a negative opinion relates to the hotel staff, the prices, the restaurant or the overall experience.
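As a rough sketch of this topic breakdown, negative August opinions can be grouped by the topic attached to each one. The `group_by_topic` helper, the topic labels and the sample sentences below are assumptions for illustration; a real system would take the topics from the sentiment analysis output.

```python
from collections import defaultdict

# Invented sample of negative August opinions as (topic, sentence) pairs.
august_negative = [
    ("staff", "The front desk staff were slow to respond."),
    ("prices", "Drinks at the pool bar are overpriced."),
    ("staff", "Housekeeping skipped our room twice."),
    ("restaurant", "Breakfast at the restaurant was cold."),
]


def group_by_topic(opinions):
    """Collect the sentences for each topic so they can be counted or read."""
    grouped = defaultdict(list)
    for topic, sentence in opinions:
        grouped[topic].append(sentence)
    return dict(grouped)


by_topic = group_by_topic(august_negative)

# Show the topics with the most negative opinions first.
for topic, sentences in sorted(by_topic.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(f"{topic}: {len(sentences)}")
```

Keeping the sentences alongside the counts means the individual comments remain available whenever a closer look at one topic is needed.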
Fewer than 20 opinions are assigned to each topic, which means our volumes of unstructured text have been quickly transformed into a very clear picture of what needs to be done. Decisions going forward are data-driven, and we can refer to individual comments if there is ever a need to back up our decisions to management or stakeholders.