In previous posts, we outlined the crucial role of Machine Learning for Analytics (in "How to Make Machine Learning more Effective using Linguistic Analysis?") and the implications of using Machine Learning to analyze and structure text (in "How Phrase Structure helps Machine Learning?"). In an upcoming post, we will explain how Linguistics can complement Machine Learning and how the two can be integrated into the same technology stack.
This post dives into one of the topics of our previous post "How to Make Machine Learning more effective using Linguistic Analysis". There, we described the strengths of Machine Learning technology for insight extraction, but we also stated that text analysis is not the area where Machine Learning shines most. Here we examine that last statement in more detail.
Text analysis is becoming a pervasive task in many business areas. Machine Learning, which is based on statistical and mathematical models, is the most common approach to it.
Stemming and lemmatization are methods used by search engines and chatbots to analyze the meaning behind a word. Stemming reduces a word to its stem, typically by stripping affixes according to rules, while lemmatization maps a word to its dictionary form (its lemma), taking into account the context in which the word is used, such as its part of speech. We will go into more detailed explanations and examples later.
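To make the contrast concrete, here is a toy sketch in Python. It is an illustration only: the suffix list `SUFFIXES` and the lemma table `LEMMAS` are hypothetical stand-ins for a real stemmer's rules and a real lemmatizer's dictionary, and the part of speech is passed in explicitly to play the role of "context" (a real lemmatizer would infer it from the sentence).

```python
# Toy contrast between stemming and lemmatization.
# Assumption: SUFFIXES and LEMMAS are tiny, hypothetical stand-ins for a real
# stemmer's rules and a real lemmatizer's dictionary.

SUFFIXES = ("ing", "ies", "es", "ed", "s")  # tried in this order

# Lemma lookup keyed by (word, part of speech); the POS stands in for "context".
LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("better", "ADJ"): "good",
    ("was", "VERB"): "be",
}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least two characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form for (word, pos), or the word itself if unknown."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)


for word, pos in [("meeting", "NOUN"), ("meeting", "VERB"),
                  ("studies", "NOUN"), ("better", "ADJ"), ("was", "VERB")]:
    print(f"{word:8} {pos:5} stem={stem(word):7} lemma={lemmatize(word, pos)}")
```

The stemmer turns "studies" into the non-word "stud" and "was" into "wa", gives "meet" for both the noun and the verb "meeting", and cannot relate "better" to "good". The lemmatizer handles all of these, but only because its dictionary lists them and because it is told the part of speech.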
In this blog post, we will discuss three ways of evaluating your chatbot, using:
- real-world evaluation data
- synthetic data
- "in scope" or "out of scope" queries
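As a rough illustration of how these three kinds of test data can be combined, here is a minimal evaluation loop in Python. It assumes the chatbot exposes an intent classifier that returns an intent name, or `None` when it decides a query is out of scope; `toy_classify`, the intent names, and the test queries are all hypothetical, and accuracy per data source is only one of several possible metrics.

```python
from typing import Callable, Optional

# A test case is (query, expected intent, data source).
# An expected intent of None marks an out-of-scope query the bot should not answer.
TestCase = tuple[str, Optional[str], str]


def evaluate(classify: Callable[[str], Optional[str]],
             cases: list[TestCase]) -> dict[str, float]:
    """Return the fraction of correctly handled queries for each data source."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for query, expected, source in cases:
        totals[source] = totals.get(source, 0) + 1
        if classify(query) == expected:
            correct[source] = correct.get(source, 0) + 1
    return {source: correct.get(source, 0) / n for source, n in totals.items()}


def toy_classify(query: str) -> Optional[str]:
    """Hypothetical keyword classifier standing in for a real chatbot NLU."""
    q = query.lower()
    if "refund" in q:
        return "request_refund"
    if "order" in q:
        return "track_order"
    return None  # no intent matched: treat the query as out of scope


cases: list[TestCase] = [
    ("I want a refund for my shoes", "request_refund", "real"),
    ("where is my order??", "track_order", "real"),
    ("can i get my money back", "request_refund", "real"),
    ("I would like to request a refund", "request_refund", "synthetic"),
    ("Please track my order", "track_order", "synthetic"),
    ("What's the weather tomorrow?", None, "out_of_scope"),
    ("Tell me a joke", None, "out_of_scope"),
]

print(evaluate(toy_classify, cases))
```

With these toy cases, the classifier handles all synthetic and out-of-scope queries but only two of the three real-world ones: it misses the paraphrase "money back". Reporting scores per data source makes that kind of gap visible instead of averaging it away.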
One of the flaws of the usual way of generating training data is that, when you ask somebody to write training sentences manually, they will make an effort to write them correctly, following the spelling and punctuation norms of your language. Even if some errors appear, they will be minimal, because the writers are trying to do things right, that is, to provide “orthographically correct” sentences. As a result, the training data ends up cleaner than the input a chatbot actually receives from real users.
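One common way to make such clean sentences look more like real user input is to inject artificial noise. The following is a minimal sketch under our own assumptions, not necessarily the approach this post goes on to recommend: it assumes that random character-level edits (dropping, doubling or substituting a letter) plus removing punctuation and capitalization are a reasonable proxy for real user errors. The function names `add_typos` and `drop_punctuation` are hypothetical.

```python
import random
import string
from typing import Optional


def add_typos(sentence: str, rate: float = 0.1,
              rng: Optional[random.Random] = None) -> str:
    """With probability `rate`, drop, double or substitute each letter."""
    rng = rng if rng is not None else random.Random()
    out = []
    for ch in sentence:
        if ch.isalpha() and rng.random() < rate:
            edit = rng.choice(("drop", "double", "substitute"))
            if edit == "double":
                out.append(ch + ch)
            elif edit == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            # "drop": append nothing, so the letter disappears
        else:
            out.append(ch)  # non-letters and untouched letters are kept as-is
    return "".join(out)


def drop_punctuation(sentence: str) -> str:
    """Remove ASCII punctuation, mimicking users who skip it when typing."""
    return sentence.translate(str.maketrans("", "", string.punctuation))


clean = "I would like to cancel my order, please."
rng = random.Random(7)  # fixed seed so the noisy variants are reproducible
for _ in range(3):
    print(add_typos(drop_punctuation(clean).lower(), rate=0.1, rng=rng))
```

Passing an explicit, seeded `random.Random` keeps the generated training set reproducible. The `rate` parameter controls how messy the output is and would need tuning against real user logs, where such logs are available.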