In previous posts, we have outlined the crucial role of Machine Learning for Analytics (in Machine Learning & Deep Linguistic Analysis in Text Analytics), and the implications of using Machine Learning for analyzing and structuring text (in What are the limitations of Machine Learning for Text Analysis?).
Now we turn to Linguistics as a source of solutions for some of these limitations. In a future post, we will explain how Linguistics can complement Machine Learning and how the two can be integrated into the same technology stack.
Recapping, the main limitation of Machine Learning for text analytics is that it is “blind” to text structure. And text structure is essential for moving towards text understanding.
This is the first benefit Linguistics provides to data scientists. Linguistics helps X-ray the internal structure of text. As the science of language, Linguistics collects knowledge about language (grammars, ontologies, lexicons). This knowledge allows us to understand the structure of language and decompose it into different layers (morphology, syntax, semantics).
By uncovering the structure of a sentence, Linguistics helps us deal accurately with complex phenomena, especially in cases where we have similar wordings but entirely different meanings:
- negation: “I never enjoyed it” as opposed to “I enjoyed it like never before”
- conditionality: “I’ll buy it if they change their pricing policy”
- comparison: “ACME R3 is much better than the Samsung Galaxy”
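To make the negation case concrete, here is a minimal sketch in Python. It is a toy illustration, not how a real linguistic engine works: the `is_negated` rule and its hard-coded verb and negator list are assumptions we introduce for this example. It shows that a word-level view sees almost the same words in both sentences, while even a crude rule about word order tells them apart.

```python
import re

def tokens(sentence):
    # Lowercase the sentence and keep only alphabetic words.
    return re.findall(r"[a-z']+", sentence.lower())

def bag_of_words(sentence):
    # Word-level view: the set of words, with order thrown away.
    return set(tokens(sentence))

def is_negated(sentence, verb="enjoyed", negators=("never", "not")):
    # Toy structural rule (an assumption for this example):
    # a negator negates the verb only when it appears before the verb.
    words = tokens(sentence)
    if verb not in words:
        return False
    verb_index = words.index(verb)
    return any(word in negators for word in words[:verb_index])

a = "I never enjoyed it"
b = "I enjoyed it like never before"

# Every word of the first sentence also appears in the second one.
all_words_shared = bag_of_words(a) <= bag_of_words(b)

negated_a = is_negated(a)
negated_b = is_negated(b)
```

Both sentences contain “never” and “enjoyed”, so word presence alone cannot separate them; the position of the negator relative to the verb can. A real engine would rely on a syntactic parse rather than on token positions.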
Besides, understanding structure allows Linguistics to provide granularity. Granularity is about reading a sentence like:
“The screen is wonderful but I hate the on-screen keyboard”
and identifying the topics being discussed (screen, on-screen keyboard) and the opinions about those topics (“is wonderful”, “I hate”). Granularity is about detecting that there are two opinions about two topics within the same sentence.
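The idea of granularity can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the `TOPICS` and `OPINIONS` lexicons are hypothetical and hand-written for this one sentence, and splitting on “but” stands in for the clause detection a real parser would do.

```python
import re

# Hypothetical mini-lexicons, written by hand for this example only.
# Longer topics come first so "on-screen keyboard" wins over "screen".
TOPICS = ["on-screen keyboard", "screen"]
OPINIONS = {"wonderful": "positive", "hate": "negative"}

def split_clauses(sentence):
    # Crude stand-in for clause detection: split on the conjunction "but".
    parts = re.split(r"\bbut\b", sentence, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]

def analyze(sentence):
    # Return one (topic, polarity) pair per clause.
    results = []
    for clause in split_clauses(sentence):
        lowered = clause.lower()
        topic = next((t for t in TOPICS if t in lowered), None)
        polarity = next(
            (label for word, label in OPINIONS.items()
             if re.search(rf"\b{word}\b", lowered)),
            None,
        )
        results.append((topic, polarity))
    return results

opinions = analyze("The screen is wonderful but I hate the on-screen keyboard")
# opinions == [("screen", "positive"), ("on-screen keyboard", "negative")]
```

A sentence-level classifier would have to assign a single label to the whole sentence; working clause by clause keeps the two opinions and their topics separate.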
Another advantage that Linguistics provides is the ability to analyze different types of text: from short and informal tweets to lengthy formal legal documents or newswires. Considering the variety of texts involved in Big Data projects, this is a critical advantage that saves significant effort in text tagging and algorithm training.
Additionally, engines based on Linguistics lend themselves to incremental and consistent improvements. Fixes can be implemented by adding new rules or modifying existing ones, all with predictable results. So moving from the “usual” 70% accuracy to over 90% is a matter of customizing the engine.
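As a rough illustration of what “adding a rule” means, consider the following Python sketch. The rule format (an ordered list of regular expressions where the first match wins) is our own simplification for this example and not the design of any particular engine.

```python
import re

def classify(text, rules):
    # Apply the rules in order; the first matching pattern decides the label.
    lowered = text.lower()
    for pattern, label in rules:
        if pattern.search(lowered):
            return label
    return "unknown"

# Initial rule set (hypothetical, for illustration only).
rules = [
    (re.compile(r"\bnever\b.*\benjoyed\b"), "negative"),
    (re.compile(r"\benjoyed\b"), "positive"),
]

examples = [
    "I never enjoyed it",
    "I enjoyed it like never before",
    "I'll buy it if they change their pricing policy",
]

before = [classify(text, rules) for text in examples]

# The fix: append one rule for conditional statements.
# Earlier rules are untouched, so earlier results cannot change.
rules_fixed = rules + [(re.compile(r"\bif\b"), "conditional")]

after = [classify(text, rules_fixed) for text in examples]
```

Here the third sentence goes from “unknown” to “conditional”, while the first two keep their labels. Because the new rule is appended after the existing ones, its effect is limited to texts that no earlier rule matched, which is what makes the outcome predictable.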
In summary, Linguistics provides an understanding of text structure that is the basis for tackling many different business applications (understanding customers, preventing churn, generating sales leads, detecting risk of loan defaults, etc.), and is likely most beneficial when integrated with Machine Learning techniques.
If you want to know more about how Linguistics can help Machine Learning, download our paper!