When we run a search, we want to find relevant results not only for the exact expression we typed into the search bar, but also for other possible forms of the words we used. For example, if we have typed “skirts”, we will very likely also want to see results containing the form “skirt”.
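One simple way to get this behavior is to map every surface form to its lemma and match at the lemma level. The sketch below assumes a tiny hand-built lexicon; a real system would use a full morphological lexicon or a stemmer.

```python
# Hypothetical miniature lexicon mapping surface forms to lemmas.
LEXICON = {
    "skirts": "skirt",
    "skirt": "skirt",
    "dresses": "dress",
    "dress": "dress",
}

def expand_query(query):
    """Return the query terms plus every known form sharing their lemmas."""
    terms = query.lower().split()
    lemmas = {LEXICON.get(t, t) for t in terms}
    # Collect every lexicon entry whose lemma matches a query term's lemma.
    related = {form for form, lemma in LEXICON.items() if lemma in lemmas}
    return related | set(terms)

print(expand_query("skirts"))  # matches both "skirt" and "skirts"
```

With this expansion in place, a search for “skirts” also retrieves documents that only contain “skirt”.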
One of the flaws of the usual approach to training data generation is that, when you ask somebody to create training data manually, they will make an effort to write the sentences correctly, following the spelling and punctuation norms of your language. Even if some errors appear, they will be minimal, because annotators are trying to do things right, that is, to provide “orthographically correct” sentences.
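Real user input, by contrast, is full of typos. One common remedy (a swapped-in technique, not something described above) is to inject synthetic noise into the clean sentences so they better resemble what users actually type. The edit operations below are an illustrative assumption:

```python
import random

def add_typos(sentence, rate=0.1, seed=0):
    """Randomly drop, double, or swap letters to simulate typing errors."""
    rng = random.Random(seed)  # seeded for reproducibility
    chars = list(sentence)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["drop", "double", "swap"])
            if op == "drop":
                chars[i] = ""
            elif op == "double":
                chars[i] = c + c
            elif op == "swap" and i + 1 < len(chars):
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(add_typos("long black skirts for women"))
```

Augmenting the clean annotated sentences with a few noisy variants each helps the model see queries closer to production traffic.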
All Machine Learning (ML) engines that work with text can benefit from a solid linguistic foundation. In a multilingual environment, the need for a good lexicon (with forms, lemmas, and attributes) is overwhelming. Moreover, basic features such as Word Embeddings improve hugely when enriched with linguistic knowledge; if this is not usually done, it is because of a lack of linguists working at ML companies.
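One minimal way such a lexicon can enrich embeddings is lemma back-off: when a surface form has no vector of its own, fall back to its lemma's vector instead of treating it as out of vocabulary. The lexicon and vectors below are invented for the sketch:

```python
import numpy as np

# Hypothetical lexicon and embedding table; real ones would be far larger.
LEXICON = {"skirts": "skirt", "dresses": "dress"}
rng = np.random.default_rng(0)
VECTORS = {"skirt": rng.normal(size=8), "dress": rng.normal(size=8)}

def embed(token):
    """Look up a token's vector, backing off to its lemma via the lexicon."""
    if token in VECTORS:
        return VECTORS[token]
    lemma = LEXICON.get(token)
    if lemma in VECTORS:
        return VECTORS[lemma]  # lemma back-off: reuse the lemma's vector
    return None  # genuinely out of vocabulary

# "skirts" has no vector of its own, but its lemma "skirt" does:
print(embed("skirts") is embed("skirt"))
```

The same form-to-lemma mapping that powers query expansion thus also shrinks the out-of-vocabulary problem for the embedding layer.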