Christmas advertisements and commercials are all around us at this time of the year, and at some point we just get tired of them and stop paying attention. However, this year one TV commercial from the Polish brand Allegro, "English for beginners", has fascinated our linguistic souls to the point that the whole office can’t stop watching it.
You may be wondering how this ad is related to Natural Language Processing, Machine Learning, Artificial Intelligence or Chat Bots, and at first glance there does not seem to be any connection at all.
But the truth is that this ad has a lot to say about the learning process, a key stage in the success of every machine learning system.
The generally accepted idea is that the training process is usually long and complex, and this becomes an issue in a fast-moving field like technology.
For some systems, like chat bots, the learning process mainly consists of feeding the bot tons of words and letting it interact with users so that it learns from those experiences. But how can a conversational bot answer human requests if it is not yet fully developed to do so?
We can see two clear problems with this traditional learning method:
- As the advertisement shows, and this is the key point, there is no need to know every word of a language to start speaking and be understood. We just need the basic words, the important ones that deliver our message.
A language like English or French has around 100,000 words in its dictionary, but most of them are rarely used, and an educated native speaker may use at most 11,000. So why do we teach the system so many words that it is never going to use?
As the old man shows, we should start by teaching the words the bot is going to use: basic words such as verbs and pronouns first, and then the vocabulary specific to the industry where it will be used. The problem, then, is knowing which words the bot is going to need. Training materials are scarce, and this slows down the training process.
Open-source solutions usually train bots on formal text, and this is where the Bitext method is innovative: we also use informal texts containing the slang and everyday expressions that customers use to express themselves and that traditional training methods do not cover.
- The second problem is one of accuracy. Some words have different meanings depending on the context, and not knowing which one is right can harm the interaction with users. The traditional learning process does not teach the bot how to choose the appropriate meaning, but thanks to our automatic context detection we can train the bot to navigate ambiguity.
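To make the two ideas more concrete, here is a minimal Python sketch. It is only an illustration and not the Bitext method: the sample utterances, the cue words and the function names (`core_vocabulary`, `pick_sense`) are all made up for this example. The first function picks the most frequent words from a handful of informal customer messages, and the second chooses the meaning of an ambiguous word by counting how many cue words for each meaning appear around it.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def core_vocabulary(utterances, top_n):
    """Return the top_n most frequent words across the sample utterances."""
    counts = Counter(word for u in utterances for word in tokenize(u))
    return [word for word, _ in counts.most_common(top_n)]


def pick_sense(word, utterance, sense_cues):
    """Pick the sense whose cue words overlap most with the utterance.

    Returns None when no cue word appears in the context.
    """
    context = set(tokenize(utterance)) - {word}
    scores = {
        sense: len(context & cues)
        for sense, cues in sense_cues[word].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical informal customer messages (invented for illustration).
SAMPLE_UTTERANCES = [
    "i wanna check my balance",
    "can u check my order pls",
    "i need to cancel my order",
    "where is my order",
]

# Hypothetical cue words for two meanings of "charge".
SENSE_CUES = {
    "charge": {
        "billing": {"card", "fee", "bill", "refund"},
        "battery": {"phone", "battery", "cable", "power"},
    }
}

if __name__ == "__main__":
    print(core_vocabulary(SAMPLE_UTTERANCES, 2))
    print(pick_sense("charge", "why is there a charge on my card", SENSE_CUES))
    print(pick_sense("charge", "my phone won't charge", SENSE_CUES))
```

A real system would learn both the vocabulary and the context cues from much larger collections of customer text, but the sketch shows why a small, well-chosen vocabulary plus some context can already go a long way.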
If you are interested in knowing more about this innovative training method, we will be glad to talk to you!