Over the last two years, data seems to have become ubiquitous; almost everyone has heard of “Big Data”. However, what matters is not how large your datasets are, but the quality of the information you can extract from them.
Machine Learning tools use algorithms to scan enormous datasets, detect patterns and generate a signal that makes users aware of those patterns in their data.
However, generating the features that Machine Learning systems use as input for their algorithms is an arduous task that requires a lot of time and ingenuity. That is why Deep Learning is becoming the norm: this approach doesn’t require users to manually create useful feature vectors. Instead, raw data is used as the input for the algorithm, which learns the relevant features by itself, reducing the time needed before the data can be searched for patterns.
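To make the contrast concrete, here is a minimal Python sketch, assuming a simple text task. The first function is a hand-crafted feature extractor, where a person has decided that word count, capitalised words and exclamation marks are what matter; the second simply encodes the raw characters as integers, the kind of minimally processed input from which a deep learning model would learn its own features. Both function names and the chosen features are hypothetical illustrations, not taken from any particular system.

```python
def handcrafted_features(text: str) -> list[float]:
    """Features a person chose by hand for a (hypothetical) text task."""
    words = text.split()
    return [
        float(len(words)),                          # number of words
        float(sum(w[0].isupper() for w in words)),  # words starting with a capital
        float(text.count("!")),                     # exclamation marks
    ]


def raw_character_ids(text: str) -> list[int]:
    """Raw input: each character's Unicode code point, with no feature design."""
    return [ord(c) for c in text]


print(handcrafted_features("Hello World!"))  # [2.0, 2.0, 1.0]
print(raw_character_ids("Hi!"))             # [72, 105, 33]
```

The first representation is only as good as the person who designed it; the second leaves that work to the learning algorithm.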
According to the Merriam-Webster dictionary, Artificial Intelligence is “the capability of a machine to imitate intelligent human behavior”. In other words, it is the ability of machines to perform tasks as well as humans do and, in some cases, even better. As we said last week, the highest IQ on Earth is between 225 and 230, but machines are expected to soon reach an IQ of 500. This superior capability can be particularly useful in data processing.
But how is Artificial Intelligence related to Machine Learning and Deep Learning?
Deep Learning and Machine Learning are the most widely used methodologies for training Artificial Intelligence systems. As we mentioned above, Deep Learning algorithms can work with large amounts of data to detect patterns. When a system is given many examples of a problem together with its solution, it processes them and finds relationships in the data, which it can then apply to new data to perform the task it was designed for. This is what powers Artificial Intelligence.
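The learn-from-examples loop described above can be illustrated with a perceptron, one of the simplest learning algorithms. This is a deliberately minimal Python sketch, not a deep network and not the method used by any system mentioned in this post: it is given example inputs paired with their correct answers, adjusts its weights whenever it gets one wrong, and then applies what it learned to points it has never seen. The toy dataset and function names are hypothetical.

```python
def predict(model, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    weights, bias = model
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=10, lr=1.0):
    """Learn weights and a bias from (features, label) pairs with labels 0 or 1."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict((weights, bias), features)
            # Only wrong predictions (error of +1 or -1) change the model.
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias


# Hypothetical training data: each problem (a point) paired with its solution (a label).
examples = [
    ([1, 1], 0), ([2, 1], 0), ([1, 2], 0),
    ([4, 5], 1), ([5, 4], 1), ([5, 5], 1),
]
model = train_perceptron(examples)
print(predict(model, [0.5, 0.5]))  # fresh data, prints 0
print(predict(model, [6, 6]))      # fresh data, prints 1
```

Deep Learning stacks many such weighted units in layers, but the principle of correcting the model against known answers and then reusing it on new inputs is the same.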
One of the biggest surprises of last year was the latest improvement to Google Translate. Until September 2016, Google Translate relied on Phrase-Based Machine Translation (PBMT), which translates text phrase by phrase, without considering either linguistics or the broader context. The main issue with this approach is that languages are flexible and complex, and ambiguity becomes a problem if context is not taken into account when handling long-range dependencies and word reordering.
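The limitation can be sketched with a toy phrase-by-phrase translator. This is a heavy simplification, assuming a tiny hand-written English-to-Spanish phrase table (hypothetical, for illustration only); real PBMT systems learn large phrase tables with probabilities and combine them with language and reordering models. Still, it shows how a phrase-level lookup picks “banco” (the financial institution) for “bank” even when the rest of the sentence makes clear that a riverbank (“orilla”) is meant.

```python
# Hypothetical phrase table: each English phrase has exactly one translation.
PHRASE_TABLE = {
    ("i", "sat", "on"): "me senté en",
    ("the", "bank"): "el banco",
    ("of", "the", "river"): "del río",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASE_TABLE)


def translate(sentence: str) -> str:
    """Greedily replace the longest matching phrase, ignoring the wider context."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in PHRASE_TABLE:
                output.append(PHRASE_TABLE[phrase])
                i += n
                break
        else:
            output.append(words[i])  # unknown word: copied through unchanged
            i += 1
    return " ".join(output)


print(translate("I sat on the bank of the river"))
# me senté en el banco del río
```

Because “the bank” is looked up on its own, the words “of the river” that follow cannot influence its translation.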
More precision requires more data, and that is why Google decided to replace its former method with Neural Machine Translation (NMT) based on Deep Learning. The results, as you may have read, are remarkable. If you are not familiar with them, let me sum them up: after its AI framework was taught to translate from English to Polish and from English to Arabic, the system became able to translate from Polish to Arabic by itself, without first passing through English. Quite impressive, isn’t it?
However, almost no other company has the resources Google has at its disposal, and changing algorithms to achieve better results is both expensive and time-consuming. Google has taken a traditional approach: more money + more data + more processing power = better results.
So, is it possible to improve performance and achieve better results without virtually unlimited resources? The answer is yes.
How can you achieve better results while saving time and money?
At Bitext, we promote a new solution, one that can achieve even better results than those obtained by changing the algorithm. How? By enriching the raw data the algorithm will process.
In our field, linguistics, this means complementing the text in the dataset with information about its context and syntax before it is processed. The algorithm then receives an enriched input, which can be transformed into better output.
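As an illustration of the idea only, the Python sketch below adds a part-of-speech label to ambiguous words based on the word that precedes them, so that “book” in “to book a flight” and in “the book” reaches the algorithm as two distinct tokens. The cue-word lists and the single-rule logic are hypothetical simplifications; they are not Bitext’s tools or API, and real linguistic enrichment relies on full morphological and syntactic analysis.

```python
# Hypothetical cue words: a determiner before an ambiguous word suggests a noun,
# while "to" or a modal verb before it suggests a verb.
NOUN_CUES = {"a", "an", "the"}
VERB_CUES = {"to", "will", "can"}
# Words this toy example treats as noun/verb ambiguous.
AMBIGUOUS = {"book", "play", "watch"}


def enrich(sentence: str) -> list[str]:
    """Return the tokens of `sentence`, tagging ambiguous words as NOUN or VERB."""
    tokens = sentence.lower().split()
    enriched = []
    for i, token in enumerate(tokens):
        previous = tokens[i - 1] if i > 0 else ""
        if token in AMBIGUOUS and previous in NOUN_CUES:
            enriched.append(f"{token}/NOUN")
        elif token in AMBIGUOUS and previous in VERB_CUES:
            enriched.append(f"{token}/VERB")
        else:
            enriched.append(token)  # left as raw text
    return enriched


print(enrich("I want to book a flight"))
# ['i', 'want', 'to', 'book/VERB', 'a', 'flight']
print(enrich("I read the book"))
# ['i', 'read', 'the', 'book/NOUN']
```

The downstream algorithm is unchanged; it simply receives input in which a distinction it would otherwise have to infer from data is already made explicit.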
If you are interested in learning more, just get in touch with us!