For everyone who works with data daily, the task is becoming more arduous: data volumes keep growing, and so do the time and computing power required to process them.
We cannot stop and look in depth at each data point; we need to detect patterns, trends or insights, because they are the real key.
Enriching the data before processing it allows machine learning and deep learning algorithms to work with smaller datasets and achieve better results in less time.
To support this hypothesis, we want to share an example from one of the sectors that works with large amounts of data: the financial sector.
Rumors are the norm in the financial sector, and in newspapers we can find headlines like: “X corporation is planning to acquire Z industries.”
For any reader, even one who does not follow the stock market, it is clear that X is the company that wants to invest its money and Z is the one being purchased.
However, machines don’t see it that clearly. Looking only at the words, an algorithm cannot tell which company is the buyer and which is the target, so the relation between X and Z goes missing.
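To make the point concrete, here is a minimal Python sketch of what "looking only at the words" means. It assumes a simple bag-of-words representation (word counts with order ignored); the helper `bag_of_words` and the second headline are hypothetical, written only for this illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase tokens, ignoring punctuation and word order."""
    tokens = [t.strip(".,\"'").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


# Same words, opposite meaning: the buyer and the target are swapped.
headline_a = "X corporation is planning to acquire Z industries."
headline_b = "Z industries is planning to acquire X corporation."

same_bag = bag_of_words(headline_a) == bag_of_words(headline_b)
print(same_bag)
```

Both headlines produce identical word counts, so a model that only sees this representation has no way to tell who is acquiring whom.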
To give an algorithm the ability to understand the connection between the two entities, we must turn to linguistics and add enriched data. If we add morphological or syntactic information to our datasets, an algorithm will be able to detect relations like the one in the X and Z example.
Most machine learning software includes an algorithm able to detect that a link exists between the two businesses. However, not all of them can identify the type of link, as the Bitext algorithm does.
We extracted a sample of 1000 news items from a relevant news site, focusing on acquisition news. Our goal was to find out how the stock price of both companies varied after the news publication date.
In this first picture, we can see that the stock price variations in our dataset show little order.
We then enriched the information to see if we could structure the dataset. The enrichment was done by automatically adding syntactic information such as subject and object.
- Subject, or acquirer, is the entity that acts: in “X corporation is planning to buy Z industries”, X is the Subject.
- Object, or acquired, is the entity acted upon: in “X corporation is planning to buy Z industries”, Z is the Object.
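As a rough illustration of what such an enriched record could look like, the sketch below tags the subject and object of a headline in Python. It is a toy stand-in, not the Bitext algorithm: it assumes headlines follow a fixed "A is planning to buy B" shape and uses a regular expression, whereas a real system would rely on a syntactic parser. The names `ACQUISITION_PATTERN` and `enrich` are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: a tiny, hand-written pattern covering a few acquisition phrasings.
ACQUISITION_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?:is planning to|plans to|will|to)\s+"
    r"(?:acquire|buy)\s+(?P<object>.+?)\.?$",
    re.IGNORECASE,
)


def enrich(headline: str) -> Optional[Dict[str, str]]:
    """Return the subject (acquirer) and object (acquired), or None if no match."""
    match = ACQUISITION_PATTERN.match(headline.strip())
    if match is None:
        return None
    return {"subject": match.group("subject"), "object": match.group("object")}


roles = enrich("X corporation is planning to buy Z industries")
print(roles)
```

With the roles attached to each headline, the acquirer and the acquired company become explicit fields that later steps can filter on.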
Now we can filter by the acquirer and see what the data looks like:
All the results now line up: the variations are either close to zero or negative. This means that the stock price stayed flat or decreased in the month after the company announced that it would be on the buying side of an acquisition process.
Now let's filter by object or acquired company:
The results line up once again, and now all the variations in stock price are positive. This means that the stock price increased in the month after it was announced that the company was being acquired.
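The filtering step itself is simple once each record carries a role. The Python sketch below shows one way it could be done; the `Observation` structure, the company names and the percentage values are invented placeholders used only to exercise the code, and they are not taken from the sample described in this post.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Observation:
    company: str
    role: str             # "subject" (acquirer) or "object" (acquired)
    variation_pct: float  # price change over the following month, in percent


# Invented placeholder records, for illustration only.
observations = [
    Observation("A", "subject", -1.5),
    Observation("B", "object", 4.0),
    Observation("C", "subject", 0.0),
    Observation("D", "object", 2.0),
]


def variations_for(role: str, rows: List[Observation]) -> List[float]:
    """Keep only the price variations of companies with the given role."""
    return [row.variation_pct for row in rows if row.role == role]


acquirer_variations = variations_for("subject", observations)
acquired_variations = variations_for("object", observations)
print(mean(acquirer_variations), mean(acquired_variations))
```

Without the role field, both groups would be mixed in a single list and any pattern in their averages would be much harder to see.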
These graphs and results suggest that working with enriched data makes patterns easier to detect, so that anyone, human or machine, can make better sense of text information.
If you want to know how you can apply this technology to your business, get in touch with us!