For those of us who work with data every day, the task keeps getting harder: data volumes grow, and so do the time and the computing power needed to process them.
We cannot stop and examine each data point in depth; we need to detect patterns, trends and insights, because they are the real key.
Enriching the data before processing it allows machine learning and deep learning algorithms to work with smaller datasets and achieve better results in less time.
To illustrate this hypothesis, we want to share an example from one of the sectors that works with large amounts of data: the financial sector.
Rumors are the norm in the financial sector, and newspapers run headlines like: “X corporation is planning to acquire Z industries”.
Any reader, even one who does not follow the stock market, can see that X is the company that wants to invest its money and Z is the one being purchased.
However, machines don’t see it that clearly. Looking only at the words does not reveal the relation between the two companies, so the relation between X and Z is lost.
To give an algorithm the capability of understanding the connection between the two entities, we must turn to linguistics and add enriched data. If we add morphological or syntactic information to our datasets, an algorithm will be able to detect relations like the one in the X and Z example.
Most machine learning software includes an algorithm able to detect that a link exists between the businesses; however, not all of them can identify the type of link, as the Bitext algorithm does.
We extracted a sample of 1,000 from a relevant news site, focusing on acquisition news. Our goal was to find out how the stock price of both companies varied after the news publication date.
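To make "stock price variation" concrete, here is a minimal sketch of one way it could be measured. It is an illustration rather than the exact procedure behind our graphs: the function names, the 30-day window and the use of daily closing prices keyed by date are all assumptions made for this example.

```python
from datetime import date, timedelta


def price_on_or_before(prices: dict, day: date) -> float:
    """Return the most recent known price on or before `day`.

    Falling back to an earlier day covers weekends and holidays,
    when no closing price exists (an assumption of this sketch).
    """
    known_days = [d for d in prices if d <= day]
    if not known_days:
        raise ValueError(f"no price available on or before {day}")
    return prices[max(known_days)]


def price_variation(prices: dict, published: date, window_days: int = 30) -> float:
    """Relative price change from the publication date to `window_days` later.

    A result of 0.10 means the price rose 10%; -0.10 means it fell 10%.
    """
    start = price_on_or_before(prices, published)
    end = price_on_or_before(prices, published + timedelta(days=window_days))
    return (end - start) / start
```

Using a relative change rather than an absolute one keeps companies with very different share prices comparable on the same graph.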
In this first picture, we can see that there is little order in the stock price variation across our dataset.
Then we enriched the information to see whether we could structure the dataset. The enrichment was done by automatically adding syntactic information such as subject and object:
-Subject, or acquirer, is the entity performing the action: in “X corporation is planning to buy Z industries”, X is the Subject.
-Object, or acquired, is the entity acted upon: in “X corporation is planning to buy Z industries”, Z is the Object.
So in our example, X is tagged as the acquirer and Z as the acquired company.
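To show what this kind of enrichment produces, here is a deliberately simplified sketch. It is not the Bitext algorithm: it swaps real syntactic analysis for a single regular expression that only handles active-voice headlines of the form "subject + verb + object". The pattern, the verb list and the `tag_roles` name are hypothetical and exist only for this illustration.

```python
import re
from typing import Optional

# Hypothetical pattern covering a handful of active-voice phrasings.
# A real system would use a syntactic parser instead of a fixed verb list.
ACQUISITION_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+"
    r"(?:is planning to|plans to|agrees to|is set to|will)\s+"
    r"(?:acquire|buy|purchase)\s+"
    r"(?P<object>.+?)\.?$",
    re.IGNORECASE,
)


def tag_roles(headline: str) -> Optional[dict]:
    """Return the acquirer (subject) and acquired (object) of a headline.

    Returns None when the headline does not match the toy pattern,
    for example passive sentences such as "Z is being bought by X".
    """
    match = ACQUISITION_PATTERN.match(headline.strip())
    if match is None:
        return None
    return {
        "acquirer": match.group("subject"),
        "acquired": match.group("object"),
    }


roles = tag_roles("X corporation is planning to buy Z industries")
```

Each headline then carries two extra fields, and those fields are what make the filtering in the next step possible.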
Now we can filter by acquirer and see what the data looks like:
We can see that all the results line up: the variations are either near zero or negative. This means that the stock price stayed roughly flat or decreased in the month after a company announced that it would be on the buying side of an acquisition process.
Now let's filter by object, or acquired company:
We can see that the results line up once again; this time all the variations in stock price are positive, meaning that the stock price increased in the month after it was announced that the company was being acquired.
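The two filters above boil down to grouping the enriched records by role. A minimal sketch of that step follows, assuming each record holds a `role` and a `variation` field (both names are ours). The numbers are invented placeholders used only to exercise the code; they are not figures from our sample.

```python
from statistics import mean


def summarize_by_role(records: list) -> dict:
    """Group price variations by role and summarize each group."""
    by_role = {}
    for record in records:
        by_role.setdefault(record["role"], []).append(record["variation"])
    return {
        role: {
            "count": len(values),
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
        }
        for role, values in by_role.items()
    }


# Placeholder values for illustration only, NOT real results.
example_records = [
    {"company": "X corporation", "role": "acquirer", "variation": -0.02},
    {"company": "A corporation", "role": "acquirer", "variation": 0.0},
    {"company": "Z industries", "role": "acquired", "variation": 0.05},
    {"company": "B industries", "role": "acquired", "variation": 0.10},
]

summary = summarize_by_role(example_records)
for role, stats in summary.items():
    print(role, stats)
```

Without the `role` field added by the enrichment step, both groups would be mixed in a single list, which is the unstructured picture we started with.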
These graphs and results confirm that working with enriched data makes patterns easier to detect, so that anyone, human or machine, can make sense of text information more easily.
If you want to know how you can apply this technology to your business, get in touch with us!