If you are building your conversational bot on the Amazon Lex platform, we have some good news. You may have noticed that training data is scarce: unless you are a really big company, you won’t have the dataset of sentences needed to train the chatbot quickly enough. Even if you do, you still have to spend time and money hand-tagging those sentences with the right intents and entities. This is a common problem for chatbot builders.
In the spirit of tackling problems with a different approach to obtain a different result, here at Bitext we suggest you give variant generation for training data a chance. The data you feed your bot during training is key, as every aspect of its behavior depends on this part of the building process.
We have developed software capable of generating a collection of sentences that are useful to train any chatbot. As you can imagine, this reduces the training time dramatically and increases the understanding accuracy.
So how does it work?
Given the intent you need to cover (e.g. “my chatbot has to be able to understand when somebody asks it to interact with any electrical appliance in the kitchen”, “my virtual assistant has to be able to open and close mechanical objects of a car when asked”), the Bitext Sentence Variation Service provides all the possible variants that make sense.
Note that the clarification about sense is key: you want to set certain limits, otherwise the dataset will end up full of nonsense. Those limits can take different forms:
- Discarding absurd queries, such as “make a reservation for 2 at New York Burger, for a year, 5 months and 17 days from now”, which your customers are very unlikely to say.
- Classification of entities: you can preheat the oven, but you don’t preheat the fridge; you may want to set the oven or the fridge to certain temperatures, but you won’t use the same values for both.
- Rules of inflection: to generate the variations of the simple sentence “open the trunk” with structures like “What about…?” or “I want… to be…” you must inflect the verb: “What about opening the trunk?”, “I want the trunk to be opened”. You guessed it, this is why you need linguistics.
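To make the entity and inflection limits above more concrete, here is a minimal Python sketch. It is a toy illustration only, not how the Bitext service works internally: the names `VERBS`, `ALLOWED_OBJECTS`, `TEMPLATES` and `generate_variants` are hypothetical, and the tiny hand-written lexicon stands in for real linguistic resources.

```python
# Toy illustration only: hand-written templates and a tiny lexicon,
# not the actual linguistic engine behind the service.

# Rules of inflection: each verb lists the forms the templates need.
VERBS = {
    "open": {"base": "open", "gerund": "opening", "participle": "opened"},
    "preheat": {"base": "preheat", "gerund": "preheating", "participle": "preheated"},
}

# Classification of entities: which objects each action makes sense with.
ALLOWED_OBJECTS = {
    "open": ["trunk", "oven", "fridge"],
    "preheat": ["oven"],  # you don't preheat the fridge
}

# Sentence structures; each placeholder picks the verb form it requires.
TEMPLATES = [
    "{base} the {obj}",
    "can you {base} the {obj}?",
    "what about {gerund} the {obj}?",
    "I want the {obj} to be {participle}",
]


def generate_variants(verb):
    """Return every template filled in for every object allowed with the verb."""
    forms = VERBS[verb]
    return [
        template.format(obj=obj, **forms)
        for obj in ALLOWED_OBJECTS[verb]
        for template in TEMPLATES
    ]


if __name__ == "__main__":
    for sentence in generate_variants("open"):
        print(sentence)
```

Because the allowed objects are declared per verb, a request for “preheat” never yields a fridge sentence, and each structure receives the verb form it needs instead of a copy-pasted base form.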
Ok, but… how is this possible? Isn’t language recursive? Doesn’t that make the number of possible sentences infinite?
In theory, yes. But if we keep treating the possibilities of language as infinite, we won’t get anywhere. We follow a practical approach (typical of engineers and makers): divide and conquer.
This is actually the strength of our solution, because it adapts easily to any domain (booking, automobile, home automation, e-commerce, etc.). We aren’t claiming we can generate all the possible sentences a person might use to interact with your bot; that’s clearly impossible. What we do is cover, for a given domain, the sentences your customers are expected to use most frequently. After all, nobody works with theories; when we deal with the real world, we have to break it down into small pieces so we can resolve one piece at a time.
Tell me it isn’t going to be very painful to integrate the sentence variation with the Lex platform…
It won’t be, promise. Once you have the sentence dataset ready for training, you can use the AWS SDK to build the chatbot through the AWS API. Just put the sentences in a list and pass it as the sampleUtterances parameter of the “put_intent” method, and you are ready to go! For example:
"add the headphones to the cart",
"can you add the headphones to the cart?",
"I need you to add the headphones to the cart",
"I’d like to add the headphones to the cart",
"what about adding the headphones to the cart?",
"I want the headphones to be added to the cart",
"I’d like the headphones to be added to the cart",
"it would be great if you added the headphones to the cart",
"it would be great if you could add the headphones to the cart"
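As a rough sketch of that call in Python, assuming the boto3 SDK and the Lex model-building client (`lex-models`): the intent name `AddToCart`, the region, the `ReturnIntent` fulfillment setting and the helper names `build_intent_request` and `create_intent` are all illustrative choices, not part of the Bitext service. Lex applies its own constraints to intent names and utterance text, so check the current AWS documentation before uploading a generated dataset.

```python
# A few of the generated variants; in practice this list would hold the full dataset.
UTTERANCES = [
    "add the headphones to the cart",
    "I need you to add the headphones to the cart",
    "I want the headphones to be added to the cart",
]


def build_intent_request(name, utterances):
    """Build the keyword arguments for put_intent (no network access needed)."""
    return {
        "name": name,
        "sampleUtterances": list(utterances),
        # Assumption: simply return the recognized intent to the client application.
        "fulfillmentActivity": {"type": "ReturnIntent"},
    }


def create_intent(name, utterances, region="us-east-1"):
    """Send the intent to Lex. Requires boto3 and configured AWS credentials."""
    import boto3  # imported here so the helper above works without the SDK installed

    client = boto3.client("lex-models", region_name=region)
    return client.put_intent(**build_intent_request(name, utterances))


if __name__ == "__main__":
    # Print the request instead of sending it; call create_intent(...) to upload.
    print(build_intent_request("AddToCart", UTTERANCES))
```

Keeping the request construction separate from the network call makes it easy to inspect what will be sent before anything is created in your AWS account.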
Discover how this solution works by scheduling a demo now!