Imagine communicating with machines by writing or speaking using our natural language. While this might have been fiction in the past, it has now become a reality with the emergence of conversational applications or chatbots. And, soon, we will have a bot for everything.
“By 2020, autonomous software agents outside of human control will participate in five percent of all economic transactions.” – Gartner
2016 saw an eruption of chatbots and conversational systems, a disruption driven by the simultaneous growth of messaging platforms and by progress in, and easier access to, artificial intelligence (not to mention APIs). In this article, we will address how a chatbot works, simplify the concepts surrounding it, and hopefully inspire all to build bots.
How does a Chatbot work?
Let’s first understand the difference between traditional and conversational applications. A traditional application such as a mobile app or website works in a point-and-click fashion. Its interfaces are built on blocks of elements with which users can interact via limited actions (e.g., click, type, touch, or swipe). This arrangement is extremely convenient and efficient for a computer as there are finite interaction points, often in a sequence. Developers can, therefore, write code for each finite set of interactions very quickly.
That being said, there are also some challenges presented by traditional applications. First, the user must understand the flow required to get the work done. While many flows are commonly used and seemingly simple, specific business domains might necessitate user training. Second, if additional requirements get added, then new user interface (UI) elements and interactions must be introduced.
Conversational applications, on the other hand, take the command from the user in the form of his/her natural language. The example illustrated below is a simplified version of a multi-dialogue, chatbot interaction for buying groceries.
While many may argue that chatbot interaction may be cumbersome as users have to type what they want instead of simply clicking a few times, the statistics on messaging platforms say otherwise.
“Users around the world are logging in to messaging apps to not only chat with friends but also to connect with brands, browse merchandise, and watch content. What were once simple services for exchanging messages, pictures, videos, and GIFs have evolved into expansive ecosystems with their own developers, apps, and APIs.” – Business Insider
We haven’t quite made the full switch from traditional to conversational applications. For now, while speech recognition and natural language comprehension continue to evolve, we will see many hybrid interfaces making the best of both worlds.
How do we make an application conversational?
A conversational application’s primary aim is to translate natural language into user intent. The intent, in this context, is the command the user intends to execute. The conversational app can either be rule-based or actions-based.
A rule-based application is preprogrammed with multiple phrases against an intent (see the simple rule-based flow below). While these bots are intelligent and able to understand natural language, any conversation that goes outside the boundaries of their rules fails. Having said that, the rules are also what make these chatbots extremely accurate.
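The rule-based idea can be sketched in a few lines of Python: each intent is tied to a handful of preprogrammed phrases, and anything outside those rules is rejected. This is a minimal sketch only; the rule table, intent names, and function names are illustrative assumptions and do not come from any particular bot platform.

```python
import re
from typing import Optional

# Hypothetical rule table: each intent lists the phrases that trigger it.
RULES = {
    "salutation": ["hello", "hi", "greetings"],
    "buy_groceries": ["buy groceries", "order groceries", "need groceries"],
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(cleaned.split())


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose phrase appears as whole words, else None."""
    padded = f" {normalize(utterance)} "
    for intent, phrases in RULES.items():
        for phrase in phrases:
            if f" {phrase} " in padded:
                return intent
    return None  # outside the rules: the bot cannot handle this input
```

Matching on whole words keeps "this" from triggering the "hi" rule, which is one reason such bots are accurate inside their rules and helpless outside them.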
Actions-based applications, by contrast, rely on natural language processing (NLP), the class of artificial intelligence (AI) algorithms that enables a computer to understand human language and process commands. In effect, these conversational applications convert free-form human language into structured commands that software can execute.
Consider the lifecycles of human beings. Children are taught by feeding them information. As they grow, their interactions with the environment continue to develop their intelligence. NLP works in a similar fashion. Initialized with a set of training data, the AI builds upon its learnings via usage and interaction.
Breaking down a Chatbot
From this point forward, we will focus on the application of natural language processing in a chatbot and the key concepts applied by current bot platforms and software developers’ kits (SDKs). The objective is to become aware of the ecosystem and quickly start building chatbots of your own.
Conversational channels are like the eyes, ears, and mouth of a chatbot. The most common conversation interfaces are currently text and voice as they allow interaction via natural language. Also, chat platforms have become popular mostly through our mobile devices and desktops, which are ideal for text interfaces. Therefore, while multiple interfaces in addition to voice and text exist, we will focus primarily on these two.
1. Text-Based Channels
These are simple chat platforms that allow you to communicate with bots via text, which their NLP algorithms can directly consume. These interfaces can be completely custom-built as mobile, desktop, or web applications, or they can be integrated with existing messaging platforms. A few key examples of text-based messaging platforms include WhatsApp, Slack, Tropo, Line, Kik, and Facebook Messenger. These chat platforms provide web-based API hooks for transmitting the text received via their chat interfaces to a chatbot service. If the interface is the platform, then the chatbot can be developed and exposed as an API.
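As a rough sketch of such a hook, the Python function below takes the JSON body a messaging platform might post to a chatbot service and returns the JSON reply. The payload fields (user_id, text) and the placeholder bot logic are assumptions made for illustration; every real platform defines its own payload format.

```python
import json


def chatbot_reply(text: str) -> str:
    """Placeholder bot logic (an assumption): simply echo the message."""
    return f"You said: {text}"


def handle_webhook(raw_body: str) -> str:
    """Turn an incoming chat event (JSON string) into an outgoing reply (JSON string)."""
    event = json.loads(raw_body)
    reply = chatbot_reply(event["text"])
    # Address the reply to the same user who sent the message.
    return json.dumps({"user_id": event["user_id"], "text": reply})
```

Because the handler only deals in strings, it could sit behind any web framework or serverless endpoint that the chosen platform is able to call.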
2. Voice-Based Channels
Voice or speech interfaces like the Amazon Echo allow users to converse with bots by simply speaking. Since the chatbot only understands communication in text format, an additional interface is needed to convert speech into text (and text back into speech when the chatbot responds). A few notable platforms/services that provide speech-to-text and text-to-speech services include Google’s Speech API, Amazon’s Voice Service, IBM’s Watson Speech API, Microsoft’s Azure Speech API, and API.AI.
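The voice pipeline described above can be sketched as three steps chained together. In this minimal Python sketch the speech services are passed in as plain functions; the stand-ins shown here are fake (they treat the audio bytes as UTF-8 text) and do not represent any vendor's API.

```python
from typing import Callable


def make_voice_bot(
    speech_to_text: Callable[[bytes], str],
    chatbot: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Wrap a text-only chatbot so that it accepts and returns audio."""

    def handle(audio: bytes) -> bytes:
        text = speech_to_text(audio)   # 1. speech in, text out
        reply = chatbot(text)          # 2. the bot itself only sees text
        return text_to_speech(reply)   # 3. text in, speech out

    return handle


# Fake stand-ins for illustration only: "audio" is just UTF-8 encoded text.
def fake_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")


def echo_bot(text: str) -> str:
    return f"You said: {text}"


voice_bot = make_voice_bot(fake_speech_to_text, echo_bot, fake_text_to_speech)
```

Keeping the speech services behind simple function parameters means the same chatbot core can serve both text and voice channels.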
The Chatbot Core
Let’s dissect a chatbot’s inner workings. How does it understand language, intelligently process commands, and respond as naturally (read: as humanly) as possible? Every time a user tries to communicate with the chatbot, he/she has the “intent” of asking a question or giving a command. Natural language processing (and the algorithms supporting it) is responsible for figuring out that intent based on the inputs the chatbot receives.
General English Language
Chatbots can be taught general, spoken English or any other language by giving them predefined learning data. For example, hello, greetings, or hi are understood as an intent of “salutation.”
Domain Specific Language
Unlike generic vocabulary, vocabulary specific to a business can be interpreted differently. Let’s take “I am planning to travel to New York” as an example. The phrase can be interpreted by an airline service as the intent to book a “flight,” while a hotel would interpret it as intending to book a “hotel room.”
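To make the point concrete, here is a small Python sketch in which the same sentence maps to different intents depending on the business domain. The keyword tables and intent names are hypothetical and far simpler than what a trained NLP model would use.

```python
from typing import Optional

# Hypothetical keyword-to-intent tables, one per business domain.
DOMAIN_INTENTS = {
    "airline": {"travel": "book_flight", "fly": "book_flight"},
    "hotel": {"travel": "book_hotel_room", "stay": "book_hotel_room"},
}


def interpret(utterance: str, domain: str) -> Optional[str]:
    """Map an utterance to an intent using the vocabulary of one domain."""
    words = utterance.lower().split()
    for keyword, intent in DOMAIN_INTENTS[domain].items():
        if keyword in words:
            return intent
    return None


phrase = "I am planning to travel to New York"
airline_intent = interpret(phrase, "airline")
hotel_intent = interpret(phrase, "hotel")
```

Here airline_intent is "book_flight" while hotel_intent is "book_hotel_room", even though the input sentence is identical.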
Ideally, we would want a chatbot to be very open-ended and have conversations with much wider contexts. Since these kinds of open domain bots are quite complex, most of the bots today are dedicated to specific businesses or domains.
Let’s assume that the chatbot is focused on the “airline domain” and is connected to the business API of the airline’s online booking system. Based on the “travel” verb, the chatbot understands that the user intends to travel and knows that it needs to call the API “searchTravelOptions” before it can book any flights.
To make a successful API call, the chatbot also needs to identify the parameters required to complete the operation. NLP applies a concept of named entity recognition, which enables the bot to associate the parameters with known information like place, time, date, etc. For example, “New York” can be associated with either “Destination” or “Source” based on the named entity training data with which the chatbot is pre-loaded. Similarly, if the user had given a date, then it could be associated with either “Travel Date” or “Return Date.” To evaluate these possibilities, the chatbot uses prepositions such as from or to to accurately identify an entity. For example, “from location” signifies the source, while “to location” signifies the destination.
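The preposition rule can be illustrated with a deliberately simple Python sketch that uses regular expressions in place of a trained named entity recognizer. It assumes place names are written as capitalized words; the slot names and patterns are illustrative only.

```python
import re

# A place is assumed to be one or more capitalized words, e.g. "New York".
PLACE = r"((?:[A-Z]\w*)(?:\s+[A-Z]\w*)*)"

# The preposition in front of the place decides which slot it fills.
SLOT_PATTERNS = {
    "source": re.compile(r"\bfrom\s+" + PLACE),
    "destination": re.compile(r"\bto\s+" + PLACE),
}


def extract_entities(utterance: str) -> dict:
    """Return the slots that could be identified from prepositions."""
    entities = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            entities[slot] = match.group(1)
    return entities
```

For "I am planning to travel to New York", the first "to" is skipped because "travel" is not capitalized, and only the destination slot is filled.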
In the aforementioned example, not all entities are provided to the bot in a single sentence. The bot platforms are, therefore, equipped to construct “dialogues” or series of conversations in order to complete a process. As you can see below, the chatbot continues to have a dialogue with the user until all the information necessary for a “searchTravelOptions” API call is gathered.
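A dialogue of this kind is often described as slot filling: the bot keeps asking until every required parameter is known. The Python sketch below shows the loop under simple assumptions; the slot names and prompts are hypothetical, and the user's replies are supplied as a list rather than read from a live chat.

```python
from typing import Iterable, Optional

# Hypothetical parameters that a "searchTravelOptions" call might need.
REQUIRED_SLOTS = ["source", "destination", "travel_date"]
PROMPTS = {
    "source": "Where are you travelling from?",
    "destination": "Where would you like to go?",
    "travel_date": "When do you want to travel?",
}


def first_missing(slots: dict) -> Optional[str]:
    """Return the first required slot that has no value yet."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return name
    return None


def run_dialogue(known_slots: dict, replies: Iterable[str]):
    """Ask for each missing slot in turn; return the slots and the questions asked."""
    slots = dict(known_slots)
    asked = []
    pending = iter(replies)
    while True:
        missing = first_missing(slots)
        if missing is None:
            break  # everything needed for the API call is known
        asked.append(PROMPTS[missing])
        reply = next(pending, None)
        if reply is None:
            break  # the user stopped answering
        slots[missing] = reply
    return slots, asked
```

Starting from a known destination, the loop asks only for the source and the travel date, then stops once all three slots are filled.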
In the example conversation above, the chatbot is aware of the user’s current location based on their mobile device’s GPS and assumes the source. The chatbot can use the information gathered from mobile devices to determine physical context (like location, speed, etc.) and it can also use saved user data (such as class preference) to determine domain context. Context awareness and the ability to derive entity information make a chatbot more aware and human.
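One way to picture context awareness is as a set of defaults that fill in whatever the user did not say. In this Python sketch the field names (current_city, class_preference, travel_class) are assumptions chosen for illustration; anything the user states explicitly takes priority over context.

```python
def apply_context(slots: dict, device_context: dict, user_profile: dict) -> dict:
    """Fill missing slots from physical context and saved user data."""
    enriched = dict(slots)  # never modify the caller's dictionary
    # Physical context: assume the journey starts where the device is.
    if "source" not in enriched and "current_city" in device_context:
        enriched["source"] = device_context["current_city"]
    # Domain context: reuse the travel class the user preferred before.
    if "travel_class" not in enriched and "class_preference" in user_profile:
        enriched["travel_class"] = user_profile["class_preference"]
    return enriched
```

With defaults like these, the bot has fewer questions to ask before it can make its API call.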
Unsupervised & Supervised Learning
The identification of intent and entities enables the chatbot to know which API to call, what data to fetch, and which parameters to pass. The pre-loading and classification of Common Vocabulary, Domain Specific Vocabulary, Named Entities, and Domain Specific Entities can, therefore, be deemed the chatbot’s unsupervised, “learning” processes.
However, there will be many instances when the chatbot will not be able to accurately translate a user’s phrase into intent. For this, all bot platforms allow developers to review missed translations and manually label these phrases with their appropriate intents. With this process, the chatbots learn from their mistakes or “lack of knowledge” in a supervised environment. In the example below, the bot cannot associate “whazzup” with any intent and has asked the developer to associate it with the appropriate one.
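The review-and-label loop can be sketched as follows in Python. The class and method names are hypothetical, and exact phrase lookup stands in for a real classifier; the point is only to show unmatched phrases being queued for a developer and then learned.

```python
from typing import Optional


class IntentTrainer:
    """Toy intent store that records phrases it could not classify."""

    def __init__(self, phrases_by_intent: dict):
        self.intent_by_phrase = {
            phrase.lower(): intent
            for intent, phrases in phrases_by_intent.items()
            for phrase in phrases
        }
        self.unmatched = []  # phrases waiting for a developer to label them

    def classify(self, utterance: str) -> Optional[str]:
        key = utterance.strip().lower()
        intent = self.intent_by_phrase.get(key)
        if intent is None and key not in self.unmatched:
            self.unmatched.append(key)  # queue the miss for review
        return intent

    def label(self, phrase: str, intent: str) -> None:
        """Developer supplies the correct intent for a missed phrase."""
        key = phrase.strip().lower()
        self.intent_by_phrase[key] = intent
        if key in self.unmatched:
            self.unmatched.remove(key)


trainer = IntentTrainer({"salutation": ["hello", "hi", "greetings"]})
first_try = trainer.classify("whazzup")        # not understood yet
trainer.label("whazzup", "salutation")         # developer labels the miss
second_try = trainer.classify("Whazzup")       # now recognized
```

After the developer labels "whazzup" once, the same phrase is recognized as a salutation from then on.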
How does a Chatbot respond?
Now that we have discussed how a chatbot processes natural language, let’s discuss how chatbots respond to questions or commands in a manner as similar to natural, human responses as possible. There are multiple algorithms and models that allow a chatbot to determine its responses, but we will touch on only one approach: the retrieval-based model.
A predominant approach due to its easy implementation, the retrieval-based model returns a predefined response to a command or question. The response can be static or selected from a predefined set of responses based on rules or persona information (that of the user interacting with the chatbot). While such selection may make the bot seem smarter, the truth is that its responses remain limited to a finite, predefined set.
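A retrieval-based responder can be as small as a lookup table. In this Python sketch the intents, persona labels, and reply texts are invented for illustration; the bot never composes a new sentence, it only picks one that was written in advance.

```python
# Hypothetical predefined replies, keyed by intent and then by user persona.
RESPONSES = {
    "salutation": {
        "default": "Hello! How can I help you?",
        "returning": "Welcome back! What can I do for you today?",
    },
    "book_flight": {
        "default": "Sure, where would you like to fly?",
    },
}
FALLBACK = "Sorry, I did not understand that."


def select_response(intent: str, persona: str = "default") -> str:
    """Pick a predefined reply for the intent, tailored to the persona if possible."""
    candidates = RESPONSES.get(intent)
    if candidates is None:
        return FALLBACK  # nothing predefined for this intent
    return candidates.get(persona, candidates["default"])
```

The persona only changes which prewritten reply is chosen, which is why the range of responses stays finite.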
So, how can we make chatbots more perceptive? The more context a chatbot has, the more intelligent it can become. The chatbot can begin to select responses based on the user’s mood or on physical or linguistic context. Services like IBM’s Watson™ Tone Analyzer and Personality Insights can be used to gather this user context and change the style or flow of the dialogue accordingly.
Let’s build Chatbots!
There is an enormous list of available chatbot ecosystems and platforms, along with many tutorials that can help those looking to build chatbots. These platforms are very simple and easy to use, and do not require vast amounts of artificial intelligence knowledge.
The chatbots and machine intelligence space is expanding at a rapid pace, and has to be taken seriously by organizations and individuals alike. In the near future, AI and machine learning will shift from being the domain of a closed community to touching every sphere of our lives. Being aware of this space will be as important as knowing how to operate a smartphone.
By Siddhartha Lahiri, Senior Manager, SapientRazorfish