AI, machine learning, chatbots… no content, no hope

As Microsoft’s infamous Twitter bot demonstrated, artificial intelligence is above all a question of data. Without reliable, serious content, algorithms cannot produce the desired result.

Recent months have seen a surge of interest on the web in machine learning algorithms and, more broadly, artificial intelligence.

These technologies have been around for a long time, but in just a few years have expanded rapidly, becoming more and more sophisticated and offering increasingly high levels of performance.

Many players in the digital industry have chosen to invest in research dedicated to these technologies, including giants such as Google and Facebook. The research firm TechSci Research expects the artificial intelligence sector to grow by 75% over the period 2016-2021.

AI is now used in many domains, and in particular to manage automated interactions in customer relations contexts: these are known as chatbots, or conversational robots. According to TechSci Research, a third of Americans have already used this technology in the form of these new personal assistants.


Various experiments have led to the emergence of a certain number of myths around artificial intelligence, and in particular its self-learning abilities, some of which generate fear. We can of course point to AlphaGo’s victory over Lee Sedol, one of the world’s strongest Go players, which fuelled fantasies of machines surpassing humans.

Yet machines are programmed by humans and depend on humans in order to work effectively. Another example: last spring, Microsoft ran a machine learning experiment with a chatbot that taught itself from its interactions with Twitter users. It was not exactly a success: self-learning without a human filter led the bot, Tay, to post racist messages.


AI is not synonymous with magic, however, but rather with data (and therefore content), mathematics, iterations and models.
While algorithms are certainly important for an effective conversation engine, it is the content which is truly essential. This content may take the form of text, images, videos or audio.

Since the bot draws its responses from a knowledge base, the quantity and quality of the data are essential for providing accurate and appropriate automated responses. This content may come from several sources. An incomplete knowledge base makes for a disappointing user experience: incomplete or approximate answers, or even no answer at all.



Conversational robots must also be able to “disambiguate” the question asked when the phrasing used is unknown to the robot, or too vague. In that case, the robot offers the user suggestions in order to pin down the question or the scenario concerned.

This disambiguation phase depends both on the relevance of the underlying engine and on the work performed on the knowledge base. Certain requests may actually have several meanings. For example, for a request such as “transfer my contract”, does the user wish to change the name of the subscriber, or are they moving house?
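The disambiguation step described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s actual engine: the knowledge base, function names and example intents are all hypothetical, and a real engine would match on meaning rather than exact strings.

```python
# Hypothetical sketch of a disambiguation step: when a request maps to
# several intents in the knowledge base, the engine returns clarifying
# suggestions instead of guessing.

KNOWLEDGE_BASE = {
    "transfer my contract": [
        "Change the name of the subscriber",
        "Move the contract to a new address",
    ],
    "cancel my subscription": [
        "Cancel my subscription",
    ],
}

def answer_or_disambiguate(request):
    """Return an answer, a list of suggestions, or a fallback prompt."""
    intents = KNOWLEDGE_BASE.get(request.lower().strip(), [])
    if len(intents) == 1:
        return ("answer", intents[0])
    if len(intents) > 1:
        return ("suggestions", intents)  # let the user pick the scenario
    return ("fallback", "Could you rephrase your question?")

print(answer_or_disambiguate("Transfer my contract"))
```

For “transfer my contract”, the sketch returns both scenarios as suggestions, which is exactly the behaviour the disambiguation phase aims for: ask, rather than answer wrongly.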

It is essential to avoid misinterpretations and answers that are not sufficiently precise. Engines can draw on several mechanisms: the concept of “emergent grammar” defined by Hopper[1]; self-learning through collective intelligence (many users clicking the same suggestion after a misunderstanding); or deep learning algorithms. On that basis, they can make relevant suggestions to content administrators, helping them manage their chatbot’s knowledge base efficiently.
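The collective-intelligence mechanism mentioned above can be illustrated with a simple counter. This is a hedged sketch with hypothetical names and thresholds: when enough users click the same suggestion after a misunderstood query, the pair is flagged for the content administrators to review.

```python
from collections import Counter

# Hypothetical sketch of self-learning through collective intelligence:
# count (misunderstood query, chosen suggestion) pairs, and once a pair
# recurs often enough, propose it as a new knowledge-base entry.

click_log = Counter()

def record_click(misunderstood_query, chosen_suggestion):
    click_log[(misunderstood_query, chosen_suggestion)] += 1

def proposals(threshold=3):
    """Pairs seen often enough to suggest to content administrators."""
    return [pair for pair, count in click_log.items() if count >= threshold]

# Three different users pick the same suggestion for the same odd phrasing.
for _ in range(3):
    record_click("move my line", "Move the contract to a new address")

print(proposals())
# → [('move my line', 'Move the contract to a new address')]
```

The key design point is that the machine only proposes; a human administrator still validates the new mapping, which is precisely the human filter the Tay episode showed to be necessary.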



Today we mainly use vertical chatbots, which offer detailed knowledge of a specific but limited subject. For example, a chatbot dedicated to energy consumption will draw its replies from a database containing tens of thousands of different formulations of questions. Moore’s law, strictly speaking, predicts that the number of transistors on a chip doubles roughly every two years, and processing power has historically followed a similar curve. We can easily imagine that in the future, bots will become increasingly universal and independent, to the point of adapting their subject matter and tone to the context.

Currently, knowledge must come from structured content, or else the automated structuring must be assisted by content administrators. Providing a precise and appropriate answer to a question is often harder than it appears, because the answer must take into account parameters such as the user’s profile, their subscription or product type, the geographical location from which the question is asked, the device used… But inevitably, machines will soon be able to build their knowledge bases alone, from unstructured external content. They will then be capable of reading, understanding and structuring this content themselves.
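The contextual parameters listed above (profile, subscription, location, device) can be sketched as a lookup keyed on both the question and the context. Again, this is a hypothetical illustration, not a real product’s API; the questions, answers and context fields are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical sketch of contextual answer selection: the same question
# may require different answers depending on the user's subscription,
# location or device.

@dataclass
class Context:
    subscription: str  # e.g. "basic" or "premium"
    country: str       # e.g. "FR"

ANSWERS = {
    ("how do i contact support?", "premium"): "Call your dedicated advisor.",
    ("how do i contact support?", "basic"): "Use the online contact form.",
}

def answer(question, ctx):
    key = (question.lower().strip(), ctx.subscription)
    return ANSWERS.get(key, "Sorry, I have no answer for that yet.")

print(answer("How do I contact support?", Context("premium", "FR")))
# → Call your dedicated advisor.
```

Even this toy version shows why content work dominates: every combination of question and context is an entry someone must author and maintain in the knowledge base.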

The question then arises of how AI is supervised. Will we legislate to establish the rules for the autonomy of AI and list prohibited fields of applications? Will it require constant supervision by humans?
The answer to this question is already known. Just as there are laws for telecommunications, energy and IT, a “robot” or “artificial intelligence” legal category will come about, as Alain Bensoussan and Jérémy Bensoussan argue in their book “Droit des Robots” (Robot Law).