The FAIR researchers’ key technical innovation in building such long-term planning dialog agents is an idea called dialog rollouts. When chatbots can build mental models of their interlocutors and “think ahead”, anticipating the directions a conversation is going to take, they can choose to steer away from uninformative, confusing, or frustrating exchanges toward successful ones. Specifically, FAIR has developed dialog rollouts as a novel technique in which an agent simulates a future conversation by rolling out a dialog model to the end of the conversation, so that the utterance with the maximum expected future reward can be chosen.
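The rollout idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FAIR's implementation: `choose_utterance`, `toy_simulator`, the candidate utterances, and the acceptance probabilities are all hypothetical, and the toy simulator stands in for a learned dialog model that would play out the rest of the conversation.

```python
import random
from statistics import mean


def choose_utterance(candidates, simulate_to_end, n_rollouts=20, rng=None):
    """Pick the candidate whose simulated futures have the highest mean reward.

    `simulate_to_end(utterance, rng)` is a hypothetical stand-in for a learned
    dialog model: it plays out the rest of the conversation after `utterance`
    and returns the reward of the final outcome.
    """
    rng = rng or random.Random(0)
    expected = {
        utterance: mean(simulate_to_end(utterance, rng) for _ in range(n_rollouts))
        for utterance in candidates
    }
    best = max(expected, key=expected.get)
    return best, expected


def toy_simulator(utterance, rng):
    """Toy rollout: the partner accepts with a fixed, made-up probability."""
    # (probability the partner accepts, reward if accepted); invented numbers.
    accept_prob, reward = {
        "I want everything": (0.1, 10),
        "I take the balls, you take the rest": (0.8, 9),
        "You can have it all": (1.0, 0),
    }[utterance]
    # A rejected proposal is treated as no deal, which scores 0.
    return reward if rng.random() < accept_prob else 0


CANDIDATES = [
    "I want everything",
    "I take the balls, you take the rest",
    "You can have it all",
]
```

Calling `choose_utterance(CANDIDATES, toy_simulator, n_rollouts=200)` averages the simulated outcomes for each candidate and returns the one with the highest estimate. In this toy setup the greedy demand is rarely accepted and the generous offer earns nothing, so the moderate proposal comes out ahead.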
Facebook had two ‘bots’ negotiate with each other in chat messages after being shown conversations of humans negotiating. Here’s why some media outlets have published alarmist reports on the work over the past week or so.
By GO Staff
Media outlets have been breathlessly re-reporting a weeks-old story that Facebook’s AI-trained chatbots “invented” their own language. When the report was first published, it attracted no interest, like most technical reports. Then someone got the bright idea of making it seem like a Terminator-like world is in the offing, and suddenly everyone was either scared or thrilled.
Understandable, perhaps, but it’s exactly the wrong thing to be focusing on. The fact that Facebook’s bots “invented” a new way to communicate wasn’t even the most shocking part of the research to begin with.
A bit of background: Facebook’s AI team, Facebook Artificial Intelligence Research (FAIR), published a paper back in June, detailing their efforts to teach chatbots to negotiate like humans. Their intention was to train the bots not just to imitate human interactions, but to actually act like humans.
The FAIR researchers studied negotiation on a multi-issue bargaining task. Two agents are both shown the same collection of items (say two books, one hat, three balls) and are instructed to divide them between themselves by negotiating a split of the items.
Each agent is provided its own value function, which represents how much it cares about each type of item (say each ball is worth 3 points to agent 1). As in life, neither agent knows the other agent’s value function and must infer it from the dialogue (you say you want the ball, so you must value it highly).
FAIR researchers created many such negotiation scenarios, always ensuring that it is impossible for both agents to get the best deal simultaneously. Furthermore, walking away from the negotiation (or not agreeing on a deal after 10 rounds of dialogue) resulted in 0 points for both agents. Simply put, negotiation is essential, and good negotiation results in better performance.
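The scoring rules of the bargaining task can be made concrete with a small sketch. The item counts, the 3 points per ball for agent 1, and the 10-round limit come from the description above; every other value and every name (`Agent`, `settle`, `ITEM_COUNTS`) is an assumption made up for illustration, not the researchers' code.

```python
from dataclasses import dataclass

# Items on the table, as in the article's example.
ITEM_COUNTS = {"book": 2, "hat": 1, "ball": 3}
MAX_ROUNDS = 10  # no agreement within 10 rounds scores 0 for both agents


@dataclass
class Agent:
    name: str
    values: dict  # private value function: points per item type

    def score(self, share):
        """Total points this agent earns from the items in `share`."""
        return sum(self.values[item] * count for item, count in share.items())


def settle(agent_1, agent_2, share_1, rounds_used, agreed=True):
    """Return (points_1, points_2); agent 2 receives whatever agent 1 leaves."""
    if not agreed or rounds_used > MAX_ROUNDS:
        return 0, 0  # walking away or running out of rounds
    full_1 = {item: share_1.get(item, 0) for item in ITEM_COUNTS}
    if any(not 0 <= full_1[item] <= ITEM_COUNTS[item] for item in ITEM_COUNTS):
        raise ValueError("a split can only use the items on the table")
    share_2 = {item: ITEM_COUNTS[item] - full_1[item] for item in ITEM_COUNTS}
    return agent_1.score(full_1), agent_2.score(share_2)


# Hypothetical value functions; only "ball: 3" for agent 1 is from the article.
agent_1 = Agent("agent 1", {"book": 0, "hat": 1, "ball": 3})
agent_2 = Agent("agent 2", {"book": 2, "hat": 0, "ball": 2})

deal = settle(agent_1, agent_2, {"hat": 1, "ball": 2}, rounds_used=4)
no_deal = settle(agent_1, agent_2, {}, rounds_used=4, agreed=False)
```

With these made-up values, both agents want the balls, so neither can take everything it values without costing the other points. The example split leaves each side short of its maximum, while the failed negotiation leaves both with nothing.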
In order to train negotiation agents and conduct large-scale quantitative evaluations, the FAIR team crowdsourced a collection of negotiations between pairs of people. The individuals were shown a collection of objects and a value for each, and asked to agree how to divide the objects between them. The researchers then trained a recurrent neural network to negotiate by teaching it to imitate people’s actions. At any point in a dialogue, the model tries to guess what a human would say in that situation.
Unlike previous work on goal-orientated dialogue, the models were trained “end to end” purely from the language and decisions that humans made, meaning that the approach can easily be adapted to other tasks.
To go beyond simply trying to imitate people, the FAIR researchers allowed the model to pursue the goals of the negotiation. To train the model to achieve its goals, the researchers had the model practice thousands of negotiations against itself, and used reinforcement learning to reward the model when it achieved a good outcome. To prevent the algorithm from developing its own language, it was simultaneously trained to produce humanlike language.
To evaluate the negotiation agents, FAIR tested them online in conversations with people. Most previous work has avoided dialogues with real people or worked in less challenging domains, because of the difficulty of learning models that can respond to the variety of language that people use.
Interestingly, in the FAIR experiments, most people did not realize they were talking to a bot rather than another person — showing that the bots had learned to hold fluent conversations in English in this domain. The performance of FAIR’s best negotiation agent, which makes use of reinforcement learning and dialog rollouts, matched that of human negotiators. It achieved better deals about as often as worse deals, demonstrating that FAIR’s bots not only can speak English but also think intelligently about what to say.
You can read more about the experiment in Facebook’s blog post about the project (just search online for “Deal or no deal? Training AI bots to negotiate”), but the bottom line is that their efforts were far more successful than they anticipated. Not only did the bots learn to act like humans, but actual humans were apparently unable to tell the bots from other humans.
The bots ended up coming up with very humanlike strategies for negotiating what they wanted. They even pretended to want one item when they actually wanted another, initially feigning interest in a valueless item, only to later “compromise” by conceding it. This is an effective negotiating tactic that humans use regularly. The behaviour was not programmed by the researchers; the neural network learned the tactic on its own, which is fascinating. The bots also outlasted the humans at negotiations, persistently avoiding a compromise till the humans gave up.
At one point in the process, though, the bots’ communication style went a little off the rails. The bots were given a set of words to use from a training data set. The bots were rewarded for getting the items, but not for doing so in English. Because of this, the bots began communicating in a nonsensical way, saying things like “I can can I I everything else,” Fast Company reported in the now highly cited story detailing the unexpected outcome.
Researchers from Facebook claim that the experiment was shut down not because the Facebook bots were coming up with a new language, but because the language they were using was not something that could be used to negotiate with humans. In any case, the obsession with bots “inventing a new language” misses the most notable part of the research in the first place: that the bots, when taught to behave like humans, learned to lie — even though the researchers didn’t train them to use that negotiating tactic.
This is especially interesting because in science fiction classics, robots and AIs have always been shown to have difficulty dealing with the concept of humans lying. Clearly, as always, truth is stranger than fiction!