By now it looks like artificial intelligence is inevitable: it is already doing many of the things we do, and is even replacing us at complex tasks like diagnosing and curing diseases. It is here to stay, it will increasingly handle the complex tasks of the future, and when it comes to taking our jobs, the majority of us will have little say in the matter. But since these systems are here to stay, what happens in a complex system where artificial intelligence (AI) agents have to interact with each other to deliver on tasks?
That’s what Google’s DeepMind (Google’s AI arm, based in London) aims to answer in a new study published earlier today. They studied how AI systems behave in a series of what they call social dilemmas, another term for situations where individual players can thrive by being selfish. In a blog post, the team at DeepMind explains this using the example of the prisoner’s dilemma, a scenario in which two people can each choose to betray the other but will both lose out if they both go down that route.
Two suspects are arrested and put into solitary confinement. Without confessions, the police do not have sufficient evidence to convict the two suspects on the main charge, but they have good prospects of securing one-year prison sentences for both. To entice the prisoners to confess, they simultaneously offer them the following deal: if you testify against the other prisoner (“defect”), you will be released, but the other prisoner will serve three years in prison. If both prisoners confess (“defect”), they will both serve two years.
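The payoffs above can be sketched in a few lines of code. This is a minimal illustration of the dilemma's structure, not anything from the study; the variable names and the table layout are my own:

```python
# payoffs[my_move][their_move] = years I serve (lower is better)
# moves: "C" = stay silent (cooperate), "D" = testify (defect)
payoffs = {
    "C": {"C": 1, "D": 3},  # both silent: 1 year each; I stay silent while they defect: 3 years
    "D": {"C": 0, "D": 2},  # I defect while they stay silent: released; both defect: 2 years
}

def best_response(their_move):
    """Return the move that minimises my sentence against a fixed opponent move."""
    return min(("C", "D"), key=lambda my_move: payoffs[my_move][their_move])

# Defecting is the best response no matter what the other prisoner does,
# even though mutual silence (1 year each) beats mutual defection (2 years each).
print(best_response("C"))  # -> D
print(best_response("D"))  # -> D
```

That tension, where the individually rational move makes everyone worse off, is exactly what the dilemma captures.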
Using basic video games to depict the social dilemma, they came up with two scenarios.
The first is Gathering, in which two players collect apples from a shared pile. The players can cooperate, or one can take the other out of the game, leaving a single player to collect all of the apples for itself.
They repeated this process thousands of times and found that the agents, trained with deep multi-agent reinforcement learning, behaved rationally: they cooperated in gathering the apples while these were plentiful, but as the number of apples shrank, one agent would tag the other so that it alone could gather the scarce apples. The Gathering game therefore shares the structure of the prisoner's dilemma.
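As a rough intuition for the learning setup, here is a toy, tabular analogue: two independent learners repeatedly play a matrix version of the dilemma. The actual study used deep reinforcement learning agents in 2D gridworld games; the payoff values and hyperparameters below are purely illustrative:

```python
import random

ACTIONS = ["cooperate", "defect"]
# Reward (higher is better) for (my_action, their_action), mirroring the
# dilemma: mutual cooperation beats mutual defection, but unilateral
# defection pays best of all.
REWARD = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 4,
    ("defect", "defect"): 1,
}

def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # One value estimate per action for each agent (a stateless learner).
    q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]
    for _ in range(episodes):
        moves = []
        for agent in range(2):
            if rng.random() < epsilon:                 # explore occasionally
                moves.append(rng.choice(ACTIONS))
            else:                                      # otherwise exploit
                moves.append(max(ACTIONS, key=q[agent].get))
        for agent in range(2):
            r = REWARD[(moves[agent], moves[1 - agent])]
            q[agent][moves[agent]] += alpha * (r - q[agent][moves[agent]])
    return q

q_values = train()
# With these payoffs defection dominates, so both learners end up
# preferring "defect" -- the breakdown Gathering reproduces once apples
# become scarce.
for agent, table in enumerate(q_values):
    print(agent, max(table, key=table.get))
```

The point of the sketch is that neither agent is programmed to defect; defection simply emerges from each one independently maximising its own reward.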
The second is Wolfpack. This one has two agents in the game hunting a third, and it doesn't matter which of the two captures it, because the points are shared equally by all agents near the captured prey.
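Wolfpack's reward rule, as described, can be sketched as a simple function: whoever makes the catch, every wolf within some radius of the capture splits the points equally. The capture radius and point value here are illustrative assumptions, not figures from the study:

```python
def wolfpack_rewards(wolf_positions, capture_point, capture_radius=2.0, points=10.0):
    """Split the capture reward equally among wolves near the prey."""
    def near(pos):
        dx, dy = pos[0] - capture_point[0], pos[1] - capture_point[1]
        return (dx * dx + dy * dy) ** 0.5 <= capture_radius

    nearby = [i for i, pos in enumerate(wolf_positions) if near(pos)]
    share = points / len(nearby) if nearby else 0.0
    return [share if i in nearby else 0.0 for i in range(len(wolf_positions))]

# Two wolves close to the capture split the points; a distant wolf gets nothing.
print(wolfpack_rewards([(0, 0), (1, 1), (9, 9)], capture_point=(0, 0)))
# -> [5.0, 5.0, 0.0]
```

Because the reward is shared rather than winner-takes-all, staying close to a partner pays, which is why this game rewards cooperation where Gathering ends up punishing it.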
The findings show that AI systems are able to change their behaviour based on the task at hand. Even in the Gathering scenario, when an agent with more computational power was introduced, it tagged out the other agent because it had the capacity to gather the apples without being slowed down, and it did this only after ascertaining that the other agent would eventually slow down the gathering. This doesn't necessarily mean that AI systems are antagonistic towards each other, but rather that, at least in the Gathering game, they make decisions based on the computational power needed to complete the task.
The second scenario, Wolfpack, shows that more intelligent AI agents are capable of working with less intelligent ones to accomplish a task, though cooperating in this game ultimately requires more computational power.
This means that how AI agents handle shared tasks in the future will depend on the rules of the game. The research aims to better understand and control complex multi-agent systems such as the economy, traffic systems, or the ecological health of our planet – all of which depend on our continued cooperation.