DeepMind’s AlphaStar Beats Humans 10-0 (or 1)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. I think this is one of the more important things that happened in AI research lately. In the last few years, we have seen DeepMind defeat the best Go players in the world, and after OpenAI’s venture in the game of DOTA2, it’s time for DeepMind to shine again as they take on StarCraft 2, a real-time strategy game. The depth and the amount of skill required to play this game are simply astounding. The search space of StarCraft 2 is so vast that it exceeds that of both chess and even Go by a significant margin. Also, it is a game that requires a great deal of mechanical skill and split-second decision making, and we have imperfect information, as we only see what our units can see.

A nightmare situation for any AI. DeepMind invited a beloved pro player, TLO, to play a few games against their new StarCraft 2 AI that goes by the name AlphaStar. Note that TLO is a professional player who is easily in the top 1% of players, or even better. Mid-grandmaster, for those who play StarCraft 2. This video is about what happened during this event, and later, I will make another video that describes the algorithm that was used to create this AI. The paper is still under review, so it will take a little time until I can get my hands on it. At the end of this video, you will also see the inner workings of this AI. Let’s dive in. This is an AI that looked at a few games played by human players, and after that initial step, it learned by playing against itself for the equivalent of about 200 years of play.

In our next episode, you will see how this is even possible, so I hope you are subscribed to the series. You see here that the AI controls the blue units, and TLO, the human player, plays red. Right at the start of the first game, the AI did something interesting. In fact, what is interesting is what it didn’t do.

It started to create new buildings next to its nexus instead of building the wall-off that you can see here. Using a wall-off is considered standard practice in most games, but the AI used these buildings not to wall off the entrance, but to shield its workers from possible attacks. Now note that this is not unheard of, but it is not a strategy that is widely played today, and it is considered non-standard. It also built more worker units than what is universally accepted as standard. We found out later that this was partly done in anticipation of losing a few of them early on.

Very cool. Then, almost before we even knew what happened, it won the first game a little more than 7 minutes in, which is very quick, noting that in-game time is a little faster than real time. TLO’s thought process at this point is: that’s interesting, but okay, the AI plays aggressively and managed to pull this one off. No big deal. While we fire up the second game, here are a few interesting details. The algorithm was set up so that the number of actions performed by the AI roughly matches that of a human player, with the hope that it still plays as well, or better. It has to make meaningful strategic decisions. You see here that this checks out for the average number of actions per minute, but if you look at the tail end here, you see that there are times when it performs more actions than humans do, and this may enable playstyles that are not accessible to human players. However, note that many times it also does miraculous things with very few actions.
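To make the difference between average and tail-end actions per minute concrete, here is a minimal sketch in Python. It is not DeepMind’s measurement code; the function names, the 5-second window, and the two made-up action logs are all assumptions chosen for illustration. It shows how two players can have the same average APM while one of them has a far higher short-term peak.

```python
def average_apm(timestamps_ms, game_ms):
    """Mean actions per minute over the whole game."""
    return len(timestamps_ms) / (game_ms / 60_000)


def peak_apm(timestamps_ms, window_ms=5_000):
    """Highest action rate seen in any window of `window_ms`, scaled to per-minute.

    The window length is an illustrative assumption, not a value from the event.
    """
    ts = sorted(timestamps_ms)
    best = 0
    start = 0
    for end, t in enumerate(ts):
        # Slide the left edge until the window spans less than window_ms.
        while t - ts[start] >= window_ms:
            start += 1
        best = max(best, end - start + 1)
    return best * (60_000 / window_ms)


# Hypothetical 60-second logs (timestamps in milliseconds), both with 240 actions.
steady = [i * 250 for i in range(240)]               # one action every 250 ms
bursty = [i * 500 for i in range(100)]               # slow for the first 50 s
bursty += [50_000 + i * 50 for i in range(140)]      # then a 7-second burst

for name, log in (("steady", steady), ("bursty", bursty)):
    print(name, average_apm(log, 60_000), peak_apm(log))
```

Both logs average 240 APM, but the bursty one reaches a much higher rate inside its best 5-second window, which is the kind of tail behavior the chart in the video points at.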

Now, what about another important detail, reaction time? The reaction time of the AI is set to 350ms, which is quite slow. That’s excellent news, because reaction time is a common angle of criticism for game AIs. The AI also sees the whole map at once, but it is not given more information than what its units can see. This is perhaps the most commonly misunderstood detail, so it is worth noting. In other words, it sees exactly what a human would see if the human moved the camera around very quickly, but it doesn’t have to move the camera, which costs the human additional actions and cognitive load, so one might say that the AI has an edge here. The AI plays these games independently. What’s more, each game was played by a different AI, which also means that they do not remember what happened in the last game like a human would. Early in the next game, we can see the utility of the wall-off in action, as it is able to completely prevent the AI’s early attack.
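The observation rule described above (the whole map at once, but only what its own units can see) can be illustrated with a tiny grid sketch in Python. This is a toy model under stated assumptions, not how StarCraft 2 or AlphaStar actually computes vision: the grid, the circular sight radius, and the function names are all hypothetical.

```python
def visible_cells(width, height, units):
    """Return the set of (x, y) grid cells within sight of at least one own unit.

    `units` is a list of (x, y, sight_radius) tuples; a circular sight range
    on a square grid is an assumption made for this illustration.
    """
    visible = set()
    for ux, uy, r in units:
        for x in range(max(0, ux - r), min(width, ux + r + 1)):
            for y in range(max(0, uy - r), min(height, uy + r + 1)):
                if (x - ux) ** 2 + (y - uy) ** 2 <= r * r:
                    visible.add((x, y))
    return visible


def observe(enemy_positions, visible):
    """Keep only the enemy positions that fall inside the visible area."""
    return [p for p in enemy_positions if p in visible]


# One own unit in the middle of a 5x5 map with sight radius 1 (made-up numbers).
seen = visible_cells(5, 5, [(2, 2, 1)])
enemies = [(2, 3), (4, 4)]
print(observe(enemies, seen))
```

The agent in this toy model gets every visible cell in a single observation, with no camera to move, but the enemy in the far corner stays hidden, just as it would for a human player.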

Later that game, the AI used disruptors, a unit which, if controlled with such a level of expertise, can decimate the opponent’s army with area damage by killing multiple units at once. It did an outstanding job picking away at TLO’s army. Then, after getting a significant advantage, AlphaStar lost it with a few sloppy plays and by deciding to engage aggressively while standing in tight choke points. You can see that this is not such a great idea. This was quite surprising, as this is considered to be StarCraft 101 knowledge right there. During the remainder of the match, the commentators mentioned that they play and watch matches all the time, and the AI came up with an army composition that they had never seen during a professional match. And the AI won this one too. After this game, it became clear that these agents can play any style in the game. Which is terrifying. Here you can see an alternative visualization that shows a little more of the inner workings of the neural network. We can see what information it gets from the game, a visualization of the neurons that get activated within the network, what locations and units are considered for the next actions, and whether the AI predicts itself as the winner or loser of the game.

If you look carefully, you will also see the moment when the agent becomes certain that it will win this game. I could look at this all day long, and if you feel the same way, make sure to visit the video description, where I have a link to the source video for you. The final result against TLO was 5 to 0, so that’s something, and he mentioned that AlphaStar played very much like a human does and almost always managed to outmaneuver him. However, TLO also mentioned that he is confident that after playing more training matches against these agents, he would be able to defeat the AI. I hope he will be given a chance to do that.

This AI seems strong, but still beatable. I would also note that many of you would probably expect the later versions of AlphaStar to be way better than this one. The good news is that the story continues and we’ll see whether that’s true! So at this point, the DeepMind scientists said that “maybe we could try to be a bit more ambitious”, and asked “can you bring us someone better?” And in the meantime, they pressed that training button on the AI again. In comes MaNa, a top-tier pro player and one of the best Protoss players in the world. This was a nerve-wracking moment for the DeepMind scientists as well, because their agents played against each other, so they only knew the AI’s win rate against a different AI, but they didn’t know how the agents would fare against a top pro player.

It may still have holes in its strategy. Who knows what would happen? Understandably, they had very little confidence in winning this one. What they didn’t expect is that this new AI was not slightly improved, or somewhat improved. No, no, no. This new AI was next level. This set of improved agents, among many other skills, had incredibly crisp micromanagement of each individual unit. In the first game, we saw it masterfully pulling back injured units while still letting them attack from afar, leading to an early win for the AI against MaNa. He and the commentators were equally shocked by how well the agent played. I will add that a few years ago, I watched many games from a now-inactive player by the name of MarineKing. I vividly remember that he played some of his games so well that the commentators said there’s no better way to put it: he played like a god. I am almost afraid to say that this micromanagement was even more crisp than that. This AI plays phenomenal games.

In later matches, the AI did things that seemed like blunders, like attacking on ramps and standing in choke points, or using unfavorable unit compositions and refusing to change them, and, get this, it still won all of those games. 5 to 0. Against a top pro player. Let that sink in. The competition was closed by a match where the AI was asked to also do the camera management. The agent was still very competent, but somewhat weaker, and as a result, it lost this game, hence the “0 or 1” part in the title. My impression is that it was asked to do something that it was not designed for, and I expect a future version to be able to handle this use case as well. I will also commend MaNa for his solid game plan for this game, and also, huge respect to DeepMind for their sportsmanship. Interestingly, in this match, MaNa also started using the worker oversaturation strategy that I mentioned earlier.

He learned this from AlphaStar and used it in his winning game. Isn’t that amazing? DeepMind also offered a Reddit AMA where anyone could ask them questions to clear up any confusion. For instance, the actions per minute part has been addressed there, and I’ve included a link to that for you in the description. To go from a turn-based perfect information game, like Go, to a real-time strategy game of imperfect information in about a year sounds like science fiction to me.

And yet, here it is. Also, note that DeepMind’s goal is not to create a godlike StarCraft 2 AI. They want to solve intelligence, not StarCraft 2, and they used the game as a vehicle to demonstrate the AI’s long-term decision making capabilities against human players. One more important thing to emphasize is that the building blocks of AlphaStar are meant to be reasonably general AI algorithms, which means that parts of this AI can be reused for other things. For instance, Demis Hassabis mentioned weather prediction and climate modeling as examples. If you take only one thought from this video, let it be this one. I urge you to watch all the matches, because what you are witnessing may very well be history in the making. I put a link to the whole event in the video description, plus plenty more materials, including other people’s analysis, MaNa’s personal experience of the event, and his breakdown of his games and what was going through his head during the event.

I highly recommend checking out his 5th game, but really, go through them all, it’s a ton of fun! I made sure to include a more skeptical analysis of the games as well to give you a balanced portfolio of insights. Also, huge respect to DeepMind and to the players, who honed their skills for many, many years and played really well under immense pressure. Thank you all for this delightful event. It really made my day. And the ultimate question is, how long did it take to train these agents? 2 weeks. Wow. And what’s more, after the training step, the AI can be deployed on an inexpensive consumer desktop machine. And this is only the first version. This is just a taste, and it would be hard to overstate how big of a milestone this is.

And now, scientists at DeepMind have sufficient data to calculate the amount of resources they need to spend to train the next, even more improved agents. I am confident that they will also take into consideration the feedback from the StarCraft community when creating this next version. What a time to be alive! What do you think about all this? Any predictions? Is this harder than DOTA2? Let me know in the comments section below. And remember, we humans build up new strategies by learning from each other, and of course, the AI, as you have seen here, doesn’t care about any of that. It doesn’t need intuition and can come up with unusual strategies. The difference now is that these strategies work against some of the best human players. Now it’s time for us to finally start learning from an AI. gg. Thanks for watching and for your generous support, and I’ll see you next time!
