Day 22: How to build an AI Game Bot using OpenAI Gym and Universe

by Harini Janakiraman

Let's face it, AI is everywhere. A face-off is unfolding between Elon Musk and Mark Zuckerberg on the future of AI. There are some who demonize it, and some whose utopian views claim that AI could be almost God-like in helping humanity. Whichever side your views tilt toward, AI is here to stay.

“With artificial intelligence, we are summoning the demon.” — Elon Musk

“Fearing a rise of killer robots is like worrying about overpopulation on Mars.” — Andrew Ng

If you’re excited to dive right in and tinker with AI, then games are a great place to start. They have been the go-to testbed for AI. But before jumping in, here’s a little bit of history on how game programming has evolved through time.

The History of Game Programming

Game programmers used to use heuristic if-then-else decisions to make educated guesses. We saw this in the earliest arcade video games such as Pong and PacMan. This trend was the norm for a very long time. But game developers can only predict so many scenarios and edge cases to keep your bot from running in circles!

Game developers then tried to mimic how humans would play a game, and modeled human intelligence in a game bot.

The team at DeepMind did this by generalizing and modeling intelligence to solve any Atari game thrown at it. The game bot used deep learning neural networks with no game-specific knowledge. It beat games using only the pixels it saw on screen and its knowledge of the game controls. However, parts of DeepMind are still not open-sourced, as Google uses it to beat the competition.

The Democratization of AI

To avoid concentrating the incredible power of AI in the hands of a few, Elon Musk founded OpenAI. It seeks to democratize AI by making it accessible to all. Today we shall explore OpenAI Gym and the recently released Universe, which is built on top of Gym.

OpenAI Gym provides a simple interface for interacting with and managing any arbitrary dynamic environment. OpenAI Universe is a platform that lets you build a bot and test it out.
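Before layering Universe on top, it helps to see the bare Gym interface in action. Here is a minimal sketch of the reset/step loop; the CartPole-v0 environment and the random policy are my own choices for illustration, not part of this project:

import gym

# A minimal sketch of the core Gym loop, using the classic CartPole
# environment purely as an illustration.
env = gym.make('CartPole-v0')
observation = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # pick a random valid action
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()  # start a new episode when this one ends

Every Gym environment, including the Universe ones we use below, follows this same observe/act/step pattern.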

There are thousands of environments. They range from classic Atari games, Minecraft, and Grand Theft Auto, to protein fold simulations that can cure cancer. You can create a bot and run it in any environment using only a few lines of Python code. This is too awesome not to try!

Project (1 Hour)

We are going to build an AI Game Bot that uses the "Reinforcement Learning" technique. I'll explain that later. It will autonomously play against and beat the Flash game Neon Race (you can select any game you want). We will build this game bot using OpenAI's Gym and Universe libraries.

Step 1: Installation

Ensure you have Python installed, or install it using Homebrew. You can download a dedicated Python IDE like PyCharm or iPython notebook. I like to keep it simple and use Sublime. Finally, install Gym, Universe and other required libraries using pip.

# Install python using brew
brew install python3

# Install the required OpenAI libraries
pip3 install gym
pip3 install numpy incremental
brew install golang libjpeg-turbo
pip3 install universe

Everything in Universe (the environments) runs as containers inside Docker. In case you don’t have it already, install and run Docker from here.

Step 2: Code the Game Bot

The Game Bot is coded in Python, so we start by importing the only two dependencies needed: Gym and Universe.

import gym
import universe

For this Game Bot, let’s use my favorite childhood game, Neon Race Cars, as the test environment. You can find a complete list of other environment/games you can choose from here.

Universe lets you run as many environments as you want in parallel. But for this project, we will use only one.
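If you later want to experiment with parallelism, a hedged sketch of the variant would be to bump the remotes count; observation_n, reward_n, and done_n then become lists with one entry per environment:

# Hypothetical variant (not used in this project): run two environments
# in parallel; every observation/action list then has two entries.
env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=2)  # launches two local Docker containers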

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)  # creates a local Docker container

Reinforcement Learning

Now we add the game bot logic that uses the reinforcement learning technique. This technique observes the game’s previous state and reward (such as the pixels seen on the screen or the game score). It then comes up with an action to perform on the environment.

The goal is to make its next observation better (in our case, to maximize the game score). This action is chosen and performed by an agent (the Game Bot) with the intention of maximizing the score. It's then applied to the environment. The environment records the resulting state and reward based on whether the action was beneficial or not (did it win the game?).
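Stripped of the Universe specifics, the loop described above boils down to something like this sketch, where agent_policy is a hypothetical stand-in for whatever decision logic you plug in:

# Schematic of the observe -> act -> reward cycle.
# agent_policy is a hypothetical placeholder, not a real library call.
observation = env.reset()
total_reward = 0

while True:
    action = agent_policy(observation)  # choose an action from the current state
    observation, reward, done, info = env.step(action)  # apply it to the environment
    total_reward += reward  # this running score is what the agent tries to maximize
    if done:
        break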

Now we can retrieve the list of observations for each environment initialized using the env.reset() method.

observation_n = env.reset()

The observation here is an environment-specific object. It represents what was observed, such as the raw pixel data on the screen or the game status/score.
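In the Universe Flash environments, each entry in observation_n is None while the remote is still booting, and then becomes a dictionary once frames start arriving; the raw pixels live under a 'vision' key. Treat the exact keys as an assumption, since they can vary by environment. A small sketch for peeking at the first environment's observation:

# Sketch: inspect the first environment's observation once it is ready.
# The 'vision' key (raw screen pixels) is an assumption based on
# Universe's Flash environments; other environments may differ.
ob = observation_n[0]
if ob is not None:
    pixels = ob['vision']  # numpy array of raw screen pixels
    print(pixels.shape)    # e.g. (height, width, 3)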

The next step is to create a game agent using an infinite loop, which continuously performs some action based on the observation. In our bot, let's define a single action of repeatedly pressing the up arrow (silly bot! Feel free to evolve it into a more complex one…). An action here is defined by the event type (KeyEvent), the control key (Up Arrow), and setting it to true for every observation the agent sees.

while True:
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]

We then use the env.step() method to use the action to move forward one time step. This is a very basic implementation of reinforcement learning.

observation_n, reward_n, done_n, info = env.step(action_n)

The step method here returns four variables:

observation_n: Observations of the environment

reward_n: Whether your action was beneficial or not: +1/-1

done_n: Indicates whether the game is over or not: Yes/No (see the sketch after this list for one way to use it)

info: Additional info such as performance and latency, for debugging purposes
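For example, you could use done_n to restart an episode, as in this hedged sketch (Universe remotes often manage episode resets themselves, so this may be redundant in practice):

# Sketch: restart the environment when the first remote reports game over.
if done_n[0]:
    observation_n = env.reset()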

You can run this action simultaneously for all the environments in which you’re training your bot. Use the env.render() method to start the bot.

env.render()
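Putting the pieces together, a minimal end-to-end version of the bot might look like the sketch below. The file name gamebot.py is simply the one used in Step 3, and calling env.render() inside the loop is my assumption about where it belongs:

# gamebot.py -- a minimal sketch combining the snippets above.
import gym
import universe  # registers the Universe environments with Gym

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)  # creates a local Docker container
observation_n = env.reset()

while True:
    # Silly bot: hold the up arrow in every environment.
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    env.render()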

Now you have the Game Bot ready to compete with the environment. The complete code for this basic bot as well as an advanced version is available in my Github repo here.

Step 3: Run the Game Bot

Now for the fun part: make sure Docker is running, then run the bot. Watch it beat the other cars, or fail to. If it fails, keep tweaking your bot until it beats the competition!

python gamebot.py

Keep tinkering with AI and eventually you can unlock God Mode! #100DaysOfCode

If you enjoyed this, please clap so others can see it as well! Follow me on Twitter @HariniLabs or Medium to get the latest updates on other stories, or just to say hi :)

PS: Sign up for my newsletter here to be the first to get fresh new content. It's filled with a dose of inspiration from the world of #WomenInTech, and yes, men can sign up too!

Translated from: /news/how-to-build-an-ai-game-bot-using-openai-gym-and-universe-f2eb9bfbb40a/
