"If you can't fix it, just roto it"
Objective of The Game
The aim of this project is to develop and implement an artificial agent in the Unity game engine, using machine learning algorithms such as neural networks and reinforcement learning, to complete a goal in a game. The system on which the algorithm runs will then be pushed to its limits, for example by testing multiple instances of the algorithm at the same time on a single machine.
A further aim is to use this work to optimise future algorithms and procedures and to improve current AI for games.
Since models and resources are sparse, the game will be a simple objective-based game, such as moving an entity to complete a goal. The agent should be able to learn how to complete its goal and adapt over time to perfect its method. It will be a 2D game run in a 3D environment in which the agent reacts to oncoming objects and jumps over them. The agent learns through reinforcement learning, using a reward system, while a score counter keeps track of how many objects have been avoided (jumped over).
The game itself is simple: the player jumps over oncoming cars, and each car the player clears earns them a point. This scoring system will be used to reward the AI for completing its goal and to give it an incentive to win.
ML Agent Academy Brain
In the ML-Agents framework, agents are actors, and the behavior linked to them determines how they act; an agent requires a behavior in order to function. The agent is responsible for collecting observations, executing actions and assigning rewards. The behavior entity of the framework receives the observations and rewards and is responsible for determining which action to execute.
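As a rough illustration of this split, a minimal agent might only gather observations and leave the choice of action to its behavior. The sketch below assumes the Unity.MLAgents package (1.x API); the nearestCar and isGrounded fields are hypothetical placeholders rather than names from the project.

using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Hypothetical sketch: the agent only collects observations here and hands
// them to its behavior, which decides which action to execute.
public class ObservationOnlyAgent : Agent
{
    public Transform nearestCar;   // assumed reference to the oncoming car
    public bool isGrounded;        // assumed flag set by the collision code

    public override void CollectObservations(VectorSensor sensor)
    {
        // How far away the next car is, and whether the agent can currently jump.
        sensor.AddObservation(nearestCar.position.x - transform.position.x);
        sensor.AddObservation(isGrounded);
    }
}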
Raycasting
Ray casting in Unity is a powerful and common way to observe the environment. The behavior parameters in the framework consist of vector observations, discrete or continuous data, and the model behavior type, of which there are three main kinds:
1. Heuristic: This is a classic implementation model, where the programmer decides how they want the intelligent agent to work and writes a hardcoded script that performs certain actions given a specific state or in response to a change in external stimuli (a sketch of such a heuristic is shown after this list). However, this is not machine learning per se, as it lacks versatility and cannot change.
2. Learning: The agent is actively being trained using machine learning; during training a neural network is generated. To make use of the learned model and apply it to the system, it is then run in the inference mode described next.
3. Inference: The learned model is applied to the system but not changed.
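As a rough sketch of the heuristic type, and of a ray cast used as the observation, the hardcoded rule below jumps whenever a ray fired towards the oncoming traffic hits something within range. It assumes the float[] action API of the 1.x package; the ray direction, ray length and action layout are assumptions rather than project values.

using Unity.MLAgents;
using UnityEngine;

// Hypothetical hardcoded policy: no learning, just a fixed rule driven by a ray cast.
public class HeuristicJumper : Agent
{
    public float lookAhead = 3f;   // assumed ray length, not a project value

    public override void Heuristic(float[] actionsOut)
    {
        // Fire a ray towards the traffic; if a car is close, request a jump (1),
        // otherwise do nothing (0). Slot 0 of the action vector holds the jump action.
        bool carAhead = Physics.Raycast(transform.position, Vector3.right, lookAhead);
        actionsOut[0] = carAhead ? 1f : 0f;
    }
}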
Reinforcement Learning
The mechanism of this system in computer science thrives on a reward/punishment scheme. The state machine interacts with the environment and collects a reward or punishment for each of its actions. The goal of this system is to increase its reward, or decrease the risk, over a sequence of actions and interactions with the environment: the more points the machine obtains, the closer it is to its goal, and it can also lose points by performing incorrect actions. Reinforcement learning algorithms keep learning from experience of the environment until they have explored all possible states.
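To make the reward loop concrete, the small self-contained sketch below repeatedly picks between "stay" (0) and "jump" (1), collects +0.1 for clearing a car and -1 for being hit, and keeps a running estimate of which action earns more. The numbers and structure are purely illustrative and are neither the project's code nor the ML-Agents API.

using System;

class RewardLoopSketch
{
    static void Main()
    {
        var rng = new Random(0);
        var value = new double[2];   // estimated reward for "stay" (0) and "jump" (1)
        var count = new int[2];

        for (int step = 0; step < 1000; step++)
        {
            // Mostly pick the action currently valued higher, sometimes explore.
            int action = rng.NextDouble() < 0.1
                ? rng.Next(2)
                : (value[1] > value[0] ? 1 : 0);

            bool carIncoming = rng.NextDouble() < 0.5;
            double reward = carIncoming
                ? (action == 1 ? +0.1 : -1.0)   // clearing a car scores, getting hit loses
                : (action == 1 ? -0.1 : 0.0);   // needless jumps cost a little

            count[action]++;
            value[action] += (reward - value[action]) / count[action];   // running average
        }

        // After many interactions the higher-valued action is the one to keep taking.
        Console.WriteLine($"stay={value[0]:F2}, jump={value[1]:F2}");
    }
}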
ML Classes and C# Code
Jumper Class
Refactoring Jump for ML
Action handling in ML-Agents is monitored and controlled by the OnActionReceived() function. The vector containing the actions is ActionsOut[]; anything placed in ActionsOut will be sent to and handled by the OnActionReceived() function.
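A minimal sketch of such a handler, assuming the float[] action API of the 1.x package; the convention that slot 0 of the vector holds the jump action is an assumption, and Jump() stands in for the project's jump routine.

using Unity.MLAgents;

// Hypothetical handler: whatever was placed in the action vector arrives here,
// and the first slot is read as "jump or do nothing".
public class JumperActions : Agent
{
    public override void OnActionReceived(float[] vectorAction)
    {
        if (vectorAction[0] > 0f)
        {
            Jump();   // stands in for the project's jump routine
        }
    }

    void Jump()
    {
        // The upward force / animation would be applied here.
    }
}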
The initial change to the game must be made in the main player class (named Jumper): instead of inheriting from Actor, the class will now inherit from Agent.
Then we can move the code from the Awake() function to the Initialize() function.
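A rough sketch of that change, assuming the one-time setup previously lived in Awake(); the Rigidbody field is a placeholder for whatever the project actually initialises.

using Unity.MLAgents;
using UnityEngine;

// Hypothetical refactor: Jumper now derives from Agent instead of the old Actor
// base class, and its setup moves from Awake() into Initialize().
public class Jumper : Agent
{
    Rigidbody body;   // placeholder for whatever Awake() used to set up

    public override void Initialize()
    {
        // Code previously in Awake() goes here; ML-Agents calls Initialize()
        // once when the agent is first enabled.
        body = GetComponent<Rigidbody>();
    }
}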
As shown above, the OnActionReceived() function checks the action vector, and if the first slot contains an action it calls the jump function.
Reward System
The next step is to add rewards to the agent. The initial idea is that when an oncoming object collides with the agent/player, a reward of -1 is applied; when the agent jumps over the object it passes through a hidden object, and this hidden collision adds a reward of 0.1.
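A minimal sketch of the positive half of this scheme, assuming the hidden object is a trigger collider carrying a hypothetical "ScoreTrigger" tag; the -1 penalty for being hit is applied in the collision handling shown in the next section.

using Unity.MLAgents;
using UnityEngine;

// Hypothetical reward hookup: tag names are assumptions, not project values.
public class JumperScoring : Agent
{
    void OnTriggerEnter(Collider other)
    {
        // The invisible trigger sits behind each car; passing through it means
        // the car was cleared, so grant a small positive reward.
        if (other.CompareTag("ScoreTrigger"))
        {
            AddReward(0.1f);
        }
    }
}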
Collision System/Rewards
The script must check the collisions: if the player is in contact with the street, it can perform a jump; otherwise it is in the air and cannot. If there is a collision with an oncoming object, the reward is decremented and the game resets because the agent has lost, so the EndEpisode() function must be called to end the episode.
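A rough sketch of that collision logic, assuming hypothetical "Street" and "Car" tags and an isGrounded flag; the real project code may organise this differently.

using Unity.MLAgents;
using UnityEngine;

// Hypothetical collision handling for the Jumper agent.
public class JumperCollisions : Agent
{
    bool isGrounded;

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Street"))
        {
            // Touching the street again means a jump is allowed.
            isGrounded = true;
        }
        else if (collision.gameObject.CompareTag("Car"))
        {
            // Hit by a car: punish the agent and end the episode so the game resets.
            AddReward(-1f);
            EndEpisode();
        }
    }

    void OnCollisionExit(Collision collision)
    {
        if (collision.gameObject.CompareTag("Street"))
        {
            isGrounded = false;   // in the air, cannot jump
        }
    }
}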