
Sci4Teens Competition: Engineering 16-18 Bronze Award

Updated: Oct 12, 2020

Deep Reinforcement Learning: How AI Can Become More Powerful - Hoor Ulain Umar and Deeksha Yelamanchi


Abstract:

Deep reinforcement learning (deep RL) is an up-and-coming form of artificial intelligence (AI) that combines two widely known subsets of machine learning: deep learning and reinforcement learning (RL). Used in combination, the two mechanisms solve problems that neither can solve on its own. Deep learning uses artificial neural networks to teach a program to learn on its own, whereas reinforcement learning relies on an agent that acts on its environment and tries outputs until an optimal solution is reached (a trial-and-error method). Coupled together, deep learning and reinforcement learning are much stronger than they are individually, as many aspects of deep learning address issues that reinforcement learning encounters, and vice versa. As deep RL becomes increasingly popular, researchers and scientists continue to investigate its applications in neuroscience and the real world.


Deep Learning:

The first component here is deep learning. It is a subset of machine learning, but the key difference between the two is that deep learning does not require task-specific algorithms and instructions (Simplilearn 2019). For example, if a programmer implemented a traditional algorithm that asked a machine to recognize every time the word “orange” appeared in pieces of handwritten work, the machine would struggle: it could only recognize the word “orange” in one specific style and could not adapt to differences in handwriting. Deep learning, however, can learn to adapt and recognize the same thing in different circumstances (Ng 2014). To function, deep learning uses forward propagation and backpropagation through a series of neural networks, giving it the ability to learn on its own, something not built into standard machine learning (Brownlee 2019). Neural networks are essentially sets of neurons, or processors, that can be activated. They are composed into layers, connected by channels that data passes through to produce an output.


This can be compared to a process of elimination. Think of a large bucket filled with dirty pond water. The goal is to remove the sticks, rocks, dirt, and any other materials, leaving only clean water. One would first pour the dirty mixture through a strainer with large holes, eliminating any sticks in the water. The mixture would then pass through a finer strainer to get rid of any rocks, and the process would continue until only clean water is left. Similarly, a neural network takes data into its input layer, passes it through a series of hidden layers (which can be compared to the strainers) until only the data needed is left over, and transfers the result to the output layer, which gives the final answer (Raicea 2017).


For neural networks, the initial input must be information that can be converted into numbers (Sanderson 2017). For example, if someone wanted to detect something within an image, the program would treat individual pixels as features of the data, rather than looking at the picture as a whole. Every neuron within the neural network is essentially a holding place for a number and is connected to the neurons in the following layer through channels. Data flows through the channels, and each layer looks for a different factor to eliminate. Once the data has gone through every hidden layer, it is sent to the output layer, leaving only a single answer (see fig. 1). This entire process is known as forward propagation, as the information is sent forward through the neural network.
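The flow just described can be sketched in a few lines of NumPy. This is a minimal illustration, not code from any cited source; the layer sizes, weights, and activation functions (ReLU and softmax) are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)          # activation: keep only positive signals

def softmax(x):
    e = np.exp(x - x.max())            # stable exponentials
    return e / e.sum()                 # turn scores into probabilities

# Input layer (4 features) -> hidden layer (5 neurons) -> output layer (3 classes)
W1 = rng.normal(size=(5, 4)); b1 = np.zeros(5)
W2 = rng.normal(size=(3, 5)); b2 = np.zeros(3)

x = np.array([0.2, -0.1, 0.7, 0.3])    # numeric input (e.g. pixel values)
h = relu(W1 @ x + b1)                  # hidden layer "strains" the data
y = softmax(W2 @ h + b2)               # output layer: one probability per class

print(y)                               # the highest value is the network's answer
```

Each `@` is one layer's worth of channels: every neuron's number is a weighted combination of the numbers in the layer before it.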




Fig. 1. Shown here is a deep neural network example with three hidden layers and channels connecting each layer from: Nielsen, Michael A. Neural Networks and Deep Learning. Determination Press, 2015.


However, as deep learning revolves around learning mechanisms, the output is only the answer with the highest probability of being correct, not necessarily one that is one hundred percent right (Rohrer 2019). If this answer is incorrect, the network attempts to fix the issue by comparing the actual output to the correct answer. The difference between the actual output and the correct answer is known as the error; its magnitude determines how far the machine was from being correct. This information is then sent backwards through the channels in a process known as backpropagation (Schmidhuber 2015). Going through forward propagation and backpropagation repeatedly is what allows a neural network to learn and develop, a training process that can take months to perfect (Sanderson 2017).
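The forward-then-backward cycle can be sketched for the simplest possible case, a single linear neuron trained with squared error. This is an illustrative toy, not the multi-layer procedure used in real networks; the input, target, and learning rate are made-up values.

```python
import numpy as np

x = np.array([1.0, 2.0])      # input
t = 1.0                       # the correct answer (target)
w = np.array([0.5, -0.5])     # the neuron's weights, initially wrong
lr = 0.1                      # learning rate

for step in range(50):
    y = w @ x                 # forward propagation: actual output
    error = y - t             # error: how far from the correct answer
    grad = error * x          # backpropagation: gradient of 0.5 * error^2 w.r.t. w
    w -= lr * grad            # adjust weights to shrink the error

print(abs(w @ x - t))         # after training, the output is very close to the target
```

Real backpropagation applies this same "nudge every weight against its error gradient" idea through every hidden layer, which is why training can take so long.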


Reinforcement Learning:

Reinforcement learning, like deep learning, is a subset of machine learning that focuses on autonomously training a system through a trial and error process that maximizes a machine's creativity (Bajaj 2020). An RL system primarily consists of an agent, an environment, a state, a reward, and a policy (see fig. 2) (Sharma 2020). The agent is an algorithm that takes actions on the environment around it (Osiński et al. 2018; Sharma 2020). The environment is where the agent is placed, and the state is the situation the agent is placed in (Osiński et al. 2018; Sharma 2020). The reward is the feedback given to the agent after it derives an output through the implementation of a policy (a strategy used to determine actions) (Sharma 2020).




Fig. 2. Shown here are the primary components of a reinforcement learning system from: Mayank, Mohit. “Reinforcement Learning with Q Tables.” Medium, ITNEXT, 18 Aug. 2018, itnext.io/reinforcement-learning-with-q-tables-5f11168862c8.


In RL models, the agent’s ultimate goal in the trial and error process is to maximize rewards, in order to achieve the best outcomes (Bajaj 2020; Osiński et al. 2018; Sharma 2020).


To put this into context, think of learning to ride a bike. A beginner starts off with trial and error: they might lose balance and fall because of sharp, unexpected movements. Through this experience, the rider learns to avoid that technique and instead makes smoother movements. The same method of learning is used in RL models. The agent uses trial and error to reinforce the actions that maximize rewards and continually adjusts its behavior until the best result is delivered (Sharma 2020).


In order to do this, a training process is set in place (Bajaj 2020). The designer helps the machine by sending an input signal that initiates the model. The machine then independently produces various output signals depending on the actions it decides to take (Bajaj 2020). If the designer supplied the output signal instead, the purpose of the trial and error process would be defeated, since the machine would be prevented from making mistakes, learning, and maximizing rewards (Bajaj 2020). Training progresses as the machine tries to maximize rewards, while the designer rewards or punishes it for the output it achieves (Bajaj 2020).
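This loop can be sketched with tabular Q-learning, one of the simplest RL algorithms, on a tiny corridor environment. Every component named above appears: an agent, an environment, states, rewards, and a policy. The corridor layout, reward values, and hyperparameters are illustrative assumptions, not taken from the cited sources.

```python
import random

N_STATES = 5          # states 0..4; entering state 4 yields the reward
ACTIONS = [+1, -1]    # move right or left
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1
random.seed(0)

def step(state, action):
    """The environment's response: next state and reward."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

for episode in range(200):              # trial and error over many episodes
    s = 0                               # the agent's starting state
    while s != N_STATES - 1:
        # policy: mostly exploit the best known action, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        # reinforce the action in proportion to reward plus future value
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt

# After training, the learned policy prefers moving right from every state.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(N_STATES - 1)))
```

No one ever tells the agent the answer; it is only rewarded when it reaches the goal, yet the Q-table ends up encoding the correct behavior.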


These unique properties of RL open pathways to solving problems within AI that cannot be solved using standard machine learning techniques (Joy 2020; Osiński et al. 2018). RL models are very similar to a human’s style of learning: less likely to repeat mistakes, capable of identifying and correcting mistakes, and able to arrive at strong policies (Joy 2020). In some rare cases, RL models are capable of outshining a human mind (Joy 2020). When collecting information through the environment itself is the only option, reinforcement learning models have proven highly useful (Joy 2020). Exploration and exploitation are also explicitly balanced in RL models, an aspect not found within other machine learning models (Joy 2020).


On the other hand, there are many gaps in the reinforcement learning model that make it difficult to run on its own (Joy 2020). Successful reinforcement learning models require an intense amount of data, computation, and money to operate, which can overburden a system (Joy 2020). RL agents are oftentimes poorly suited to solving straightforward, simple problems (Joy 2020). Moreover, RL models struggle with high-dimensional inputs and with processing realistic states from their environments (Joy 2020). To fill these gaps, deep learning must run alongside reinforcement learning, which leads to the creation of a newer model: deep RL (Joy 2020).


Deep Reinforcement Learning:

As shown, the mechanisms and processes behind deep learning and reinforcement learning are what make each of them unique and powerful in different ways. The idea of using them in combination has recently gained traction within the field of AI, growing more popular as the resulting programs become stronger. This is due to a multitude of reasons, including the way deep RL can solve problems that deep learning and reinforcement learning encounter on their own, its neuroscientific implications, and the doors it opens in terms of AI and real world applications (Botvinick et al. 2020).


One of the most interesting aspects of deep RL is that it is not just the sum of reinforcement learning and deep learning; it produces unique phenomena that neither has on its own. One of the most prominent issues with RL is that it relies on a single input-output system, rather than a decision-making process that goes through many steps. This limitation reduces reinforcement learning’s complexity and therefore the quality of its applications. Deep learning combats this issue through the use of artificial neural networks, which clearly illustrates the importance of combining deep learning and RL.


Deep RL is a mechanism that has been in development for quite a while. Some of the most common deep RL tests are run on games such as chess, checkers, Go, and backgammon, as they provide an ideal testing ground for artificial intelligence mechanisms. One of the earliest studies involving deep RL applied a temporal difference (TD) method to the game of backgammon (Tesauro 1995). This method used comparisons between the predicted output and the eventual outcome to estimate the error (similar to the error signal used in deep learning). Although TD-Gammon was not fully successful, it still provided a gateway for future tests combining reinforcement learning and deep learning. The first significantly successful deep RL model was created in 2013 and used this approach to play Atari video games (Mnih et al. 2013). Since then, deep RL has become an increasingly popular mechanism in AI, not only with regard to games but also with regard to its possible neuroscientific and real-world applications.
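The TD idea can be sketched on a toy chain of states standing in for game positions: each state's value estimate is nudged toward the reward plus the estimate of the state that follows it, so earlier predictions are corrected by later ones. Everything here (the 4-state chain, the random moves, the learning rate) is an illustrative assumption, far simpler than TD-Gammon itself.

```python
import random

random.seed(1)
V = [0.0, 0.0, 0.0, 0.0]   # value estimates for states 0..3 (state 3 = a "win")
alpha, gamma = 0.1, 1.0    # learning rate and discount factor

for episode in range(2000):
    s = 0
    while s != 3:
        nxt = s + random.choice([0, 1])       # random "moves" toward the win
        r = 1.0 if nxt == 3 else 0.0          # reward only on winning
        # TD error: (reward + predicted future value) - current prediction
        V[s] += alpha * (r + gamma * V[nxt] - V[s])
        s = nxt

print([round(v, 2) for v in V])
```

Because every game in this toy eventually ends in a win, all the value estimates drift toward 1.0; in a real game the same update lets estimates of early positions inherit what the network learns about late positions.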


A prominent application of deep RL is in modeling neural representation (Botvinick et al. 4; Yamins & DiCarlo 2016). Of all the machine learning subsets, it comes closest to representing an actual human brain, including its internal networks and its method of processing information (Botvinick et al. 2). This also implies that deep RL conveys the best mechanisms of learning and decision making among AI subsets. Similar to how the human brain rewards learning through dopamine production, deep RL uses rewards to constantly promote learning (Botvinick et al. 5).


As deep RL continues to grow, several successes attest to its potential. For example, AlphaZero, a deep RL system, taught itself through trial and error, adjusting the parameters of its neural network until it mastered chess (Silver et al. 2018). Deep RL is currently being used to improve drilling and safety conditions within the petroleum industry (Marr 2019). Its applications are also being explored in several other fields, including customer service, transportation, sustainability, and neuroscience (Botvinick et al. 1; Marr 2019; Osiński et al. 2018; TORRES.AI 2020).


Conclusion:

Deep RL is an intriguing mechanism that continues to grow in popularity as the technological world advances. It uses a unique combination of deep learning and reinforcement learning to create a new machine learning subset that is stronger than either alone. This is because deep RL takes reinforcement learning problems and solves them through deep learning processes, thereby creating a machine that can learn and develop on its own with much more efficacy and independence. Further research on deep RL will help develop AI technology and subsequently shed light on potential real-world applications in fields like sustainability, customer service, and neuroscience.


References:



Bajaj, Prateek. “Reinforcement Learning.” GeeksforGeeks, 17 May 2020, www.geeksforgeeks.org/what-is-reinforcement-learning/.

Botvinick, Matthew, et al. “Deep Reinforcement Learning and Its Neuroscientific Implications.” DeepMind, 2020, pp. 1–22.


Brownlee, Jason. “What Is Deep Learning?” Machine Learning Mastery, 14 Aug. 2020, machinelearningmastery.com/what-is-deep-learning/.

Goodfellow, Ian et al. Deep Learning. MIT Press, 2016.


Joy, Ashwin. “Pros And Cons Of Reinforcement Learning.” Pythonista Planet, 11 June 2020, www.pythonistaplanet.com/pros-and-cons-of-reinforcement-learning/.


Marr, Bernard. “The Incredible Ways Shell Uses Artificial Intelligence To Help Transform The Oil And Gas Giant.” Forbes, Forbes Magazine, 18 Jan. 2019, www.forbes.com/sites/bernardmarr/2019/01/18/the-incredible-ways-shell-uses-artificial-intelligence-to-help-transform-the-oil-and-gas-giant/.


Mayank, Mohit. “Reinforcement Learning with Q Tables.” Medium, ITNEXT, 18 Aug. 2018, itnext.io/reinforcement-learning-with-q-tables-5f11168862c8.


Mnih, Volodymyr, et al. “Playing Atari with Deep Reinforcement Learning.” arXiv, 2013, arxiv.org/abs/1312.5602.


Ng, Andrew. “RSS2014: 07/16 09:00-10:00 Invited Talk: Andrew Ng (Stanford University): Deep Learning.” Youtube. Uploaded by RSS Conference, 06 August 2014, www.youtube.com/watch?v=W15K9PegQt0


Nielsen, Michael A. Neural Networks and Deep Learning. Determination Press, 2015.


Osiński, Błażej, and Konrad Budek. “What Is Reinforcement Learning? The Complete Guide.” Deepsense.ai, 5 July 2018, deepsense.ai/what-is-reinforcement-learning-the-complete-guide/.


Raicea, Radu. “Want to Know How Deep Learning Works? Here's a Quick Guide for Everyone.” FreeCodeCamp.org, 31 Mar. 2020, www.freecodecamp.org/news/want-to-know-how-deep-learning-works-heres-a-quick-guide-for-everyone-1aedeca88076/.


Rohrer, Brandon. “How Deep Neural Networks Work - Full Course for Beginners.” YouTube, uploaded by freeCodeCamp.org, 16 April 2019, www.youtube.com/watch?v=dPWYUELwIdM.


Sanderson, Grant. “But what is a Neural Network? | Deep learning, chapter 1.” YouTube, uploaded by 3Blue1Brown, 05 October 2017, www.youtube.com/watch?v=aircAruvnKk.


Sanderson, Grant. “What is backpropagation really doing? | Deep learning, chapter 3.” YouTube, uploaded by 3Blue1Brown, 03 November 2017, www.youtube.com/watch?v=Ilg3gGewQ5U.


Schmidhuber, Jürgen. “Deep Learning in Neural Networks: An Overview.” Neural Networks, vol. 61, 2015, pp. 85–117., doi:10.1016/j.neunet.2014.09.003.


Sharma, Siddharth. “The Ultimate Beginner's Guide to Reinforcement Learning.” Medium, Towards Data Science, 13 June 2020, towardsdatascience.com/the-ultimate-beginners-guide-to-reinforcement-learning-588c071af1ec.


Silver, David, et al. “AlphaZero: Shedding New Light on the Grand Games of Chess, Shogi and Go.” DeepMind, 6 Dec. 2018.


Simplilearn. “Deep Learning In 5 Minutes | What Is Deep Learning? | Deep Learning Explained Simply | Simplilearn.” YouTube, uploaded by Simplilearn, 13 June 2019, www.youtube.com/watch?v=6M5VXKLf4D4.


Tesauro, Gerald. “Temporal Difference Learning and TD-Gammon.” Communications of the ACM, vol. 38, no. 3, 1995, pp. 58–68.


TORRES.AI, Jordi. “DRL 01: A Gentle Introduction to Deep Reinforcement Learning.” Medium, Towards Data Science, 9 July 2020, towardsdatascience.com/drl-01-a-gentle-introduction-to-deep-reinforcement-learning-405b79866bf4.


Yamins, D., DiCarlo, J. Using goal-driven deep learning models to understand sensory cortex. Nat Neurosci 19, 356–365 (2016). https://doi.org/10.1038/nn.4244

