Talk title: An Introduction to Reinforcement Learning
This talk introduces reinforcement learning and the multi-armed bandit problem, a subclass of reinforcement learning problems. Reinforcement learning is a learning technique in which an agent interacts with an environment by selecting and executing actions, and progressively discovers the environment's dynamics. The multi-armed bandit problem takes its name from slot machines: an agent repeatedly chooses which arm to pull, aiming to maximize its cumulative reward over the long term. The agent learns optimal behavior through its interactions with the arms. The multi-armed bandit problem has several interesting applications, such as recommendation systems. In this talk, we review some well-known algorithms for tackling the multi-armed bandit problem.
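To make the bandit setting concrete, the following is a minimal sketch of epsilon-greedy, one commonly taught bandit strategy. The abstract does not name the algorithms covered in the talk, so this choice is an assumption made for illustration. The Bernoulli reward model, the arm probabilities, and the names `epsilon_greedy`, `true_means`, `steps`, and `epsilon` are likewise hypothetical.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative assumption).

    Returns the estimated mean reward per arm, the pull count per arm,
    and the cumulative reward collected over all steps.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns reward 1 with the arm's hidden probability.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


# Hypothetical three-armed bandit; the agent never sees these probabilities.
estimates, counts, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
```

The agent sees only the rewards it receives, so it has to balance trying arms it knows little about (exploration) against pulling the arm that currently looks best (exploitation). Here `epsilon` sets how often it explores.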