Multi-agent Reinforcement Learning for pedestrian simulation


This website presents a Multi-agent Reinforcement Learning framework for simulating groups of pedestrians, in which each individual has to learn an optimized microscopic control to navigate inside a 3D environment. The aim of the framework is to produce plausible simulations of pedestrian groups to be used as secondary actors in games, 3D interactive training tools (serious games) and other interactive applications based on 3D virtual environments that need a human-like, real-time pedestrian simulation.

The main properties of this framework are the following:

The framework

In Reinforcement Learning (RL), the agent learns by interacting with the environment. Each agent modifies the state of the perceived environment through its actions. A critic, which is part of the environment, gives the agent a feedback signal (an immediate reward) that evaluates the state-action pair. This reward is used to update a Value Function that estimates the value of being in a state and executing an action. Once the learning process ends, the learned Value Function is used to select the best action in each state.
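The update cycle described above can be illustrated with a minimal sketch. It assumes a tabular Q-learning rule, which is only one possible choice and is not necessarily the algorithm used in MARL-Ped; the function names, the state and action labels, and the values of alpha, gamma and epsilon are all hypothetical.

```python
import random
from collections import defaultdict


def make_q_table():
    # Value Function stored as a table: (state, action) -> estimated value.
    # Unseen pairs default to 0.0.
    return defaultdict(float)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    # One Q-learning step (an assumed rule, chosen for illustration):
    # move the estimate towards reward + discounted best next value.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state, actions):
    # After learning, the best action is the one with the highest value.
    return max(actions, key=lambda a: q[(state, a)])


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    # During learning, explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return rng.choice(actions)
    return greedy_action(q, state, actions)
```

With an empty table, a single call such as q_update(q, "s0", "fast", 1.0, "s1", ["slow", "fast"]) raises the value of ("s0", "fast") above that of ("s0", "slow"), so greedy_action then prefers "fast" in state "s0".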

Our Multi-agent Reinforcement Learning framework for pedestrian navigation (MARL-Ped) is a multi-agent system where each embodied agent represents a pedestrian. Each agent learns its own behavior from its experience of interacting with the environment. The agents learn to control their velocity vector depending on the state they perceive. The learned behavior is therefore a velocity control system that modifies the velocity of the agent based on the state it currently perceives, as real pedestrians do.
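As an illustration of what controlling the velocity vector can mean, the following sketch assumes that an action is a pair of increments, one for the speed and one for the heading angle. This action encoding, the function name and the max_speed value are assumptions made for the example, not details taken from the framework.

```python
import math


def apply_velocity_action(velocity, speed_delta, angle_delta, max_speed=1.8):
    # velocity: (vx, vy) in m/s; angle_delta in radians.
    # max_speed is a hypothetical cap, not a calibrated value.
    vx, vy = velocity
    speed = math.hypot(vx, vy)
    heading = math.atan2(vy, vx)
    # Change the speed, keeping it within [0, max_speed].
    new_speed = min(max(speed + speed_delta, 0.0), max_speed)
    # Rotate the heading by the requested amount.
    new_heading = heading + angle_delta
    return (new_speed * math.cos(new_heading),
            new_speed * math.sin(new_heading))
```

For example, an agent walking along the x axis at 1 m/s that accelerates by 0.5 m/s and turns by a quarter turn ends up walking along the y axis at 1.5 m/s.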

The framework has two types of agents. The embodied agents represent the pedestrians and contain the learning algorithms and the decision-making modules. They also incorporate the modules needed to process the raw sensed information from the environment and extract the state the agent is in.

The environment agent has two main components: the physics module and the critic module.
The physics module is in charge of calculating the physical interactions among the agents and between the agents and the virtual environment (which consists basically of walls and the floor). The physics module is implemented using a calibrated version of the Open Dynamics Engine (ODE) physics engine. The calibration process is described in the MIG 2012 paper referenced below.
The critic module evaluates the actions performed by the agents at each time step and issues an immediate reward that rates each action.
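A critic of this kind can be sketched as a function that maps the outcome of a time step to a scalar reward. The reward values, the goal_radius parameter and the collision flag below are hypothetical choices made for illustration; the rewards actually used in MARL-Ped are defined in the published papers.

```python
import math


def immediate_reward(position, goal, collided, goal_radius=0.5):
    # position, goal: (x, y) coordinates in metres.
    # collided: True if the physics step reported a collision for this agent.
    # All numeric values here are illustrative assumptions.
    if collided:
        return -1.0  # penalise collisions with walls or other agents
    distance = math.hypot(goal[0] - position[0], goal[1] - position[1])
    if distance <= goal_radius:
        return 1.0   # reward reaching the goal area
    return 0.0       # neutral reward for all other time steps
```

In a loop of this shape, the environment agent would first advance the physics, then call the critic once per agent and send each agent its own reward.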

The multi-agent system is parallelized: each agent runs as a separate MPI process. The communication module is therefore built on the MPI programming interface, which carries the communication between the environment and each agent.

The following image represents a schema of our framework.

This framework has a flexible modular architecture that makes it possible to investigate different techniques at several levels. For instance, several types of state generalization techniques have been implemented and tested, as well as different learning algorithms and knowledge transfer techniques. For a complete explanation of the model, see our published papers at Publications.


Dr. Francisco Martinez-Gil is a lecturer at the School of Engineering of the University of Valencia (ETSE-UV) and a member of the CoMMLab research group. He received an M.Sc. in Physics (Universitat de Valencia) in 1992 and his Ph.D. degree in Computer Science from the University of Valencia in 2014.

Dr. Fernando Fernández Rebollo has been a faculty member of the Computer Science Department of Universidad Carlos III de Madrid (UC3M) since October 2005. He received his Ph.D. degree in Computer Science from UC3M in 2003 and his B.Sc., also in Computer Science, from UC3M in 1999. He has been a senior lecturer at UC3M since 2009.

Dr. Miguel Lozano Ibáñez is an active member of the CoMMLab research group. He received his Ph.D. degree in Computer Science from the University of Valencia in 2004. His research interests include large-scale multiagent systems, social engineering and distributed/parallel architectures. Dr. Lozano has served as a Program Committee member for several conferences and workshops (e.g. ICMCS) and as a reviewer for scholarly journals such as the Journal of Applied Soft Computing (Elsevier) and Computational and Mathematical Organization Theory.


A Reinforcement Learning Approach for Multiagent Navigation Francisco Martinez-Gil, Fernando Barber, Miguel Lozano, Francisco Grimaldo, Fernando Fernández. ICAART 2010 - Proceedings of the International Conference on Agents and Artificial Intelligence, Volume 1 - Artificial Intelligence, Valencia, Spain, January 22-24, 2010. INSTICC Press 2010, ISBN 978-989-674-021-4

Multi-Agent Reinforcement Learning for Simulating Pedestrian Navigation Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández. Adaptive and Learning Agents Workshop at AAMAS (ALA'2011). LNAI 7113 (P. Vrancx, M. Knudson, M. Grzes eds.) Pages 54-69. Springer. 2012.

Calibrating a motion model based on reinforcement learning for pedestrian simulation Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández. ACM SIGGRAPH Conference on Motion in Games (MIG 2012) Rennes (France). LNCS 7660 (M. Kallmann, K. Bekris eds.) Pages 302-313. Springer. 2012.

Emergent collective behaviors in a multi-agent reinforcement learning based pedestrian simulation Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández. Extended abstracts booklet of the First Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM2013). Princeton University. New Jersey. 25-27 October 2013.
Accepted as a full paper in the AAMAS 2014 Workshop: Multi-Agent-Based Simulation (MABS 2014) Paris, France. 2014

MARL-Ped: a Multi-Agent Reinforcement Learning Based Framework to Simulate Pedestrian Groups Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández. Simulation Modelling Practice and Theory 47: 259-275 (2014). Elsevier.

Strategies for simulating pedestrian navigation with multiple reinforcement learning agents Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández. Autonomous Agents and Multi-Agent Systems. 29 (1): 98-130 (2015). Springer. DOI: 10.1007/s10458-014-9252-6 

Emergent behaviors and scalability for multi-agent reinforcement learning-based pedestrian models  Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández. Simulation Modelling Practice and Theory 74: 117-133 (2017). Elsevier.

Modeling, Evaluation and Scale on Artificial Pedestrians: A literature review  Francisco Martinez-Gil, Miguel Lozano, Ignacio García, Fernando Fernández. ACM Computing Surveys (CSUR) 50 (5): Article 72 (2017). ACM.
