2nd RL-CONFORM Workshop at IROS'22

Reinforcement Learning meets HRI, Control, and Formal Methods

Thank you to all speakers, panelists, short paper presenters, and participants for making the 2nd RL-CONFORM a success! We will update our website soon with recordings where applicable.

 

Reinforcement learning (RL) has shown remarkable achievements in applications ranging from autonomous driving and object manipulation to beating the best players in complex board games. Different communities, including RL, human-robot interaction (HRI), control, and formal methods (FM), have proposed multiple techniques to increase the safety, transparency, and robustness of RL. However, elementary problems of RL remain open: exploratory and learned policies may cause unsafe situations, lack task-robustness, or be unstable. By satisfactorily addressing these problems, RL research will have long-lasting impact and see breakthroughs on real physical systems and in human-centered environments. As an example, a collaborative mobile manipulator needs to be robust and verifiably safe around humans. This requires an integrated approach that combines RL to learn optimal policies for complex manipulation tasks, control techniques to ensure stability of the system, FM techniques to provide formal safety guarantees, and HRI techniques to learn from and interact with humans.

The aim of this multidisciplinary workshop is to bring these communities together to:

  1. Identify key challenges and opportunities related to safe and robust exploration, formal safety and stability guarantees of control systems, and safety in physical human-robot collaborative systems;
  2. Provide unique insights into how these challenges depend on the application, desired system properties, and complexity of the environment;
  3. Propose new and debate existing approaches to ensure desired properties of learned policies in a wide range of domains;
  4. Discuss existing and new benchmarks to accelerate safe and robust RL research;
  5. Disseminate the outcomes of the workshop and publish the results as a perspectives article in one of the major robotics journals.

The themes of the workshop include but are not limited to RL and control theory, RL and Human-Robot Interaction, RL and Formal Methods, and benchmarking of RL.

Tentative Program (all times are JST) - October 23, 2022.

 

Time (JST)
09:00 am - 09:05 am Organizers
Welcome!
09:05 am - 09:25 am Invited Speaker: Scott Niekum
20-minute talk
09:25 am - 09:45 am Invited Speaker: Georgia Chalvatzaki
20-minute talk
09:45 am - 10:30 am Panel Discussion I
45-minute panel session
10:30 am - 10:50 am Coffee Break
10:50 am - 11:10 am Invited Speaker: Jeanette Bohg
20-minute talk
11:10 am - 11:30 am Invited Speaker: Bradley Hayes
20-minute talk
11:30 am - 12:00 pm Short Papers I
4-minute talks
12:00 pm - 01:30 pm Lunch Break
01:30 pm - 01:50 pm Invited Speaker: Takayuki Osa
20-minute talk
01:50 pm - 02:10 pm Invited Speaker: Fabio Ramos
20-minute talk
02:10 pm - 02:20 pm Coffee Break
02:20 pm - 03:05 pm Panel Discussion II
45-minute panel session
03:05 pm - 03:25 pm Coffee Break
03:25 pm - 03:45 pm Invited Speaker: Hadas Kress-Gazit
20-minute talk
03:45 pm - 04:05 pm Invited Speaker: Nils Jansen
20-minute talk
04:15 pm - 04:45 pm Short Papers II
4-minute talks
04:45 pm - 05:00 pm Organizers
Closing Remarks

 

Short paper presentations I (morning session)

PDFs of the papers will be added to the website shortly

Time (JST)
11:30-11:36 am Tung Nguyen and Johane Takeuchi.
Utilization of domain knowledge to improve POMDP belief estimation. (paper)
11:36-11:42 am   Francisco Cruz, Adam Bignold, Hung Son Nguyen, Richard Dazeley and Peter Vamplew.
Broad-persistent advice for interactive reinforcement learning scenarios. (paper)
11:42-11:48 am   Stefano Massaroli, Michael Poli, Ren Komatsu, Alessandro Moro, Atsushi Yamashita and Hajime Asama.
Model-Based Policies in Continuous Time, States and Actions: Surrogate Models and Gradient Estimation.
11:48-11:54 am   Finn Rietz, Erik Schaffernicht, Todor Stoyanov and Johannes Andreas Stork.
Towards Task-Prioritized Policy Composition. (paper)
11:54-12:00 pm   Eugene Lim and Harold Soh.
Observed Adversaries in Deep Reinforcement Learning. (paper)

 

Short paper presentations II (afternoon session)

PDFs of the papers will be added to the website shortly

Time (JST)
04:15-04:21 pm   Mudit Verma, Ayush Kharkwal and Subbarao Kambhampati.
Advice Conformance Verification by Reinforcement Learning agents for Human-in-the-Loop. (paper) (video)
04:21-04:27 pm   Mudit Verma and Katherine Metcalf.
Symbol Guided Hindsight Priors for Reward Learning from Human Preferences. (paper) (video)
04:27-04:33 pm   Quantao Yang, Johannes A. Stork and Todor Stoyanov.
Transferring Knowledge for Reinforcement Learning in Contact-Rich Manipulation. (paper)
04:33-04:39 pm   Xavier Weiss, Saeed Mohammadi, Parag Khanna, Mohammad Reza Hesamzadeh and Lars Nordström.
Learn to Run Power Network (L2RPN) Safely with Deep Reinforcement Learning. (video)
04:39-04:45 pm   Alexander Dürr, Volker Krueger and Elin Anna Topp.
Towards a comprehensive study to identify the relevant I/O to support or complement Reinforcement Learning.

 

 

Invited Speakers

Fabio Ramos, NVIDIA and University of Sydney, AU.

Title: Leveraging Differentiable Simulation for Reinforcement Learning and Bayesian Domain Randomization

Abstract: Differentiable simulation can play a key role in scaling reinforcement learning to higher-dimensional state and action spaces while, at the same time, leveraging recent probabilistic inference methods for Bayesian domain randomization. In this talk, I will discuss advantages and disadvantages of differentiable simulation and connect it with two methods that use differentiability to speed up Bayesian inference: stochastic gradient Langevin dynamics and Stein Variational Gradient Descent. Our resulting Bayesian domain randomization approach can quickly produce posterior distributions over simulation parameters given real state-action trajectories, leading to robust controllers and policies. I will show examples in legged locomotion, robotic manipulation, and robotic cutting.

Bio: Fabio is a Principal Research Scientist at NVIDIA, and Professor in machine learning and robotics at the School of Computer Science, University of Sydney. Previously, Fabio was a co-Director of the Centre for Translational Data Science and, before that, an Australian Research Council (ARC) Research Fellow at the Australian Centre for Field Robotics. Fabio's research is focused on modelling and understanding uncertainty for prediction and decision-making tasks, and includes Bayesian statistics, data fusion, anomaly detection, and reinforcement learning. Over the last ten years Fabio has applied these techniques to robotics, mining and exploration, environment monitoring, and neuroscience.


Georgia Chalvatzaki, TU Darmstadt, DE.

Title: Learning adaptive and safe human-centric robot behaviors

Abstract: The need for intelligent and safe robotic assistants is more urgent than ever in hospitals, nursing homes, care facilities, etc. In this talk, I will discuss problems I have addressed over the years regarding human-centered robotic assistants that learn to adapt to humans, with applications ranging from elderly support to child-robot interaction. I will cover methods for human behavior understanding and intention prediction, and explain how structured models and modern deep learning methods allow effective, adaptive physical human-robot interaction behaviors. I will finish my talk with our recent work on modeling human and object manifolds to use them as differentiable constraints in safe reinforcement learning methods of robot behaviors, particularly for learning safe robot interactions in the presence of humans.

Bio: Dr. Georgia Chalvatzaki is an Assistant Professor and the research leader of the intelligent robotic systems for assistance (iROSA) group at TU Darmstadt and a member of Hessian.AI. She received the Emmy Noether grant from the German Research Foundation (DFG) in 2021. In iROSA, her team researches the topic of "Robot Learning of Mobile Manipulation for Intelligent Assistance," investigating novel methods for combined planning and learning to enable mobile manipulator robots to solve complex tasks in house-like environments with the human-in-the-loop of the interaction process. She is co-chair of the IEEE RAS technical committee of Mobile Manipulation and co-chair of the IEEE RAS Women in Engineering Committee.


Takayuki Osa, University of Tokyo, JP.

Title: What should we learn in a robot-learning system?

Abstract: To deploy a robot-learning system in the real world, it is essential to ensure the safety of the system. In this talk, we discuss what to learn in a robot-learning system and how to learn it to make the system reliable. As an approach towards safe robot-learning systems, we introduce methods for learning diverse solutions in motion planning and reinforcement learning. Through case studies, we demonstrate that the diversity of solutions offers flexibility and reliability to a robot-learning system.

Bio: Takayuki Osa is an associate professor at the University of Tokyo and a visiting researcher at RIKEN Center for Advanced Intelligence Project (AIP). Before joining UTokyo in June 2022, he was an associate professor at Kyushu Institute of Technology. From April 2017 to February 2019, he was a project assistant professor at the University of Tokyo. Takayuki Osa received his Ph.D. in Engineering from the University of Tokyo in 2015. From 2015 to 2017, he worked with Jan Peters and Gerhard Neumann at TU Darmstadt in Germany as a post-doctoral researcher. Industrial partners include Honda Motor Co., Ltd. and Komatsu Ltd.


Hadas Kress-Gazit, Cornell University, US.

Title: Skills and composition: a wish list for RL

Abstract: Given a set of robot skills, one can compose them to achieve complex robot behaviors, be it through classical planning, or synthesis from temporal logic requirements. But where do these skills come from? How should we learn them? How should we modify them if needed? In this talk, I will give my own personal wish list for properties of RL and will be excited to learn whether my ideal algorithm is already a reality.

Bio: Hadas Kress-Gazit is a Professor at the Sibley School of Mechanical and Aerospace Engineering at Cornell University. She received her Ph.D. in Electrical and Systems Engineering from the University of Pennsylvania in 2008 and has been at Cornell since 2009. Her research focuses on formal methods for robotics and automation and more specifically on creating verifiable robot controllers for complex high-level tasks using logic, verification, synthesis, hybrid systems theory and computational linguistics. She received an NSF CAREER award in 2010, a DARPA Young Faculty Award in 2012 and the Fiona Ip Li '78 and Donald Li '75 Excellence in teaching award in 2013.


Bradley Hayes, University of Colorado Boulder, US.

Title: Making Capable Human-Robot Teams through Reinforcement Learning and Multimodal Communication

Abstract: Robots capable of collaborating with or otherwise assisting humans can bring transformative changes to the way we live and work. So where are these robots that we've been told about for the past 20 years? Deployments into human-populated environments and onto human teams remain largely infeasible due to the challenge of ensuring that robots are simultaneously proficient at task execution and at maintaining shared situational awareness with the humans around them. In this talk I will present my group's recent work toward overcoming these challenges via an interactive, communicative approach: blending novel techniques for imitation learning and autonomous coaching to realize autonomous systems that are able to transparently learn from and adapt to operation with and around humans, sharing their knowledge to improve collective situational awareness as well as individual or team performance.

Bio: Bradley Hayes is an Assistant Professor of Computer Science at the University of Colorado Boulder, where he directs the Collaborative AI and Robotics (CAIRO) Lab. Brad's research develops techniques to create and validate autonomous systems that learn from, teach, and collaborate with humans to improve mutual understanding, efficiency, safety, and capability at scale. His work combines novel approaches at the intersection of machine learning, cognitive science, and explainable artificial intelligence, toward making human-autonomy teams both more capable and more powerful than the sums of their parts.


Scott Niekum, University of Massachusetts Amherst, US.

Title: The Role of Guarantees in Value Alignment

Abstract: As AI systems have become increasingly competent, value alignment -- ensuring that the goals and/or behaviors of AI systems align with human values -- has become a popular buzzword in the AI research community, though its exact technical meaning is often unclear. Perhaps the single most important distinction across value alignment methods is whether they provide performance guarantees of any form, suggesting several critical questions that the AI research community must address: Are guarantees integral to the core concept of value alignment? What types of guarantees are even possible in practice? And does value alignment without guarantees amount to anything more than a marketing strategy?

Bio: Scott is an Associate Professor and the director of the Safe, Confident, and Aligned Learning + Robotics Lab (SCALAR) in the College of Information and Computer Sciences at The University of Massachusetts Amherst. Scott is also a core member of the interdepartmental UMass robotics group, as well as an Adjunct Professor at the University of Texas at Austin.


Jeanette Bohg, Stanford University, US.

Title: Plan for any task: Resolving action dependencies in sequential manipulation tasks with learned manipulation primitives

Abstract: Advances in robotic skill acquisition have made it possible to build general-purpose libraries of primitive skills for downstream manipulation tasks. However, naively executing these learned primitives one after the other is unlikely to succeed without accounting for dependencies between actions prevalent in long-horizon plans. In this talk, I will present a scalable framework for training manipulation primitives and coordinating their geometric dependencies at plan-time to efficiently solve long-horizon tasks never seen by any primitive during training. Based on the notion that Q-functions encode a measure of action feasibility, we formulate motion planning as a maximization problem over the expected success of each individual primitive in the plan, which we estimate by the product of their Q-values. Our experiments indicate that this objective function approximates ground truth plan feasibility and, when used as a planning objective, reduces myopic behavior and thereby promotes task success. We further demonstrate how our approach can be used for task and motion planning by estimating the geometric feasibility of candidate action sequences provided by a task planner. We evaluate our approach in simulation and on a real robot.
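The planning objective described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the talk or the speaker's code: it assumes each primitive's Q-value lies in [0, 1] and can be read as a success probability, and it only ranks a handful of fixed candidate plans rather than optimizing over continuous action parameters. The names `plan_success_estimate` and `best_plan` are hypothetical.

```python
from math import prod
from typing import Mapping, Sequence


def plan_success_estimate(q_values: Sequence[float]) -> float:
    """Estimate plan success as the product of per-primitive Q-values.

    Assumption (not from the source): each Q-value is in [0, 1] and is
    interpreted as the success probability of one primitive in the plan.
    """
    for q in q_values:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"Q-value {q} is outside [0, 1]")
    return prod(q_values)


def best_plan(candidates: Mapping[str, Sequence[float]]) -> str:
    """Return the name of the candidate plan with the highest estimate."""
    return max(candidates, key=lambda name: plan_success_estimate(candidates[name]))


if __name__ == "__main__":
    # Hypothetical Q-values for two candidate plans of two primitives each.
    candidates = {
        "pick_then_place": [0.9, 0.9],
        "push_then_pick": [1.0, 0.5],
    }
    print(best_plan(candidates))
```

Because the estimate is a product, a single primitive with a low Q-value lowers the score of the whole plan, which is one way to read the abstract's remark about reducing myopic behavior.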

Bio: Jeanette is a Professor of Robotics at Stanford University and director of the Interactive Perception and Robot Learning Lab. In general, Jeanette's research explores two questions: What are the underlying principles of robust sensorimotor coordination in humans, and how can we implement them on robots? Research on this topic necessarily lies at the intersection of Robotics, Machine Learning and Computer Vision.


Nils Jansen, Radboud University, NL.

Title: Safe Reinforcement Learning under Partial Observability

Abstract: Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assumes domain knowledge in the form of a (partial) model of this environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure the applicability of RL with good performance. We discuss the applicability of shields via a tight integration with state-of-the-art deep RL, and provide an extensive, empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap state-of-the-art RL agents: they remain safe after initial learning in a shielded setting, allowing us to disable a potentially too conservative shield eventually.
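The idea of a shield that restricts the agent to safe actions can be sketched in a few lines of Python. This is an illustrative sketch only, not the speaker's implementation: the safety check `is_safe` stands in for whatever (partial) environment model is available, the Q-values are given as a plain dictionary, and the function name `shielded_action` is hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, Optional


def shielded_action(
    state: Hashable,
    q_values: Dict[str, float],
    is_safe: Callable[[Hashable, str], bool],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Epsilon-greedy action selection restricted to actions the shield allows.

    `is_safe(state, action)` is an assumed, user-supplied predicate derived
    from domain knowledge; it is not part of any specific library.
    """
    rng = rng or random.Random()
    # The shield filters the action set before the agent chooses.
    safe_actions = [a for a in q_values if is_safe(state, a)]
    if not safe_actions:
        raise RuntimeError("shield permits no action in this state")
    # Exploration also happens only among safe actions.
    if rng.random() < epsilon:
        return rng.choice(safe_actions)
    return max(safe_actions, key=lambda a: q_values[a])


if __name__ == "__main__":
    q = {"left": 1.0, "right": 5.0, "stay": 0.0}
    # Hypothetical safety rule: moving right is unsafe in every state.
    print(shielded_action(None, q, lambda s, a: a != "right"))
```

In this sketch the highest-valued action is skipped whenever the shield rejects it, so both greedy and exploratory choices stay inside the safe set.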

Bio: Nils is an associate professor with the Department of Software Science (SWS) at Radboud University Nijmegen in the Netherlands. Nils is a member of the ELLIS society.


 

 

Invited Panelists

There will be two interactive panel sessions, one on Principles and understanding of RL algorithms and models and one on Benchmarks, implementation, and accelerating RL research.

We, the workshop organizers, prioritize and strive to improve gender equity, diversity, and inclusion in our workshop and plan our sessions accordingly to the best of our ability. In this year's hybrid edition, our efforts to accommodate the time zones of our panelists might affect panel composition.

 

Benchmarks, implementation, and accelerating RL research
(9:45 - 10:30 am JST)

Tesca Fitzgerald is an Assistant Professor at Yale University, US.

Lukas Brunke is a PhD student at the University of Toronto Institute for Aerospace Studies, CA.

Nathan Fulton is a Senior Applied Scientist at Amazon Web Services, US.

Elaine Schaertl Short is an Assistant Professor in the Tufts University Department of Computer Science, US.

 

Principles and understanding of RL algorithms and models
(2:20 - 3:05 pm JST)

Jens Kober is an Associate Professor at TU Delft, NL.

Harold Soh is an Assistant Professor at the National University of Singapore, SG.

Takamitsu Matsubara is a Professor at the Nara Institute of Science and Technology and a visiting researcher at the ATR Computational Neuroscience Laboratories, Kyoto, Japan.

Mohan Sridharan is a Reader in Cognitive Robot Systems in the School of Computer Science at the University of Birmingham, UK.

 

Call for Papers

We invite extended abstracts of 2-4 pages on recent work related to the theme of the workshop; preliminary work with open questions is very welcome. All accepted abstracts will be part of a short paper presentation session held during the workshop, where the authors will have the opportunity to present their work in a 5-minute presentation, followed by a 3-minute live Q&A session. This is a non-archival venue: there will be no formal proceedings, but we encourage the authors to publish their extended abstracts on arXiv (the link will be placed on the workshop’s website). Abstracts may be submitted to other venues in the future.

Based on the target areas and the discussions during our RL-CONFORM workshop at last year’s IROS, topics of interest include but are not limited to:

  • Data-efficiency, sim-to-real gap, and guided exploration in RL;
  • Safety guarantees, shielding, invariant sets, and online verification;
  • Query sample-efficiency, human-robot interaction, learning from demonstration, and human feedback;
  • Existing and new benchmarks to accelerate safe and robust RL research.

Important workshop details

  • When: October 23, 2022.
  • Where: Hybrid event co-located with IROS 2022 in Kyoto, Japan, and over Zoom.
  • Submission deadline: September 3, 2022 (AoE)
  • Notification of acceptance: September 15, 2022
  • Submission format: 2-4 page abstracts (excl. references) of original, possibly ongoing research. Papers should follow the IROS 2022 style guidelines; more information can be found at IROS Call for Papers.
  • To submit your work visit: Easychair submission website
  • Contact: rlconform2022@easychair.org

Previous Editions of RL-CONFORM

For information about the workshop in 2021, visit: RL-CONFORM 2021

Connect with us and join the conversation!