A Survey of Reinforcement Learning from Human Feedback

Timo Kaufmann, Paul Weng, Viktor Bengs et al.

2023 · arXiv (Cornell University) · 36 citations

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising approach to enhancing the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The success in training large language models…
