Reinforcement Learning from Human Feedback (arxiv.org)

37 points | by onurkanbkrc 2 hours ago

2 comments