БЛОГ

Apr 8, 2024

Paper page — Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Posted by in category: education

From Microsoft.

Direct Nash Optimization.

Teaching Language Models to Self-Improve with General Preferences.

This paper studies post-training large language models (LLMs) using #preference feedback from a powerful oracle to help a model iteratively improve over…


Join the discussion on this paper page.

Leave a reply