Tag: Generative-RL
Reward Is Not a Universal Interface for Generative Reinforcement Learning
MindRL argues that a reward becomes an RL update only after it passes through one of the objects the policy branch exposes: a probability object, a score, or a controlled surrogate.
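The sketch below illustrates the claim with two textbook routes. It is not MindRL's method. It assumes a softmax policy over a few discrete actions, and the helper names `score`, `reinforce_grad`, and `clipped_surrogate_grad` are hypothetical. The same scalar reward has no update direction by itself. It becomes a gradient only as a weight on the score function ∇ log π (REINFORCE), or through a PPO-style clipped probability-ratio surrogate, which can also suppress the update entirely.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def score(logits: List[float], action: int) -> List[float]:
    """Gradient of log pi(action) w.r.t. the logits of a categorical softmax policy."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def reinforce_grad(logits: List[float], action: int, reward: float) -> List[float]:
    """REINFORCE: the reward only scales the score; the direction comes from the policy."""
    return [reward * g for g in score(logits, action)]


def clipped_surrogate_grad(
    logits: List[float],
    old_logits: List[float],
    action: int,
    advantage: float,
    eps: float = 0.2,
) -> List[float]:
    """Gradient of min(r * A, clip(r, 1 - eps, 1 + eps) * A) w.r.t. the logits."""
    ratio = softmax(logits)[action] / softmax(old_logits)[action]
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # When the clipped term is the strictly smaller one, it is constant in the
    # logits, so this sample contributes no update even though advantage != 0.
    if clipped * advantage < ratio * advantage:
        return [0.0] * len(logits)
    # Otherwise the unclipped term is active: d(ratio)/d(logits) = ratio * score.
    return [advantage * ratio * g for g in score(logits, action)]


if __name__ == "__main__":
    print(reinforce_grad([0.0, 0.0], 0, 1.0))
    print(clipped_surrogate_grad([5.0, 0.0], [0.0, 0.0], 0, 1.0))
```

In the second call the policy has already moved far toward action 0, so the clipped surrogate returns a zero gradient despite a positive advantage. The policy-side object, not the reward, decides whether an update happens.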