Instruction tuning is the future, but the delay in launching RLHF via PPO suggests that RLHF is finicky in practice. Practitioners will likely need to stick to instruction tuning for the time being.
…and it will look a lot like AGI. LLMs can become extremely powerful by leveraging external cognitive assets and choosing actions.
This is Causal Deference, a newsletter about navigating the most important century.

