OpenAI comes clean about GPT 3.5

Dec 2, 2022

Instruction tuning is the future, but the delay in launching RLHF via PPO suggests that PPO-based RLHF is finicky in practice. Practitioners will likely need to stick to instruction tuning for the time being.

2 Comments

Andrew

Dec 2, 2022

Is FeedMe equivalent to just Step 1 of RLHF or am I misunderstanding?

John McDonnell

Dec 3, 2022

I believe so, but what confused me is that this would make it the same as Flan with different input data (which maybe it is?)

Causal Deference
