In this video, we recount an incident that occurred at OpenAI while researchers were trying to fine-tune GPT-2 to be as helpful and ethical as possible. An inadvertently flipped minus sign led GPT-2 to become the embodiment of a well-known cardinal sin.
#ai #aisafety #alignment
▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
OpenAI blog post: https://openai.com/research/fine-tuni…
OpenAI paper behind the blog post: https://arxiv.org/pdf/1909.08593.pdf
RLHF explainer on Hugging Face: https://huggingface.co/blog/rlhf
RLHF explainer on aisafety.info: https://aisafety.info/?state=88FN_904…
Concrete Problems in AI Safety, by @RobertMilesAI: • Concrete Problems in AI Safety.
▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, KO-FI▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🟠 Patreon: / rationalanimations.