OpenAI o1 System Card

Bibliographic Details
Published in: arXiv.org, 2024-12
Main Authors: OpenAI, Helyar, Alec, Madry, Aleksander, Neitz, Alexander, Tam, Allison, Vallone, Andrea, Duberstein, Andrew, Mishchenko, Andrey, Applebaum, Andy, Jiang, Angela, Nair, Ashvin, Sokolowsky, Benjamin, Barak, Boaz, Baker, Bowen, McKinzie, Brandon, Lugaresi, Camillo, Hudson, Cary, Voss, Chelsea, Koch, Chris, Fischer, Claudia, Chan, Clive, Roberts, Dan, Levy, Daniel, Selsam, Daniel, Robinson, David, Tsipras, Dimitris, Proehl, Elizabeth, Cheung, Enoch, Wallace, Eric, Parascandolo, Giambattista, Salman, Hadi, Bagherinezhad, Hessam, Lightman, Hunter, Sutskever, Ilya, Pachocki, Jakub, Lennon, James, Feng, Jiacheng, Tang, Jie, Yu, Jieqi, Hallman, John, Ward, Jonathan, Huizinga, Joost, Nguyen, Karina, Shi, Katy, Gu-Lemberg, Keren, Lu, Kevin, Yu, Kevin, Ahmad, Lama, Kuhn, Lorenz, Kondraciuk, Lukas, Kaiser, Lukasz, Boyd, Madelaine, Joglekar, Manas, Chen, Mark, Tintor, Marko, Schwarzer, Max, Shah, Meghan, Yatbaz, Mehmet, Xu, Mengyuan, Glaese, Mia, Lampe, Michael, Wang, Michele, Wang, Miles, Wang, Mingxuan, Rohaninejad, Mostafa, Chowdhury, Neil, Boiko, Oleg, Murk, Oleg, Ashbourne, Paul, Zhokhov, Peter, Lin, Randall, Leike, Reimar, Roshan, James, Greene, Ryan, Toizer, Sam, Miserendino, Samuel, Zhao, Shengjia, Santurkar, Shibani, Zhang, Shuyuan, Fu, Siyuan, Papay, Spencer, Lin, Steph, Sanjeev, Suvansh, Clark, Aidan, Taylor, Gordon, Sanders, Ted, Sottiaux, Thibault, Degry, Thomas, Dimson, Thomas, Zheng, Tianhao, Garipov, Timur, Eloundou, Tyna, Qi, Valerie, Kosaraju, Vineet, Monaco, Vinnie, Zheng, Weiyi, Lu, Yinghai, Cha, Young, Wang, Yunyun, Shao, Zheng
Format: Article
Language: English
Description
Summary: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
ISSN: 2331-8422