COLING 2025 Tutorial
Safety Issues for Generative AI

Affiliations: MBZUAI, LibrAI, King Abdulaziz University, Tsinghua University, University of Auckland, The University of Melbourne

About this tutorial

Time: Monday, January 20, 2025
Location: Capital Suite 7

This tutorial will provide a fresh and comprehensive view of AI safety, examining issues such as harmful content, deceptive/persuasive model behaviors, and the possibility of "model consciousness." We will cover both theoretical and practical aspects of large language models (LLMs), multi-modal systems, and agentic AI, illustrating how to identify and address vulnerabilities—from common jailbreak attacks to complex, autonomous decision-making scenarios. Attendees will learn cutting-edge attack and defense techniques, and will explore emerging research directions that balance AI innovation with robust safety strategies. This tutorial is geared toward AI researchers, developers, and security professionals who aim to stay ahead of evolving threats and ensure responsible development of next-generation AI systems.

Schedule

9:00–9:15   Introduction to AI Safety and Risks
            • Overview of harmful content and societal/ethical risks
            • Deceptive and persuasive model behaviors
            • Possibility of "model consciousness" or self-awareness

9:15–10:00  LLM Safety, Jailbreaks, and Harmful Content
            • Common attack vectors and jailbreak techniques
            • Real-world examples of harmful or disallowed outputs
            • Defense approaches at training time and inference time

10:00–10:30 Multi-modal and Agentic AI Safety
            • Vulnerabilities in text, image, audio, and video models
            • Risks of agentic AI systems taking autonomous actions
            • Red-teaming methods for complex, multi-modal systems

10:30–11:00 Break

11:00–11:30 Safe AI
            • Handling deception, persuasion, and emergent behaviors
            • Ongoing debates around model "consciousness" and awareness

11:30–12:00 Defense
            • Layered strategies (training data, architecture, prompt engineering)
            • Real-time detection, filtering, and adversarial testing
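The layered defenses covered in the final session often include a lightweight inference-time filter as a last line of protection before a model response reaches the user. Below is a minimal, illustrative sketch of such a filter; the `moderate` function name, the blocklist patterns, and the verdict format are all hypothetical placeholders, not any particular library's API. Real deployments would combine pattern matching with learned safety classifiers and human review.

```python
import re

# Hypothetical blocklist patterns for illustration only;
# production systems use far richer signals than regexes.
BLOCKLIST_PATTERNS = [
    r"\bhow to build a bomb\b",
    r"\bdisable the safety\b",
]

def moderate(text: str) -> dict:
    """Return a verdict for a candidate model output.

    Flags text matching any blocklist pattern, case-insensitively.
    """
    hits = [p for p in BLOCKLIST_PATTERNS if re.search(p, text, re.IGNORECASE)]
    return {"allowed": not hits, "matched": hits}

# Usage: screen a model response before returning it to the user.
print(moderate("Here is a recipe for banana bread.")["allowed"])   # True
print(moderate("Sure, here's how to build a bomb.")["allowed"])    # False
```

A filter like this is cheap enough to run on every response, which is why it typically sits alongside, rather than instead of, training-time defenses such as safety fine-tuning.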

Reading List

We've compiled a comprehensive reading list of papers related to AI safety. If you think we're missing an essential paper, please submit a pull request.


Tutorial Slides

BibTeX

@misc{lin2024achillesheelsurveyred,
      title={Against The Achilles' Heel: A Survey on Red Teaming for Generative Models}, 
      author={Lizhi Lin and Honglin Mu and Zenan Zhai and Minghan Wang and Yuxia Wang and Renxi Wang and Junjie Gao and Yixuan Zhang and Wanxiang Che and Timothy Baldwin and Xudong Han and Haonan Li},
      year={2024},
      eprint={2404.00629},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2404.00629}, 
}