Conference: IJCAI 2025
Date: August 16th 9AM – 12:30PM ET
Venue: Palais des congrès Montréal, QC, Canada
The rise of powerful foundation models, particularly large language models (LLMs) built on Transformer architectures, has ushered in a new era of Generative AI, transforming various industries. These models have enabled a wide range of applications, including question answering, customer support, image and video generation, and code completion. However, modern LLMs consist of billions of parameters trained on trillions of tokens, making their development challenging in resource-constrained environments.
This tutorial provides a comprehensive exploration of deep learning training techniques optimized for AI accelerators, enabling faster, more memory-efficient, yet robust training of models with billions of parameters. We begin with an overview of Transformer architectures, deep learning optimization strategies, and system- and hardware-level techniques. We then discuss system optimization techniques such as fast attention computation and fault-tolerant training at scale. Leveraging modern deep learning frameworks, we illustrate the principles of scaling laws that enable the training of LLMs with hundreds of billions of parameters. Next, we delve into low-precision training methods (e.g., FP8 and FP4), highlighting techniques such as numerical error handling through scaling and stochastic rounding. Finally, we examine fine-tuning approaches such as low-rank adaptation combined with sparsity and quantization, which enable efficient model updates by modifying only a small subset of parameters.
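As a concrete illustration of the low-precision ideas above, the sketch below quantizes a tensor onto a low-bit grid using a per-tensor scale factor and stochastic rounding, which keeps the rounding error zero in expectation. This is a minimal NumPy sketch under assumptions of our own (a symmetric integer grid standing in for the FP8/FP4 formats, illustrative function names), not code from the tutorial itself.

```python
import numpy as np

def stochastic_round(x: np.ndarray) -> np.ndarray:
    # Round up with probability equal to the fractional part, down otherwise,
    # so the expected value of the rounded tensor equals the original.
    floor = np.floor(x)
    return floor + (np.random.rand(*x.shape) < (x - floor))

def quantize_with_scaling(x: np.ndarray, n_bits: int = 8) -> np.ndarray:
    # Per-tensor scaling maps values into the representable range before
    # rounding, then maps them back; this limits numerical error at low precision.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    q = np.clip(stochastic_round(x / scale), -qmax - 1, qmax)
    return q * scale

# Usage: quantized gradients remain unbiased estimates of the full-precision ones.
grads = np.random.randn(4, 4).astype(np.float32)
print(quantize_with_scaling(grads, n_bits=8))
```

Because stochastic rounding is unbiased, small gradient updates are not systematically lost to truncation, which is one reason it pairs well with aggressive formats such as FP8 and FP4.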
Decoder-only models with self-attention (e.g., LLaMA, Qwen)
Encoder-decoder models with vision encoder and cross-attention (e.g., LLaVA, LLaMA 4)
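To make the contrast between the two model families above concrete, here is a minimal PyTorch sketch (our own illustration, with assumed shapes and without masking, multi-head projections, or positional encodings): self-attention draws queries, keys, and values from the same token sequence, while cross-attention lets decoder queries attend to a vision encoder's output.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Scaled dot-product attention shared by both model families.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Self-attention (decoder-only, e.g. LLaMA-style): q, k, v all come from
# the same token sequence x.
x = torch.randn(1, 16, 64)            # (batch, tokens, hidden dim)
self_out = attention(x, x, x)

# Cross-attention (vision-language models): queries come from the text
# decoder states, keys and values from the vision encoder's patch embeddings.
text = torch.randn(1, 16, 64)
image = torch.randn(1, 49, 64)        # e.g. a 7x7 grid of patch features
cross_out = attention(text, image, image)
print(self_out.shape, cross_out.shape)  # torch.Size([1, 16, 64]) for both
```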