Microsoft's OPCD: Smarter AI, Faster Apps
28 Feb
Summary
- New AI training framework bakes knowledge into models.
- OPCD reduces inference latency and per-query costs.
- Framework improves model performance for specific tasks.

Enterprises deploying large language models (LLMs) often face challenges with long system prompts that increase inference latency and costs. Microsoft researchers have developed On-Policy Context Distillation (OPCD), a novel training framework that integrates essential company knowledge and application-specific instructions directly into AI models.
This method trains models to internalize information, compressing complex instructions into their parameters. Unlike older techniques that suffer from exposure bias and mode-covering behaviors, OPCD uses the model's own generation trajectories and reverse KL divergence for training. This on-policy approach allows the student model to learn from its mistakes, promoting mode-seeking behavior and reducing hallucinations.
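The mode-seeking versus mode-covering distinction can be made concrete with a toy next-token distribution. The sketch below (all names and numbers are illustrative, not taken from the paper) compares reverse KL, the objective used in on-policy distillation, against the forward KL used in classic off-policy distillation:

```python
import math

def kl(p, q):
    """Discrete KL divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 3-token vocabulary (illustrative values).
teacher            = [0.495, 0.495, 0.010]  # teacher spreads mass over two modes
student_mode_seek  = [0.980, 0.010, 0.010]  # student commits to one teacher mode
student_mode_cover = [0.340, 0.330, 0.330]  # student spreads mass everywhere

# Reverse KL, KL(student || teacher), is what on-policy distillation minimizes:
# it heavily penalizes student mass placed where the teacher assigns little
# probability, so the mode-seeking student scores better (lower divergence).
print(kl(student_mode_seek, teacher))   # lower
print(kl(student_mode_cover, teacher))  # higher

# Forward KL, KL(teacher || student), used by classic off-policy distillation,
# rewards covering every teacher mode, even at the cost of putting mass in
# low-probability regions where the student is prone to hallucinate.
print(kl(teacher, student_mode_cover))  # lower
print(kl(teacher, student_mode_seek))   # higher
```

In actual OPCD training, the divergence is evaluated on sequences sampled from the student itself rather than from a fixed dataset, which is what makes the method on-policy and lets the student learn from its own mistakes.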
Benchmark results demonstrate OPCD's effectiveness. For experiential knowledge, an 8-billion-parameter model improved from 75.0% to 80.9% on mathematical reasoning. In system prompt distillation, a 3-billion-parameter Llama model's accuracy on safety and toxicity classification rose from 30.7% to 83.1%.
OPCD offers a significant advantage: it specializes models without causing catastrophic forgetting, preserving out-of-distribution performance. While it does not replace Retrieval-Augmented Generation (RAG) for highly dynamic data, OPCD integrates into existing fine-tuning workflows with relatively modest hardware requirements by LLM-training standards; the reported experiments ran on eight A100 GPUs.
This advancement paves the way for genuinely self-improving models that continuously adapt to enterprise needs. In this view, core improvements to AI models would shift from training time to test time, with real-world usage driving continual refinement.