Microsoft's OPCD: Smarter AI, Faster Apps
28 Feb
Summary
- New AI training framework bakes knowledge into models.
- OPCD reduces inference latency and per-query costs.
- Framework improves model performance for specific tasks.

Enterprises deploying large language models (LLMs) often face challenges with long system prompts that increase inference latency and costs. Microsoft researchers have developed On-Policy Context Distillation (OPCD), a novel training framework that integrates essential company knowledge and application-specific instructions directly into AI models.
OPCD trains models to internalize this information, compressing lengthy instructions into the model's parameters so they no longer need to be supplied at inference time. Unlike off-policy distillation, which trains the student on teacher-generated text and suffers from exposure bias and mode-covering behavior, OPCD samples from the student's own generation trajectories and scores them against the teacher using reverse KL divergence. This on-policy approach lets the student model learn from its own mistakes, promotes mode-seeking behavior, and reduces hallucinations.
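The distinction between the two divergence directions is the core of the mode-seeking claim. As a rough illustration (not Microsoft's implementation, and with entirely hypothetical toy distributions), the sketch below compares reverse KL, which OPCD minimizes over student-sampled tokens, against forward KL, the mode-covering objective used by conventional distillation:

```python
import numpy as np

def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """Reverse KL divergence D_KL(student || teacher).

    Mode-seeking: the student is penalized for placing probability
    mass where the teacher (the model given the full system prompt)
    places little, so it concentrates on the teacher's main modes.
    """
    s = np.clip(student_probs, eps, 1.0)
    t = np.clip(teacher_probs, eps, 1.0)
    return float(np.sum(s * np.log(s / t)))

def forward_kl(student_probs, teacher_probs, eps=1e-12):
    """Forward KL D_KL(teacher || student): the mode-covering baseline."""
    s = np.clip(student_probs, eps, 1.0)
    t = np.clip(teacher_probs, eps, 1.0)
    return float(np.sum(t * np.log(t / s)))

# Toy next-token distributions over a 4-token vocabulary (hypothetical values).
teacher = np.array([0.70, 0.25, 0.04, 0.01])  # prompted teacher: two strong modes
spread  = np.array([0.25, 0.25, 0.25, 0.25])  # student spreading mass everywhere
peaked  = np.array([0.60, 0.35, 0.04, 0.01])  # student matching the main modes

# Reverse KL punishes the spread-out student far more than the peaked one:
# that asymmetry is the mode-seeking behavior the article describes.
print(reverse_kl(spread, teacher), reverse_kl(peaked, teacher))
```

In the full training loop, the student would also sample its own continuations (the on-policy part) and these divergences would be computed per generated token; the snippet only isolates why the reverse direction discourages the student from hedging across low-probability tokens, which is what shows up as hallucination.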