AI's Black Box Cracked: Interpretability Now Engineering
23 Feb
Summary
- Guide Labs open-sourced Steerling-8B, an 8B parameter LLM.
- Its novel architecture makes every token traceable to training data.
- This approach shifts AI interpretability from science to engineering.

San Francisco-based Guide Labs has launched Steerling-8B, an 8-billion-parameter large language model (LLM) designed for interpretability. Its architecture lets developers trace every token the model produces back to specific origins in its training data, addressing the long-standing challenge of understanding the internal workings of complex AI systems.
This interpretable architecture is achieved by engineering the model from the ground up, rather than relying on post-hoc analysis. CEO Julius Adebayo explained that this approach eliminates the need for complex "neuroscience on a model." The company's work, which began during Adebayo's PhD at MIT, introduces a concept layer that buckets data into traceable categories, requiring more upfront annotation but yielding a more transparent model.
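To make the idea concrete, here is a minimal sketch of how a concept layer might bucket annotated training data into traceable categories. This is purely illustrative and not Guide Labs' actual design; the `ConceptLayer` class, its methods, and the document identifiers are all hypothetical.

```python
from collections import defaultdict

class ConceptLayer:
    """Hypothetical sketch: file annotated training examples under named
    concepts so that output attributed to a concept can be traced back
    to its sources (not Guide Labs' actual architecture)."""

    def __init__(self):
        # concept name -> list of training-example identifiers
        self.buckets = defaultdict(list)

    def annotate(self, example_id, concepts):
        """Upfront annotation: register a training example under each concept."""
        for concept in concepts:
            self.buckets[concept].append(example_id)

    def trace(self, concept):
        """Trace a concept used at generation time back to training data."""
        return self.buckets.get(concept, [])


layer = ConceptLayer()
layer.annotate("doc-001", ["finance", "risk"])
layer.annotate("doc-002", ["finance"])

# An output token tagged with the "finance" concept traces to both docs.
print(layer.trace("finance"))  # ['doc-001', 'doc-002']
```

The extra upfront cost in this toy version is the `annotate` step, which mirrors the additional annotation work the company describes; the payoff is that `trace` is a cheap lookup rather than post-hoc analysis.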
Guide Labs claims Steerling-8B can achieve 90% of the capability of larger models with less training data. This development is crucial for regulated industries like finance and for scientific applications where understanding AI decision-making is paramount. The company, which raised $9 million in seed funding, plans to develop larger models and offer API access.