Small Language Models on the Edge for Real-World Agentic Systems in Industry
Abstract
Large Language Models (LLMs) face significant deployment challenges in enterprise environments, including high computational costs, data privacy concerns, and network dependencies. This paper presents a framework for deploying Small Language Models (SLMs), models with fewer than 7 billion parameters, on edge devices, using agentic architectures to overcome their capacity limitations. We introduce three key contributions: (1) a multi-agent benchmarking framework employing role-based evaluation to reduce bias, (2) a three-phase task-planning pipeline that decomposes planning into subtask identification, dependency reasoning, and schema-constrained generation, and (3) real-world implementations achieving 3-4x lower latency than cloud services. Our evaluation demonstrates that models such as Phi-4 achieve CEFR C1-level translation quality and a 0.883 G-Eval summarization score on commodity hardware. Through WebLLM browser-based inference and local hosting, we show that SLMs can effectively serve enterprise needs in privacy-sensitive, bandwidth-constrained, or air-gapped environments, offering a viable alternative that prioritizes data sovereignty and cost efficiency.
Type: Publication
Venue: Southern African Conference for Artificial Intelligence