SGLang

Developer	LMSYS
Initial release	January 17, 2024 (2024-01-17)
Written in	Python, Rust, CUDA, C++
Type	Large language model inference engine
License	Apache License 2.0
Website	sglang.io
Repository	github.com/sgl-project/sglang

SGLang (short for Structured Generation Language) is an open-source framework for programming and serving large language models and multimodal models. It was introduced by researchers affiliated with LMSYS¹ and other institutions as a system combining a Python-embedded language for structured generation with a runtime for high-throughput inference.²³⁴

The project is designed for low-latency, high-throughput inference workloads. Its documentation describes support for structured outputs, speculative decoding, continuous batching, quantization, and compatibility with OpenAI-style APIs.⁵

History

SGLang was publicly introduced in January 2024 by researchers affiliated with Stanford, UC Berkeley, Texas A&M, and Shanghai Jiao Tong University.² Its academic description later appeared in the proceedings of NeurIPS 2024.³ In January 2026, TechCrunch reported that contributors associated with the project had formed the startup RadixArk to commercialize services around SGLang while continuing its open-source development.⁶⁷

Architecture

According to the NeurIPS paper, SGLang consists of two main components: a front-end language embedded in Python and a back-end runtime for executing language model programs efficiently.³ The front end provides primitives for generation, selection, and parallel control flow, while the runtime uses a set of optimizations intended to reduce repeated computation and improve throughput.³
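The three front-end primitives can be illustrated with a small, self-contained sketch. It is not SGLang's API: the State class, fake_generate, and fake_score are hypothetical stand-ins invented for this example, and the "model" is a deterministic stub so the code runs without any inference back end. The method names only loosely echo the primitives named above.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt):
    """Hypothetical stand-in for a model call; returns a deterministic string."""
    return f"[{len(prompt)} chars seen]"


def fake_score(prompt, choice):
    """Hypothetical scorer; a real system would compare model likelihoods."""
    return -len(choice)  # assumption for the demo: the shortest choice wins


class State:
    """Toy program state: accumulated prompt text plus named outputs."""

    def __init__(self, text="", variables=None):
        self.text = text
        self.vars = dict(variables or {})

    def gen(self, name):
        """Generation: append model output and store it under `name`."""
        out = fake_generate(self.text)
        self.vars[name] = out
        self.text += out
        return self

    def select(self, name, choices):
        """Selection: pick the highest-scoring option from a fixed list."""
        best = max(choices, key=lambda c: fake_score(self.text, c))
        self.vars[name] = best
        self.text += best
        return self

    def fork(self, n):
        """Parallel control flow: copy the state into n independent branches."""
        return [State(self.text, self.vars) for _ in range(n)]


s = State("Tip 1 and tip 2.\n")
branches = s.fork(2)
for i, branch in enumerate(branches):
    branch.text += f"Expand tip {i + 1}: "

# Run the two branches concurrently; each extends its own copy of the prompt.
with ThreadPoolExecutor() as pool:
    list(pool.map(lambda b: b.gen("detail"), branches))

s.select("verdict", ["useful", "not useful"])
print(branches[0].vars["detail"], s.vars["verdict"])
```

Because both branches start from the same prompt prefix, a runtime that tracks shared prefixes would only need to compute the differing suffixes, which is the kind of repeated computation the back end is intended to avoid.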

Among the techniques described by the project are RadixAttention for reusing key–value cache state across multiple generation calls, compressed finite-state machines for faster constrained decoding, and speculative execution for API-based models.³ The current documentation also describes support for serving both language models and multimodal models across a range of hardware back ends.⁵
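The idea behind reusing key–value cache state across calls can be sketched with a token-level prefix tree. The following is a minimal illustration under stated assumptions, not the RadixAttention implementation: it uses a plain trie rather than a compressed radix tree, omits eviction and memory management, and stores a placeholder tuple where a real runtime would hold attention tensors. The names Node and PrefixCache and the token ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One trie node; `kv` stands in for the cached state of a single token."""
    children: dict = field(default_factory=dict)
    kv: object = None


class PrefixCache:
    """Toy prefix cache keyed by token ids (hypothetical, for illustration)."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached state."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Cache state for each token; return how many had to be computed."""
        node, computed = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                child = Node(kv=("kv", tok))  # placeholder for real tensors
                node.children[tok] = child
                computed += 1
            node = child
        return computed


cache = PrefixCache()
system = [101, 7, 7, 42]                     # assumed ids for a shared system prompt
first = cache.insert(system + [5, 6])        # nothing cached yet
reused = cache.match_prefix(system + [9])    # shared prefix is found
second = cache.insert(system + [9])          # only the new suffix is computed
print(first, reused, second)
```

In this sketch the second request recomputes one token instead of five, because the four system-prompt tokens are found in the tree.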

References

1. "LMSYS". GitHub. GitHub, Inc. Retrieved April 22, 2026.
2. "Fast and Expressive LLM Inference with RadixAttention and SGLang". LMSYS Org. January 17, 2024. Retrieved April 19, 2026.
3. Zheng, Lianmin; Yin, Liangsheng; Xie, Zhiqiang; Sun, Chuyue; Huang, Jeff; Yu, Cody Hao; Cao, Shiyi; Kozyrakis, Christos; Stoica, Ion; Gonzalez, Joseph E.; Barrett, Clark; Sheng, Ying (2024). SGLang: Efficient Execution of Structured Language Model Programs (PDF). Advances in Neural Information Processing Systems 37. Retrieved April 19, 2026.
4. "SGLang". UC Berkeley Sky Computing Lab. April 25, 2024. Retrieved April 22, 2026.
5. "SGLang Documentation". SGLang. Retrieved April 19, 2026.
6. Hu, Krystal (January 21, 2026). "Sources: Project SGLang spins out as RadixArk with $400M valuation as inference market explodes". TechCrunch. Retrieved April 19, 2026.
7. R, Vignesh (January 23, 2026). "From Berkeley lab to $400M startup: SGLang becomes RadixArk". TFN. Retrieved April 22, 2026.
