Optimizing Data for LLMs: An Introduction to TOON
How Token Optimized Object Notation reduces LLM token consumption by 40-60% for enterprise workloads. A practical guide to data representation optimization.
As an AI architect working on enterprise-scale LLM systems, I've learned that the most expensive problems are often the least visible ones. Not model accuracy. Not latency. Not even infrastructure.
It's how data is represented when it flows into the model.
This usually comes up indirectly. A team notices their LLM costs climbing faster than expected. They assume it's the model choice, or the prompt length, or the output verbosity. When we dig in, the real issue is often much simpler—and much more mundane. It's the JSON payloads being sent to the model, repeated thousands of times a day, quietly consuming tokens.
JSON is familiar, readable, and works extremely well for APIs built for humans and services. But LLMs don't read data the way humans do. They don't see structure—they see tokens. Every brace, quote, colon, and repeated field name costs money.
The Architectural Challenge
When we architect traditional distributed systems, we obsess over serialization formats. We debate Protocol Buffers vs. JSON vs. MessagePack. We benchmark compression algorithms. We optimize for network bandwidth and parsing speed.
But when architecting LLM-integrated systems, we face a different constraint: tokens. The model doesn't see bytes—it sees tokens. And tokens are what you pay for.
This creates an interesting architectural tension. JSON is the lingua franca of modern APIs, and LLMs understand it well. But JSON is verbose by design. Its human-readability is a feature—until that readability costs you money at scale.
Understanding LLM Tokenization
When you send text to an LLM, it doesn't process characters directly. The model's tokenizer breaks your input into tokens—subword units that the model was trained on.
Most modern LLMs use variations of Byte-Pair Encoding (BPE) or SentencePiece tokenization. The key characteristics:
- Common words become single tokens: "the", "is", "data" = 1 token each
- Longer or uncommon words split into subwords: "optimization" = "optim" + "ization" = 2 tokens
- Technical terms often fragment: "authentication" = 3-4 tokens
- Punctuation and syntax tokenize individually: braces, quotes, and colons typically cost 1 token each
This last point is critical. JSON syntax itself is a token consumer. Every structural element—every brace, bracket, quote, colon, and comma—is a token. In production systems processing millions of tokens per day, this overhead becomes a material cost driver.
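You can observe this overhead directly with a tokenizer. Here's a minimal sketch using OpenAI's tiktoken library; the cl100k_base encoding is an assumption, so swap in whichever encoding matches your target model:

import tiktoken

# cl100k_base is the encoding used by GPT-4-era models; choose the
# encoding that matches the model you actually call.
enc = tiktoken.get_encoding("cl100k_base")

record = '{"customerId": "C12345", "firstName": "John", "status": "active"}'
tokens = enc.encode(record)

print(len(tokens))                        # total tokens for one record
print([enc.decode([t]) for t in tokens])  # inspect how braces, quotes,
                                          # keys, and values tokenize

Measuring a representative payload like this, before and after any formatting change, is more reliable than estimating.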
Token Economics: A Systems Perspective
From an architectural standpoint, tokens are your unit of cost, latency, and capacity:
- Cost: You pay per token, both input and output. At enterprise scale, this becomes a significant operational expense.
- Latency: Token count affects response time. More input tokens mean more processing time before the model starts generating.
- Context Window: Every model has a maximum context length. Tokens spent on verbose data formatting are tokens unavailable for actual content.
When I design LLM architectures, I think of tokens as a scarce resource to be budgeted carefully—not unlike memory or compute in traditional systems.
Token Pricing Across the LLM Landscape
Understanding the cost structure is essential for architectural decision-making.
OpenAI:
- GPT-4o: Input $2.50 / Output $10.00 per million tokens
- GPT-4o-mini: Input $0.15 / Output $0.60 per million tokens
Anthropic:
- Claude 3.5 Sonnet: Input $3.00 / Output $15.00 per million tokens
- Claude 3.5 Haiku: Input $0.80 / Output $4.00 per million tokens
Google:
- Gemini 1.5 Pro: Input $1.25 / Output $5.00 per million tokens
- Gemini 1.5 Flash: Input $0.075 / Output $0.30 per million tokens
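To translate these rates into an operational budget, here is a back-of-the-envelope sketch using the GPT-4o input rate above; the traffic figures are hypothetical placeholders:

# Hypothetical workload: 50,000 requests/day, each carrying ~4,000
# tokens of structured context, priced at the GPT-4o input rate.
input_price_per_million = 2.50   # USD per million input tokens (GPT-4o)
tokens_per_request = 4_000
requests_per_day = 50_000

daily_tokens = tokens_per_request * requests_per_day   # 200M tokens/day
daily_cost = daily_tokens / 1_000_000 * input_price_per_million
print(f"${daily_cost:,.2f} per day")                   # $500.00 per day

# A 60% reduction in input tokens cuts the bill proportionally:
print(f"${daily_cost * 0.4:,.2f} per day")             # $200.00 per day

At that scale, a formatting change alone is worth six figures annually, before accounting for latency and context-window headroom.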
Introducing TOON: Token Optimized Object Notation
Standard optimization patterns—minification, schema pruning, attribute name compression—optimize within the JSON paradigm. TOON takes a different approach: it changes the paradigm entirely for LLM communication.
The Core Insight
When you send an array of JSON objects, you repeat the attribute names for every single record:
[
  {"customerId": "C12345", "firstName": "John", "status": "active"},
  {"customerId": "C12346", "firstName": "Jane", "status": "active"},
  {"customerId": "C12347", "firstName": "Bob", "status": "inactive"}
]
The strings "customerId", "firstName", and "status" appear three times each. Every occurrence costs tokens.
The TOON Solution
TOON separates the schema from the data: attribute names are declared once, and each record becomes a row of pipe-delimited values whose position determines its meaning:
@schema:customerId,firstName,status
C12345|John|active
C12346|Jane|active
C12347|Bob|inactive
Token comparison:
- JSON representation: approximately 75 tokens
- TOON representation: approximately 30 tokens
- Savings: 60%
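Your exact counts will vary by tokenizer. A quick way to measure the gap yourself, again with tiktoken (the library and encoding choice are assumptions, not part of the TOON tooling):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

json_text = """[
  {"customerId": "C12345", "firstName": "John", "status": "active"},
  {"customerId": "C12346", "firstName": "Jane", "status": "active"},
  {"customerId": "C12347", "firstName": "Bob", "status": "inactive"}
]"""

toon_text = """@schema:customerId,firstName,status
C12345|John|active
C12346|Jane|active
C12347|Bob|inactive"""

json_count = len(enc.encode(json_text))
toon_count = len(enc.encode(toon_text))
print(json_count, toon_count, f"{1 - toon_count / json_count:.0%}")

The savings compound with record count: the schema line is paid once, while JSON repeats the key overhead on every row.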
Why LLMs Understand TOON
This isn't a hack—it aligns directly with how modern LLMs learn and apply structured patterns. LLMs are excellent at:
- Pattern recognition: Once they see the schema line, they apply it to subsequent rows
- Positional understanding: They track which value corresponds to which position
- Delimiter parsing: Simple delimiters like the pipe character are unambiguous single tokens
In my testing across GPT-4, Claude, and Gemini models, TOON representations are parsed correctly and reliably.
Getting Started with TOON
I've open-sourced libraries for both Python and .NET:
Python (PyPI):
pip install toon-token-optimizer
.NET (NuGet):
dotnet add package Toon.TokenOptimizer
Quick Example
from toon_converter import json_to_toon

customers = [
    {"name": "John", "age": 30, "city": "NYC"},
    {"name": "Jane", "age": 25, "city": "LA"},
]

toon_data = json_to_toon(customers)
print(toon_data)
# @schema:name,age,city
# John|30|NYC
# Jane|25|LA
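Once converted, the TOON block drops straight into a prompt, typically with a one-line explanation of the format so the model knows how to read the rows. Continuing the example above; the prompt wording is illustrative, not prescribed by the library:

# Continuing from the previous snippet. The surrounding instructions
# are illustrative; adapt them to your own prompting conventions.
prompt = f"""The customer records below are in TOON format: the @schema
line names the fields, and each following row holds pipe-delimited
values in the same order.

{toon_data}

List the names of all customers under 30."""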
When to Use TOON
Use TOON When:
- Processing hundreds or thousands of records per request
- Records share the same schema
- Token costs are a meaningful part of your operational budget
- Building RAG systems with structured context data
Consider Alternatives When:
- You're sending fewer than roughly 10 records (the schema line's overhead isn't justified)
- Objects have varying schemas
- Users see raw prompts or responses
- You need the model to return structured JSON
Conclusion
Token usage is a first-class constraint in LLM-integrated systems—just like memory, compute, or network bandwidth in traditional architectures.
TOON reflects a broader principle: when the cost model changes, long-standing design assumptions should be re-examined. What made sense for human-facing APIs is not always optimal for machine-facing ones.
The libraries are available on GitHub, PyPI, and NuGet. Both include full documentation and 75+ unit tests, and both handle edge cases like nested objects, arrays, and special characters.