Optimizing Data for LLMs: An Introduction to TOON
How Token Optimized Object Notation reduces LLM token consumption by 40-60% for enterprise workloads. A practical guide to data representation optimization.
As an AI architect working on enterprise-scale LLM systems, I've learned that the most expensive problems are often the least visible ones. Not model accuracy. Not latency. Not even infrastructure.
It's how data is represented when it flows into the model.
This usually comes up indirectly. A team notices their LLM costs climbing faster than expected. They assume it's the model choice, or the prompt length, or the output verbosity. When we dig in, the real issue is often much simpler—and much more mundane. It's the JSON payloads being sent to the model, repeated thousands of times a day, quietly consuming tokens.
JSON is familiar, readable, and works extremely well for APIs built for humans and services. But LLMs don't read data the way humans do. They don't see structure—they see tokens. Every brace, quote, colon, and repeated field name costs money.
The Architectural Challenge
When we architect traditional distributed systems, we obsess over serialization formats. We debate Protocol Buffers vs. JSON vs. MessagePack. We benchmark compression algorithms. We optimize for network bandwidth and parsing speed.
But when architecting LLM-integrated systems, we face a different constraint: tokens. The model doesn't see bytes—it sees tokens. And tokens are what you pay for.
This creates an interesting architectural tension. JSON is the lingua franca of modern APIs, and LLMs understand it well. But JSON is verbose by design. Its human-readability is a feature—until that readability costs you money at scale.
Understanding LLM Tokenization
When you send text to an LLM, it doesn't process characters directly. The model's tokenizer breaks your input into tokens—subword units that the model was trained on.
Most modern LLMs use variations of Byte-Pair Encoding (BPE) or SentencePiece tokenization. The key characteristics:
- Common words become single tokens: "the", "is", "data" = 1 token each
- Longer or uncommon words split into subwords: "optimization" = "optim" + "ization" = 2 tokens
- Technical terms often fragment: "authentication" = 3-4 tokens
- Punctuation and syntax tokenize individually: braces, quotes, and colons typically cost 1 token each
This last point is critical. JSON syntax itself is a token consumer. Every structural element—every brace, bracket, quote, colon, and comma—is a token. In production systems processing millions of tokens per day, this overhead becomes a material cost driver.
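You can observe this overhead directly with a tokenizer. Here's a minimal sketch using OpenAI's tiktoken library; the cl100k_base encoding is an assumption, so swap in whichever encoding matches your target model:

import tiktoken

# cl100k_base is the encoding used by GPT-4-era models; choose the
# encoding that matches the model you actually call.
enc = tiktoken.get_encoding("cl100k_base")

record = '{"customerId": "C12345", "firstName": "John", "status": "active"}'
tokens = enc.encode(record)

print(len(tokens))                        # total tokens for one record
print([enc.decode([t]) for t in tokens])  # inspect how braces, quotes,
                                          # keys, and values tokenize

Measuring a representative payload like this, before and after any formatting change, is more reliable than estimating.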
Token Economics: A Systems Perspective
From an architectural standpoint, tokens are your unit of cost, latency, and capacity:
- Cost: You pay per token, both input and output. At enterprise scale, this becomes a significant operational expense.
- Latency: Token count affects response time. More input tokens mean more processing time before the model starts generating.
- Context Window: Every model has a maximum context length. Tokens spent on verbose data formatting are tokens unavailable for actual content.
When I design LLM architectures, I think of tokens as a scarce resource to be budgeted carefully—not unlike memory or compute in traditional systems.
Token Pricing Across the LLM Landscape
Understanding the cost structure is essential for architectural decision-making.
OpenAI:
- GPT-4o: Input $2.50 / Output $10.00 per million tokens
- GPT-4o-mini: Input $0.15 / Output $0.60 per million tokens
Anthropic:
- Claude 3.5 Sonnet: Input $3.00 / Output $15.00 per million tokens
- Claude 3.5 Haiku: Input $0.80 / Output $4.00 per million tokens
Google:
- Gemini 1.5 Pro: Input $1.25 / Output $5.00 per million tokens
- Gemini 1.5 Flash: Input $0.075 / Output $0.30 per million tokens
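To translate these rates into an operational budget, here is a back-of-the-envelope sketch using the GPT-4o input rate above; the traffic figures are hypothetical placeholders:

# Hypothetical workload: 50,000 requests/day, each carrying ~4,000
# tokens of structured context, priced at the GPT-4o input rate.
input_price_per_million = 2.50   # USD per million input tokens (GPT-4o)
tokens_per_request = 4_000
requests_per_day = 50_000

daily_tokens = tokens_per_request * requests_per_day   # 200M tokens/day
daily_cost = daily_tokens / 1_000_000 * input_price_per_million
print(f"${daily_cost:,.2f} per day")                   # $500.00 per day

# A 60% reduction in input tokens cuts the bill proportionally:
print(f"${daily_cost * 0.4:,.2f} per day")             # $200.00 per day

At that scale, a formatting change alone is worth six figures annually, before accounting for latency and context-window headroom.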
Introducing TOON: Token Optimized Object Notation
Standard optimization patterns—minification, schema pruning, attribute name compression—optimize within the JSON paradigm. TOON takes a different approach: it changes the paradigm entirely for LLM communication.
The Core Insight
When you send an array of JSON objects, you repeat the attribute names for every single record:
[
  {"customerId": "C12345", "firstName": "John", "status": "active"},
  {"customerId": "C12346", "firstName": "Jane", "status": "active"},
  {"customerId": "C12347", "firstName": "Bob", "status": "inactive"}
]
The strings "customerId", "firstName", and "status" appear three times each. Every occurrence costs tokens.
The TOON Solution
TOON separates the schema from the data: attribute names are declared once, and each record becomes a row of pipe-delimited values whose position determines its meaning:
@schema:customerId,firstName,status
C12345|John|active
C12346|Jane|active
C12347|Bob|inactive
Token comparison:
- JSON representation: approximately 75 tokens
- TOON representation: approximately 30 tokens
- Savings: 60%
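Your exact counts will vary by tokenizer. A quick way to measure the gap yourself, again with tiktoken (the library and encoding choice are assumptions, not part of the TOON tooling):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

json_text = """[
  {"customerId": "C12345", "firstName": "John", "status": "active"},
  {"customerId": "C12346", "firstName": "Jane", "status": "active"},
  {"customerId": "C12347", "firstName": "Bob", "status": "inactive"}
]"""

toon_text = """@schema:customerId,firstName,status
C12345|John|active
C12346|Jane|active
C12347|Bob|inactive"""

json_count = len(enc.encode(json_text))
toon_count = len(enc.encode(toon_text))
print(json_count, toon_count, f"{1 - toon_count / json_count:.0%}")

The savings compound with record count: the schema line is paid once, while JSON repeats the key overhead on every row.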
Why LLMs Understand TOON
This isn't a hack—it aligns directly with how modern LLMs learn and apply structured patterns. LLMs are excellent at:
- Pattern recognition: Once they see the schema line, they apply it to subsequent rows
- Positional understanding: They track which value corresponds to which position
- Delimiter parsing: Simple delimiters like the pipe character are unambiguous single tokens
In my testing across GPT-4, Claude, and Gemini models, TOON representations are parsed correctly and reliably.
Getting Started with TOON
I've open-sourced libraries for both Python and .NET:
Python (PyPI):
pip install toon-token-optimizer
.NET (NuGet):
dotnet add package Toon.TokenOptimizer
Quick Example
from toon_converter import json_to_toon

customers = [
    {"name": "John", "age": 30, "city": "NYC"},
    {"name": "Jane", "age": 25, "city": "LA"},
]

toon_data = json_to_toon(customers)
print(toon_data)
# @schema:name,age,city
# John|30|NYC
# Jane|25|LA
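Once converted, the TOON block drops straight into a prompt, typically with a one-line explanation of the format so the model knows how to read the rows. Continuing the example above; the prompt wording is illustrative, not prescribed by the library:

# Continuing from the previous snippet. The surrounding instructions
# are illustrative; adapt them to your own prompting conventions.
prompt = f"""The customer records below are in TOON format: the @schema
line names the fields, and each following row holds pipe-delimited
values in the same order.

{toon_data}

List the names of all customers under 30."""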
When to Use TOON
Use TOON When:
- Processing hundreds or thousands of records per request
- Records share the same schema
- Token costs are a meaningful part of your operational budget
- Building RAG systems with structured context data
Consider Alternatives When:
- You're sending fewer than roughly 10 records (the schema line's overhead isn't justified)
- Objects have varying schemas
- Users see raw prompts or responses
- You need the model to return structured JSON
Conclusion
Token usage is a first-class constraint in LLM-integrated systems—just like memory, compute, or network bandwidth in traditional architectures.
TOON reflects a broader principle: when the cost model changes, long-standing design assumptions should be re-examined. What made sense for human-facing APIs is not always optimal for machine-facing ones.
The libraries are available on GitHub, PyPI, and NuGet. Both include full documentation and 75+ unit tests, and both handle edge cases like nested objects, arrays, and special characters.