GLYPH Benchmark Results
Comprehensive comparison of GLYPH against JSON, ZON, and TOON serialization formats for LLM applications.

Executive Summary
Token Savings
- 48% reduction with GLYPH+Pool
- 5% reduction with standard GLYPH

Size Reduction
- 60% smaller with GLYPH+Pool
- 45% smaller with standard GLYPH

Retrieval Accuracy
- 100% accuracy on large models
- Matches JSON performance

Streaming Support
- 33% faster validation
- Early tool detection at 50% of output tokens
Codec Comparison Matrix
| Codec | Description | Primary Use Case |
|---|---|---|
| JSON | Standard interchange format | Baseline, LLM generation |
| GLYPH | Key=value compact format | LLM context, tool calls |
| ZON | Zig-inspired minimal syntax | Maximum compression |
| TOON | YAML-like indented format | Human readability |
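GLYPH's actual grammar is documented elsewhere; as a rough illustration of why a key=value style shrinks payloads, compare a naive `k=v` rendering (not real GLYPH syntax, purely illustrative) of the same object against its JSON form:

```python
import json

obj = {"name": "Ada", "role": "engineer", "team": "core", "level": 5, "active": True}

as_json = json.dumps(obj)
# Naive key=value join -- NOT real GLYPH, just the general shape of the idea.
as_kv = " ".join(f"{k}={v}" for k, v in obj.items())

# The k=v form drops the quotes, braces, and colons that JSON requires,
# which is where most of the byte savings in this style come from.
```

The savings grow with the number of fields, since every field in JSON carries its own quoting and punctuation overhead.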
Size & Token Comparison
Agent Trace (50 steps)
| Codec | Bytes | vs JSON | Tokens | vs JSON | Round-trip |
|---|---|---|---|---|---|
| JSON | 66,103 | baseline | 15,510 | baseline | ✓ |
| GLYPH | 57,485 | -13% | 14,656 | -5% | ✓ |
| GLYPH+Pool | 26,167 | -60% | 8,090 | -48% | ✓ |
| ZON | 18,367 | -72% | 5,982 | -61% | ✗ |
| TOON | 73,739 | +12% | 18,116 | +17% | ✓ |
GLYPH+Pool provides the best balance of compression and reliability, achieving 60% size reduction while maintaining perfect round-trip safety.
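GLYPH+Pool's wire format is not reproduced in this report, but the underlying idea can be sketched: repeated keys across records are hoisted into a shared pool and referenced by index. All names below are hypothetical, not GLYPH's actual API:

```python
import json

def pool_encode(records):
    """Sketch of key pooling: store each distinct key once in a shared
    pool and refer to it by index inside every record."""
    pool, index, rows = [], {}, []
    for rec in records:
        row = {}
        for key, value in rec.items():
            if key not in index:
                index[key] = len(pool)
                pool.append(key)
            row[index[key]] = value
        rows.append(row)
    return {"pool": pool, "rows": rows}

def pool_decode(blob):
    """Reverse the sketch: resolve pool indices back into key names."""
    pool = blob["pool"]
    return [{pool[int(k)]: v for k, v in row.items()} for row in blob["rows"]]

# A repetitive trace-like payload (illustrative data): 50 steps, same schema.
steps = [{"tool": "search", "status": "ok"} for _ in range(50)]
plain = json.dumps(steps)
pooled = json.dumps(pool_encode(steps))
# The pooled form is smaller because "tool" and "status" are stored once.
```

This is why pooling pays off most on agent traces: the same schema repeats on every step, so key overhead dominates the plain encoding.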
Simple Object
| Codec | Bytes | vs JSON |
|---|---|---|
| JSON | 104 | baseline |
| GLYPH | 70 | -33% |
| ZON | 64 | -38% |
| TOON | 72 | -31% |
Nested Object
| Codec | Bytes | vs JSON |
|---|---|---|
| JSON | 320 | baseline |
| GLYPH | 180 | -44% |
| ZON | 166 | -48% |
| TOON | 216 | -33% |
Tabular Data (5 employees)
| Codec | Bytes | vs JSON |
|---|---|---|
| JSON | 697 | baseline |
| GLYPH | 254 | -64% |
| GLYPH+Pool | 250 | -64% |
| ZON | 209 | -70% |
| TOON | 236 | -66% |
Token Efficiency Analysis
Why tokens matter more than bytes
For LLM applications, token count directly impacts:
- API costs (charged per token)
- Context window usage
- Processing latency
- Model accuracy (fewer tokens = clearer signal)
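To make the cost impact concrete, here is a small sketch that converts the agent-trace token counts above into per-call cost. The price constant is an assumed placeholder, not a real provider rate:

```python
# Illustrative pricing only: PRICE_PER_1K is an assumed placeholder,
# not an actual API rate.
PRICE_PER_1K = 0.005  # dollars per 1,000 input tokens (assumption)

token_counts = {        # agent-trace token counts from the table above
    "JSON": 15_510,
    "GLYPH": 14_656,
    "GLYPH+Pool": 8_090,
}

costs = {codec: tokens / 1000 * PRICE_PER_1K
         for codec, tokens in token_counts.items()}

for codec, cost in costs.items():
    print(f"{codec:11s} ${cost:.4f} per call")
```

Whatever the actual rate, the ratio between codecs is fixed: GLYPH+Pool pays roughly half of what JSON pays on every call carrying this trace.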
Token Savings Across Data Shapes
Streaming Validation Performance
GLYPH supports real-time validation during LLM streaming, enabling early error detection and generation cancellation.
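GLYPH's validator API is not shown here; the sketch below (all names hypothetical) illustrates the general pattern of early tool detection, checking a streamed tool name against an allowlist and rejecting as soon as no allowed tool can match the prefix seen so far:

```python
ALLOWED_TOOLS = {"search", "calculate", "browse", "execute"}

def stream_validate(tokens, allowed=ALLOWED_TOOLS):
    """Consume tokens one at a time. Return ('accept', i) once the tool
    name is confirmed, or ('reject', i) as soon as no allowed tool can
    match the prefix seen so far. '(' is a hypothetical delimiter that
    ends the tool name."""
    name = ""
    for i, tok in enumerate(tokens, start=1):
        if tok == "(":
            return ("accept", i) if name in allowed else ("reject", i)
        name += tok
        # Early rejection: no allowed tool starts with this prefix.
        if not any(t.startswith(name) for t in allowed):
            return ("reject", i)
    return ("incomplete", len(tokens))
```

Rejecting mid-stream is what produces the token and latency savings in the tables below: generation can be cancelled without waiting for the full (invalid) call to finish.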
Early Tool Detection
| Test | Tool Detected At | Total Tokens | Detection Point |
|---|---|---|---|
| search | Token 6 | 12 | 50% |
| calculate | Token 6 | 12 | 50% |
| browse | Token 6 | 12 | 50% |
| execute | Token 6 | 11 | 55% |
Early Rejection
| Test | Unknown Tool | Stopped At | Total Would Be | Tokens Saved |
|---|---|---|---|---|
| delete_all | ✓ | Token 7 | 10+ | 30% |
| rm_rf | ✓ | Token 7 | 10+ | 30% |
| hack_server | ✓ | Token 6 | 10+ | 40% |
Latency Savings
- Tokens saved: 19/29 (66%)
- Time saved: 67ms (33%)
Gzip Compression
| Codec | Raw Bytes | Gzipped | Compression Ratio |
|---|---|---|---|
| JSON | 66,103 | 3,361 | 95% |
| GLYPH | 57,485 | 3,368 | 94% |
| GLYPH+Pool | 26,167 | 2,894 | 89% |
| ZON | 18,367 | 2,849 | 84% |
| TOON | 73,739 | 3,689 | 95% |
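The gzip figures above can be reproduced for any payload with the standard library alone; the data below is illustrative, not the benchmark corpus:

```python
import gzip
import json

# A repetitive trace-like payload (illustrative data, not the benchmark corpus).
payload = json.dumps(
    [{"step": i, "tool": "search", "status": "ok"} for i in range(50)]
).encode()

compressed = gzip.compress(payload)
ratio = 1 - len(compressed) / len(payload)
print(f"raw={len(payload)} B  gzipped={len(compressed)} B  saved={ratio:.0%}")
```

Note that gzip narrows the gap between codecs: repetitive JSON compresses extremely well, so the raw-byte advantage of compact codecs largely applies to uncompressed contexts such as LLM prompts.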
Feature Comparison
| Feature | JSON | GLYPH | ZON | TOON |
|---|---|---|---|---|
| Size Efficiency | ★★☆ | ★★★ | ★★★★ | ★☆☆ |
| Token Efficiency | ★★☆ | ★★★ | ★★★★ | ★☆☆ |
| LLM Retrieval | ★★★★ | ★★★★ | ★★★☆ | ★★★★ |
| LLM Generation | ★★★★ | ★★☆ | ★☆☆ | ★★☆ |
| Human Readability | ★★★☆ | ★★★★ | ★★☆ | ★★★★ |
| Round-trip Safety | ★★★★ | ★★★★ | ★★☆ | ★★★★ |
| Streaming Validation | ★★★★ | ★★★★ | N/A | N/A |
| Tool Call Support | ★★★★ | ★★★★ | ★★☆ | ★★★☆ |
| Parser Availability | ★★★★ | ★★★☆ | ★★☆ | ★★★☆ |
Recommendations by Use Case
Use GLYPH When
- Token budget is constrained
- LLM reads but doesn’t generate
- Tool calls and function arguments
- Human-readable logs/traces needed
- Streaming validation required
- Context window optimization critical
Use GLYPH+Pool When
- Agent traces with repeated schemas
- Storage efficiency is paramount
- Output processed by tools, not LLMs
- Maximum compression needed with safety
Use JSON When
- LLM needs to generate structured output
- Maximum compatibility required
- Interoperating with external systems
- Parser reliability is critical
Use ZON When
- Maximum compression needed
- No round-trip requirement
- Controlled environment (known parser)
- Experimental/research context
Avoid TOON When
- Token efficiency matters (TOON is larger than JSON here)
- Generating precise indentation is difficult
- Data is deeply nested (350% overhead)
Test Methodology
Environment
- Date: December 25, 2024
- Models: Ollama (llama3.2:3b, qwen3:8b, mistral-small:24b)
- Tokenizer: tiktoken (cl100k_base, o200k_base)
- Iterations: 20 per test (5 warmup)
Test Datasets
- simple - Flat object with 5 fields
- nested - 3-level deep nested object
- tabular - Array of 5 employee records
- complex - Nested arrays with departments/projects
- agent_trace - 50-step agent execution trace
Reproduction
Corpus Results (55 Test Cases)
GLYPH-Loose vs ZON vs TOON vs JSON-minified

| Codec | Bytes | cl100k tokens | o200k tokens |
|---|---|---|---|
| ZON | 3,878 | 1,915 | 1,909 |
| GLYPH | 4,224 | 1,895 | 1,873 |
| JSON-min | 5,109 | 2,117 | 2,113 |
| TOON | 5,440 | 2,466 | 2,445 |
Although ZON is smallest in bytes, GLYPH uses the fewest tokens, the metric that matters for LLM API costs.
Related Documentation
LLM Accuracy Report
How LLMs handle GLYPH vs other formats
Performance Report
Parser speed and optimization details