Controlling LLM Language Output with llama.cpp and GBNF
TLDR
Use llama.cpp and GBNF (GGML BNF) to precisely control the language and structure of LLM-generated text. Enforce specific languages, create valid JSON, or even speak only in emojis!
Summary
This blog post explores how to leverage llama.cpp and GBNF to control the output of large language models (LLMs). We'll cover what GBNF grammars are, how to set up llama.cpp with GBNF support, and how to use them to constrain model outputs. By the end of this post, you'll have a solid understanding of how to use GBNF to create more reliable and predictable LLM applications.
llama.cpp: The Inference Engine
Llama.cpp is a lightweight and efficient C++ library for running large language models (LLMs). It was originally created to run the LLaMA model, but it now supports a wide range of LLMs. Llama.cpp is designed to be easy to use and deploy, making it a popular choice for running LLMs on various hardware platforms.
Key Features of llama.cpp
- Efficiency: Llama.cpp is highly optimized for performance.
- Cross-Platform: Llama.cpp supports various operating systems and hardware platforms.
- Ease of Use: Llama.cpp is designed to be easy to use, with a simple API.
- GBNF Support: Llama.cpp has built-in support for GBNF grammars.
GBNF Grammar
GBNF (GGML BNF) is a format for defining formal grammars to constrain model outputs in llama.cpp. It allows you to force the model to generate outputs that conform to a specific grammar. For example, you can use it to force the model to generate valid JSON, or to speak only in emojis. GBNF grammars are supported in various ways in `examples/main` and `examples/server` within the llama.cpp repository.
GBNF is primarily used to constrain the output of language models to follow a specific structure or format. This is useful for generating structured data like JSON, code, or data in a specific language.
Understanding GBNF Syntax
GBNF grammars are based on a subset of the Backus-Naur Form (BNF), a notation technique for context-free grammars. A BNF grammar consists of a set of production rules that specify how symbols can be expanded into other symbols. These rules define the valid structure of the output.
The basic syntax is `rule ::= expression`.
Here's a simple example of a GBNF grammar for generating integers:
root ::= integer
integer ::= [+-]? [0-9]+
In this grammar:
- `root` and `integer` are non-terminal symbols.
- `[+-]?` means an optional plus or minus sign.
- `[0-9]+` means one or more digits from 0 to 9.

This grammar specifies that an `integer` can be an optional sign followed by one or more digits.
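As a sanity check outside llama.cpp, this particular rule happens to correspond to a simple regular expression; a quick shell sketch (the regex translation is mine, not part of GBNF):

```shell
# The grammar root ::= [+-]? [0-9]+ corresponds to the POSIX
# extended regex ^[+-]?[0-9]+$ when matching whole strings.
for s in "42" "-7" "+300" "abc" "4.2"; do
  if printf '%s\n' "$s" | grep -qE '^[+-]?[0-9]+$'; then
    echo "$s: match"
  else
    echo "$s: no match"
  fi
done
```

The first three strings match; `abc` and `4.2` are rejected, since `.` is not in the digit class. GBNF rules work the same way on the model's token stream rather than on finished strings.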
Operators
GBNF supports several operators:
- `|` for alternation
- `()` for grouping
- `[]` for character ranges and classes (e.g., `[0-9]`, `[a-zA-Z]`)
- `*`, `+`, and `?` for zero-or-more, one-or-more, and optional repetition
- `""` for literal strings
- `{m,n}`-style bounded repetition is also accepted by newer versions of llama.cpp

Comments start with `#`.
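To see these operators side by side, here is a small illustrative grammar written for this post (not taken from the llama.cpp repository):

```
# A toy grammar exercising the operators above.
root      ::= ws answer ("." | "!")?   # grouping, alternation, optional
answer    ::= "yes" | "no" | stretched # literal strings
stretched ::= "n" [o]+                 # character class, one or more
ws        ::= [ \t]*                   # zero or more
```

This accepts outputs such as `yes`, `no!`, or `nooo.`, and nothing else.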
Use Cases
- JSON Generation: Enforce the generation of valid JSON data.
- Code Generation: Generate code in a specific programming language.
- Dialogue Systems: Control the flow of a conversation by defining a grammar for allowed responses.
- Data Extraction: Extract data from text by defining a grammar for the expected data format.
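As a taste of the JSON use case, here is a deliberately minimal object grammar (an illustrative subset written for this post; llama.cpp ships a complete `json.gbnf` in its `grammars/` directory):

```
root   ::= object
object ::= "{" ws ( pair ("," ws pair)* )? ws "}"
pair   ::= string ":" ws value
value  ::= string | number | object
string ::= "\"" [a-zA-Z0-9 ]* "\""
number ::= [0-9]+
ws     ::= [ \t\n]*
```

Even this tiny subset guarantees balanced braces, quoted keys, and well-formed nesting, which is the hard part of getting reliable JSON out of an unconstrained model.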
llama.cpp's GBNF Option Parameters
Llama.cpp provides the following command-line options for using GBNF grammars:
- `--grammar`: specifies a GBNF grammar inline as a string.
- `--grammar-file`: specifies the path to a GBNF grammar file.
Other relevant parameters that might indirectly affect GBNF behavior:
- `-m` or `--model`: specifies the path to the LLM model file.
- `-p` or `--prompt`: specifies the prompt to use for generating text.
- `-n` or `--n-predict`: specifies the number of tokens to predict.
- `--temp`: controls the sampling temperature.
- `--top-k`: restricts sampling to the K most likely tokens at each step.
- `--top-p`: samples from the smallest set of tokens whose cumulative probability exceeds P.
How to Use GBNF with llama.cpp
Here's a step-by-step guide on how to use GBNF with llama.cpp:
- Create a GBNF grammar file: define your grammar using the GBNF syntax and save it with a `.gbnf` extension (e.g., `json.gbnf`).
- Download an LLM model: download a compatible model file (e.g., `llama-2-7b.Q4_K_M.gguf`).
- Run llama.cpp with the grammar and model: use the `--grammar-file` option to specify the path to the grammar file, and the `-m` or `--model` option to specify the path to the model file.
For example:
./main -m models/llama-2-7b.Q4_K_M.gguf --grammar-file grammars/json.gbnf -p "Generate a JSON object:"
This command will run llama.cpp with the `llama-2-7b.Q4_K_M.gguf` model and the `json.gbnf` grammar, and prompt the model to generate a JSON object.
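Putting the steps together, a minimal end-to-end shell sketch (file names are mine, and the model invocation is shown as a comment because the model path is a placeholder):

```shell
# Step 1: write a small grammar file (here: integers only).
mkdir -p grammars
cat > grammars/integer.gbnf <<'EOF'
root    ::= integer
integer ::= [+-]? [0-9]+
EOF

# Step 3: run llama.cpp against it. Left commented out since it
# requires a downloaded model file at a real path:
# ./main -m models/llama-2-7b.Q4_K_M.gguf \
#        --grammar-file grammars/integer.gbnf \
#        -p "Pick a number:" -n 8
echo "wrote grammars/integer.gbnf"
```

With the grammar attached, every token the model emits is checked against the rules, so the completion can only ever be an optionally signed integer.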
Result Comparison with Qwen Model (English, Chinese)
In this section, we'll compare the results of using the Qwen2.5 7B Instruct model with and without GBNF. We'll use the task of responding to a greeting in Chinese.
Responding to a Chinese Greeting
Prompt: Say "Hi" in Chinese (你好) to the model.
Without GBNF:
[Image Placeholder: Qwen2.5 7B instruction model output in Chinese without GBNF]
The model responds in Chinese.
With GBNF (English Only):
[Image Placeholder: Qwen2.5 7B instruction model output in English with GBNF enforcing English language]
The model responds in English, even though the prompt was in Chinese.
This demonstrates how GBNF can be used to enforce a specific language, even when the input is in a different language.
GBNF Grammars
English:
root ::= en-char+ ([ \t\n] en-char+)*
en-char ::= letter | digit | punctuation
letter ::= [a-zA-Z]
digit ::= [0-9]
punctuation ::= [!"#$%&'()*+,-./:;<=>?@\[\\\]^_`{|}~]
Japanese:
root ::= jp-char+ ([ \t\n] jp-char+)*
jp-char ::= hiragana | katakana | punctuation | cjk
hiragana ::= [ぁ-ゟ]
katakana ::= [ァ-ヿ]
punctuation ::= [、-〾]
cjk ::= [一-鿿]
Chinese:
root ::= zh-char+ ([ \t\n] zh-char+)*
zh-char ::= cjk | zh-punctuation | digit | letter
cjk ::= [一-鿿]
zh-punctuation ::= [,。?!;:「」『』()、—…《》〈〉【】,.;:?!'"\[\](){}<>/\\-]
digit ::= [0-9]
letter ::= [a-zA-Z]
Korean:
root ::= ko-char+ ([ \t\n] ko-char+)*
ko-char ::= hangul | ko-punctuation | digit | letter | hanja
hangul ::= [가-힣]
hanja ::= [一-鿿]
ko-punctuation ::= [.,!?…‘’“”":;()\[\]{}<>《》「」『』,.;:?!'"/\\-]
digit ::= [0-9]
letter ::= [a-zA-Z]
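The effect of the English-only grammar can be approximated outside llama.cpp with a character-class check; a shell sketch (the regex analogue is mine and only mirrors the grammar's character set, not its token-level enforcement):

```shell
# Accept only ASCII letters, digits, punctuation, and whitespace,
# roughly mirroring the en-char rule above.
is_english_only() {
  printf '%s' "$1" | LC_ALL=C grep -qE '^[a-zA-Z0-9[:punct:][:space:]]+$'
}

is_english_only "Hello, world!" && echo "english-only"
is_english_only "你好, world" || echo "contains non-English characters"
```

The same pattern (swap in the Hiragana, Hangul, or CJK ranges) gives a quick way to verify that a grammar-constrained run really stayed within the intended script.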
Use Cases (American, Japanese, Chinese, Korean)
- American English: Enforce the use of American English spelling and grammar in customer service chatbots. For example, ensure that the chatbot uses "color" instead of "colour." A GBNF grammar could enforce specific vocabulary and spelling rules.
- Japanese: Generate Japanese product descriptions that adhere to specific marketing guidelines and tone. For example, ensure the use of polite language and specific keywords. A GBNF grammar could enforce the use of specific honorifics and sentence structures.
- Chinese: Create Chinese dialogue systems that adhere to specific politeness levels and social contexts. For example, use different greetings and honorifics depending on the relationship between the speakers. A GBNF grammar could enforce the use of specific vocabulary and sentence structures for different social situations.
- Korean: Generate Korean news articles that adhere to specific journalistic style guidelines. For example, ensure the use of formal language and avoid slang or colloquialisms. A GBNF grammar could enforce the use of specific vocabulary and grammatical structures for news reporting.
Conclusion
In conclusion, GBNF provides a powerful mechanism for controlling the language and structure of LLM-generated text in llama.cpp. By defining formal grammars, you can enforce specific languages, create structured data, and build more reliable and predictable LLM applications. While creating comprehensive GBNF grammars for full language control can be complex, the basic principles are straightforward, and the benefits of using GBNF for specific use cases are significant. We encourage you to experiment with GBNF and explore its potential for your own LLM projects.