Teuken-7B-instruct-commercial-v0.4 Overview

Description

Teuken-7B-instruct-commercial-v0.4 is an instruction-tuned, 7B-parameter multilingual large language model (LLM) pre-trained on 4 trillion tokens across all 24 official European Union languages. It is specifically designed to deliver more stable and culturally relevant results across these languages than models that focus primarily on English.

This model is ready for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the non-NVIDIA model card: openGPT-X/Teuken-7B-instruct-commercial-v0.4.

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. The model is governed by the NVIDIA Community Model License Agreement; ADDITIONAL INFORMATION: Apache License Version 2.0.

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Deployment Geography

Global

Use Case

Teuken-7B-instruct-commercial-v0.4 is designed for a range of multilingual natural language processing tasks, with a strong emphasis on serving both commercial and research needs across the 24 official languages of the European Union.
Its primary use case is to provide a powerful and culturally-aware language model that performs reliably across European languages, which are often underrepresented in other large language models.

Specific use cases include:

  • Multilingual Chatbots and Virtual Assistants: Ideal for developing conversational AI for international customer service and enabling businesses to communicate with customers in their native language.
  • Text Generation and Content Creation: The model can generate articles, marketing copy, reports, and other written content in any of the 24 official European Union (EU) languages.
  • Document Summarization: It can be used to create concise summaries of long documents, reports, or articles, saving time and effort in multilingual environments.
  • Information Extraction: The model can identify and pull specific information from unstructured text across various languages, which is useful for data analysis and business intelligence.
  • Retrieval-Augmented Generation (RAG): It is well-suited for integration into RAG systems, where it can query a knowledge base to provide informed answers in multiple languages; a minimal sketch follows this list.
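
As a minimal sketch of the RAG pattern (the retrieve() helper and its placeholder passages are hypothetical, not part of the model's API):

def retrieve(query: str) -> list[str]:
    # Placeholder retriever; replace with a real vector-store or search lookup.
    return ["(retrieved passage 1)", "(retrieved passage 2)"]

def build_rag_messages(query: str) -> list[dict]:
    # Fold retrieved passages into a single user turn that can then be passed
    # to the model via its chat template (see the Usage section below).
    context = "\n".join(retrieve(query))
    content = (
        "Answer the question using the context.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return [{"role": "User", "content": content}]

messages = build_rag_messages("Which languages does Teuken-7B cover?")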

Limitations

The developers explicitly state that this model is not intended for tasks that require mathematical reasoning or code generation. Its core strength lies in its linguistic capabilities across its trained languages.

Release Date

Hugging Face: 10/25/2024 via openGPT-X/Teuken-7B-instruct-commercial-v0.4

Build.NVIDIA.com: 07/25/2025 via link

Model Architecture

Architecture Type: Transformer

Network Architecture: Teuken-7B-Instruct

This model was developed based on openGPT-X/Teuken-7B-base. It is an instruction-tuned version of that base model, fine-tuned on a high-quality multilingual dataset using the axolotl framework. Key design choices for the training process focused on optimizing multilingual performance; an illustrative configuration sketch follows the list below.

  • Model Optimization: The training was performed with bfloat16 precision and used the paged_adamw_8bit optimizer.
  • Hyperparameter Tuning:
    • Learning Rate: 2 × 10⁻⁵
    • Learning Rate Scheduler: Cosine
    • Batch Size: 64
    • Warmup Steps: 100
  • Training Parameters:
    • Epochs: 2
    • Sequence Length: 4096 tokens
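
As an illustration only: the actual training used the axolotl framework, but the listed hyperparameters map roughly onto Hugging Face TrainingArguments as sketched below. The output_dir is a hypothetical path, and the 4,096-token sequence length would be enforced at tokenization rather than in this object.

from transformers import TrainingArguments

# Rough mapping of the listed hyperparameters onto TrainingArguments.
# Assumption: single-device training, so the per-device batch size equals
# the effective batch size of 64.
training_args = TrainingArguments(
    output_dir="teuken-7b-instruct-sft",  # hypothetical output path
    bf16=True,                            # bfloat16 precision
    optim="paged_adamw_8bit",             # paged 8-bit AdamW optimizer
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=64,
    warmup_steps=100,
    num_train_epochs=2,
)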

Input

Input Type(s): Text

Input Format(s): Strings

Input Parameters: One-Dimensional (1D)

Other Properties Related to Input: Max Input Tokens: 4,096

Output

Output Type(s): Text

Output Format(s): Strings

Output Parameters: One-Dimensional (1D)

Other Properties Related to Output: Max Output Tokens: 4,096

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration

Runtime Engine: vLLM, TensorRT

Supported Hardware Microarchitecture Compatibility

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Hopper
  • NVIDIA Ada Lovelace

Preferred Operating System(s)

Linux

Model Version(s)

Teuken-7B-instruct-commercial-v0.4

Usage

Prerequisites

The model requires a few libraries that can be installed in your Python environment:

python -m pip install numpy torch huggingface_hub transformers sentencepiece

After installation, the model can be used as shown below.

As this is a fine-tuned model, it must be used with the provided prompt template; using the model without the prompt template is not intended and is not recommended. The prompt template is defined as follows:

user="Hi!"
lang_code = "DE"
system_messages={
            "EN": "A chat between a human and an artificial intelligence assistant."
            " The assistant gives helpful and polite answers to the human's questions.",
            "DE": "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz."
            " Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.",
        }
 
prompt = f"System: {system_messages[lang_code]}\nUser: {user}\nAssistant:"

The prompt template is also directly integrated in the Tokenizer and can be used as follows:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "openGPT-X/Teuken-7B-instruct-commercial-v0.4"

# Load the model in bfloat16; trust_remote_code is required for the custom
# model and tokenizer code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model = model.to(device).eval()

# The slow tokenizer (use_fast=False) carries the built-in chat templates.
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_fast=False,
    trust_remote_code=True,
)

# Build the prompt with the German ("DE") chat template and generate.
messages = [{"role": "User", "content": "Wer bist du?"}]
prompt_ids = tokenizer.apply_chat_template(
    messages, chat_template="DE", tokenize=True,
    add_generation_prompt=True, return_tensors="pt",
)
prediction = model.generate(
    prompt_ids.to(model.device),
    max_length=512,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=1,
)
prediction_text = tokenizer.decode(prediction[0].tolist())
print(prediction_text)

This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.
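
To generate in a different language, switch the chat_template argument. A minimal variation (assuming, as the system-message example above suggests, that an "EN" template ships alongside "DE"; it reuses the model and tokenizer loaded above):

messages = [{"role": "User", "content": "Who are you?"}]
prompt_ids = tokenizer.apply_chat_template(
    messages, chat_template="EN", tokenize=True,
    add_generation_prompt=True, return_tensors="pt",
)
prediction = model.generate(prompt_ids.to(model.device), max_length=512)
print(tokenizer.decode(prediction[0].tolist()))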

Running the Model with vLLM

Starting the vLLM Server:

vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code

Use the Chat API with vLLM and pass the language of the chat template in the extra body:

from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
    extra_body={"chat_template": "DE"},  # select the German chat template
)
print(f"Assistant: {completion.choices[0].message.content}")

The default language of the chat template can also be set when starting the vLLM server. To do this, create a new file named lang with the content DE, then start the vLLM server as follows:

vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code --chat-template lang
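
With a default template configured this way, the extra_body argument from the earlier example can be dropped; a minimal sketch against the same local server:

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# No extra_body needed: the server falls back to the default chat template ("DE").
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
)
print(f"Assistant: {completion.choices[0].message.content}")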

Running the Model with vLLM offline Batched Inference

from vllm import LLM, SamplingParams

# Near-greedy decoding; stop on the end-of-sequence token.
sampling_params = SamplingParams(temperature=0.01, max_tokens=1024, stop=["</s>"])
llm = LLM(model="openGPT-X/Teuken-7B-instruct-commercial-v0.4", trust_remote_code=True, dtype="bfloat16")
outputs = llm.chat(
    messages=[{"role": "User", "content": "Hallo"}],
    sampling_params=sampling_params,
    chat_template="DE",
)
print(f"Prompt: {outputs[0].prompt}")
print(f"Assistant: {outputs[0].outputs[0].text}")

Training, Testing, and Evaluation Datasets

Training Dataset:

  • Link: Undisclosed
  • Data Collection Method by dataset: Hybrid: Human, Automated
  • Labeling Method by dataset: Hybrid: Human, Automated
  • Properties: The base model, Teuken-7B-base-v0.4, was pre-trained on 4 trillion tokens from publicly available sources, with a data cutoff of September 2023. The instruction-tuned model was fine-tuned on a collection of datasets in English, German, and 22 other official European languages.

Testing Dataset

  • Link: Undisclosed
  • Data Collection Method by dataset: Hybrid: Human, Automated
  • Labeling Method by dataset: Hybrid: Human, Automated
  • Properties: The model was evaluated on translated versions of several established benchmark datasets, including HellaSwag, ARC (AI2 Reasoning Challenge), and TruthfulQA. These benchmarks were translated into 21 official EU languages to assess the model's multilingual capabilities. The evaluation aimed to measure performance on tasks such as commonsense reasoning, question answering, and truthfulness.

Evaluation Dataset

  • Link: Undisclosed
  • Data Collection Method by dataset: Hybrid: Human, Automated
  • Labeling Method by dataset: Human
  • Properties: Undisclosed

Benchmark Results

Results on multilingual benchmarks for 21 European languages with instruction-tuned models:

Model                                 Avg.   EU21-ARC  EU21-HeSw  EU21-TQA  EU21-MMLU
Meta-Llama-3.1-8B-Instruct            0.563  0.563     0.579      0.532     0.576
Mistral-7B-Instruct-v0.3              0.527  0.530     0.538      0.548     0.491
Salamandra-7B-Instruct                0.543  0.595     0.637      0.482     0.459
Aya-23-8B                             0.485  0.475     0.535      0.476     0.455
Occiglot-7B-eu5-Instruct              0.475  0.484     0.519      0.471     0.428
Pharia-1-LLM-7B-C-A                   0.417  0.396     0.438      0.469     0.366
Bloomz-7B1                            0.358  0.316     0.354      0.461     0.302
Teuken-7B-instruct-commercial-v0.4    0.531  0.569     0.620      0.503     0.430

Inference

Acceleration Engine: vLLM

Test Hardware:

  • NVIDIA L40S

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.