mediatek / breeze-7b-instruct

Model Overview

Description

Breeze-7B-Instruct is an instruction-tuned model derived from the base model Breeze-7B-Base, making it suitable for use as-is on commonly seen tasks.
The current release version of Breeze-7B is v1.0, which has undergone a more refined training process compared to Breeze-7B-v0_1, resulting in significantly improved performance in both English and Traditional Chinese.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Breeze model card for details.

License and Terms of use

GOVERNING TERMS: Your use of this API is governed by the NVIDIA API Trial Service Terms of Use; and the use of this model is governed by the NVIDIA AI Foundation Models Community License.

Model Developer: MediaTek Research

Model Release Date: March 5, 2024.

Features

  • Expanded vocabulary from 32k to 62k tokens to better support Traditional Chinese
  • 8k-token context length
  • Multi-turn dialogue (without special handling for harmfulness)
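Because the network architecture is Mistral-7B, multi-turn prompts are commonly assembled with a Mistral-style `[INST] … [/INST]` template. The sketch below illustrates that convention only; the exact template is an assumption, and the model's tokenizer chat template is the authoritative source for the real format.

```python
def build_prompt(turns, system_prompt=""):
    """Assemble a Mistral-style multi-turn prompt string.

    `turns` is a list of (user, assistant) pairs; the final pair may use
    assistant=None for the turn awaiting a reply. The template here is an
    assumption, not the confirmed Breeze format.
    """
    prompt = "<s>" + system_prompt
    for user, assistant in turns:
        prompt += f" [INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant}"
    return prompt

# Two-turn Traditional Chinese conversation; the second turn awaits a reply.
prompt = build_prompt(
    [("台北有什麼好玩的？", "可以去故宮博物院和陽明山。"),
     ("還有其他推薦嗎？", None)],
)
```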

Benchmark Performance

Breeze-7B-Instruct-v1_0 is compared below with open-source instruction-tuned language models of similar parameter size that are known for strong performance in Chinese.

All results are 0-shot; higher is better. TC = Traditional Chinese, EN = English.

| Model | # Parameters | MT-Bench-tw (Score, TC Chat) | TMMLU+ (ACC, TC Knowledge) | Table (ACC, TC Reasoning) | MT-Bench (Score, EN Chat) | MMLU (ACC, EN Knowledge) |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-3.5-Turbo | – | 7.1 | 43.56 | 45.14 | 7.9 | 67.09 |
| Qwen1.5-7B-Chat | 7B | 6.4 | 45.65 | 34.72 | 7.6 | 61.85 |
| Breeze-7B-Instruct-v1_0 | 7B | 6.0 | 42.67 | 39.58 | 7.4 | 61.73 |
| Mistral-7B-v0.2-Instruct | 7B | 5.6 | 34.95 | 33.33 | 7.6 | 59.97 |
| Yi-6B-Chat | 6B | 5.0 | 44.79 | 25.69 | 6.0 | 59.45 |
| Taiwan-LLM-13B-v2.0-chat | 13B | 5.0 | 29.47 | 23.61 | N/A* | 50.50 |
| Taiwan-LLM-7B-v2.1-chat | 7B | 4.2 | 28.08 | 31.25 | N/A* | 42.72 |

* Taiwan-LLM models respond to multi-turn English questions in Traditional Chinese, so their English MT-Bench scores are not reported.

Details on MT-Bench-tw (0 shot):

| Model | STEM | Extraction | Reasoning | Math | Coding | Roleplay | Writing | Humanities | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-3.5-Turbo | 7.8 | 6.1 | 5.1 | 6.4 | 6.2 | 8.7 | 7.4 | 9.3 | 7.1 |
| Qwen1.5-7B-Chat | 9.0 | 5.6 | 4.7 | 2.8 | 3.7 | 8.0 | 8.0 | 9.4 | 6.4 |
| Breeze-7B-Instruct-v1_0 | 7.8 | 5.2 | 4.2 | 4.2 | 4.1 | 7.6 | 5.9 | 9.1 | 6.0 |
| Mistral-7B-v0.2-Instruct | 6.9 | 4.6 | 4.3 | 3.3 | 4.4 | 7.2 | 6.2 | 7.8 | 5.6 |
| Yi-6B-Chat | 7.3 | 2.7 | 3.1 | 3.3 | 2.3 | 7.2 | 5.2 | 8.8 | 5.0 |
| Taiwan-LLM-13B-v2.0-chat | 6.1 | 3.4 | 4.1 | 2.3 | 3.1 | 7.4 | 6.6 | 6.8 | 5.0 |
| Taiwan-LLM-7B-v2.1-chat | 5.2 | 2.6 | 2.3 | 1.2 | 3.4 | 6.6 | 5.7 | 6.8 | 4.2 |

Details on TMMLU+ (0 shot):

| Model | STEM | Social Science | Humanities | Other | AVG |
| --- | --- | --- | --- | --- | --- |
| GPT-3.5-Turbo | 41.58 | 48.52 | 40.96 | 43.18 | 43.56 |
| Qwen1.5-7B-Chat | 41.48 | 51.66 | 44.05 | 45.40 | 45.65 |
| Breeze-7B-Instruct-v1_0 | 36.46 | 48.38 | 45.11 | 40.75 | 42.67 |
| Mistral-7B-v0.2-Instruct | 32.79 | 38.05 | 34.89 | 34.04 | 34.94 |
| Yi-6B-Chat | 37.80 | 51.74 | 45.36 | 44.25 | 44.79 |
| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 | 29.47 |
| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 | 28.08 |

Model Architecture

  • Architecture Type: Causal decoder-only transformer language model
  • Network Architecture: Mistral-7B

Input

  • Input Type: Text
  • Input Format: String
  • Input Parameters: max_tokens, temperature, top_p, stop, frequency_penalty, presence_penalty, seed

Output

  • Output Type: Text
  • Output Format: String
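
The input parameters listed above map onto a standard chat-completions-style request body. The sketch below shows how such a payload could be assembled; the model identifier string and all parameter values are illustrative assumptions, not confirmed endpoint details.

```python
import json

# Hypothetical chat-completions request body using the documented input
# parameters; values are illustrative, not recommended settings.
payload = {
    "model": "mediatek/breeze-7b-instruct",
    "messages": [
        {"role": "user", "content": "請用繁體中文介紹台北的三個景點。"}
    ],
    "max_tokens": 512,         # cap on generated tokens
    "temperature": 0.7,        # sampling randomness
    "top_p": 0.9,              # nucleus-sampling threshold
    "stop": None,              # optional stop sequence(s)
    "frequency_penalty": 0.0,  # penalize frequent repetition
    "presence_penalty": 0.0,   # penalize reuse of earlier tokens
    "seed": 42,                # best-effort reproducibility
}

# Serialize for an HTTP POST; ensure_ascii=False keeps Traditional
# Chinese characters readable in the request body.
body = json.dumps(payload, ensure_ascii=False)
```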

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Software Integration

  • Supported Hardware Platform(s): Lovelace

Supported Operating System(s)

  • Linux

Model Version

Breeze-7B-Instruct-v1_0

Inference

Engine: Triton + TensorRT-LLM

Test Hardware: L40