igenius / italia_10b_instruct_16k

Model Overview

Description:

Italia-10B is a large language model (LLM) suited to use cases in regulated industries such as financial services, government, and heavy industry. It supports multilingual single-turn and multi-turn chat, with a context length of 16,384 tokens.

The base model was pre-trained on a corpus of approximately 9 trillion tokens drawn from diverse English texts, more than 50 natural languages, and a wide range of programming languages. It then went through alignment steps including:

  • Supervised Fine-tuning (SFT)
  • Direct Preference Optimization (DPO; the standard objective is sketched below)
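
For reference, the DPO objective below is the standard formulation from Rafailov et al. (2023); the card does not disclose iGenius's exact training configuration, so read it as the textbook loss rather than the team's specific recipe:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]

Here y_w and y_l are the preferred and dispreferred responses to prompt x, \pi_{\mathrm{ref}} is the frozen SFT policy, \sigma is the logistic function, and \beta controls how far \pi_\theta may drift from \pi_{\mathrm{ref}}.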

The pre-training dataset consists primarily of web documents, open-access repositories such as arXiv and PubMed Central, code from GitHub, and similar sources.

The model supports over 50 languages, with a strong focus on European languages such as German, French, Italian, Spanish, Portuguese, Russian, Romanian, and Polish.

Additionally, the training data incorporates high-quality specialized sources from domains such as finance and reasoning to strengthen performance in those areas.

This model is for research and development only. For commercial use, please follow the Terms of Use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case.

License/Terms of Use:

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the License Agreement for Colosseum.

Model Architecture:

Architecture Type: Transformer Decoder (auto-regressive language model)

Network Architecture: Italia-10B

Input:

Input Type(s): Text

Input Format: String

Input Parameter(s): One-Dimensional (1D)

Output:

Output Type(s): Text

Output Format: String

Output Parameter(s): One-Dimensional (1D)

Model Version(s):

Italia-10B v1.0

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Hopper

Supported Operating System(s):

  • Linux
  • Windows

Inference:

Engine(s): TensorRT-LLM, Triton, vLLM

  • BF16, single GPU: A100, H100, A10
  • FP8, single GPU: H100, L40S
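
As one concrete route among the engines above, here is a minimal offline-inference sketch with vLLM. It assumes the checkpoint is available in a format vLLM can load; the model identifier "igenius/Italia-10B" is hypothetical, so point it at your actual checkpoint path or hub ID. The prompt string follows the format documented below.

from vllm import LLM, SamplingParams

# Hypothetical model ID; replace with your local checkpoint path or hub ID.
llm = LLM(model="igenius/Italia-10B", dtype="bfloat16")  # BF16 fits on a single A100/H100/A10

params = SamplingParams(temperature=0.2, max_tokens=256)
prompt = "<extra_id_0>System\n\n<extra_id_1>User\nWhat is the capital of Italy?\n<extra_id_1>Assistant\n"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)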

Prompt Format

Single Turn

<extra_id_0>System

<extra_id_1>User
{prompt}
<extra_id_1>Assistant

Multi-Turn or Few-shot

<extra_id_0>System

<extra_id_1>User
{prompt 1}
<extra_id_1>Assistant
{response 1}
<extra_id_1>User
{prompt 2}
<extra_id_1>Assistant
{response 2}
...
<extra_id_1>User
{prompt N}
<extra_id_1>Assistant
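
A minimal Python helper for assembling prompts in this format is sketched below. It assumes plain string templating is sufficient; this card does not document an official tokenizer chat template, so verify the output against your serving stack.

def build_prompt(turns, system=""):
    """Assemble an Italia-10B prompt from (user, assistant) turn pairs.

    Pass None as the assistant reply of the final turn to leave it
    open for the model to complete.
    """
    parts = [f"<extra_id_0>System\n{system}\n"]
    for user, assistant in turns:
        parts.append(f"<extra_id_1>User\n{user}\n<extra_id_1>Assistant\n")
        if assistant is not None:
            parts.append(f"{assistant}\n")
    return "".join(parts)

# Single turn:
print(build_prompt([("What is the capital of Italy?", None)]))

# Multi-turn / few-shot:
print(build_prompt([
    ("Translate 'hello' to Italian.", "Ciao."),
    ("And 'goodbye'?", None),
]))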

Evaluation Results

MT-Bench (GPT-4-Turbo)
Evaluated using MT-Bench with GPT-4-0125-Preview as the judge, as described in Appendix H of the HelpSteer2 dataset paper.

6.46

IFEval
Evaluated using the Instruction Following Eval (IFEval) introduced in Instruction-Following Evaluation for Large Language Models.

Prompt-Strict Acc: 57.31
Instruction-Strict Acc: 68.23

MMLU
Evaluated using the Multi-task Language Understanding benchmarks as introduced in Measuring Massive Multitask Language Understanding.

5-shot: 63.5

ARC-C
The AI2 Reasoning Challenge (ARC-C) dataset is a multiple-choice question-answering dataset containing questions from science exams for grades 3 through 9.

5-shot: 84.1

Usage

Deployment and inference with Italia-10B can be done in several ways (a hosted-API sketch follows the list):

  • Deployment of TensorRT-LLM engines with Triton using the TensorRT-LLM backend (single GPU)
  • Deployment as a NIM (NVIDIA Inference Microservice)
  • Deployment with PyTriton
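
For hosted trial inference, the sketch below uses the OpenAI-compatible Python client against the NVIDIA API catalog. The base URL and model ID are assumptions based on the catalog's usual conventions; confirm both on the model's API page before use.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed trial endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # export your key before running
)

completion = client.chat.completions.create(
    model="igenius/italia_10b_instruct_16k",  # assumed to match this card's model ID
    messages=[{"role": "user", "content": "What is the capital of Italy?"}],
    temperature=0.2,
    max_tokens=256,
)
print(completion.choices[0].message.content)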

Limitations

The model was trained on data collected from the internet, which may contain language that is biased or inappropriate. As a result, the model might occasionally reflect these biases or generate responses that are inaccurate, omit key information, or include irrelevant or redundant text. There is also the possibility that it could produce content that is socially unacceptable or undesirable, even if the prompt does not include any offensive material.

Ethical Considerations:

We believe that developing trustworthy AI is a shared responsibility, and we have established policies and practices to support the development of a wide array of AI applications. When using this model in accordance with our terms of service, developers are encouraged to work with their internal teams to ensure the model meets the requirements of their specific industry and use case, and to address any potential misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.