thudm / chatglm3-6b

Model Overview

Description

ChatGLM3 is the latest generation of pre-trained dialogue models jointly released by Zhipu AI and Tsinghua KEG. ChatGLM3-6B is the open-source model in the ChatGLM3 series, and it retains many excellent features of the first two generations, such as smooth dialogue and a low deployment threshold. It adopts a more diverse training dataset and a newly designed prompt format, and its open-source series is more comprehensive. The weights are fully open for academic research, and free commercial use is also allowed after registration via a questionnaire.
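
For quick verification, the block below sketches loading the model with the Hugging Face transformers library and running one dialogue turn via the `chat` helper exposed by THUDM's custom modeling code. It follows the usage pattern in THUDM's GitHub README; exact arguments may differ across model revisions, so treat it as a minimal sketch rather than a canonical recipe.

```python
# Minimal sketch: load chatglm3-6b and run a single dialogue turn.
# Assumes a CUDA GPU with enough memory for the fp16 weights (~13 GB);
# the `chat` helper comes from THUDM's remote code (trust_remote_code=True).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Hello, what can you do?", history=[])
print(response)
```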

Terms and Conditions

We hereby declare that our team has not developed any applications based on the ChatGLM3 models, whether for iOS, Android, the web, or any other platform. We strongly urge all users not to use the ChatGLM3 models for any activities that harm national or social security or violate the law. We also ask users not to use the ChatGLM3 models for Internet services that have not undergone appropriate security reviews and filings. We hope that all users will abide by this principle to ensure that technology develops in a regulated and legal environment.

If any problems arise from the use of the ChatGLM3 open-source models, including but not limited to data security issues, public opinion risks, or any risks and problems caused by the models being misled, abused, disseminated, or improperly exploited, we will not assume any responsibility.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see the link to THUDM's model card in the References section below.

License, Acceptable Use, and Research Privacy Policy

By using this model, you are agreeing to the terms and conditions of the Apache 2.0 license and the ChatGLM3-6B Model License.

The weights are for internal study and testing purposes only. Please follow the Model License and Terms of Use, and contact Zhipu AI (https://open.bigmodel.cn/mla/form) for any commercial use.

The ChatGLM3-6B open-source model aims to promote the development of large-model technology. Developers and users are earnestly requested to comply with the following rules:

  1. Comply with applicable laws, regulations, and policies.
  2. Don't perform or facilitate any activities that may harm the safety, well-being, or rights of others, and don't use the model for any services that have not been evaluated and registered.
  3. Don't distribute output from the ChatGLM3-6B open-source model or its services to misinform, misrepresent, or mislead others, or use it for any improper purpose.

Although every effort has been made to ensure the compliance and accuracy of the data at every stage of model training, the accuracy of output content cannot be guaranteed, owing to the smaller scale of the ChatGLM3-6B model and the influence of probabilistic randomness. The model's output is also easily misled by user input. Developers and users should build safeguards and assume the risks and liabilities that arise from using the open-source models and code.

References

GitHub: https://github.com/THUDM/ChatGLM3

HuggingFace: https://huggingface.co/THUDM/chatglm3-6b

Technical Report: https://arxiv.org/abs/2103.10360

Model Developer: Knowledge Engineering Group (KEG) & Data Mining (THUDM), Tsinghua University

Model Release Date: October 22, 2023

Model Architecture:

Architecture Type: Transformer

Fine-tuned from model: ChatGLM3-6B base model

Input:

Input Type: Text

Input Format: String

Input Parameters: Temperature, Top P, Max Output Tokens
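
As a hedged illustration of how these parameters map onto a generation call, the sketch below passes temperature, top-p, and a token budget through the `chat` helper, continuing from the `model` and `tokenizer` created in the earlier sketch. The keyword names (`temperature`, `top_p`, `max_length`) follow THUDM's modeling code but may vary between model revisions, and `max_length` caps total tokens rather than output tokens alone.

```python
# Sketch: mapping the documented input parameters onto a chat call.
# `temperature` and `top_p` control sampling; `max_length` bounds the total
# token count (an approximation of "Max Output Tokens" -- revision-dependent).
response, history = model.chat(
    tokenizer,
    "Summarize the ChatGLM3-6B license terms in one sentence.",
    history=[],
    do_sample=True,
    temperature=0.8,
    top_p=0.8,
    max_length=1024,
)
print(response)
```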

Output:

Output Type: Text

Output Format: String

Model Version: ChatGLM3-6B v1.1.0

Evaluation

ChatGLM3-6B was evaluated on 8 typical Chinese and English datasets spanning general language understanding, mathematics, code, and multilingual translation. For more detailed evaluation results for the original models, please refer to the GitHub repository.

Inference:

Engine: Triton TensorRT-LLM

Test Hardware: NVIDIA L40
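
As a deployment-side illustration, the sketch below queries a Triton Inference Server running the TensorRT-LLM backend through Triton's HTTP generate extension. The model name (`ensemble`), the host/port, and the tensor names (`text_input`, `max_tokens`, `temperature`, `top_p`, `text_output`) follow the backend's common example configuration but are deployment-specific assumptions; check your server's model repository before use.

```python
# Sketch: querying a Triton + TensorRT-LLM deployment over HTTP.
# The endpoint path uses Triton's generate extension; the model name
# "ensemble" and the tensor names below are assumptions taken from the
# TensorRT-LLM backend's common example configuration.
import requests

TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"  # hypothetical host/port

payload = {
    "text_input": "What is ChatGLM3-6B?",
    "max_tokens": 256,
    "temperature": 0.8,
    "top_p": 0.8,
}

resp = requests.post(TRITON_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["text_output"])  # output tensor name is also deployment-specific
```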