nvidia / maisi

Model Overview

Description:

NVIDIA MAISI (Medical AI for Synthetic Imaging) is a state-of-the-art three-dimensional (3D) latent diffusion model for generating high-quality synthetic CT images with or without anatomical annotations. It is suited to data augmentation and to creating realistic medical imaging data that supplements datasets limited by privacy concerns or by the rarity of a condition. The diverse, realistic training data it generates can also improve the performance of other medical imaging AI models.

MAISI offers several key features:

Generates high-resolution 3D CT images up to 512 × 512 × 768 voxels
Supports variable voxel sizes ranging from 0.5mm to 5.0mm
Capable of annotating up to 127 anatomical classes, including organs and tumors
Allows controllable anatomy size for 10 specific classes
Produces paired segmentation masks

By providing these capabilities, MAISI is a valuable tool for researchers advancing AI applications in healthcare. However, it is important to note that this model is intended for research purposes only and not for clinical usage.

Terms of Use

By using this model, you are agreeing to the terms and conditions of the license.

References:

[1] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf

[2] Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. "Adding conditional control to text-to-image diffusion models." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf

Model Architecture:

Architecture Type: Convolutional Neural Network (CNN)

Network Architecture: 3D UNet + attention blocks

Input:

num_output_samples

Input Type: Integer

Input Format: Single integer value

Input Parameters: Required. The number of synthetic images the model will generate.

body_region

Input Type: List

Input Format: Array of Strings

Input Parameters: Required. The body region or regions that the generated CT will focus on.

Options: ["head", "chest", "thorax", "abdomen", "pelvis", "lower"]

anatomy_list

Input Type: List

Input Format: Array of Strings

Input Parameters: Optional list of anatomical class names, chosen from the 127 classes listed in the Additional Information section.

output_size

Input Type: List

Input Format: Array of 3 Integers

Input Parameters: Optional list of 3 integers giving the x, y, and z size of the CT image in voxels. Allowed values:

x- and y-axes: 128, 256, 384, 512
z-axis: 128, 256, 384, 512, 640, 768

spacing

Input Type: List

Input Format: Array of 3 Floats

Input Parameters: Optional list of 3 floats giving the voxel spacing of the CT image along each axis.

Each element must be in the range: 0.5 to 5.0
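
The size and spacing constraints above can be checked before a request is sent. The following is a minimal sketch and not part of MAISI itself: the helper name `check_geometry` is hypothetical, and it assumes spacing is expressed in millimetres per voxel, as the 0.5mm to 5.0mm range quoted in the feature list suggests. It also returns the physical extent of the volume, which is the voxel count times the spacing along each axis.

```python
# Allowed values copied from the output_size and spacing descriptions above.
ALLOWED_XY = {128, 256, 384, 512}
ALLOWED_Z = {128, 256, 384, 512, 640, 768}
SPACING_MIN, SPACING_MAX = 0.5, 5.0


def check_geometry(output_size, spacing):
    """Validate output_size/spacing and return the physical extent in mm.

    Hypothetical helper for illustration; assumes spacing is in mm per voxel.
    """
    if len(output_size) != 3 or len(spacing) != 3:
        raise ValueError("output_size and spacing must each have 3 elements")
    x, y, z = output_size
    if x not in ALLOWED_XY or y not in ALLOWED_XY:
        raise ValueError(f"x and y must be one of {sorted(ALLOWED_XY)}")
    if z not in ALLOWED_Z:
        raise ValueError(f"z must be one of {sorted(ALLOWED_Z)}")
    for s in spacing:
        if not SPACING_MIN <= s <= SPACING_MAX:
            raise ValueError("each spacing element must be in [0.5, 5.0]")
    # Physical extent along each axis = number of voxels * voxel spacing.
    return [n * s for n, s in zip(output_size, spacing)]


extent = check_geometry([512, 512, 768], [1.0, 1.0, 1.5])
print(extent)  # [512.0, 512.0, 1152.0]
```

Under these assumptions, the largest supported volume at 1.0 x 1.0 x 1.5 spacing would cover roughly 51 cm in-plane and 115 cm along z.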

controllable_anatomy_size

Input Type: List

Input Format: Array of Tuples (String, Float)

Input Parameters: Optional list of tuples for up to 10 different anatomies. Each tuple consists of an (organ_name, size_value) pair.

organ_name options: ["liver", "gallbladder", "stomach", "pancreas", "colon", "lung tumor", "bone lesion", "hepatic tumor", "colon cancer primaries", "pancreatic tumor"]
size_value range: 0.0 to 1.0, or -1 to indicate that the anatomy should be absent from the generated image
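
Taken together, the inputs above form a single request. A minimal sketch of assembling and checking such a request as a plain dictionary follows. The function `build_request` and the choice of a JSON payload are assumptions for illustration, since this card does not specify how the parameters are transmitted; only the field names, option lists, and ranges come from the descriptions above.

```python
import json

# Option lists copied from the body_region and controllable_anatomy_size inputs.
BODY_REGIONS = {"head", "chest", "thorax", "abdomen", "pelvis", "lower"}
CONTROLLABLE = {
    "liver", "gallbladder", "stomach", "pancreas", "colon",
    "lung tumor", "bone lesion", "hepatic tumor",
    "colon cancer primaries", "pancreatic tumor",
}


def build_request(num_output_samples, body_region, anatomy_list=None,
                  controllable_anatomy_size=None):
    """Hypothetical helper: validate inputs and return a request dict."""
    # The card only says "integer"; requiring a positive value is an assumption.
    if not isinstance(num_output_samples, int) or num_output_samples < 1:
        raise ValueError("num_output_samples must be a positive integer")
    unknown = set(body_region) - BODY_REGIONS
    if unknown:
        raise ValueError(f"unknown body_region values: {sorted(unknown)}")
    request = {
        "num_output_samples": num_output_samples,
        "body_region": list(body_region),
    }
    if anatomy_list is not None:
        request["anatomy_list"] = list(anatomy_list)
    if controllable_anatomy_size is not None:
        if len(controllable_anatomy_size) > 10:
            raise ValueError("at most 10 controllable anatomies are allowed")
        for organ, size in controllable_anatomy_size:
            if organ not in CONTROLLABLE:
                raise ValueError(f"{organ!r} is not a controllable anatomy")
            # size_value is either in [0.0, 1.0] or the sentinel -1.
            if size != -1 and not 0.0 <= size <= 1.0:
                raise ValueError(f"size_value for {organ!r} out of range")
        # Tuples become two-element lists so the request is JSON-serializable.
        request["controllable_anatomy_size"] = [
            [organ, size] for organ, size in controllable_anatomy_size
        ]
    return request


req = build_request(
    num_output_samples=1,
    body_region=["abdomen"],
    anatomy_list=["liver", "spleen"],
    controllable_anatomy_size=[("liver", 0.5), ("hepatic tumor", -1)],
)
print(json.dumps(req, indent=2))
```

Optional fields are simply omitted when not supplied, which mirrors their optional status in the input list.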

Output:

Output Type(s): Image(s)

Output Format: NIfTI (Neuroimaging Informatics Technology Initiative), DICOM (Digital Imaging and Communications in Medicine), and NRRD (Nearly Raw Raster Data)

Output Parameters: Three-Dimensional (3D)

Output Description: Synthetic CT image with dimensions up to 512x512x768 and spacing between 0.5mm and 5.0mm, reflecting controllable anatomy sizes as specified. If requested in input parameters, an additional NIfTI file containing the corresponding label map for the anatomy_list is also provided.

Software Integration:

Runtime Engine(s):
MONAI Core v.1.3.2

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Hopper

Supported Operating System(s):

Linux

Model Version(s):

0.3.1

Inference:

Engine: PyTorch

Test Hardware:
A100 with at least 80GB memory for 512x512x512 images

H100 with at least 80GB memory for 512x512x512 images

Additional Information:

The current list of classes available within MAISI:

"liver": 1,

"spleen": 3,

"pancreas": 4,

"right kidney": 5,

"aorta": 6,

"inferior vena cava": 7,

"right adrenal gland": 8,

"left adrenal gland": 9,

"gallbladder": 10,

"esophagus": 11,

"stomach": 12,

"duodenum": 13,

"left kidney": 14,

"bladder": 15,

"portal vein and splenic vein": 17,

"small bowel": 19,

"brain": 22,

"lung tumor": 23,

"pancreatic tumor": 24,

"hepatic vessel": 25,

"hepatic tumor": 26,

"colon cancer primaries": 27,

"left lung upper lobe": 28,

"left lung lower lobe": 29,

"right lung upper lobe": 30,

"right lung middle lobe": 31,

"right lung lower lobe": 32,

"vertebrae L5": 33,

"vertebrae L4": 34,

"vertebrae L3": 35,

"vertebrae L2": 36,

"vertebrae L1": 37,

"vertebrae T12": 38,

"vertebrae T11": 39,

"vertebrae T10": 40,

"vertebrae T9": 41,

"vertebrae T8": 42,

"vertebrae T7": 43,

"vertebrae T6": 44,

"vertebrae T5": 45,

"vertebrae T4": 46,

"vertebrae T3": 47,

"vertebrae T2": 48,

"vertebrae T1": 49,

"vertebrae C7": 50,

"vertebrae C6": 51,

"vertebrae C5": 52,

"vertebrae C4": 53,

"vertebrae C3": 54,

"vertebrae C2": 55,

"vertebrae C1": 56,

"trachea": 57,

"left iliac artery": 58,

"right iliac artery": 59,

"left iliac vena": 60,

"right iliac vena": 61,

"colon": 62,

"left rib 1": 63,

"left rib 2": 64,

"left rib 3": 65,

"left rib 4": 66,

"left rib 5": 67,

"left rib 6": 68,

"left rib 7": 69,

"left rib 8": 70,

"left rib 9": 71,

"left rib 10": 72,

"left rib 11": 73,

"left rib 12": 74,

"right rib 1": 75,

"right rib 2": 76,

"right rib 3": 77,

"right rib 4": 78,

"right rib 5": 79,

"right rib 6": 80,

"right rib 7": 81,

"right rib 8": 82,

"right rib 9": 83,

"right rib 10": 84,

"right rib 11": 85,

"right rib 12": 86,

"left humerus": 87,

"right humerus": 88,

"left scapula": 89,

"right scapula": 90,

"left clavicula": 91,

"right clavicula": 92,

"left femur": 93,

"right femur": 94,

"left hip": 95,

"right hip": 96,

"sacrum": 97,

"left gluteus maximus": 98,

"right gluteus maximus": 99,

"left gluteus medius": 100,

"right gluteus medius": 101,

"left gluteus minimus": 102,

"right gluteus minimus": 103,

"left autochthon": 104,

"right autochthon": 105,

"left iliopsoas": 106,

"right iliopsoas": 107,

"left atrial appendage": 108,

"brachiocephalic trunk": 109,

"left brachiocephalic vein": 110,

"right brachiocephalic vein": 111,

"left common carotid artery": 112,

"right common carotid artery": 113,

"costal cartilages": 114,

"heart": 115,

"left kidney cyst": 116,

"right kidney cyst": 117,

"prostate": 118,

"pulmonary vein": 119,

"skull": 120,

"spinal cord": 121,

"sternum": 122,

"left subclavian artery": 123,

"right subclavian artery": 124,

"superior vena cava": 125,

"thyroid gland": 126,

"vertebrae S1": 127,

"bone lesion": 128,

"airway": 132
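
The class list above reads as a name-to-index mapping, and its indices are not contiguous (for example, 2 and 16 do not appear), so a lookup table is safer than inferring an index from list position. The sketch below translates anatomy names into their integer indices using a handful of entries copied from the list. The helper `labels_for` is hypothetical, and treating these integers as the voxel values of the returned label map is an assumption that this card does not state explicitly.

```python
# A few entries copied from the class list above; extend with the rest as needed.
LABELS = {
    "liver": 1,
    "spleen": 3,
    "pancreas": 4,
    "right kidney": 5,
    "left kidney": 14,
    "hepatic tumor": 26,
    "bone lesion": 128,
    "airway": 132,
}


def labels_for(anatomy_list, label_map=LABELS):
    """Hypothetical helper: return the integer index for each anatomy name."""
    missing = [name for name in anatomy_list if name not in label_map]
    if missing:
        raise KeyError(f"not in the class list: {missing}")
    return [label_map[name] for name in anatomy_list]


print(labels_for(["liver", "left kidney", "airway"]))  # [1, 14, 132]
```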

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.