Model Overview
Note: You need to request the model checkpoint and license from Stability AI
Description:
Stable Video Diffusion (SVD) is a generative diffusion model that takes a single image as a conditioning frame and synthesizes a video sequence from it.
This model was trained to generate 25 frames at 576×1024 resolution given a context frame of the same size, and was fine-tuned from the 14-frame SVD Image-to-Video model.
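As a hedged illustration of the image-to-video workflow described above, the sketch below drives such a checkpoint through the Hugging Face `diffusers` `StableVideoDiffusionPipeline`. The repository id, parameter names, and default values are assumptions drawn from public `diffusers` documentation, not from this card, and a CUDA GPU is assumed.

```python
def generate_video(image_path, out_path="generated.mp4", seed=42):
    """Sketch: animate a single 1024x576 RGB frame into a short video clip.

    Assumes the Hugging Face `diffusers` pipeline for SVD XT; the model id
    and parameter defaults below are assumptions, not part of this card.
    """
    # Heavy dependencies are imported lazily so the module stays importable
    # on machines without torch/diffusers installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",  # assumed repo id
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")

    # The conditioning frame should match the training resolution (576x1024).
    image = load_image(image_path).resize((1024, 576))

    generator = torch.manual_seed(seed)  # fixed seed for reproducibility
    frames = pipe(
        image,
        num_frames=25,         # SVD XT was fine-tuned to emit 25 frames
        motion_bucket_id=127,  # higher values -> more motion (assumed default)
        fps=7,                 # conditioning fps signal, not playback fps
        decode_chunk_size=8,   # decode latents in chunks to limit VRAM use
        generator=generator,
    ).frames[0]
    export_to_video(frames, out_path, fps=7)


if __name__ == "__main__":
    generate_video("input.png")
```

The function wrapper keeps the heavy model load out of module import time; in a service you would load the pipeline once and reuse it across calls.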
Developed by: Stability AI
Funded by: Stability AI
Model type: Generative image-to-video model
Terms of use
By using this software or model, you agree to the terms and conditions of the license, the acceptable use policy, and Stability AI's privacy policy.
Reference(s):
- Stable Video — Stability AI
- Stable Video Diffusion (huggingface.co)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets — Stability AI
Model Card:
Stable Video Diffusion Model Card
Model Architecture:
Architecture Type: Convolutional Neural Network (CNN)
Network Architecture: UNet + attention blocks
Model Version: SVD XT
Input:
Input Format: Red, Green, Blue (RGB) Image
Input Parameters: motion_bucket_id, frames_per_second, guidance_scale, seed
Output:
Output Format: Video
Output Parameters: seed
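The card lists the input parameters without valid ranges, so the helper below sketches one plausible validation layer in front of an inference call. The bounds (for example, `motion_bucket_id` in 1–255) are assumptions based on common SVD usage, not values specified by this card.

```python
def validate_svd_inputs(motion_bucket_id, frames_per_second, guidance_scale, seed):
    """Check the card's listed input parameters against assumed ranges.

    The bounds here are illustrative assumptions (typical SVD usage),
    not values specified by the model card. Returns the inputs as a
    dict on success, or raises ValueError.
    """
    if not 1 <= motion_bucket_id <= 255:
        raise ValueError("motion_bucket_id is typically in [1, 255]")
    if not 1 <= frames_per_second <= 30:
        raise ValueError("frames_per_second is typically in [1, 30]")
    if guidance_scale < 0:
        raise ValueError("guidance_scale must be non-negative")
    if not 0 <= seed < 2**32:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    return {
        "motion_bucket_id": motion_bucket_id,
        "frames_per_second": frames_per_second,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }
```

Rejecting bad parameters before a request reaches the GPU gives callers an immediate, descriptive error instead of a failed or degraded generation.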
Software Integration:
Supported Hardware Platform(s): Hopper, Ampere/Turing
Supported Operating System(s): Linux
Inference:
Engine: Triton
Test Hardware: Other