Model Overview
Note: You need to request the model checkpoint and license from Stability AI
Description:
Stable Video Diffusion (SVD) is a generative diffusion model that takes a single image as a conditioning frame and synthesizes a video sequence from it.
This model was trained to generate 25 frames at 576×1024 resolution given a context frame of the same size, and was fine-tuned from the 14-frame SVD Image-to-Video model.
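As a hedged illustration of the image-to-video workflow described above, the sketch below drives such a checkpoint through the Hugging Face `diffusers` `StableVideoDiffusionPipeline`. The repository id, parameter names, and default values are assumptions drawn from public `diffusers` documentation, not from this card, and a CUDA GPU is assumed.

```python
def generate_video(image_path, out_path="generated.mp4", seed=42):
    """Sketch: animate a single 1024x576 RGB frame into a short video clip.

    Assumes the Hugging Face `diffusers` pipeline for SVD XT; the model id
    and parameter defaults below are assumptions, not part of this card.
    """
    # Heavy dependencies are imported lazily so the module stays importable
    # on machines without torch/diffusers installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",  # assumed repo id
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")

    # The conditioning frame should match the training resolution (576x1024).
    image = load_image(image_path).resize((1024, 576))

    generator = torch.manual_seed(seed)  # fixed seed for reproducibility
    frames = pipe(
        image,
        num_frames=25,         # SVD XT was fine-tuned to emit 25 frames
        motion_bucket_id=127,  # higher values -> more motion (assumed default)
        fps=7,                 # conditioning fps signal, not playback fps
        decode_chunk_size=8,   # decode latents in chunks to limit VRAM use
        generator=generator,
    ).frames[0]
    export_to_video(frames, out_path, fps=7)


if __name__ == "__main__":
    generate_video("input.png")
```

The function wrapper keeps the heavy model load out of module import time; in a service you would load the pipeline once and reuse it across calls.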
Developed by: Stability AI
Funded by: Stability AI
Model type: Generative image-to-video model
Terms of use
By using this software or model, you agree to the terms and conditions of the license, the acceptable use policy, and Stability AI's privacy policy.
Reference(s):
- Stable Video — Stability AI
- Stable Video Diffusion (huggingface.co)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets — Stability AI
Model Card:
Stable Video Diffusion Model Card
Model Architecture:
Architecture Type: Convolutional Neural Network (CNN)
Network Architecture: UNet + attention blocks
Model Version: SVD XT
Input:
Input Format: Red, Green, Blue (RGB) Image
Input Parameters: motion_bucket_id, frames_per_second, guidance_scale, seed
Output:
Output Format: Video
Output Parameters: seed
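The card lists the input parameters without valid ranges, so the helper below sketches one plausible validation layer in front of an inference call. The bounds (for example, `motion_bucket_id` in 1–255) are assumptions based on common SVD usage, not values specified by this card.

```python
def validate_svd_inputs(motion_bucket_id, frames_per_second, guidance_scale, seed):
    """Check the card's listed input parameters against assumed ranges.

    The bounds here are illustrative assumptions (typical SVD usage),
    not values specified by the model card. Returns the inputs as a
    dict on success, or raises ValueError.
    """
    if not 1 <= motion_bucket_id <= 255:
        raise ValueError("motion_bucket_id is typically in [1, 255]")
    if not 1 <= frames_per_second <= 30:
        raise ValueError("frames_per_second is typically in [1, 30]")
    if guidance_scale < 0:
        raise ValueError("guidance_scale must be non-negative")
    if not 0 <= seed < 2**32:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    return {
        "motion_bucket_id": motion_bucket_id,
        "frames_per_second": frames_per_second,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }
```

Rejecting bad parameters before a request reaches the GPU gives callers an immediate, descriptive error instead of a failed or degraded generation.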
Software Integration:
Supported Hardware Platform(s): Hopper, Ampere/Turing
Supported Operating System(s): Linux
Inference:
Engine: Triton
Test Hardware: Other