nvidia / deepvariant

Model Overview

Description

DeepVariant (the model behind the Universal Variant Calling Microservice) is a deep learning model that can help identify variants in short- and long-read sequencing datasets. This model is ready for commercial use.

Parabricks DeepVariant is a GPU-optimized implementation of the DeepVariant pipeline that substantially reduces variant-calling runtimes.

This model natively supports read sets from Illumina, Oxford Nanopore, and Pacific Biosciences; supports both whole-genome and whole-exome sequencing; and can output either Variant Call Format (VCF) or genomic VCF (gVCF).

The Universal Variant Calling NIM can:

  • Process short-read whole-exome data
  • Process short-read and long-read whole-genome data
  • Perform inference locally or on NVIDIA GPU Cloud
  • Output VCF or gVCF

Note: Apply here to self-host this API

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to GitHub.

Reference(s)

Parabricks Latest Documentation

Terms of use

By using this software or model, you agree to the NVIDIA Parabricks Terms of Use.

Model Architecture

Architecture Type: Convolutional Neural Network (CNN)

Network Architecture: Inceptionv2

For more information, see the Parabricks documentation.

Input

Input Type(s): Indices (Text, Binary)

Input Format(s): Tarball

Input Parameters: One Dimensional (1D)

  • A reference genome tarball that contains the reference genome and the indices generated by samtools and bwa. It can be generated by running:
      samtools faidx <reference genome>
      bwa index <reference genome>
      tar cvf <reference genome>.tar <reference genome>*
  • A Binary Alignment Map (BAM) file produced by Parabricks fq2bam or the Burrows-Wheeler Aligner (BWA).
  • A BAM Index (BAI) file for that BAM file.
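
Before submitting a job, it can help to confirm that the inputs listed above are complete. The following is a minimal sketch, not part of Parabricks: the function names check_ref_tarball and check_bam_index and the file names in the usage comment are hypothetical. It assumes the standard index extensions written by samtools faidx (.fai) and bwa index (.amb, .ann, .bwt, .pac, .sa), and the two common BAI naming conventions (sample.bam.bai and sample.bai).

```shell
# Hypothetical helpers (not part of Parabricks) for sanity-checking inputs.

# Verify that a reference tarball contains the samtools and bwa index files.
# Returns 0 if all are present, 1 if any is missing, 2 if tar cannot read it.
check_ref_tarball() {
    local tarball="$1"
    local listing ext missing=0
    listing="$(tar tf "$tarball")" || return 2
    # samtools faidx writes .fai; bwa index writes .amb .ann .bwt .pac .sa
    for ext in fai amb ann bwt pac sa; do
        if ! printf '%s\n' "$listing" | grep -q "\.${ext}\$"; then
            echo "missing index file with extension .${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Verify that a BAM file has an index next to it.
# Accepts either sample.bam.bai or sample.bai.
check_bam_index() {
    local bam="$1"
    [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Example usage (file names are placeholders):
#   check_ref_tarball reference.fasta.tar && echo "reference tarball looks complete"
#   check_bam_index sample.bam && echo "BAM index found"
```

These checks only look at file names; they do not validate that the indices match the reference or that the BAM file was aligned against it.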

Output

Output Type(s): Text (Sample, Manifest, Path, Path)

Output Format: VCF File

Output Parameters: 1D

The DeepVariant Microservice produces the following outputs:

  • A VCF file containing variant calls for your sample.
  • A VCF manifest (which contains the parts needed to sign a multipart-upload request when running in the cloud).
  • A path to the STDOUT of the run (either locally or in cloud storage).
  • A path to the STDERR of the run (either locally or in cloud storage).
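
Once the VCF file has been retrieved, a quick way to sanity-check it is to count its records. The sketch below is illustrative only: the function names vcf_cat, count_vcf_records, and count_pass_records and the file name in the usage comment are hypothetical. It relies on two properties of the VCF format: header lines start with "#", and the seventh tab-separated column of each record is FILTER.

```shell
# Hypothetical helpers (not part of the microservice) for inspecting a VCF.

# Print a VCF to stdout, decompressing it first if the name ends in .gz.
vcf_cat() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Count variant records: every non-empty line that is not a header line.
count_vcf_records() {
    vcf_cat "$1" | awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Count records whose FILTER column (column 7) is exactly PASS.
count_pass_records() {
    vcf_cat "$1" | awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Example usage (file name is a placeholder):
#   echo "records: $(count_vcf_records sample.vcf)"
#   echo "PASS:    $(count_pass_records sample.vcf)"
```

A gVCF also contains reference blocks, so its record count will be much higher than that of a plain VCF for the same sample.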

Software Integration

Supported Hardware Platform(s): NVIDIA GPU(s) with at least 24 GB of GPU memory, including Hopper, Lovelace, Ampere, Turing, and Volta generations.

Supported Operating System(s): Linux

Model Version:

  • V4.2.1-1

Inference

Engine: Triton and PyTriton

Test Hardware: Other