Output Formats

VCF and MAF output format details.

VCF Output (default for VCF input)

When the input is a VCF file, vibe-vep outputs VCF format by default. Original VCF header and data lines are preserved, and a CSQ INFO field is appended with consequence annotations for all overlapping transcripts. The CSQ format follows the VEP convention — pipe-delimited fields per transcript, comma-separated between transcripts:

CSQ=Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|CANONICAL

Canonical transcripts have YES in the CANONICAL field (the last field of each entry).
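
To make the layout concrete, here is a minimal Python sketch that splits a CSQ value into one dictionary per transcript and picks out the canonical one. It is not part of vibe-vep. The function name `parse_csq` and the sample INFO string (with placeholder gene and transcript IDs) are invented for illustration, and the sketch assumes that non-canonical transcripts leave the CANONICAL field empty.

```python
# Field order copied from the CSQ format line above.
CSQ_FIELDS = (
    "Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|"
    "EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|"
    "Amino_acids|Codons|CANONICAL"
).split("|")


def parse_csq(info):
    """Return one dict per transcript from a VCF INFO string (hypothetical helper)."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            value = entry[len("CSQ="):]
            break
    else:
        return []  # no CSQ key in this INFO string
    # Commas separate transcripts, pipes separate fields within a transcript.
    return [dict(zip(CSQ_FIELDS, t.split("|"))) for t in value.split(",")]


# Made-up INFO value with placeholder identifiers, for illustration only.
example_info = (
    "DP=100;CSQ="
    "T|missense_variant|MODERATE|GENE1|ENSG_EXAMPLE|Transcript|ENST_EXAMPLE_1|"
    "protein_coding|2/5||c.35G>T|p.Gly12Val|225|35|12|G/V|gGt/gTt|YES,"
    "T|missense_variant|MODERATE|GENE1|ENSG_EXAMPLE|Transcript|ENST_EXAMPLE_2|"
    "protein_coding|2/6||c.35G>T|p.Gly12Val|225|35|12|G/V|gGt/gTt|"
)

records = parse_csq(example_info)
canonical = [r for r in records if r["CANONICAL"] == "YES"]
for r in canonical:
    print(r["SYMBOL"], r["Feature"], r["HGVSp"])
```

For production use, an established VCF library is likely a better choice than splitting strings by hand, since it also handles header parsing and escaping.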

MAF Output (default for MAF input)

When the input is a MAF file, vibe-vep outputs MAF format by default. All original columns are preserved exactly as-is, and vibe.* namespaced columns are appended with fresh predictions:

Column                        Description
vibe.hugo_symbol              Gene symbol
vibe.consequence              SO consequence term
vibe.variant_classification   MAF variant classification
vibe.transcript_id            Ensembl transcript ID
vibe.hgvsc                    HGVS coding DNA notation
vibe.hgvsp                    HGVS protein notation (3-letter)
vibe.hgvsp_short              HGVS protein notation (1-letter)

When annotation sources are configured, additional vibe.{source}.{column} columns are appended (e.g., vibe.oncokb.gene_type, vibe.alphamissense.score).

Use vibe-vep version --maf-columns to see the full column mapping.
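
Because the original columns sit next to the vibe.* columns, a downstream script can compare the two directly. The Python sketch below groups the vibe.* columns by annotation source and lists rows where the fresh prediction differs from the original value. The helper `split_vibe_columns` and the two-line sample are invented for illustration. The sketch also assumes the file is tab-separated and has an original `HGVSp_Short` column. Real MAF files may start with `#` comment lines, which would need to be skipped first.

```python
import csv
import io
from collections import defaultdict


def split_vibe_columns(header):
    """Split vibe.* names into core columns and per-source columns (hypothetical helper)."""
    core, by_source = [], defaultdict(list)
    for name in header:
        if not name.startswith("vibe."):
            continue  # original MAF column, left untouched
        parts = name.split(".", 2)
        if len(parts) == 2:
            core.append(name)               # e.g. vibe.hgvsc
        else:
            by_source[parts[1]].append(name)  # e.g. vibe.oncokb.gene_type
    return core, dict(by_source)


# Made-up MAF excerpt with placeholder values, for illustration only.
example_maf = (
    "Hugo_Symbol\tHGVSp_Short\tvibe.hugo_symbol\tvibe.hgvsp_short\tvibe.oncokb.gene_type\n"
    "GENE1\tp.G12V\tGENE1\tp.G12V\tONCOGENE\n"
)

reader = csv.DictReader(io.StringIO(example_maf), delimiter="\t")
core, by_source = split_vibe_columns(reader.fieldnames)
rows = list(reader)

# Rows where the fresh prediction disagrees with the original annotation.
changed = [r for r in rows if r["HGVSp_Short"] != r["vibe.hgvsp_short"]]
print(core, by_source, len(changed))
```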

Performance

  • End-to-end throughput: ~14,000 variants/sec with 4 parallel workers, ~5,000 variants/sec single-threaded
  • Cache loading: ~25 seconds to load 254k GENCODE v46 transcripts
  • Memory: Proportional to transcript count (~254k for GENCODE v46)
  • Total benchmark: 1,052,366 variants across 7 TCGA studies
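
As a rough back-of-the-envelope check, the figures above can be combined into a wall-clock estimate for a run the size of the full benchmark. This Python sketch assumes the throughput numbers exclude cache loading and stay constant over the whole run. Neither assumption is confirmed here, so treat the result as an order-of-magnitude guide only.

```python
VARIANTS = 1_052_366   # total benchmark size listed above
CACHE_LOAD_S = 25      # approximate cache loading time in seconds
RATES = {              # approximate variants per second
    "parallel (4 workers)": 14_000,
    "single-threaded": 5_000,
}


def estimate_seconds(variants, rate, cache_load=CACHE_LOAD_S):
    """Estimated wall-clock seconds: one-time cache load plus annotation time."""
    return cache_load + variants / rate


estimates = {mode: estimate_seconds(VARIANTS, rate) for mode, rate in RATES.items()}
for mode, seconds in estimates.items():
    print(f"{mode}: ~{seconds:.0f} s")
```

Under these assumptions the estimate is roughly 100 seconds with 4 workers and roughly 235 seconds single-threaded.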