
60-Second Quick Start

Get up and running with OpenVals in under 60 seconds.

1. Install the CLI

pip install openvals

2. Run a Benchmark

Evaluate and compare models on a specific dataset:

openvals benchmark \
  --dataset finance \
  --models mistral,llama3

Example CLI output (DRS = Decision Reliability Score, out of 100):

Model      Accuracy    DRS
--------------------------------
llama3     91.4        89.2
mistral    87.8        82.4
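If you script around the CLI, the results table is easy to consume programmatically. The sketch below assumes the whitespace-separated layout shown above (a header, a dashed separator, then one `model accuracy drs` row per model) stays stable; `parse_benchmark_table` and `BenchmarkRow` are hypothetical helpers, not part of OpenVals.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    model: str
    accuracy: float
    drs: float


def parse_benchmark_table(text: str) -> list[BenchmarkRow]:
    """Parse the table printed by `openvals benchmark` (assumed layout)."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip the header, the dashed separator, and anything not 3 columns wide.
        if len(parts) != 3 or parts[0] == "Model":
            continue
        try:
            accuracy, drs = float(parts[1]), float(parts[2])
        except ValueError:
            continue
        rows.append(BenchmarkRow(parts[0], accuracy, drs))
    # Rank models by DRS, highest first.
    return sorted(rows, key=lambda r: r.drs, reverse=True)


sample = """
Model      Accuracy    DRS
--------------------------------
llama3     91.4        89.2
mistral    87.8        82.4
"""

best = parse_benchmark_table(sample)[0]
print(best.model, best.drs)  # llama3 89.2
```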

3. Validate a Dataset

Check a dataset's schema and quality before running model evaluations. Pass either a dataset name (such as finance) or a path to a local JSON or CSV file:

openvals validate-dataset finance
openvals validate-dataset ./customer_dataset.json
openvals validate-dataset ./customer_dataset.csv
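To check a whole folder of files in one go, you can call the CLI from Python. This is a minimal sketch: `find_datasets` and `validate_all` are hypothetical helpers, and the code assumes (not confirmed in this guide) that `openvals validate-dataset` exits with a non-zero status when validation fails. The `run` parameter lets you swap in a stub for testing.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# File types accepted by validate-dataset, per the examples above.
SUPPORTED_SUFFIXES = {".json", ".csv"}


def find_datasets(directory: str) -> list[Path]:
    """Collect JSON and CSV files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def validate_all(
    paths: Sequence[Path],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[Path]:
    """Run `openvals validate-dataset` on each path and return the failures.

    Assumption: a non-zero exit code means the dataset failed validation.
    """
    failed = []
    for path in paths:
        result = run(["openvals", "validate-dataset", str(path)])
        if result.returncode != 0:
            failed.append(path)
    return failed
```

For example, `validate_all(find_datasets("./data"))` returns the files in a hypothetical `./data` folder that need attention before benchmarking.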

Benchmark Multiple Models with Config

openvals benchmark \
  --dataset finance \
  --models mistral,llama3 \
  --config finance
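When launching benchmarks from a script or CI job, building the argument list in Python avoids shell-quoting mistakes. This minimal sketch uses only the flags shown in this guide (`--dataset`, `--models`, `--config`); `benchmark_command` is a hypothetical helper, not part of OpenVals.

```python
import shlex
from typing import Optional, Sequence


def benchmark_command(
    dataset: str,
    models: Sequence[str],
    config: Optional[str] = None,
) -> list[str]:
    """Build an `openvals benchmark` argument list from Python values."""
    if not models:
        raise ValueError("at least one model is required")
    # --models takes one comma-separated value, as in the examples above.
    cmd = [
        "openvals", "benchmark",
        "--dataset", dataset,
        "--models", ",".join(m.strip() for m in models),
    ]
    if config is not None:
        cmd += ["--config", config]
    return cmd


cmd = benchmark_command("finance", ["mistral", "llama3"], config="finance")
print(shlex.join(cmd))
# openvals benchmark --dataset finance --models mistral,llama3 --config finance
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it.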

Show Version

openvals version

Python SDK Example

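from openvals.benchmarking.runner import BenchmarkRunner
from openvals.models.ollama_model import OllamaModel
from openvals.datasets.loader import load_dataset

# Load the evaluation dataset from a local JSON file
dataset = load_dataset("examples/sample_eval.json")

# Map a display name to each model wrapper
# (assumes these models are available through a local Ollama instance)
models = {
    "llama2": OllamaModel("llama2"),
    "llama3": OllamaModel("llama3"),
    "mistral": OllamaModel("mistral"),
}

# Run every model against the dataset and collect the results
runner = BenchmarkRunner(models, dataset)
results = runner.run()

print(results)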

Example Trust Intelligence Report

Below is an example of the detailed Trust Intelligence Report generated by the CLI:

===================================================
OpenVals Trust Intelligence Report
===================================================

Model: llama3

Accuracy Score      : 91.4
Semantic Score      : 89.1
Factuality Score    : 92.3
Safety Score        : 95.2
Latency Score       : 83.0

Hallucination Risk  : LOW

Decision Reliability Score (DRS)

89.2 / 100

Deployment Status:

READY FOR PRODUCTION
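Because the report is plain text, a CI job can gate deployments on it. This is a minimal sketch assuming the report layout shown above; `parse_report`, `passes_gate`, and the 85.0 threshold are hypothetical, and also requiring `LOW` hallucination risk is an example policy, not an OpenVals rule.

```python
import re

# Hypothetical threshold chosen for illustration; OpenVals does not define it.
MIN_DRS = 85.0


def parse_report(text: str) -> dict:
    """Extract DRS, hallucination risk, and deployment status (assumed layout)."""
    drs = re.search(r"Decision Reliability Score \(DRS\)\s*([\d.]+)\s*/\s*100", text)
    risk = re.search(r"Hallucination Risk\s*:\s*(\w+)", text)
    status = re.search(r"Deployment Status:\s*(\S.*)", text)
    return {
        "drs": float(drs.group(1)) if drs else None,
        "hallucination_risk": risk.group(1) if risk else None,
        "status": status.group(1).strip() if status else None,
    }


def passes_gate(report: dict, min_drs: float = MIN_DRS) -> bool:
    """True if the DRS meets the threshold and hallucination risk is LOW."""
    return (
        report["drs"] is not None
        and report["drs"] >= min_drs
        and report["hallucination_risk"] == "LOW"
    )
```

For example, if the report text has been saved to a hypothetical `report.txt`, `sys.exit(0 if passes_gate(parse_report(Path("report.txt").read_text())) else 1)` fails the job when the gate is not met.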