
Model Evaluation

Metrics

Evaluate model outputs against structured datasets using accuracy, embedding-based semantic similarity, reliability, safety, consistency, variance, and latency.
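The page does not say exactly how the semantic score is computed. A common way to score embedding-based semantic similarity is cosine similarity between the embedding of the model output and the embedding of the reference answer. The sketch below assumes both embeddings are already available as equal-length float vectors. The helper name `semantic_similarity` is hypothetical and is not part of the OpenVals API.

```python
import math
from typing import Sequence


def semantic_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (hypothetical helper).

    Assumes the caller has already embedded the model output and the
    reference answer with the same embedding model.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    # Result lies in [-1, 1]; 1.0 means the vectors point the same way.
    return dot / (norm_a * norm_b)
```

For example, `semantic_similarity([3.0, 4.0], [4.0, 3.0])` returns 0.96. Cosine similarity can be negative, so a score in the 0.75 to 1.00 range listed below indicates close alignment in this sketch.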

Evaluation Signals

Metric        Ideal Direction   Good Range      Meaning
Accuracy      Higher            0.80 to 1.00    Correctness of output
Semantic      Higher            0.75 to 1.00    Meaning similarity and contextual alignment
Reliability   Higher            0.70 to 1.00    Stability across repeated generations
Safety        Higher            0.85 to 1.00    Low risk of harmful behavior
Consistency   Higher            0.75 to 1.00    Repeatability of model behavior
Variance      Lower             0.00 to 0.25    Output deviation across runs
Latency       Lower             0 to 1500 ms    Time taken to generate a response
DRS Score     Higher            0.75 to 1.00    Overall deployment reliability
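The good ranges in the Evaluation Signals table can be used to flag metrics that need attention after an evaluation run. The following is a minimal sketch under stated assumptions: the dictionary keys (such as `latency_ms`) and the `out_of_range` helper are hypothetical, not part of the OpenVals API, and the bounds copied from the table are treated as inclusive.

```python
# Good ranges copied from the Evaluation Signals table (latency in milliseconds).
# Key names are hypothetical; the bounds are treated as inclusive.
GOOD_RANGES: dict[str, tuple[float, float]] = {
    "accuracy": (0.80, 1.00),
    "semantic": (0.75, 1.00),
    "reliability": (0.70, 1.00),
    "safety": (0.85, 1.00),
    "consistency": (0.75, 1.00),
    "variance": (0.00, 0.25),
    "latency_ms": (0.0, 1500.0),
    "drs": (0.75, 1.00),
}


def out_of_range(results: dict[str, float]) -> dict[str, float]:
    """Return only the metrics whose values fall outside their good range.

    Raises KeyError for a metric name that is not in GOOD_RANGES.
    """
    flagged: dict[str, float] = {}
    for name, value in results.items():
        low, high = GOOD_RANGES[name]
        if not low <= value <= high:
            flagged[name] = value
    return flagged
```

For example, `out_of_range({"accuracy": 0.91, "variance": 0.31, "latency_ms": 1820})` returns `{"variance": 0.31, "latency_ms": 1820}`: accuracy is within its range, while variance and latency exceed their upper bounds.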
OpenVals Docs