The Four Dimensions of LLM Output Quality
- Faithfulness (for RAG systems): does the output assert only things that are supported by the retrieved context? This is the primary measure of hallucination in grounded systems.
- Relevance: does the output address the actual question or task? A response can be factually correct yet completely irrelevant.
- Completeness: does the output cover all aspects of the question, or all required fields? Partial completeness is a common failure mode.
- Format compliance: does the output conform to the expected structure? This matters especially when downstream systems consume the AI's output programmatically.
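Three of these dimensions can be approximated with cheap heuristic checks. The sketch below is illustrative only: the function names are hypothetical, the checks assume JSON-structured outputs, and the faithfulness check uses crude substring matching where production evaluators would use NLI models or LLM judges.

```python
import json

def check_format_compliance(output: str) -> bool:
    # Format compliance: does the output parse as JSON at all?
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def check_completeness(output: str, required_fields: list[str]) -> float:
    # Completeness: fraction of required fields present in the parsed output.
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    present = sum(1 for field in required_fields if field in data)
    return present / len(required_fields)

def check_faithfulness(claims: list[str], context: str) -> float:
    # Faithfulness (crude proxy): fraction of extracted claims whose text
    # appears verbatim in the retrieved context. Real evaluators use NLI
    # models or LLM judges; substring matching is only an illustration.
    if not claims:
        return 1.0
    supported = sum(1 for c in claims if c.lower() in context.lower())
    return supported / len(claims)

# Example: a structured answer expected to contain three fields.
raw = '{"name": "Acme Corp", "founded": 1999}'
print(check_format_compliance(raw))                        # True
print(check_completeness(raw, ["name", "founded", "hq"]))  # 0.666...
```

Relevance is deliberately omitted: it requires comparing the output against the original question, which has no useful string-level heuristic and is typically scored by an embedding-similarity or LLM-judge step.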