
05 May 2025

Evaluating RAG

General info

  • Roughly, it’s evaluating:
    • context — how relevant/correct the retrieved chunks are
    • answer — how good the generated claims are
    • (with interplay in between — e.g. whether the answer actually comes from the context, regardless of whether either is relevant to the query)
  • Or: retriever metrics, generator metrics, and overall metrics, as used in RAGChecker (see the RAGChecker picture further below); a minimal sketch of the first framing follows after this list.
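A minimal sketch of those three axes using deepeval’s standalone metrics (my own toy example, not from any library docs; assumes deepeval is installed and an LLM judge, by default via OPENAI_API_KEY, is configured):

    from deepeval import evaluate
    from deepeval.test_case import LLMTestCase
    from deepeval.metrics import (
        ContextualRelevancyMetric,  # context: are the retrieved chunks relevant to the query?
        AnswerRelevancyMetric,      # answer: does the generated answer address the query?
        FaithfulnessMetric,         # interplay: is the answer grounded in the retrieved context?
    )

    # Toy query/answer/context, purely for illustration.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
        retrieval_context=["Paris is the capital and most populous city of France."],
    )

    evaluate(
        test_cases=[test_case],
        metrics=[ContextualRelevancyMetric(), AnswerRelevancyMetric(), FaithfulnessMetric()],
    )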

Sources / libs

  • Evaluating - LlamaIndex

  • DeepEval’s evaluators as given in the LlamaIndex docs:

    from deepeval.integrations.llama_index import (
        DeepEvalAnswerRelevancyEvaluator,
        DeepEvalFaithfulnessEvaluator,
        DeepEvalContextualRelevancyEvaluator,
        DeepEvalSummarizationEvaluator,
        DeepEvalBiasEvaluator,
        DeepEvalToxicityEvaluator,
    )
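These wrap deepeval metrics behind LlamaIndex’s evaluator interface; a hedged usage sketch (the toy document, query, and index setup are my own placeholders, assuming a working LLM backend):

    from llama_index.core import Document, VectorStoreIndex
    from deepeval.integrations.llama_index import DeepEvalFaithfulnessEvaluator

    # Tiny toy index just to have something to query; contents are made up.
    docs = [Document(text="Late returns are accepted within 60 days with a 10% restocking fee.")]
    query_engine = VectorStoreIndex.from_documents(docs).as_query_engine()

    user_query = "What does the refund policy say about late returns?"
    response = query_engine.query(user_query)

    # Checks whether the generated answer is supported by the retrieved context.
    evaluator = DeepEvalFaithfulnessEvaluator()
    result = evaluator.evaluate_response(query=user_query, response=response)
    print(result.passing, result.score, result.feedback)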

RAGChecker

(Image: RAGChecker metrics overview: retriever, generator, and overall metrics.)
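A hedged sketch of running RAGChecker itself, based on my reading of its README; the input file path and model names are placeholders, and the exact schema/signatures may differ from your installed version:

    from ragchecker import RAGResults, RAGChecker
    from ragchecker.metrics import all_metrics  # or e.g. retriever_metrics / generator_metrics

    # Input JSON: a list of results, each with query, gt_answer, response, and
    # retrieved_context (a list of {doc_id, text}), i.e. the pieces needed for
    # retriever, generator, and overall metrics.
    with open("checking_inputs.json") as fp:
        rag_results = RAGResults.from_json(fp.read())

    evaluator = RAGChecker(
        extractor_name="openai/gpt-4o-mini",  # LLM that extracts claims from answers
        checker_name="openai/gpt-4o-mini",    # LLM that checks claims against context
        batch_size_extractor=32,
        batch_size_checker=32,
    )

    evaluator.evaluate(rag_results, all_metrics)
    print(rag_results)  # metrics are attached to the results object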
