capreolus.evaluator

Module Contents

Functions

judged(qrels, runs, n)

mrr_10(qrels, runs)

eval_runs(runs, qrels, metrics[, relevance_level])

Evaluate runs produced by a ranker (or loaded with Searcher.load_trec_run)

eval_runfile(runfile, qrels, metrics, relevance_level)

Evaluate a single runfile produced by a ranker or reranker

search_best_run(runfile_dirs, benchmark, primary_metric[, metrics, folds])

Select the best runfile with respect to the specified metric

interpolate_runs(run1, run2, qids, alpha)

interpolated_eval(run1, run2, benchmark, primary_metric[, metrics])

Attributes

logger

MRR_10

DEFAULT_METRICS

capreolus.evaluator.logger[source]
capreolus.evaluator.MRR_10 = 'MRR@10'[source]
capreolus.evaluator.DEFAULT_METRICS[source]
capreolus.evaluator.judged(qrels, runs, n)[source]
capreolus.evaluator.mrr_10(qrels, runs)[source]
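
Example (a minimal sketch; the qids, docids, and scores below are invented for illustration, and the exact return values of judged and mrr_10 are assumptions based on their names):

    from capreolus import evaluator

    # toy judgements and run in the formats used throughout this module
    qrels = {"q1": {"d1": 1, "d2": 0}}                # {qid: {docid: relevance_label}}
    runs = {"q1": {"d1": 2.3, "d2": 1.1, "d3": 0.5}}  # {qid: {docid: score}}

    # judged@n: assumed to report the fraction of the top-n ranked documents that appear in qrels
    print(evaluator.judged(qrels, runs, 10))
    # MRR@10 over the run (cf. the MRR_10 attribute above)
    print(evaluator.mrr_10(qrels, runs))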
capreolus.evaluator.eval_runs(runs, qrels, metrics, relevance_level=1)[source]

Evaluate runs produced by a ranker (or loaded with Searcher.load_trec_run)

Parameters
  • runs – dict in the format {qid: {docid: score}}

  • qrels – dict containing relevance judgements (e.g., benchmark.qrels)

  • metrics (str or list) – metrics to calculate (e.g., evaluator.DEFAULT_METRICS)

  • relevance_level (int) – relevance label threshold to use with non-graded metrics (equivalent to trec_eval’s --level_for_rel)

Returns

a dict in the format {metric: score} containing the average score for each metric

Return type

dict
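
Example (a minimal sketch; the qrels and runs dicts are toy data, and DEFAULT_METRICS is the module attribute listed above):

    from capreolus import evaluator

    qrels = {"q1": {"d1": 1, "d2": 0}, "q2": {"d3": 2}}       # {qid: {docid: label}}
    runs = {"q1": {"d1": 1.8, "d2": 0.7}, "q2": {"d3": 2.1}}  # {qid: {docid: score}}

    scores = evaluator.eval_runs(runs, qrels, metrics=evaluator.DEFAULT_METRICS, relevance_level=1)
    print(scores)  # {metric: score}, averaged over the queries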

capreolus.evaluator.eval_runfile(runfile, qrels, metrics, relevance_level)[source]

Evaluate a single runfile produced by a ranker or reranker

Parameters
  • runfile – str, path to the runfile

  • qrels – dict containing the relevance judgements provided by the benchmark

  • metrics – str or list, the metrics to calculate, e.g. ndcg_cut_20

Returns

a dict in the format {metric: score} containing the evaluation score for each specified metric

Return type

dict
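
Example (a sketch; the runfile path is a placeholder for a TREC-format run on disk):

    from capreolus import evaluator

    qrels = {"q1": {"d1": 1, "d2": 0}}  # typically benchmark.qrels
    runfile = "path/to/run.txt"         # placeholder path to a TREC-format runfile

    scores = evaluator.eval_runfile(runfile, qrels, metrics=["ndcg_cut_20", "map"], relevance_level=1)
    print(scores)  # {metric: score} for the requested metrics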

capreolus.evaluator.search_best_run(runfile_dirs, benchmark, primary_metric, metrics=None, folds=None)[source]

Select the best runfile with respect to the specified metric

Parameters
  • runfile_dirs – the directories containing the runfiles to select from

  • benchmark – Benchmark class

  • primary_metric – str, the metric used to select the best runfile, e.g. ndcg_cut_20

  • metrics – str or list, additional metrics to calculate on the best run

  • folds – str, the name of the fold to select from

Returns

a dict storing the specified metric scores and the path to the corresponding best runfile
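
Example (a sketch; the directory names are placeholders, and benchmark stands in for a capreolus Benchmark instance supplying qrels and folds):

    from capreolus import evaluator

    benchmark = ...  # a capreolus Benchmark instance (supplies qrels and folds)
    runfile_dirs = ["output/run_dir_a", "output/run_dir_b"]  # placeholder directories of runfiles

    best = evaluator.search_best_run(
        runfile_dirs, benchmark, primary_metric="ndcg_cut_20", metrics=["map", "P_20"]
    )
    print(best)  # metric scores of the best run and the path to its runfile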

capreolus.evaluator.interpolate_runs(run1, run2, qids, alpha)[source]
capreolus.evaluator.interpolated_eval(run1, run2, benchmark, primary_metric, metrics=None)[source]
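
Example (a sketch only: the interpolation is assumed to be the common linear combination alpha * run1 + (1 - alpha) * run2 of per-document scores, which may not match the actual implementation):

    from capreolus import evaluator

    run1 = {"q1": {"d1": 0.9, "d2": 0.4}}  # toy runs sharing query q1
    run2 = {"q1": {"d1": 0.2, "d2": 0.8}}

    fused = evaluator.interpolate_runs(run1, run2, qids=["q1"], alpha=0.5)  # assumed weighting by alpha

    # interpolated_eval presumably interpolates the two runs and evaluates the result against
    # benchmark.qrels using primary_metric; benchmark is a placeholder for a Benchmark instance
    # results = evaluator.interpolated_eval(run1, run2, benchmark, primary_metric="map")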