capreolus.evaluator

Module Contents

Functions

judged(qrels, runs, n)
eval_runs(runs, qrels, metrics, relevance_level) Evaluate runs loaded by Searcher.load_trec_run
eval_runfile(runfile, qrels, metrics, relevance_level) Evaluate a single runfile produced by ranker or reranker
search_best_run(runfile_dir, benchmark, primary_metric, metrics=None, folds=None) Select the best runfile with respect to the specified metric
capreolus.evaluator.logger[source]
capreolus.evaluator.DEFAULT_METRICS = ['P_1', 'P_5', 'P_10', 'P_20', 'judged_10', 'judged_20', 'judged_200', 'map', 'ndcg_cut_5', 'ndcg_cut_10', 'ndcg_cut_20', 'recall_100', 'recall_1000', 'recip_rank'][source]
capreolus.evaluator.judged(qrels, runs, n)[source]
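
judged@n measures how much of the top of each ranking has actually been assessed: the fraction of each query's top-n retrieved documents that appear in the qrels at all (regardless of relevance value), averaged over queries. The following is a minimal sketch of that computation; the library's exact implementation may differ in edge cases such as queries with empty rankings.

```python
def judged_at_n(qrels, runs, n):
    """Fraction of each query's top-n documents that appear in the qrels,
    averaged over queries (a sketch; exact library behavior may differ)."""
    per_query = []
    for qid, doc_scores in runs.items():
        judged_docs = qrels.get(qid, {})
        # rank documents by descending score and keep the top n
        top_n = sorted(doc_scores, key=doc_scores.get, reverse=True)[:n]
        if not top_n:
            continue
        per_query.append(sum(1 for docid in top_n if docid in judged_docs) / len(top_n))
    return sum(per_query) / len(per_query) if per_query else 0.0

qrels = {"q1": {"d1": 1, "d2": 0}}
runs = {"q1": {"d1": 0.9, "d3": 0.8, "d2": 0.7}}
print(judged_at_n(qrels, runs, 2))  # top-2 is [d1, d3]; only d1 is judged -> 0.5
```

Note that a document with a judgment of 0 still counts as judged; judged@n is about assessment coverage, not relevance.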
capreolus.evaluator.eval_runs(runs, qrels, metrics, relevance_level)[source]

Evaluate runs loaded by Searcher.load_trec_run

Parameters:
  • runs – a dict with format {qid: {docid: score}}, which can be prepared by Searcher.load_trec_run
  • qrels – dict containing the judgements provided by the benchmark
  • metrics – str or list, the metrics to calculate, e.g. ndcg_cut_20
  • relevance_level – int, the minimum qrels judgement value for a document to count as relevant
Returns:

a dict with format {metric: score}, containing the evaluation score for each specified metric
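
To illustrate the expected data structures, here is a hand-rolled reciprocal-rank computation over the same {qid: {docid: score}} run format and qrels format (a sketch only; eval_runs itself delegates to trec_eval-style metric implementations rather than this code):

```python
def recip_rank(qrels, runs, relevance_level=1):
    """Mean reciprocal rank over runs in {qid: {docid: score}} format.
    A document counts as relevant when its qrels value >= relevance_level."""
    per_query = []
    for qid, doc_scores in runs.items():
        ranking = sorted(doc_scores, key=doc_scores.get, reverse=True)
        rr = 0.0
        for rank, docid in enumerate(ranking, start=1):
            if qrels.get(qid, {}).get(docid, 0) >= relevance_level:
                rr = 1.0 / rank  # reciprocal rank of the first relevant document
                break
        per_query.append(rr)
    return sum(per_query) / len(per_query) if per_query else 0.0

qrels = {"q1": {"d2": 1}, "q2": {"d9": 2}}
runs = {
    "q1": {"d1": 3.0, "d2": 2.0},  # first relevant doc (d2) at rank 2 -> RR 0.5
    "q2": {"d9": 5.0, "d4": 1.0},  # first relevant doc (d9) at rank 1 -> RR 1.0
}
print(recip_rank(qrels, runs))  # (0.5 + 1.0) / 2 = 0.75
```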

capreolus.evaluator.eval_runfile(runfile, qrels, metrics, relevance_level)[source]

Evaluate a single runfile produced by ranker or reranker

Parameters:
  • runfile – str, path to the runfile
  • qrels – dict containing the judgements provided by the benchmark
  • metrics – str or list, the metrics to calculate, e.g. ndcg_cut_20
  • relevance_level – int, the minimum qrels judgement value for a document to count as relevant
Returns:

a dict with format {metric: score}, containing the evaluation score for each specified metric
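
A runfile is a plain-text file in TREC format, one line per retrieved document: `qid Q0 docid rank score run_tag`. The sketch below writes a tiny runfile and parses it back into the {qid: {docid: score}} dict used by eval_runs, mirroring what Searcher.load_trec_run does (this is illustrative parsing code, not the library's implementation):

```python
import os
import tempfile

# A minimal TREC-format runfile: "qid Q0 docid rank score run_tag" per line.
lines = [
    "q1 Q0 d1 1 9.5 bm25",
    "q1 Q0 d2 2 8.1 bm25",
    "q2 Q0 d3 1 7.7 bm25",
]
with tempfile.NamedTemporaryFile("w", suffix=".run", delete=False) as f:
    f.write("\n".join(lines))
    runfile = f.name

# Parse back into the {qid: {docid: score}} dict expected by eval_runs.
runs = {}
with open(runfile) as f:
    for line in f:
        qid, _, docid, _, score, _ = line.split()
        runs.setdefault(qid, {})[docid] = float(score)
os.unlink(runfile)
print(runs)  # {'q1': {'d1': 9.5, 'd2': 8.1}, 'q2': {'d3': 7.7}}
```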

capreolus.evaluator.search_best_run(runfile_dir, benchmark, primary_metric, metrics=None, folds=None)[source]

Select the best runfile with respect to the specified metric

Parameters:
  • runfile_dir – the directory path to all the runfiles to select from
  • benchmark – Benchmark class
  • primary_metric – str, the metric used to select the best runfile, e.g. ndcg_cut_20
  • metrics – str or list, additional metrics to calculate on the best runs
  • folds – str, the name of the fold to select from
Returns:

a dict storing the specified metric scores and the path to the corresponding runfile
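
Conceptually, the selection amounts to evaluating every runfile in the directory and keeping the one that maximizes the primary metric. The sketch below shows that core logic over precomputed per-runfile scores (the paths and score values are made up, and the lookup stands in for a real eval_runfile call on each candidate):

```python
def select_best_run(scores_by_runfile, primary_metric):
    """Pick the runfile whose scores maximize primary_metric.
    scores_by_runfile maps runfile path -> {metric: score}, e.g. as
    collected from eval_runfile for each candidate (values here are made up)."""
    best_path = max(scores_by_runfile, key=lambda p: scores_by_runfile[p][primary_metric])
    return {"path": best_path, "score": scores_by_runfile[best_path][primary_metric]}

scores = {
    "runs/bm25_b0.4.run": {"ndcg_cut_20": 0.41, "map": 0.22},
    "runs/bm25_b0.8.run": {"ndcg_cut_20": 0.44, "map": 0.21},
}
print(select_best_run(scores, "ndcg_cut_20"))  # picks runs/bm25_b0.8.run
```

When a benchmark defines folds, search_best_run restricts this selection to the queries of the requested fold, which the sketch above omits for brevity.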