capreolus.evaluator

Module Contents

Functions
- judged(qrels, runs, n)
- eval_runs(runs, qrels, metrics, relevance_level=1): Evaluate runs produced by a ranker (or loaded with Searcher.load_trec_run)
- eval_runfile(runfile, qrels, metrics, relevance_level): Evaluate a single runfile produced by a ranker or reranker
- search_best_run(runfile_dirs, benchmark, primary_metric, metrics=None, folds=None): Select the best runfile with respect to the specified metric
- interpolate_runs(run1, run2, qids, alpha)
- interpolated_eval(run1, run2, benchmark, primary_metric, metrics=None)
capreolus.evaluator.DEFAULT_METRICS = ['P_1', 'P_5', 'P_10', 'P_20', 'judged_10', 'judged_20', 'judged_200', 'map', 'ndcg_cut_5', 'ndcg_cut_10', 'ndcg_cut_20', 'recall_100', 'recall_1000', 'recip_rank']
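These names follow trec_eval's metric naming. A quick illustrative check, assuming the package is installed and the module is importable as capreolus.evaluator:

    from capreolus import evaluator

    # DEFAULT_METRICS covers precision@k, judged@k, MAP, nDCG@k, recall@k, and MRR.
    assert "map" in evaluator.DEFAULT_METRICS
    print(evaluator.DEFAULT_METRICS)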
capreolus.evaluator.eval_runs(runs, qrels, metrics, relevance_level=1)

Evaluate runs produced by a ranker (or loaded with Searcher.load_trec_run).

Parameters:
- runs – dict in the format {qid: {docid: score}}
- qrels – dict containing relevance judgements (e.g., benchmark.qrels)
- metrics (str or list) – metrics to calculate (e.g., evaluator.DEFAULT_METRICS)
- relevance_level (int) – relevance label threshold to use with non-graded metrics (equivalent to trec_eval's --level_for_rel)

Returns: a dict in the format {metric: score} containing the average score for each metric

Return type: dict
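For illustration, a minimal usage sketch with toy data; the runs and qrels dicts below are made up for the example, not part of the library:

    from capreolus import evaluator

    # Toy run in {qid: {docid: score}} format (hypothetical data).
    runs = {"q1": {"d1": 2.5, "d2": 1.3, "d3": 0.4}}
    # Toy judgements in {qid: {docid: label}} format (hypothetical data).
    qrels = {"q1": {"d1": 1, "d2": 0}}

    # Returns the average score over queries for each requested metric.
    scores = evaluator.eval_runs(runs, qrels, ["map", "P_1"], relevance_level=1)
    print(scores)  # e.g., {'map': ..., 'P_1': ...}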
capreolus.evaluator.eval_runfile(runfile, qrels, metrics, relevance_level)

Evaluate a single runfile produced by a ranker or reranker.

Parameters:
- runfile – str, path to the runfile
- qrels – dict containing the relevance judgements provided by the benchmark
- metrics – str or list, metrics to calculate (e.g., ndcg_cut_20)
- relevance_level – int, relevance label threshold to use with non-graded metrics

Returns: a dict in the format {metric: score} containing the evaluation score for each specified metric

Return type: dict
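A similar sketch for a runfile on disk; the path and judgements below are hypothetical:

    from capreolus import evaluator

    qrels = {"q1": {"d1": 1, "d2": 0}}  # toy judgements (hypothetical)

    # Evaluate a TREC-format runfile against the judgements.
    scores = evaluator.eval_runfile("runs/bm25.run", qrels, ["ndcg_cut_20", "map"], 1)
    print(scores["ndcg_cut_20"])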
capreolus.evaluator.search_best_run(runfile_dirs, benchmark, primary_metric, metrics=None, folds=None)

Select the best runfile with respect to the specified metric.

Parameters:
- runfile_dirs – directory path(s) containing the runfiles to select from
- benchmark – Benchmark class
- primary_metric – str, metric used to select the best runfile (e.g., ndcg_cut_20)
- metrics – str or list, additional metrics to calculate on the best runs
- folds – str, the name of the fold to select from

Returns: a dict storing the specified metric scores and the path to the corresponding runfile
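A sketch of selecting the best run. The directory, fold name, and stand-in benchmark object below are assumptions for illustration; a real call would pass an instantiated capreolus Benchmark:

    from types import SimpleNamespace
    from capreolus import evaluator

    # Stand-in exposing the qrels/folds attributes a Benchmark provides;
    # the fold layout shown here is an assumption for illustration.
    benchmark = SimpleNamespace(
        qrels={"q1": {"d1": 1}},
        folds={"s1": {"train_qids": ["q1"], "predict": {"dev": ["q1"], "test": ["q1"]}}},
    )

    best = evaluator.search_best_run(
        ["runs/bm25"],                      # hypothetical runfile directory
        benchmark,
        "ndcg_cut_20",                      # metric used to pick the best runfile
        metrics=evaluator.DEFAULT_METRICS,  # also report these on the best run
        folds="s1",                         # hypothetical fold name
    )
    print(best)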