capreolus.evaluator
Module Contents
Functions
eval_runs | Evaluate runs produced by a ranker (or loaded with Searcher.load_trec_run)
eval_runfile | Evaluate a single runfile produced by ranker or reranker
search_best_run | Select the runfile with respect to the specified metric
Attributes
- capreolus.evaluator.DEFAULT_METRICS
- capreolus.evaluator.eval_runs(runs, qrels, metrics, relevance_level=1)
Evaluate runs produced by a ranker (or loaded with Searcher.load_trec_run)
- Parameters
  runs – dict in the format {qid: {docid: score}}
  qrels – dict containing relevance judgements (e.g., benchmark.qrels)
  metrics (str or list) – metrics to calculate (e.g., evaluator.DEFAULT_METRICS)
  relevance_level (int) – relevance label threshold to use with non-graded metrics (equivalent to trec_eval's --level_for_rel)
- Returns
  a dict in the format {metric: score} containing the average score for each metric
- Return type
  dict
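A minimal usage sketch; the runs and qrels dictionaries and the metric names below are illustrative values, not part of the API:

    from capreolus import evaluator

    # toy run: one query with three retrieved documents and their scores
    runs = {"q1": {"d1": 2.3, "d2": 1.1, "d3": 0.4}}
    # toy judgements: d1 is relevant, d3 is not
    qrels = {"q1": {"d1": 1, "d3": 0}}

    # returns the average score for each requested metric
    scores = evaluator.eval_runs(runs, qrels, metrics=["map", "ndcg_cut_20"], relevance_level=1)
    print(scores)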
- capreolus.evaluator.eval_runfile(runfile, qrels, metrics, relevance_level)
Evaluate a single runfile produced by a ranker or reranker
- Parameters
  runfile – str, path to the runfile
  qrels – dict containing the judgements provided by the benchmark
  metrics – str or list, metrics to calculate, e.g., ndcg_cut_20
  relevance_level – int, relevance label threshold to use with non-graded metrics
- Returns
  a dict in the format {metric: score} containing the evaluation score for each specified metric
- Return type
  dict
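A similar sketch for eval_runfile; the runfile path is hypothetical and the toy qrels dict stands in for benchmark.qrels:

    from capreolus import evaluator

    # TREC-format runfile written by a ranker or reranker (hypothetical path)
    runfile = "results/searcher-bm25/searcher.run"
    # judgements would normally come from the benchmark (benchmark.qrels)
    qrels = {"q1": {"d1": 1, "d2": 0}}

    scores = evaluator.eval_runfile(runfile, qrels, metrics="ndcg_cut_20", relevance_level=1)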
- capreolus.evaluator.search_best_run(runfile_dirs, benchmark, primary_metric, metrics=None, folds=None)
Select the runfile with respect to the specified metric
- Parameters
  runfile_dirs – the directories containing the runfiles to select from
  benchmark – the Benchmark object providing the qrels and folds
  primary_metric – str, metric used to select the best runfile, e.g., ndcg_cut_20
  metrics – str or list, metrics to calculate on the best runs
  folds – str, the name of the fold to select from
- Returns
  a dict storing the specified metric scores and the path to the corresponding runfile
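A hedged sketch of search_best_run; the directory paths, the fold name "s1", and the wrapper function are assumptions for illustration, and benchmark is expected to be a Benchmark instance constructed elsewhere:

    from capreolus import evaluator

    def pick_best_run(benchmark):
        # candidate runfile directories (hypothetical paths)
        runfile_dirs = ["results/searcher-bm25", "results/searcher-bm25-rm3"]
        # pick the runfile that maximizes the primary metric on the given fold,
        # then report the requested metrics for it; the fold name "s1" is illustrative
        return evaluator.search_best_run(
            runfile_dirs,
            benchmark,
            primary_metric="ndcg_cut_20",
            metrics=evaluator.DEFAULT_METRICS,
            folds="s1",
        )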