capreolus.benchmark
Submodules
capreolus.benchmark.antique
capreolus.benchmark.cds
capreolus.benchmark.codesearchnet
capreolus.benchmark.core17
capreolus.benchmark.core18
capreolus.benchmark.covid
capreolus.benchmark.covidabstract
capreolus.benchmark.dummy
capreolus.benchmark.genomics
capreolus.benchmark.gov2
capreolus.benchmark.msmarco
capreolus.benchmark.nf
capreolus.benchmark.robust04
Package Contents
Classes
Benchmark | Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments. |
|
IRDBenchmark | Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments. |
Functions
|
Attributes
- class capreolus.benchmark.Benchmark(config=None, provide=None, share_dependency_objects=False, build=True)
Bases: capreolus.ModuleBase
Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments.
- Modules should provide:
  - a topics dict mapping query ids (qids) to queries
  - a qrels dict mapping qids to docids and relevance labels
  - a folds dict mapping a fold name to training, dev (validation), and testing qids
  - if these can be loaded from files in standard formats, they can be specified by setting the topic_file, qrel_file, and fold_file, respectively, rather than by setting the above attributes directly
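The three dicts described above might look roughly like the following. This is a minimal sketch in plain Python, not Capreolus code: the sample data, the fold key names (train, dev, test), and the helper check_benchmark_data are illustrative assumptions, and the exact nesting Capreolus uses internally may differ.

```python
# Illustrative data only; key names and nesting are assumptions, not the library's format.
topics = {
    "q1": "effects of caffeine on sleep",
    "q2": "solar panel efficiency",
    "q3": "history of the bicycle",
}

# qid -> {docid -> relevance label}
qrels = {
    "q1": {"doc1": 2, "doc7": 0},
    "q2": {"doc3": 1},
    "q3": {"doc9": 1},
}

# fold name -> training, dev (validation), and testing qids
folds = {
    "fold1": {"train": ["q1"], "dev": ["q2"], "test": ["q3"]},
}


def check_benchmark_data(topics, qrels, folds):
    """Return the sorted qids that appear in folds or qrels but have no topic (hypothetical helper)."""
    fold_qids = {qid for split in folds.values() for qids in split.values() for qid in qids}
    return sorted((fold_qids | set(qrels)) - set(topics))


missing = check_benchmark_data(topics, qrels, folds)
```

A check like this is one way to catch a fold or qrels file that refers to queries missing from the topics.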
- relevance_level = 1
Documents with a relevance label >= relevance_level will be considered relevant. This corresponds to trec_eval’s --level_for_rel (and is passed to pytrec_eval as relevance_level).
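The thresholding rule can be illustrated in plain Python. The helper relevant_docids and the sample qrels below are hypothetical and only mirror the ">= relevance_level" rule stated above; they are not part of Capreolus, trec_eval, or pytrec_eval.

```python
def relevant_docids(qrels, relevance_level=1):
    """Keep, per qid, the docids whose label is >= relevance_level (hypothetical helper)."""
    return {
        qid: {docid for docid, label in docs.items() if label >= relevance_level}
        for qid, docs in qrels.items()
    }


# Illustrative graded judgments: 0 = not relevant, 1 = relevant, 2 = highly relevant
sample_qrels = {"q1": {"d1": 0, "d2": 1, "d3": 2}}

default_level = relevant_docids(sample_qrels)                    # d2 and d3 count as relevant
strict_level = relevant_docids(sample_qrels, relevance_level=2)  # only d3 counts as relevant
```

Raising relevance_level is therefore a way to evaluate against only the more strongly judged documents when a collection uses graded labels.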
- use_train_as_dev = True
Whether to use the training set as the validation set when no training is needed, e.g. for traditional IR algorithms like BM25.
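As a rough sketch of the idea, a caller could choose validation qids as shown below. The description above does not say whether the training qids replace the dev qids or are merged with them; this example assumes they are merged. The helper validation_qids and the fold key names are hypothetical, not Capreolus API.

```python
def validation_qids(fold, use_train_as_dev=True):
    """Pick the qids used for validation in one fold (hypothetical helper)."""
    dev = list(fold["dev"])
    if use_train_as_dev:
        # Assumption: training qids are merged into the dev qids, preserving order
        # and dropping duplicates. The library's actual behavior may differ.
        dev = list(dict.fromkeys(list(fold["train"]) + dev))
    return dev


fold = {"train": ["q1", "q2"], "dev": ["q3"], "test": ["q4"]}

with_train = validation_qids(fold, use_train_as_dev=True)
dev_only = validation_qids(fold, use_train_as_dev=False)
```

For a method such as BM25 that has no trained parameters, reusing the training qids this way gives parameter tuning more queries to validate on without touching the test qids.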
- class capreolus.benchmark.IRDBenchmark(config=None, provide=None, share_dependency_objects=False, build=True)
Bases: Benchmark
Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments.
- Modules should provide:
  - a topics dict mapping query ids (qids) to queries
  - a qrels dict mapping qids to docids and relevance labels
  - a folds dict mapping a fold name to training, dev (validation), and testing qids
  - if these can be loaded from files in standard formats, they can be specified by setting the topic_file, qrel_file, and fold_file, respectively, rather than by setting the above attributes directly