Available Modules¶
The Benchmark, Reranker, and Searcher module types are most often configured by the end user.
For a complete list of modules, run the command capreolus modules or see the API Reference.
Important
When using Capreolus’ configuration system, modules are selected by specifying their module_name.
For example, the NF benchmark can be selected with the benchmark.name=nf config string or the equivalent config dictionary {"benchmark": {"name": "nf"}}.
The corresponding class can be created as benchmark.nf.NF(config=..., provide=...) or created by name with Benchmark.create("nf", config=..., provide=...).
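The equivalence between a config string and a config dictionary can be illustrated with a minimal sketch. The helper config_string_to_dict below is hypothetical and is not Capreolus' own parser; it only assumes that dotted keys map to nested dictionary levels, as the benchmark.name=nf example suggests.

```python
from typing import Any, Dict, List


def config_string_to_dict(args: List[str]) -> Dict[str, Any]:
    """Turn ["benchmark.name=nf", ...] into a nested dictionary.

    Hypothetical helper for illustration only; values are kept as strings.
    """
    config: Dict[str, Any] = {}
    for arg in args:
        dotted_key, _, value = arg.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create one nested level per dotted component.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


example = config_string_to_dict("benchmark.name=nf".split())
print(example)  # {'benchmark': {'name': 'nf'}}
```

The resulting dictionary has the same shape as the config dictionary shown above, so either form identifies the same module_name.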
Benchmarks¶
ANTIQUE¶
class capreolus.benchmark.antique.ANTIQUE(config=None, provide=None, share_dependency_objects=False, build=True)
A Non-factoid Question Answering Benchmark from Hashemi et al. [1]
[1] Helia Hashemi, Mohammad Aliannejadi, Hamed Zamani, and W. Bruce Croft. 2020. ANTIQUE: A non-factoid question answering benchmark. ECIR 2020.
module_name = antique
CodeSearchNet¶
class capreolus.benchmark.codesearchnet.CodeSearchNetCorpus(config=None, provide=None, share_dependency_objects=False, build=True)
CodeSearchNet Corpus. [1]
[1] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv 2019.
module_name = codesearchnet_corpus
class capreolus.benchmark.codesearchnet.CodeSearchNetChallenge(config=None, provide=None, share_dependency_objects=False, build=True)
CodeSearchNet Challenge. [1] This benchmark can only be used for training (and challenge submissions) because no qrels are provided.
[1] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv 2019.
module_name = codesearchnet_challenge
(TREC) COVID¶
class capreolus.benchmark.covid.COVID(config=None, provide=None, share_dependency_objects=False, build=True)
Ongoing TREC-COVID benchmark from https://ir.nist.gov/covidSubmit that uses documents from CORD, the COVID-19 Open Research Dataset (https://www.semanticscholar.org/cord19).
module_name = covid
Dummy¶
NF Corpus¶
class capreolus.benchmark.nf.NF(config=None, provide=None, share_dependency_objects=False, build=True)
NFCorpus: A Full-Text Learning to Rank Dataset for Medical Information Retrieval [1]
[1] Vera Boteva, Demian Gholipour, Artem Sokolov, and Stefan Riezler. 2016. A Full-Text Learning to Rank Dataset for Medical Information Retrieval. Proceedings of the 38th European Conference on Information Retrieval (ECIR), Padova, Italy. https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/
module_name = nf
(TREC) Robust04¶
class capreolus.benchmark.robust04.Robust04(config=None, provide=None, share_dependency_objects=False, build=True)
Robust04 benchmark using the title folds from Huston and Croft. [1] Each of the five folds is used in turn as the test set. The remaining four folds are split into the same train and dev sets used in recent work. [2]
[1] Samuel Huston and W. Bruce Croft. 2014. Parameters learned in the comparison of retrieval models using term dependencies. Technical Report.
[2] Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. 2019. CEDR: Contextualized Embeddings for Document Ranking. SIGIR 2019.
module_name = robust04
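The fold rotation described for Robust04 can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual fold file: the topic IDs are made up, and the choice of one dev fold plus three train folds (with the dev fold taken as the one following the test fold) is an assumption, since the text above only says the remaining four folds are split into train and dev sets.

```python
from typing import Dict, List


def rotate_folds(folds: Dict[str, List[str]], n_dev: int = 1) -> Dict[str, Dict[str, List[str]]]:
    """Use each fold once as the test set and split the rest into dev and train.

    ASSUMPTION: the first n_dev folds after the test fold (cyclically) form
    the dev set; all other remaining folds form the train set.
    """
    names = list(folds)
    splits = {}
    for i, test_name in enumerate(names):
        # The remaining folds, starting with the one after the test fold.
        rest = [names[(i + k) % len(names)] for k in range(1, len(names))]
        dev_names, train_names = rest[:n_dev], rest[n_dev:]
        splits[test_name] = {
            "test": list(folds[test_name]),
            "dev": [qid for name in dev_names for qid in folds[name]],
            "train": [qid for name in train_names for qid in folds[name]],
        }
    return splits


# Hypothetical topic IDs, two per fold; not the real Robust04 topics.
toy_folds = {f"s{i}": [f"q{i}a", f"q{i}b"] for i in range(1, 6)}
splits = rotate_folds(toy_folds)
```

Each entry of splits holds disjoint train, dev, and test topic lists, so every topic is evaluated exactly once as a test topic across the five rotations.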
class capreolus.benchmark.robust04.Robust04Yang19(config=None, provide=None, share_dependency_objects=False, build=True)
Robust04 benchmark using the folds from Yang et al. [1]
[1] Wei Yang, Kuang Lu, Peilin Yang, and Jimmy Lin. 2019. Critically Examining the “Neural Hype”: Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models. SIGIR 2019.
module_name = robust04.yang19
Searchers¶
Note
Some searchers (e.g., BM25) automatically perform a cross-validated grid search when their parameters are provided as lists. For example, searcher.b=0.4,0.6,0.8 searcher.k1=1.0,1.5 searches over every combination of the listed b and k1 values.
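To make the size of such a grid concrete, the sketch below expands the comma-separated lists from the example into every parameter combination. The function expand_grid is hypothetical and is not part of Capreolus; it handles only the comma-separated list form and does not model the cross-validation step.

```python
from itertools import product
from typing import Dict, List


def expand_grid(options: List[str]) -> List[Dict[str, float]]:
    """Expand ["key=v1,v2", ...] into one dict per parameter combination.

    Hypothetical helper for illustration; only comma-separated lists are parsed.
    """
    parsed: Dict[str, List[float]] = {}
    for option in options:
        key, _, raw_values = option.partition("=")
        parsed[key] = [float(value) for value in raw_values.split(",")]
    keys = list(parsed)
    # Cartesian product of all value lists, one dict per combination.
    return [dict(zip(keys, combo)) for combo in product(*(parsed[k] for k in keys))]


grid = expand_grid("searcher.b=0.4,0.6,0.8 searcher.k1=1.0,1.5".split())
print(len(grid))  # 6
```

Three values of b and two values of k1 give six settings, so the number of searches grows multiplicatively with each list-valued parameter.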
BM25¶
BM25 with Axiomatic expansion¶
class capreolus.searcher.anserini.AxiomaticSemanticMatching(config=None, provide=None, share_dependency_objects=False, build=True)
Anserini BM25 with Axiomatic query expansion. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8" or "0.4..1,0.2").
module_name = axiomatic
BM25 with RM3 expansion¶
class capreolus.searcher.anserini.BM25RM3(config=None, provide=None, share_dependency_objects=False, build=True)
Anserini BM25 with RM3 expansion. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8" or "0.4..1,0.2").
module_name = BM25RM3
BM25 PRF¶
class capreolus.searcher.anserini.BM25PRF(config=None, provide=None, share_dependency_objects=False, build=True)
Anserini BM25 PRF. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8" or "0.4..1,0.2").
module_name = BM25PRF
F2Exp¶
F2Log¶
I(n)L2¶
QL with Dirichlet smoothing¶
class capreolus.searcher.anserini.DirichletQL(config=None, provide=None, share_dependency_objects=False, build=True)
Anserini QL with Dirichlet smoothing. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8" or "0.4..1,0.2").
module_name = DirichletQL
QL with J-M smoothing¶
class capreolus.searcher.anserini.QLJM(config=None, provide=None, share_dependency_objects=False, build=True)
Anserini QL with Jelinek-Mercer smoothing. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8" or "0.4..1,0.2").
module_name = QLJM
SDM¶
Rerankers¶
Note
Rerankers are implemented in PyTorch or TensorFlow. Rerankers with TensorFlow implementations can run on both GPUs and TPUs.
CDSSM¶
class capreolus.reranker.CDSSM.CDSSM(config=None, provide=None, share_dependency_objects=False, build=True)
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. In CIKM’14.
module_name = CDSSM
ConvKNRM¶
class capreolus.reranker.ConvKNRM.ConvKNRM(config=None, provide=None, share_dependency_objects=False, build=True)
Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. 2018. Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search. In WSDM’18.
module_name = ConvKNRM
DRMM¶
DRMMTKS¶
DSSM¶
class capreolus.reranker.DSSM.DSSM(config=None, provide=None, share_dependency_objects=False, build=True)
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM’13.
module_name = DSSM
DUET¶
DeepTileBars¶
HiNT¶
KNRM¶
PACRR¶
POSITDRMM¶
TK¶
TensorFlow KNRM¶
class capreolus.reranker.TFKNRM.TFKNRM(config=None, provide=None, share_dependency_objects=False, build=True)
TensorFlow implementation of KNRM.
Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-End Neural Ad-hoc Ranking with Kernel Pooling. In SIGIR’17.
module_name = TFKNRM