Available Modules¶
The Benchmark, Reranker, and Searcher module types are most often configured by the end user. For a complete list of modules, run the command capreolus modules or see the API Reference.
Important
When using Capreolus’ configuration system, modules are selected by specifying their module_name.
For example, the NF benchmark can be selected with the benchmark.name=nf config string or the equivalent config dictionary {"benchmark": {"name": "nf"}}.
The corresponding class can be created as benchmark.nf.NF(config=..., provide=...) or created by name with Benchmark.create("nf", config=..., provide=...).
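The equivalence between the config string and the config dictionary can be illustrated with a small standalone sketch. The helper config_string_to_dict below is hypothetical and is not part of Capreolus; it only shows how dotted key=value pairs map onto nested dictionaries.

```python
def config_string_to_dict(config_string):
    """Convert space-separated dotted key=value pairs into a nested dict.

    Hypothetical helper for illustration; not part of the Capreolus API.
    """
    config = {}
    for pair in config_string.split():
        dotted_key, value = pair.split("=", 1)
        # Everything before the last dot names the enclosing dictionaries.
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


print(config_string_to_dict("benchmark.name=nf"))
# {'benchmark': {'name': 'nf'}}
```

Values are kept as strings here; a real configuration system would typically also convert them to the types each module expects.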
Benchmarks¶
ANTIQUE¶
- class capreolus.benchmark.antique.ANTIQUE(config=None, provide=None, share_dependency_objects=False, build=True)[source]
A Non-factoid Question Answering Benchmark from Hashemi et al. [1]
[1] Helia Hashemi, Mohammad Aliannejadi, Hamed Zamani, and W. Bruce Croft. 2020. ANTIQUE: A non-factoid question answering benchmark. ECIR 2020.
- module_name = antique[source]
CodeSearchNet¶
- class capreolus.benchmark.codesearchnet.CodeSearchNetCorpus(config=None, provide=None, share_dependency_objects=False, build=True)[source]
CodeSearchNet Corpus. [1]
[1] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv 2019.
- module_name = codesearchnet_corpus[source]
- class capreolus.benchmark.codesearchnet.CodeSearchNetChallenge(config=None, provide=None, share_dependency_objects=False, build=True)[source]
CodeSearchNet Challenge. [1] This benchmark can only be used for training (and challenge submissions) because no qrels are provided.
[1] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv 2019.
- module_name = codesearchnet_challenge[source]
(TREC) COVID¶
- class capreolus.benchmark.covid.COVID(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Ongoing TREC-COVID benchmark from https://ir.nist.gov/covidSubmit that uses documents from CORD-19, the COVID-19 Open Research Dataset (https://www.semanticscholar.org/cord19).
- module_name = covid[source]
Dummy¶
NF Corpus¶
- class capreolus.benchmark.nf.NF(config=None, provide=None, share_dependency_objects=False, build=True)[source]
NFCorpus: A Full-Text Learning to Rank Dataset for Medical Information Retrieval [1]
[1] Vera Boteva, Demian Gholipour, Artem Sokolov, and Stefan Riezler. 2016. A Full-Text Learning to Rank Dataset for Medical Information Retrieval. Proceedings of the 38th European Conference on Information Retrieval (ECIR), Padova, Italy. https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/
- module_name = nf[source]
(TREC) Robust04¶
- class capreolus.benchmark.robust04.Robust04(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Robust04 benchmark using the title folds from Huston and Croft. [1] Each fold is used in turn as the test set, and the remaining four folds are split into the same train and dev sets used in recent work. [2]
[1] Samuel Huston and W. Bruce Croft. 2014. Parameters learned in the comparison of retrieval models using term dependencies. Technical Report.
[2] Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian. 2019. CEDR: Contextualized Embeddings for Document Ranking. SIGIR 2019.
- module_name = robust04[source]
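The fold rotation described for Robust04 can be sketched as follows. The fold names, the placeholder query IDs, and the choice of one dev fold plus three training folds are illustrative assumptions; the actual train and dev splits follow the cited work [2] and are not reproduced here.

```python
def rotate_folds(folds):
    """Yield one split per fold, using each fold once as the test set.

    Assumption: the fold following the test fold is used as dev and the
    remaining folds are used for training. Hypothetical helper.
    """
    names = list(folds)
    for i, test_name in enumerate(names):
        dev_name = names[(i + 1) % len(names)]
        train = [
            qid
            for name in names
            if name not in (test_name, dev_name)
            for qid in folds[name]
        ]
        yield {"test": folds[test_name], "dev": folds[dev_name], "train": train}


# Placeholder query IDs, not the real Robust04 fold assignments.
folds = {f"s{i}": [f"q{i}a", f"q{i}b"] for i in range(1, 6)}
splits = list(rotate_folds(folds))
print(len(splits))              # 5
print(splits[0]["test"])        # ['q1a', 'q1b']
print(len(splits[0]["train"]))  # 6
```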
- class capreolus.benchmark.robust04.Robust04Yang19(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Robust04 benchmark using the folds from Yang et al. [1]
[1] Wei Yang, Kuang Lu, Peilin Yang, and Jimmy Lin. 2019. Critically Examining the “Neural Hype”: Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models. SIGIR 2019.
- module_name = robust04.yang19[source]
Searchers¶
Note
Some searchers (e.g., BM25) automatically perform a cross-validated grid search when their parameters are provided as lists. For example, searcher.b=0.4,0.6,0.8 searcher.k1=1.0,1.5.
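To make the size of such a grid concrete, the sketch below enumerates the parameter combinations implied by the example values for b and k1. It is a standalone illustration built on itertools.product; the function name expand_grid is hypothetical, and Capreolus' own grid search and cross-validation logic may differ.

```python
from itertools import product


def expand_grid(params):
    """Return one dict per combination of comma-separated parameter values.

    Hypothetical helper for illustration; not part of the Capreolus API.
    """
    names = sorted(params)
    value_lists = [[float(v) for v in params[name].split(",")] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


grid = expand_grid({"b": "0.4,0.6,0.8", "k1": "1.0,1.5"})
print(len(grid))  # 6
print(grid[0])    # {'b': 0.4, 'k1': 1.0}
```

Three values of b and two values of k1 give six settings to evaluate, so the grid grows multiplicatively with each additional list-valued parameter.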
BM25¶
BM25 with Axiomatic expansion¶
- class capreolus.searcher.anserini.AxiomaticSemanticMatching(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Anserini BM25 with Axiomatic query expansion. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").
- module_name = axiomatic[source]
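The two example strings plausibly describe the same set of values. The sketch below assumes that the form "start..stop,step" denotes a range that includes stop; this is an interpretation, not something stated here, and expand_values is a hypothetical helper rather than a Capreolus function. Decimal is used to avoid floating-point drift when stepping.

```python
from decimal import Decimal


def expand_values(spec):
    """Expand "a,b,c" or "start..stop,step" into a list of floats.

    Assumption: the range form includes the stop value. Hypothetical helper.
    """
    if ".." in spec:
        bounds, step_text = spec.split(",")
        start_text, stop_text = bounds.split("..")
        start, stop, step = Decimal(start_text), Decimal(stop_text), Decimal(step_text)
        if step <= 0:
            raise ValueError("step must be positive")
        values = []
        current = start
        while current <= stop:
            values.append(float(current))
            current += step
        return values
    # Plain comma-separated list of values.
    return [float(v) for v in spec.split(",")]


print(expand_values("0.4..1,0.2"))       # [0.4, 0.6, 0.8, 1.0]
print(expand_values("0.4,0.6,0.8,1.0"))  # [0.4, 0.6, 0.8, 1.0]
```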
BM25 with RM3 expansion¶
- class capreolus.searcher.anserini.BM25RM3(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Anserini BM25 with RM3 expansion. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").
- module_name = BM25RM3[source]
BM25 PRF¶
- class capreolus.searcher.anserini.BM25PRF(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Anserini BM25 PRF. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").
- module_name = BM25PRF[source]
F2Exp¶
F2Log¶
I(n)L2¶
QL with Dirichlet smoothing¶
- class capreolus.searcher.anserini.DirichletQL(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Anserini QL with Dirichlet smoothing. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").
- module_name = DirichletQL[source]
QL with J-M smoothing¶
- class capreolus.searcher.anserini.QLJM(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Anserini QL with Jelinek-Mercer smoothing. This searcher’s parameters can also be specified as lists indicating parameters to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").
- module_name = QLJM[source]
SDM¶
SPL¶
Rerankers¶
Note
Rerankers are implemented in PyTorch or TensorFlow. Rerankers with TensorFlow implementations can run on both GPUs and TPUs.
CEDR-KNRM¶
- class capreolus.reranker.CEDRKNRM.CEDRKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]
PyTorch implementation of CEDR-KNRM. Equivalent to BERT-KNRM when cls=None.
CEDR: Contextualized Embeddings for Document Ranking. Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. SIGIR 2019. https://arxiv.org/pdf/1904.07094
- module_name = CEDRKNRM[source]
CDSSM¶
- class capreolus.reranker.CDSSM.CDSSM(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. In CIKM’14.
- module_name = CDSSM[source]
ConvKNRM¶
- class capreolus.reranker.ConvKNRM.ConvKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. 2018. Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search. In WSDM’18.
- module_name = ConvKNRM[source]
DRMM¶
DRMMTKS¶
DSSM¶
- class capreolus.reranker.DSSM.DSSM(config=None, provide=None, share_dependency_objects=False, build=True)[source]
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM’13.
- module_name = DSSM[source]
DUET¶
DeepTileBars¶
HiNT¶
KNRM¶
PACRR¶
POSITDRMM¶
TK¶
TensorFlow KNRM¶
- class capreolus.reranker.TFKNRM.TFKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]
TensorFlow implementation of KNRM.
Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-End Neural Ad-hoc Ranking with Kernel Pooling. In SIGIR’17.
- module_name = TFKNRM[source]
TensorFlow BERT-MaxP¶
- class capreolus.reranker.TFBERTMaxP.TFBERTMaxP(config=None, provide=None, share_dependency_objects=False, build=True)[source]
TensorFlow implementation of BERT-MaxP.
Deeper Text Understanding for IR with Contextual Neural Language Modeling. Zhuyun Dai and Jamie Callan. SIGIR 2019. https://arxiv.org/pdf/1905.09217.pdf
- module_name = TFBERTMaxP[source]
TensorFlow CEDR-KNRM¶
- class capreolus.reranker.TFCEDRKNRM.TFCEDRKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]
TensorFlow implementation of CEDR-KNRM. Equivalent to BERT-KNRM when cls=None.
CEDR: Contextualized Embeddings for Document Ranking. Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. SIGIR 2019. https://arxiv.org/pdf/1904.07094
- module_name = TFCEDRKNRM[source]
TensorFlow PARADE¶
- class capreolus.reranker.parade.TFParade(config=None, provide=None, share_dependency_objects=False, build=True)[source]
TensorFlow implementation of PARADE.
PARADE: Passage Representation Aggregation for Document Reranking. Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, and Yingfei Sun. arXiv 2020. https://arxiv.org/pdf/2008.09093.pdf
- module_name = parade[source]