Available Modules

The Benchmark, Reranker, and Searcher module types are most often configured by the end user. For a complete list of modules, run the command capreolus modules or see the API Reference.

Important

When using Capreolus’ configuration system, modules are selected by specifying their module_name. For example, the NF benchmark can be selected with the benchmark.name=nf config string or the equivalent config dictionary {"benchmark": {"name": "nf"}}.

The corresponding class can be created as benchmark.nf.NF(config=..., provide=...) or created by name with Benchmark.create("nf", config=..., provide=...).
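
The correspondence between a config string and a config dictionary can be sketched as follows. This is a minimal illustration only: config_string_to_dict is a hypothetical helper written for this example, not part of Capreolus, and it keeps every value as a string rather than converting types.

```python
def config_string_to_dict(config_string):
    """Turn 'a.b=1 c.d=2' into a nested dict (hypothetical helper, not Capreolus code)."""
    config = {}
    for pair in config_string.split():
        dotted_key, value = pair.split("=", 1)
        # Everything before the last dot names a nested level; the last part is the option.
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value  # values are left as strings in this sketch
    return config


print(config_string_to_dict("benchmark.name=nf"))
```

The printed dictionary has the same shape as the {"benchmark": {"name": "nf"}} example above.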

Benchmarks

ANTIQUE

class capreolus.benchmark.antique.ANTIQUE(config=None, provide=None, share_dependency_objects=False, build=True)[source]

A Non-factoid Question Answering Benchmark from Hashemi et al. [1]

[1] Helia Hashemi, Mohammad Aliannejadi, Hamed Zamani, and W. Bruce Croft. 2020. ANTIQUE: A non-factoid question answering benchmark. ECIR 2020.

module_name = antique[source]

CodeSearchNet

class capreolus.benchmark.codesearchnet.CodeSearchNetCorpus(config=None, provide=None, share_dependency_objects=False, build=True)[source]

CodeSearchNet Corpus. [1]

[1] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv 2019.

module_name = codesearchnet_corpus[source]

class capreolus.benchmark.codesearchnet.CodeSearchNetChallenge(config=None, provide=None, share_dependency_objects=False, build=True)[source]

CodeSearchNet Challenge. [1] This benchmark can only be used for training (and challenge submissions) because no qrels are provided.

[1] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv 2019.

module_name = codesearchnet_challenge[source]

(TREC) COVID

class capreolus.benchmark.covid.COVID(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Ongoing TREC-COVID benchmark from https://ir.nist.gov/covidSubmit that uses documents from CORD, the COVID-19 Open Research Dataset (https://www.semanticscholar.org/cord19).

module_name = covid[source]

Dummy

class capreolus.benchmark.dummy.DummyBenchmark(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Tiny benchmark for testing

module_name = dummy[source]

NF Corpus

class capreolus.benchmark.nf.NF(config=None, provide=None, share_dependency_objects=False, build=True)[source]

NFCorpus: A Full-Text Learning to Rank Dataset for Medical Information Retrieval [1]

[1] Vera Boteva, Demian Gholipour, Artem Sokolov, and Stefan Riezler. 2016. A Full-Text Learning to Rank Dataset for Medical Information Retrieval. In Proceedings of the 38th European Conference on Information Retrieval (ECIR), Padova, Italy. https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/

module_name = nf[source]

(TREC) Robust04

class capreolus.benchmark.robust04.Robust04(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Robust04 benchmark using the title folds from Huston and Croft. [1] Each fold is used in turn as the test set. The remaining four folds are split into the same train and dev sets used in recent work. [2]

[1] Samuel Huston and W. Bruce Croft. 2014. Parameters learned in the comparison of retrieval models using term dependencies. Technical Report.

[2] Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian. 2019. CEDR: Contextualized Embeddings for Document Ranking. SIGIR 2019.

module_name = robust04[source]

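
The fold rotation described for Robust04 can be sketched as below. The fold names are placeholders, and the rule used here to divide the remaining four folds (first one as dev, the other three as train) is an assumption made for illustration; the actual train and dev assignment follows [2] and may differ.

```python
# Placeholder fold names; the real folds are lists of query ids from [1].
folds = ["f1", "f2", "f3", "f4", "f5"]


def rotate_folds(folds):
    """Use each fold once as the test set and split the rest into dev and train."""
    splits = []
    for i, test_fold in enumerate(folds):
        remaining = folds[i + 1:] + folds[:i]
        # Assumption for this sketch: first remaining fold is dev, the rest are train.
        splits.append({"test": [test_fold], "dev": remaining[:1], "train": remaining[1:]})
    return splits


for split in rotate_folds(folds):
    print(split)
```

Each of the five resulting splits has one test fold, one dev fold, and three train folds, with no fold shared between the three roles.
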
class capreolus.benchmark.robust04.Robust04Yang19(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Robust04 benchmark using the folds from Yang et al. [1]

[1] Wei Yang, Kuang Lu, Peilin Yang, and Jimmy Lin. 2019. Critically Examining the “Neural Hype”: Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models. SIGIR 2019.

module_name = robust04.yang19[source]

Searchers

Note

Some searchers (e.g., BM25) automatically perform a cross-validated grid search when their parameters are provided as lists. For example, the config string searcher.b=0.4,0.6,0.8 searcher.k1=1.0,1.5 searches over three values of b and two values of k1.
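
To give a sense of the search space that list parameters imply, the sketch below enumerates every combination of the b and k1 values from the example. It only counts the settings; how Capreolus runs and cross-validates them is not shown here.

```python
from itertools import product

# Candidate values taken from the example config string in the note above.
grid = {"b": [0.4, 0.6, 0.8], "k1": [1.0, 1.5]}

# Pair every b value with every k1 value.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

for params in combinations:
    print(params)
print(len(combinations))  # 3 values of b * 2 values of k1 = 6 settings
```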

BM25

class capreolus.searcher.anserini.BM25(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini BM25. This searcher’s parameters can also be given as lists of values to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").

module_name = BM25[source]
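
The two list formats shown for BM25 can be illustrated with a small parser. This sketch assumes that "0.4..1,0.2" means start..stop,step with an inclusive stop, which is an interpretation suggested by the two examples appearing side by side rather than something stated here. expand_param is a hypothetical helper, not the parser Capreolus uses.

```python
from decimal import Decimal


def expand_param(spec):
    """Expand 'a,b,c' or 'start..stop,step' into a list of floats (hypothetical helper)."""
    if ".." in spec:
        bounds, step_text = spec.split(",")
        start_text, stop_text = bounds.split("..")
        # Decimal avoids float drift such as 0.4 + 0.2 != 0.6.
        current, stop, step = Decimal(start_text), Decimal(stop_text), Decimal(step_text)
        if step <= 0:
            raise ValueError("step must be positive")
        values = []
        while current <= stop:  # assumption: the stop value is included
            values.append(float(current))
            current += step
        return values
    return [float(value) for value in spec.split(",")]


print(expand_param("0.4,0.6,0.8,1.0"))
print(expand_param("0.4..1,0.2"))
```

Under the stated assumption, both calls produce the same four values.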

BM25 with Axiomatic expansion

class capreolus.searcher.anserini.AxiomaticSemanticMatching(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini BM25 with Axiomatic query expansion. This searcher’s parameters can also be given as lists of values to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").

module_name = axiomatic[source]

BM25 with RM3 expansion

class capreolus.searcher.anserini.BM25RM3(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini BM25 with RM3 expansion. This searcher’s parameters can also be given as lists of values to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").

module_name = BM25RM3[source]

BM25 PRF

class capreolus.searcher.anserini.BM25PRF(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini BM25 PRF. This searcher’s parameters can also be given as lists of values to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").

module_name = BM25PRF[source]

F2Exp

class capreolus.searcher.anserini.F2Exp(config=None, provide=None, share_dependency_objects=False, build=True)[source]

F2Exp scoring model. This searcher does not support list parameters.

module_name = F2Exp[source]

F2Log

class capreolus.searcher.anserini.F2Log(config=None, provide=None, share_dependency_objects=False, build=True)[source]

F2Log scoring model. This searcher does not support list parameters.

module_name = F2Log[source]

I(n)L2

class capreolus.searcher.anserini.INL2(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini I(n)L2 scoring model. This searcher does not support list parameters.

module_name = INL2[source]

QL with Dirichlet smoothing

class capreolus.searcher.anserini.DirichletQL(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini QL with Dirichlet smoothing. This searcher’s parameters can also be given as lists of values to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").

module_name = DirichletQL[source]

QL with J-M smoothing

class capreolus.searcher.anserini.QLJM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini QL with Jelinek-Mercer smoothing. This searcher’s parameters can also be given as lists of values to grid search (e.g., "0.4,0.6,0.8,1.0" or "0.4..1,0.2").

module_name = QLJM[source]

SDM

class capreolus.searcher.anserini.SDM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini BM25 with the Sequential Dependency Model. This searcher supports list parameters only for k1 and b.

module_name = SDM[source]

SPL

class capreolus.searcher.anserini.SPL(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Anserini SPL scoring model. This searcher does not support list parameters.

module_name = SPL[source]

Rerankers

Note

Rerankers are implemented in PyTorch or TensorFlow. Rerankers with TensorFlow implementations can run on both GPUs and TPUs.

CEDR-KNRM

class capreolus.reranker.CEDRKNRM.CEDRKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

PyTorch implementation of CEDR-KNRM. Equivalent to BERT-KNRM when cls=None.

CEDR: Contextualized Embeddings for Document Ranking. Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. SIGIR 2019. https://arxiv.org/pdf/1904.07094

module_name = CEDRKNRM[source]

CDSSM

class capreolus.reranker.CDSSM.CDSSM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. In CIKM’14.

module_name = CDSSM[source]

ConvKNRM

class capreolus.reranker.ConvKNRM.ConvKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. 2018. Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search. In WSDM’18.

module_name = ConvKNRM[source]

DRMM

class capreolus.reranker.DRMM.DRMM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching Model for Ad-hoc Retrieval. In CIKM’16.

module_name = DRMM[source]

DRMMTKS

class capreolus.reranker.DRMMTKS.DRMMTKS(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching Model for Ad-hoc Retrieval. In CIKM’16.

module_name = DRMMTKS[source]

DSSM

class capreolus.reranker.DSSM.DSSM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM’13.

module_name = DSSM[source]

DUET

class capreolus.reranker.DUET.DUET(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In WWW’17.

module_name = DUET[source]

DeepTileBars

class capreolus.reranker.DeepTileBar.DeepTileBar(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Zhiwen Tang and Grace Hui Yang. 2019. DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval. In AAAI’19.

module_name = DeepTileBar[source]

HiNT

class capreolus.reranker.HINT.HINT(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Yixing Fan, Jiafeng Guo, Yanyan Lan, Jun Xu, Chengxiang Zhai, and Xueqi Cheng. 2018. Modeling Diverse Relevance Patterns in Ad-hoc Retrieval. In SIGIR’18.

module_name = HINT[source]

KNRM

class capreolus.reranker.KNRM.KNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-End Neural Ad-hoc Ranking with Kernel Pooling. In SIGIR’17.

module_name = KNRM[source]

PACRR

class capreolus.reranker.PACRR.PACRR(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Kai Hui, Andrew Yates, Klaus Berberich, and Gerard de Melo. 2017. PACRR: A Position-Aware Neural IR Model for Relevance Matching. EMNLP 2017.

module_name = PACRR[source]

POSITDRMM

class capreolus.reranker.POSITDRMM.POSITDRMM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Ryan McDonald, George Brokos, and Ion Androutsopoulos. 2018. Deep Relevance Ranking Using Enhanced Document-Query Interactions. In EMNLP’18.

module_name = POSITDRMM[source]

TK

class capreolus.reranker.TK.TK(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Sebastian Hofstätter, Markus Zlabinger, and Allan Hanbury. 2019. TU Wien @ TREC Deep Learning ’19 – Simple Contextualization for Re-ranking. In TREC ’19.

module_name = TK[source]

TensorFlow KNRM

class capreolus.reranker.TFKNRM.TFKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

TensorFlow implementation of KNRM.

Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-End Neural Ad-hoc Ranking with Kernel Pooling. In SIGIR’17.

module_name = TFKNRM[source]

TensorFlow BERT-MaxP

class capreolus.reranker.TFBERTMaxP.TFBERTMaxP(config=None, provide=None, share_dependency_objects=False, build=True)[source]

TensorFlow implementation of BERT-MaxP.

Deeper Text Understanding for IR with Contextual Neural Language Modeling. Zhuyun Dai and Jamie Callan. SIGIR 2019. https://arxiv.org/pdf/1905.09217.pdf

module_name = TFBERTMaxP[source]

TensorFlow CEDR-KNRM

class capreolus.reranker.TFCEDRKNRM.TFCEDRKNRM(config=None, provide=None, share_dependency_objects=False, build=True)[source]

TensorFlow implementation of CEDR-KNRM. Equivalent to BERT-KNRM when cls=None.

CEDR: Contextualized Embeddings for Document Ranking. Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. SIGIR 2019. https://arxiv.org/pdf/1904.07094

module_name = TFCEDRKNRM[source]

TensorFlow PARADE

class capreolus.reranker.parade.TFParade(config=None, provide=None, share_dependency_objects=False, build=True)[source]

TensorFlow implementation of PARADE.

PARADE: Passage Representation Aggregation for Document Reranking. Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, and Yingfei Sun. arXiv 2020. https://arxiv.org/pdf/2008.09093.pdf

module_name = parade[source]