Capreolus is a toolkit for constructing flexible ad hoc retrieval pipelines. Capreolus pipelines can be run via a Python or command line interface.
Capreolus is organized around interchangeable and configurable modules, such as a neural Reranker or a first-stage Searcher. Researchers can implement new module classes, such as a new neural Reranker, to experiment with a new module while controlling for all other variables in the pipeline (e.g., the first-stage ranking method and its parameters, the folds used for cross-validation, the tokenization and embeddings used with the reranker, if applicable, and neural training options like the number of iterations, batch size, and loss function).
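The idea of swapping one module while holding the rest of the pipeline fixed can be illustrated with a small, self-contained sketch. This is not the Capreolus API: every name here (DOCS, overlap_searcher, RERANKERS, PipelineConfig, run_pipeline) is hypothetical, and the "rerankers" are toy functions rather than neural models.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

# Hypothetical toy collection; not part of Capreolus.
DOCS = {
    "d1": "neural ranking models",
    "d2": "bm25 ranking",
    "d3": "cooking recipes",
}


def overlap_searcher(query: str, k: int) -> List[str]:
    """Toy first-stage searcher: rank documents by number of shared query terms."""
    q_terms = set(query.split())

    def overlap(doc_id: str) -> int:
        return len(q_terms & set(DOCS[doc_id].split()))

    # Sort by descending overlap, breaking ties by document id.
    return sorted(DOCS, key=lambda d: (-overlap(d), d))[:k]


def identity_reranker(query: str, doc_ids: List[str]) -> List[str]:
    """Toy reranker that keeps the first-stage order."""
    return list(doc_ids)


def shorter_first_reranker(query: str, doc_ids: List[str]) -> List[str]:
    """Toy reranker that prefers shorter documents."""
    return sorted(doc_ids, key=lambda d: (len(DOCS[d].split()), d))


# Registry of interchangeable reranker modules, selected by name.
RERANKERS: Dict[str, Callable[[str, List[str]], List[str]]] = {
    "identity": identity_reranker,
    "shorter_first": shorter_first_reranker,
}


@dataclass(frozen=True)
class PipelineConfig:
    reranker: str
    k: int = 2  # number of first-stage candidates passed to the reranker


def run_pipeline(config: PipelineConfig, query: str) -> List[str]:
    candidates = overlap_searcher(query, config.k)
    return RERANKERS[config.reranker](query, candidates)


base = PipelineConfig(reranker="identity")
# Change only the reranker; every other setting is carried over unchanged.
variant = replace(base, reranker="shorter_first")

baseline_run = run_pipeline(base, "ranking models")
variant_run = run_pipeline(variant, "ranking models")
```

Because the two configurations differ only in the reranker field, any difference between baseline_run and variant_run can be attributed to the reranker alone.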
Since Capreolus v0.2, pipelines are instances of the Task module and can be combined like any other module. For example, the RerankTask implements a “search-then-rerank” pipeline by running RankTask and reranking its output. Task modules respect the same folds (provided by a Benchmark) and can be configured independently (e.g., to optimize for different metrics).
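The composition described above can be sketched conceptually as follows. The classes ToyBenchmark, ToyRankTask, and ToyRerankTask are hypothetical stand-ins written for this illustration; they do not reproduce the constructors, methods, or configuration options of the real Capreolus modules, and the "reranking" step simply reverses the candidate list.

```python
from typing import Dict, List


class ToyBenchmark:
    """Hypothetical benchmark that owns the cross-validation folds."""

    def __init__(self) -> None:
        self.folds: Dict[str, Dict[str, List[str]]] = {
            "s1": {"train": ["q1", "q2"], "dev": ["q3"]},
            "s2": {"train": ["q2", "q3"], "dev": ["q1"]},
        }


class ToyRankTask:
    """Hypothetical first-stage task: returns candidates for a fold's dev queries."""

    def __init__(self, benchmark: ToyBenchmark, metric: str) -> None:
        self.benchmark = benchmark
        self.metric = metric  # each task can optimize for its own metric

    def run(self, fold: str) -> Dict[str, List[str]]:
        dev_queries = self.benchmark.folds[fold]["dev"]
        return {qid: [f"{qid}-docA", f"{qid}-docB"] for qid in dev_queries}


class ToyRerankTask:
    """Hypothetical search-then-rerank task built on top of a rank task."""

    def __init__(self, benchmark: ToyBenchmark, rank_task: ToyRankTask, metric: str) -> None:
        self.benchmark = benchmark
        self.rank_task = rank_task
        self.metric = metric

    def run(self, fold: str) -> Dict[str, List[str]]:
        # Run the first stage on the same fold, then rerank its output.
        first_stage = self.rank_task.run(fold)
        return {qid: list(reversed(docs)) for qid, docs in first_stage.items()}


benchmark = ToyBenchmark()
rank_task = ToyRankTask(benchmark, metric="recall")
rerank_task = ToyRerankTask(benchmark, rank_task, metric="ndcg")

reranked = rerank_task.run("s1")
```

Both tasks hold a reference to the same benchmark object, so they evaluate on identical folds, while their metric settings remain independent.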
Looking for the code? Find Capreolus on GitHub.
Authors: Andrew Yates, Kevin Martin Jose, Xinyu Zhang, Siddhant Arora, Wei Yang, Jimmy Lin
Looking for the previous “search-then-rerank” pipeline that was presented in the WSDM ’20 demo paper? Check out Capreolus v0.1 and the corresponding documentation.