Python API

Capreolus exposes an API that supports functionality similar to its command line interface (CLI). This API is currently a work-in-progress that will be expanded in the next release. This page assumes the reader is already familiar with running experiments using Capreolus’ CLI.

Basic usage

import capreolus
pipeline = capreolus.train_pipeline({"reranker": "KNRM", "niters": 2, "benchmark": "robust04.title"})

The train_pipeline method trains a reranking pipeline for the specified number of iterations (niters). This function expects a config dict as an argument, with keys and values corresponding exactly to those provided on the command line.

The train_pipeline method returns a Pipeline object describing the pipeline that was run. In the above example, pipeline.reranker would correspond to a trained reranker.KNRM object and pipeline.reranker_path indicates the path where output was stored. The pipeline’s outputs are the same as when using the CLI’s train command.

Pipeline config

The configuration dict accepted by train_pipeline accepts the same config options as the CLI. As with the CLI, any missing config options are filled in with reasonable defaults.

Train a pipeline:

config = {
  "reranker": "KNRM",
  "benchmark": "robust04.title.wsdm20demo",
  "niters": 10,
  "expid": "testing",
pipeline = capreolus.train_pipeline(config)

Retrieve the full config used, after missing keys have been filled in with defaults:

>>> print(pipeline.cfg)
{collection': 'robust04', ... 'reranker': KNRM', ... 'embeddings': 'glove6b', ... }

Evaluating a trained model

pipeline = capreolus.train_pipeline(config)

# results will be a dict of the form: {'map': 0.3423, 'ndcg': '0.12312', ...}
results = capreolus.evaluate_pipeline(pipeline)