Installation

This section covers installing Capreolus via its pip package or from source.

Java

Java 11 is required. On Debian-based distributions, this can be installed with sudo apt install openjdk-11-jre. You may additionally need to set the JAVA_HOME environment variable and/or use update-alternatives to ensure the correct version of Java is used by default.

$ java -version
openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment (build 11.0.5+10-post-Ubuntu-0ubuntu1.119.04)
OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Ubuntu-0ubuntu1.119.04, mixed mode, sharing)

Python

Setup a Python 3.6+ environment in your home directory. We strongly recommend using Conda for performance reasons, but this is not strictly necessary.

a) Recommended conda approach: install pyenv into your home directory, and then to use pyenv to install an anaconda or miniconda distribution with Python 3.6+ (3.7+ preferred).

b) Alternate conda approach: manually install a miniconda distribution with Python 3.6+.

c) Alternate approach with system Python: install Python 3.6+ with your system’s package manager (e.g., sudo apt install python3). If you do not create and activate a virtual environment (venv) as described below, you will need to use sudo when installing packages with pip below.

You may optionally setup a virtual environment using venv to isolate Capreolus and its dependencies from other packages. This is especially useful if using a system Python, because it allows you to install packages (for your own user) without sudo.

Installing Capreolus

From source

  • Clone the repository: git clone git@github.com:capreolus-ir/capreolus.git and cd into it.
  • Install PyTorch 1.2. Note that the installation command differs depending on your CUDA version and whether you’re using a Conda distribution (see “Conda” section) or a system Python (see “Wheel” section with pip commands).
  • Install other requirements: pip install -r requirements.txt. (This command is the same regardless of which Python distribution you’re using.)

Configuring Capreolus

Capreolus uses environment variables to indicate where outputs should be stored and where document inputs can be found. Consult the list below to determine which variables should be set. Set these environment variables either on the fly (export CAPREOLUS_RESULTS=...) before running Capreolus or by editing your shell’s initialization files (e.g., ~/.bashrc or ~/.zshrc).

  • CAPREOLUS_RESULTS: the directory where results will be stored. If unset, defaults to ~/.capreolus/results/
  • CAPREOLUS_CACHE: the directory where cache files are stored. If unset, defaults to ~/.capreolus/cache/
  • CAPREOLUS_LOGGING: Indicates the logging level. Can take on the values INFO, DEBUG, WARN and ERROR
  • CUDA_VISIBLE_DEVICES: Indicates GPUs available to PyTorch, starting from 0. For example, setting to ‘1’ will use the system’s 2nd GPU (as numbered by nvidia-smi). Set to ‘’ (an empty string) to force CPU.

To avoid confusion and failed experiments due to limited disk space, we recommend always setting CAPREOLUS_RESULTS and CAPREOLUS_CACHE rather than relying on the default behavior. Typically, CUDA_VISIBLE_DEVICES is set immediately before running an experiment (e.g., to run several separate experiments on different GPUs in parallel).

You’re now ready to run capreolus.