This section covers installing Capreolus via its pip package or from source.
Java 11 is required. On Debian-based distributions, this can be installed with
sudo apt install openjdk-11-jre. You may additionally need to set the
JAVA_HOME environment variable and/or use
update-alternatives to ensure the correct version of Java is used by default.
$ java -version openjdk version "11.0.5" 2019-10-15 OpenJDK Runtime Environment (build 11.0.5+10-post-Ubuntu-0ubuntu1.119.04) OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Ubuntu-0ubuntu1.119.04, mixed mode, sharing)
Setup a Python 3.6+ environment in your home directory. We strongly recommend using Conda for performance reasons, but this is not strictly necessary.
a) Recommended conda approach: install pyenv into your home directory, and then to use pyenv to install an anaconda or miniconda distribution with Python 3.6+ (3.7+ preferred).
b) Alternate conda approach: manually install a miniconda distribution with Python 3.6+.
c) Alternate approach with system Python: install Python 3.6+ with your system’s package manager (e.g.,
sudo apt install python3). If you do not create and activate a virtual environment (venv) as described below, you will need to use
sudo when installing packages with
You may optionally setup a virtual environment using
venv to isolate Capreolus and its dependencies from other packages. This is especially useful if using a system Python, because it allows you to install packages (for your own user) without
Via pip package (recommended)¶
pip install capreolus
- Clone the repository:
git clone firstname.lastname@example.org:capreolus-ir/capreolus.gitand
- Install PyTorch 1.2. Note that the installation command differs depending on your CUDA version and whether you’re using a Conda distribution (see “Conda” section) or a system Python (see “Wheel” section with
- Install other requirements:
pip install -r requirements.txt. (This command is the same regardless of which Python distribution you’re using.)
Capreolus uses environment variables to indicate where outputs should be stored and where document inputs can be found. Consult the list below to determine which variables should be set. Set these environment variables either on the fly (
export CAPREOLUS_RESULTS=...) before running Capreolus or by editing your shell’s initialization files (e.g.,
CAPREOLUS_RESULTS: the directory where results will be stored. If unset, defaults to
CAPREOLUS_CACHE: the directory where cache files are stored. If unset, defaults to
CAPREOLUS_LOGGING: Indicates the logging level. Can take on the values
CUDA_VISIBLE_DEVICES: Indicates GPUs available to PyTorch, starting from 0. For example, setting to ‘1’ will use the system’s 2nd GPU (as numbered by
nvidia-smi). Set to ‘’ (an empty string) to force CPU.
To avoid confusion and failed experiments due to limited disk space, we recommend always setting
CAPREOLUS_CACHE rather than relying on the default behavior. Typically,
CUDA_VISIBLE_DEVICES is set immediately before running an experiment (e.g., to run several separate experiments on different GPUs in parallel).
You’re now ready to run