capreolus.benchmark.covid

Module Contents

Classes

COVID

Ongoing TREC-COVID benchmark from https://ir.nist.gov/covidSubmit that uses documents from CORD-19, the COVID-19 Open Research Dataset (https://www.semanticscholar.org/cord19).

CovidQA

Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments.

Attributes

logger

PACKAGE_PATH

capreolus.benchmark.covid.logger[source]
capreolus.benchmark.covid.PACKAGE_PATH[source]
class capreolus.benchmark.covid.COVID(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Bases: capreolus.benchmark.Benchmark

Ongoing TREC-COVID benchmark from https://ir.nist.gov/covidSubmit that uses documents from CORD-19, the COVID-19 Open Research Dataset (https://www.semanticscholar.org/cord19).

module_name = covid[source]
dependencies[source]
data_dir[source]
topic_url = https://ir.nist.gov/covidSubmit/data/topics-rnd%d.xml[source]
qrel_url_v1 = https://ir.nist.gov/covidSubmit/data/qrels-rnd%d.txt[source]
qrel_url_v2 = https://ir.nist.gov/covidSubmit/data/qrels-covid_d%d_j0.5-%d.txt[source]
lastest_round = 5[source]
query_type = title[source]
config_spec[source]
build(self)[source]
download_if_missing(self)[source]
prep_backward_compatible_qrels(self, tmp_dir, prev_qrels_fn, tgt_qrel_fn)[source]
Prepare a round 3 qrels file that is compatible with the collections used in previous rounds:
  • convert the new docids in qrels-covid_d3_j0.5-3.txt back to their old ids

  • remove judgments that already exist in round 1 and round 2

Warning: do not use this function when search or training is done on a collection released in round 4 or later, where the docids are already updated.

Parameters
  • tmp_dir – pathlib.Path object, the directory in which to store downloaded files

  • prev_qrels_fn – path to the qrels file that stores the judgments from previous rounds (round 1 and round 2)

  • tgt_qrel_fn – path where the processed round 3 qrels file is written
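
The two steps described above can be illustrated with a minimal in-memory sketch. This is not the module's implementation: the function name `backward_compatible_qrels`, the `new2old` docid mapping, and the nested-dict qrels layout are all assumptions made for illustration, and the real method works on files rather than dicts.

```python
def backward_compatible_qrels(round3_qrels, prev_qrels, new2old):
    """Sketch of the two steps: map docids back, then drop known judgments.

    round3_qrels, prev_qrels: {qid: {docid: label}} (assumed layout)
    new2old: {new_docid: old_docid} (hypothetical mapping)
    """
    result = {}
    for qid, judgments in round3_qrels.items():
        already_judged = prev_qrels.get(qid, {})
        kept = {}
        for docid, label in judgments.items():
            # Assumption: a docid without a mapping entry was never renamed.
            old_id = new2old.get(docid, docid)
            # Skip judgments that already exist in rounds 1 and 2.
            if old_id in already_judged:
                continue
            kept[old_id] = label
        if kept:
            result[qid] = kept
    return result


# Made-up data, for illustration only.
round3 = {"1": {"newA": 2, "docB": 0, "docC": 1}}
previous = {"1": {"oldA": 2}}
mapping = {"newA": "oldA"}
converted = backward_compatible_qrels(round3, previous, mapping)
```

Here `newA` maps back to `oldA`, which was already judged in an earlier round, so only `docB` and `docC` remain for query `1`.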

xml2trectopic(self, xmlfile)[source]
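
The page does not describe what xml2trectopic returns, so the following is only a rough sketch of how a topics XML file could be read into per-field query dicts. The element names (`topic`, `query`, `question`, `narrative`), the `number` attribute, the mapping of `query` to `title`, and the helper name `parse_topics` are assumptions, and the sample text is made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample in an assumed topics layout; not real benchmark data.
SAMPLE = """<topics>
  <topic number="1">
    <query>example keyword query</query>
    <question>an example question?</question>
    <narrative>an example narrative</narrative>
  </topic>
</topics>"""


def parse_topics(xml_text):
    """Return {field: {qid: text}} for the assumed XML layout."""
    root = ET.fromstring(xml_text)
    # Assumption: <query> plays the role of the "title" query type.
    fields = {"title": "query", "desc": "question", "narrative": "narrative"}
    topics = {name: {} for name in fields}
    for topic in root.findall("topic"):
        qid = topic.attrib["number"]
        for name, tag in fields.items():
            topics[name][qid] = topic.findtext(tag, default="").strip()
    return topics


parsed = parse_topics(SAMPLE)
```

With `query_type = title`, an experiment would then read its queries from `parsed["title"]` under these assumptions.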
class capreolus.benchmark.covid.CovidQA(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Bases: capreolus.benchmark.Benchmark

Base class for Benchmark modules. The purpose of a Benchmark is to provide the data needed to run an experiment, such as queries, folds, and relevance judgments.

Modules should provide:
  • a topics dict mapping query ids (qids) to queries

  • a qrels dict mapping qids to docids and relevance labels

  • a folds dict mapping a fold name to training, dev (validation), and testing qids

  • if these can be loaded from files in standard formats, they can be specified by setting the topic_file, qrel_file, and fold_file, respectively, rather than by setting the above attributes directly
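
As a rough illustration of the three structures listed above, the sketch below builds small made-up dicts and checks that they agree with each other. The key names inside `folds` (`train`, `dev`, `test`) and the helper `check_benchmark_data` are hypothetical, and the exact nesting used by capreolus may differ.

```python
# Made-up data in an assumed layout, for illustration only.
topics = {"q1": "first example query", "q2": "second example query", "q3": "third example query"}
qrels = {"q1": {"docA": 2, "docB": 0}, "q2": {"docC": 1}, "q3": {"docA": 1}}
folds = {"s1": {"train": ["q1"], "dev": ["q2"], "test": ["q3"]}}


def check_benchmark_data(topics, qrels, folds):
    """Return a list of consistency problems (empty if none are found)."""
    problems = []
    # Every judged query should have a topic.
    for qid in qrels:
        if qid not in topics:
            problems.append(f"qrels qid {qid} has no topic")
    # Every qid referenced by a fold should have a topic.
    for fold_name, split in folds.items():
        for part, qids in split.items():
            for qid in qids:
                if qid not in topics:
                    problems.append(f"fold {fold_name}/{part} qid {qid} has no topic")
    return problems


problems = check_benchmark_data(topics, qrels, folds)
```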

module_name = covidqa[source]
dependencies[source]
url = https://raw.githubusercontent.com/castorini/pygaggle/master/data/kaggle-lit-review-%s.json[source]
available_versions = ['0.1', '0.2'][source]
datadir[source]
config_spec[source]
build(self)[source]
download_if_missing(self)[source]
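
The `url` attribute above is a `%s` template, and `available_versions` lists two version strings. A minimal sketch of how the two might combine is shown below. The helper `dataset_url` and its version check are assumptions, not the documented behavior of CovidQA.

```python
# Values copied from the attributes listed above.
URL_TEMPLATE = "https://raw.githubusercontent.com/castorini/pygaggle/master/data/kaggle-lit-review-%s.json"
AVAILABLE_VERSIONS = ["0.1", "0.2"]


def dataset_url(version):
    """Fill the version into the URL template (hypothetical helper)."""
    # Assumption: versions outside the listed ones are rejected.
    if version not in AVAILABLE_VERSIONS:
        raise ValueError(f"unknown version {version!r}; expected one of {AVAILABLE_VERSIONS}")
    return URL_TEMPLATE % version


urls = [dataset_url(v) for v in AVAILABLE_VERSIONS]
```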