capreolus.sampler

Package Contents

Classes

Sampler Base class for profane modules.
TrainTripletSampler Samples training data triplets. Each sample is of the form (query, relevant doc, non-relevant doc)
TrainPairSampler Samples training data pairs. Each sample is of the form (query, doc)
PredSampler Creates a Dataset for evaluation (test) data to be used with a PyTorch DataLoader
capreolus.sampler.logger[source]
class capreolus.sampler.Sampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Bases: capreolus.ModuleBase

Base class for profane modules. Module construction proceeds as follows:

  1. Any config options not present in config are filled in with their default values. Config options and their defaults are specified in the config_spec class attribute.
  2. Any dependencies declared in the dependencies class attribute are recursively instantiated. If the dependency object is present in provide, this object will be used instead of instantiating a new object for the dependency.
  3. The module object’s config variable is updated to reflect the configs of its dependencies and then frozen.

After construction is complete, the module’s dependencies are available as instance variables: self.`dependency key`.

Parameters:
  • config – dictionary containing a config to apply to this module and its dependencies
  • provide – dictionary mapping dependency keys to module objects
  • share_dependency_objects – if true, dependencies will be cached in the registry based on their configs and reused. See the share_objects argument of ModuleBase.create.
module_type = sampler[source]
requires_random_seed = True[source]
prepare(self, qid_to_docids, qrels, extractor, relevance_level=1, **kwargs)[source]

Parameters:
  • qid_to_docids – a dict of the form {qid: [list of docids to rank]}
  • qrels – a dict of the form {qid: {docid: label}}
  • extractor – an Extractor instance (e.g. EmbedText)
  • relevance_level – threshold score below which documents are considered to be non-relevant

clean(self)[source]
get_hash(self)[source]
get_total_samples(self)[source]
generate_samples(self)[source]
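
A minimal, illustrative sketch of the lifecycle described above: a concrete subclass is constructed with default config options, and prepare() is given toy dictionaries in the documented shapes. The extractor placeholder stands in for an already-built Extractor instance (e.g. EmbedText), whose construction is omitted here.

    from capreolus.sampler import TrainTripletSampler

    # Construct with defaults; omitted config options are filled in from config_spec.
    sampler = TrainTripletSampler()

    qid_to_docids = {"q1": ["d1", "d2", "d3"]}   # {qid: [list of docids to rank]}
    qrels = {"q1": {"d1": 1, "d2": 0, "d3": 0}}  # {qid: {docid: label}}
    extractor = ...  # placeholder for an Extractor instance such as EmbedText

    # Documents whose qrel label falls below relevance_level are treated as non-relevant.
    sampler.prepare(qid_to_docids, qrels, extractor, relevance_level=1)
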
class capreolus.sampler.TrainTripletSampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Bases: capreolus.sampler.Sampler, torch.utils.data.IterableDataset

Samples training data triplets. Each sample is of the form (query, relevant doc, non-relevant doc)

module_name = triplet[source]
get_hash(self)[source]
generate_samples(self)[source]

Generates triplets infinitely.
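
Because triplets are generated indefinitely, an illustrative way to inspect the output is to take a finite slice; triplet_sampler here is assumed to be a TrainTripletSampler on which prepare() has already been called, as in the sketch for the base class above.

    from itertools import islice

    # Pull a handful of samples from the infinite generator for inspection.
    for sample in islice(triplet_sampler.generate_samples(), 8):
        print(sample)

Because the class also derives from torch.utils.data.IterableDataset, the same object can be passed directly to a torch.utils.data.DataLoader for batching during training.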

class capreolus.sampler.TrainPairSampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Bases: capreolus.sampler.Sampler, torch.utils.data.IterableDataset

Samples training data pairs. Each sample is of the form (query, doc). The number of generated positive and negative samples is the same.

module_name = pair[source]
dependencies = [][source]
get_hash(self)[source]
generate_samples(self)[source]
class capreolus.sampler.PredSampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]

Bases: capreolus.sampler.Sampler, torch.utils.data.IterableDataset

Creates a Dataset for evaluation (test) data to be used with a PyTorch DataLoader

module_name = pred[source]
requires_random_seed = False[source]
get_hash(self)[source]
generate_samples(self)[source]
get_qid_docid_pairs(self)[source]

Returns a generator for the (qid, docid) pairs. Useful if you want to sequentially access the prediction pairs without extracting the actual document content.
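
As an illustration, pred_sampler below is assumed to be a PredSampler on which prepare() has already been called. Since it derives from torch.utils.data.IterableDataset it can be wrapped in a DataLoader for batched inference, while get_qid_docid_pairs() provides a lightweight pass over the corresponding (qid, docid) pairs without extracting document content.

    from torch.utils.data import DataLoader

    # Batched iteration over the extracted evaluation samples (batch size is arbitrary).
    pred_loader = DataLoader(pred_sampler, batch_size=16)

    # Iterate the (qid, docid) pairs directly, e.g. to assemble a run file.
    for qid, docid in pred_sampler.get_qid_docid_pairs():
        print(qid, docid)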