`capreolus.sampler`¶

Package Contents¶

Classes¶

`Sampler`	Base class for profane modules.
`TrainingSamplerMixin`
`TrainTripletSampler`	Samples training data triplets. Each sample is of the form (query, relevant doc, non-relevant doc).
`TrainPairSampler`	Samples training data pairs. Each sample is of the form (query, doc).
`LCETrainSampler`	Samples training data triplets. Each sample is of the form (query, relevant doc, non-relevant doc).
`PredSampler`	Creates a Dataset for evaluation (test) data to be used with a PyTorch DataLoader.

Attributes¶

logger

capreolus.sampler.logger[source]¶

class capreolus.sampler.Sampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: capreolus.ModuleBase

Base class for profane modules. Module construction proceeds as follows:

1) Any config options not present in config are filled in with their default values. Config options and their defaults are specified in the config_spec class attribute.
2) Any dependencies declared in the dependencies class attribute are recursively instantiated. If the dependency object is present in provide, this object will be used instead of instantiating a new object for the dependency.
3) The module object’s config variable is updated to reflect the configs of its dependencies and then frozen.

After construction is complete, the module’s dependencies are available as instance variables: self.`dependency key`.

Parameters

config – dictionary containing a config to apply to this module and its dependencies
provide – dictionary mapping dependency keys to module objects
share_dependency_objects – if true, dependencies will be cached in the registry based on their configs and reused. See the share_objects argument of ModuleBase.create.
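
The first and last construction steps (filling in defaults, then freezing the config) can be pictured with a small stand-alone sketch. It does not use the real profane or capreolus machinery: `DEFAULTS` and `build_config` are hypothetical names, and a plain dict stands in for `config_spec`.

```python
from types import MappingProxyType

# Hypothetical defaults standing in for a module's config_spec (an assumption of this sketch).
DEFAULTS = {"name": "triplet", "seed": 42}


def build_config(config=None, defaults=DEFAULTS):
    """Fill in options missing from `config` with defaults, then return a read-only view."""
    merged = dict(defaults)
    merged.update(config or {})
    # MappingProxyType rejects item assignment, mimicking a frozen config.
    return MappingProxyType(merged)


cfg = build_config({"seed": 7})
print(dict(cfg))
```

Options supplied by the caller take precedence over the defaults, and later attempts to assign into `cfg` raise `TypeError`.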

module_type = 'sampler'[source]¶

requires_random_seed = True[source]¶

prepare(qid_to_docids, qrels, extractor, relevance_level=1, **kwargs)[source]¶

Parameters

qid_to_docids – a dict of the form {qid: [list of docids to rank]}
qrels – a dict of the form {qid: {docid: label}}
extractor – an Extractor instance (e.g. EmbedText)
relevance_level – threshold score below which documents are considered to be non-relevant
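
To make the argument shapes concrete, the sketch below builds a small `qid_to_docids` and `qrels` and splits each query's candidates at `relevance_level`. The helper `split_by_relevance` is hypothetical and is not part of capreolus; treating unjudged documents as label 0 is also an assumption of this sketch.

```python
def split_by_relevance(qid_to_docids, qrels, relevance_level=1):
    """Partition each query's candidate docids into relevant and non-relevant lists."""
    qid_to_reldocs, qid_to_negdocs = {}, {}
    for qid, docids in qid_to_docids.items():
        labels = qrels.get(qid, {})
        # Unjudged documents are treated as label 0 (an assumption of this sketch).
        qid_to_reldocs[qid] = [d for d in docids if labels.get(d, 0) >= relevance_level]
        qid_to_negdocs[qid] = [d for d in docids if labels.get(d, 0) < relevance_level]
    return qid_to_reldocs, qid_to_negdocs


qid_to_docids = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1": 2, "d2": 0}, "q2": {"d4": 1}}

rel, neg = split_by_relevance(qid_to_docids, qrels, relevance_level=1)
print(rel)  # {'q1': ['d1'], 'q2': ['d4']}
print(neg)  # {'q1': ['d2', 'd3'], 'q2': ['d5']}
```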

abstract get_hash()[source]¶

get_total_samples()[source]¶

abstract generate_samples()[source]¶

class capreolus.sampler.TrainingSamplerMixin[source]¶

clean()[source]¶

class capreolus.sampler.TrainTripletSampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: Sampler, TrainingSamplerMixin, torch.utils.data.IterableDataset

Samples training data triplets. Each sample is of the form (query, relevant doc, non-relevant doc).

module_name = 'triplet'[source]¶

get_hash()[source]¶

generate_samples()[source]¶

Generates triplets infinitely.
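
An infinite triplet generator is typically consumed with a bound, for example through `itertools.islice`. The following is a minimal sketch of that pattern, not the library's implementation: `generate_triplets` and its arguments are hypothetical, and it yields ids rather than extracted features.

```python
import itertools
import random


def generate_triplets(qid_to_reldocs, qid_to_negdocs, seed=0):
    """Yield (qid, relevant docid, non-relevant docid) triplets forever."""
    rng = random.Random(seed)
    # Only queries with at least one relevant and one non-relevant document can form a triplet.
    qids = sorted(q for q in qid_to_reldocs if qid_to_reldocs[q] and qid_to_negdocs.get(q))
    if not qids:
        raise ValueError("no query has both relevant and non-relevant documents")
    while True:
        rng.shuffle(qids)
        for qid in qids:
            yield qid, rng.choice(qid_to_reldocs[qid]), rng.choice(qid_to_negdocs[qid])


reldocs = {"q1": ["d1"], "q2": ["d4"]}
negdocs = {"q1": ["d2", "d3"], "q2": ["d5"]}

# The generator never terminates, so take a fixed number of samples.
triplets = list(itertools.islice(generate_triplets(reldocs, negdocs), 5))
print(triplets)
```

Because the generator never raises `StopIteration`, the training loop (or the caller) decides how many samples make up an epoch.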

class capreolus.sampler.TrainPairSampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: Sampler, TrainingSamplerMixin, torch.utils.data.IterableDataset

Samples training data pairs. Each sample is of the form (query, doc). The same number of positive and negative samples is generated.

module_name = 'pair'[source]¶

dependencies = [][source]¶

get_hash()[source]¶

generate_samples()[source]¶
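
One simple way to keep positive and negative pairs balanced, as the TrainPairSampler description states, is to emit them alternately. The sketch below illustrates this under assumptions: `generate_pairs` is a hypothetical name, and the explicit 1/0 label attached to each pair is added here for illustration only.

```python
import itertools
import random


def generate_pairs(qid_to_reldocs, qid_to_negdocs, seed=0):
    """Yield (qid, docid, label) forever, alternating positive and negative pairs."""
    rng = random.Random(seed)
    qids = sorted(q for q in qid_to_reldocs if qid_to_reldocs[q] and qid_to_negdocs.get(q))
    if not qids:
        raise ValueError("no query has both relevant and non-relevant documents")
    while True:
        rng.shuffle(qids)
        for qid in qids:
            # One positive followed by one negative keeps the two classes balanced.
            yield qid, rng.choice(qid_to_reldocs[qid]), 1
            yield qid, rng.choice(qid_to_negdocs[qid]), 0


reldocs = {"q1": ["d1"], "q2": ["d4"]}
negdocs = {"q1": ["d2", "d3"], "q2": ["d5"]}

pairs = list(itertools.islice(generate_pairs(reldocs, negdocs), 6))
n_pos = sum(label for _, _, label in pairs)
print(n_pos, len(pairs) - n_pos)  # 3 3
```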

class capreolus.sampler.LCETrainSampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: TrainTripletSampler

Samples training data triplets. Each sample is of the form (query, relevant doc, non-relevant doc).

module_name = 'LCE'[source]¶

config_spec[source]¶

get_hash()[source]¶

generate_samples()[source]¶

Generates (pos, neg * n) infinitely.
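
The "(pos, neg * n)" shape can be read as one relevant document paired with n non-relevant documents for the same query. A minimal sketch of that shape follows; `generate_lce_samples` and the `n_neg` parameter are hypothetical names, and drawing negatives with replacement is a simplification chosen here so that queries with fewer than n negatives still work.

```python
import itertools
import random


def generate_lce_samples(qid_to_reldocs, qid_to_negdocs, n_neg=3, seed=0):
    """Yield (qid, positive docid, list of n_neg negative docids) forever."""
    rng = random.Random(seed)
    qids = sorted(q for q in qid_to_reldocs if qid_to_reldocs[q] and qid_to_negdocs.get(q))
    if not qids:
        raise ValueError("no query has both relevant and non-relevant documents")
    while True:
        rng.shuffle(qids)
        for qid in qids:
            pos = rng.choice(qid_to_reldocs[qid])
            # Sampling with replacement: a negative may repeat within one sample.
            negs = rng.choices(qid_to_negdocs[qid], k=n_neg)
            yield qid, pos, negs


reldocs = {"q1": ["d1"], "q2": ["d4"]}
negdocs = {"q1": ["d2", "d3"], "q2": ["d5"]}

samples = list(itertools.islice(generate_lce_samples(reldocs, negdocs, n_neg=3), 4))
print(samples)
```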

class capreolus.sampler.PredSampler(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: Sampler, torch.utils.data.IterableDataset

Creates a Dataset for evaluation (test) data to be used with a PyTorch DataLoader.

module_name = 'pred'[source]¶

requires_random_seed = False[source]¶

get_hash()[source]¶

generate_samples()[source]¶

clean()[source]¶

get_qid_docid_pairs()[source]¶

Returns a generator over the (qid, docid) pairs. Useful if you want to access the pred pairs sequentially without extracting the actual document content.
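
Iterating over (qid, docid) pairs without touching document content can be sketched as a plain generator over a `qid_to_docids` dict. This is an illustration only: `qid_docid_pairs` is a hypothetical stand-alone function, and the sorted query order is an assumption made here to get deterministic output.

```python
def qid_docid_pairs(qid_to_docids):
    """Lazily yield (qid, docid) pairs; no document content is loaded."""
    for qid in sorted(qid_to_docids):
        for docid in qid_to_docids[qid]:
            yield qid, docid


qid_to_docids = {"q2": ["d4"], "q1": ["d1", "d2"]}
pred_pairs = list(qid_docid_pairs(qid_to_docids))
print(pred_pairs)  # [('q1', 'd1'), ('q1', 'd2'), ('q2', 'd4')]
```

Keeping the pairs in a known order makes it possible to line them up with model scores produced in the same order.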
