capreolus.extractor

Package Contents

Classes

  • Extractor(config=None, provide=None, share_dependency_objects=False, build=True) – Base class for profane modules.
  • EmbedText(config=None, provide=None, share_dependency_objects=False, build=True) – Base class for profane modules.
  • BertText(config=None, provide=None, share_dependency_objects=False, build=True) – Base class for profane modules.
capreolus.extractor.logger
class capreolus.extractor.Extractor(config=None, provide=None, share_dependency_objects=False, build=True)

Bases: profane.ModuleBase

Base class for profane modules. Module construction proceeds as follows:

  1) Any config options not present in config are filled in with their default values. Config options and their defaults are specified in the config_spec class attribute.
  2) Any dependencies declared in the dependencies class attribute are recursively instantiated. If a dependency object is present in provide, that object is used instead of instantiating a new object for the dependency.
  3) The module object’s config variable is updated to reflect the configs of its dependencies and then frozen.

After construction is complete, the module’s dependencies are available as instance variables: self.`dependency key`.

Parameters:
  • config – dictionary containing a config to apply to this module and its dependencies
  • provide – dictionary mapping dependency keys to module objects
  • share_dependency_objects – if true, dependencies will be cached in the registry based on their configs and reused. See the share_objects argument of ModuleBase.create.
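
The construction steps above can be illustrated with a small, self-contained sketch. It does not use profane itself: ToyModule, ToyTokenizer, ToyExtractor, and the option names keepstops and maxdoclen are hypothetical stand-ins, and the registry caching controlled by share_dependency_objects is not modeled.

```python
from types import MappingProxyType


class ToyModule:
    """Hypothetical stand-in for a profane-style module; not the real ModuleBase."""

    config_spec = {}   # option name -> default value (assumed shape)
    dependencies = {}  # dependency key -> module class (assumed shape)

    def __init__(self, config=None, provide=None):
        config = dict(config or {})
        provide = provide or {}

        # Step 1: fill in defaults for any option missing from config.
        own = {name: config.get(name, default) for name, default in self.config_spec.items()}

        # Step 2: recursively instantiate dependencies, unless one was provided.
        for key, cls in self.dependencies.items():
            dep = provide[key] if key in provide else cls(config.get(key), provide)
            setattr(self, key, dep)  # available afterwards as self.<dependency key>
            # Step 3 (first half): reflect the dependency's config in this module's config.
            own[key] = dict(dep.config)

        # Step 3 (second half): freeze the config by exposing a read-only view.
        self.config = MappingProxyType(own)


class ToyTokenizer(ToyModule):
    config_spec = {"keepstops": False}


class ToyExtractor(ToyModule):
    config_spec = {"maxdoclen": 800}
    dependencies = {"tokenizer": ToyTokenizer}


# Defaults are filled in and the tokenizer dependency is built automatically.
extractor = ToyExtractor({"maxdoclen": 256})

# A pre-built dependency passed through provide is reused as-is.
shared = ToyTokenizer({"keepstops": True})
reused = ToyExtractor(provide={"tokenizer": shared})
```

In this sketch, assigning to extractor.config raises a TypeError, which mimics the frozen config described in step 3.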
module_type = extractor
cache_state(self, qids, docids)
load_state(self, qids, docids)
get_state_cache_file_path(self, qids, docids)

Returns the path to the cache file used to store the extractor state, regardless of whether the file exists.

is_state_cached(self, qids, docids)

Returns a boolean indicating whether the state corresponding to the given qids and docids has already been cached.
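
One way such a pair of methods might fit together is sketched below. This is not Capreolus's implementation: the free functions, the cache_dir argument, the SHA-256 key, and the .pkl file name are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def state_cache_file_path(cache_dir, qids, docids):
    """Hypothetical helper: derive a deterministic cache path from the ids."""
    # Sort so that the same set of ids always maps to the same file name.
    key = ",".join(sorted(map(str, qids))) + "|" + ",".join(sorted(map(str, docids)))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # The path is computed whether or not the file has been written yet.
    return os.path.join(cache_dir, f"state_{digest}.pkl")


def is_state_cached(cache_dir, qids, docids):
    """Return True only if the file at the computed path already exists."""
    return os.path.exists(state_cache_file_path(cache_dir, qids, docids))


with tempfile.TemporaryDirectory() as cache_dir:
    path = state_cache_file_path(cache_dir, ["q1", "q2"], ["d3", "d1"])
    before = is_state_cached(cache_dir, ["q1", "q2"], ["d3", "d1"])
    open(path, "wb").close()  # stand-in for cache_state() writing the state
    after = is_state_cached(cache_dir, ["q2", "q1"], ["d1", "d3"])
```

Keeping the path computation separate from the existence check lets a caller decide between loading cached state and rebuilding it.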

build_from_benchmark(self, *args, **kwargs)
class capreolus.extractor.EmbedText(config=None, provide=None, share_dependency_objects=False, build=True)

Bases: capreolus.extractor.Extractor

Base class for profane modules. Module construction proceeds as follows:

  1) Any config options not present in config are filled in with their default values. Config options and their defaults are specified in the config_spec class attribute.
  2) Any dependencies declared in the dependencies class attribute are recursively instantiated. If a dependency object is present in provide, that object is used instead of instantiating a new object for the dependency.
  3) The module object’s config variable is updated to reflect the configs of its dependencies and then frozen.

After construction is complete, the module’s dependencies are available as instance variables: self.`dependency key`.

Parameters:
  • config – dictionary containing a config to apply to this module and its dependencies
  • provide – dictionary mapping dependency keys to module objects
  • share_dependency_objects – if true, dependencies will be cached in the registry based on their configs and reused. See the share_objects argument of ModuleBase.create.
module_name = embedtext
requires_random_seed = True
dependencies
config_spec
pad = 0
pad_tok = <pad>
embed_paths
load_state(self, qids, docids)
cache_state(self, qids, docids)
get_tf_feature_description(self)
create_tf_feature(self, sample)

Parameters:
  • sample – output from self.id2vec()
Returns: a TensorFlow feature

parse_tf_example(self, example_proto)
exist(self)
preprocess(self, qids, docids, topics)
id2vec(self, qid, posid, negid=None)
class capreolus.extractor.BertText(config=None, provide=None, share_dependency_objects=False, build=True)

Bases: capreolus.extractor.Extractor

Base class for profane modules. Module construction proceeds as follows:

  1) Any config options not present in config are filled in with their default values. Config options and their defaults are specified in the config_spec class attribute.
  2) Any dependencies declared in the dependencies class attribute are recursively instantiated. If a dependency object is present in provide, that object is used instead of instantiating a new object for the dependency.
  3) The module object’s config variable is updated to reflect the configs of its dependencies and then frozen.

After construction is complete, the module’s dependencies are available as instance variables: self.`dependency key`.

Parameters:
  • config – dictionary containing a config to apply to this module and its dependencies
  • provide – dictionary mapping dependency keys to module objects
  • share_dependency_objects – if true, dependencies will be cached in the registry based on their configs and reused. See the share_objects argument of ModuleBase.create.
module_name = berttext
dependencies
config_spec
pad = 0
pad_tok = <pad>
static config()
load_state(self, qids, docids)
cache_state(self, qids, docids)
get_tf_feature_description(self)
create_tf_feature(self, sample)

Parameters:
  • sample – output from self.id2vec()
Returns: a TensorFlow feature

parse_tf_example(self, example_proto)
exist(self)
preprocess(self, qids, docids, topics)
id2vec(self, qid, posid, negid=None)
get_mask(self, doc, to_len)

Returns a mask containing 1 for actual tokens and 0 for padding tokens.
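
The masking behavior described above can be sketched in plain Python. This is an illustrative re-implementation, not the library's code: it assumes doc is a list of tokens, that to_len is the padded length, and that documents longer than to_len are truncated. The pad_doc helper is hypothetical and only reuses the <pad> token shown in pad_tok.

```python
def get_mask(doc, to_len):
    """Hypothetical re-implementation: 1 per real token, 0 per padding slot."""
    n_real = min(len(doc), to_len)  # tokens beyond to_len are assumed truncated
    return [1] * n_real + [0] * (to_len - n_real)


def pad_doc(doc, to_len, pad_tok="<pad>"):
    """Hypothetical helper: truncate or right-pad doc to exactly to_len tokens."""
    return doc[:to_len] + [pad_tok] * max(0, to_len - len(doc))


tokens = ["deep", "ranking", "models"]
padded = pad_doc(tokens, 5)  # ["deep", "ranking", "models", "<pad>", "<pad>"]
mask = get_mask(tokens, 5)   # [1, 1, 1, 0, 0]
```

The mask lines up position by position with the padded token list, so a downstream model can ignore the padding slots.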