`capreolus.extractor.embedtext`¶

Module Contents¶

Classes¶

EmbedText

Base class for Extractor modules. The purpose of an Extractor is to convert queries and documents to a representation suitable for use with a Reranker module.

Attributes¶

logger

capreolus.extractor.embedtext.logger[source]¶

class capreolus.extractor.embedtext.EmbedText(config=None, provide=None, share_dependency_objects=False, build=True)[source]¶

Bases: capreolus.extractor.Extractor

Base class for Extractor modules. The purpose of an Extractor is to convert queries and documents to a representation suitable for use with a Reranker module.

Modules should provide:

an id2vec(qid, posid, negid=None) method that converts the given query and document ids to an appropriate representation

module_name = 'embedtext'[source]¶

requires_random_seed = True[source]¶

dependencies[source]¶

config_spec[source]¶

pad_tok = '<pad>'[source]¶

build()[source]¶

get_tf_feature_description()[source]¶

create_tf_feature(sample)[source]¶: Converts sample, the output of self.id2vec(), into a TensorFlow feature.

parse_tf_example(example_proto)[source]¶

preprocess(qids, docids, topics)[source]¶

get_doc_tokens(docid)[source]¶

id2vec(qid, posid, negid=None, *args, **kwargs)[source]¶: Creates a feature from the (qid, posid) pair. If negid is supplied, the negative document is also included in the feature (needed for training with pairwise hinge loss). The label is a vector of shape [num_classes] and is supplied only for pointwise training (i.e., cross entropy). For pointwise samples, negid is None and the label is [0, 1] if the document identified by posid is relevant, or [1, 0] if it is irrelevant.
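The difference between pairwise and pointwise samples may be easier to see in a small, self-contained sketch. This is not the Capreolus implementation: `toy_id2vec`, `pad`, the toy data, the `label` keyword, the `maxqlen`/`maxdoclen` parameters, and the dictionary keys are all hypothetical names chosen for illustration. Only the `(qid, posid, negid=None)` calling convention, the `'<pad>'` token, and the `[0, 1]` / `[1, 0]` label convention come from the documentation above.

```python
PAD_TOK = "<pad>"  # same value as EmbedText.pad_tok

# Hypothetical toy data; a real extractor would obtain tokens from its dependencies.
TOY_QUERIES = {"q1": ["neural", "ranking"]}
TOY_DOCS = {
    "d1": ["neural", "models", "for", "ranking"],
    "d2": ["cooking", "pasta"],
}


def pad(tokens, length):
    """Truncate or right-pad a token list to exactly `length` items."""
    return tokens[:length] + [PAD_TOK] * max(0, length - len(tokens))


def toy_id2vec(qid, posid, negid=None, label=None, maxqlen=4, maxdoclen=5):
    """Build an illustrative feature dict for one training sample.

    The keys and the `label` keyword are assumptions, not the library's API.
    """
    feature = {
        "qid": qid,
        "posdocid": posid,
        "query": pad(TOY_QUERIES[qid], maxqlen),
        "posdoc": pad(TOY_DOCS[posid], maxdoclen),
    }
    if negid is not None:
        # Pairwise sample: include the negative document as well.
        feature["negdocid"] = negid
        feature["negdoc"] = pad(TOY_DOCS[negid], maxdoclen)
    if label is not None:
        # Pointwise sample: [0, 1] marks posid as relevant, [1, 0] as irrelevant.
        feature["label"] = label
    return feature


# Pairwise (hinge loss): a negative document is supplied, no label.
pairwise = toy_id2vec("q1", "d1", negid="d2")

# Pointwise (cross entropy): no negative document, label says "relevant".
pointwise = toy_id2vec("q1", "d1", label=[0, 1])
```

In this sketch the pairwise sample carries a `negdoc` entry and no label, while the pointwise sample carries a two-element label and no negative document, mirroring the two cases described for id2vec.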
