capreolus.extractor.bagofwords

Module Contents

Classes

BagOfWords

Bag of Words (or bag of trigrams when datamode=trigram) extractor. Used with the DSSM reranker.

Attributes

logger

capreolus.extractor.bagofwords.logger
class capreolus.extractor.bagofwords.BagOfWords(config=None, provide=None, share_dependency_objects=False, build=True)

Bases: capreolus.extractor.Extractor

Bag of Words (or bag of trigrams when datamode=trigram) extractor. Used with the DSSM reranker.
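To illustrate the difference between the two data modes, here is a minimal sketch of a bag-of-terms count vector built over letter trigrams, in the style of DSSM word hashing. It is not the Capreolus implementation: `letter_trigrams`, `bag_of_terms`, the `#` boundary marker, and the toy vocabulary are all assumptions made for this example.

```python
from collections import Counter


def letter_trigrams(token):
    """Split one token into letter trigrams after adding '#' boundary markers.

    Hypothetical helper: the '#' marker follows the DSSM word-hashing
    convention and is an assumption, not taken from Capreolus.
    """
    marked = f"#{token}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def bag_of_terms(terms, vocab):
    """Return a count vector over `vocab` (term -> index).

    Terms that are missing from the vocabulary are ignored.
    """
    vec = [0] * len(vocab)
    for term, count in Counter(terms).items():
        if term in vocab:
            vec[vocab[term]] += count
    return vec


tokens = ["good", "food"]

# Word mode would count the tokens themselves; trigram mode counts
# the letter trigrams of every token instead.
trigrams = [tri for tok in tokens for tri in letter_trigrams(tok)]

# Toy vocabulary built from the trigrams seen in this example.
vocab = {tri: idx for idx, tri in enumerate(sorted(set(trigrams)))}
vector = bag_of_terms(trigrams, vocab)
```

The two tokens share the trigrams `ood` and `od#`, so those positions carry a count of 2 while every other position carries 1. Word order is discarded in both modes.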

module_name = 'bagofwords'
dependencies
config_spec
pad = 0
pad_tok = '<pad>'
load_state(qids, docids)
cache_state(qids, docids)
get_trigrams_for_toks(toks_list)
exist()
preprocess(qids, docids, topics)
id2vec(q_id, posdoc_id, negdoc_id=None, *args, **kwargs)

Creates a feature from the (q_id, posdoc_id) pair. If negdoc_id is supplied, it is also included in the feature (needed for training with pairwise hinge loss). The label is a vector of shape [num_classes] and is supplied only when using pointwise training (i.e., cross entropy). With pointwise samples, negdoc_id is None and the label is [0, 1] if the document identified by posdoc_id is relevant, or [1, 0] if it is irrelevant.
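The two training regimes described above can be sketched as follows. This is an illustrative example only: `pointwise_label`, `pairwise_hinge_loss`, and the margin of 1.0 are hypothetical and are not part of the Capreolus API.

```python
def pointwise_label(is_relevant):
    """Two-class label vector as described above.

    [0, 1] marks a relevant document and [1, 0] an irrelevant one.
    """
    return [0, 1] if is_relevant else [1, 0]


def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss for one (positive, negative) document pair.

    The loss is zero once the positive document outscores the negative
    one by at least `margin` (the margin value is an assumption here).
    """
    return max(0.0, margin - pos_score + neg_score)


# Pointwise training: one document per sample plus a label vector.
relevant_label = pointwise_label(True)
irrelevant_label = pointwise_label(False)

# Pairwise training: both documents are scored and no label is needed.
well_separated = pairwise_hinge_loss(pos_score=2.0, neg_score=0.5)
badly_ranked = pairwise_hinge_loss(pos_score=0.2, neg_score=0.5)
```

In the pairwise case the ordering of the pair already encodes the supervision, which is why the label is supplied only for pointwise samples.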

transform_txt(term_list, maxlen)