capreolus.tokenizer.bert

Module Contents

Classes

BertTokenizer: Tokenizer module for BERT-style tokenization (module_name berttokenizer).
class capreolus.tokenizer.bert.BertTokenizer(config=None, provide=None, share_dependency_objects=False, build=True)

Bases: capreolus.tokenizer.Tokenizer

Docstring inherited from capreolus.tokenizer.Tokenizer: Base class for Tokenizer modules. The purpose of a Tokenizer is to tokenize strings of text (e.g., as required by an Extractor).

Modules should provide:
  • a tokenize(strings) method that takes a list of strings and returns their tokenized versions.
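The tokenize contract above can be illustrated with a minimal, self-contained sketch. `WhitespaceTokenizer` is hypothetical: it is not part of Capreolus and does not subclass `capreolus.tokenizer.Tokenizer`. It only shows the expected shape of the method, a list of strings in and one list of tokens per string out.

```python
from typing import List


class WhitespaceTokenizer:
    """Hypothetical tokenizer illustrating the tokenize(strings) contract (not Capreolus code)."""

    def __init__(self, lowercase: bool = True) -> None:
        self.lowercase = lowercase

    def tokenize(self, strings: List[str]) -> List[List[str]]:
        # Return one token list per input string, preserving input order.
        tokenized = []
        for text in strings:
            if self.lowercase:
                text = text.lower()
            # str.split() with no argument splits on runs of whitespace.
            tokenized.append(text.split())
        return tokenized


tokenizer = WhitespaceTokenizer()
print(tokenizer.tokenize(["Hello world", "Capreolus  ranks documents"]))
# [['hello', 'world'], ['capreolus', 'ranks', 'documents']]
```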

module_name = berttokenizer
config_spec
build(self)
convert_tokens_to_ids(self, tokens)
tokenize(self, sentences)
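To illustrate what build, tokenize, and convert_tokens_to_ids plausibly do for a BERT-style tokenizer, here is a toy sketch of WordPiece tokenization (greedy longest-match-first over a subword vocabulary). It assumes, without confirmation from this page, that BertTokenizer works with a BERT WordPiece vocabulary; it does not reflect how Capreolus actually implements these methods. The class `ToyWordPieceTokenizer`, its tiny vocabulary, and the simplified pre-tokenization (lowercasing and whitespace splitting only, no punctuation splitting or maximum word length) are all hypothetical.

```python
from typing import Dict, List


class ToyWordPieceTokenizer:
    """Hypothetical, simplified WordPiece tokenizer (not the Capreolus implementation)."""

    def __init__(self, vocab: List[str], unk_token: str = "[UNK]") -> None:
        self.vocab = list(vocab)
        self.unk_token = unk_token
        self.token_to_id: Dict[str, int] = {}

    def build(self) -> None:
        # Map each vocabulary entry to its position, as a loaded vocab file would.
        self.token_to_id = {tok: i for i, tok in enumerate(self.vocab)}

    def _wordpiece(self, word: str) -> List[str]:
        # Greedy longest-match-first; non-initial pieces carry the "##" prefix.
        pieces, start = [], 0
        while start < len(word):
            end = len(word)
            match = None
            while start < end:
                piece = word[start:end] if start == 0 else "##" + word[start:end]
                if piece in self.token_to_id:
                    match = piece
                    break
                end -= 1
            if match is None:
                return [self.unk_token]  # no piece matched: the whole word is unknown
            pieces.append(match)
            start = end
        return pieces

    def tokenize(self, sentences: List[str]) -> List[List[str]]:
        # One list of subword tokens per input sentence.
        return [
            [piece for word in s.lower().split() for piece in self._wordpiece(word)]
            for s in sentences
        ]

    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]:
        # Tokens missing from the vocabulary fall back to the [UNK] id.
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(tok, unk_id) for tok in tokens]


tok = ToyWordPieceTokenizer(["[UNK]", "the", "rank", "##ing", "##er", "model"])
tok.build()
tokens = tok.tokenize(["The ranking model", "the rankers"])
print(tokens)  # [['the', 'rank', '##ing', 'model'], ['the', '[UNK]']]
print(tok.convert_tokens_to_ids(tokens[0]))  # [1, 2, 3, 5]
```

The sketch maps a word to a single [UNK] when any remaining suffix has no match, so "rankers" becomes [UNK] rather than "rank", "##er" plus a fragment; this follows the usual WordPiece convention.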