concepts.benchmark.common.vocab.Vocab
- class Vocab
Bases: object
A simple vocabulary class.
Methods
- add(word): Add a word to the vocabulary.
- add_word(word): Add a word to the vocabulary.
- check_json_consistency(json_file): Check whether the vocabulary is consistent with a json file.
- dump_json(json_file): Dump the vocabulary to a json file.
- from_dataset(dataset, keys[, extra_words, ...]): Generate a vocabulary from a dataset.
- from_json(json_file): Load a vocabulary from a json file.
- from_list(dataset[, extra_words, single_word]): Generate a vocabulary from a list of strings.
- invmap_sequence(sequence[, proc_be]): Map a sequence of indices to a sequence of words.
- map(word): Map a word to its index.
- map_fields(feed_dict, fields): Map the content in a specified set of fields in a dictionary to indices.
- map_sequence(sequence[, add_be]): Map a sequence of words to a sequence of indices.
- words()
Attributes
A dictionary mapping indices to words.
- __init__(word2idx=None)
Initialize the vocabulary.
- Parameters:
word2idx – a dictionary mapping words to indices. If not specified, the vocabulary will be empty.
- __new__(**kwargs)
- add(word)
Add a word to the vocabulary. Alias of add_word().
- Parameters:
word (str)
- check_json_consistency(json_file)
Check whether the vocabulary is consistent with a json file.
- classmethod from_dataset(dataset, keys, extra_words=None, single_word=False)
Generate a vocabulary from a dataset.
- Parameters:
- Return type:
- classmethod from_list(dataset, extra_words=None, single_word=False)
Generate a vocabulary from a list of strings.
- invmap_sequence(sequence, proc_be=False)
Map a sequence of indices to a sequence of words. If the argument proc_be is True, the begin-of-sentence and end-of-sentence tokens will be removed from the sequence.
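The effect of proc_be can be illustrated with a small stand-alone sketch. It does not use the library and is not its implementation: the helper name invmap_sketch, the token names <BOS> and <EOS>, and the choice to filter the tokens out wherever they occur are all assumptions made for this example.

```python
# Hypothetical begin/end-of-sentence token names; the real vocabulary may differ.
BOS, EOS = "<BOS>", "<EOS>"


def invmap_sketch(idx2word, sequence, proc_be=False):
    """Map indices back to words, optionally dropping the BOS/EOS tokens."""
    words = [idx2word[i] for i in sequence]
    if proc_be:
        # Assumption: remove the boundary tokens wherever they appear.
        words = [w for w in words if w not in (BOS, EOS)]
    return words


idx2word = {0: BOS, 1: EOS, 2: "red", 3: "cube"}
kept = invmap_sketch(idx2word, [0, 2, 3, 1])
stripped = invmap_sketch(idx2word, [0, 2, 3, 1], proc_be=True)
```

With proc_be left at False the boundary tokens stay in the output; with proc_be=True only the content words remain.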
- map(word)
Map a word to its index. If the word is not in the vocabulary, return the index of the unknown token.
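The unknown-token fallback can be sketched with a plain dictionary. This is a minimal stand-alone illustration, not the library's code: the helper map_word and the token name <UNK> are assumptions made for the example.

```python
# Hypothetical unknown-token name; the real vocabulary may use a different one.
UNK = "<UNK>"


def map_word(word2idx, word):
    """Return the index of `word`, falling back to the index of UNK."""
    return word2idx.get(word, word2idx[UNK])


word2idx = {UNK: 0, "red": 1, "cube": 2}
known = map_word(word2idx, "cube")      # word is in the vocabulary
unknown = map_word(word2idx, "sphere")  # word is missing, so UNK is used
```

The fallback means a lookup never raises for an out-of-vocabulary word, provided the unknown token itself is present in the mapping.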
- map_fields(feed_dict, fields)
Map the content in a specified set of fields in a dictionary to indices. The argument fields is a list of keys in the dictionary to map. This function will modify the dictionary in-place.
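The in-place behavior can be sketched as follows. This is a stand-alone illustration under stated assumptions, not the library's implementation: the helper map_fields_sketch and the token name <UNK> are hypothetical, and each mapped field is assumed to hold a list of word tokens.

```python
# Hypothetical unknown-token name used for out-of-vocabulary words.
UNK = "<UNK>"


def map_fields_sketch(word2idx, feed_dict, fields):
    """Replace the word lists stored under `fields` with index lists, in place."""
    for key in fields:
        # Assumption: feed_dict[key] is a list of word tokens.
        feed_dict[key] = [word2idx.get(w, word2idx[UNK]) for w in feed_dict[key]]


word2idx = {UNK: 0, "red": 1, "cube": 2}
feed_dict = {"question": ["red", "cube", "sphere"], "answer": "yes"}
map_fields_sketch(word2idx, feed_dict, ["question"])
```

Only the listed fields are rewritten; other entries, such as "answer" here, are left untouched. Because the dictionary is modified in place, callers that still need the original words should copy it first.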