The Python bindings are distributed as the theopendictionary package on PyPI. They are native extensions built with PyO3.
pip install theopendictionary
Requires Python 3.8.1+.
from theopendictionary import OpenDictionary, compile
xml = """
<dictionary name="My Dictionary">
  <entry term="hello">
    <ety>
      <sense>
        <definition value="A greeting">
          <example value="Hello, world!" />
        </definition>
      </sense>
    </ety>
  </entry>
</dictionary>"""
compiled_bytes = compile(xml)
dictionary = OpenDictionary(compiled_bytes)
results = dictionary.lookup("hello")
print(results[0].entry.term)  # "hello"
print(results[0].entry.etymologies)  # [Etymology(...)]
Compiles an ODXML string into binary .odict data (as a bytes object). This data can be passed to OpenDictionary() or saved to disk.
from theopendictionary import compile
data = compile("<dictionary><entry term='hi'><ety><sense><definition value='greeting'/></sense></ety></entry></dictionary>")
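Because compile() returns a plain bytes object, writing it to disk needs nothing beyond the standard library. The following is a minimal sketch; the helper name `write_compiled` and the file name are illustrative and not part of the package, and placeholder bytes stand in for real compile() output.

```python
import tempfile
from pathlib import Path


def write_compiled(data: bytes, path: str) -> int:
    """Write compiled dictionary bytes to `path` and return the byte count."""
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_bytes(data)


# Placeholder bytes stand in for the result of compile(xml).
payload = b"placeholder for compile() output"

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "build" / "example.odict"
    written = write_compiled(payload, str(out))
    round_trip = out.read_bytes()
```

In real use, `payload` would be the value returned by compile(), and the path would point somewhere permanent rather than a temporary directory.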
The main class for working with compiled dictionaries.
Creates a dictionary from compiled binary data (as returned by compile()) or directly from an XML string.
from theopendictionary import OpenDictionary, compile
# From compiled bytes
data = compile(xml_string)
dictionary = OpenDictionary(data)
# Directly from XML string
dictionary = OpenDictionary(xml_string)
Loads a dictionary from a file path, alias, or remote identifier. This is an async method.
If dictionary is a path to a .odict file, it loads from disk.
If it matches the format org/lang (e.g. wiktionary/eng), it downloads from the remote registry.
from theopendictionary import OpenDictionary, LoadOptions, RemoteLoadOptions
# Load from a local file
dictionary = await OpenDictionary.load("./my-dictionary.odict")
# Load from remote registry
dictionary = await OpenDictionary.load("wiktionary/eng")
# Load with options (assumes LoadOptions accepts a `remote` keyword)
opts = LoadOptions(remote=RemoteLoadOptions(caching=True))
dictionary = await OpenDictionary.load("wiktionary/eng", opts)
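The dispatch rule for load() can be pictured with a small classifier. This is only a sketch of the documented behavior, not the library's code: the function name `classify_source` and the exact pattern used to recognize `org/lang` are assumptions.

```python
import re

# Assumed shape of a remote identifier: two slash-separated name segments,
# e.g. "wiktionary/eng". The real registry may accept other characters.
REMOTE_ID = re.compile(r"^[A-Za-z0-9_-]+/[A-Za-z0-9_-]+$")


def classify_source(dictionary: str) -> str:
    """Return "file", "remote", or "alias" for a load() argument."""
    # Check the file extension first so that paths containing "/" are
    # not mistaken for remote identifiers.
    if dictionary.endswith(".odict"):
        return "file"
    if REMOTE_ID.match(dictionary):
        return "remote"
    # Anything else is treated as an alias in this sketch.
    return "alias"


kinds = [
    classify_source(s)
    for s in ("./my-dictionary.odict", "wiktionary/eng", "mydict")
]
```

Note also that `await` is only valid inside a coroutine, so in a plain script the load() calls above would normally sit inside an `async def` function run with `asyncio.run(...)`.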
| Property | Type | Description |
| --- | --- | --- |
| min_rank | int \| None | The minimum rank value across all entries, or None if no entries have ranks |
| max_rank | int \| None | The maximum rank value across all entries, or None if no entries have ranks |
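The None-when-no-ranks behavior can be illustrated in plain Python. This sketch only mirrors the description in the table; `rank_bounds` is a hypothetical helper, not something the package exports.

```python
from typing import Iterable, Optional, Tuple


def rank_bounds(
    ranks: Iterable[Optional[int]],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (min, max) over the ranks that are set, or (None, None)."""
    # Entries without a rank are ignored rather than treated as zero.
    present = [r for r in ranks if r is not None]
    if not present:
        return None, None
    return min(present), max(present)


bounds = rank_bounds([5, None, 2, 9])
empty = rank_bounds([None, None])
```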
Saves the dictionary to disk as a .odict file. Optionally configure Brotli compression.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | — | Output file path |
| quality | int \| None | None | Brotli compression level (0–11) |
| window_size | int \| None | None | Brotli window size (0–22) |

dictionary.save("output.odict")
dictionary.save("output.odict", quality=11, window_size=22)
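If the compression settings come from user input, it can help to check them against the documented ranges before calling save(). The helper below is a hypothetical client-side check; how the library itself reacts to out-of-range values is not stated here.

```python
from typing import Optional


def check_save_options(
    quality: Optional[int] = None,
    window_size: Optional[int] = None,
) -> None:
    """Raise ValueError if a value falls outside the documented range."""
    # None means "use the default", so only explicit values are checked.
    if quality is not None and not 0 <= quality <= 11:
        raise ValueError(f"quality must be between 0 and 11, got {quality}")
    if window_size is not None and not 0 <= window_size <= 22:
        raise ValueError(
            f"window_size must be between 0 and 22, got {window_size}"
        )


check_save_options(quality=11, window_size=22)  # within range, no error

try:
    check_save_options(quality=12)
    rejected = False
except ValueError:
    rejected = True
```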
Looks up one or more terms by exact match.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str \| list[str] | — | Term(s) to look up |
| split | int \| None | None | Minimum word length for compound splitting |
| follow | bool \| None | None | Follow see cross-references until an entry with etymologies is found |
| insensitive | bool \| None | None | Enable case-insensitive matching |

results = dictionary.lookup("cat")
results = dictionary.lookup(["cat", "dog"])
# Follow cross-references, case-insensitive
results = dictionary.lookup("RaN", follow=True, insensitive=True)
# results[0].entry.term == "run"
# results[0].directed_from.term == "ran"
# Compound word splitting
results = dictionary.lookup("catdog", split=3)
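To make the interaction of follow and insensitive concrete, here is a toy model of the "RaN" example in plain Python. It is a sketch under assumptions, not the library's implementation: the `ENTRIES` table, the `resolve` function, and the choice to try an exact match before the case-insensitive fallback are all illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Toy data: term -> (see target or None, list of etymologies).
ENTRIES: Dict[str, Tuple[Optional[str], List[str]]] = {
    "run": (None, ["to move quickly on foot"]),
    "ran": ("run", []),
}


def resolve(
    query: str, follow: bool = False, insensitive: bool = False
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (term, directed_from), or None when nothing matches."""
    term = query if query in ENTRIES else None
    if term is None and insensitive:
        # Fall back to a case-insensitive comparison against known terms.
        term = next((t for t in ENTRIES if t.lower() == query.lower()), None)
    if term is None:
        return None
    directed_from = None
    seen = {term}
    while follow:
        see, etymologies = ENTRIES[term]
        # Stop once the entry has etymologies, has no usable target,
        # or the chain of references loops back on itself.
        if etymologies or see is None or see not in ENTRIES or see in seen:
            break
        directed_from = directed_from or term  # remember the first entry
        term = see
        seen.add(term)
    return term, directed_from


followed = resolve("RaN", follow=True, insensitive=True)
not_followed = resolve("ran")
```

Here `followed` ends up as `("run", "ran")`, matching the commented results in the example above, while `not_followed` stays on the "ran" entry with no `directed_from`.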
Splits one or more compound terms into component dictionary entries. Unlike lookup(..., split=N), this does not try the whole query first.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str \| list[str] | — | Term(s) to split |
| min_length | int \| None | None | Minimum character length for each segment |
| follow | bool \| None | None | Follow see cross-references |
| insensitive | bool \| None | None | Enable case-insensitive matching |

results = dictionary.split("catdog", min_length=3)
results = dictionary.split("CATdog", min_length=3, insensitive=True)
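The effect of min_length is easiest to see on a toy segmenter. The sketch below uses a greedy longest-match strategy over a small set of terms; that strategy, the `split_compound` name, and the lowercasing used for case-insensitive matching are assumptions, and the library may segment differently.

```python
from typing import List, Set


def split_compound(
    query: str,
    lexicon: Set[str],
    min_length: int = 1,
    insensitive: bool = False,
) -> List[str]:
    """Greedily split `query` into known terms of at least `min_length`."""
    text = query.lower() if insensitive else query
    known = {t.lower() for t in lexicon} if insensitive else lexicon
    parts: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, never shorter than min_length.
        for j in range(len(text), i + min_length - 1, -1):
            if text[i:j] in known:
                parts.append(text[i:j])
                i = j
                break
        else:
            i += 1  # no known term starts here; skip one character
    return parts


lexicon = {"cat", "dog", "run"}
plain = split_compound("catdog", lexicon, min_length=3)
mixed_case = split_compound("CATdog", lexicon, min_length=3, insensitive=True)
too_strict = split_compound("catdog", lexicon, min_length=4)
```

With min_length=4, neither "cat" nor "dog" is long enough to qualify, so the sketch returns no segments.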
Returns all terms defined in the dictionary, sorted alphabetically.
words = dictionary.lexicon()
# ["cat", "dog", "run", ...]
Creates a full-text search index for the dictionary.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| options | IndexOptions \| None | None | Indexing configuration |

from theopendictionary import IndexOptions
dictionary.index(IndexOptions(overwrite=True, memory=50_000_000))
Runs a full-text search across the dictionary. Requires an index (call index() first).
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | — | Search query |
| options | SearchOptions \| None | None | Search configuration |

from theopendictionary import SearchOptions
results = dictionary.search("domesticated mammal")
results = dictionary.search("greeting", SearchOptions(limit=5))
Tokenizes text using NLP-based segmentation and matches each token against the dictionary. Supports Chinese, Japanese, Korean, Thai, Khmer, German, Swedish, and Latin-script languages.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | — | Text to tokenize |
| follow | bool \| int \| None | None | Follow see cross-references. Accepts True/False or a number (nonzero = follow) |
| insensitive | bool \| None | None | Case-insensitive matching |

tokens = dictionary.tokenize("the cat ran")
for token in tokens:
    print(token.lemma, token.entries)
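A common follow-up is to find the tokens that matched nothing in the dictionary. The sketch below relies only on the `lemma` and `entries` properties of each token; the helper name `unmatched_lemmas` is hypothetical, and stand-in objects replace real tokenize() output so the snippet runs without a dictionary.

```python
from types import SimpleNamespace
from typing import Iterable, List


def unmatched_lemmas(tokens: Iterable) -> List[str]:
    """Collect the lemma of every token that matched no dictionary entry."""
    return [token.lemma for token in tokens if not token.entries]


# Stand-in tokens; real ones would come from dictionary.tokenize(...).
tokens = [
    SimpleNamespace(lemma="the", entries=[]),
    SimpleNamespace(lemma="cat", entries=["<LookupResult>"]),
    SimpleNamespace(lemma="ran", entries=["<LookupResult>"]),
]
missing = unmatched_lemmas(tokens)
```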
| Property | Type | Description |
| --- | --- | --- |
| entry | Entry | The matched entry |
| directed_from | Entry \| None | The original entry if a see redirect was followed |

| Property | Type | Description |
| --- | --- | --- |
| term | str | The headword |
| rank | int \| None | Optional frequency rank |
| see_also | str \| None | Cross-reference target term |
| etymologies | list[Etymology] | List of etymologies |
| media | list[MediaURL] | Media URLs |

| Property | Type | Description |
| --- | --- | --- |
| lemma | str | The original token text |
| language | str \| None | Detected language code |
| script | str | Detected script name |
| kind | str | Token kind |
| start | int | Start offset in the original text |
| end | int | End offset in the original text |
| entries | list[LookupResult] | Matched dictionary entries |

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| directory | str \| None | None | Custom directory for the index |
| memory | int \| None | None | Memory arena per thread in bytes (must be >15MB) |
| overwrite | bool \| None | None | Overwrite existing index |

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| directory | str \| None | None | Custom index directory |
| threshold | int \| None | None | Relevance threshold |
| autoindex | bool \| None | None | Auto-create index if missing |
| limit | int \| None | None | Maximum results |

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| min_length | int \| None | None | Minimum character length for each segment |
| follow | bool \| None | None | Follow see cross-references |
| insensitive | bool \| None | None | Enable case-insensitive matching |

| Property | Type | Description |
| --- | --- | --- |
| kind | EnumWrapper \| None | The pronunciation system (e.g. IPA, Pinyin) |
| value | str | The pronunciation notation |
| media | list[MediaURL] | Audio URLs |

| Property | Type | Description |
| --- | --- | --- |
| src | str | URL or path to the media file |
| mime_type | str \| None | MIME type (e.g. audio/mpeg) |
| description | str \| None | Description of the media |