spaCy-based Parser Implementation
This section covers the spaCy-based implementation of the Parser interface, together with the adapters that implement the corresponding text element Protocols (document, span and token).
Bases: Parser
A Parser that performs morphosyntactic analysis on a raw string and returns
an instance of a concrete implementation of the DocumentProtocol.
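The contract described above (raw string in, DocumentProtocol implementation out) can be sketched without spaCy. Everything in this block is illustrative: the method name `parse`, the trimmed-down protocol and the `Toy*` classes are assumptions and are not taken from the limes source.

```python
from dataclasses import dataclass
from typing import Protocol


class DocumentProtocol(Protocol):
    # Hypothetical, trimmed-down version of the document protocol.
    @property
    def text(self) -> str: ...

    def __len__(self) -> int: ...


class Parser(Protocol):
    # Hypothetical interface; the real method name may differ.
    def parse(self, text: str) -> DocumentProtocol: ...


@dataclass(frozen=True)
class ToyDocument:
    # Stand-in document that tokenizes on whitespace only.
    _text: str

    @property
    def text(self) -> str:
        return self._text

    def __len__(self) -> int:
        return len(self._text.split())


class ToyParser:
    # Satisfies Parser structurally, without inheriting from it.
    def parse(self, text: str) -> DocumentProtocol:
        return ToyDocument(text)


def count_tokens(parser: Parser, raw: str) -> int:
    # Client code depends on the protocols only, never on the backend.
    return len(parser.parse(raw))
```

Because callers such as `count_tokens` only see the protocols, a spaCy-backed parser could be swapped for another backend without changing them.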
Source code in src/limes/parsers/spacy_parser.py
Protocol Adapters
Bases: DocumentProtocol
text
property
The text contained in the given document.
noun_chunks
property
All noun chunks contained in the given document; a noun chunk is a span consisting of one or more nouns and - optionally - adjectives and/or auxiliary verbs.
sents
property
All sentences contained in the given document. Each sentence is itself treated as a document.
__iter__()
Iterate over the document, one token at a time. Iteration happens in the direction common in reading the language (e.g. "left to right" in German or English).
__getitem__(i)
__len__()
The length of the document, as counted by the number of tokens (i.e. distinct words or punctuation marks) contained within it.
span(start_idx, end_idx)
Create a span of all tokens between the start index and the end index provided.
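The container behaviour of a document (iteration, indexing, length and span creation) can be illustrated with a small stand-in class. This is a sketch only: `ToyDocument` is hypothetical, tokens are plain strings rather than token adapters, and the description above does not say whether `end_idx` is inclusive, so the sketch assumes an exclusive end index as in Python slices.

```python
from typing import Iterator, List


class ToyDocument:
    """Hypothetical stand-in for a document adapter."""

    def __init__(self, tokens: List[str]) -> None:
        self._tokens = list(tokens)

    def __iter__(self) -> Iterator[str]:
        # Yields tokens in reading order.
        return iter(self._tokens)

    def __getitem__(self, i: int) -> str:
        return self._tokens[i]

    def __len__(self) -> int:
        # Punctuation marks count as tokens too.
        return len(self._tokens)

    def span(self, start_idx: int, end_idx: int) -> List[str]:
        # Assumption: end_idx is exclusive, mirroring Python slices.
        return self._tokens[start_idx:end_idx]


doc = ToyDocument(["The", "cat", "sleeps", "."])
first_two = doc.span(0, 2)
```

Under that assumption, `len(doc)` is 4 (the full stop is counted) and `first_two` holds the first two tokens.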
Bases: SpanProtocol
text
property
The actual text of the tokens contained within the span.
noun_chunks
property
All noun chunks contained in the given span; a noun chunk is another span consisting of one or more nouns and - optionally - adjectives and/or auxiliary verbs.
__iter__()
Iterate over the Span, one token at a time. Iteration happens in the direction common in reading the language (e.g. "left to right" in German or English).
Bases: TokenProtocol
A token in a text. The token contains both its string literal (i.e. the word) and its metadata (e.g. its part of speech, its position in the sentence etc.). The TokenAdapter wraps around a spaCy Token object to only expose attributes that adhere to the
text
property
The text of the token.
morph
property
The morphological analysis of the given token.
pos_
property
The part-of-speech tag of the given token (based on the Universal POS tagset).
fine_pos
property
The part-of-speech tag of the given token based on a language-specific tagset - if available for the given language.
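The wrapping idea behind the token adapter, and the difference between `pos_` and `fine_pos`, might look roughly like the sketch below. It is not the actual implementation: `TokenAdapterSketch` is a hypothetical name, a `SimpleNamespace` stands in for a spaCy Token so the block runs without spaCy, and mapping `fine_pos` onto spaCy's fine-grained `tag_` attribute is an assumption.

```python
from types import SimpleNamespace


class TokenAdapterSketch:
    """Hypothetical adapter exposing a restricted attribute set."""

    def __init__(self, token) -> None:
        self._token = token  # the wrapped, spaCy-like token

    @property
    def text(self) -> str:
        return self._token.text

    @property
    def pos_(self) -> str:
        # Coarse tag from the Universal POS tagset.
        return self._token.pos_

    @property
    def fine_pos(self) -> str:
        # Assumption: backed by the wrapped token's `tag_` attribute,
        # which holds the fine-grained tag on a spaCy Token.
        return self._token.tag_


# Stand-in for a spaCy Token of the German noun "Katze".
fake_token = SimpleNamespace(text="Katze", pos_="NOUN", tag_="NN")
token = TokenAdapterSketch(fake_token)
```

Only the attributes defined on the adapter are reachable from outside; `tag_` itself stays hidden behind `fine_pos`.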
dep_
property
The dependency tag of the given token.
lemma_
property
The lemma of the word contained within the given token.
i
property
The index of the token within the context of the document that contains it, where a document can be considered as a list of tokens.
head
property
The syntactic parent of the given token.
is_punct
property
Whether or not the given token is punctuation.
children
property
All tokens that constitute direct descendants (i.e. immediate syntactic children) of the given token in the dependency tree of the document.
ancestors
property
All tokens that constitute ancestors of the given token in the dependency tree of the document.
subtree
property
The given token as well as all its descendants in the dependency tree of the given document.
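How `head`, `children`, `ancestors` and `subtree` relate to each other can be shown on a hand-built dependency tree for "The cat sleeps". The `ToyToken` class is hypothetical and makes three assumptions: `children` holds only direct dependents (as in spaCy), the root has `head` set to `None` (spaCy instead makes the root its own head), and `subtree` is returned in document order.

```python
from typing import Iterator, List, Optional


class ToyToken:
    """Hypothetical token with just enough state for tree navigation."""

    def __init__(self, text: str, i: int) -> None:
        self.text = text
        self.i = i  # index within the document
        self.head: Optional["ToyToken"] = None  # None marks the root here
        self.children: List["ToyToken"] = []  # direct dependents only

    @property
    def ancestors(self) -> Iterator["ToyToken"]:
        # Walk up the chain of heads until the root is passed.
        node = self.head
        while node is not None:
            yield node
            node = node.head

    @property
    def subtree(self) -> List["ToyToken"]:
        # The token itself plus all descendants, in document order.
        found = [self]
        for child in self.children:
            found.extend(child.subtree)
        return sorted(found, key=lambda t: t.i)


the, cat, sleeps = ToyToken("The", 0), ToyToken("cat", 1), ToyToken("sleeps", 2)
# "The" depends on "cat", which in turn depends on the root "sleeps".
for child, parent in ((the, cat), (cat, sleeps)):
    child.head = parent
    parent.children.append(child)

ancestors_of_the = [t.text for t in the.ancestors]
subtree_of_cat = [t.text for t in cat.subtree]
```

Here "The" is in the subtree of "sleeps" but is not one of its children, which is the distinction between the `children` and `subtree` properties.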