BS ISO 24611:2012, Language resource management - Morpho-syntactic annotation framework (MAF)
3.5
grapheme
minimal unit in a written language
EXAMPLE Letter, pictogram, ideogram, numeral, punctuation.
3.6
inflection
modification or marking of a lexeme that reflects its morpho-syntactic properties
3.7
inflected form
form that a word can take when used in a sentence or a phrase
Note 1 to entry: An inflected form of a word is associated with a combination of morphological features, such as grammatical number and case.
3.8
lemma
lemmatised form
conventional form chosen to represent a lexeme
Note 1 to entry: In European languages, the lemma is usually the singular if there is a variation in number, the masculine form if there is a variation in gender, and the infinitive for all verbs. In some languages, certain nouns are defective in the singular form; in these cases, the plural is chosen. For verbs in Arabic, the lemma is usually deemed to be the third person singular with the accomplished aspect.
3.9
lexeme
morpheme generally associated with a set of word-forms sharing a common meaning
3.10
lexical entry
container for managing a set of word-forms and possibly one or more meanings to describe a lexeme
3.11
lexicon
resource comprising a collection of lexical entries for a language
3.12
morpheme
smallest linguistic unit that carries a meaning in a discourse, but which cannot be divided into smaller meaningful units
Note 1 to entry: A morpheme is either grammatical (grammeme) or lexical (lexeme).
3.13
morphological feature
morpho-syntactic feature
feature induced from the inflected form of a word
Note 1 to entry: The ISOCat data category registry provides a comprehensive list of values for European languages.
EXAMPLE grammaticalGender.
3.14
morphology
description of the structure and formation of word-forms
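The terms defined above can be read as a small data model. The following Python sketch is illustrative only and is not part of this International Standard. The class names `InflectedForm`, `LexicalEntry` and `Lexicon`, the method `lemmas_for`, and the feature name `grammaticalNumber` are assumptions made for the example; only `grammaticalGender` is taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InflectedForm:
    """A form a word can take in a sentence, with its morphological features."""
    written_form: str
    features: tuple  # pairs of (feature name, value); names are illustrative


@dataclass
class LexicalEntry:
    """Container for the set of word-forms that describe one lexeme."""
    lemma: str  # conventional form chosen to represent the lexeme
    forms: list = field(default_factory=list)


@dataclass
class Lexicon:
    """Collection of lexical entries for one language."""
    language: str
    entries: dict = field(default_factory=dict)

    def add(self, entry):
        self.entries[entry.lemma] = entry

    def lemmas_for(self, written_form):
        """Return the lemmas of all entries that list the given written form."""
        return sorted(
            lemma
            for lemma, entry in self.entries.items()
            if any(f.written_form == written_form for f in entry.forms)
        )


def features(gender, number):
    """Build the feature pairs for one inflected form (hypothetical helper)."""
    return (("grammaticalGender", gender), ("grammaticalNumber", number))


# French adjective "petit": the lemma is the masculine singular form.
petit = LexicalEntry(
    lemma="petit",
    forms=[
        InflectedForm("petit", features("masculine", "singular")),
        InflectedForm("petite", features("feminine", "singular")),
        InflectedForm("petits", features("masculine", "plural")),
        InflectedForm("petites", features("feminine", "plural")),
    ],
)

lexicon = Lexicon(language="fr")
lexicon.add(petit)
```

Looking up an inflected form such as `lexicon.lemmas_for("petites")` then leads back to the lemma that represents its lexeme.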
6 Word-forms as linguistic units
The segments identified by
A token may be associated with more than one word-form and, conversely, a word-form may represent more than one token. The @tokens attribute is used to associate a
For instance, in French, the morphological agglutination of 'auquel' ('to which') may have several representations, depending on the granularity of the tokenisation.
When the priority is put on coarse granularity, the character sequence 'auquel' is not decomposed and corresponds to a single token, but there are two different word-forms associated with this single token. It is encoded as shown in Figure 13.
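Figure 13 is not reproduced here. As a rough illustration of the token and word-form relation only, and not of the MAF XML serialisation, the coarse-granularity case can be sketched in Python. The class names, the identifiers `t1` to `t4`, the lemma values and the second example ('pomme de terre') are assumptions made for this sketch; the `tokens` field merely plays the role of the @tokens attribute mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A segment of the source text produced by tokenisation."""
    id: str
    text: str


@dataclass(frozen=True)
class WordForm:
    """A linguistic unit; `tokens` lists the identifiers of the tokens it covers."""
    lemma: str
    tokens: tuple


# Coarse granularity: 'auquel' is kept as a single token ...
tokens = [
    Token("t1", "auquel"),
    Token("t2", "pomme"),
    Token("t3", "de"),
    Token("t4", "terre"),
]

word_forms = [
    # ... but two word-forms refer to that one token.
    WordForm(lemma="à", tokens=("t1",)),
    WordForm(lemma="lequel", tokens=("t1",)),
    # Conversely, one word-form may cover several tokens.
    WordForm(lemma="pomme de terre", tokens=("t2", "t3", "t4")),
]


def lemmas_for_token(token_id, forms):
    """Return the lemma of every word-form that references the given token."""
    return [wf.lemma for wf in forms if token_id in wf.tokens]


def text_of(word_form, all_tokens):
    """Rebuild the surface text covered by a word-form from its tokens."""
    by_id = {t.id: t.text for t in all_tokens}
    return " ".join(by_id[token_id] for token_id in word_form.tokens)
```

Keeping the reference on the word-form side, rather than on the token, lets the same tokenisation support both one-to-many and many-to-one cases without duplicating tokens.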
7 Morpho-syntactic content
7.1 General
This clause explains how to attach morpho-syntactic content to word-forms, how to define reusable tagsets in order to provide compact notations by means of tags, and how to control the validity of such tags.
The previous section explained how to enrich a document with morpho-syntactic annotations, but did not define the content of these annotations. What set of features and feature values should be used to express such content, and how should it be interpreted?
Such a set, which is usually referred to as a tagset, specifies the range of possible annotations. The diversity of approaches and languages makes the definition of any single universally applicable tagset almost impossible. In this context, this International Standard provides mechanisms for defining tagsets by using the ISO data category registry (DCR) and FSR.
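The idea of a tagset, in which a compact tag stands for a set of features and values whose validity can be checked, can be sketched as follows. This Python fragment is an informal illustration under stated assumptions and does not show the DCR or FSR mechanisms themselves: the tags `NCMS` and `NCFP`, the feature names `partOfSpeech` and `grammaticalNumber`, the listed values, and the function names are all hypothetical.

```python
# Hypothetical tagset: each compact tag expands to feature-value pairs.
TAGSET = {
    "NCMS": {
        "partOfSpeech": "commonNoun",
        "grammaticalGender": "masculine",
        "grammaticalNumber": "singular",
    },
    "NCFP": {
        "partOfSpeech": "commonNoun",
        "grammaticalGender": "feminine",
        "grammaticalNumber": "plural",
    },
}

# Permitted values per feature, standing in for definitions that a
# data category registry would supply (values here are assumptions).
ALLOWED_VALUES = {
    "partOfSpeech": {"commonNoun", "verb"},
    "grammaticalGender": {"masculine", "feminine"},
    "grammaticalNumber": {"singular", "plural"},
}


def expand(tag):
    """Return a copy of the feature-value pairs declared for a tag.

    Raises KeyError if the tag is not declared in the tagset.
    """
    return dict(TAGSET[tag])


def invalid_features(features):
    """List the (feature, value) pairs not permitted by ALLOWED_VALUES."""
    return [
        (name, value)
        for name, value in features.items()
        if value not in ALLOWED_VALUES.get(name, set())
    ]
```

An annotation then only needs to carry the short tag, while a tool can call `expand` to recover the full features and `invalid_features` to reject values outside the declared range.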