BS ISO 24611:2012, Language resource management - Morpho-syntactic annotation framework (MAF)
3.5
grapheme
minimal unit in a written language
EXAMPLE Letter, pictogram, ideogram, numeral, punctuation.
3.6
inflection
modification or marking of a lexeme that reflects its morpho-syntactic properties
3.7
inflected form
form that a word can take when used in a sentence or a phrase
Note 1 to entry: An inflected form of a word is associated with a combination of morphological features, such as grammatical number and case.
3.8
lemma
lemmatised form
conventional form chosen to represent a lexeme
Note 1 to entry: In European languages, the lemma is usually the singular if there is a variation in number, the masculine form if there is a variation in gender, and the infinitive for all verbs. In some languages, certain nouns are defective in the singular form; in these cases, the plural is chosen. For verbs in Arabic, the lemma is usually deemed to be the third person singular with the accomplished aspect.
3.9
lexeme
morpheme generally associated with a set of word-forms sharing a common meaning
3.10
lexical entry
container for managing a set of word-forms and possibly one or more meanings to describe a lexeme
3.11
lexicon
resource comprising a collection of lexical entries for a language
3.12
morpheme
smallest linguistic unit that carries a meaning in a discourse, but which cannot be divided into smaller meaningful units
Note 1 to entry: A morpheme is either grammatical (grammeme) or lexical (lexeme).
3.13
morphological feature
morpho-syntactic feature
feature induced from the inflected form of a word
Note 1 to entry: The ISOCat data category registry provides a comprehensive list of values for European languages.
EXAMPLE grammaticalGender.
3.14
morphology
description of the structure and formation of word-forms
6 Word-forms as linguistic units
The segments identified by <wordForm> elements correspond to word-forms. A word-form may be associated with a lexical entry in a lexicon using the @entry attribute (see 6.3). It may also be characterised by qualifying part-of-speech information that expresses morphological and grammatical properties as a feature structure (see 7.2). Information about the lemma and inflected forms may also be provided, using the @lemma and @form attributes. In particular, the @form attribute on the <wordForm> element is useful when the inflected form attached to the word-form does not coincide with the content of the element, for instance because of spelling corrections.
A token may be associated with more than one word-form and, conversely, a word-form may represent more than one token. The @tokens attribute is used to associate a <wordForm> element with one or more <token> elements. The association is represented by means of a pointer, typically supplying the value of the @xml:id attribute on one or more <token> elements.
For instance, in French, the morphological agglutination of ‘auquel’ (‘to which’) may have several representations, depending on the granularity of the tokenisation.
When the priority is put on coarse granularity, the character sequence ‘auquel’ is not decomposed and corresponds to a single token, but there are two different word-forms associated with this single token. It is encoded as shown in Figure 13.
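The coarse-granularity case can be illustrated with a minimal Python sketch. It is not a reproduction of Figure 13. The element and attribute names (token, wordForm, @tokens, @lemma, @xml:id) are taken from the text above, while the wrapper element name "annotation", the identifier "t1" and the lemma values "à" and "lequel" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Clark notation for the xml:id attribute, as ElementTree expects it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_auquel_annotation():
    """Build one token 'auquel' referenced by two word-forms."""
    # "annotation" is a hypothetical wrapper element, not taken from the standard.
    root = ET.Element("annotation")
    token = ET.SubElement(root, "token", {XML_ID: "t1"})
    token.text = "auquel"
    # Both word-forms point at the same token through @tokens.
    # The lemma values are illustrative assumptions.
    ET.SubElement(root, "wordForm", {"lemma": "à", "tokens": "t1"})
    ET.SubElement(root, "wordForm", {"lemma": "lequel", "tokens": "t1"})
    return root


def word_forms_for_token(root, token_id):
    """Return every wordForm whose @tokens pointer list contains token_id."""
    return [
        wf for wf in root.iter("wordForm")
        if token_id in wf.get("tokens", "").split()
    ]


if __name__ == "__main__":
    doc = build_auquel_annotation()
    for wf in word_forms_for_token(doc, "t1"):
        print(wf.get("lemma"))
    print(ET.tostring(doc, encoding="unicode"))
```

Because @tokens is treated here as a whitespace-separated list of identifiers, the same lookup would also cover the converse case in which one word-form spans several tokens.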
7 Morpho-syntactic content
7.1 General
This clause explains how to attach morpho-syntactic content to word-forms, how to define reusable tagsets in order to provide compact notations by means of tags, and how to control the validity of such tags.
The previous section explained how to enrich a document with morpho-syntactic annotations, but did not define the content of these annotations. What set of features and feature values should be used to express such content, and how should it be interpreted?
Such a set, which is usually referred to as a tagset, specifies the range of possible annotations. The diversity of approaches and languages makes the definition of any single universally applicable tagset almost impossible. In this context, this International Standard provides mechanisms for defining tagsets by using the ISO data category registry (DCR) and FSR.
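The idea of a tagset as a compact notation whose validity can be checked can be sketched in Python as follows. This is only an illustration under stated assumptions: it does not use the DCR or the FSR serialisation, and the tag names, the feature names other than grammaticalGender, and all permitted values are hypothetical.

```python
# Hypothetical inventory of features and permitted values, standing in
# for what a data category registry would supply.
ALLOWED_VALUES = {
    "partOfSpeech": {"noun", "verb"},
    "grammaticalGender": {"masculine", "feminine"},
    "grammaticalNumber": {"singular", "plural"},
}

# Hypothetical tagset: each compact tag abbreviates a set of feature values.
TAGSET = {
    "NMS": {"partOfSpeech": "noun", "grammaticalGender": "masculine",
            "grammaticalNumber": "singular"},
    "NFP": {"partOfSpeech": "noun", "grammaticalGender": "feminine",
            "grammaticalNumber": "plural"},
    "VS": {"partOfSpeech": "verb", "grammaticalNumber": "singular"},
}


def validate_tagset(tagset, allowed_values):
    """Return a list of problems; an empty list means the tagset is valid."""
    problems = []
    for tag, features in tagset.items():
        for name, value in features.items():
            if name not in allowed_values:
                problems.append(f"{tag}: unknown feature {name!r}")
            elif value not in allowed_values[name]:
                problems.append(f"{tag}: invalid value {value!r} for {name!r}")
    return problems


def expand_tag(tag, tagset=TAGSET):
    """Expand a compact tag into a copy of its feature/value mapping."""
    if tag not in tagset:
        raise KeyError(f"tag {tag!r} is not defined in the tagset")
    return dict(tagset[tag])


if __name__ == "__main__":
    print(validate_tagset(TAGSET, ALLOWED_VALUES))
    print(expand_tag("NFP"))
```

Keeping the permitted values separate from the tag definitions mirrors the division described above: the inventory of features is shared, while each project defines its own tags on top of it.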
