Poetology: Problems of Constructing a Thesaurus and Verse Text Specification

This is a brief version of the report presented by the authors at the seminar “Modeling and Analysis of Information Systems” (Yaroslavl, May 17, 2017). It considers the connection between the tasks of automatic thesaurus construction and verse text specification.


INTRODUCTION
Poetology is a group of disciplines focused on the comprehensive theoretical and historical study of poetry (as a way of organizing speech), its texts, and verse (as a segment of a text written in verse form). The task of studying a text in verse as an object of poetology is its specification. On the one hand, specification involves defining the parameters of specification, represented by a certain matrix (or form) of distinctive attributes (features, properties, characteristics) of a particular class of objects of study. On the other hand, it is a procedure of object identification, which consists in determining the particular values of these attributes for a specific object.
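The two sides of specification (a form of attributes for a class of objects, and identification of attribute values for one object) can be sketched as follows. This is a minimal illustration under assumptions: the `SpecificationForm` class, the two attribute extractors, and the convention that stanzas are separated by a blank line are all hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SpecificationForm:
    """Hypothetical form: attribute name -> function extracting its value."""
    attributes: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def identify(self, obj: Any) -> Dict[str, Any]:
        """Identification: compute this object's value for every attribute."""
        return {name: extract(obj) for name, extract in self.attributes.items()}


def line_count(text: str) -> int:
    """Number of non-empty lines in a verse text."""
    return sum(1 for line in text.splitlines() if line.strip())


def stanza_count(text: str) -> int:
    """Number of stanzas, assuming stanzas are separated by one blank line."""
    return sum(1 for block in text.split("\n\n") if block.strip())


form = SpecificationForm({"lines": line_count, "stanzas": stanza_count})
poem = "line one\nline two\n\nline three\nline four"
print(form.identify(poem))  # {'lines': 4, 'stanzas': 2}
```

The form is defined once per class of objects, while `identify` is applied to each object, which mirrors the distinction between the matrix of attributes and the identification procedure.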
Research of this kind presupposes a metadescription of the verse text in terms of a thesaurus of the poetology domain, which, in turn, requires either an existing thesaurus or, as in our case, its automatic construction. A thesaurus is a collection of terms (words or word combinations), each of which denotes a domain-specific concept and has a definition, together with the relations between the terms, which constitute their specification [3]. Thus, both verses and thesaurus terms are objects of research, and so is the establishment of their specifications (metadescriptions). The integrated solution of these tasks characterizes poetology as an information analytical system [1,2].
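The definition of a thesaurus above (terms with definitions, plus relations that constitute their specification) maps naturally onto a small data structure. The sketch below is a hypothetical illustration: the class names, the `is_a` relation type, and the sample definitions are assumptions for demonstration, not the structure of the authors' thesaurus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Term:
    name: str
    definition: str


@dataclass
class Thesaurus:
    terms: Dict[str, Term] = field(default_factory=dict)
    # Typed relations stored as (source term, relation type, target term).
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_term(self, name: str, definition: str) -> None:
        self.terms[name] = Term(name, definition)

    def relate(self, source: str, relation: str, target: str) -> None:
        # Relations may only connect terms already present in the thesaurus.
        if source not in self.terms or target not in self.terms:
            raise KeyError(f"unknown term in ({source!r}, {relation!r}, {target!r})")
        self.relations.add((source, relation, target))

    def specification(self, name: str) -> List[Tuple[str, str]]:
        """The term's outgoing relations, sorted for stable output."""
        return sorted((rel, tgt) for src, rel, tgt in self.relations if src == name)


th = Thesaurus()
th.add_term("meter", "regular alternation of stressed and unstressed syllables in verse")
th.add_term("iamb", "binary meter with ictuses on the even-numbered syllables")
th.relate("iamb", "is_a", "meter")
print(th.specification("iamb"))  # [('is_a', 'meter')]
```

Storing relations as triples keeps the relation vocabulary open, so new relation types can be added without changing the schema.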
In previous studies [5–9,11–14], the poetology thesaurus was set up as an instrument for the linguo-poetic and historical-literary metadescription of poetry and text in verse. The original terminological list of the thesaurus consisted of 1,544 words and phrases. A hierarchical index was created for the upper levels of terms, and the lower levels were clustered. The resulting rubrication of poetology thesaurus terms by sub-domain is as follows:
(1) Poetry.
(2) Stylistics.
The main attributes of these subtopics can be analyzed automatically to set the corresponding specification of the verse text. In general, the purpose of the information analytical system is to establish three levels of text specification:
(C1) Metric-stanza specification of a particular work of poetry (verse text).
(C2) Poetic and stylistic specification of a particular author's poetry, based on the metric-stanza specifications of their works.
(C3) Historical and literary specification of literary trends, schools, and periods, based on the statistical similarity of the poetic and stylistic specifications of various authors.
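The passage does not say which statistical similarity is used at level C3, so the following is only a sketch under explicit assumptions: each work's C1 specification is reduced to a single meter label, an author's C2 specification is the relative frequency of those labels, and C3 compares authors by cosine similarity. The function names and the toy author data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def author_profile(work_meters: List[str]) -> Dict[str, float]:
    """C2 sketch: relative frequency of each meter across an author's works."""
    counts = Counter(work_meters)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {meter: n / total for meter, n in counts.items()}


def cosine_similarity(p: Dict[str, float], q: Dict[str, float]) -> float:
    """C3 sketch: cosine similarity of two sparse frequency profiles."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical C1 labels (one meter per work) for three made-up authors.
a = author_profile(["iamb", "iamb", "trochee"])
b = author_profile(["iamb", "trochee"])
c = author_profile(["dactyl"])
print(round(cosine_similarity(a, b), 3))  # 0.949
print(cosine_similarity(a, c))            # 0.0
```

Cosine similarity ignores the absolute number of works, which suits comparing authors with oeuvres of very different sizes; other measures (for example, divergences between distributions) would fit the same pipeline.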
Automatic analysis requires distinguishing between a work of poetry as an aesthetic entity and a text in verse as an object of study. Consequently, we should distinguish between the linguo-poetic tagging of the verse and the specification of the text in verse. Verse tagging mainly reflects the numerical (quantitative and ordinal) characteristics of the text in verse, whereas specification is its metadescription in terms of the thesaurus.
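To make the distinction between tagging (numbers) and specification (thesaurus terms) concrete, here is a minimal sketch. It assumes a line is already given as a string of `1` (ictus) and `0` (unstressed syllable) marks, so it skips the hard step of placing ictuses from actual word stresses. The `LineTag` fields, the `tag_line` and `specify` functions, and the meter table are hypothetical illustrations, not the representation used in [4,10]; the table itself encodes the standard syllabo-tonic meters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineTag:
    """Numerical tagging of one verse line (hypothetical encoding)."""
    ictus_count: int
    anacrusis: int        # unstressed syllables before the first ictus
    intervals: List[int]  # unstressed syllables between consecutive ictuses
    clausula: int         # unstressed syllables after the last ictus


def tag_line(pattern: str) -> LineTag:
    """Tag a line given as '1' (ictus) / '0' (unstressed) per syllable."""
    positions = [i for i, s in enumerate(pattern) if s == "1"]
    if not positions:
        raise ValueError("a verse line needs at least one ictus")
    intervals = [b - a - 1 for a, b in zip(positions, positions[1:])]
    return LineTag(
        ictus_count=len(positions),
        anacrusis=positions[0],
        intervals=intervals,
        clausula=len(pattern) - positions[-1] - 1,
    )


# Syllabo-tonic meters keyed by (constant interval, anacrusis).
METERS = {
    (1, 1): "iamb",
    (1, 0): "trochee",
    (2, 0): "dactyl",
    (2, 1): "amphibrach",
    (2, 2): "anapest",
}


def specify(tag: LineTag) -> str:
    """Map the numerical tag to a thesaurus-style term."""
    if len(set(tag.intervals)) != 1:
        return "irregular"  # no intervals, or intervals of varying length
    meter = METERS.get((tag.intervals[0], tag.anacrusis))
    if meter is None:
        return "irregular"
    return f"{meter} with {tag.ictus_count} ictuses"


print(specify(tag_line("01010101")))  # iamb with 4 ictuses
```

The clausula is recorded by the tag but does not affect the meter label, which illustrates how tagging can carry more numerical detail than a given level of specification uses.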
Verse tagging (line tagging) makes it possible to create a formal linguistic syllabic representation of the verse [4,10]. This representation records the ictuses (metrically stressed syllables) and their number, the unstressed prepositional and postpositional syllables between ictuses, and the lengths of the corresponding subchains of unstressed syllables.
New authors of poetry who contribute something novel to the process of versification appear all the time. At the same time, new researchers emerge in the study of poetry and discover new phenomena in poetology. Both processes add new terms to the thesaurus and change the specifications of existing terms, which in turn updates the specification matrix of the text in verse. Consequently, poetology as an information analytical system cannot and should not be finite.