<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">mais</journal-id><journal-title-group><journal-title xml:lang="ru">Моделирование и анализ информационных систем</journal-title><trans-title-group xml:lang="en"><trans-title>Modeling and Analysis of Information Systems</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1818-1015</issn><issn pub-type="epub">2313-5417</issn><publisher><publisher-name>Yaroslavl State University</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.18255/1818-1015-2021-3-280-291</article-id><article-id custom-type="elpub" pub-id-type="custom">mais-1528</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Theory of Data</subject></subj-group></article-categories><title-group><article-title>Классификация текстов по жанрам на основе ритмических характеристик</article-title><trans-title-group xml:lang="en"><trans-title>Text Classification by Genre Based on Rhythm Features</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1742-3240</contrib-id><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Лагутина</surname><given-names>Ксения Владимировна</given-names></name><name name-style="western" xml:lang="en"><surname>Lagutina</surname><given-names>Ksenia Vladimirovna</given-names></name></name-alternatives><bio xml:lang="ru"><p>Аспирант.</p><p>Ул. Советская, д. 
14, Ярославль, 150003</p></bio><bio xml:lang="en"><p>Postgraduate student.</p><p>14 Sovetskaya str., Yaroslavl 150003</p></bio><email xlink:type="simple">lagutinakv@mail.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-6137-8643</contrib-id><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Лагутина</surname><given-names>Надежда Станиславовна</given-names></name><name name-style="western" xml:lang="en"><surname>Lagutina</surname><given-names>Nadezhda Stanislavovna</given-names></name></name-alternatives><bio xml:lang="ru"><p>Канд. физико-математических наук, доцент.</p><p>Ул. Советская, д. 14, Ярославль, 150003</p></bio><bio xml:lang="en"><p>PhD, associate professor.</p><p>14 Sovetskaya str., Yaroslavl 150003</p></bio><email xlink:type="simple">lagutinans@rambler.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-6600-2971</contrib-id><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Бойчук</surname><given-names>Елена Игоревна</given-names></name><name name-style="western" xml:lang="en"><surname>Boychuk</surname><given-names>Elena Igorevna</given-names></name></name-alternatives><bio xml:lang="ru"><p>Доктор философский наук, доцент.</p><p>Ул. Республиканская, д. 108/1, Ярославль, 150000</p></bio><bio xml:lang="en"><p>PhD, associate professor.</p><p>108/1 Respublikanskaya str., Yaroslavl 150000</p></bio><email xlink:type="simple">elena-boychouk@rambler.ru</email><xref ref-type="aff" rid="aff-2"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru">Ярославский государственный университет им. П.Г. Демидова<country>Россия</country></aff><aff xml:lang="en">P.G. 
Demidov Yaroslavl State University<country>Russian Federation</country></aff></aff-alternatives><aff-alternatives id="aff-2"><aff xml:lang="ru">Ярославский государственный педагогический университет им. К.Д. Ушинского<country>Россия</country></aff><aff xml:lang="en">Yaroslavl State Pedagogical University named after K.D. Ushinsky<country>Russian Federation</country></aff></aff-alternatives><pub-date pub-type="collection"><year>2021</year></pub-date><pub-date pub-type="epub"><day>12</day><month>10</month><year>2021</year></pub-date><volume>28</volume><issue>3</issue><fpage>280</fpage><lpage>291</lpage><permissions><copyright-statement>Copyright &#x00A9; Лагутина К.В., Лагутина Н.С., Бойчук Е.И., 2021</copyright-statement><copyright-year>2021</copyright-year><copyright-holder xml:lang="ru">Лагутина К.В., Лагутина Н.С., Бойчук Е.И.</copyright-holder><copyright-holder xml:lang="en">Lagutina K.V., Lagutina N.S., Boychuk E.I.</copyright-holder><license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://www.mais-journal.ru/jour/article/view/1528">https://www.mais-journal.ru/jour/article/view/1528</self-uri><abstract><p>Статья посвящена анализу ритма текстов различных жанров: художественных романов, рекламы, научных статей, отзывов, твитов и политических статей. Авторы выделили в текстах лексико-грамматические средства: анафору, эпифору, диакопу, апозиопезу и т. п., которые являются маркерами ритма текста. На их основе были подсчитаны статистические характеристики, описывающие количественно и структурно данные ритмические средства.</p><p>Полученная модель текста была визуализирована для статистического анализа с помощью диаграмм размаха и тепловых карт, которые показали отличия в ритме текстов различных жанров.
Диаграммы размаха показали, что практически все жанры отличаются друг от друга по общей плотности ритмических характеристик. Тепловые карты показали различную структуру ритма у жанров.</p><p>Далее ритмические характеристики успешно использовались для классификации текстов по шести жанрам. Высокое качество классификации показало, что ритмические характеристики являются хорошим маркером для большинства жанров, в особенности для художественной литературы. Эксперименты проводились с помощью программного инструмента ProseRhythmDetector для русского и английского языков. Корпуса текстов содержат по 300 текстов для каждого языка.</p></abstract><trans-abstract xml:lang="en"><p>The article analyzes the rhythm of texts of different genres: fiction novels, advertisements, scientific articles, reviews, tweets, and political articles. The authors identified lexico-grammatical figures in the texts (anaphora, epiphora, diacope, aposiopesis, etc.) that serve as markers of text rhythm. From these figures, statistical features were calculated that describe them quantitatively and structurally.</p><p>The resulting text model was visualized for statistical analysis using boxplots and heat maps, which showed differences in the rhythm of texts of different genres. The boxplots showed that almost all genres differ from each other in the overall density of rhythm features. The heat maps showed different rhythm patterns across genres. Next, the rhythm features were successfully used to classify texts into six genres. The classification was carried out in two ways: a binary classification for each genre, separating that genre from the remaining genres, and a multi-class classification of the text corpus into all six genres at once. Two text corpora, in English and Russian, were used for the experiments. Each corpus contains 100 texts each of fiction novels, scientific articles, advertisements, and tweets, and 50 each of reviews and political articles, i.e.
a total of 500 texts. The high quality of the classification with neural networks showed that rhythm features are a good marker for most genres, especially fiction. The experiments were carried out using the ProseRhythmDetector software tool for the Russian and English languages. The text corpora contain 300 texts for each language.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>стилометрия</kwd><kwd>обработка естественного языка</kwd><kwd>ритмические характеристики</kwd><kwd>жанры</kwd><kwd>классификация текстов</kwd></kwd-group><kwd-group xml:lang="en"><kwd>stylometry</kwd><kwd>natural language processing</kwd><kwd>rhythm features</kwd><kwd>genres</kwd><kwd>text classification</kwd></kwd-group><funding-group xml:lang="ru"><funding-statement>Исследование выполнено при финансовой поддержке РФФИ в рамках научного проекта № 19-07-00243</funding-statement></funding-group><funding-group xml:lang="en"><funding-statement>The reported study was funded by RFBR, project number 19-07-00243</funding-statement></funding-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">J. Worsham and J. Kalita, “Genre identification and the compositional effect of genre in literature”, in Proceedings of the 27th international conference on computational linguistics, 2018, pp. 1963–1973.</mixed-citation><mixed-citation xml:lang="en">J. Worsham and J. Kalita, “Genre identification and the compositional effect of genre in literature”, in Proceedings of the 27th international conference on computational linguistics, 2018, pp. 1963–1973.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">M. N. Melissourgou and K. T. Frantzi, “Genre identification based on SFL principles: The representation of text types and genres in English language teaching material”, Corpus Pragmatics, vol. 1, no. 4, pp.
373–392, 2017.</mixed-citation><mixed-citation xml:lang="en">M. N. Melissourgou and K. T. Frantzi, “Genre identification based on SFL principles: The representation of text types and genres in English language teaching material”, Corpus Pragmatics, vol. 1, no. 4, pp. 373–392, 2017.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">L. A. Kochetova and V. V. Popov, “Research of Axiological Dominants in Press Release Genre based on Automatic Extraction of Key Words from Corpus”, Nauchnyi dialog, no. 6, 2019, In Russian.</mixed-citation><mixed-citation xml:lang="en">L. A. Kochetova and V. V. Popov, “Research of Axiological Dominants in Press Release Genre based on Automatic Extraction of Key Words from Corpus”, Nauchnyi dialog, no. 6, 2019, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">S. E. Murphy, “Shakespeare and his contemporaries: Designing a genre classification scheme for Early English Books Online 1560-1640”, ICAME Journal, pp. 59–82, 2019.</mixed-citation><mixed-citation xml:lang="en">S. E. Murphy, “Shakespeare and his contemporaries: Designing a genre classification scheme for Early English Books Online 1560-1640”, ICAME Journal, pp. 59–82, 2019.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">R. Malhotra and A. Sharma, “Quantitative evaluation of web metrics for automatic genre classification of web pages”, International Journal of System Assurance Engineering and Management, vol. 8, no. 2, pp. 1567–1579, 2017.</mixed-citation><mixed-citation xml:lang="en">R. Malhotra and A. Sharma, “Quantitative evaluation of web metrics for automatic genre classification of web pages”, International Journal of System Assurance Engineering and Management, vol. 8, no. 2, pp. 
1567–1579, 2017.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">D. DEJICA, “Understanding Technical and Scientific Translation: A Genre-based Approach”, Scientific Bulletin of the Politehnica University of Timisoara. Transactions on Modern Languages/Buletinul Stiintific al Universitatii Politehnica din Timisoara. Seria Limbi Moderne, vol. 19, no. 1, pp. 56–66, 2020.</mixed-citation><mixed-citation xml:lang="en">D. DEJICA, “Understanding Technical and Scientific Translation: A Genre-based Approach”, Scientific Bulletin of the Politehnica University of Timisoara. Transactions on Modern Languages/Buletinul Stiintific al Universitatii Politehnica din Timisoara. Seria Limbi Moderne, vol. 19, no. 1, pp. 56–66, 2020.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">V. Thakur and A. C. Patel, “An Improved Dictionary Based Genre Classification Based on Title and Abstract of E-book Using Machine Learning Algorithms”, in Proceedings of Second International Conference on Computing, Communications, and Cyber-Security, Springer, 2021, pp. 323–337.</mixed-citation><mixed-citation xml:lang="en">V. Thakur and A. C. Patel, “An Improved Dictionary Based Genre Classification Based on Title and Abstract of E-book Using Machine Learning Algorithms”, in Proceedings of Second International Conference on Computing, Communications, and Cyber-Security, Springer, 2021, pp. 323–337.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">A. Cimino, M. Wieling, F. Dell’Orletta, S. Montemagni, and G. Venturi, “Identifying predictive features for textual genre classification: the key role of syntax”, Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017, pp. 107–112, 2017.</mixed-citation><mixed-citation xml:lang="en">A. 
Cimino, M. Wieling, F. Dell’Orletta, S. Montemagni, and G. Venturi, “Identifying predictive features for textual genre classification: the key role of syntax”, Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017, pp. 107–112, 2017.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">K. Lagutina, A. Poletaev, N. Lagutina, E. Boychuk, and I. Paramonov, “Automatic extraction of rhythm figures and analysis of their dynamics in prose of 19th-21st centuries”, Proceedings of the 26th Conference of Open Innovations Association FRUCT, pp. 247–255, 2020.</mixed-citation><mixed-citation xml:lang="en">K. Lagutina, A. Poletaev, N. Lagutina, E. Boychuk, and I. Paramonov, “Automatic extraction of rhythm figures and analysis of their dynamics in prose of 19th-21st centuries”, Proceedings of the 26th Conference of Open Innovations Association FRUCT, pp. 247–255, 2020.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">K. Lagutina, N. Lagutina, E. Boychuk, V. Larionov, and I. Paramonov, “Authorship verification of literary texts with rhythm features”, Proceedings of the 28th Conference of Open Innovations Association FRUCT, pp. 240–251, 2021.</mixed-citation><mixed-citation xml:lang="en">K. Lagutina, N. Lagutina, E. Boychuk, V. Larionov, and I. Paramonov, “Authorship verification of literary texts with rhythm features”, Proceedings of the 28th Conference of Open Innovations Association FRUCT, pp. 240–251, 2021.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">A. Onan, “An ensemble scheme based on language function analysis and feature engineering for text genre classification”, Journal of Information Science, vol. 44, no. 1, pp. 28–47, 2018.</mixed-citation><mixed-citation xml:lang="en">A. 
Onan, “An ensemble scheme based on language function analysis and feature engineering for text genre classification”, Journal of Information Science, vol. 44, no. 1, pp. 28–47, 2018.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">A. M. El-Halees, “Arabic Text Genre Classification”, Journal of Engineering Research and Technology, vol. 4, no. 3, pp. 105–109, 2017.</mixed-citation><mixed-citation xml:lang="en">A. M. El-Halees, “Arabic Text Genre Classification”, Journal of Engineering Research and Technology, vol. 4, no. 3, pp. 105–109, 2017.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">I. A. Batraeva, A. D. Nartsev, and A. S. Lezgyan, “Using the analysis of semantic proximity of words in solving the problem of determining the genre of texts within deep learning”, Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie vychislitelnaja tehnika i informatika, no. 50, pp. 14–22, 2020, In Russian.</mixed-citation><mixed-citation xml:lang="en">I. A. Batraeva, A. D. Nartsev, and A. S. Lezgyan, “Using the analysis of semantic proximity of words in solving the problem of determining the genre of texts within deep learning”, Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie vychislitelnaja tehnika i informatika, no. 50, pp. 14–22, 2020, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">V. B. Barahnin, O. Y. Kozhemyakina, E. V. Rychkova, I. S. Pastushkov, and Y. S. Borzilova, “Izvlechenie leksicheskih i metroritmicheskih priznakov, harakternyh dlya zhanra i stilya i ih kombinacij v processe avtomatizirovannoj obrabotki tekstov na russkom yazyke”, Sovremennye informacionnye tekhnologii i IT-obrazovanie, vol. 14, no. 4, pp. 888–895, 2018, In Russian.</mixed-citation><mixed-citation xml:lang="en">V. B. 
Barahnin, O. Y. Kozhemyakina, E. V. Rychkova, I. S. Pastushkov, and Y. S. Borzilova, “Izvlechenie leksicheskih i metroritmicheskih priznakov, harakternyh dlya zhanra i stilya i ih kombinacij v processe avtomatizirovannoj obrabotki tekstov na russkom yazyke”, Sovremennye informacionnye tekhnologii i IT-obrazovanie, vol. 14, no. 4, pp. 888–895, 2018, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">O. A. Mitrofanova and A. D. Moskvina, “On the Role of Prepositional Statistics for Genre Identification of Russian texts”, International Journal of Open Information Technologies, vol. 8, no. 11, pp. 91–96, 2020, In Russian.</mixed-citation><mixed-citation xml:lang="en">O. A. Mitrofanova and A. D. Moskvina, “On the Role of Prepositional Statistics for Genre Identification of Russian texts”, International Journal of Open Information Technologies, vol. 8, no. 11, pp. 91–96, 2020, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">L. G. Gorbich and A. A. Zhivoderov, “Using statistical indexes to distinguish between scientific and popular science texts on the example of the works of A. E. Fersman”, Software &amp; Systems, vol. 33, no. 4, pp. 720–725, 2020, In Russian.</mixed-citation><mixed-citation xml:lang="en">L. G. Gorbich and A. A. Zhivoderov, “Using statistical indexes to distinguish between scientific and popular science texts on the example of the works of A. E. Fersman”, Software &amp; Systems, vol. 33, no. 4, pp. 720–725, 2020, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">A. R. Dubovik, “Automatic text style identification in terms of statistical parameters”, Komp’yuternaya lingvistika i vychislitel’nye ontologii, no. 1, pp. 
29–45, 2017, In Russian.</mixed-citation><mixed-citation xml:lang="en">A. R. Dubovik, “Automatic text style identification in terms of statistical parameters”, Komp’yuternaya lingvistika i vychislitel’nye ontologii, no. 1, pp. 29–45, 2017, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">A. Y. Antonova, E. S. Klyshinskij, and E. V. YAgunova, “Opredelenie stilevyh i zhanrovyh harakteristik kollekcij tekstov na osnove chasterechnoj sochetaemosti”, Otkrytye sistemy, vol. 3, pp. 80–85, 2011, In Russian.</mixed-citation><mixed-citation xml:lang="en">A. Y. Antonova, E. S. Klyshinskij, and E. V. YAgunova, “Opredelenie stilevyh i zhanrovyh harakteristik kollekcij tekstov na osnove chasterechnoj sochetaemosti”, Otkrytye sistemy, vol. 3, pp. 80–85, 2011, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks”, Information processing &amp; management, vol. 45, no. 4, pp. 427–437, 2009.</mixed-citation><mixed-citation xml:lang="en">M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks”, Information processing &amp; management, vol. 45, no. 4, pp. 427–437, 2009.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">L. Kozlova, “Sravnitel’naya tipologiya anglijskogo i russkogo yazykov”, Barnaul: AltGPU, no. 20019, p. 180, 2019, In Russian.</mixed-citation><mixed-citation xml:lang="en">L. Kozlova, “Sravnitel’naya tipologiya anglijskogo i russkogo yazykov”, Barnaul: AltGPU, no. 20019, p. 180, 2019, In Russian.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">A. Wierzbicka, The semantics of grammar. 
John Benjamins Publishing, 1988, vol. 18, p. 617.</mixed-citation><mixed-citation xml:lang="en">A. Wierzbicka, The semantics of grammar. John Benjamins Publishing, 1988, vol. 18, p. 617.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
