Extracting named entities from Russian-language documents with different expressiveness of structure

Maria D. Averina; Olga A. Levanova

doi:10.18255/1818-1015-2023-4-382-393

EDN: NVTLNK

Abstract

This work addresses named entity recognition in Russian-language texts using a conditional random field (CRF) model. Two data sets were considered: well-structured refinancing documents and semi-structured court records. The model was tested with various sets of text features and CRF parameters (optimization algorithms). Averaged over all entities, the best F-measure was 0.99 for structured documents and 0.86 for semi-structured ones.
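
As a rough illustration of what "sets of text features" for a CRF tagger can look like, the following minimal sketch builds one feature dictionary per token. The feature names, the choice of features, and the example sentence are assumptions made for illustration; they are not taken from the paper. Dictionaries of this shape are the usual input to CRF toolkits such as sklearn-crfsuite, where the optimization algorithm is selected through the `algorithm` parameter of `sklearn_crfsuite.CRF`.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i.

    The feature set is hypothetical: surface form, a 3-character suffix,
    simple casing/digit flags, and the neighbouring tokens.
    """
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "suffix3": tok[-3:].lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "is_digit": tok.isdigit(),
    }
    # Context features: previous token, or a beginning-of-sentence marker.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    # Context features: next token, or an end-of-sentence marker.
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Illustrative tokenized sentence (not from the paper's data sets).
example = ["Истец", "Иванов", "Иван", "обратился", "в", "суд"]
X = sentence_features(example)
```

Varying which keys are included in the dictionary is one simple way to compare feature sets, while the optimizer can be varied independently in the CRF toolkit.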

Keywords

named entity extraction, CRF

MSC2020: 68T50

About the Authors

Maria D. Averina

P.G. Demidov Yaroslavl State University
Russian Federation

Olga A. Levanova

P.G. Demidov Yaroslavl State University
Russian Federation

For citations:

Averina M.D., Levanova O.A. Extracting named entities from Russian-language documents with different expressiveness of structure. Modeling and Analysis of Information Systems. 2023;30(4):382-393. (In Russ.) https://doi.org/10.18255/1818-1015-2023-4-382-393. EDN: NVTLNK

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)
