Modern Approaches to Detect and Classify Comment Toxicity Using Neural Networks

Sergey V. Morzhov

doi:10.18255/1818-1015-2020-1-48-61

Abstract

The growing popularity of online platforms that let users communicate with each other, share opinions about various events, and leave comments has boosted the development of natural language processing algorithms. The tens of millions of messages published every day by the users of a single social network have to be analyzed in real time for moderation, to prevent the spread of illegal or offensive information, threats, and other types of toxic comments. Such a volume of information can be processed quickly enough only automatically, which is why there is both a need and a way to teach computers to “understand” text written by humans. This is a non-trivial task even if “understand” here means only “classify”. The rapid evolution of machine learning has led to the widespread adoption of new algorithms, and many tasks that for years were considered almost impossible to solve are now solved quite successfully with deep learning. This article considers algorithms built with deep learning and neural networks that can successfully detect and classify toxic comments. In addition, it presents the results of the developed algorithms, as well as the results of an ensemble of all the considered algorithms, on a large training set collected and labeled by Google and Jigsaw.
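The abstract reports results for an ensemble of all the considered models but does not say how their outputs are combined or scored. The sketch below is a minimal illustration under stated assumptions: it averages per-label probabilities across models (one common choice, not necessarily the author's) and scores the result with the mean column-wise ROC AUC, the metric used by the Kaggle Toxic Comment Classification Challenge. The six label names follow that challenge's dataset; the function names and the toy model outputs are hypothetical, and the pairwise AUC is written for clarity rather than speed.

```python
from statistics import fmean

# Label columns of the Kaggle "Toxic Comment Classification Challenge" data.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def average_ensemble(model_outputs):
    """Average the per-label probabilities of several models (assumed blending rule).

    model_outputs[m][i][j] is model m's probability that comment i has label j.
    Returns rows of averaged probabilities with the same shape as one model's output.
    """
    n_rows = len(model_outputs[0])
    n_labels = len(model_outputs[0][0])
    return [
        [fmean(out[i][j] for out in model_outputs) for j in range(n_labels)]
        for i in range(n_rows)
    ]


def roc_auc(y_true, scores):
    """ROC AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This O(P * N) pairwise form is for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each label column needs positive and negative examples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(y_true_rows, score_rows):
    """Compute ROC AUC separately for each label column, then average the results."""
    n_labels = len(y_true_rows[0])
    aucs = [
        roc_auc([row[j] for row in y_true_rows], [row[j] for row in score_rows])
        for j in range(n_labels)
    ]
    return fmean(aucs)


if __name__ == "__main__":
    # Toy data: 4 comments x 6 labels; every column has a positive and a negative.
    y_true = [
        [1, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 1, 0, 1, 0, 1],
    ]
    # Hypothetical outputs of two models (for example, a CNN and a GRU).
    model_a = [[0.75] * 6, [0.25] * 6, [0.75] * 6, [0.25] * 6]
    model_b = [[0.25] * 6, [0.25] * 6, [0.75] * 6, [0.75] * 6]
    blended = average_ensemble([model_a, model_b])
    print(mean_columnwise_auc(y_true, blended))  # prints 0.875
```

Averaging is only the simplest blending rule; weighted averages or a second-level model trained on the base models' predictions are common alternatives, and the abstract does not indicate which one was used.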

Keywords

toxicity, Natural Language Processing, NLP, deep learning, word embedding, GloVe, FastText, recurrent neural networks, convolutional neural networks, CNN, LSTM, GRU

MSC2020: 68T50

About the Author

Sergey V. Morzhov

P. G. Demidov Yaroslavl State University
Russian Federation
postgraduate student

For citations:

Morzhov S.V. Modern Approaches to Detect and Classify Comment Toxicity Using Neural Networks. Modeling and Analysis of Information Systems. 2020;27(1):48-61. (In Russ.) https://doi.org/10.18255/1818-1015-2020-1-48-61

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)
