Preview

Modeling and Analysis of Information Systems

Advanced search

Disambiguation of Regular Expressions with Backreferences via Term Rewriting

https://doi.org/10.18255/1818-1015-2024-4-426-445

Abstract

In this paper we focus on regular expressions with acyclic backreferences and treat them as a semiring satisfying certain theorems of Kleene algebra. Using these theorems as term rewriting rules, we introduce an algorithm for memory disambiguation of regular expressions. Furthermore, we demonstrate that the class of regexes with acyclic backreferences is closed under language reversal, in contrast to the generic backref-regexes, and provide the reversal algorithm, based on the disambiguation procedure. The results of our experiments revealed that, in certain cases, the matching time was significantly reduced when using the reversed expressions compared to the initial ones.

About the Authors

Daria N. Ismagilova
Bauman Moscow State Technical University
Russian Federation


Antonina N. Nepeivoda
Program Systems Institute of RAS
Russian Federation


References

1. K. R. Beesley, “Kleene, a Free and Open-Source Language for Finite-State Programming,” in Finite-State Methods and Natural Language Processing, 2012, pp. 50–54.

2. W. Gelade and F. Neven, “Succinctness of the Complement and Intersection of Regular Expressions,” ACM Transactions on Computational Logic, vol. 13, no. 1, 2012, doi: 10.1145/2071368.2071372.

3. V. M. Glushkov, “The abstract theory of automata,” Russian Mathematical Surveys, vol. 16, no. 5, pp. 3–62, 1961.

4. D. Angluin, “Finding patterns common to a set of strings,” Journal of Computer and System Sciences, vol. 21, no. 1, pp. 46–62, 1980, doi: 10.1016/0022-0000(80)90041-0.

5. M. L. Schmid, “Characterising REGEX languages by regular languages equipped with factor-referencing,” Information and Computation, vol. 249, pp. 1–17, 2016, doi: 10.1016/j.ic.2016.02.003.

6. J. Goodman, “Semiring Parsing,” Computational Linguistics, vol. 25, pp. 573–605, 1999.

7. D. Kozen, “A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events,” Information and Computation, vol. 110, no. 2, pp. 366–390, 1994, doi: 10.1006/inco.1994.1037.

8. A. Bruggemann-Klein and D. Wood, “One-Unambiguous Regular Languages,” Information and Computation, vol. 140, no. 2, pp. 229–253, 1998, doi: 10.1006/inco.1997.2688.

9. M. Berglund and B. van der Merwe, “Re-examining regular expressions with backreferences,” Theoretical Computer Science, vol. 940, pp. 66–80, 2023, doi: 10.1016/j.tcs.2022.10.041.

10. D. D. Freydenberger and M. L. Schmid, “Deterministic regular expressions with back-references,” Journal of Computer and System Sciences, vol. 105, pp. 1–39, 2019, doi: 10.1016/j.jcss.2019.04.001.

11. Y. Li et al., “ReDoSHunter: A Combined Static and Dynamic Approach for Regular Expression DoS Detection,” in 30th USENIX Security Symposium, 2021, pp. 3847–3864.

12. C. Campeanu, K. Salomaa, and S. Yu, “A Formal Study Of Practical Regular Expressions,” International Journal of Foundations of Computer Science, vol. 14, pp. 1007–1018, 2003, doi: 10.1142/S012905410300214X.

13. N. Chida and T. Terauchi, “On Lookaheads in Regular Expressions with Backreferences,” IEICE Transactions on Information and Systems, vol. E106--D, no. 5, pp. 959–975, 2023, doi: 10.1587/transinf.2022EDP7098.

14. D. Reidenbach and M. L. Schmid, “Patterns with bounded treewidth,” Information and Computation, vol. 239, pp. 87–99, 2014, doi: 10.1016/j.ic.2014.08.010.

15. M. L. Schmid, “Inside the Class of REGEX Languages,” in Proceedings of the 16th International Conference on Developments in Language Theory, 2012, pp. 73–84, doi: 10.1007/978-3-642-31653-1_8.

16. A. Br"uggemann-Klein, “Regular Expressions into Finite Automata,” Theoretical Computer Science, vol. 120, no. 2, pp. 197–213, 1993, doi: 10.1016/0304-3975(93)90287-4.

17. S. Kahrs and C. Runciman, “Simplifying regular expressions further,” Journal of Symbolic Computation, vol. 109, pp. 124–143, 2022, doi: 10.1016/j.jsc.2021.08.003.

18. J. McClurg, M. Claver, J. Garner, J. Vossen, J. Schmerge, and M. E. Belviranli, “Optimizing Regular Expressions via Rewrite-Guided Synthesis,” in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2023, pp. 426–438, doi: 10.1145/3559009.3569664.

19. Y. Li et al., “RegexScalpel: Regular Expression Denial of Service (ReDoS) Defense by Localize-and-Fix,” in 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 4183–4200.

20. N. Chida and T. Terauchi, “Repairing DoS Vulnerability of Real-World Regexes,” in Proceedings of the IEEE Symposium on Security and Privacy (SP), 2022, pp. 2060–2077, doi: 10.1109/SP46214.2022.9833597.

21. Y. Uezato, “Regular Expressions with Backreferences and Lookaheads Capture NLOG,” in 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), 2024, pp. 155:1–155:20, doi: 10.4230/LIPIcs.ICALP.2024.155.


Review

For citations:


Ismagilova D.N., Nepeivoda A.N. Disambiguation of Regular Expressions with Backreferences via Term Rewriting. Modeling and Analysis of Information Systems. 2024;31(4):426-445. https://doi.org/10.18255/1818-1015-2024-4-426-445

Views: 148


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)