Modeling and Analysis of Information Systems

Dataflow-Driven Crowdsourcing: Relational Models and Algorithms

https://doi.org/10.18255/1818-1015-2016-2-195-210

Abstract

Microtask crowdsourcing has recently become a popular approach to various data mining problems. Crowdsourcing workflows for such problems consist of several data processing stages, which need a consistent representation for the work to be reproducible. This paper addresses the reproducibility and formalization of the microtask crowdsourcing process. It proposes a computational model for microtask crowdsourcing based on an extended relational model and a dataflow computational model. The proposed collaborative dataflow computational model processes the input data sources by executing annotation stages and automatic synchronization stages simultaneously. Data processing stages and the connections between them are expressed as collaborative computation workflows represented by loosely connected directed acyclic graphs. A synchronous algorithm for executing such workflows is described. The computational model is evaluated on two tasks from computational linguistics: refining concept lexicalizations in electronic thesauri and establishing hierarchical relations between such concepts. The “Add–Remove–Confirm” procedure adds missing lexemes to the concepts and removes extraneous ones. The “Genus–Species–Match” procedure establishes “is-a” relations between concepts that are provided with the corresponding word pairs. Experiments involving both volunteers from popular online social networks and paid workers from crowdsourcing marketplaces confirm the applicability of these procedures for enhancing lexical resources.
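The idea of a workflow as a directed acyclic graph of stages can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the stages run one at a time in topological order rather than simultaneously, the human annotation stage is replaced by canned answers, and all names (run_workflow, annotate, aggregate, pairs) are hypothetical.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(stages, dependencies, sources):
    """Run every stage once, in an order that respects the DAG edges.

    stages:       name -> callable taking a dict {upstream name: output}
    dependencies: name -> set of upstream names (stages or input sources)
    sources:      name -> input data available before any stage runs
    """
    results = dict(sources)
    # TopologicalSorter raises CycleError if the graph is not acyclic.
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:  # an input source, nothing to compute
            continue
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = stages[name](inputs)
    return results


def annotate(inputs):
    # Stand-in for a human annotation stage (assumption: canned answers
    # to the question "is the first word a kind of the second?").
    canned = {
        ("cat", "animal"): ["yes", "yes", "no"],
        ("cat", "car"): ["no", "no", "yes"],
    }
    return {pair: canned[pair] for pair in inputs["pairs"]}


def aggregate(inputs):
    # Automatic stage: majority vote over the answers for each word pair.
    return {
        pair: Counter(answers).most_common(1)[0][0]
        for pair, answers in inputs["annotate"].items()
    }


stages = {"annotate": annotate, "aggregate": aggregate}
dependencies = {"annotate": {"pairs"}, "aggregate": {"annotate"}}
sources = {"pairs": [("cat", "animal"), ("cat", "car")]}

results = run_workflow(stages, dependencies, sources)
print(results["aggregate"])
```

Keeping the graph as a plain mapping from each stage to its upstream names makes the workflow description itself a piece of data that can be stored and replayed, which is the kind of reproducibility the abstract is concerned with.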

About the Author

D. A. Ustalov
N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences, Sofia Kovalevskaya str., 16, Yekaterinburg, 620990, Russia
Russian Federation

graduate student




For citations:


Ustalov D.A. Dataflow-Driven Crowdsourcing: Relational Models and Algorithms. Modeling and Analysis of Information Systems. 2016;23(2):195-210. (In Russ.) https://doi.org/10.18255/1818-1015-2016-2-195-210


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)