Modeling and Analysis of Information Systems

Application of the Fuzzy Classification for Linear Hybrid Prediction Methods

https://doi.org/10.18255/1818-1015-2013-3-108-120

Abstract

The paper addresses forecasting for samples with real-valued attributes. The goal is to estimate how generated binary attributes affect forecasting accuracy for linear regression and for hybrid methods based on clustering. The initial attribute set is expanded with binary attributes derived from it by fuzzy classification. The forecasting methods are then compared on the initial samples and on the expanded ones. Tests on three different databases showed that the generated attributes significantly increased forecasting accuracy for classical linear regression. Accuracy also increased for linear regression with k-means clustering. For linear regression with clustering based on the k-NN method, a slight decrease was registered, and the result for double linear regression was unstable.
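The attribute-expansion idea can be illustrated with a minimal sketch. The abstract does not specify the fuzzy classification scheme, so everything below is an assumption made for illustration: the helper names `fuzzy_binary_attributes` and `fit_mse`, the triangular membership functions with evenly spaced centers, the choice of three classes per attribute, and the synthetic data. It also compares training error only, not forecasting accuracy on held-out samples as in the paper.

```python
import numpy as np


def fuzzy_binary_attributes(X, n_classes=3):
    """Hypothetical helper: for each real-valued attribute, emit one-hot
    indicators of the fuzzy class with the highest membership."""
    X = np.asarray(X, dtype=float)
    columns = []
    for j in range(X.shape[1]):
        x = X[:, j]
        # Assumed scheme: class centers evenly spaced over the attribute range.
        centers = np.linspace(x.min(), x.max(), n_classes)
        width = (x.max() - x.min()) / (n_classes - 1) or 1.0
        # Triangular membership of every value in every fuzzy class.
        membership = np.clip(
            1.0 - np.abs(x[:, None] - centers[None, :]) / width, 0.0, 1.0
        )
        winners = membership.argmax(axis=1)
        columns.append(np.eye(n_classes)[winners])
    return np.hstack(columns)


def fit_mse(features, y):
    """Ordinary least squares with an intercept; returns the training MSE."""
    A = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))


# Synthetic data (an assumption): the target is nonlinear in the first attribute.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.abs(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=200)

# Expand the initial attributes with the generated binary ones.
X_expanded = np.hstack([X, fuzzy_binary_attributes(X)])

mse_original = fit_mse(X, y)
mse_expanded = fit_mse(X_expanded, y)
print(f"training MSE, initial attributes:  {mse_original:.4f}")
print(f"training MSE, expanded attributes: {mse_expanded:.4f}")
```

Because the expanded set contains the initial attributes, the least-squares training error cannot be higher than on the initial set. Whether the gain carries over to unseen samples is the question the paper tests empirically.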

About the Authors

A. S. Taskin
Siberian Federal University
Russian Federation

Postgraduate student

79, Svobodny Prospect, Krasnoyarsk, 660041, Russia



E. M. Mirkes
Siberian Federal University
Russian Federation

Doctor of Technical Sciences, Professor

79, Svobodny Prospect, Krasnoyarsk, 660041, Russia



N. Y. Sirotinina
Siberian Federal University
Russian Federation

Candidate of Technical Sciences, Associate Professor

79, Svobodny Prospect, Krasnoyarsk, 660041, Russia




For citations:


Taskin A.S., Mirkes E.M., Sirotinina N.Y. Application of the Fuzzy Classification for Linear Hybrid Prediction Methods. Modeling and Analysis of Information Systems. 2013;20(3):108-120. (In Russ.) https://doi.org/10.18255/1818-1015-2013-3-108-120


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1818-1015 (Print)
ISSN 2313-5417 (Online)