Tuesday, 22 December 2015

Rudnev S.G.1,2, Nikolaev D.V.1,3, Korostylev K.A.1,3, Starunova O.A.1,3, Schelykalina S.P.1,3, Eryukova T.A.1,3, Kolesnikov V.A.1,3, Starodubov V.I.1
1Federal Research Institute for Health Organization and Informatics of Ministry of Health of the Russian Federation, Moscow
2Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow
3“Medas” Scientific Research Centre, Moscow
Contacts: Sergey G. Rudnev

This article was prepared within the framework of the Russian Science Foundation project ‘Development of methodology for population screening of physical growth and development, state of health and nutrition. Assessment of epidemiological risks’ (grant no. 14-15-01085).

Abstract. Significance. The national network of Health Centers is a complex distributed system that has continuously generated mass preventive screening data since 2010. Manual analysis of the quality and reliability of the data collected in Health Centers is not feasible, while the official reporting of Health Centers may not, in some cases, reflect the real situation. It is therefore necessary to develop automated algorithms for quality control and for improving the reliability of preventive screening data.

The purpose of the study was to implement elements of big data technology for analyzing the results of preventive screening in Health Centers, using bioimpedance measurement data as an example, to retrospectively evaluate the quality and reliability of the data, and to explore their applicability for epidemiological monitoring.

Materials and methods. Bioimpedance data from the Federal Information Resource of Health Centers database were combined with bioimpedance measurement data submitted in response to letter #14-1/10/2-3200 of the Ministry of Health of the Russian Federation, dated October 24, 2012, and with data submitted in response to letter #7-5/434 of the Federal Research Institute for Health Organization and Informatics of the Russian Health Ministry, dated July 2, 2015. The initial number of records in the bioimpedance database was 2.35 million. The data were obtained from 320 Health Centers in 62 Federal Subjects and eight Federal Districts of the Russian Federation.

Results. In half of the Health Centers, the proportion of correct bioimpedance data was 93.5% or higher. However, the proportion of incorrect data grew steadily, reaching 28.1% in 2014. The incorrect data consisted mainly of fraudulent records (50.6%) and measurement errors (48.5%). After removal of incorrect data and repeated measurements, 1.64 million records remained in the database. Based on parameters of the body mass index distributions calculated using the GAMLSS software package, the prevalence of overweight, obesity and wasting among males in the study group was estimated according to the WHO criteria. The age-standardized obesity prevalence in males was 11.0% at ages 5-17 years and 17.7% at ages 18-85 years.
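The abstract does not state which standardization method or reference population was used. As a minimal illustration only, the sketch below assumes direct standardization, in which age-specific prevalences are averaged with the weights of a chosen standard population. The function name, the age groups, the weights and all counts are hypothetical and are not taken from the study.

```python
from typing import Mapping


def age_standardized_prevalence(
    cases: Mapping[str, int],
    examined: Mapping[str, int],
    standard_weights: Mapping[str, float],
) -> float:
    """Directly standardized prevalence: a weighted mean of age-specific
    prevalences, weighted by the standard population's age structure."""
    total_weight = sum(standard_weights.values())
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")
    weighted_sum = 0.0
    for group, weight in standard_weights.items():
        n = examined[group]
        if n == 0:
            raise ValueError(f"no examined subjects in age group {group!r}")
        weighted_sum += weight * cases[group] / n
    return weighted_sum / total_weight


# Hypothetical counts and weights, for illustration only (not study data).
cases = {"18-39": 50, "40-59": 120, "60-85": 90}
examined = {"18-39": 1000, "40-59": 800, "60-85": 300}
weights = {"18-39": 0.5, "40-59": 0.3, "60-85": 0.2}

crude = sum(cases.values()) / sum(examined.values())
standardized = age_standardized_prevalence(cases, examined, weights)
print(f"crude: {crude:.4f}, age-standardized: {standardized:.4f}")
```

In this made-up example the crude and standardized values differ because the examined sample is younger or older than the standard population in some age groups; standardization removes that effect so that estimates from samples with different age structures can be compared.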

Discussion. The use of big data technology made it possible to evaluate the quality of the data and to identify incorrect bioimpedance measurement data. This offers an opportunity to make management decisions to correct the identified violations. Comparison with independent anthropometric data indicates that the Health Centers’ data are representative for children and adolescents.

Conclusions. 1) Our mass analysis of bioimpedance data showed that the quality and accuracy of the raw preventive screening data in Health Centers were gradually decreasing. This suggests that the existing control measures are ineffective.

2) Effective quality management of Health Centers’ activities is possible through the use of big data technology.

3) Data from the Federal Information Resource of Health Centers may be suitable for epidemiological monitoring once the selection criteria have been applied.

Keywords. Health Centers; Federal Information Resource of Health Centers; preventive screening; big data; fraud detection and removal; data compression; data standardization.


