Methodology and new data sources including big data - Census Methodology in Estonia - unece

Page created by Jaime Patterson
 
Methodology and new data sources including big data
Census Methodology in Estonia
By Diana Beltadze and Ene-Margit Tiit from Statistics Estonia
ROAD-MAP FOR REGISTER-BASED CENSUS IN ESTONIA

Preparatory work 2009–2020

I STEP 2009–2015 (test in 2014)
1. Redesigning administrative data for census purposes
2. Data acquisition test from registers (contracts, description of the data set, checks on data quality and the acquisition procedure)
3. Formation of census characteristics, programming of the necessary rules
4. Registers' data quality assessment

II STEP 2016–2017 (I pilot in 2016)
1. Development of register-based census methodology
2. Setting up structures and procedures according to the needs of harmonized population statistics
3. Development of a software system for the implementation of big data and its estimation methods
4. Data quality assessment after the 1st pilot

III STEP 2018–2020 (II pilot in 2019)
1. Improvement of register-based census methodology
2. Specification, design and development of software and standards to support statistical production
3. Improvement of data quality by register holders
4. Data assessment after 2nd pilot in 2019

2021–2022: census activities

20.09.2018
General prerequisites
• Availability of unique identifiers
  - Unique address codes
    - Including geo-coordinates
  - Business ID number of enterprise
  - Personal ID numbers
• National legislation
  - Access to administrative data
  - Right to link microdata
    - Free-of-charge delivery
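The role of unique identifiers can be illustrated with a minimal sketch of linking microdata from two registers on a personal ID. The register extracts, field names and ID values below are hypothetical and serve only to show the join; they do not describe Statistics Estonia's actual data model.

```python
# Hypothetical register extracts keyed by a personal ID (all values invented).
population_register = {
    "P001": {"address_code": "A-1001"},
    "P002": {"address_code": "A-2002"},
}
employment_register = {
    "P001": {"employer_business_id": "B-501"},
    "P003": {"employer_business_id": "B-502"},
}


def link_by_personal_id(left, right):
    """Full outer join of two register extracts on the personal ID.

    Each linked record keeps the fields of both sources plus a list that
    says which register(s) the person was found in.
    """
    linked = {}
    for pid in sorted(set(left) | set(right)):
        record = {}
        record.update(left.get(pid, {}))
        record.update(right.get(pid, {}))
        record["sources"] = [
            name for name, reg in (("left", left), ("right", right)) if pid in reg
        ]
        linked[pid] = record
    return linked


linked = link_by_personal_id(population_register, employment_register)
```

Persons found in only one register (here P002 and P003) are kept rather than dropped, since coverage differences between registers are themselves informative.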

Action plan for register-based census rehearsal in 2019
The main preparatory work for Pilot census II involves the following
tasks:
A. Data acquisition from registers (contracts, description of the
    data set, checks on data quality and the acquisition procedure);
B. Formation of census characteristics, programming of the
    necessary rules;
C. Testing the system as a whole.

Data extraction from 24 registers

• New universal X-Road service and X-Gate service
• Automatic data transactions for 17 registers

Results from 2016 pilot census

• The first pilot census of REGREL was successfully completed. The pilot census showed that a register-based population and housing census is feasible and that the preparations for the census have been purposeful.
• In total, 38 census characteristics were formed. The following had the best quality ratings: sex, age, legal marital status, country of birth, country of citizenship, total population, ethnicity, native language, location of dwelling, and living quarters by type of building.

Problematic formation of family structure due to wrong addresses
• Increasing number of lone parents
• Decreasing number of families who are legally married couples or cohabiting couples with children
• Comparing PHC2011 with the first pilot register-based census shows the following differences:
  - The number of lone parents increased by 67% (86,000 => 143,000)
  - The number of registered or cohabiting partners decreased by 26% (548,000 => 407,000)

Index-based methodology for using registers

• How is it possible to correct registers using only register data?
• The answer comes from a long-standing statistical principle: repeated measurements give more precise results. In the same way, combining a large number of registers makes it possible to improve the quality of administrative data.

Index-based methodology for register-based census

• Residency index (2015)
• Partnership index (2017)
• Placement index (2018)

Defining signs

• From each register containing information about people living in the country, it is possible to get signs which are useful for making decisions about persons.

Why are these indexes necessary?
• People do not register their actual home address. Why?
  - Child’s place in kindergarten and school;
  - Free city transport;
  - Taxation rules for land;
  - Renter’s unwillingness to pay tax on rent;
  - Some local bonuses for pensioners or other population groups;
  - Laziness, negligence, etc.
• The Population Register (PR) has over-coverage
  - People who have left Estonia do not register their departure (or their return) in the PR

Administrative signs of life
1) Estonian Population Register (marriages, divorces, changes of place of residence)
2) Estonian Education Information System (students, teachers)
3) Social Services and Benefits Registry
4) Health Insurance Information Database
5) National Defence Obligation Register
6) Estonian National Pension Insurance Register
7) Estonian Unemployment Information System
8) Register of Residence and Work Permits
9) E-file system (crime documents, court documents, etc.)
10) Estonian Traffic Register (changes of driver’s licenses, changes of vehicles)
11) Register of Employment
12) Register of Identity Documents
13) Estonian Medical Prescription Center
14) The State Human Resources Database
In 2017, we had 33 signs of life in total
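
As a rough illustration of how signs of life could feed a residency index, the sketch below combines last year's index value with a weighted count of this year's signs and compares the result with a threshold. The update rule, the weights, the carry-over and gain factors, and the threshold are all illustrative assumptions, not the parameters published by Statistics Estonia.

```python
# Illustrative weights per sign of life (assumed values, not official ones).
ILLUSTRATIVE_WEIGHTS = {
    "health_insurance": 1.0,
    "employment": 1.0,
    "prescription": 0.5,
}


def update_residency_index(previous_index, signs, weights,
                           carry_over=0.8, gain=0.2):
    """Blend last year's index with this year's weighted signs of life.

    Signs missing from `weights` contribute nothing. The result is
    clipped to the interval [0, 1].
    """
    activity = sum(weights.get(sign, 0.0) for sign in signs)
    updated = carry_over * previous_index + gain * activity
    return max(0.0, min(1.0, updated))


def is_resident(index, threshold=0.7):
    """Treat a person as resident when the index reaches the threshold."""
    return index >= threshold


active = update_residency_index(
    0.5, {"health_insurance", "employment"}, ILLUSTRATIVE_WEIGHTS)
silent = update_residency_index(0.8, set(), ILLUSTRATIVE_WEIGHTS)
```

With these assumed values, a person with two signs moves above the threshold, while a person with no signs in the current year drifts below it. This is how repeated register observations can correct over-coverage gradually rather than in one step.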
Who can be a partner to a lone parent?

• The conditions are similar to those required of partners in the family formation algorithm:
  - Adult
  - Of opposite sex
  - Not a close relative
  - Age difference less than 18 years
  - Not a partner in another existing household
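
The conditions above can be written as a single predicate. This is a minimal sketch: the `Person` fields are hypothetical, and the adult age of 18 is an assumption, since the slide only says "adult".

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: str
    age: int
    sex: str  # "F" or "M"
    close_relative_ids: frozenset = frozenset()
    has_partner_elsewhere: bool = False  # partner in another existing household


def can_be_partner(parent, candidate, adult_age=18, max_age_gap=18):
    """Check the five listed conditions for a candidate partner."""
    return (
        candidate.age >= adult_age                                # adult (assumed 18+)
        and candidate.sex != parent.sex                           # of opposite sex
        and candidate.person_id not in parent.close_relative_ids  # not a close relative
        and abs(candidate.age - parent.age) < max_age_gap         # age gap under 18 years
        and not candidate.has_partner_elsewhere                   # not partnered elsewhere
    )


parent = Person("p1", 30, "F", close_relative_ids=frozenset({"p3"}))
eligible = can_be_partner(parent, Person("p2", 35, "M"))
```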

Signs of partnership

• Legally married or registered partnership
• Legally divorced
• Common child
• Common ownership
• Common loan
• Common car
• Jointly filed tax return
• Shared parental leave / parental compensation
• No demand for child support
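
One simple way to turn such signs into a number is an additive score over the signs observed for a pair of persons. The sketch below assumes this additive form, and every weight in it (including the negative direction given to a divorce) is a made-up placeholder; it is not the estimation method behind the published partnership index.

```python
# Placeholder weights per sign (invented for illustration only).
EXAMPLE_SIGN_WEIGHTS = {
    "married_or_registered_partnership": 3.0,
    "legally_divorced": -2.0,
    "common_child": 2.0,
    "common_ownership": 1.0,
    "common_loan": 1.0,
    "common_car": 0.5,
    "joint_tax_return": 2.0,
    "shared_parental_leave": 1.5,
    "no_child_support_demand": 0.5,
}


def partnership_score(observed_signs, weights=EXAMPLE_SIGN_WEIGHTS):
    """Sum the weights of the signs observed for a pair of persons.

    An unknown sign name raises KeyError so that typos are not silently
    scored as zero.
    """
    return sum(weights[sign] for sign in observed_signs)


score = partnership_score({"common_child", "common_loan", "common_car"})
```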
Placement index
• Registered address for father in county X
• Registered address for mother and child in county Y
• Family ownership in county C

Signs of placement
• The dwelling must be inhabited all year round (electricity consumption information)
• Enough living space (1 room per adult and 0.5 per child)
• Dwelling with amenities (water and kitchen)
• Occupied by somebody else
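
The placement signs can be sketched as a check of whether a family could plausibly be placed in a dwelling. The `Dwelling` fields are hypothetical, and treating "occupied by somebody else" as a reason to reject the dwelling is an interpretation of the slide, not a documented rule.

```python
from dataclasses import dataclass


@dataclass
class Dwelling:
    rooms: float
    inhabited_all_year: bool   # e.g. derived from electricity consumption
    has_water: bool
    has_kitchen: bool
    occupied_by_others: bool


def required_rooms(adults, children):
    """Living-space rule from the slide: 1 room per adult, 0.5 per child."""
    return 1.0 * adults + 0.5 * children


def is_plausible_placement(dwelling, adults, children):
    """Return True when all placement signs support placing the family here."""
    return (
        dwelling.inhabited_all_year
        and dwelling.rooms >= required_rooms(adults, children)
        and dwelling.has_water
        and dwelling.has_kitchen
        and not dwelling.occupied_by_others  # assumed to count against placement
    )


family_home = Dwelling(rooms=3, inhabited_all_year=True, has_water=True,
                       has_kitchen=True, occupied_by_others=False)
fits = is_plausible_placement(family_home, adults=2, children=2)
```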

Big data as an opportunity

• What Statistics Estonia can do is adopt additional, alternative data sources that would help to improve the quality of census results or validate the obtained results.

Big data and census
• For census purposes, big data are not crucial, because the data are not accurate enough. A census collects information on each resident.
• In 2018, this is more of a research project on a potential additional data source.
• There is no usable methodology for a census based on big data.
• We considered big data as a possible source for determining partnership, but this is not crucial, and using such data would be possible only after serious work on evaluating data quality, which has not been done so far.

Why are big data needed in the census project?

• Many countries have accepted the use of big data in census preparation, not for enumeration but for correcting data, finding errors and evaluating results.
• The future direction is to use big data for topics “not covered” by the census.

Objectives of pilot survey I

• Test opportunities to specify the actual place of residence by using mobile positioning data
• Study problems related to linking a mobile phone number to its user

Data analysis steps

1. Linking the places of residence and real estate of the participant and close relatives in the population register and land register
2. Comparing the addresses collected from databases with the actual place of residence indicated during the survey
3. Creating models using anchor points from mobile positioning that would:
   - decide whether the participant’s residence in the population register is the actual place of residence;
   - find the likeliest place of residence from among other addresses.
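
Step 3 can be sketched with a deliberately simple distance rule: compare a "home" anchor point with each candidate address and pick the nearest one. The nearest-distance model, the 5 km tolerance and the coordinates are assumptions made for illustration; the slides do not specify the model actually used in the pilot.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def registered_is_actual(home_anchor, registered_coords, tolerance_km=5.0):
    """Decide whether the registered address matches the home anchor point."""
    return haversine_km(home_anchor, registered_coords) <= tolerance_km


def likeliest_residence(home_anchor, candidates):
    """Return the name of the candidate address nearest to the home anchor."""
    return min(candidates, key=lambda name: haversine_km(home_anchor, candidates[name]))


# Hypothetical example: registered in Tallinn, home anchor near Tartu.
home_anchor = (58.380, 26.720)
candidates = {
    "registered_address": (59.437, 24.754),
    "relative_home": (58.378, 26.729),
}
best = likeliest_residence(home_anchor, candidates)
matches = registered_is_actual(home_anchor, candidates["registered_address"])
```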
Objectives of pilot survey II

• Using electricity consumption data to specify dwelling occupancy
• After the implementation of the partnership index, there will be a need to rearrange individuals’ places of residence in the statistical register; electricity consumption data should be useful here as well.
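
A minimal sketch of how monthly electricity consumption could be turned into an occupancy label follows. The 30 kWh monthly threshold and the three labels are assumptions chosen for illustration, not values from the pilot survey.

```python
def occupied_months(monthly_kwh, min_kwh=30.0):
    """Count the months whose consumption reaches the assumed threshold."""
    return sum(1 for kwh in monthly_kwh if kwh >= min_kwh)


def classify_occupancy(monthly_kwh, min_kwh=30.0):
    """Label a dwelling from 12 monthly consumption readings."""
    if len(monthly_kwh) != 12:
        raise ValueError("expected 12 monthly readings")
    months = occupied_months(monthly_kwh, min_kwh)
    if months == 12:
        return "year-round"
    if months == 0:
        return "vacant"
    return "seasonal"


# Hypothetical readings for a summer cottage used from June to August.
summer_cottage = [5, 5, 5, 5, 10, 80, 120, 110, 10, 5, 5, 5]
label = classify_occupancy(summer_cottage)
```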

Next steps in using electricity consumption data

• Elering data for 2016, 2017 and 2018. Improving algorithms and methods.
• Using electricity consumption data
  - to create the total population of dwellings;
  - in the partnership index.

Current stage of development in the area of census

• The Bayes method, data mining, etc. are used in connection with big data; however, there are essentially no analysis methods specific to big data.
• The main focus of big data analysis is currently on organising the data and preparing them for analysis with classical methods.

Conclusion

• The methodology of census statistics develops alongside the data, i.e. new possibilities will emerge for processing data.
• New data categories and data formats require improvements in methodology and new methodological approaches.
• The data analysis methodology is significantly affected by the available computing capacity and by the opportunity to apply increasingly complex and resource-demanding calculations.

References

• Levenko, V., Tiit, E.-M., Visk, H. (2018). Partnership index. Quarterly Bulletin of Statistics Estonia 1/18, pp. 29–42. https://www.stat.ee/publication-2018_quarterly-bulletin-of-statistics-estonia-1-18
• Tiit, E.-M., Maasing, E. (2016). Residency index and its applications in censuses and population statistics. Quarterly Bulletin of Statistics Estonia 3/16, pp. 53–60.
• Tiit, E.-M., Vähi, M. (2017). Indexes in demographic statistics: a methodology using nonstandard information for solving critical problems. Papers on Anthropology XXVI/1, pp. 72–87.