SAS Government Analytics Leadership Forum - Anil Arora, Chief Statistician of Canada April 2018

Statistics Canada
• Translating data into evidence for 100 years
• Using statistical science and sophisticated methods to produce reliable information about Canadians
• A lot goes on behind the scenes to produce the census…

The data revolution is changing Canada’s society and the expectations of Canadians
• New data sources and the sophistication of our users and their capacity underpin the need to modernize our methods and outputs
• Leading-edge methods and data integration are a key pillar of our modernization agenda

Statistics Canada is undertaking a significant transformation and leading efforts to be more responsive to the data needs of policy leaders by:
• Moving beyond a survey-first approach with new methods and integrating data from a variety of existing sources
• Making data easier to access and use by adopting new tools to analyze and visualize data
• Enabling Canadians to use data to make evidence-based decisions

Statistical analysis is at the center of every step in the cycle of translating data to evidence
Statistical analysis is critical to producing high quality information in the most cost efficient manner
• Design and collection: Optimize designs and processes (samples, collection, coding, record linkage). Systems: G-SAM, G-CODE, G-LINK
• Processing and inference: Statistical error detection and correction, weighting, weight adjustments, use of statistical models. Systems: BANFF, CANCEIS
• Analysis: Time series analysis, statistical data validation and confrontation, data interpretation. System: G-SERIES
• Dissemination: Measurement of accuracy, statistical disclosure control (privacy). System: G-CONFID
• Consumption: Supporting quality decisions by citizens, their governments and businesses based on evidence
All processing systems (G-SAM, etc.) are coded in SAS

Integrating data to enable the Horizontal Review of Innovation and Clean Tech
Inputs:
• Administrative data files from departments, agencies and crown corporations
• Existing survey and administrative data files at Statistics Canada
Integration: Statistics Canada’s linkable file environment
Outputs:
• Basic descriptive statistics
• Before-after analysis
• Cohort analysis
• Linked file for ongoing research
✓ Gathering data efficiently and strategically
✓ Leveraging existing data holdings across government
✓ Creating a new research dataset to allow further analysis

Evolving with the times
• SAS first introduced at Statistics Canada in the late 1980’s
• From: Character-based green screens on the mainframe; Primitive Windows user interfaces
• Moving to: Visual Analytics, Enterprise Guide, Enterprise Miner and Viya
Canadian Housing Statistics Program
• Trans Union data (43 mil. records) linked to tax information (165 mil.)
• 233 million possible pairs created
• Runs in about 40 hours on the SAS Grid
• Would not be possible on a dedicated Windows Server
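The pair count above is worth a quick check. Comparing every one of the 43 million records against every one of the 165 million tax records would give roughly 7.1 × 10^15 pairs, so 233 million candidate pairs implies the comparison space was narrowed before matching. The slide does not say how this was done; blocking (comparing only records that share a key) is one common way to do it. The Python sketch below illustrates the idea under that assumption. The field names, the `postal_prefix` blocking key and the sample records are hypothetical, and nothing here is taken from G-LINK or the program's actual method.

```python
from collections import defaultdict


def full_pair_count(n_left, n_right):
    """Number of pairs if every left record is compared with every right record."""
    return n_left * n_right


def candidate_pairs(left, right, key):
    """Return (left_id, right_id) pairs for records sharing the same blocking key."""
    # Index the right-hand file by blocking key so each left record
    # is only compared with records in its own block.
    index = defaultdict(list)
    for rec in right:
        index[rec[key]].append(rec)

    pairs = []
    for l_rec in left:
        for r_rec in index.get(l_rec[key], []):
            pairs.append((l_rec["id"], r_rec["id"]))
    return pairs


# Hypothetical toy files standing in for the two sources.
credit = [
    {"id": "c1", "postal_prefix": "K1A"},
    {"id": "c2", "postal_prefix": "M5V"},
    {"id": "c3", "postal_prefix": "K1A"},
]
tax = [
    {"id": "t1", "postal_prefix": "K1A"},
    {"id": "t2", "postal_prefix": "H3B"},
    {"id": "t3", "postal_prefix": "M5V"},
    {"id": "t4", "postal_prefix": "K1A"},
]

pairs = candidate_pairs(credit, tax, "postal_prefix")
print("all pairs:", full_pair_count(len(credit), len(tax)))
print("candidate pairs after blocking:", len(pairs))
print("full comparison at slide scale:", full_pair_count(43_000_000, 165_000_000))
```

On the toy data, blocking keeps 5 of the 12 possible pairs. The trade-off is that a true match whose blocking key differs between the two files (for example, after a move) is never compared, which is why production linkage usually runs several blocking passes on different keys.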

StatCan SAS Grid
- Started as a research project made up of 4 workstations
- Evolved to be the largest SAS Grid implementation in Canada:
  - 16 Grid nodes each having 16 cores
  - 256 compute cores and 60 Terabytes (TBs) of Shared File System
Allows many processes to run concurrently:
- large record linkages
- complex estimation processes
- multi-dimensional tabulations
Continued improvement: using the StatCan SAS Grid and the new SAS application G-Tab Census, one can see a reduction in time of 95% when compared to creating the same table using the 2016 Tabulation system
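The figures on this slide can be sanity-checked with a few lines of arithmetic. The Python sketch below recomputes the core count from the node configuration and converts the 95% time reduction into the equivalent speedup factor. The 10-hour baseline is a hypothetical value used only for illustration; the presentation gives no absolute run times for the tabulation.

```python
from fractions import Fraction

# Grid size as stated on the slide.
nodes = 16
cores_per_node = 16
total_cores = nodes * cores_per_node

# A 95% reduction in time leaves 5% of the original run time.
reduction = Fraction(95, 100)
remaining = 1 - reduction
speedup = 1 / remaining

# Hypothetical baseline, not a figure from the presentation.
baseline_hours = 10
new_hours = baseline_hours * remaining

print("total cores:", total_cores)
print("speedup factor:", speedup)
print("hypothetical new run time (hours):", float(new_hours))
```

Exact fractions are used so that the speedup comes out as exactly 20 rather than a floating-point approximation.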

Pure Data Analytic (Netezza)
• Capacity to store, process and analyze Big Data
• Planned use-cases:
  • CPI alternate data source
  • Canadian Housing Statistics Program linkage
  • Admin Data Lake

Old and new: combining traditional and AI methods in the 2016 Census
Immigration-related variables:
• Traditional: data was added through record linkage instead of collection - Result: 24,000 hour reduction of respondent burden
• AI: to fill in missing values, Machine Learning identified best combination of respondent characteristics to make corrections - Result: complete data; up to 10% more accurate
OUTCOMES
• Now: More accurate data for IRCC policymakers
• Later: Proof of concept for Census 2021

SAS and Statistics Canada
THANK YOU! For more information, please visit www.statcan.gc.ca #StatCan100