Modeling the Impact of Demography on COVID-19 Dynamics in Hubei and Lombardy
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Modeling the Impact of Demography on COVID-19 Dynamics in Hubei and Lombardy The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Wilder, Bryan, Marie Charpignon, Jackson A. Killian, Han-Ching Ou, Aditya Mate, et al. Modeling the Impact of Demography on COVID-19 Dynamics in Hubei and Lombardy (2020) Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:42659940 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA
Modeling the Impact of Demography on COVID-19 Dynamics in Hubei and Lombardy Age, household structure, and comorbidity distributions influence outbreak dynamics and evaluation of policy interventions. Bryan Wilder1∗ , Marie Charpignon2 , Jackson A. Killian1 , Han-Ching Ou1 , Aditya Mate1 , Shahin Jabbari1 , Andrew Perrault1 , Angel N. Desai3 , Milind Tambe1∗ , Maimuna S. Majumder4,5∗ 1 School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA 2 MIT Institute for Data, Systems, and Society, Cambridge, MA, USA 3 International Society for Infectious Diseases, Brookline, MA, USA 4 Department of Pediatrics, Harvard Medical School, Boston, MA, USA 5 Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA *Corresponding author. Email: bwilder@g.harvard.edu (B.W.); milind tambe@harvard.edu (M.T.); Maimuna.Majumder@childrens.harvard.edu (M.S.M.) As the COVID-19 pandemic accelerates, understanding its dynamics will be crucial to formulating regional and national policy interventions. We develop an individual-level model for SARS-CoV2 transmission that accounts for location- dependent distributions of age, household structure, and comorbidities ap- plied to Hubei, China and Lombardy, Italy. Using these distributions together with age-stratified contact matrices for each country, our model estimates that the epidemic in Lombardy is both more transmissible and more deadly than 1
in Hubei, while rates of disease documentation in both locations may be lower than previously established. Evaluation of potential policies in Lombardy sug- gests that targeted “salutary sheltering” by 50% of a single age group may have a substantial impact when combined with the adoption of physical dis- tancing measures by the rest of the population. Since December 2019, the COVID-19 pandemic – caused by the the novel coronavirus SARS-CoV2 – has resulted in significant morbidity and mortality (1). As of March 28, 2020, an estimated 664,000 individuals have been infected, with over 30,000 fatalities worldwide (2). Key factors such as existing comorbidities and age have appeared to play a role in an increased risk of mortality (3). Epidemiological studies have provided significant insights into the disease to date; however, as the pandemic continues to accelerate around the world, understanding the transmission dynamics of SARS-CoV2 will be critical to mitigating its spread (4–7). As national and regional governments begin to implement broad-reaching policies in response to rising case counts and stressed healthcare systems, estimating the impact of these policies while accounting for how population-specific demography impacts outbreak dynamics will be vital. This study employs mathematical modeling to evaluate the impact of the distribution of age, comorbidities, and household contacts on COVID-19 dynamics using existing data from Hubei, China, and Lombardy, Italy, together with age-stratified contact matrices for each location. We then describe how these findings may affect the utility of potential non-pharmaceutical interventions at a country-level. Model We develop a stochastic agent-based model for SARS-CoV2 transmission which accounts for distributions of age, household types, comorbidities, and contact between different age groups in a given population. The model follows a susceptible-exposed-infectious-removed (SEIR) 2
template (8, 9) and proceeds in discrete time steps corresponding to days. Fig. 1 shows the progression of individuals through the model compartments. Each individual’s probability of progressing to severe disease states is determined by their age and comorbidities. We strat- ify age into ten-year intervals and incorporate hypertension and diabetes as comorbidities due to their worldwide prevalence (10) and association with higher risk of in-hospital death for COVID-19 patients (3). However, our model can be expanded to include other comorbidities of interest in the future. The disease is transmitted over a contact structure, which is divided into in-household and out-of-household groups. Each individual’s household is simulated using census data. Individ- uals infect members of their households at a higher rate than out-of-household individuals. We model out-of-household transmission using country-specific estimated contact matrices (11). These matrices state the mean number of daily contacts an individual of a particular age stratum has with individuals in other age strata. Validation and Inferred Parameters This section instantiates the model for two specific locations: Hubei, China (where the disease originated) and Lombardy, Italy (one of the most heavily-impacted areas thus far, along with Hubei). We show how our model is able to reproduce observed patterns in the number of reported deaths, examine how a range of possible underlying parameters are consistent with the data, and propose plausible ranges for r0 in Hubei and Lombardy and for the rate at which infected individuals are documented. There are three parameters for which values are not precisely estimated in the literature and are varied in-simulation. First is pinf , the probability of infection given contact with an infected individual. This determines the level of transmissibility of the disease. Second is t0 , the start time of the infection, which is not precisely characterized in most locations and has an impact 3
Infectious Susceptible Exposed Asymptomatic Removed Mild Recovered Contact Household matrix structure Severe Deceased Critical 1 − !!→# (#$ , %$ ) Mild !!→# (#$ , %$ ) ) ∼ Exp(.!→% ) Recovered ) ∼ Exp(.!→# ) Severe Fig. 1: We use a modified SEIR model, where the infectious states are subdivided into levels of disease severity. The transitions are probabilistic and there is a time lag for transitioning between states. For example, the magnified section shows the details of transitions between mild, recovered, and severe states. Each arrow consists of the probability of transition (e.g., pm→s (ai , ci ) denotes the probability of progressing from mild to severe) as well as the associated time lag (e.g., the time t for progression from mild to severe is drawn from an exponential distribution with mean λm→s ). ai and ci denote the age and set of comorbidities of the infected individual i. 4
due to rapid doubling times. Third is a parameter dmult , which accounts for differences in the rate of mortality between locations that are not captured by demographic factors in the model (e.g., the impact of limited ICU capacity in Lombardy). dmult is a multiplier applied to the baseline mortality rate estimated from China CDC data. Hubei, China We simulate the epidemic from a starting time t0 (varied around November 17 for the first iden- tified patient in Wuhan (12)) through March 21, with lockdown on January 23. Details of the modeled scenario can be found in the supplementary text. Validating the simulated results is complex because infections are likely underdocumented (13). Because deaths are believed to be better documented than infections, we evaluate our model by using the number of deaths on the simulation end date while considering the possibility of both underreporting and overre- porting (14, 15). Tables S2 and S3 show a sensitivity analysis of our estimates for hypothetical underreporting or overreporting of deaths. The results of this analysis do not alter our main conclusions. While we fit using only the final total of deaths, Fig. 2 shows that our simulations closely match the entire trajectory, thus guarding against overfitting. In addition, Figs. S1 and S2 show out-of-sample validation against reported deaths in the week after the model was cal- ibrated. All results, here and in the remainder of the paper, use 1000 independent runs of the simulation. There are several parameters that fit the data well. Fig. 3 shows two simulation outputs across the entire parameter range, with heatmap saturation representing the goodness of fit between the simulated and true number of reported deaths. We measure goodness of fit by evaluating whether the true number of reported deaths is well-contained within the simulated distribution. Specifically, if p is the percentile of the true number of reported deaths in the distribution of the simulation runs, the color of the cell reflects p(1 − p). A value of 0 represents 5
10000 10000 Total deaths Total deaths 7500 7500 5000 5000 2500 2500 0 0 15 10 4 29 23 21 1/ 2 1 1 1 2 2 1 / / 1/ 2/ 3/ 2 2/ 1 2 3/ 1 2 11 12 1/ 2/ 2/ 3/ 3/ Date Date Fig. 2: Simulated trajectories of the number of deaths over time in Hubei (left, pinf = 0.019, t0 = November 15) and Lombardy (right, pinf = 0.029, t0 = January 22, dmult = 4) compared to the true number of reported deaths. Light blue lines are individual trajectories, green is the median, and the black dots are the number of reported deaths. The red dashed line represents the lockdown in each location. The true number of reported deaths is contained within the simulated distribution and lies close to the median (Pearson r = 0.998 and 0.992 respectively). that the real number of deaths is either greater or smaller than all simulation runs, and a value of 0.25 (the maximum possible) represents that the real number of deaths is exactly at the 50th percentile of the simulated distribution. Assessing fit via the mean squared error yields a similar parameter order but is more difficult to interpret. The dark cells along the diagonal have p(1−p) close to 0.25, indicating that the true number of reported deaths is well-contained within the simulated distribution across a range of param- eter settings. We examine the implications of this range of possible parameter settings for two quantities of interest: the basic reproduction number r0 and the documentation rate for symp- tomatic infections. The left side of Fig. 3 shows the inferred pre-lockdown r0 given by each set of parameters. The values are consistent with existing estimates, which largely fall in the range 2–3 (16). We calculate a single plausible range for our results, which includes all values for parameter settings where the percentile p of the true number of reported deaths in the simulated distribution satisfies p(1 − p) ≥ 0.2. For r0 , the plausible range is 2.06–2.28. 6
dmult = 1 dmult = 1 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 0.016 1.858 1.892 1.876 1.889 1.892 1.875 1.911 0.016 0.561 0.651 0.858 - - - - 0.017 1.954 1.991 1.997 1.968 1.964 1.980 1.973 0.017 0.260 0.326 0.423 0.525 0.659 0.813 - 0.018 2.072 2.066 2.082 2.086 2.066 2.059 2.088 0.018 0.130 0.165 0.215 0.248 0.313 0.440 0.579 0.019 2.169 2.171 2.171 2.177 2.181 2.157 2.182 0.019 0.068 0.084 0.109 0.132 0.170 0.243 0.325 pinf pinf 0.019 2.214 2.214 2.217 2.225 2.223 2.223 2.222 0.019 0.050 0.063 0.078 0.100 0.129 0.185 0.252 0.020 2.273 2.268 2.259 2.272 2.271 2.273 2.276 0.020 0.037 0.048 0.061 0.071 0.097 0.130 0.171 0.021 2.363 2.360 2.358 2.347 2.350 2.364 2.360 0.021 0.023 0.029 0.037 0.041 0.054 0.076 0.101 0.022 2.443 2.455 2.455 2.469 2.448 2.463 2.464 0.022 0.016 0.019 0.024 0.026 0.036 0.044 0.060 /8 0 2 3 5 7 9 /8 0 2 3 5 7 9 /1 /1 /1 /1 /1 /1 /1 /1 /1 /1 /1 /1 11 11 11 11 11 11 11 11 11 11 11 11 11 11 t0 t0 Fig. 3: Fit of parameters for Hubei. Left: median r0 as a function of pinf (y-axis) and t0 (x-axis). Right: median fraction of symptomatic infections documented. Colors indicate the goodness of fit, where darker colors suggest better fit. A large set of parameter settings (the dark green diagonal entries) are consistent with the data, leading to a range of possible values for r0 and the rate of documentation. 7
The right side of Fig. 3 shows the inferred rate of documentation of symptomatic infections given each set of parameters. We calculate this by dividing the actual number of confirmed cases in Hubei on the simulation end date by the total number of symptomatic infections in the simulation. The plausible range is 10.9–21.5%, lower than the range 28–33% estimated by Russell et al. (13). One explanation is that earlier studies assumed an age-homogeneous attack rate (13, 17); however, Fig. S3 shows that attack rates in our simulation are substantially higher for younger groups, attributable to their larger number of daily contacts (11, 18). Since younger individuals also have lower risk of fatality (19, 20), and because we use the total number of deaths to calibrate the model, a larger number of undocumented infections is expected. This il- lustrates how age-dependent behavioral patterns may impact estimates of important parameters, motivating the inclusion of demographic information in COVID-19 modeling. Lombardy, Italy For Lombardy, we simulate the epidemic from January 22 through March 22, with a lockdown on March 8. We vary the two parameters pinf and t0 that we examined for Hubei, but now also consider the impact of dmult . Recall that dmult is a multiplier to the fatality rate across all ages and comorbidities, which captures location-dependent variation in fatalities in excess of differences due to demographic factors. Fig. 4 shows goodness of fit for a wide range of parameters, along with the associated estimates for r0 and the rate of infection documentation. Within each figure, each heatmap corresponds to a different value of dmult , the multiplier for fatality rates relative to Hubei. It is possible that fatality rates in Lombardy are higher than in Hubei (e.g., by a factor of 3-5), or that r0 is higher in Lombardy than in Hubei. Together, the simulations suggest that both factors likely contribute. We find broad support for the hypothesis that Lombardy has experienced a more transmissible and more deadly outbreak than Hubei (21, 22). 8
dmult = 1 dmult = 3 dmult = 5 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 0.020 0.170 0.282 0.510 0.804 - - 0.020 0.178 0.285 0.468 0.796 - - 0.020 0.177 0.303 0.513 0.820 - - 0.021 0.101 0.181 0.287 0.512 0.876 - 0.021 0.106 0.179 0.296 0.526 0.857 - 0.021 0.109 0.176 0.312 0.515 0.877 - 0.022 0.060 0.110 0.186 0.339 0.562 - 0.022 0.063 0.109 0.199 0.328 0.599 0.992 0.022 0.063 0.112 0.191 0.325 0.583 0.994 0.023 0.038 0.065 0.119 0.218 0.377 0.691 0.023 0.038 0.065 0.118 0.214 0.379 0.678 0.023 0.038 0.064 0.121 0.219 0.388 0.702 0.024 0.024 0.043 0.075 0.129 0.257 0.481 0.024 0.025 0.044 0.079 0.141 0.247 0.468 0.024 0.025 0.043 0.074 0.142 0.266 0.485 0.025 0.016 0.028 0.050 0.098 0.182 0.320 0.025 0.017 0.028 0.050 0.092 0.169 0.360 0.025 0.016 0.029 0.049 0.093 0.177 0.340 pinf pinf pinf 0.026 0.011 0.019 0.034 0.061 0.116 0.230 0.026 0.011 0.018 0.033 0.061 0.119 0.228 0.026 0.011 0.019 0.033 0.061 0.117 0.234 0.027 0.008 0.013 0.023 0.044 0.083 0.160 0.027 0.008 0.013 0.022 0.041 0.083 0.157 0.027 0.008 0.013 0.022 0.043 0.083 0.162 0.028 0.006 0.009 0.016 0.029 0.057 0.113 0.028 0.007 0.009 0.016 0.030 0.057 0.112 0.028 0.007 0.010 0.016 0.028 0.056 0.116 0.029 0.005 0.007 0.012 0.020 0.040 0.081 0.029 0.005 0.007 0.012 0.021 0.040 0.081 0.029 0.005 0.007 0.012 0.021 0.040 0.082 0.030 0.005 0.006 0.009 0.015 0.029 0.057 0.030 0.005 0.006 0.009 0.015 0.029 0.056 0.030 0.005 0.006 0.009 0.015 0.029 0.058 0.031 0.004 0.005 0.007 0.011 0.020 0.043 0.031 0.004 0.005 0.007 0.011 0.022 0.043 0.031 0.004 0.005 0.007 0.011 0.021 0.041 10 13 16 19 22 25 10 13 16 19 22 25 10 13 16 19 22 25 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ t0 t0 t0 dmult = 1 dmult = 3 dmult = 5 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 0.020 2.313 2.314 2.305 2.322 2.309 2.320 0.020 2.327 2.316 2.318 2.311 2.322 2.311 0.020 2.308 2.309 2.310 2.308 2.316 2.321 0.021 2.408 2.416 2.419 2.415 2.420 2.410 0.021 2.403 2.410 2.410 2.410 2.419 2.417 0.021 2.414 2.407 2.410 2.408 2.408 2.413 0.022 2.514 2.512 2.518 2.513 2.520 2.496 0.022 2.500 2.511 2.506 2.509 2.500 2.500 0.022 2.502 2.510 2.503 2.517 2.504 2.507 0.023 2.598 2.610 2.613 2.604 2.600 2.605 0.023 2.604 2.616 2.602 2.602 2.607 2.614 0.023 2.600 2.604 2.600 2.606 2.596 2.598 0.024 2.702 2.700 2.703 2.707 2.696 2.708 0.024 2.699 2.708 2.697 2.706 2.711 2.700 0.024 2.703 2.699 2.706 2.695 2.695 2.689 0.025 2.794 2.809 2.800 2.801 2.788 2.810 0.025 2.794 2.802 2.806 2.797 2.801 2.786 0.025 2.806 2.790 2.794 2.796 2.790 2.787 pinf pinf pinf 0.026 2.901 2.900 2.901 2.906 2.897 2.888 0.026 2.893 2.896 2.890 2.894 2.889 2.884 0.026 2.888 2.892 2.895 2.886 2.895 2.884 0.027 2.993 2.998 2.995 2.989 2.990 2.991 0.027 2.988 2.996 2.987 2.985 2.984 2.977 0.027 2.990 2.986 2.987 2.981 2.979 2.983 0.028 3.095 3.083 3.089 3.087 3.085 3.076 0.028 3.080 3.079 3.094 3.077 3.077 3.073 0.028 3.081 3.089 3.075 3.077 3.087 3.068 0.029 3.180 3.187 3.182 3.181 3.174 3.179 0.029 3.171 3.180 3.177 3.179 3.173 3.163 0.029 3.174 3.173 3.175 3.171 3.174 3.171 0.030 3.274 3.283 3.283 3.280 3.271 3.269 0.030 3.276 3.272 3.277 3.281 3.265 3.269 0.030 3.277 3.267 3.276 3.275 3.275 3.263 0.031 3.367 3.369 3.379 3.365 3.376 3.363 0.031 3.367 3.369 3.365 3.373 3.366 3.355 0.031 3.368 3.362 3.367 3.367 3.363 3.360 10 13 16 19 22 25 10 13 16 19 22 25 10 13 16 19 22 25 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ 1/ t0 t0 t0 Fig. 4: Top row: fraction of symptomatic infections that become documented in Lombardy as a function of pinf (y axis) and t0 (x axis). Bottom row: r0 during the pre-lockdown phase. From left to right: a mortality multiplier dmult relative to China of 1, 3, 5. Colors indicate goodness of fit of the parameter settings to the true number of reported deaths on March 21 in Lombardy, with darker cells indicating a better fit. We again find a wide range of possible parameterizations given the observed data, with a plausible range of 1.14-6.68% for the documentation rate and of 2.50-3.37 for r0 . 9
Documentation rates are lower under most well-fitting parameter settings than previously reported for Italy overall. Russell et al. (13) reported a 95% confidence interval of 4.6–5.9%, while our model yields a plausible range of 1.14–6.68%. Our plausible ranges broadly agree with (13) in that documentation rates are lower in Lombardy than in China; however, our range contains substantially lower values. By accounting for demographic factors, greater disease prevalence in Lombardy is possible. For r0 , we obtain the plausible range 2.50–3.37, which is disjointed with (and higher than) the corresponding plausible range for Hubei. Containment Policies in Lombardy: Salutary Sheltering and Physical Distancing Various interventions – from complete lockdown to physical distancing recommendations – have been implemented worldwide in response to COVID-19. Within these are a range of al- ternatives. For example, a government could encourage some percentage of a given age group to remain sheltered in place, while the rest of the population could continue in-person work and social activities. Age-specific policies are particularly relevant because they have already been employed in some countries (e.g., US CDC recommendations that people above 65 years old shelter in place (23)) and because older age groups are more likely to be able to telecom- mute (24, 25). Here, we investigate to what extent the epidemic in Lombardy can be mitigated by encour- aging a single age group to engage in “salutary sheltering” – a term we coin here to describe individuals who shelter in place irrespective of their exposure or infectious state – or whether the entire population must also be asked to adopt physical distancing. Accordingly, we compare two scenarios. First, we simulate salutary sheltering for a fraction of a single age cohort while leaving the rest of the population’s behavior unaltered. Second, we simulate salutary sheltering for a fraction of a single age cohort and physical distancing measures among the rest of the 10
population. We model physical distancing as reducing daily contacts by a factor of two for all individuals (who are not sheltering). The supplementary text discusses concrete suggestions to implement such physical distancing in various settings along with details of the experiments described. While this case study applies to Lombardy, it could be extended to other locations using population-specific demographic data. Fig. 5 shows the first policy scenario, where a portion of a single age group engages in salu- tary sheltering but the rest of the population in Lombardy does not adopt physical distancing. The left plots (Figs. 5(a) and 5(c)) show salutary sheltering of 50% of a single age group, while the right plots (Figs. 5(b) and 5(d)) show salutary sheltering of 100% of a single age group. We evaluate these interventions in Lombardy according to two metrics: the percentage of the population infected (top row) and the total number of deaths (bottom row). Each colored line shows the impact of salutary sheltering of a single given age group. Additionally, we consider three baseline policies: the gray dotted curve reflects a baseline scenario with no intervention (no salutary sheltering, no physical distancing) and the blue and pink dotted curves correspond to baselines where 50% and 100% of the entire population engages in salutary sheltering, re- spectively (with no additional physical distancing). For Lombardy, we observe that even complete sheltering in place of any single age group leaves at least 60% of the population infected (Fig. 5b), and nearly 70-80% in the more realistic scenario where only half the group engages in sheltering (Fig. 5a). Notably, the only policies that limit the percentage of infected, or the total number of deaths, would require sheltering 50% ((a) and (c)) to 100% ((b) and (d)) of the population in Lombardy. Except for these two scenarios, illustrated by the blue and pink dotted curves respectively, the total number of deaths as projected by our model in Lombardy is correspondingly large (above 200 thousand) in the absence of any level of physical distancing and can only be significantly ameliorated by completely sheltering the entire population of 70 years and older (Fig. 5d). 11
100 100 Percentage of infected Percentage of infected Absence of Absence of intervention intervention 80 Ages 0-14 80 Ages 0-14 Ages 15-29 Ages 15-29 Ages 30-49 Ages 30-49 60 Ages 50-69 60 Ages 50-69 Ages 70+ Ages 70+ 40 All ages 40 All ages 50% sheltered 50% sheltered All ages All ages 100% sheltered 100% sheltered 20 20 0 0 0 20 40 60 80 100 120 0 20 40 60 80 100 120 Days since patient zero Days since patient zero (a) (b) 400 400 Total deaths (thousands) Total deaths (thousands) Absence of Absence of 350 intervention 350 intervention Ages 0-14 Ages 0-14 300 Ages 15-29 300 Ages 15-29 250 Ages 30-49 250 Ages 30-49 Ages 50-69 Ages 50-69 200 Ages 70+ 200 Ages 70+ All ages All ages 150 50% sheltered 150 50% sheltered All ages All ages 100 100% sheltered 100 100% sheltered 50 50 0 0 0 20 40 60 80 100 120 0 20 40 60 80 100 120 Days since patient zero Days since patient zero (c) (d) Fig. 5: Effect of salutary sheltering for a fraction of each age group on infections in Lombardy (top row, (a) and (b)) and deaths (bottom row, (c) and (d)), without physical distancing by others. Left-hand side, (a) and (c): salutary sheltering of 50% of a single age group. Right-hand side, (b) and (d): salutary sheltering of 100% of a single age group. The red vertical dashed line indicates the start of the intervention. Each solid colored curve represents sheltering a specific age cohort, the gray dotted curve presents the “no intervention” scenario, and the blue and pink dotted curves correspond to baselines where 50% and 100% of the population is sheltered, respectively. 12
100 100 Percentage of infected Percentage of infected Absence of Absence of intervention intervention 80 Ages 0-14 80 Ages 0-14 Ages 15-29 Ages 15-29 Ages 30-49 Ages 30-49 60 Ages 50-69 60 Ages 50-69 Ages 70+ Ages 70+ 40 All ages 40 All ages 50% sheltered 50% sheltered All ages All ages 100% sheltered 100% sheltered 20 20 0 0 0 20 40 60 80 100 120 0 20 40 60 80 100 120 Days since patient zero Days since patient zero (a) (b) 400 400 Total deaths (thousands) Total deaths (thousands) Absence of Absence of 350 intervention 350 intervention Ages 0-14 Ages 0-14 300 Ages 15-29 300 Ages 15-29 250 Ages 30-49 250 Ages 30-49 Ages 50-69 Ages 50-69 200 Ages 70+ 200 Ages 70+ All ages All ages 150 50% sheltered 150 50% sheltered All ages All ages 100 100% sheltered 100 100% sheltered 50 50 0 0 0 20 40 60 80 100 120 0 20 40 60 80 100 120 Days since patient zero Days since patient zero (c) (d) Fig. 6: Effect of salutary sheltering for a fraction of each age group on infections in Lombardy (top row, (a) and (b)) and deaths (bottom row, (c) and(d)), when combined with physical dis- tancing by the rest of the population. Left-hand side, (a) and (c): salutary sheltering of 50% of a single age group. Right-hand side, (b) and (d): salutary sheltering of 100% of a single age group. The red vertical dashed line indicates the start of the intervention. Each solid colored curve represents sheltering a specific age cohort, the gray dotted curve presents the “no inter- vention” scenario, and the blue and pink dotted curves correspond to baselines where 50% and 100% of the population is sheltered, respectively. 13
However, combining partial salutary sheltering of a single age group with physical distanc- ing by the rest of the population has a substantially greater impact for Lombardy, as shown in Fig. 6. The fraction of the population infected drops to 50% or below (depending on which age group and what percentage is sheltered) four months following the beginning of the outbreak. The total number of deaths projected for this time horizon decreases even more substantially, to the range of 100–150 thousand (as compared to 275–360 thousand for the previous scenario with 50% sheltering by a single age group and no population-wide physical distancing). In a scenario where physical distancing is adopted by everyone in Lombardy, there is only a small benefit to sheltering 100% of an age group (as opposed to 50%), particularly with respect to total deaths (6b and 6d), instead of 50% (6a and 6c). The age group sheltered need not be 70+ either; in fact, 50% salutary sheltering of the 30–49 age group in Lombardy has a somewhat larger impact. Differences in the effectiveness of sheltering different age groups in a particular location stem from the interplay of contact patterns (the 30–49 group has many daily phys- ical interactions in Italy (11)), risk of mortality from the disease (older groups are at higher risk (3, 23)), and the size of the age group (the 30–49 group is the second largest generation in Italy (26)). However, they are not reducible to any single factor. Figs. S4–S7 show a sensitivity analysis incorporating changes in model parameters due to potential underreporting or overre- porting of deaths. Our main conclusions for Lombardy remain unaltered even after accounting for reporting discrepancies. Building upon Lombardy, our model suggests that hybrid policies that combine targeted salutary sheltering by one part of the population and physical distancing by the rest could be as effective at limiting the final outbreak size as salutary sheltering of an entire sub-population, while preserving social ties and avoiding complete disruption of the economy. Our analysis can be readily extended to other locations by parameterizing our model for a new population using existing demographic data and age-stratified contact patterns. 14
Discussion and Future Work In this study, we developed a model of SARS-CoV2 transmission that incorporates household structure, age distributions, comorbidities, and age-stratified contact patterns in Hubei, China, and Lombardy, Italy, and created simulations using available demographic information from these two locations. Our findings suggest that population-wide sheltering in place may not be necessary to mitigate the spread of COVID-19. Instead, targeted salutary sheltering of specific age groups combined with adherence to physical distancing by the rest of the population may be sufficient to thwart a substantial fraction of infections and deaths. Physical distancing could be achieved by engaging in activities such as staggered work schedules, increasing spacing in restaurants, and prescribing times to use the gym or grocery store. Specific mechanisms and considerations for implementing physical distancing are documented in the supplementary text. Moreover, we note that depending on the demography and contact structure of the population, the groups targeted for salutary sheltering need not be the oldest in the population to be effective. From a pragmatic perspective, targeted salutary sheltering may not be realistic for all pop- ulations. Its feasibility relies on access to safe shelter, which does not reflect reality for all individuals. In addition, sociopolitical realities may render this recommendation more feasi- ble in some populations than in others. Concerns for personal liberty, discrimination against sub-segments of the population, and societal acceptability may prevent the adoption of targeted salutary sheltering in some regions of the world. Allowing salutary sheltering to operate on a voluntary basis using a shift system (rather than for indefinite time periods) may address some of these issues. Future work should formulate targeted recommendations about salutary shelter- ing and physical distancing by age group or other stratification adapted to a specific country’s workforce. Existing modeling work of COVID-19 has largely focused on simpler compartmental or 15
branching process models which do not allow for the simulation of targeted policies as de- scribed above. While these models have played an important role in estimating key parameters such as r0 (5, 7) or the rate at which infections are documented (27), and in the evaluation of prospective non-pharmaceutical interventions (28, 29), they do not characterize how differ- ences in demography impact the course of an epidemic in a particular location. Our focus on population-specific demography allows for further refinement of current mortality estimates and is a strength of this study. r0 estimates in this study are generally comparable to other estimates in the literature (30), although our model yields higher estimates for Lombardy than Hubei – possibly due to differential mask-wearing practices (31) or adoption of behavioral interventions such as hand hygiene (32). Similarly, our model suggests a higher mortality rate in Lombardy than Hubei for reasons beyond demography. Possible explanations include variability in ICU capacity (33), coordination challenges in Lombardy (33), documented prevalence of antibiotic resistance in secondary bacterial infections (34), and perhaps greater background population exposure to other novel coronaviruses in China (35). Reporting rates estimated in this study were generally lower than those in prior studies (13). This is likely due the fact that our simula- tions produce infection rates that are not homogeneous across age groups, attributable to higher reported contacts for younger groups (11, 18). These differences highlight the potential impact of demography on COVID-19 dynamics, a key feature of our model. Another advantage of our framework is its flexibility. The model is modifiable to test differ- ent policies or simulate additional features with greater fidelity across a variety of populations. Examples of future work that can be accommodated include analysis of contact tracing and testing policies, health system capacity, and multiple waves of infection after lifting physical distancing restrictions. Our model includes the necessary features to simulate these scenarios while remaining otherwise parsimonious, a desirable feature given uncertainties in data report- ing. 16
This study has several limitations. While several comorbidities associated with mortality in COVID-19 were accounted for, the availability of existing data limited the incorporation of all relevant comorbidities. Most notably, chronic pulmonary disease was not included although it has been associated with mortality in COVID-19 (19), nor was smoking, despite its preva- lence in both China and Italy (36, 37). Gender-mediated differences were also excluded, which may be important for both behavioral reasons (e.g., adoption of hand-washing (38, 39)) and biological reasons (e.g, the potential protective role of estrogen in SARS-CoV infections (40)). Nevertheless, these factors can all be incorporated into the model as additional data becomes available. It is also worth noting that we have not yet attempted to model super-spreader events in our existing framework. Such events may have been especially consequential in South Korea (41), and future work could attempt to more closely model the epidemic there by incorporating a dispersion parameter into the contact distribution, a method which has been used in other models (5). Despite these limitations, this study demonstrates the importance of considering population and household demographics when attempting to better define outbreak dynamics for COVID- 19. Furthermore, this model highlights potential policy implications for non-pharmaceutical interventions that account for population-specific demographic features and may provide alter- native strategies for national and regional governments moving forward. References and Notes 1. D. Baud, et al., The Lancet (2020). 2. Center for Systems Science and Engineering at Johns Hopkins University, Coronavirus COVID-19 Global Cases (2020). https://coronavirus.jhu.edu/map.html. 17
3. F. Zhou, et al., The Lancet (2020). 4. B. Xu, et al., Scientific Data 7 (2020). 5. J. Riou, C. Althaus, Eurosurveillance 25 (2020). 6. R. Li, et al., Science (2020). 7. A. Kucharski, et al., The Lancet Infectious Diseases (2020). 8. P. Van den Driessche, M. Li, J. Muldowney, Canadian Applied Mathematics Quarterly 7, 409 (1999). 9. F. Ball, E. Knock, P. O’Neill, Mathematical Biosciences 266, 23 (2015). 10. G. Roth, et al., The Lancet 392, 1736 (2018). 11. K. Prem, A. Cook, M. Jit, PLoS Computational Biology 13, e1005697 (2017). 12. South China Morning Post, Coronavirus: China’s first confirmed COVID-19 case traced back to November 17 (2020). https:// www.scmp.com/news/china/society/article/3074991/ coronavirus-chinas-first-confirmed-covid-19-case-traced-back. 13. T. Russell, et al., Using a delay-adjusted case fatality ratio to estimate under-reporting (2020). https://cmmid.github.io/topics/covid19/severity/global_ cfr_estimates.html. Accessed 03-26-20. 14. H. Bai, et al., Radiology (2020). 15. G. Onder, G. Rezza, S. Brusaferro, The Journal of the American Medical Association (2020). 18
16. M. S. Majumder, K. D. Mandl, The Lancet Global Health . 17. R. Verity, et al., medRxiv (2020). 18. J. Mossong, et al., PLoS medicine 5 (2008). 19. C. C. for Disease Control, Prevention, China CDC Weekly 2, 113 (2020). http://weekly.chinacdc.cn//article/id/ e53946e2-c6c4-41e9-9a9b-fea8db1a8f51. 20. W. H. O. China, Report of the WHO-China Joint Mis- sion on Coronavirus Disease 2019 (COVID-19) (2020). https: //www.who.int/docs/default-source/coronaviruse/ who-china-joint-mission-on-covid-19-final-report.pdf. 21. J. Horowitz, E. Bubola, E. Polvedo, The New York Times (2020). https://www.nytimes.com/2020/03/21/world/europe/ italy-coronavirus-center-lessons.html. 22. V. Di Donato, S. McKenzie, L. Borghese, CNN (2020). https://www.cnn.com/2020/03/28/europe/ italy-coronavirus-cases-surpass-china-intl/index.html. 23. C. for Disease Control, Prevention, People who are at higher risk for se- vere illness (2020). https://www.cdc.gov/coronavirus/2019-ncov/ need-extra-precautions/people-at-higher-risk.html. 24. P. Mateyka, M. Rapino, L. C. Landivar, Home-based workers in the United States (2012). https://www.census.gov/prod/2012pubs/p70-132.pdf. 19
25. U. B. of Labor Statistics, Labor force statistics from the current population survey (2019). https://www.bls.gov/cps/cpsaat08.htm. 26. United Nations (2019). https://population.un.org/wpp/. 27. P. De Salazar, R. Niehus, A. Taylor, C. Buckee, M. Lipsitch, medRxiv (2020). 28. S. Kissler, C. Tedijanto, M. Lipsitch, Y. Grad, medRxiv (2020). 29. J. Hellewell, et al., The Lancet Global Health (2020). 30. M. Majumder, K. Mandl, SSRN (2020). 31. S. Feng, et al., The Lancet Respiratory Medicine (2020). 32. G. Di Giuseppe, R. Abbate, L. Albano, P. Marinelli, I. Angelillo, BMC Infectious Diseases 8, 36 (2008). 33. G. Grasselli, A. Pesenti, M. Cecconi, The Journal of the American Medical Association (2020). 34. A. Cassini, et al., ECDC country visit to Italy to discuss antimicrobial resis- tance issues (2017). https://www.ecdc.europa.eu/sites/default/files/ documents/AMR-country-visit-Italy.pdf. 35. V. Cheng, S. Lau, P. Woo, K. Y. Yuen, Clinical Microbiology Reviews 20, 660 (2007). 36. M. Parascandola, L. Xiao, Translational Lung Cancer Research 8, S21 (2019). 37. A. Lugo, et al., Tumori Journal 103, 353 (2017). 38. M. Guinan, M. McGuckin-Guinan, A. Sevareid, American Journal of Infection Control 25, 424 (1997). 20
39. D. Johnson, D. Sholcosky, K. Gabello, R. Ragni, N. Ogonosky, Perceptual and Motor Skills 97, 805 (2003). 40. R. Channappanavar, et al., The Journal of Immunology 198, 4046 (2017). 41. British Broadcasting Corporation, Coronavirus: South Korea emergency measures as infec- tions increase (2020). https://www.bbc.com/news/world-asia-51582186. 42. Y. Bai, et al., The Journal of the American Medical Association (2020). 43. C. Rothe, et al., New England Journal of Medicine (2020). 44. Z. Du, et al., Emerging Infectious Diseases (2020). 45. P. Allison, Survival analysis using SAS: a practical guide (SAS Institute, 2010). 46. D. Collett, Modelling survival data in medical research (CRC Press, 2015). 47. S. Lauer, et al., Annals of Internal Medicine (2020). 48. Y. Liu, R. Eggo, A. Kucharski, The Lancet (2020). 49. Z. Hu, X. Peng, The Journal of Chinese Sociology 2, 9 (2015). 50. D. He, X. Zhang, Z. Wang, Y. Jiang, China Population and Development Studies 2, 430 (2019). 51. Statista, Household structures in Italy in 2018 (2018). https://www.statista. com/statistics/730604/family-structures-italy/, Last Accessed: 2020-03-28. 52. Statista, Number of single-person households in Italy from 2012 to 2018 (2018). https://www.statista.com/statistics/728061/ 21
number-of-single-person-households-italy/, Last Accessed: 2020- 03-28. 53. Statista, Number of couples with children in Italy from 2012 to 2018, by num- ber of children (2018). https://www.statista.com/statistics/570106/ number-of-couples-with-children-italy/, Last Accessed: 2020-03-28. 54. Statista, Biennial average number of household members in Italy from 2012 to 2018 (2018). https://www.statista.com/statistics/671945/ biennial-average-number-of-families-with-children-italy/, Last Accessed: 2020-03-28. 55. Statista, Number of single parents in Italy from 2011 to 2018, by number of children (2018). https://www.statista.com/statistics/570234/ number-of-single-parents-with-children-in-italy-by-number-of-children/, Last Accessed: 2020-03-28. 56. E. Carrà, M. Lanz, S. Tagliabue, Journal of Comparative Family Studies 45, 235 (2014). 57. Y. Xu, et al., The Journal of the American Medical Association 310, 948 (2013). 58. Z. Wang, et al., Circulation 137, 2344 (2018). 59. P. Modesti, et al., International Journal of Hypertension (2017). 60. Y. Tatsumi, T. Ohkubo, Hypertension Research 40, 795 (2017). 61. C. C.-. R. Team, Severe outcomes among patients with coronavirus disease 2019 (COVID- 19)—United States, February 12–March 16, 2020 (2020). https://www.cdc.gov/ mmwr/volumes/69/wr/mm6912e2.htm. 22
62. X. Peng, Science 333, 581 (2011). 63. New York Times, China tightens Wuhan lockdown in wartime battle with coron- avirus (2020). https://www.nytimes.com/2020/02/06/world/asia/ coronavirus-china-wuhan-quarantine.html. 64. J. Zhang, et al., medRxiv (2020). 65. Italian National Institute of Statistics, Median age in Lombardy (2019). https: //www4.istat.it/it/lombardia/dati?qt=gettable&dataset=DCIS_ INDDEMOG1&dim=21,0,0, Last Accessed: 2020-03-28. 66. C. W. Factbook, Field listing: median age (2020). https://www.cia.gov/ library/publications/the-world-factbook/fields/343.html, Last Accessed: 2020-03-28. 67. L. Tondo, The Guardian (2020). https://www.theguardian.com/world/ 2020/mar/18/italy-charges-more-than-40000-people-violating-lockdown-coro 68. Wikipedia, 2020 coronavirus pandemic in italy (2020). https://en.wikipedia. org/wiki/2020_coronavirus_pandemic_in_Italy, Last Accessed: 2020-03- 29. 69. Wikipedia, Timeline of the 2019–20 coronavirus pandemic in November 2019 – Jan- uary 2020 (2020). https://en.wikipedia.org/wiki/Timeline_of_the_ 2019-20_coronavirus_pandemic_in_November_2019_-_January_ 2020, Last Accessed: 2020-03-28. 70. F. Carinci, British Medical Journal (2020). 71. J. Miller, Bulletin of Mathematical Biology 74, 2125 (2012). 23
72. A. Remuzzi, G. Remuzzi, The Lancet (2020). 73. J. Lourenco, et al., medRxiv (2020). 74. T. I. C. C.-. R. Team, Report 12: The global impact of COVID-19 and strategies for mitigation and suppression (2020). https://www. imperial.ac.uk/mrc-global-infectious-disease-analysis/ news--wuhan-coronavirus/. 75. U.S. Bureau of Labor Statistics, American time use survey (2018). https://www.bls. gov/tus/. 76. N. Qualls, et al., CDC MMWR Recommendations and Reports 66, 1 (2017). 77. Oklahoma County Department of Health, Social distancing fact sheet (2019). https://www.occhd.org/application/files/3715/7013/9751/ Social_Distancing.pdf, Last Accessed: 2020-04-05. 78. F. Ahmed, N. Zviedrite, A. Uzicanin, BMC Public Health 18, 518 (2018). 79. P. Totterdell, Handbook of work stress, J. Barling, E. K. Kelloway, M. R. Frone, eds. (Sage publications, 2004), chap. 3. 80. L. DeRigne, P. Stoddard-Dare, L. Quinn, Health Affairs 35, 520 (2016). 81. M. W. Fong, et al., Emerging Infectious Diseases 26 (2020). 82. F. Aimone, Public Health Reports 125, 71 (2010). 83. L. Ecola, T. Light, RAND Corporation pp. 1–45 (2009). 24
84. NBC News, Walmart will limit customers and create one-way traffic in- side its stores (2020). https://www.nbcnews.com/news/us-news/ walmart-will-limit-customers-create-one-way-traffic-inside-its-n1176461 85. NBC Connecticut, New social distancing measures take effect at stores across Connecticut (2020). https://www.nbcconnecticut.com/news/local/ new-social-distancing-measures-enacted-at-stores-across-connecticut/ 2249937/. 86. A. Selyukh, Supermarkets add ’senior hours’ for vulnera- ble shoppers (2020). https://www.npr.org/sections/ coronavirus-live-updates/2020/03/19/818488098/ supermarkets-add-senior-hours-for-vulnerable-shoppers. 87. Centers for Disease Control and Prevention (2007). 88. WGNTV, This student created a network of ‘shopping angels’ to help the elderly get groceries during the COVID-19 pandemic (2020). https://wgntv.com/news/ this-student-created-a-network-of-shopping-angels-to-help-the-elderly-g 89. American Broadcasting Company, Santa Monica volunteers protect se- niors with new grocery delivery service (2020). https://abc7.com/ coronavirus-seniors-covid-19-groceries/6045815/. 90. K. Gardner, K. Lister, 2017 state of telecommuting in the U.S. employee workforce (2017). https://www.flexjobs.com/2017-State-of-Telecommuting-US/. 91. T. Noah, ‘it makes me very angry’: Coronavirus damage ripples across the workforce (2020). https://www.politico.com/news/2020/03/09/ america-workers-outbreak-uncertainty-124191. 25
92. Drew Desilver, As coronavirus spreads, which U.S. work- ers have paid sick leave – and which don’t? (2020). https://www.pewresearch.org/fact-tank/2020/03/12/ as-coronavirus-spreads-which-u-s-workers-have-paid-sick-leave-and-which 93. D. Card, Carnegie-Rochester Conference Series on Public Policy (Elsevier, 1990), vol. 33, pp. 137–168. 94. C. for Disease Control, Prevention, Interim pre-pandemic planning guidance: Commu- nity strategy for pandemic influenza mitigation in the united states— early, targeted, lay- ered use of nonpharmaceutical interventions (2007). https://www.cdc.gov/flu/ pandemic-resources/pdf/community_mitigation-sm.pdf. 95. Q. Li, et al., New England Journal of Medicine (2020). Acknowledgments Funding: This work was supported in part by the Army Research Office by grant MURI W911NF1810208 and in part by grant T32HD040128 from the Eunice Kennedy Shriver Na- tional Institute of Child Health and Human Development, National Institutes of Health. Killian was supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE1745303. Perrault and Jabbari were supported by the Harvard Center for Research on Computation and Society. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Author contributions: In accordance with the ICMJE authorship guidelines, BW, MC, JAK, HCO, AM, SJ, and AP acquired, analyzed, and interpreted data for the work. BW, MC, SJ, and AP also drafted the work. AND, MT, and MSM made substantial contributions to the concep- tion or design of the work and revised the work critically for important intellectual content. 26
Competing interests: All authors declare no competing interests. Data and materials availability: All code and data are available at https://github. com/bwilder0/COVID19-Demography. Supplementary Materials Materials and Methods Supplementary Text Figs. S1 to S7 Tables S1 to S3 References (41-95) 27
Supplementary Materials for Modeling the Impact of Demography on COVID-19 Dynamics in Hubei and Lombardy Bryan Wilder1∗ , Marie Charpignon2 , Jackson A. Killian1 , Han-Ching Ou1 , Aditya Mate1 , Shahin Jabbari1 , Andrew Perrault1 , Angel N. Desai3 , Milind Tambe1∗ , Maimuna S. Majumder4,5∗ 1 School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA 2 MIT Institute for Data, Systems, and Society, Cambridge, MA, USA 3 International Society for Infectious Diseases, Brookline, MA, USA 4 Department of Pediatrics, Harvard Medical School, Boston, MA, USA 5 Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA *Corresponding author. Email: bwilder@g.harvard.edu (B.W.); milind_tambe@harvard.edu (M.T.); Maimuna.Majumder@childrens.harvard.edu (M.S.M.) This PDF file includes: Materials and Methods Supplementary Text Figs. S1 to S7 Tables S1 to S3 1
Materials and Methods Model description We develop an agent-based model for COVID-19 spread which accounts for the distributions of age, household types, comorbidities, and contact between different age groups in a given population. The model follows a susceptible-exposed- infectious-removed (SEIR) template [8, 9]. Specifically, we simulate a population of n agents (or individuals), each with an age ai , a set of comorbidities ci , and a household (a set of other agents). We stratify age into ten-year intervals and incorporate hypertension and diabetes as comorbidities. These comorbidities are common worldwide [10] and have been associated with a higher risk of in-hospital death for COVID-19 patients [3]. However, our model can be expanded to include other comorbidities of interest in the future. The specific procedure we use to sample agents from the joint distribution of age, household structures, and comorbidities is described below. The simulation tracks two states for each individual: the infection state and the isolation state. The infection state is divided into {susceptible, exposed, infectious, removed}. Susceptible individuals are those who have never been contacted by an infectious individual. Exposed individuals are those who have had contact with an infectious individual, though not all exposed individuals become infectious. If an exposed individual contracts the disease, they proceed to the infectious state.1 Infectious is further subdivided into severity levels {asymptomatic, mild, severe, critical}. We interpret mild severity as symptomatic (but not requiring hospitalization), severe as requiring hospitalization, and critical as eligible for intensive care unit (ICU) care. The removed state is further subdivided into {recovered, deceased}. Individuals in all severity levels can transmit the disease, but those in the asymptomatic state do so at a rate α < 1 times that of symptomatic cases. The decision to incorporate reduced transmission for asymptomatic individuals is based on the fact that, though infection by asymptomatic individuals has been observed in case clusters and in examinations of serial intervals [42, 43, 44], available evidence suggests that individuals with no or limited symptoms are less infectious than those with severe symptoms [6]. Currently, our simulation incorporates two levels of infectiousness (before and after the onset of symptoms), but it can be adjusted as better information on how viral shedding increases with severity of illness becomes available. We acknowledge that our assumptions surrounding transmissibility and disease severity – as derived from existing literature – may serve as a limitation of our model, as many of these factors are evolving over time. Each individual has a separate isolation state {isolated, not isolated}. If isolated, the individual is unable to infect others. We assume that (1) asymptomatic individuals are never isolated, (2) mild individuals become isolated over a mean time of λisolate days (see Table 1) after the onset of symptoms, and (3) all severe and critical individuals are isolated. However, our simulation framework can easily accommodate different sets of assumptions about isolation (for example, preemptively isolating exposed individuals if they are known to have had contact with an infectious agent). The disease is transmitted over a contact structure, which is divided into in-household and out-of-household groups. Each agent has a household consisting of a set of other agents. Individuals infect members of their households at a higher rate than out-of-household agents. We model out-of-household transmission using country-specific estimated contact matrices [11]. These matrices state the mean number of daily contacts an individual of a particular age strata has with individuals from each of the other age strata. We assume demographics (including age and household distribution) in Hubei and Lombardy are well-approximated by country-level data. The model iterates over a series of discrete time steps, each representing a single day, from a starting time t0 to an end time T . There are two main components to each time step: disease progression and new infections. The progression component is modeled by drawing two random variables for each individual each time they change severity levels (e.g. on entering the mild state). The first random variable is Bernoulli and indicates whether the individual will recover or progress to the next severity level. The second variable represents the amount of time until progression to the next severity level. We use exponential distributions for almost all time-to-event distributions, a common choice in the absence of specific distributional information [45, 46]. The exception is the incubation time between asymptomatic and 2 mild states, where more specific information is available; here, we use a log-normal distribution (see µe→m and σe→m in Table 1) based on estimates by [47]. Table 1 summarizes all distributions and their parameters. In the new infections component, individuals in the susceptible state may enter the exposed state. Infected individuals infect each of their household members with probability ph at each time step. ph is calibrated so that the total probability of infecting a household member before either isolation or recovery matches the estimated secondary attack rate for household members of COVID-19 patients (i.e., the average fraction of household members infected) [48]. Infected individuals draw outside-of-household contacts from the general population using the country-specific contact matrix. For an infected individual of age group i, we sample wij ∼ Poisson(Mij ) contacts for each age group j, 1 Currently, our simulation implementation does not separately track individuals who are exposed but do not become infected, and instead groups them with the susceptible population. This is because we assume that, if exposed again, they will become infected with the same probability as an individual who has never been exposed. However, the implementation can be modified to support either differing probabilities of contracting the disease after first exposure or policies that treat exposed and susceptible individuals differently. 2
where M is the country-specific contact matrix. Poisson distributions are a standard choice for modeling contact distributions [11]. Then, we sample wij contacts of age j uniformly with replacement, and each contact is infected with the probability pinf , the probability of infection given contact. This probability is an unknown; our experiments test a range of values, and we report which values give results consistent with observations from Lombardy and Hubei. Sampling agents Our process for sampling agents follows three steps that successively sample households, individual agents within households, and comorbidities for each agent. Because the full joint distributions over all of these quantities are not known, we implement a sampling procedure that respects the marginal distributions of household structure and age, as well as the marginal distribution for the occurrence of comorbidities within each age group. First, we use information on the distribution of household structures to draw a type of household (e.g., single person, couple, nuclear family, or multigenerational family). Second, we sample the ages of the individual agents according to their role in the household (e.g., parent, child, or grandparent) combined with information about the age distribution of the population and the intergenerational interval. For China, we use household distributions from the 2010 Chinese census [49], intergenerational intervals from [50], and the age distribution provided by UN population statistics [26]. For Italy, we use demographic statistics from Statista online portal about the following: household structure distribution [51], single-person households [52], couples with children [53] and corresponding family size [54], and single parents with children [55]. Furthermore, we assume that children could stay within the family until the age of 30 and that couples without children were aged 30+, to account for societal patterns reported in familial studies which may have affected household distribution metrics [56]. Third, we sample comorbidities from the corresponding country- and age-specific distributions. For China, we use estimates on age-specific prevalence of diabetes [57] and hypertension [58]. For Italy, we use estimates from the Global Burden of Disease study on diabetes [10] and a recent study of age-stratified hypertension prevalence [59]. We ensure that diabetes and hypertension are appropriately correlated using a single global estimate for the probability of hypertension in individuals with diabetes [60]. Estimating disease progression from age and comorbidities Many of the parameters for this model are assigned values based on estimates in the literature, shown in Table 1. However, we currently lack a detailed understanding of the joint impact of age and comorbidities on disease progression and mortality. Currently, case fatality rates (CFRs) are available either by age or by individual comorbidity, but not for each specific combination of age and comorbidities. To obtain these estimates, we model the CFR with a logistic regression fit to CFRs from the Chinese Center for Disease Control and Prevention (China CDC) [19], discussed in the next section. This model yields pm→d (ai , ci ), the country-independent probability that an individual i of age ai and comborbidity status ci will die if infected with SARS-CoV-2. Corrections for country-specific differences in mortality are handled via the parameter dmult . The simulation also requires specific values for the probabilities of transitioning between the disease states mild, severe, critical, and death. However, there is currently insufficient information available to infer the probabilities of these individual transitions for each combination of age and comorbidity. We assume that while the absolute values of these probabilities may vary based on age and comorbidity, the ratios between them do not exhibit such strong dependency. In particular, we assume that there are coefficients γs→c (ai ) and γc→d such that ps→c (ai , ci ) = γs→c (ai )pm→s (ai , ci ) and pc→d (ai ) = γc→d pm→s (ai , ci ). We allow γs→c (ai ) to be age-specific while assuming that γc→d is age-homogeneous because of the information currently available to estimate them; namely, we estimate γs→c (ai ) based on the relative probabilities of hospitalization and ICU admission by age group in the US [61] and γc→d based on the probability of death for all critical patients in China [19]. Note that we assume both coefficients to be independent of the comorbodities ci . Then, we can solve for pm→s (ai , ci ) such that pm→s (ai , ci ) · γs→c (ai )pm→s (ai , ci ) · γc→d pm→s (ai , ci ) = pm→d (ai , ci ), and set ps→c (ai , ci ) and pc→d (ai , ci ) accordingly. Future work can relax the assumptions in this process as more information becomes available about how age and comorbidity impact the progression between disease states. Estimating mortality from age and comorbidities We require a model of pm→d (ai , ci ), however existing data sources only specify pm→d (ai ) and pm→d (ci ). To infer the joint distribution, we assume a linear (logistic) interaction between age bracket, diabetes status, and hypertension status. Specifically, we assume pm→d (ai , ci ) = σ βage (ai ) + βdiabetes 1 [diabetes ∈ ci ] + βhypertension 1 [hypertension ∈ ci ] , 3
You can also read