Modeling age-specific incidence of colon cancer via niche competition

Page created by Claude Moss
 
CONTINUE READING
Modeling age-specific incidence of colon cancer via niche
                                           competition
                                           Steffen Lange1,2* , Richard Mogwitz2 , Denis Hünniger1,2 , Anja Voß-Böhme1,2

                                           1 DataMedAssist, HTW Dresden, 01069 Dresden, Germany
                                           2 Faculty of Informatics/Mathematics, HTW Dresden - University of Applied Sciences, 01069 Dresden
arXiv:2109.06031v1 [q-bio.TO] 4 Sep 2021

                                           * steffen.lange@tu-dresden.de

                                           Abstract
                                           Cancer development is a multistep process often starting with a single cell in which a number of
                                           epigenetic and genetic alterations have accumulated thus transforming it into a tumor cell. The progeny
                                           of such a single benign tumor cell expands in the tissue and can at some point progress to malignant
                                           tumor cells until a detectable tumor is formed. The dynamics from the early phase of a single cell to a
                                           detectable tumor with billions of tumor cells are complex and still not fully resolved, not even for the
                                           well-known prototype of multistage carcinogenesis, the adenoma-adenocarcinoma sequence of colorectal
                                           cancer. Mathematical models of such carcinogenesis are frequently tested and calibrated based on
                                           reported age-specific incidence rates of cancer, but they usually require calibration of four or more
                                           parameters due to the wide range of processes these models aim to reflect. We present a cell-based
                                           model, which focuses on the competition between wild-type and tumor cells in colonic crypts, with
                                           which we are able reproduce epidemilogical incidence rates of colon cancer. Additionally, the fraction
                                           of cancerous tumors with precancerous lesions predicted by the model agrees with clinical estimates.
                                           The match between model and reported data suggests that the fate of tumor development is dominated
                                           by the early phase of tumor growth and progression long before a tumor becomes detectable. Due to
                                           the focus on the early phase of tumor development, the model has only a single fit parameter, the
                                           replacement rate of stem cells in the crypt. We find this rate to be consistent with recent experimental
                                           estimates.

                                           Introduction
                                           Cancer development is a multistep process [1,2] often originating from a single mutated cell [3]. Potential
                                           tumor progenitor cells in the tissue accumulate sequentially epigenetic and genetic alterations, which
                                           transform them into tumor cells [4]. Initially, these tumor cells can be benign [5, 6], meaning that they
                                           do not possess a proliferative fitness advantage, and consequently compete with the original wild-type
                                           cells within normal tissue homeostasis [7]. When the first tumor cell acquires a sufficient number of
                                           alterations and progresses to a malignant type, i.e., gains a considerable proliferative advantage to the
                                           original wild-type cells, a cancer develops [8]: The progeny of the malignant tumor cell spreads via
                                           clonal expansion until a sufficiently large cell population is reached to be clinically detectable [9]. A
                                           well-known prototype of such a multistage carcinogenesis is the adenoma-adenocarcinoma sequence of
                                           colorectal cancer [10, 11], whose intra-tumor heterogeneity suggests that this cancer particularly arises
                                           as a single expansion event [12].
                                               Besides this general framework of carcinogenesis, the exact processes by which a tumor develops
                                           in the early phase are not known as a tumor is usually only detected after it consists of billions
                                           of cells. Mathematical models have been extensively used to elucidate fundamental mechanisms
                                           of cancer development and progression on the basis of biological data. One frequently employed
                                           interface to link these dynamical models to real-world observations are age-specific cancer incidences
                                           from cancer registries. Starting with the multistage Armitage-Doll model [13–15], suggesting that

                                                                                                                                                 1/14
cancer generation is governed by a sequence of rate-limiting events, and multistage clonal expansion
model (MSCE) [15–21], which are based on the initiation-promotion-malignant conversion paradigm in
carcinogenesis, multitype branching process models [22–24], frailty models [25] as well as other stochastic
or regression models [26–30] have been applied to age-specific incidences of various types of cancer. This
includes colon [2,17,22–24,29] and colorectal cancer [16,18,22,30], pancreatic cancer [16,18,21,26], gastric
cancer [18, 30], esophageal adenocarcinomas (EAC) [18], oral squamous cell carcinomas (OSCCs) [20],
prostate cancer [29], gonadal germ cell cancer (most common form of testicular cancer) [28], lung
cancer [27], kidney cancer [26], thyroid cancer [19] as well as Hodgkin lymphoma (HL) [25]. Parameters
of these models are determined by fitting the models hazard function to the age-specific incidences.
A match of the fit and the data is considered to support the validity of the corresponding model.
Furthermore, the fitted parameters quantify kinetics of malignant progression or clonal expansion
before clinical detection, such as tumor growth rates, extinction probabilities, sojourn or dwell times as
well as stochastic transition probabilities between cell types. However, the listed models have at least
two [30] but usually four [17,18,20,27–30] or more [2,16,24,25] parameters and thus there may be issues
of identifiability of the parameters from the data for many of these models [21]. The amount of fit
parameters results from the wide range of mechanisms these models aim to reflect, often incorporating
both the early phase of tumor initiation and the early-to-late phase of clonal expansion. At the same
time the spatial structure of the tissue the tumor originates from is often neglected, although tissue
architecture is known to be relevant for tumor evolution [31].
    Our goal is to demonstrate that the competition between wild-type and tumor cells in niches in
the early phase of tumor initiation may be the dominant mechanism governing tumor development.
For this, we develop a cell-based stochastic model which represents the pretumor competition between
wild-type and tumor stem cells in colonic crypts and thus both focuses on the early-phase of tumor
initiation as well as explicitely incorporates the spatial structure of the tissue. The model reproduces
the age-specific incidences of colon cancer from the Surveillance Epidemiology and End Results (SEER)
database quantitatively, see Fig. 1. The model predictions match the epidemilogical data not only in
the regime of older ages but also down to young ages, where incidences are several orders of magnitude
smaller. This match supports the recently proposed notion that the fate of tumor development may be
determined in the early phase of tumor development long before a tumor becomes detectable [33]. Since
the model explicitely distinguishes between (i) adenocarcinoma and (ii) adenoma, which progressed
to adenocarcinoma, we additionally predict the fraction of incidences corresponding to either type.
These fractions are in agreement with the common clinical estimates that more than 95% of colon
adenocarcinomas arise from colonic polyps and that the fraction of benign tumor is over 99% [34]. In
particular, we quantify how the fraction of incidences resulting from progressed adenoma increases with
age, which supports the relevance of colorectal cancer screening at older age. We emphasize that all
parameters of the model correspond directly to known physiological parameters and are consequently
set by previously reported values. Only the stem cell replacement rate within a crypt, whose value
varies over several orders of magnitude in the literature, is used as a single fit parameter. The fitted
replacement rate is consistent with a recent experiment concluding a much slower stem cell turnover
than previously reported.

Materials and Methods
Cancer screening data
The Surveillance Epidemiology and End Results (SEER) research database comprises cancer incidences
and at-risk population data in the US [32]. We consider age-specific incidence rates of colon cancer as
reported in the most recent SEER dataset 2013-2017.
   For the age-specific incidence rates in the SEER report the number of diagnosed cancers, as a
function of age, is compiled and then divided by the total population at risk, as a function of age. Note
that in the SEER registry, only detected cancers will be recorded and thus the database may both
miss cancer incidences at a given age or assign a later age to an incidence. The latter may cause lower
rates at young ages and higher rates at older ages. For colon cancer in particular, it is reasonable to

                                                                                                      2/14
Colon cancer
                                                 Model prediction
                                    102          Tunneling Progression
                                                 Sequential Progression
age-specific incidence [10 5 y 1]

                                                 Parameter range
                                                 SEER 2013-2017
                                    101

                                    100                                        200

                                                                               100

                                    10   1                                        0
                                                                                      0                 40               80
                                             0   10      20         30        40      50           60        70         80        90
                                                                                age [y]
                              Fig 1. Niche competition in colonic crypts models age-specific incidence rates of colon
                              cancer. Age-specific incidence rates of colon cancer predicted by the model (green line) match with
                              epidemilogical data (SEER database [32], red dots, error bars correspond to 95% confidence interval
                              assuming Poisson distribution and are often smaller than symbol size) over several orders of magnitude
                              (see inset for linear y-scale). For the model prediction the parameters niche size N = 7, number of
                              crypts K = 1.5 · 107 , mutation probabilities u, v = 4.4 · 10−6 , and probability of adenoma progressing
                              to adenocarcinoma γ = 5% are taken from the literature, see expected values in Tab. 1, while the time
                              scale set by the effective replacement rate λ is calibrated based on the epidemilogical data. The
                              effective replacement rate λ = 0.12 ± 0.004 y−1 crypt−1 is consistent with recently reported
                              replacement rates, i.e., smaller but on the same order of magnitude, see Tab. 1. Decomposition of
                              incidence rates predicted by the model into incidences with and without precancerous lesions
                              (sequential and tunneling progression, orange dashed and dotted blue line respectively) confirms
                              different origins of cancer at young and older ages. Sensitivity on parameters is illustrated by variation
                              of the parameters within the reported ranges (gray lines), see Tab. 1.

                               assume that incidence rates from more current reports are closer to the real rates due to improved
                               early diagnostics and increasingly used cancer screening. Raw data from cancer registries is sometimes
                               preprocessed and filterted including adjustments for birth cohort and time period effects, for instance
                               via age-period-cohort (APC) epidemiological models [15, 20, 24], to refine the reported age-specific
                               cancer incidences. In particular for colon cancer, such refinements are rarely invoked [2, 22, 23, 29].
                               Colon cancer is overall a relatively minor cause of death and has a low lifetime incidence, thus it is
                               reasonable to directly use the age-specific incidences reported in SEER.

                                                                                                                                   3/14
Pretumor progression model based on niche competition
In order to test the hypothesis, that the fate of tumor development is determined by niche competition in
the early phase of tumor development, we predict the incidence rates of colon cancer by a mathematical
model based on the tumor initiation of the adenoma-adenocarcinoma sequence. The tumor-originating
cells of colon cancer are known to be multipotential stem cells located in a niche at the base of
intestinal crypts, also called intestinal glands, which are finger-like invaginations on the surface
of the intestine [11, 12, 35]. On average such a stem cell divides asymmetrically, resulting in one
daughter cell which remains as stem cell in the niche and another daughter cell, which is committed
to differentiation and leaves the crypt, such that the number of stem cells in the niche is always
roughly conserved. However, the progeny of one stem cell can effectively replace the progeny of another
whenever symmetric divisions occur, that is one stem cell divides into two stem cells while another
divides into two differentiated cells, which leave the crpyt. During division stem cells can mutate and
thus gradually accumulate alterations which turn them into tumor cells. Tumor cells compete with
the original stem cells via replacement in the niche and thus may at some point dominate the entire
niche (niche succession or clonal conversion). The so fixated tumor cells may then grow across the
intestinal epithelium by crypt fission and clonal expansion of tumor cells from the converted crypt
and consequently a tumor develops. If tumor cells are benign at that point, a premalignant lesion
(adenoma) develops, which may advance to cancer (adenocarcinoma) given a tumor cell progresses to a
malignant type (sequential progression). If tumor cells are already malignant during fixation in the
niche, an adenocarcinoma grows directly (tunneling progression).
    We emulate this multistep process using a cell-based Moran model representing the competition
between wild-type stem cells and tumor cells in the niche [33, 36], see Fig. 2 for illustration and SI
Text A for details: We consider a fixed number N of cells in the niche, all of which are initially
wild-type stem cells. During proliferation a wild-type cell can mutate with probability u to a benign
tumor cell, which in turn can mutate with probability v to a malignant tumor cell. Wild-type cells
and benign tumor cells compete by replacing each other in the niche with rate λ. Eventually, via this
competition, existing benign tumor cells either go extinct or they replace every wild-type cell in the
niche. In the latter case, the tumor cells are considered to spread and develop a clinically detectable
adenoma. From the Moran model corresponding to these dynamics, we obtain the adenoma probability
Pa (t) that an adenoma has developed from the crypt until age t, see Eq. (S5). Alternatively, during
competition in the niche, any benign tumor cell can acquire malignancy. Due to the proliferative fitness
advantage thus acquired, a malignant tumor cell rapidly dominates the niche and consequently an
adenocarcinoma develops directly. From the Moran model, we obtain the carcinoma probability Pc (t)
that an adenocarcinoma has developed from the crypt until age t, see Eq. (S6).
    Thus, there are two possible origins of colon cancer in the model: (i) Firstly, tunneling progression,
which means an adenocarcinoma develops directly from a single niche with probability Pc (t). (ii)
Secondly, an adenoma develops in a single niche with probability Pa (t) and progresses with probability
γ to an adenocarcinoma during clonal expansion. Therefore, the probability that cancer has developed
from the crypt until age t via this sequential progression is γPa (t). Consequently, the combined
probability that an adenocarcinoma develops from a single crypt until age t, either with benign
precursor stage or directly, is Pc (t) + γPa (t). Inversely, the probability that a crypt has not given rise
to a cancerous tumor until age t is 1 − Pc (t) − γPa (t). Taking into account that there exist K crypts
in the colon and a single one suffices as origin of a tumor, the probability S(t) that a human has not
contracted colon cancer until age t is

                             S(t) = (1 − Pc (t) − γPa (t))K ,      S(0) = 1 .                           (1)

From this survival function S(t) the models age-specific incidence rate R(t), or hazard function, is
computed
                                       d
                                         S(t) = −R(t)S(t)
                                      dt                                                        (2)
                                                  d
                                         R(t) = − ln S(t)
                                                  dt

                                                                                                      4/14
cellular scale                                              tissue scale

                                                                          ...              adenom

                                                                             γ

                                                                          ...              carcinom

                        u          v               wild­type cell               sequential progression
      mutations:
                                                   benign tumor cell            tunneling progression
      fitness
      advantage: none       none       large       malignant tumor cell         crypt

Fig 2. Illustration of the dynamics of the niche-competition-based pretumor progression model.
Wild-type cells within the niche of a colonic crypt can progress to benign tumor cells, which can
further progress to malignant tumor cells, during proliferation with mutation probabilities u, v. While
wild-type and benign tumor cells neutrally compete by replacing each other in the niche with rate λ,
malignant tumor cells rapidly dominate the niche. We assume that once no more wild-type cells are
present in the niche (fixation or clonal inversion) tumor cells establish within the tissue via clonal
expansion. Consequently, there are two possible progression pathways: (i) The niche consists of
malignant tumor cells at fixation and an adenocarcinoma grows from the niche (tunneling progression).
(ii) The niche consists of benign tumor cells at fixation and the resulting tumor progresses with
probability γ at some point during growth to an adenocarcinoma (sequential progression, precancerous
lesion).

The six parameters N , K, u, v, λ and γ of the model are all directly measurable, independently of the
emerging age-specific incidence rates, and their values have already been determined or estimated in
previous experiments, see Tab. 1. Note that a simpler version of the model has already been used to
estimate the competition range for several types of solid tumors from the fraction of observed benign
tumors [33] and to evaluate the spontaneous tumor regression in pilocytic astrocytoma [36].
    The model employs some reasonable simplifications: While colon cancer increases malignancy
gradually in seven steps [37], we only regard the last benign alteration and the first malignant alteration.
In particular for colon cancer, more than two steps or more than one type of genomic instability
are not expected to enhance the agreement between a progression model and the epidemiological
data [2]. Furthermore, we do not assume a particular spatial arrangement of the stem cells in the niche,
but rather use two limiting cases of either space-free (all-to-all competition) or one-dimensional cell
arrangement (competition only with two neighbors). Since the models probabilities Pa (t) and Pc (t)
depend monotonously on the number of cells which can compete with each other [38], the incidence
rates for any cell arrangement should lie in between these two limiting cases. However, for the small
number N of stem cells in a colonic crypt these two limiting cases yield almost matching results.
Finally, the model neglects the growth kinetics after tumor cells fixate in a niche. The details of
this growth may be very complex and the length of the adenoma-adenocarcinoma interval depends
on size, morphology and pathological type of the adenoma [39]. However, these kinetics have been
quantified by models with clonal expansion and the growth time between fixation of tumor cells and a
detectable, macroscopic tumor is estimated to be much smaller than the time until fixation [18]. Thus,

                                                                                                         5/14
Parameter                                                        previously reported range
 niche size                          N       7                     5 . . . 7 . . . 15 [40, 41]
 number of crypts                    K      10−6                   1 . . . 1.5 . . . 2 [42, 43]
 mutation probabilities             u, v     10                    1.75 . . . 4.4 . . . 7.13 [40]
 fraction of adenomas progress-      γ      [%]                     1.5 . . . 5 . . . 9.4 [44, 45]
 ing to carcinomas
                                                y−1 crypt−1
                                                             
 stem cell replacement rate          λ                            0.65 . . . 1.3 . . . 2.0 [40, 46];
                                                                      116 − 204 [41, 47]

Table 1. Table of previously reported ranges of the parameters used in the niche model. The
expected values (N = 7, K = 1.5 · 107 , u = v = 4.4 · 10−6 , γ = 5%) are used for the model prediction,
see green line in Fig. 1. The upper and lower boundaries are used for the parameter variation, see gray
lines in Fig. 1, by which the effect of the parameter range on the prediction of the model is estimated.

we assume that incidence rates are dominated by the dynamics until fixation. We take the growth time
implicitely into account by using the replacement rate λ in the model as an effective parameter, which
is smaller than the actual replacement rate to reflect the additional time between fixation and diagnosis.
This smaller effective replacement rate also allows to compensate for the simplified competition after
occurrene of a single malignant tumor cell within the niche. This is again reasonable due to the small
number N of cells a malignant cell has to outcompete in a colonic crypt. Finally, note that the model,
while conceptually very different, produces age-specific incidence rates with features typically known
from multistage clonal expansion models [16, 17, 21], that is incidence rates exhibit a power law at
young age and transition with increasing age to an exponential increase, followed by a linear increase
until asymptotically becoming constant, see SI Text A for details.

Results
We apply our pretumor progression model to age-specific incidence rates of colon cancer, see Fig. 1. We
set the models parameters as reported for the colon in the literature, see expected values in Tab. 1, and
calibrated the effective replacement rate λ by the epidemiological data, see SI Text B. The resulting
prediction of the model matches the age-specific incidence rate of the most recent SEER data not only
in the regime of older ages, see rates on linear scale in inset of Fig. 1, but also down to young ages,
where incidences are more than two orders of magnitude smaller, see rates on logarithmic scale in the
main panel of Fig. 1.
     For the effective replacement rate λ, we obtain λ = 0.12 ± 0.004 y−1 crypt−1 . This value is smaller
than the replacement rates reported in the literature, see Tab. 1, but on the same order of magnitude
as the most recent measurement of this physiological parameter [40]. This is in agreement with our
theoretical expectation, as discussed above, that the effective replacement rate λ, which reflects both
replacement in the niche as well as the growth time of the tumor, has to be smaller than the actual
replacement rate, as considered in the literature, due to the neglected timescale between fixation in
the crypt and clinical detection. Note that the replacement rate reported in Ref. [40] is two orders of
magnitude smaller than the previously estimated values, which estimate it to be comparable to the cell
division rate [41, 47].
     Furthermore, we estimate from the model the life-time fraction of adenocarcinoma, which arise from
colonic crypts, as γPa (T )/(Pc (T ) + γPa (T )) = 95.22% (T = 85 y) and the life-time fraction of benign
tumor as Pa (T )/(Pc (T ) + Pa (T )) = 99.75% in agreement with the corresponding clinical estimates of
> 95% and 99% [34].
     In the main panel of Fig. 1, the incidence rates from the model are additionally decomposed into
the contribution from (i) tunneling progression without precancerous lesions −d/dt ln(1 − Pc (t))K and
(ii) sequential progression −d/dt ln(1 − γPa (t))K . The incidences at younger age are dominated by
the former type, while they result almost solely from the latter one for ages beyond 40. Furthermore,
the incidence rates from the tunneling progression exhibit a linear slope ∼ t while the sequential

                                                                                                       6/14
progression leads to a much steeper course, which resembles a Weibull distribution. These features of
the models dynamics are consistent with the fact that early onset gastrointestinal cancers commonly
arise without precancerous lesions while adenomatous lesions are more common in older patients and
that carcinogenesis accelerates after the age of 40 [30, 48, 49]. Note that the probability γPa (t) should
be understood as an upper limit for sequential progression from a single crypt as some tumors may turn
malignant during tumor growth or go extinct before the adenoma becomes detectable. Consequently,
the probability Pc (t) is a lower limit of the fraction of tunneling progression. This implies that the age
at which the contributions of both progressions match Pc (t) = γPa (t) is underestimated by the model
in Fig. 1.
    The sensitivity of the model prediction on the parameters taken from the literature is additionally
checked. The sensitivity is illustrated in Fig. 1 by additional grey lines showing the forecast of the model
for variations of the parameters. These variations cover for each parameter the lower and upper limit as
well as the expected value as reported in the literature, see Tab. 1. The ensemble of model predictions
covers a range of effective replacement rates 0.16 ± 0.1 y−1 crypt−1 as well as life-time fractions of
adenocarcinoma, which arise from colonic crypts 87 ± 17% and life-time fraction of benign tumor
99±2%. Within their range all parameters only modify the models predictions slightly with exception of
the niche size N . Choosing any of the parameters K, v, u, or γ and fixing it either to its lower or upper
limit while varying all other parameters, leads to an ensemble of models predictions whose average
effective replacement rate and life-time fraction of benign tumor is virtually the same in either case. In
contrast, choosing either N = 5, 7, or 10, the variation of all other parameters leads to an ensemble of
model predictions with very different effective replacement rates, i.e. λN =5 = 0.06 ± 0.02 y−1 crypt−1 ,
λN =7 = 0.14 ± 0.04 y−1 crypt−1 , or λN =10 = 0.28 ± 0.09 y−1 crypt−1 , respectively. The fact that the
niche size N has such an impact on the predictions but not the number of crypts K, which also affects
the total number of stem cells, highlights the crucial role of the competition in the niche, which is only
affected by N but not K. Also the standard deviation of the fitted effective replacement rate is on
average smaller for N = 5 or 7 than for N = 10, which means the model fits better for the average or
lower limit of the niche size, supporting the notion that the size of the niche is smaller than previously
estimated [40].
    The model is additionally applied to rectal cancer, whose tumor-originating cells are also compart-
mentalized into niches, see Fig. 3. For rectal cancer, it is reasonable to assume the same parameters
as for the colon, see Tab. 1, except for the number of crypts K. Since the rectum is on average one
order of magnitude shorter than the colon, the number of niches K is assumed to be ten times smaller
K = 106 − 2 · 106 . The resulting prediction of the model agrees well with the epidemilogical incidence
rates, see Fig. 3, and the obtained effective replacement rate λ = 0.16 ± 0.02 y−1 crypt−1 is consistent
both with the range reported for the colon, see Tab. 1, as well as the value obtained from the model
for the colon, see Fig. 1.

Discussion
We employed a Moran model representing the pretumor competition between wild-type and tumor stem
cells in colonic crypts to quantitatively reproduce age-specific incidences of colon cancer. Additionally,
the model predicts the fraction of incidences corresponding to either (i) adenocarcinoma or (ii) benign
polyps, which progressed to adenocarcinoma, and we find these fractions in agreement with common
clinical estimates. Furthermore, the age-dependency of these fractions is consistent with the occurrence
of precancerous lesions in different age regimes. In particular, this highlights the importance of cancer
screening after the age of 40. Note that all growth and clonal expansion processes after the fixation of
tumor cells in the tumor originating niche are not explicitely incorporated into the model, but only
implicitely contained as a time delay between fixation and detection via an effective replacement rate.
Thus, the agreement between model and epidemiological data supports the notion that the fate of
tumor development may be primarily determined in the early phase of tumor development.
    In contrast to most previous pretumor models, which usually introduce four or more parameters
that are calibrated using the epidemiological age-specific cancer incidences, both for colon [2, 17, 24, 29]

                                                                                                      7/14
Rectal cancer
                                                 Model prediction
                                                 Tunneling Progression
                                                 Sequential Progression
age-specific incidence [10 5 y 1]

                                    101          Parameter range
                                                 SEER 2013-2017

                                    100                                          60
                                                                                 40
                                                                                 20
                                    10   1
                                                                                   0
                                                                                       0                 40               80
                                             0   10        20        30        40      50           60        70         80        90
                                                                                 age [y]
                               Fig 3. Niche competition models age-specific incidence rates of rectal cancer. Analogous
                               to colon cancer in Fig. 1, age-specific incidence rates of rectal cancer predicted by the model agree
                               with epidemilogical data (SEER database [32]). Same parameters as for the the colon are assumed, see
                               Tab. 1, except for the number of crypts K which is assumed to be ten times smaller K = 106 − 2 · 106
                               reflecting that the rectum has on average a tenth of the length of the colon. The effective replacement
                               rate λ = 0.16 ± 0.02 y−1 crypt−1 is consistent with the replacement rate of the colon.

                                or other types of cancer [16, 18, 20, 25, 27, 28, 30], our model is solely based on directly measurable
                                parameters, which are already known from clinical and biological studies on the colon independent
                                of the epidemiological incidence rates. Consequently, the model only requires a single fit parameter,
                                the effective replacement rate, which both sets the time scale of the niche-competition as well as
                                compensates for the neglected time-span between fixation in the niche and clinical detection. We find
                                this replacement rate in agreement with recent experiments on colonic crypts [40].
                                    Note that for colorectal cancer, two-step models have been considered before to capture the dynamics
                                of adenomatous and malignant tumor cells [2, 22–24]. Furthermore, the distribution of adenoma and
                                adenomacarcinoma sizes at different ages has been inferred from a model [24], clonal expansion dynamics
                                of colorectal cancer has been derived from intra-tumor heterogeneity [46] and even the competition
                                in colonic crypts has been discussed [22, 23]. In contrast to these models, we explicitely incorporate
                                competition of benign tumor cells with wild-type cells taking into account their spatial structure, while
                                neglecting other processes considered in these models. The match of the prediction of our model and
                                the epidemiological incidence rates emphasizes the dominant role that competition in niches plays
                                in tumor development. This dominant role is supported by previous investigations of colorectal and
                                gastric cancer using a multistage clonal expansion model, which suggest that the form of the age-specific
                                incidence rates is determined by the dynamics until the fixation of tumor cells, while the growth
                                dynamics after fixation only cause a slight time shift [18].

                                                                                                                                   8/14
While the model is inspired by colonic crypts, it is potentially relevant for tumor development in a
wide range of tissues. Firstly, the model can be applied to cancer types with similar spatial structure
of the tumor-originating cells like rectal or gastric cancer [50]. Secondly, the model may also apply to
types of cancer whose tumor-originating cells are not explicitely compartmentalized into niches. The
Moran model within the niche has already been used to estimate an upper limit for the number N
of cells competing with each other from the fraction of clinically observed benign tumors for several
types of cancer [33]. This so called competition range N is suprisingly small for a wide range of tumor
types, less than 3000 cells, even for tumors in tissue without explicit stem cell compartments like the
crypts. Note that for a large competition range N the competition between the malignant tumor cells
and the other cells in the niche may not be negligible anymore, but increase the time scale until clonal
conversion in the niche. Thus, the model has to be extented in the future by this competition to be
applicable to types of cancer with large competition ranges, such as hepatocellular carcinoma [51] or
glioblastoma [52].
    Application of the model of niche competition to cancer in tissues other than the colon is reasonable
for two more reasons: Firstly, most other solid tumors have an age-specific incidence qualitatively very
similar to colon carcinoma [29]. Secondly, incidence rates resulting from the model share typical features
with incidence rates from multistage clonal expansion models [16, 17, 21], which already achieve good
matches with epidemiological data for several types of cancer [15–21]. Note that for most tissues, much
less is known about tumor initiation and development until detection than for the colon. Applying the
model to such types of cancer may give additional insight into the hardly observable dynamics of early
cancer development.

Acknowledgments
We acknowledge support by the EU, the European Social Fund (ESF) and by tax funds on the basis of
the budget passed by the Saxon state parliament.

                                                                                                     9/14
Supporting Information (SI)
A     Moran Model of niche competition
We use a three-type Moran model with mutation to quantify the stem cell dynamics in the niche of a
single colonic crypt as described in the main manuscript. The niche consists of N cells, each of which
may be wild-type, benign or malignant. A cell in the niche is replaced by the offspring of another
cell with replacement rate λ. At replacement the offspring can additionally acquire mutations, i.e. a
wild-type cells mutates with probability u, 0 ≤ u < 1 into a benign tumor cell and a benign tumor cell
mutates with probability v, 0 ≤ v < 1 into malignant tumor cell. Formally, the model is a Markov
process (Xt )t≥0 on the state space A = {0, 1, 2, . . . , N, E}. Here, the states 0, . . . , N correspond to
the number of benign tumor cells in the niche, while the remaining cells in the niche are wild-type
stem cells, and the state E represents the presence of a single malignant tumor cell in the niche. The
dynamics is determined by a rate matrix Q := (q(k, l))k,l∈A ∈ RA×A . For the space-free limit, i.e.,
when every cell is able to replace any other cell in the niche, the rate matrix Qsf is given by
           
                             λ k(N − k)
             q(k, k − 1) =                (1 − u)                                ,1 ≤ k ≤ N − 1
           
           
                            N    N −1
           
           
           
                                                                            
                             λ (N − k)(N − k − 1)            k(N − k)
           
           
           
           
            q(k, k + 1) =                              u+            (1 − v)    ,0 ≤ k ≤ N − 1
                            N             N − 1               N −1
           
           
           
           
           
                               N +1
    Qsf =                      X                                                                      . (S1)
           
            q(k, k)     =   −      q(k, i)                                           ,0 ≤ k ≤ E
           
           
           
                              i=0
                             λ
           
           
                                                                                 ,1 ≤ k ≤ N − 1
           
           q(k, E)
                        = kv
           
           
           
                           N
             q(k, l)     =0                                                                    , else
           

The factors containing N and k result from combinatorics [33,36,53]. The other geometric limiting case
is the 1D model in which cells are considered to be ordered on a ring, i.e. each cell can only replace
one of its two neighbors. Analogously, the rate matrix Q1D for this case is given by [33, 53]
                                             λ
                           
                           
                            q(0, 1)      = Nu                                   ,
                           
                           
                           
                                            N
                           
                                            λ
                           
                            q(k, E)      = kv                  ,1 ≤ k ≤ N − 1
                                             N
                           
                           
                           
                           
                           
                                            λ
                           q(k, k + 1) = (1 − v) + O(u) , 1 ≤ k ≤ N − 1
                           
                           
                                             N
                     Q1D =                                                         ,              (S2)
                                             λ
                             q(k, k  − 1) =     (1 − u)         , 1 ≤ k ≤ N   −  1
                           
                           
                           
                           
                           
                                            N
                                                N +1
                           
                           
                           
                                                X
                           q(k, k)
                           
                           
                                         =  −       q(k, i)         ,0 ≤ k ≤ E
                           
                                                i=0
                           
                           
                             q(k, l)      =0                                , else
                           

where, due to u  1, we neglect the term O(u), which describes an additional benign mutation of
a wild type cell. This reasonable approximation implies that benign mutants only occur on a single
connected interval of the ring of cells. A real crypt will have some geometric restrictions that lie in
between these to limiting cases Q1D and Qsf .
     For either rate matrix Q, the dynamics of the niche, i.e. the distribution p = (p0 , p1 , . . . , pN −1 , pN , pE ) ∈
[0, 1]A on the state space A at age t which evolved from an initial state p(0) at age 0, is given by
                                                  d
                                                     p(t) = p(t)Q                                                (S3)
                                                  dt
which yields the solution
                                               p(t) = p(0) exp(Qt),                                              (S4)

                                                                                                               10/14
for t ≥ 0, with the transition matrix exp(Qt) given by a matrixexponential function. The initial state
is set to p(0) := (1, 0, . . . , 0), i.e. the niche starts at birth with only wild-type cells in the crypt. The
probability Pa (t) that the complete niche consists of benign tumor cells until age t is
                                          Pa (t) = (p(0) exp(Qt))N                                        (S5)
and the probability Pc (t) that the first malignant cells emerges until age t is
                                         Pc (t) = (p(0) exp(Qt))E .                                       (S6)
    We further list some analytic and empiric properties of the model, which demonstrate that the
incidence rates generated by the model display features typical for multistage clonal expansion models,
whose incidence rates transition (from young to old age) from power-law to exponential to linear
to constant asymptote [16, 17, 21]: Previously, an analytic expression for the asymptotic absorbtion
probabilities lim Pa (t) and lim Pc (t) into state N and state E, respectively, has been derived [33, 36].
              t→∞             t→∞
These asymptotic probabilities are predominantly determined by the relation of the niche size√N and
the√probability v to progress from a benign tumor cell to a malignant one. A small niche N √v  1
(N √3
      v  1 for 1D model) is primarily absorbed in the benign state, whereas a large niche N v  1
(N 3 v  1 for 1D model) is primarily absorbed in the malignant state E, while intermediate niche
sizes lead to finite probabilities for both states. This property has already been exploited to estimate
upper limits for the effective niche sizes of several types of cancer from their recored benign tumor
fraction [33]. Furthermore, the probability S̃(t) = (1 − Pc (t) − Pa (t)) to not be in either state N or E
until time t decreases exponentially S̃(t  1) ∼ exp(αt) for sufficiently large times t, where the rate
α < 0 is the largest eigenvalue of the rate matrix that results from limiting the rate matrix Q only to
the non-absorbing states {0, 1, 2, . . . , N − 1}. Since the probability S̃(t) is qualitatively very similar
to the survival probability S(t) in Eq. (1), one may infer that the age-specific incidence rates R(t)
generated by the model according to Eq. (2) should become constant asymptotically R(t  1) ≈ |α|K.
Moreover, the probability of tunneling progression Pc (t), whose contribution is maximal at young ages,
follows a power law Pc (t) ∼ t. This is in accordance with the argument of Armitage and Doll, that in
a pure multistage model the incidence rates follow a power law R(t) ∼ tβ−1 , with the number β of
mutations necessary for a cell to become malignant [13, 14], which is β = 2 in our model. In contrast,
we observe numerically that the probability of sequential progression      Pa (t) matches the cumulative
distribution function of a Weibull distribution Pa (t) = 1 − exp −atb , which implies for barely changing
survival probabilities, see Eq. (2),
                                                        d
                                         d                 S(t)                d
                              R(t) = −      ln S(t) = − dt          ≈      −      S(t)                    (S7)
                                         dt              S(t)     S(t)≈1       dt
that the contribution of the sequential progression to the incidence rates follows a Weibull distribution.
Note that it is common for models of cancer development that the models incidence rates (or hazard
function) effectively represent a Weibull distribution [22, 25–29].

B    Fit of replacement rate
We determine the least known parameter, the replacement rate λ, by fitting the model to the epidemio-
logical age-specific incidence rates RSEER (t). First, we compute for given set of parameters N, u, v, K,
and γ from the literature the survival probability S(t̃) of the model in dimensionless time t̃ = t/λ, using
Eq. (1), Eq. (S5), and Eq. (S6) up until a sufficient long time t̃max such that the probability 1 − S(t̃max )
is larger than the lifetime risk of colon cancer (≥ 4.8% [32]) to ensure that the corresponding incidence
rates R(t̃), see Eq. (2), cover the range of ages of the epidemiological data. Since the epidemiological
rates RSEER (t) are monotonously increasing with age t, we can identify each data point (ti , RSEER (ti ))
to a simulated point (t̃, R(t̃)) with RSEER (ti ) = R(t̃). Thus, each data point i gives an estimate of the
replacementp rate λi = t̃/ti and we fit the replacement rate as their mean λ =< λi >. The standard
deviation < (λi − < λi >)2 > of the replacement rates is used as a measure of the quality of a single
fit. For the variation of the parameters N, K, u, v, γ the effective replacement rate is reported as average
and standard deviation of the ensemble of fits.

                                                                                                        11/14
References
  1. Vogelstein B, Kinzler KW. The multistep nature of cancer. Trends in Genetics. 1993;9(4):138–141.
     doi:10.1016/0168-9525(93)90209-Z.
  2. Little MP, Vineis P, Li G. A stochastic carcinogenesis model incorporating multiple types
     of genomic instability fitted to colon cancer data. J Theor Biol. 2008;254(2):229–238.
     doi:10.1016/j.jtbi.2008.05.027.
  3. Fearon ER, Hamilton, Vogelstein B. Clonal analysis of human colorectal tumors. Science.
     1987;238(4824):193–197. doi:10.1126/science.2889267.
  4. Reya T, Morrison SJ, Clarke MF, Weissman IL. Stem cells, cancer, and cancer stem cells. Nature.
     2001;414(6859):105–111. doi:10.1038/35102167.
  5. Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor
     evolution across cancer types. Nature Genetics. 2016;48(3):238–244. doi:10.1038/ng.3489.
  6. Wu CI, Wang HY, Ling S, Lu X. The Ecology and Evolution of Cancer: The Ultra-
     Microevolutionary Process. Annu Rev Genet. 2016;50(1):347–369. doi:10.1146/annurev-genet-
     112414-054842.
  7. Amoyel M, Bach EA. Cell competition: how to eliminate your neighbours. Development.
     2014;141(5):988–1000. doi:10.1242/dev.079129.
  8. Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J. Tumor Cells and the
     Onset of Cancer. Molecular Cell Biology 4th edition. 2000;.
  9. Kuang Y, Nagy JD, Eikenberry SE. Introduction to Mathematical Oncology. CRC Press; 2016.
 10. Fox JG, Wang TC. Inflammation, atrophy, and gastric cancer. J Clin Invest. 2007;117(1):60–69.
     doi:10.1172/JCI30111.

 11. Zeuner A, Todaro M, Stassi G, De Maria R. Colorectal Cancer Stem Cells: From the Crypt to
     the Clinic. Cell Stem Cell. 2014;15(6):692–705. doi:10.1016/j.stem.2014.11.012.
 12. Sottoriva A, Kang H, Ma Z, Graham TA, Salomon MP, Zhao J, et al. A Big Bang model of
     human colorectal tumor growth. Nature Genetics. 2015;47(3):209–216. doi:10.1038/ng.3214.
 13. Armitage P, Doll R. The Age Distribution of Cancer and a Multi-stage Theory of Carcinogenesis.
     Br J Cancer. 1954;8(1):1–12. doi:10.1038/bjc.1954.1.
 14. Armitage P, Doll R. The age distribution of cancer and a multi-stage theory of carcinogenesis.
     Br J Cancer. 2004;91(12):1983–1989. doi:10.1038/sj.bjc.6602297.
 15. Moolgavkar SH, Meza R, Turim J. Pleural and peritoneal mesotheliomas in SEER: age effects and
     temporal trends, 1973–2005. Cancer Causes Control. 2009;20(6):935–944. doi:10.1007/s10552-
     009-9328-9.
 16. Meza R, Jeon J, Moolgavkar SH, Luebeck EG. Age-specific incidence of cancer: Phases,
     transitions, and biological implications. Proc Natl Acad Sci USA. 2008;105(42):16284–16289.
     doi:10.1073/pnas.0801151105.

 17. Meza R, Jeon J, Renehan AG, Luebeck EG. Colorectal Cancer Incidence Trends in the United
     States and United Kingdom: Evidence of Right- to Left-Sided Biological Gradients with Implica-
     tions for Screening. Cancer Res. 2010;70(13):5419–5429. doi:10.1158/0008-5472.CAN-09-4417.
 18. Luebeck EG, Curtius K, Jeon J, Hazelton WD. Impact of Tumor Progression on Cancer Incidence
     Curves. Cancer Res. 2013;73(3):1086–1096. doi:10.1158/0008-5472.CAN-12-2198.

                                                                                              12/14
19. Meza R, Chang JT. Multistage carcinogenesis and the incidence of thyroid cancer in the US by sex,
    race, stage and histology. BMC Public Health. 2015;15(1):789. doi:10.1186/s12889-015-2108-4.
20. Brouwer AF, Eisenberg MC, Meza R. Age Effects and Temporal Trends in HPV-Related
    and HPV-Unrelated Oral Cancer in the United States: A Multistage Carcinogenesis Modeling
    Analysis. PLoS ONE. 2016;11(3):e0151098. doi:10.1371/journal.pone.0151098.
21. Brouwer AF, Meza R, Eisenberg MC. Parameter estimation for multistage clonal expansion
    models from cancer incidence data: A practical identifiability analysis. PLoS Comput Biol.
    2017;13(3):e1005431. doi:10.1371/journal.pcbi.1005431.
22. Calabrese P, Tavaré S, Shibata D. Pretumor Progression: Clonal Evolution of Human Stem Cell
    Populations. Am J Pathol. 2004;164(4):1337–1346. doi:10.1016/S0002-9440(10)63220-8.
23. Kim KM, Calabrese P, Tavaré S, Shibata D. Enhanced Stem Cell Survival in Familial Adenoma-
    tous Polyposis. Am J Pathol. 2004;164(4):1369–1377. doi:10.1016/S0002-9440(10)63223-3.
24. Lang BM, Kuipers J, Misselwitz B, Beerenwinkel N. Predicting colorectal cancer risk from ade-
    noma detection via a two-type branching process model. PLoS Comput Biol. 2020;16(2):e1007552.
    doi:10.1371/journal.pcbi.1007552.
25. Grotmol T, Bray F, Holte H, Haugen M, Kunz L, Tretli S, et al. Frailty Modeling of the
    Bimodal Age–Incidence of Hodgkin Lymphoma in the Nordic Countries. Cancer Epidem Biomar.
    2011;20(7):1350–1357. doi:10.1158/1055-9965.EPI-10-1014.
26. Mdzinarishvili T, Gleason MX, Kinarsky L, Sherman S. A Generalized Beta Model for the
    Age Distribution of Cancers: Application to Pancreatic and Kidney Cancer. Cancer Inform.
    2009;7:CIN.S3050. doi:10.4137/CIN.S3050.
27. Mdzinarishvili T, Sherman S. Weibull-like Model of Cancer Development in Aging. Cancer Inform.
    2010;9:CIN.S5460. doi:10.4137/CIN.S5460.
28. Brody JP. Age-Specific Incidence Data Indicate Four Mutations Are Required for Human
    Testicular Cancers. PLoS ONE. 2011;6(10):e25978. doi:10.1371/journal.pone.0025978.
29. Soto-Ortiz L, Brody JP. A theory of the cancer age-specific incidence data based on extreme
    value distributions. AIP Adv. 2012;2(1):011205. doi:10.1063/1.3699050.
30. Rhyu MG, Oh JH, Kim TH, Kim JS, Rhyu YA, Hong SJ. Periodic Fluctuations in the Incidence
    of Gastrointestinal Cancer. Front Oncol. 2021;11. doi:10.3389/fonc.2021.558040.
31. Noble R, Burri D, Kather JN, Beerenwinkel N. Spatial structure governs the mode of tumour
    evolution. bioRxiv. 2019; p. 586735. doi:10.1101/586735.
32. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Research
    Data (1975-2017), National Cancer Institute, DCCPS, Surveillance Research Program, released
    April 2020, based on the November 2019 submission.;.
33. Buder T, Deutsch A, Klink B, Voss-Böhme A. Patterns of Tumor Progression Predict Small and
    Tissue-Specific Tumor-Originating Niches. Front Oncol. 2019;8. doi:10.3389/fonc.2018.00668.
34. Driman DK, Riddell RH. Flat adenomas and flat carcinomas: Do you see what I see? Gastroin-
    test Endosc. 1994;40(1):106–109. doi:10.1016/S0016-5107(94)70028-1.
35. Humphries A, Wright NA. Colonic crypt organization and tumorigenesis. Nat Rev Cancer.
    2008;8(6):415–424. doi:10.1038/nrc2392.
36. Buder T, Deutsch A, Klink B, Voss-Böhme A. Model-Based Evaluation of Spontaneous
    Tumor Regression in Pilocytic Astrocytoma. PLoS Comput Biol. 2015;11(12):e1004662.
    doi:10.1371/journal.pcbi.1004662.

                                                                                              13/14
37. Vogelstein B, Kinzler KW. The Genetic Basis of Human Cancer. Subsequent edition ed.
    Mcgraw-Hill Professional; 2002.
38. Durrett R, Moseley S. Spatial Moran models I. Stochastic tunneling in the neutral case.
    Ann Appl Prob. 2015;25(1):104–115. doi:10.1214/13-AAP989.
39. Paggi S, Radaelli F, Repici A, Hassan C. Advances in the removal of diminutive colorectal polyps.
    Expert Rev Gastroenterol Hepatol. 2015;9(2):237–244. doi:10.1586/17474124.2014.950955.
40. Nicholson AM, Olpe C, Hoyle A, Thorsen AS, Rus T, Colombé M, et al. Fixation and Spread of
    Somatic Mutations in Adult Human Colonic Epithelium. Cell Stem Cell. 2018;22(6):909–918.e8.
    doi:10.1016/j.stem.2018.04.020.
41. Baker AM, Cereser B, Melton S, Fletcher AG, Rodriguez-Justo M, Tadrous PJ, et al. Quantifi-
    cation of Crypt and Stem Cell Evolution in the Normal and Neoplastic Human Colon. Cell Rep.
    2014;8(4):940–947. doi:10.1016/j.celrep.2014.07.019.
42. Tomasetti C, Vogelstein B. Variation in cancer risk among tissues can be explained by the
    number of stem cell divisions. Science. 2015;347(6217):78–81. doi:10.1126/science.1260825.
43. Hounnou G, Destrieux C, Desmé J, Bertrand P, Velut S. Anatomical study of the length of the
    human intestine. Surg Radiol Anat. 2002;24(5):290–294. doi:10.1007/s00276-002-0057-y.
44. Grahn SW, Varma MG. Factors that increase risk of colon polyps. Clin Colon Rectal Surg.
    2008;21(4):247–255. doi:10.1055/s-0028-1089939.
45. Steele SR, Johnson EK, Champagne B, Davis B, Lee S, Rivadeneira D, et al. Endoscopy and
    polyps-diagnostic and therapeutic advances in management. World Journal of Gastroenterology.
    2013;19(27):4277–4288. doi:10.3748/wjg.v19.i27.4277.
46. Siegmund KD, Marjoram P, Woo YJ, Tavaré S, Shibata D. Inferring clonal expansion and cancer
    stem cell dynamics from DNA methylation patterns in colorectal cancers. Proc Natl Acad Sci USA.
    2009;106(12):4828–4833. doi:10.1073/pnas.0810276106.
47. Lopez-Garcia C, Klein AM, Simons BD, Winton DJ. Intestinal Stem Cell Replacement Follows
    a Pattern of Neutral Drift. Science. 2010;330(6005):822–825. doi:10.1126/science.1196236.
48. Bedenne L, Faivre J, Boutron MC, Piard F, Cauvin JM, Hillon P. Adenoma-carcinoma se-
    quence or “de novo” Carcinogenesis?. A study of adenomatous remnants in a population-based
    series of large bowel cancers. Cancer. 1992;69(4):883–888. doi:https://doi.org/10.1002/1097-
    0142(19920215)69:4¡883::AID-CNCR2820690408¿3.0.CO;2-B.
49. Meining A, Morgner A, Miehlke S, Bayerdörffer E, Stolte M.                             Atro-
    phy–metaplasia–dysplasia–carcinoma sequence in the stomach: a reality or merely an
    hypothesis? Best Pract Res Clin Gastroenterol. 2001;15(6):983–998. doi:10.1053/bega.2001.0253.
50. Hayakawa Y, Fox JG, Wang TC. The Origins of Gastric Cancer From Gastric Stem
    Cells: Lessons From Mouse Models. Cell Mol Gastroenterol Hepatol. 2017;3(3):331–338.
    doi:10.1016/j.jcmgh.2017.01.013.
51. Ohgaki H, Kleihues P. The Definition of Primary and Secondary Glioblastoma. Clin Cancer Res.
    2013;19(4):764–772. doi:10.1158/1078-0432.CCR-12-3002.
52. Vijay A, Elaffandi A, Khalaf H. Hepatocellular adenoma: An update. World J Hepatol.
    2015;7(25):2603–2609. doi:10.4254/wjh.v7.i25.2603.
53. Komarova NL. Spatial Stochastic Models for Cancer Initiation and Progression. Bull Math Biol.
    2006;68(7):1573–1599. doi:10.1007/s11538-005-9046-8.

                                                                                              14/14
You can also read