Parallel computing efficiency of SWAN 40.91 - GMD

Page created by Darlene Barnett
CONTINUE READING
Parallel computing efficiency of SWAN 40.91 - GMD
Geosci. Model Dev., 14, 4241–4247, 2021
https://doi.org/10.5194/gmd-14-4241-2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Parallel computing efficiency of SWAN 40.91
Christo Rautenbach1,2,3,4 , Julia C. Mullarney4 , and Karin R. Bryan4
1 Institute
          for Coastal and Marine Research, Nelson Mandela University, Port Elizabeth, South Africa
2 Department  of Oceanography and Marine Research Institute, University of Cape Town, Cape Town, South Africa
3 Research and development, MetOcean (a division of the MetService), Raglan, New Zealand
4 Environmental Research Institute, University of Waikato, Hamilton, New Zealand

Correspondence: Christo Rautenbach (rautenbachchristo@gmail.com)

Received: 18 September 2020 – Discussion started: 2 December 2020
Revised: 18 May 2021 – Accepted: 10 June 2021 – Published: 6 July 2021

Abstract. Effective and accurate ocean and coastal wave          1   Introduction
predictions are necessary for engineering, safety and recre-
ational purposes. Refining predictive capabilities is increas-
                                                                 The computational efficiency of metocean (meteorology and
ingly critical to reduce the uncertainties faced with a chang-
                                                                 oceanography) modelling has been the topic of ongoing de-
ing global wave climatology. Simulating WAves in the
                                                                 liberation for decades. The applications range from long-
Nearshore (SWAN) is a widely used spectral wave modelling
                                                                 term atmospheric and ocean hindcast simulations to the
tool employed by coastal engineers and scientists, including
                                                                 fast-responding simulations related to operational forecast-
for operational wave forecasting purposes. Fore- and hind-
                                                                 ing. Long-duration simulations are usually associated with
casts can span hours to decades, and a detailed understand-
                                                                 climate-change-related research, with simulation periods of
ing of the computational efficiencies is required to design
                                                                 at least 30 years across multiple spatial and temporal reso-
optimized operational protocols and hindcast scenarios. To
                                                                 lutions needed to capture key oscillations (Babatunde et al.,
date, there exists limited knowledge on the relationship be-
                                                                 2013). Such hindcasts are frequently used by coastal and off-
tween the size of a SWAN computational domain and the
                                                                 shore engineering consultancies for purposes such as those
optimal amount of parallel computational threads/cores re-
                                                                 related to infrastructure design (Kamphuis, 2020) or environ-
quired to execute a simulation effectively. To test the scala-
                                                                 mental impact assessments (Frihy, 2001; Liu et al., 2013).
bility, a hindcast cluster of 28 computational threads/cores
                                                                    Operational (or forecasting) agencies are usually con-
(1 node) was used to determine the computation efficien-
                                                                 cerned with achieving simulation speeds that would allow
cies of a SWAN model configuration for southern Africa.
                                                                 them to accurately forewarn their stakeholders of immediate,
The model extent and resolution emulate the current op-
                                                                 imminent and upcoming metocean hazards. The main stake-
erational wave forecasting configuration developed by the
                                                                 holders are usually other governmental agencies (e.g. disas-
South African Weather Service (SAWS). We implemented
                                                                 ter response or environmental affairs departments), commer-
and compared both OpenMP and the Message Passing Inter-
                                                                 cial entities, and the public. Both atmospheric and marine
face (MPI) distributing memory architectures. Three sequen-
                                                                 forecasts share similar numerical schemes that solve the gov-
tial simulations (corresponding to typical grid cell numbers)
                                                                 erning equations and thus share a similar need in computa-
were compared to various permutations of parallel compu-
                                                                 tional efficiency. Fast simulation times are also required for
tations using the speed-up ratio, time-saving ratio and effi-
                                                                 other forecasting fields such as hydrological dam-break mod-
ciency tests. Generally, a computational node configuration
                                                                 els (e.g. Zhang et al., 2014). Significant advancement in op-
of six threads/cores produced the most effective computa-
                                                                 erational forecasting can be made by examining the way in
tional set-up based on wave hindcasts of 1-week duration.
                                                                 which the code interfaces with the computation nodes, and
The use of more than 20 threads/cores resulted in a decrease
                                                                 how results are stored during simulation.
in speed-up ratio for the smallest computation domain, owing
                                                                    Numerous operational agencies (both private and public)
to the increased sub-domain communication times for lim-
                                                                 make use of Simulating Waves in the Nearshore (SWAN)
ited domain sizes.
                                                                 to predict nearshore wave dynamics (refer to Genseberger

Published by Copernicus Publications on behalf of the European Geosciences Union.
4242                                                      C. Rautenbach et al.: Parallel computing efficiency of SWAN 40.91

and Donners, 2020, for details regarding the SWAN nu-              (1) when using SWAN, is it always better to have as many
merical code and solution schemes). These agencies include         threads/cores as possible available to solve the problem at
the South African Weather Service (e.g. Rautenbach et al.,         hand? (2) What is the speed-up relationship between num-
2020), MetOcean Solutions (a division of the MetService of         ber of threads/cores and computational grid size? (3) At
New Zealand) (e.g. de Souza et al., 2020), the United King-        what point (number of threads/cores) do the domain sub-
dom Met Office (e.g. O’Neill et al., 2016) and the Norwe-          communications start to make the whole computation less
gian Meteorological Institute (e.g. Jeuring et al., 2019). In      effective? (4) What is the scalability of a rectangular compu-
general, these agencies have substantial computational fa-         tational grid in a typical SWAN set-up?
cilities but nonetheless still face the challenge of optimiz-
ing the use of their computational clusters between vari-
ous models (being executed simultaneously). These models           2   Methodology and background
may include atmospheric models (e.g. the Weather Research
and Forecasting (WRF) model), Hydrodynamic models (e.g.            Details of the model configuration can be found in Rauten-
Regional Ocean Modeling System (ROMS) and the Semi-                bach et al. (2020a, b). The computational domain (refer to
implicit Cross-scale Hydroscience Integrated System Model          Fig. 1) and physics used here were the same as presented in
(SCHISM)) and spectral wave models (e.g. Wave Watch III            those studies.
(WW3) and SWAN; Holthuijsen, 2007; The SWAN team,                     All computations were performed on Intel Xeon Gold
2006, 2019b, a). Holthuijsen (2007) presents a theoretical         E5-2670 with 2.3 GHz computational nodes. A total of 28
background to the spectral wave equations, wave measure-           threads/cores each with 96 GB RAM was used with 1 Gbyte
ment techniques and statistics as well as a concluding chap-       inter-thread communication speed. Given that the present
ter on the theoretical analysis of the SWAN numerical model.       study was performed using a single computational node,
There must also be a balance between hindcast and forecast         inter-node communication speeds are not considered. Thus,
priorities and client needs. Some of these agencies use a reg-     given a computational node with similar processing speed,
ular grid (instead of irregular grids, e.g. Zhang et al., 2016),   the present study should be reproducible. In general, these
with nested domains in many of their operational and hind-         node specifications are reasonably standard, and therefore the
cast projects. Here, we focus only on the computational per-       present study is an acceptable representation of the SWAN
formance of a structured regular grid (typically implemented       scalability parameters.
for spectral wave models). Kerr et al. (2013) performed               SWAN 40.91 was implemented with the Van der West-
an inter-model comparison of computational efficiencies by         huysen whitecapping formulation (van der Westhuysen et al.,
comparing SWAN, coupled with ADCIRC, and the NOAA                  2007) and Collins bottom friction correlation (Collins, 1972)
official storm surge forecasting model, Sea, Lake, and Over-       with a coefficient value of 0.015. Fully spectral wave bound-
land Surges from Hurricanes (SLOSH); however, they did             ary conditions were extracted from a global Wave Watch III
not investigate the optimal thread usage of a single model.        model at 0.5 geographical degree resolution.
Other examples of coupled wave and storm surge model                  Here, the main aim was not the validation of the model
computational benchmarking experiments include Tanaka et           but rather to quantify the relative computational scalabilities,
al. (2011) and Dietrich et al. (2012), who used a unstruc-         as described at the end of the previous section. However, it
tured meshes to simulate waves during Hurricanes Katrina,          should be noted that no nested domains were employed dur-
Rita, Gustav and Ike in the Gulf of Mexico. Results from           ing the present study. Only the parent domain was used as
these models were presented on a log–log scale, and their          a measure for scalability. The computational extent given in
experimental design tested computational thread numbers            Rautenbach et al. (2020a, b) contains numerous non-wet grid
not easily obtainable by smaller agencies and companies.           cells that are not included in the computational expense of
The latter rather require sequential versus paralleled com-        the current study. In Table 1, the size of the computational
putational efficiencies using smaller-scale efficiency metrics.    domain and resolution together with the labelling conven-
Genseberger and Donners (2015) explored the scalability of         tion is given. For clarity, we define the resolutions as low,
SWAN using a case study focused on the Wadden Sea in               medium and high, denoted L, M and H, respectively, in the
the Netherlands. By investigating the efficiency of both the       present study (noting that given the domain size, these res-
OpenMP (OMP) and MPI version of the then current SWAN,             olutions would be classified as intermediate to high regional
they found that the OpenMP was more efficient on a single          resolution for operational purposes).
node. They also proposed a hybrid version of SWAN, to com-            The test for scalability ability of a model used here was
bine the strengths of both implementations of SWAN: using          the ability to respond to an increased number of computa-
OpenMP to more optimally share memory and MPI to dis-              tions with an increasing amount of resources. In the present
tribute memory over the computational nodes.                       study these resources are computational threads/cores. Com-
   Here we build on the case study of Genseberger and              putations were performed for an arbitrary week to assess
Donners using results produced in the present study for            model performance. Model spin-up was done via a single
southern Africa, to answer the following research questions:       stationary computation. The rest of the computation was per-

Geosci. Model Dev., 14, 4241–4247, 2021                                            https://doi.org/10.5194/gmd-14-4241-2021
C. Rautenbach et al.: Parallel computing efficiency of SWAN 40.91                                                             4243

Figure 1. SWAN model extent and associated bathymetry. The location of all the major coastal towns are also provided via acronyms as
follows: Port Nolloth (PN), Saldanha Bay (SB), Cape Town (CT), Mossel Bay (MB), Port Elizabeth (PE), East London (EL), Durban (DN)
and Richards Bay (RB).

Table 1. SWAN grid resolution, grid cell numbers and reference         The time-saving ratio is given by
labels.
                                                                    T1 Sp = (T1 − Tp )/T1 ,                                     (2)
           Label   SWAN grid     Computational grid
                    resolution        cell number                   and the efficiency ratio is defined as
           L            0.1000               31 500                 Ep = Sp /p.                                                 (3)
           M            0.0625               91 392
           H            0.0500              142 800                    The scalability of SWAN was tested based on the speed-up
                                                                    ratios for the grid resolutions in Table 1.
                                                                       Zafari et al. (2019) recently presented some of the first
formed using a non-stationary computation using an hourly           results investigating the effect of different compilers on the
time step, which implied wind-wave generation within the            scalability of a shallow water equation solver. Their experi-
model occurred on the timescale of the wind forcing reso-           ments compared a model compiled with GNU Compiler Col-
lution. The grid resolutions used in the present study corre-       lection (gcc) 7.2.0 and linked with OpenMPI and Intel C++
sponded to 0.1, 0.0625 and 0.05 geographical degrees. Local         compilers with Intel MPI for relatively small computational
bathymetric features were typically resolved through down-          problems. Their numerical computation considered models
scaled, rotated, rectangular grids, following the methodol-         with 600, 300 and 150 K grid cell sizes (what they called
ogy employed by Rautenbach et al. (2020a). A nested res-            matrix size). These computational grid sizes were deemed
olution increase of more than 5 times is also not recom-            “small”, but they still acknowledged the significant compu-
mended (given that the regional model is nested in the global       tational resources required to execute geographical models
Wave Watch III output at 0.5 geographical degree resolution,        of this size due to the large number of time steps undertaken
(Rautenbach et al., 2020a). Given these constraints, these res-     to solve these problems.
olutions represent a realistic and typical SWAN model set-             From a practical point of view, regular SWAN grids will
up, for both operational and hindcast scenarios.                    rarely be used in dimensions exceeding the resolutions pre-
   The three main metrics for estimating computational ef-          sented in the previous section. The reason for this statement
ficiency are the speed-up, time-saving and efficiency ratios.       is twofold: (1) to downscale a spectral wave model from a
A fourth parameter, and arguably the most important, is the         global resolution to a regional resolution should not exceed a
scalability and is estimated using the other three parameters       5-fold refinement factor and (2) when reasonably high reso-
as metrics.                                                         lutions are required in the nearshore (to take complex bathy-
   The speed-up ratio is given as                                   metric features into account), nested domains are preferred.
                                                                    The reasoning will be different for an unstructured grid ap-
Sp = T1 /Tp ,                                               (1)
                                                                    proach (Dietrich et al., 2012). Given these limitations with
where T1 is the time in seconds it takes for a sequential com-      the widely used structured SWAN grid approach, SWAN
putation on one thread and Tp is the time a simulation takes        grids will almost exclusively be deemed as a low spatial com-
with p computational threads/cores (Zhang et al., 2014).            putational demand model. Small tasks create a sharp drop in

https://doi.org/10.5194/gmd-14-4241-2021                                               Geosci. Model Dev., 14, 4241–4247, 2021
4244                                                       C. Rautenbach et al.: Parallel computing efficiency of SWAN 40.91

performance via the Intel C++ compiler due to the “work             MPI SWAN and 64 nodes (64 × 24 threads/cores) for the hy-
stealing” algorithm, aimed at balancing out the computa-            brid SWAN. These results were based on using the Cartesius
tional load between threads/cores (Zafari et al., 2019). In this    2690 v3 (Genseberger and Donners, 2020). With the hybrid
scenario, the threads/cores compete against each other result-      SWAN, the optimal wall-clock time turn point, for iterations,
ing in an unproductive simulation. In our experiments, each         increased with an increased number of computational cells.
task performed via Intel was approximately 13 times faster          All of the reported turn points (optimal points) occurred at
but the overall performance was 16 times slower than the            node counts well above 4 nodes (4 × 24 threads/cores). The
equivalent gcc version of the compiled shallow water model          wall-clock performance estimation of Genseberger and Don-
presented by Zafari et al. (2019).                                  ners (2015) did, however, indicate similar results to those
                                                                    presented in Fig. 2a, with OMP running faster than MPI
                                                                    for small thread/core counts. For larger thread/core counts
3   Results                                                         MPI performs better in the present study. This difference
                                                                    in performance is probably related to the particular hard-
In Fig. 2, the computational scalability of SWAN is given
                                                                    ware configuration (Genseberger and Donners, 2015). It must
as a function of number of computational threads/cores. Fig-
                                                                    still be noted that with an increased number of nodes, and
ure 2a shows the computational time in seconds and here the
                                                                    thus threads/cores, the total computational time should con-
model resolutions grouped together with not much differen-
                                                                    tinue to decrease up until the point where the internal do-
tiation between them. These results also highlight the need
                                                                    main decomposition, communication efficiencies, starts to
for performance metrics, as described in the previous sec-
                                                                    outweigh the gaining of computational power. Based on re-
tion. From Fig. 2b the MPI version of SWAN is more effi-
                                                                    sults of Genseberger and Donners (2020), we can estimate
cient for all the computational domain sizes. There is also a
                                                                    that, for our node configuration and region of interest, the
clear grouping between OMP and MPI. Figure 2c presents
                                                                    communication inefficiencies will become dominant at ap-
the speed-up ratios and clearly indicates that the MPI ver-
                                                                    proximately 16 nodes (16 × 24 threads/cores). One of the
sion of SWAN outperforms the OMP version. The closer the
                                                                    possible explanations for the non-perfect speed-up observed
results are to the 1 : 1 line, the better the scalability.
                                                                    in Fig. 2c is related to the computational domain partition
   Near-linear speed-up is observed for a small number of
                                                                    methods used, and the wet and dry (or active and inac-
computational threads/cores. This result agrees with those re-
                                                                    tive) point definitions in the model. In the present study the
ported by Zafari et al. (2019). In Fig. 2d the results are ex-
                                                                    dry points were the bathymetry or topography values above
pressed via the time-saving ratio. In this case, the curves start
                                                                    mean sea level (MSL). The employed partition method is
to asymptote with thread counts larger than approximately 6.
                                                                    currently stripwise because of the underlying parallel tech-
                                                                    nique, namely the wavefront method (Genseberger and Don-
4   Discussion                                                      ners, 2015; Zijlema, 2005). The stripwise partition is thus po-
                                                                    tentially not the most effective method to optimize speed-up.
The behaviour noted in the results is similar to the                In the present study this partition leads to an optimal point
dam-breaking computational results reported by Zhang et             around six threads/cores without losing too much parallel ef-
al. (2014). Genseberger and Donners (2020) present the lat-         ficiency. In general, increasing the number of threads/cores
est finding on the scalability and benchmarking of SWAN.            will still produce results faster, but in an increasingly ineffi-
However, their focus was quantifying the performance of             cient manner. This trend is clear from Fig. 2c and d where
their new hybrid version of SWAN. In their benchmark-               the total computational time (speed-up ratio, time-saving ra-
ing experiments (for the Wadden Sea, in the Netherlands),           tio) does not scale linearly with the increasing number of
they obtained different results to Fig. 2a, with OMP gener-         threads/cores. The ideal scenario (linear scalability) would be
ally producing faster wall-clock computational times. They          if the computational results followed the 1 : 1 line in Fig. 2c.
also considered the physical distances between computa-             In Fig. 2d the non-linear flattening of the time-saving ratio is
tional threads/cores and found that this parameter has a negli-     also evident, although the ratio still slightly increases beyond
gible effect compared to differences between OMP and MPI,           six threads/cores. This result implies the total computational
over an increasing number of threads/cores. Their bench-            time will marginally decrease with an increasing number of
marking also differed from the results presented here as they       threads/cores. This marginal decrease in computational time
only provided results as a function of node number. Each one        could, however, still make significant differences in the to-
of their nodes consisted of 24 threads/cores. In the present        tal simulation times when extensive simulation periods are
study, the benchmarking of a single node (28 threads/cores)         considered.
is evaluated compared with a serial computation on a sin-
gle thread. For benchmarking, without performance metrics,
they found that the wall clock times, for the iterations and
not a full simulation, reached a minimum (for large compu-
tational domains) at 16 nodes (16 × 24 threads/cores) for the

Geosci. Model Dev., 14, 4241–4247, 2021                                              https://doi.org/10.5194/gmd-14-4241-2021
C. Rautenbach et al.: Parallel computing efficiency of SWAN 40.91                                                               4245

Figure 2. Model performance as a function of the number of computational threads/cores. (a) Computing time in seconds, (b) efficiency
(Eq. 3), (c) speed-up ratio (Eq. 1) and (d) the time-saving ratio (Eq. 2).

5   Conclusions                                                      computational configuration. Ultimately, the efficiencies rec-
                                                                     ommended here can improve operational performance sub-
The present study investigated the scalability of SWAN, a            stantially, particularly when implemented over the range of
widely used spectral wave model. Three typical wave model            modelling software needed to produce useful metocean fore-
resolutions were used for these purposes. Both the OpenMP            casts. Future studies might consider investigating the scala-
(OMP) and the Message Passing Interface (MPI) implemen-              bility employing a gcc compiler.
tations of SWAN were tested. The scalability is presented
via three performance metrics: the efficiency, speed-up ratio
and the timesaving ratio. The MPI version of SWAN outper-            Code and data availability. The open-source version of SWAN
formed the OMP version based on all three metrics. The MPI           was run for the purposes of the present study. SWAN may
version of SWAN performed best with the largest computa-             be downloaded from http://swanmodel.sourceforge.net/ (SWAN,
tional domain resolution, resulting in the highest speed-up          2020). To ensure a compatible version of SWAN remains
                                                                     available, the current, latest version of SWAN is permanently
ratios. The time-saving ratio indicated a decrease after ap-
                                                                     archived at https://hdl.handle.net/10289/14269 (SWAN, 2021). The
proximately six computational threads/cores. This result sug-        bathymetry used for the present study may be downloaded from
gests that six threads/cores are the most effective configura-       https://www.gebco.net/ (GEBCO, 2020), and the wind forcing
tion for executing SWAN. The largest increases in speed-up           may be found from https://climatedataguide.ucar.edu/climate-data/
and efficiency were observed with small thread counts. Ac-           climate-forecast-system-reanalysis-cfsr (National Center for At-
cording to Genseberger and Donners (2020), computational             mospheric Research Staff, 2017).
times decrease up to ∼ 16 nodes (16 × 24 threads/cores), in-
dicating the wall-clock optimal computational time for their
case study. This result suggests that multiple nodes will be         Author contributions. CR conceptualized the study, executed the
required to reach the optimal wall-clock computational time          experiments and wrote the paper. He also secured the publication
– even though this turn point might not be the most efficient        funding. JCM and KRB reviewed the paper.

https://doi.org/10.5194/gmd-14-4241-2021                                               Geosci. Model Dev., 14, 4241–4247, 2021
4246                                                              C. Rautenbach et al.: Parallel computing efficiency of SWAN 40.91

Competing interests. The authors declare that they have no conflict         Holthuijsen, L. H.: Waves in Oceanic and Coastal Waters,
of interest.                                                                   Cambridge University Press, Technische Universiteit Delft,
                                                                               The Netherlands, https://doi.org/10.1017/CBO9780511618536,
                                                                               2007.
Disclaimer. Publisher’s note: Copernicus Publications remains               Jeuring, J., Knol-kauffman, M., and Sivle, A.: Toward valu-
neutral with regard to jurisdictional claims in published maps and             able weather and sea-ice services for the marine Arc-
institutional affiliations.                                                    tic: exploring user – producer interfaces of the Nor-
                                                                               wegian Meteorological, Polar Geogr., 43, 139–159,
                                                                               https://doi.org/10.1080/1088937X.2019.1679270, 2019.
Financial support. This research was funded by the National Re-             Kamphuis, J. W.: Introduction to coastal engineering and manage-
search Foundation of South Africa (grant number 116359).                       ment – Advanced series on ocean engineering, World scientific
                                                                               publishing Co. Pte. Ltd., Singapore, Vol. 48, 2020.
                                                                            Kerr, P. C., Donahue, A. S., Westerink, J. J., Luettich, R. A., Zheng,
                                                                               L. Y., Weisberg, R. H., Huang, Y., Wang, H. V., Teng, Y., Forrest,
Review statement. This paper was edited by Julia Hargreaves and
                                                                               D. R., Roland, A., Haase, A. T., Kramer, A. W., Taylor, A. A.,
reviewed by two anonymous referees.
                                                                               Rhome, J. R., Feyen, J. C., Signell, R. P., Hanson, J. L., Hope,
                                                                               M. E., Estes, R. M., Dominguez, R. A., Dunbar, R. P., Semeraro,
                                                                               L. N., Westerink, H. J., Kennedy, A. B., Smith, J. M., Powell, M.
                                                                               D., Cardone, V. J., and Cox, A. T.: U.S. IOOS coastal and ocean
References                                                                     modeling testbed: Inter-model evaluation of tides, waves, and
                                                                               hurricane surge in the Gulf of Mexico, J. Geophys. Res.-Oceans,
Babatunde, A., Pascale, B., Sin, C. C., William, C., Peter, C., Fa-            118, 5129–5172, https://doi.org/10.1002/jgrc.20376, 2013.
   tima, D., Seita, E., Veronika, E., Forest, C., Peter, G., Eric, G.,      Liu, T. K., Sheu, H. Y., and Tseng, C. N.: Environmental im-
   Christian, J., Vladimir, K., Chris, R., and Markku, R.: Evalua-             pact assessment of seawater desalination plant under the frame-
   tion of Climate Models, in: Climate Change 2013: The Physi-                 work of integrated coastal management, Desalination, 326, 10–
   cal Science Basis. Contribution of Working Group I to the Fifth             18, https://doi.org/10.1016/j.desal.2013.07.003, 2013.
   Assessment Report of the Intergovernmental Panel on Climate              National Center for Atmospheric Research Staff (Eds.): The
   Change, edited by: Stocker, T. F., Qin, D., Plattner, G.-K., Tig-           Climate Data Guide: Climate Forecast System Reanaly-
   nor, M., Allen, S. K., Boschung, J., Nauels, A., Xia, Y., Bex,              sis (CFSR), available at: https://climatedataguide.ucar.edu/
   V., and Midgley, P. M., Cambridge University Press, Cambridge,              climate-data/climate-forecast-system-reanalysis-cfsr (last ac-
   UK, 2013.                                                                   cess: 1 June 2020), 2017.
Collins, J. I.: Prediction of shallow-water spectra, J. Geophys.            O’Neill, C., Saulter, A., Williams, J., and Horsburgh, K.: NEMO-
   Res., 77, 2693–2707, https://doi.org/10.1029/JC077i015p02693,               surge: Application of atmospheric forcing and surge evaluation,
   1972.                                                                       Technical report 619, available at: http://www.metoffice.gov.uk/
de Souza, J. M. A. C., Couto, P., Soutelino, R., and                           binaries/content/assets/mohippo/pdf/library/frtr_619_2016p.pdf
   Roughan, M.: Evaluation of four global ocean reanalysis                     (last access: 1 February 2020), 2016.
   products for New Zealand waters – A guide for regional                   Rautenbach, C., Daniels, T., de Vos, M., and Barnes, M. A.: A
   ocean modelling, New Zeal. J. Mar. Fresh., 55, 132–155,                     coupled wave, tide and storm surge operational forecasting sys-
   https://doi.org/10.1080/00288330.2020.1713179, 2020.                        tem for South Africa: validation and physical description, Nat.
Dietrich, J. C., Tanaka, S., Westerink, J. J., Dawson, C. N.,                  Hazards, 103, 1407–1439, https://doi.org/10.1007/s11069-020-
   Luettich, R. A., Zijlema, M., Holthuijsen, L. H., Smith, J.                 04042-4, 2020a.
   M., Westerink, L. G., and Westerink, H. J.: Performance of               Rautenbach, C., Barnes, M. A., Wang, D. W. and Dykes, J.: South-
   the Unstructured-Mesh, SWAN+ADCIRC Model in Comput-                         ern African wave model sensitivities and accuracies, J. Mar. Sci.
   ing Hurricane Waves and Surge, J. Sci. Comput., 52, 468–497,                Eng., 8, 77, https://doi.org/10.3390/jmse8100773, 2020b.
   https://doi.org/10.1007/s10915-011-9555-6, 2012.                         SWAN (Simulating WAves Nearshore): a third generation wave
Frihy, O. E.: The necessity of environmental impact assessment                 model, Delft University of Technology, available at: http://
   (EIA) in implementing coastal projects: lessons learned from the            swanmodel.sourceforge.net/, last access: 18 September 2020.
   Egyptian Mediterranean Coast, Ocean Coast. Manage., 44, 489–             SWAN (Simulating WAves Nearshore): a third generation wave
   516, https://doi.org/10.1016/S0964-5691(01)00062-X, 2001.                   model, Delft University of Technology, available at: https://hdl.
GEBCO: General Bathymetric Chart of the Oceans, available at:                  handle.net/10289/14269, last access: 1 April 2021.
   https://www.gebco.net/, last access: 2 February 2020.                    Tanaka, S., Bunya, S., Westerink, J. J., Dawson, C., and Luettich,
Genseberger, M. and Donners, J.: A Hybrid SWAN Version for Fast                R. A.: Scalability of an unstructured grid continuous Galerkin
   and Efficient Practical Wave Modelling, Procedia Comput. Sci.,              based hurricane storm surge model, J. Sci. Comput., 46, 329–
   51, 1524–1533, https://doi.org/10.1016/j.procs.2015.05.342,                 358, https://doi.org/10.1007/s10915-010-9402-1, 2011.
   2015.                                                                    The SWAN Team: USER MANUAL SWAN Cycle III ver-
Genseberger, M. and Donners, J.: Hybrid SWAN for Fast and                      sion 40.51, Cycle, available at: http://falk.ucsd.edu/modeling/
   Efficient Practical Wave Modelling – Part 2, edited by:                     swanuse.pdf (last access: 1 March 2021), 2006.
   Krzhizhanovskaya, V. V., Závodszky, G., Lees, M. H., Dongarra,
   J. J., Sloot, P. M. A., Brissos, S., and Teixeira, J., Springer Inter-
   national Publishing, Cham, vol. 12139, 87–100, 2020.

Geosci. Model Dev., 14, 4241–4247, 2021                                                       https://doi.org/10.5194/gmd-14-4241-2021
C. Rautenbach et al.: Parallel computing efficiency of SWAN 40.91                                                                4247

The SWAN team: Implementation Manual Swan Cycle III version        Zhang, S., Xia, Z., Yuan, R., and Jiang, X.: Parallel com-
  41.31, available at: http://swanmodel.sourceforge.net/online_       putation of a dam-break flow model using OpenMP
  doc/swanimp/swanimp.html (last access: 1 March 2021), 2019a.        on a multi-core computer, J. Hydrol., 512, 126–133,
The SWAN team: User Manual Swan Cycle III version                     https://doi.org/10.1016/j.jhydrol.2014.02.035, 2014.
  41.3, available at: http://swanmodel.sourceforge.net/download/   Zhang, Y. J., Ye, F., Stanev, E. V., and Grashorn, S.: Seamless
  zip/swanimp.pdf (last access: 2 July 2021), 2019b.                  cross-scale modeling with SCHISM, Ocean Model., 102, 64–81,
van der Westhuysen, A. J., Zijlema, M., and Battjes, J. A.:           https://doi.org/10.1016/j.ocemod.2016.05.002, 2016.
  Nonlinear saturation-based whitecapping dissipation in SWAN      Zijlema, M.: Parallelization of a nearshore wind wave model for dis-
  for deep and shallow water, Coast. Eng., 54, 151–170,               tributed memory architectures, in: Parallel Computational Fluid
  https://doi.org/10.1016/j.coastaleng.2006.08.006, 2007.             Dynamics 2004, Elsevier, 207–214, 2005.
Zafari, A., Larsson, E., and Tillenius, M.: DuctTeip:
  An efficient programming model for distributed task-
  based parallel computing, Parallel Comput., 90, 102582,
  https://doi.org/10.1016/j.parco.2019.102582, 2019.

https://doi.org/10.5194/gmd-14-4241-2021                                              Geosci. Model Dev., 14, 4241–4247, 2021
You can also read