Geosci. Model Dev., 14, 4241–4247, 2021, https://doi.org/10.5194/gmd-14-4241-2021
© Author(s) 2021. This work is distributed under the Creative Commons Attribution 4.0 License.

Parallel computing efficiency of SWAN 40.91

Christo Rautenbach (1,2,3,4), Julia C. Mullarney (4), and Karin R. Bryan (4)

1 Institute for Coastal and Marine Research, Nelson Mandela University, Port Elizabeth, South Africa
2 Department of Oceanography and Marine Research Institute, University of Cape Town, Cape Town, South Africa
3 Research and Development, MetOcean (a division of the MetService), Raglan, New Zealand
4 Environmental Research Institute, University of Waikato, Hamilton, New Zealand

Correspondence: Christo Rautenbach (rautenbachchristo@gmail.com)

Received: 18 September 2020 – Discussion started: 2 December 2020
Revised: 18 May 2021 – Accepted: 10 June 2021 – Published: 6 July 2021

Abstract. Effective and accurate ocean and coastal wave predictions are necessary for engineering, safety and recreational purposes. Refining predictive capabilities is increasingly critical to reduce the uncertainties faced with a changing global wave climatology. Simulating WAves in the Nearshore (SWAN) is a widely used spectral wave modelling tool employed by coastal engineers and scientists, including for operational wave forecasting purposes. Fore- and hindcasts can span hours to decades, and a detailed understanding of the computational efficiencies is required to design optimized operational protocols and hindcast scenarios. To date, there exists limited knowledge on the relationship between the size of a SWAN computational domain and the optimal number of parallel computational threads/cores required to execute a simulation effectively. To test the scalability, a hindcast cluster of 28 computational threads/cores (1 node) was used to determine the computational efficiencies of a SWAN model configuration for southern Africa. The model extent and resolution emulate the current operational wave forecasting configuration developed by the South African Weather Service (SAWS). We implemented and compared both the OpenMP (shared memory) and the Message Passing Interface (MPI; distributed memory) architectures. Three sequential simulations (corresponding to typical grid cell numbers) were compared to various permutations of parallel computations using the speed-up ratio, time-saving ratio and efficiency ratio. Generally, a computational node configuration of six threads/cores produced the most effective computational set-up based on wave hindcasts of 1-week duration. The use of more than 20 threads/cores resulted in a decrease in speed-up ratio for the smallest computational domain, owing to the increased sub-domain communication times for limited domain sizes.

1 Introduction

The computational efficiency of metocean (meteorology and oceanography) modelling has been the topic of ongoing deliberation for decades. The applications range from long-term atmospheric and ocean hindcast simulations to the fast-responding simulations related to operational forecasting. Long-duration simulations are usually associated with climate-change-related research, with simulation periods of at least 30 years across multiple spatial and temporal resolutions needed to capture key oscillations (Babatunde et al., 2013). Such hindcasts are frequently used by coastal and offshore engineering consultancies for purposes such as those related to infrastructure design (Kamphuis, 2020) or environmental impact assessments (Frihy, 2001; Liu et al., 2013).

Operational (or forecasting) agencies are usually concerned with achieving simulation speeds that would allow them to accurately forewarn their stakeholders of immediate, imminent and upcoming metocean hazards. The main stakeholders are usually other governmental agencies (e.g. disaster response or environmental affairs departments), commercial entities, and the public. Both atmospheric and marine forecasts share similar numerical schemes that solve the governing equations and thus share a similar need for computational efficiency. Fast simulation times are also required for other forecasting fields such as hydrological dam-break models (e.g. Zhang et al., 2014). Significant advancement in operational forecasting can be made by examining the way in which the code interfaces with the computational nodes, and how results are stored during simulation.
Numerous operational agencies (both private and public) make use of Simulating WAves in the Nearshore (SWAN) to predict nearshore wave dynamics (refer to Genseberger and Donners, 2020, for details regarding the SWAN numerical code and solution schemes). These agencies include the South African Weather Service (e.g. Rautenbach et al., 2020), MetOcean Solutions (a division of the MetService of New Zealand) (e.g. de Souza et al., 2020), the United Kingdom Met Office (e.g. O'Neill et al., 2016) and the Norwegian Meteorological Institute (e.g. Jeuring et al., 2019). In general, these agencies have substantial computational facilities but nonetheless still face the challenge of optimizing the use of their computational clusters between various models (being executed simultaneously). These models may include atmospheric models (e.g. the Weather Research and Forecasting (WRF) model), hydrodynamic models (e.g. the Regional Ocean Modeling System (ROMS) and the Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)) and spectral wave models (e.g. Wave Watch III (WW3) and SWAN; Holthuijsen, 2007; The SWAN team, 2006, 2019a, b). Holthuijsen (2007) presents a theoretical background to the spectral wave equations, wave measurement techniques and statistics, as well as a concluding chapter on the theoretical analysis of the SWAN numerical model. There must also be a balance between hindcast and forecast priorities and client needs. Some of these agencies use a regular grid (instead of irregular grids, e.g. Zhang et al., 2016), with nested domains in many of their operational and hindcast projects. Here, we focus only on the computational performance of a structured regular grid (typically implemented for spectral wave models).

Kerr et al. (2013) performed an inter-model comparison of computational efficiencies by comparing SWAN, coupled with ADCIRC, and the NOAA official storm surge forecasting model, Sea, Lake, and Overland Surges from Hurricanes (SLOSH); however, they did not investigate the optimal thread usage of a single model. Other examples of coupled wave and storm surge model computational benchmarking experiments include Tanaka et al. (2011) and Dietrich et al. (2012), who used unstructured meshes to simulate waves during Hurricanes Katrina, Rita, Gustav and Ike in the Gulf of Mexico. Results from these models were presented on a log–log scale, and their experimental design tested computational thread numbers not easily obtainable by smaller agencies and companies. The latter rather require sequential versus parallel computational efficiencies using smaller-scale efficiency metrics. Genseberger and Donners (2015) explored the scalability of SWAN using a case study focused on the Wadden Sea in the Netherlands. By investigating the efficiency of both the OpenMP (OMP) and MPI versions of the then current SWAN, they found that the OpenMP version was more efficient on a single node. They also proposed a hybrid version of SWAN to combine the strengths of both implementations of SWAN: using OpenMP to more optimally share memory and MPI to distribute memory over the computational nodes.

Here we build on the case study of Genseberger and Donners using results produced in the present study for southern Africa, to answer the following research questions: (1) when using SWAN, is it always better to have as many threads/cores as possible available to solve the problem at hand? (2) What is the speed-up relationship between number of threads/cores and computational grid size? (3) At what point (number of threads/cores) do the domain sub-communications start to make the whole computation less effective? (4) What is the scalability of a rectangular computational grid in a typical SWAN set-up?

2 Methodology and background

Details of the model configuration can be found in Rautenbach et al. (2020a, b). The computational domain (refer to Fig. 1) and physics used here were the same as presented in those studies.

Figure 1. SWAN model extent and associated bathymetry. The locations of all the major coastal towns are also provided via acronyms as follows: Port Nolloth (PN), Saldanha Bay (SB), Cape Town (CT), Mossel Bay (MB), Port Elizabeth (PE), East London (EL), Durban (DN) and Richards Bay (RB).

All computations were performed on Intel Xeon Gold E5-2670 computational nodes with a 2.3 GHz clock speed. A total of 28 threads/cores, each with 96 GB RAM, was used with 1 Gbyte inter-thread communication speed. Given that the present study was performed using a single computational node, inter-node communication speeds are not considered. Thus, given a computational node with similar processing speed, the present study should be reproducible. In general, these node specifications are reasonably standard, and therefore the present study is an acceptable representation of the SWAN scalability parameters.

SWAN 40.91 was implemented with the Van der Westhuysen whitecapping formulation (van der Westhuysen et al., 2007) and the Collins bottom friction formulation (Collins, 1972) with a coefficient value of 0.015. Fully spectral wave boundary conditions were extracted from a global Wave Watch III model at 0.5 geographical degree resolution.

Here, the main aim was not the validation of the model but rather to quantify the relative computational scalabilities, as described at the end of the previous section. However, it should be noted that no nested domains were employed during the present study. Only the parent domain was used as a measure for scalability. The computational extent given in Rautenbach et al. (2020a, b) contains numerous non-wet grid cells that are not included in the computational expense of the current study. In Table 1, the size of the computational domain and resolution, together with the labelling convention, is given. For clarity, we define the resolutions as low, medium and high, denoted L, M and H, respectively, in the present study (noting that, given the domain size, these resolutions would be classified as intermediate to high regional resolution for operational purposes).

Table 1. SWAN grid resolution, grid cell numbers and reference labels.

Label   SWAN grid resolution (geographical degrees)   Computational grid cell number
L       0.1000                                        31 500
M       0.0625                                        91 392
H       0.0500                                        142 800

The test for scalability used here was the ability of the model to respond to an increased number of computations with an increasing amount of resources. In the present study these resources are computational threads/cores. Computations were performed for an arbitrary week to assess model performance. Model spin-up was done via a single stationary computation. The rest of the computation was performed using a non-stationary computation with an hourly time step, which implied that wind-wave generation within the model occurred on the timescale of the wind forcing resolution. The grid resolutions used in the present study corresponded to 0.1, 0.0625 and 0.05 geographical degrees. Local bathymetric features were typically resolved through downscaled, rotated, rectangular grids, following the methodology employed by Rautenbach et al. (2020a). A nested resolution increase of more than 5 times is also not recommended, given that the regional model is nested in the global Wave Watch III output at 0.5 geographical degree resolution (Rautenbach et al., 2020a). Given these constraints, these resolutions represent a realistic and typical SWAN model set-up, for both operational and hindcast scenarios.
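For reference, the kind of benchmarking loop used to collect wall-clock times over a range of thread/core counts can be sketched as follows. This is a minimal illustration only: it assumes SWAN is launched through the swanrun wrapper script distributed with the source code (see the implementation manual, The SWAN team, 2019a), and the -omp/-mpi flags, the input file name za_operational and the list of thread counts are assumptions made for this sketch rather than details taken from the present configuration.

```python
# Sketch of a wall-clock timing sweep over thread/core counts for a SWAN run.
# Assumptions (not taken from the paper): SWAN is started via the "swanrun"
# wrapper with "-omp n" (OpenMP) or "-mpi n" (MPI), and "za_operational" is a
# hypothetical input file name.
import csv
import subprocess
import time

THREAD_COUNTS = [1, 2, 4, 6, 8, 12, 16, 20, 24, 28]
MODES = {"omp": "-omp", "mpi": "-mpi"}

def run_once(mode_flag: str, n_threads: int) -> float:
    """Run one SWAN simulation and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        ["swanrun", "-input", "za_operational", mode_flag, str(n_threads)],
        check=True,
    )
    return time.perf_counter() - start

if __name__ == "__main__":
    with open("swan_timings.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["mode", "threads", "wall_clock_s"])
        for mode, flag in MODES.items():
            for p in THREAD_COUNTS:
                writer.writerow([mode, p, run_once(flag, p)])
```

Each (mode, p) combination yields one wall-clock time Tp, from which the metrics defined below can be computed.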
The three main metrics for estimating computational efficiency are the speed-up, time-saving and efficiency ratios. A fourth parameter, and arguably the most important, is the scalability, which is estimated using the other three parameters as metrics. The speed-up ratio is given as

Sp = T1 / Tp,   (1)

where T1 is the time in seconds it takes for a sequential computation on one thread and Tp is the time a simulation takes with p computational threads/cores (Zhang et al., 2014). The time-saving ratio is given by

S'p = (T1 − Tp) / T1,   (2)

and the efficiency ratio is defined as

Ep = Sp / p.   (3)

The scalability of SWAN was tested based on the speed-up ratios for the grid resolutions in Table 1.
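As a worked illustration of Eqs. (1)–(3), the following sketch converts a set of measured wall-clock times into the three metrics. The timing values are hypothetical placeholders, not results from the present study.

```python
# Sketch: performance metrics of Eqs. (1)-(3) from measured wall-clock times.
# T1 is the sequential (single-thread) time; Tp the time on p threads/cores.
# The timing values below are illustrative placeholders only.

def speed_up(t1: float, tp: float) -> float:
    """Speed-up ratio, Eq. (1): Sp = T1 / Tp."""
    return t1 / tp

def time_saving(t1: float, tp: float) -> float:
    """Time-saving ratio, Eq. (2): S'p = (T1 - Tp) / T1."""
    return (t1 - tp) / t1

def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency ratio, Eq. (3): Ep = Sp / p."""
    return speed_up(t1, tp) / p

if __name__ == "__main__":
    t1 = 3600.0                                            # hypothetical T1 [s]
    timings = {2: 1900.0, 6: 700.0, 12: 420.0, 28: 260.0}  # hypothetical Tp [s]
    for p, tp in sorted(timings.items()):
        print(f"p={p:2d}  Sp={speed_up(t1, tp):5.2f}  "
              f"S'p={time_saving(t1, tp):5.2f}  Ep={efficiency(t1, tp, p):5.2f}")
```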
Zafari et al. (2019) recently presented some of the first results investigating the effect of different compilers on the scalability of a shallow water equation solver. Their experiments compared a model compiled with the GNU Compiler Collection (gcc) 7.2.0 and linked with OpenMPI against one compiled with the Intel C++ compiler and linked with Intel MPI, for relatively small computational problems. Their numerical computation considered models with 600, 300 and 150 K grid cell sizes (what they called matrix size). These computational grid sizes were deemed "small", but they still acknowledged the significant computational resources required to execute geographical models of this size due to the large number of time steps undertaken to solve these problems.

From a practical point of view, regular SWAN grids will rarely be used in dimensions exceeding the resolutions presented in the previous section. The reason for this statement is twofold: (1) downscaling a spectral wave model from a global resolution to a regional resolution should not exceed a 5-fold refinement factor, and (2) when reasonably high resolutions are required in the nearshore (to take complex bathymetric features into account), nested domains are preferred. The reasoning will be different for an unstructured grid approach (Dietrich et al., 2012). Given these limitations of the widely used structured SWAN grid approach, SWAN grids will almost exclusively be deemed a low spatial computational demand model. Small tasks create a sharp drop in performance via the Intel C++ compiler due to the "work stealing" algorithm, aimed at balancing out the computational load between threads/cores (Zafari et al., 2019). In this scenario, the threads/cores compete against each other, resulting in an unproductive simulation. In their experiments, each task performed via Intel was approximately 13 times faster, but the overall performance was 16 times slower than the equivalent gcc version of the compiled shallow water model presented by Zafari et al. (2019).

3 Results

In Fig. 2, the computational scalability of SWAN is given as a function of the number of computational threads/cores. Figure 2a shows the computational time in seconds; here the model resolutions group together, with little differentiation between them. These results also highlight the need for performance metrics, as described in the previous section. From Fig. 2b, the MPI version of SWAN is more efficient for all the computational domain sizes. There is also a clear grouping between OMP and MPI. Figure 2c presents the speed-up ratios and clearly indicates that the MPI version of SWAN outperforms the OMP version. The closer the results are to the 1:1 line, the better the scalability. Near-linear speed-up is observed for a small number of computational threads/cores. This result agrees with those reported by Zafari et al. (2019). In Fig. 2d the results are expressed via the time-saving ratio. In this case, the curves start to asymptote at thread counts larger than approximately 6.

4 Discussion

The behaviour noted in the results is similar to the dam-breaking computational results reported by Zhang et al. (2014). Genseberger and Donners (2020) present the latest findings on the scalability and benchmarking of SWAN. However, their focus was quantifying the performance of their new hybrid version of SWAN. In their benchmarking experiments (for the Wadden Sea, in the Netherlands), they obtained different results to Fig. 2a, with OMP generally producing faster wall-clock computational times. They also considered the physical distances between computational threads/cores and found that this parameter has a negligible effect compared to differences between OMP and MPI, over an increasing number of threads/cores. Their benchmarking also differed from the results presented here as they only provided results as a function of node number. Each one of their nodes consisted of 24 threads/cores. In the present study, the benchmarking of a single node (28 threads/cores) is evaluated against a serial computation on a single thread. For benchmarking, without performance metrics, they found that the wall-clock times, for the iterations and not a full simulation, reached a minimum (for large computational domains) at 16 nodes (16 × 24 threads/cores) for the MPI SWAN and 64 nodes (64 × 24 threads/cores) for the hybrid SWAN. These results were based on using the Cartesius 2690 v3 (Genseberger and Donners, 2020). With the hybrid SWAN, the optimal wall-clock time turn point, for iterations, increased with an increased number of computational cells. All of the reported turn points (optimal points) occurred at node counts well above 4 nodes (4 × 24 threads/cores). The wall-clock performance estimation of Genseberger and Donners (2015) did, however, indicate similar results to those presented in Fig. 2a, with OMP running faster than MPI for small thread/core counts. For larger thread/core counts, MPI performs better in the present study. This difference in performance is probably related to the particular hardware configuration (Genseberger and Donners, 2015). It must still be noted that, with an increased number of nodes, and thus threads/cores, the total computational time should continue to decrease up until the point where the internal domain decomposition and communication inefficiencies start to outweigh the gain in computational power. Based on the results of Genseberger and Donners (2020), we can estimate that, for our node configuration and region of interest, the communication inefficiencies will become dominant at approximately 16 nodes (16 × 24 threads/cores).

One of the possible explanations for the non-perfect speed-up observed in Fig. 2c is related to the computational domain partition methods used, and the wet and dry (or active and inactive) point definitions in the model. In the present study the dry points were the bathymetry or topography values above mean sea level (MSL). The employed partition method is currently stripwise because of the underlying parallel technique, namely the wavefront method (Genseberger and Donners, 2015; Zijlema, 2005). The stripwise partition is thus potentially not the most effective method to optimize speed-up. In the present study this partition leads to an optimal point around six threads/cores without losing too much parallel efficiency. In general, increasing the number of threads/cores will still produce results faster, but in an increasingly inefficient manner. This trend is clear from Fig. 2c and d, where the total computational time (speed-up ratio, time-saving ratio) does not scale linearly with the increasing number of threads/cores. The ideal scenario (linear scalability) would be if the computational results followed the 1:1 line in Fig. 2c. In Fig. 2d the non-linear flattening of the time-saving ratio is also evident, although the ratio still slightly increases beyond six threads/cores. This result implies the total computational time will marginally decrease with an increasing number of threads/cores. This marginal decrease in computational time could, however, still make significant differences in the total simulation times when extensive simulation periods are considered.
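The notion of a "turn point" discussed above can be made operational with a simple heuristic: scan the measured speed-up curve and stop once the marginal speed-up per added thread/core falls below a chosen threshold. The sketch below illustrates this idea with hypothetical timings; it is not the procedure used in the present study, and the 0.1 threshold is an arbitrary choice.

```python
# Heuristic sketch: locate the thread count where adding more threads/cores
# stops paying off, i.e. where the marginal speed-up per extra thread drops
# below a chosen threshold. The timings are hypothetical placeholders.

def optimal_threads(timings: dict[int, float], min_marginal_gain: float = 0.1) -> int:
    """Return the largest thread count whose marginal speed-up gain per added
    thread/core still meets `min_marginal_gain`."""
    counts = sorted(timings)
    t1 = timings[counts[0]]          # assumes the first entry is the 1-thread run
    best = counts[0]
    prev_speedup = t1 / timings[counts[0]]
    for prev, p in zip(counts, counts[1:]):
        speedup = t1 / timings[p]
        marginal = (speedup - prev_speedup) / (p - prev)
        if marginal < min_marginal_gain:
            break
        best = p
        prev_speedup = speedup
    return best

if __name__ == "__main__":
    timings = {1: 3600.0, 2: 1900.0, 4: 1050.0, 6: 700.0,
               12: 640.0, 20: 600.0, 28: 580.0}   # hypothetical wall-clock times [s]
    print("suggested thread/core count:", optimal_threads(timings))
```

With the placeholder data above the heuristic suggests six threads/cores, consistent with the qualitative behaviour described in the discussion, but the outcome depends entirely on the measured timings and the chosen threshold.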
Figure 2. Model performance as a function of the number of computational threads/cores. (a) Computing time in seconds, (b) efficiency (Eq. 3), (c) speed-up ratio (Eq. 1) and (d) the time-saving ratio (Eq. 2).

5 Conclusions

The present study investigated the scalability of SWAN, a widely used spectral wave model. Three typical wave model resolutions were used for these purposes. Both the OpenMP (OMP) and the Message Passing Interface (MPI) implementations of SWAN were tested. The scalability is presented via three performance metrics: the efficiency, speed-up ratio and time-saving ratio. The MPI version of SWAN outperformed the OMP version based on all three metrics. The MPI version of SWAN performed best with the largest computational domain resolution, resulting in the highest speed-up ratios. The time-saving ratio indicated a decrease in gains after approximately six computational threads/cores. This result suggests that six threads/cores are the most effective configuration for executing SWAN. The largest increases in speed-up and efficiency were observed with small thread counts. According to Genseberger and Donners (2020), computational times decrease up to ∼ 16 nodes (16 × 24 threads/cores), indicating the wall-clock optimal computational time for their case study. This result suggests that multiple nodes will be required to reach the optimal wall-clock computational time – even though this turn point might not be the most efficient computational configuration. Ultimately, the efficiencies recommended here can improve operational performance substantially, particularly when implemented over the range of modelling software needed to produce useful metocean forecasts. Future studies might consider investigating the scalability employing a gcc compiler.

Code and data availability. The open-source version of SWAN was run for the purposes of the present study. SWAN may be downloaded from http://swanmodel.sourceforge.net/ (SWAN, 2020). To ensure a compatible version of SWAN remains available, the current, latest version of SWAN is permanently archived at https://hdl.handle.net/10289/14269 (SWAN, 2021). The bathymetry used for the present study may be downloaded from https://www.gebco.net/ (GEBCO, 2020), and the wind forcing may be found at https://climatedataguide.ucar.edu/climate-data/climate-forecast-system-reanalysis-cfsr (National Center for Atmospheric Research Staff, 2017).

Author contributions. CR conceptualized the study, executed the experiments and wrote the paper. He also secured the publication funding. JCM and KRB reviewed the paper.
Competing interests. The authors declare that they have no conflict of interest.

Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Financial support. This research was funded by the National Research Foundation of South Africa (grant number 116359).

Review statement. This paper was edited by Julia Hargreaves and reviewed by two anonymous referees.

References

Babatunde, A., Pascale, B., Sin, C. C., William, C., Peter, C., Fatima, D., Seita, E., Veronika, E., Forest, C., Peter, G., Eric, G., Christian, J., Vladimir, K., Chris, R., and Markku, R.: Evaluation of Climate Models, in: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, edited by: Stocker, T. F., Qin, D., Plattner, G.-K., Tignor, M., Allen, S. K., Boschung, J., Nauels, A., Xia, Y., Bex, V., and Midgley, P. M., Cambridge University Press, Cambridge, UK, 2013.

Collins, J. I.: Prediction of shallow-water spectra, J. Geophys. Res., 77, 2693–2707, https://doi.org/10.1029/JC077i015p02693, 1972.

de Souza, J. M. A. C., Couto, P., Soutelino, R., and Roughan, M.: Evaluation of four global ocean reanalysis products for New Zealand waters – A guide for regional ocean modelling, New Zeal. J. Mar. Fresh., 55, 132–155, https://doi.org/10.1080/00288330.2020.1713179, 2020.

Dietrich, J. C., Tanaka, S., Westerink, J. J., Dawson, C. N., Luettich, R. A., Zijlema, M., Holthuijsen, L. H., Smith, J. M., Westerink, L. G., and Westerink, H. J.: Performance of the Unstructured-Mesh, SWAN+ADCIRC Model in Computing Hurricane Waves and Surge, J. Sci. Comput., 52, 468–497, https://doi.org/10.1007/s10915-011-9555-6, 2012.

Frihy, O. E.: The necessity of environmental impact assessment (EIA) in implementing coastal projects: lessons learned from the Egyptian Mediterranean Coast, Ocean Coast. Manage., 44, 489–516, https://doi.org/10.1016/S0964-5691(01)00062-X, 2001.

GEBCO: General Bathymetric Chart of the Oceans, available at: https://www.gebco.net/, last access: 2 February 2020.

Genseberger, M. and Donners, J.: A Hybrid SWAN Version for Fast and Efficient Practical Wave Modelling, Procedia Comput. Sci., 51, 1524–1533, https://doi.org/10.1016/j.procs.2015.05.342, 2015.

Genseberger, M. and Donners, J.: Hybrid SWAN for Fast and Efficient Practical Wave Modelling – Part 2, edited by: Krzhizhanovskaya, V. V., Závodszky, G., Lees, M. H., Dongarra, J. J., Sloot, P. M. A., Brissos, S., and Teixeira, J., Springer International Publishing, Cham, vol. 12139, 87–100, 2020.

Holthuijsen, L. H.: Waves in Oceanic and Coastal Waters, Cambridge University Press, Technische Universiteit Delft, the Netherlands, https://doi.org/10.1017/CBO9780511618536, 2007.

Jeuring, J., Knol-kauffman, M., and Sivle, A.: Toward valuable weather and sea-ice services for the marine Arctic: exploring user–producer interfaces of the Norwegian Meteorological, Polar Geogr., 43, 139–159, https://doi.org/10.1080/1088937X.2019.1679270, 2019.

Kamphuis, J. W.: Introduction to coastal engineering and management – Advanced series on ocean engineering, World Scientific Publishing Co. Pte. Ltd., Singapore, Vol. 48, 2020.

Kerr, P. C., Donahue, A. S., Westerink, J. J., Luettich, R. A., Zheng, L. Y., Weisberg, R. H., Huang, Y., Wang, H. V., Teng, Y., Forrest, D. R., Roland, A., Haase, A. T., Kramer, A. W., Taylor, A. A., Rhome, J. R., Feyen, J. C., Signell, R. P., Hanson, J. L., Hope, M. E., Estes, R. M., Dominguez, R. A., Dunbar, R. P., Semeraro, L. N., Westerink, H. J., Kennedy, A. B., Smith, J. M., Powell, M. D., Cardone, V. J., and Cox, A. T.: U.S. IOOS coastal and ocean modeling testbed: Inter-model evaluation of tides, waves, and hurricane surge in the Gulf of Mexico, J. Geophys. Res.-Oceans, 118, 5129–5172, https://doi.org/10.1002/jgrc.20376, 2013.

Liu, T. K., Sheu, H. Y., and Tseng, C. N.: Environmental impact assessment of seawater desalination plant under the framework of integrated coastal management, Desalination, 326, 10–18, https://doi.org/10.1016/j.desal.2013.07.003, 2013.

National Center for Atmospheric Research Staff (Eds.): The Climate Data Guide: Climate Forecast System Reanalysis (CFSR), available at: https://climatedataguide.ucar.edu/climate-data/climate-forecast-system-reanalysis-cfsr (last access: 1 June 2020), 2017.

O'Neill, C., Saulter, A., Williams, J., and Horsburgh, K.: NEMO-surge: Application of atmospheric forcing and surge evaluation, Technical report 619, available at: http://www.metoffice.gov.uk/binaries/content/assets/mohippo/pdf/library/frtr_619_2016p.pdf (last access: 1 February 2020), 2016.

Rautenbach, C., Daniels, T., de Vos, M., and Barnes, M. A.: A coupled wave, tide and storm surge operational forecasting system for South Africa: validation and physical description, Nat. Hazards, 103, 1407–1439, https://doi.org/10.1007/s11069-020-04042-4, 2020a.

Rautenbach, C., Barnes, M. A., Wang, D. W., and Dykes, J.: Southern African wave model sensitivities and accuracies, J. Mar. Sci. Eng., 8, 77, https://doi.org/10.3390/jmse8100773, 2020b.

SWAN (Simulating WAves Nearshore): a third generation wave model, Delft University of Technology, available at: http://swanmodel.sourceforge.net/, last access: 18 September 2020.

SWAN (Simulating WAves Nearshore): a third generation wave model, Delft University of Technology, available at: https://hdl.handle.net/10289/14269, last access: 1 April 2021.

Tanaka, S., Bunya, S., Westerink, J. J., Dawson, C., and Luettich, R. A.: Scalability of an unstructured grid continuous Galerkin based hurricane storm surge model, J. Sci. Comput., 46, 329–358, https://doi.org/10.1007/s10915-010-9402-1, 2011.

The SWAN team: User Manual SWAN Cycle III version 40.51, available at: http://falk.ucsd.edu/modeling/swanuse.pdf (last access: 1 March 2021), 2006.
The SWAN team: Implementation Manual SWAN Cycle III version 41.31, available at: http://swanmodel.sourceforge.net/online_doc/swanimp/swanimp.html (last access: 1 March 2021), 2019a.

The SWAN team: User Manual SWAN Cycle III version 41.31, available at: http://swanmodel.sourceforge.net/download/zip/swanimp.pdf (last access: 2 July 2021), 2019b.

van der Westhuysen, A. J., Zijlema, M., and Battjes, J. A.: Nonlinear saturation-based whitecapping dissipation in SWAN for deep and shallow water, Coast. Eng., 54, 151–170, https://doi.org/10.1016/j.coastaleng.2006.08.006, 2007.

Zafari, A., Larsson, E., and Tillenius, M.: DuctTeip: An efficient programming model for distributed task-based parallel computing, Parallel Comput., 90, 102582, https://doi.org/10.1016/j.parco.2019.102582, 2019.

Zhang, S., Xia, Z., Yuan, R., and Jiang, X.: Parallel computation of a dam-break flow model using OpenMP on a multi-core computer, J. Hydrol., 512, 126–133, https://doi.org/10.1016/j.jhydrol.2014.02.035, 2014.

Zhang, Y. J., Ye, F., Stanev, E. V., and Grashorn, S.: Seamless cross-scale modeling with SCHISM, Ocean Model., 102, 64–81, https://doi.org/10.1016/j.ocemod.2016.05.002, 2016.

Zijlema, M.: Parallelization of a nearshore wind wave model for distributed memory architectures, in: Parallel Computational Fluid Dynamics 2004, Elsevier, 207–214, 2005.