Geosciences, and numerical weather prediction in particular, demand the highest levels of available computer power. The European Centre for Medium-Range Weather Forecasts, with its experience in using supercomputers in this field, organizes a workshop every other year, bringing together manufacturers, computer scientists, researchers and operational users to share their experiences and to learn about the latest developments. This book provides an excellent overview of the latest achievements in, and plans for, the use of new parallel techniques in meteorology, climatology and oceanography.
The proceedings have been selected for coverage in:
• Index to Scientific & Technical Proceedings (ISTP CD-ROM version / ISI Proceedings)
https://doi.org/10.1142/9789812704832_fmatter
PREFACE.
CONTENTS.
https://doi.org/10.1142/9789812704832_0001
No abstract received.
https://doi.org/10.1142/9789812704832_0002
During the 1990s, the Met Office made the transition from a traditional shared-memory vector supercomputer to a massively parallel, distributed-memory, RISC-based supercomputer. The Cray T3Es used by the Met Office since 1996 have been a great success, and the Met Office has shown that it is possible to run a world-class operational NWP and climate prediction programme on such an architecture.
In mid-2002, the Met Office announced that its next supercomputer would be a vector machine, the NEC SX-6. This paper gives some background on how this decision was made, shows some initial results from the SX-6 and outlines some of the challenges and opportunities that we will encounter over the coming years.
https://doi.org/10.1142/9789812704832_0003
A spectral atmospheric general circulation model called AFES, intended for climate studies, was developed and optimized for the Earth Simulator. The model is a global, three-dimensional, hydrostatic model using the spectral transform method. We achieved extremely high efficiency executing AFES at the T1279L96 resolution on the Earth Simulator: performances of 26.58 Tflops for a 10-timestep run and 23.93 Tflops for a 1-day simulation run were achieved using all 5120 processors. These correspond to 64.9% and 58.4%, respectively, of the theoretical peak performance of 40.96 Tflops. The T1279 resolution, equivalent to a horizontal grid interval of about 10 km at the equator, is very close to the highest resolution at which the hydrostatic approximation is valid. To the best of our knowledge, no other model simulation of the global atmosphere has ever been performed at such high resolution; currently, such a simulation is possible only on the Earth Simulator with AFES. In this paper we describe the vector-parallel programming, the optimization methods and the computational performance.
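For reference, the quoted efficiencies are simply the ratio of the sustained rate to the 40.96 Tflops peak of all 5120 processors:

\[
  \frac{26.58\ \text{Tflops}}{40.96\ \text{Tflops}} \approx 64.9\%,
  \qquad
  \frac{23.93\ \text{Tflops}}{40.96\ \text{Tflops}} \approx 58.4\%.
\]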
https://doi.org/10.1142/9789812704832_0004
Following the development of an atmospheric general circulation model that runs very efficiently on the Earth Simulator, three meso-scale-resolving global 10-km mesh simulations were performed. Three meso-scale phenomena were chosen as simulation and research targets: typhoon genesis, wintertime cyclogenesis and the Baiu-Meiyu frontal zone. A brief summary of the results is given in this paper. Generally speaking, the results are realistic, and the figures of the simulated precipitation fields may look like images synthesized from satellite observations. The results are very encouraging and suggest the usefulness of such ultra-high-resolution global simulations for studies of, for example, the interaction between large-scale circulation and meso-scale disturbances. The rationale for this kind of simulation is also discussed.
https://doi.org/10.1142/9789812704832_0005
FRSGC is developing a series of ocean general circulation models (OGCMs) on the Earth Simulator. The purpose of this development is to create OGCMs optimized for high-resolution calculations. As a first step, we focus on developing a model with high computational performance on the Earth Simulator. The model is well vectorized and parallelized and shows high computational performance. In this paper, the computational methods and performance are described.
https://doi.org/10.1142/9789812704832_0006
We have developed a 4D-Var global ocean data assimilation system which works efficiently on the Earth Simulator. The computational efficiency has been improved mainly by effective parallelization of the component codes and by a simple preconditioning of the gradient method. With this system we can reproduce dynamically consistent ocean states at high resolution, which enable us to examine the detailed dynamical features of the ocean circulation and also give us precise information for ocean state prediction.
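As an illustration of what a "simple preconditioning of the gradient method" can mean in this setting, the toy sketch below minimises a variational-style quadratic cost with a diagonally preconditioned gradient iteration. It is not the authors' system: the operators B, R, H, the observations and the preconditioner are made-up placeholders.

```python
# Toy preconditioned gradient minimisation of a variational cost
# J(x) = 1/2 (x-xb)' B^-1 (x-xb) + 1/2 (Hx-y)' R^-1 (Hx-y).
# All quantities are synthetic; this is a sketch, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 20                                   # state size, number of observations
B = np.diag(rng.uniform(0.5, 2.0, n))           # background error covariance (toy)
R = 0.1 * np.eye(m)                             # observation error covariance (toy)
H = rng.standard_normal((m, n)) / np.sqrt(n)    # linear observation operator (toy)
xb = np.zeros(n)                                # background state
y = H @ rng.standard_normal(n)                  # synthetic observations

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
A = Binv + H.T @ Rinv @ H                       # Hessian of the quadratic cost
precond = 1.0 / np.diag(A)                      # simple diagonal preconditioner

def grad(x):
    """Gradient of the quadratic cost J at x."""
    return Binv @ (x - xb) + H.T @ Rinv @ (H @ x - y)

x = xb.copy()
for it in range(500):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:
        break
    d = precond * g                             # preconditioned descent direction
    alpha = (g @ d) / (d @ (A @ d))             # exact line search for a quadratic
    x -= alpha * d

print(f"stopped after {it} iterations, |grad| = {np.linalg.norm(grad(x)):.2e}")
```

The preconditioner merely rescales the gradient so that all control variables converge at comparable rates, which is the kind of inexpensive change that can sharply reduce the iteration count of the minimisation.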
https://doi.org/10.1142/9789812704832_0007
No abstract received.
https://doi.org/10.1142/9789812704832_0008
Today many computational problems are within the reach of more people owing to advances in computers designed for the mass market. Each year computer companies deliver higher computer speeds, more memory, larger disks, and faster networking at a decreased cost. This reality has led many to build large clusters of Pentium-class computers interconnected with fast networking, programmed with message-passing APIs, and run under the Linux operating system. These high-performance clusters built with commodity off-the-shelf (COTS) hardware have been assembled for much less than the yearly maintenance costs of large, commercial parallel-processor machines. Universities and research organizations with modest budgets can now use computer clusters to obtain compute performance previously only attainable on expensive mainframe computers. The Naval Research Lab’s Marine Meteorology Division has started evaluating Linux clusters as possible replacement platforms to run its weather models at the lab and at the U.S. Navy’s regional data centers. This paper describes our experiences with this effort, outlines some of the practical considerations, and compares the performance against similar cases run on the SGI and IBM computers.
https://doi.org/10.1142/9789812704832_0009
An international collaborative project to address a growing need for remote access to real-time and retrospective high-volume numerical weather prediction and global climate model (Atmosphere-Ocean General Circulation Model) data sets is described. This paper describes the framework, goals, benefits, and collaborators of the NOAA Operational Model Archive and Distribution System (NOMADS) pilot project. The National Climatic Data Center (NCDC) initiated NOMADS, along with the National Centers for Environmental Prediction (NCEP) and the Geophysical Fluid Dynamics Laboratory (GFDL). A description of operational and research data access needs is provided, as outlined in the U.S. Weather Research Program (USWRP) Implementation Plan for Research in Quantitative Precipitation Forecasting and Data Assimilation, to “redeem practical value of research findings and facilitate their transfer into operations”.
https://doi.org/10.1142/9789812704832_0010
The High Performance Computing (HPC) environment at the Fleet Numerical Meteorology and Oceanography Center (FNMOC) is marching towards the teraflop performance level to support the requirements defined in our meteorological and oceanographic (METOC) “Modeling Roadmap”. A new concept, Net-Centric Warfare (NCW), has profound implications for the way the Navy and Joint Forces will fight wars. In an NCW scenario, dispersed warfighters will take advantage of timely and collocated data and information to create a common operating picture of the battlespace. These developments become driving requirements for a scalable and flexible infrastructure to serve the upstream and downstream data ingest, data distribution, product generation, and product delivery functions of the HPC environment. We describe our approach to designing an e-business infrastructure that is independent of platform, operating system, and programming model. It will be an infrastructure that interfaces to our HPC environment, scales to handle ever-increasing data loads, supports dynamic reuse of software components, and provides distributed warfighters a means to achieve information superiority.
https://doi.org/10.1142/9789812704832_0011
No abstract received.
https://doi.org/10.1142/9789812704832_0012
The CrossGrid project is one of the ongoing research projects involving Grid technology. One of the main tasks in the meteorological applications package is the implementation of data mining systems for the analysis of operational and reanalysis databases of atmospheric circulation patterns. Previous parallel data mining algorithms reported in the literature focus on parallel computers with predetermined resources (processing units) and high-performance communications. The main goal in this project is designing adaptive schemes for distributing data and computational load according to the changing resources available for each Grid job submitted. In this paper, some preliminary work regarding two different data mining algorithms (self-organizing maps and smoothing filters) is presented. These techniques can be used in combination with databases of observations to provide downscaled local forecasts from operational model outputs. This is a more general and practical framework in which to view data mining techniques from the meteorological point of view.
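As a concrete, purely illustrative example of one of the two techniques mentioned, the sketch below trains a small self-organizing map on synthetic "circulation pattern" vectors. The CrossGrid implementation, its adaptive Grid scheduling and its real reanalysis data are not reproduced here.

```python
# Minimal self-organizing map (SOM) sketch on toy data; illustrative only.
import numpy as np

def train_som(data, rows=5, cols=5, iters=2000, lr0=0.5, sigma0=2.0):
    """Classic on-line SOM training on data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(0)
    n_features = data.shape[1]
    weights = rng.standard_normal((rows, cols, n_features))
    grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))

    for t in range(iters):
        frac = t / iters
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
        x = data[rng.integers(len(data))]
        # Best-matching unit: prototype closest to the sample.
        bmu = np.unravel_index(
            np.argmin(((weights - x) ** 2).sum(axis=2)), (rows, cols))
        # Gaussian neighbourhood around the BMU on the map grid.
        dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-dist2 / (2 * sigma ** 2))[:, :, None]
        weights += lr * h * (x - weights)
    return weights

if __name__ == "__main__":
    toy = np.random.default_rng(1).standard_normal((500, 10))  # fake patterns
    som = train_som(toy)
    print("trained prototype array:", som.shape)               # (5, 5, 10)
```

Each of the 5 × 5 prototypes ends up representing a cluster of similar patterns; linking such clusters to local observation records is one way the pattern-based downscaling described in the abstract can be realised.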
https://doi.org/10.1142/9789812704832_0013
Current developments towards an e-Science environment for environmental science, integrating data discovery and retrieval, computation and visualisation, will be presented. The paper focuses on three developments of the CLRC e-Science Centre: the DataPortal, the HPCPortal and the VisualisationPortal. The DataPortal technology is to be used, for example, by all departments of CLRC (the Central Laboratory of the Research Councils of the UK), by the Natural Environment Research Council DataGrid and by the Environment from the Molecular Level project. The HPCPortal will provide access to code libraries and compute resources on the UK Science Grid. The VisualisationPortal, finally, is to be used, for example, by the projects mentioned above and by the GODIVA project to provide access to suitable visualisation tools. Our aim is to provide easy access to, and support for the use of, data and substantial computing and visualisation resources across Europe by using Grid technologies such as grid services and Globus, via user-configurable web access points (personal workbenches).
https://doi.org/10.1142/9789812704832_0014
This paper identifies, and suggests solutions to, the key coupling requirements of Climate and Forecast model developers. These requirements have been extracted primarily from meetings with Met Office model developers but are also influenced by discussions with developers from NCAR, CERFACS and UGAMP.
https://doi.org/10.1142/9789812704832_0015
In 1984, the first workshop on the Use of Parallel Processors in Meteorology, initiated by the author, considered the problem of how to use a small number (≤ 4) of parallel processors in a shared-memory environment for numerical meteorological models. By the tenth workshop in 2002, the number of processors had grown to thousands, configured as clusters of SMP systems. Over the same period, computational peak rates have risen from about 500 Mflops to about 5 Tflops, an increase by a factor of 10⁴ in 18 years. During this time, many problems have been diagnosed, solved or superseded by new developments. New problem areas have been discovered and different solutions have been found. However, some problems have remained and still haunt us today. An overview of the development in the use of parallel and high-speed computing in meteorology will be given, covering the period spanned by the ten biennial workshops held at ECMWF.
https://doi.org/10.1142/9789812704832_0016
During the past decades there has been a continuous growth in the number of physical and societal problems that have been successfully studied and solved by means of computational modeling and simulation. Further, many new discoveries depend on high-performance computer simulations to satisfy their demands for large computational resources and short response times. The Advanced CompuTational Software (ACTS) Collection brings together a number of general-purpose computational tool development projects funded and supported by the U.S. Department of Energy (DOE). These tools make it easier for scientific code developers to write high-performance applications for parallel computers. They tackle a number of computational issues that are common to a large number of scientific applications, mainly the implementation of numerical algorithms and support for code development, execution and optimization. The ACTS Collection promotes code portability, reusability, reduction of duplicated effort, and tool maturity. This paper presents a brief introduction to the functionality available in ACTS. It also highlights the tools that are in demand by climate and weather modelers.
https://doi.org/10.1142/9789812704832_0017
Many current large and complex HPC applications use simulation packages built on semi-independent program components developed by different groups or for different purposes. On distributed-memory parallel supercomputers, performing component-name registration and initializing communications between independent components are among the first critical steps in establishing a distributed multi-component environment. Here we describe MPH, a multi-component hand-shaking library that resolves these tasks in a convenient and consistent way. MPH uses MPI for high performance and supports much of the functionality of PVM. It supports three major parallel integration mechanisms: single-component multi-executable (SCME), multi-component single-executable (MCSE), and multi-component multi-executable (MCME). It is a simple, easy-to-use module for developing practical user codes, or a basis for larger software tools and frameworks.
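The sketch below illustrates the kind of component-name registration and communicator set-up that such a hand-shaking layer performs, written here with mpi4py on top of plain MPI. It is not MPH itself; the component names and rank layout are hypothetical.

```python
# Illustrative component "hand-shaking" on top of MPI (not the MPH library).
# Run with at least two ranks, e.g.:  mpiexec -n 4 python handshake.py
from mpi4py import MPI

def register_component(name, all_names):
    """Split MPI_COMM_WORLD into one communicator per named component and
    let every process learn which world ranks serve each component."""
    world = MPI.COMM_WORLD
    color = all_names.index(name)                 # one colour per component
    local = world.Split(color, world.Get_rank())  # intra-component communicator

    # Gather (component, world rank) pairs so any component can later address
    # another one, e.g. to set up coupler communication.
    registry = world.allgather((name, world.Get_rank()))
    roots = {n: min(r for cn, r in registry if cn == n) for n in all_names}
    return local, roots

if __name__ == "__main__":
    # Hypothetical single-executable, multi-component layout: the first half
    # of the ranks run "atmosphere", the rest run "ocean".
    world = MPI.COMM_WORLD
    name = "atmosphere" if world.Get_rank() < world.Get_size() // 2 else "ocean"
    local, roots = register_component(name, ["atmosphere", "ocean"])
    print(f"world rank {world.Get_rank()} -> {name}, "
          f"local rank {local.Get_rank()}, component roots {roots}")
```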
https://doi.org/10.1142/9789812704832_0018
The Scalable Modeling System (SMS) is a directive-based parallelization tool. The user inserts directives in the form of comments into existing Fortran code. SMS translates the code and directives into a parallel version that runs on shared- and distributed-memory high-performance computing platforms. Directives are available to support array re-sizing, inter-process communications, loop translations, and parallel output. SMS also provides debugging tools that significantly reduce code parallelization time. SMS is intended for applications using regular structured grids that are solved using explicit finite difference approximation (FDA) or spectral methods. It has been used to parallelize ten atmospheric and oceanic models, but the tool is sufficiently general that it can be applied to other structured-grid codes.
The performance of SMS parallel versions of the Eta atmospheric and Regional Ocean Modeling System (ROMS) oceanic models is analyzed. The analysis demonstrates that SMS adds insignificant overhead compared to hand-coded Message Passing Interface (MPI) solutions in these cases. This research shows that, for the ROMS model, using a distributed memory parallel approach on a cache-based shared memory machine yields better performance than an equivalent shared-memory solution due to false sharing. We also find that the ability of compilers/machines to efficiently handle dynamically allocated arrays is highly variable. Finally, we show that SMS demonstrates the performance benefit gained by allowing the user to explicitly place communications. We call for extensions of the High Performance Fortran (HPF) standard to support this capability.
https://doi.org/10.1142/9789812704832_0019
In the framework of FLAME (Family of Linked Atlantic Model Experiments), an eddy-permitting model of the Atlantic Ocean was used to hindcast the uptake and spreading of the anthropogenic trace gases CO2 and CFCs during the last century. The code is based on the public-domain software MOM (Modular Ocean Model) Version 2.1. For the parallel version, the code was extended with SHMEM and MPI message passing to achieve portability to Cray T3E and NEC SX systems. The performance of this production code on Cray T3E as well as NEC SX-5 and SX-6 systems is discussed. To underline the need for high-resolution modeling, some physical model results are presented.
https://doi.org/10.1142/9789812704832_0020
A computationally efficient three-dimensional modelling system (Proudman Oceanographic Laboratory Coastal-Ocean Modelling System, POLCOMS) has been developed for the simulation of shelf-sea, ocean and coupled shelf-ocean processes. The system is equally suited for use on single processor workstations and massively parallel supercomputers, and particular features of its numerics are an arbitrary (terrain following) vertical coordinate system, a feature preserving advection scheme and accurate calculation of horizontal pressure gradients, even in the presence of steep topography.
One of the roles of this system is to act as a host to ecosystem models, so that they can interact with as accurate a physical environment as is currently feasible. In this study, a hierarchy of nested models links the shelf-wide circulation and ecosystem, via a high-resolution physics model of the whole Irish Sea, to the test domain: a region of the western Irish Sea. In this domain, ecosystem models are tested at a resolution of ~1.5 km (cf. the typical summer Rossby radius of 4 km). Investigations in the physics-only model show the significance of advective processes (particularly shear diffusion and baroclinic eddies) in determining the vertical and horizontal temperature structure in this region. Here we investigate how a hierarchy of complexity (and computational load), from a 1D point model to a fully 3D eddy-resolved model, affects the distribution of phytoplankton (and primary production) and nutrients predicted by the European Regional Seas Ecosystem Model (ERSEM), a complex multi-compartment ecosystem model.
We shall also show how the parallel programming features of the POLCOMS code allow large-scale simulations to be carried out on hundreds, and now over a thousand, processors, approaching Teraflop/s performance levels. This is shown using a series of benchmark runs on the 1280-processor IBM POWER4 system operated by the UK’s HPCx Consortium.
https://doi.org/10.1142/9789812704832_0021
An adaptive finite element method is introduced into a simplified, nonlinear, global atmospheric circulation model to investigate internally generated climate variability in the atmosphere. This article focuses on two aspects of the parallel implementation: fast grid partitioning with a space-filling-curve approach, and coupling MPI-parallel solver libraries to the OpenMP-parallel model.
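A minimal sketch of the space-filling-curve idea is given below: cells of a structured grid are ordered along a Morton (Z-order) curve and the curve is cut into equal pieces, giving compact, load-balanced partitions. This only illustrates the general approach; the paper's adaptive finite element mesh and its actual partitioner are not reproduced.

```python
# Space-filling-curve partitioning of a regular grid (illustrative sketch).
import numpy as np

def morton_key(i, j, bits=16):
    """Interleave the bits of (i, j) to get a Z-order (Morton) index."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b + 1)
        key |= ((j >> b) & 1) << (2 * b)
    return key

def partition(nx, ny, nparts):
    """Assign each cell of an nx x ny grid to one of nparts partitions by
    sorting cells along the Morton curve and cutting it into equal pieces."""
    cells = [(i, j) for i in range(nx) for j in range(ny)]
    cells.sort(key=lambda c: morton_key(*c))
    owner = np.empty((nx, ny), dtype=int)
    for rank, chunk in enumerate(np.array_split(np.arange(len(cells)), nparts)):
        for idx in chunk:
            owner[cells[idx]] = rank
    return owner

if __name__ == "__main__":
    print(partition(8, 8, 4))   # four compact, nearly equal-sized partitions
```

Because consecutive cells on the curve are spatially close, each contiguous piece of the curve forms a fairly compact subdomain, which keeps the communication surface small while the partitioning itself remains a cheap one-dimensional split.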
https://doi.org/10.1142/9789812704832_0022
The requirements for optimisation are slowly changing. The hardware is much more complex, and the compiler is much more sophisticated. The compiler writers have spent years matching code to hardware.
Some examples of the complexity of the p690 CPU, which is based on the POWER4 architecture, are:
• Two floating point (FP) pipes.
• Multiply add every cycle per pipe.
• 6 to 7 cycle FP pipe length.
• 32 architected FP registers.
• 72 hardware “rename” FP registers.
• Cache to FP register takes 4 to 5 cycles.
• Up to 100 instructions of hardware look-ahead.
Some examples of the complexity of the p690 cache and memory are:
• L1: 32KB, 2-way associative, FIFO replacement, ‘store through’.
• L2: 1.44MB shared between 2 processors, 4-way associative, latency about 12 cycles, ‘store in’.
• L3: 128MB shared between 8 processors, 8-way associative, latency about 100 cycles.
• Memory:
- Latency about 300 cycles.
- Bandwidth (measured on 8GB node) about 3.5 GB/sec for 1 processor.
- Bandwidth (measured) about 10GB/sec for 8 processors.
- Hardware for each processor can handle 8 outstanding cache misses, and 8 prefetch streams.
The hardware is described in considerably more detail in:
• IBM Journal of Research and Development, Vol. 46, No. 1, January 2002 (IBM POWER4 system), http://www.research.ibm.com/journal
• The POWER4 Processor Introduction and Tuning Guide, SG24-7041, http://www.ibm.com/redbooks
Now the compiler knows about all these things, but the application programmer probably does not (and does not want to!). In fact, I think we can slightly modify the statement of the famous physicist Richard Feynman (who was referring to quantum mechanics) to say that “if you think you understand the p690, you probably don’t understand the p690”. Bearing this in mind, it is best to leave as much as possible to the compiler.
https://doi.org/10.1142/9789812704832_0023
In 2002 Florida State University added a large IBM SP4 distributed SMP cluster to its existing 42-node IBM SP3 cluster. The new cluster has 16 nodes, with each node having 32 CPUs. The FSU GCM has been parallelized using OpenMP and MPI, as well as a hybrid OpenMP/MPI approach. Our results show that the OpenMP approach, when applied at a coarse-grained level, gives performance that is comparable to, if not better than, MPI on the SP4 cluster. These results are consistent with our previous results on the SP3.
https://doi.org/10.1142/9789812704832_0024
Current parallel computer systems consist of computational nodes connected via an interconnection fabric, with each node containing a finite number of processors attached to a shared central memory: a multi-node, multi-processor architecture. A hybrid MPI/OpenMP programming method has become popular on these systems. In this paper an analytical mathematical model is developed to predict the parallel performance characteristics as a function of the various domain partitioning schemes and the number of participating processors. Actual computations were performed with the High-resolution Limited Area Forecast System (HLAFS), a mesoscale grid-point NWP model developed in China. The methodology and techniques used to optimize and parallelize HLAFS on the IBM RS/6000 SP are described. The HLAFS model was parallelized using the hybrid MPI/OpenMP programming method, in combination with various domain partitioning techniques. The results show that the measured performance is in good agreement with the analytical model.
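The sketch below shows the general shape of such an analytic model for a 2D domain decomposition with hybrid MPI/OpenMP: per-step time is modelled as subdomain compute time divided by the thread count plus a latency-and-bandwidth term for the halo exchange. All constants are invented placeholders, not values from the paper or from HLAFS.

```python
# Hedged sketch of an analytic hybrid MPI/OpenMP performance model.
def predicted_time(nx, ny, nz, px, py, threads,
                   cost_per_point=2.0e-7,   # seconds of work per grid point (toy)
                   latency=2.0e-5,          # per-message latency in seconds (toy)
                   bandwidth=1.0e8,         # bytes/s per MPI task (toy)
                   halo=2, bytes_per_value=8):
    """Estimated wall-clock seconds per time step for a px*py MPI decomposition,
    each task using `threads` OpenMP threads on its subdomain."""
    sub_nx, sub_ny = nx / px, ny / py
    compute = cost_per_point * sub_nx * sub_ny * nz / threads
    # Halo exchange with up to four neighbours: latency plus volume / bandwidth.
    halo_bytes = 2 * halo * (sub_nx + sub_ny) * nz * bytes_per_value
    comm = 4 * latency + halo_bytes / bandwidth
    return compute + comm

if __name__ == "__main__":
    # Compare a few hypothetical partitionings of a 1000 x 800 x 30 grid.
    for px, py, th in [(16, 1, 1), (4, 4, 1), (8, 2, 2), (4, 2, 4)]:
        t = predicted_time(1000, 800, 30, px, py, th)
        print(f"{px}x{py} MPI tasks, {th} threads each: {t*1e3:.2f} ms/step")
```

Evaluating such a model for different (px, py, threads) combinations is what allows a partitioning scheme to be chosen before committing to full model runs.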
https://doi.org/10.1142/9789812704832_0025
Aeronomy studies the chemical composition of the upper atmosphere. An important goal in aeronomic research is to collect a data set of satellite observations that provides comprehensive global coverage. Such a data set takes many months of surveying, because the appropriate satellites have a very narrow footprint. In the course of the collection, stratospheric winds redistribute the air. It is therefore necessary to complement a purely vertical aeronomic assimilation process with a stratospheric advection model, and also with a chemical kinetics model. The chemical kinetics need to be calibrated from the advected assimilation data set, but the time scales involved in the advection are much longer than those of the chemical kinetics. Parallel computing could speed up this calibration process significantly. This is currently not possible, because practically all the assimilation methods are inherently sequential. In this article, we study the separability of chemical and dynamic assimilation on parallel computers with a theoretical analysis and simple one-dimensional models.
https://doi.org/10.1142/9789812704832_0026
As the scale of computation becomes larger, software management plays a more important role in operational numerical weather prediction. Agile change is required, as well as stability of operation, in a huge, distributed, and diversified system. In this talk, two ongoing projects within JMA related to this problem will be presented. First, a new scripting language is being developed to describe NQS jobs and to simplify file-handling conventions. Second, in order to access various data seamlessly, an extension to the API for gridded data access is planned.
https://doi.org/10.1142/9789812704832_0027
A brief review of the growth in the Bureau of Meteorology’s (BoM) computational power is given. The changes involved in moving to teraflop-level computation are discussed, together with how this could be achieved for the BoM’s LAPS (Limited Area Prediction System) numerical weather prediction model and the subsequent data handling and storage, and how this impacts the existing infrastructure for the delivery of products to the forecaster. A visual technique for the distribution of high-resolution data fields to the forecaster is discussed.
https://doi.org/10.1142/9789812704832_0028
The Air Quality Modeling project at the University of Houston is an ambitious effort that focuses on analyzing air quality in the Greater Houston area and proposing strategies to ensure compliance with new regulations in this area. It is one of several major research activities at the University that require access to considerable computational power and, in some cases, large amounts of storage. To accommodate many of these needs locally, we decided to create and professionally operate a campus-wide computational grid that combines this facility with departmental clusters using GridEngine for job submission. Researchers in Computer Science are developing a customized portal to execute the project’s jobs as part of their EZGrid project. Future plans include collaborating with regional and national partners to form a wide area grid.
https://doi.org/10.1142/9789812704832_0029
As has become customary, brief statements raising fundamental or controversial issues were made to open the discussions.
https://doi.org/10.1142/9789812704832_bmatter
LIST OF PARTICIPANTS.