LASSO-CACTI Beta Release Documentation

LASSO-CACTI Beta Release Details

This post contains the documentation for the LASSO-CACTI Beta Release. It will be updated as the beta evolves.

Version as of 17-May-2022

We are excited to announce that development of the LASSO-CACTI scenario has advanced far enough to offer a beta release containing a subset of the data that will be formally released later in 2022. The LASSO-CACTI scenario consists of simulations conducted in two phases: 20 dates were selected for mesoscale simulations with kilometer-scale grid spacing, and a target of nine of those dates will follow with large-eddy simulations (LES). The mesoscale runs use an ensemble of up to 33 members per date, and the best-performing members are used to drive one or more LES for the smaller number of case dates.

Users interested in accessing the LASSO-CACTI beta release should send a note to the LASSO team and request an account on ARM’s Cumulus cluster, where the data are currently hosted. Refer to ARM’s Computing Resources web page for requesting Cumulus access. We also ask that any files used be reported back to the LASSO team so that usage tracking can be done before the data set is finalized and hosted in ARM’s Data Discovery and the LASSO Bundle Browser.

The goal of the beta release is to improve the final data product. So, users working with the beta release are expected to provide feedback to the LASSO team regarding potential improvements to the model, the approach to providing data to users, and/or other aspects of the LASSO-CACTI scenario. We expect the contents of the beta release to evolve as more data and aspects of the scenario become available. Notifications about updates to the beta release will be posted to the LASSO-CACTI discussion forum.

Modeling details

All LASSO-CACTI simulations use a slightly customized version of the Weather Research and Forecasting (WRF) model version 4.3.1 with a nested, downscaling approach. A typical WRF workflow is used with important configuration options and enhancements noted in this section.

Grid spacings for the mesoscale domains are 7.5 km (domain 1, or d1) and 2.5 km (d2), and the LES domains use 500 m (d3) and 100 m (d4). WRF’s ndown program is used to separate the mesoscale and LES pairs of domains, with each pair run separately. Model output is every 15 minutes for the 7.5-km, 2.5-km, and 500-m domains; the 100-m domain outputs are every 5 minutes. We are open to user input on portions of LES runs that could be rerun to provide more frequent output with a subset of variables—this would only be possible for some well-behaved periods of value to the community.
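The domain setup described above can be summarized in a small lookup table. The following is a minimal sketch for planning purposes only. The grid spacings and output intervals come from the text, while the dictionary layout and function names are illustrative assumptions, not part of the LASSO-CACTI software.

```python
# Grid spacing (m) and output interval (minutes) per domain, as stated in the text.
# The dictionary layout and helper names are hypothetical, chosen for illustration.
DOMAINS = {
    "d1": {"dx_m": 7500, "output_min": 15, "tier": "mesoscale"},
    "d2": {"dx_m": 2500, "output_min": 15, "tier": "mesoscale"},
    "d3": {"dx_m": 500, "output_min": 15, "tier": "LES"},
    "d4": {"dx_m": 100, "output_min": 5, "tier": "LES"},
}


def nest_ratio(parent: str, child: str) -> float:
    """Ratio of parent to child grid spacing for a nested pair of domains."""
    return DOMAINS[parent]["dx_m"] / DOMAINS[child]["dx_m"]


def outputs_per_hour(domain: str) -> int:
    """Number of model output times written per simulated hour."""
    return 60 // DOMAINS[domain]["output_min"]


if __name__ == "__main__":
    for parent, child in (("d1", "d2"), ("d3", "d4")):
        print(parent, "->", child, "ratio:", nest_ratio(parent, child))
    for name in DOMAINS:
        print(name, "outputs per hour:", outputs_per_hour(name))
```

The two nested pairs (d1 with d2, d3 with d4) are the ones run separately via ndown, so only those ratios are meaningful here.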

The four domains are shown in the accompanying figure. Each domain is roughly centered on the ARM Mobile Facility (AMF) location but with an offset to the east to permit convection to develop downwind of the site. Additional area is also included to the south for domains 1 and 2 because that region has large storm development that in some cases could affect convection near the AMF.

The base set of simulations uses the Mellor-Yamada-Janjic boundary layer parameterization for the mesoscale runs. That physics parameterization is replaced by the Deardorff subgrid-scale scheme in the LES. All domains use the aerosol-aware Thompson microphysics and Rapid Radiative Transfer Model for Global Climate Models (RRTMG) radiation parameterizations. We are also testing a modified selection of physics parameterizations for a subset of the simulations to explore potential benefits for specific conditions.

The atmospheric initial and boundary conditions come from an ensemble of input data sets. Input options used include the Global Data Assimilation System (GDAS) Final Analysis (FNL); the Global Ensemble Forecast System (GEFS), which has 21 members; the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis Version 5 (ERA5); and the ECMWF Ensemble of Data Assimilations (EDA), which has 10 members. In total, there are 33 possible input options. Each case date uses all these options for the mesoscale domains except for a few dates with missing data. As noted above, a small subset of this mesoscale ensemble is then used for the LES domains. Terrain data are obtained from the Multi-Error-Removed Improved-Terrain Digital Elevation Model (MERIT DEM; Yamazaki et al. 2017). These 3-arcsecond terrain data have been smoothed with a 1-km-scale filter using software from Branko Kosovic, NCAR, to improve model stability.
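As a quick check on the ensemble size, the member counts listed above add up as follows. This is a small sketch. The counts come from the text (with FNL and ERA5 each treated as a single input), and the variable names are illustrative.

```python
# Number of input options contributed by each data source, per the text above.
INPUT_SOURCES = {"FNL": 1, "GEFS": 21, "ERA5": 1, "EDA": 10}

N_MESOSCALE_DATES = 20  # case dates with mesoscale simulations

members_per_date = sum(INPUT_SOURCES.values())

# Upper bound only: a few dates lack some inputs, so the actual count is lower.
max_mesoscale_runs = N_MESOSCALE_DATES * members_per_date

if __name__ == "__main__":
    print("Input options per date:", members_per_date)
    print("Upper bound on mesoscale runs:", max_mesoscale_runs)
```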

Use of the aerosol-aware Thompson microphysics parameterization requires a background aerosol field when initializing WRF. This is done using aerosol fields from the Goddard Earth Observing System Version 5 (GEOS-5) model, with the species-specific information mapped to the “water friendly” and “ice friendly” aerosol proxies used by Thompson as described in Juliano et al. (2022). The mapping is done using software developed for the WRF-Solar variant of WRF (Jimenez et al. 2016), and portions of the WRF-Solar code are merged into the LASSO version of WRF to better integrate with the GEOS-5 data. We are grateful to Tim Juliano and Pedro Jimenez, both from NCAR, for their sharing of code and assistance in merging it into the LASSO workflow.

Initialization of the soil state is done using an offline WRF-Hydro simulation with the Noah-MP land surface scheme without surface or subsurface routing. The WRF-Hydro simulation is run continuously on a grid with 2.5-km spacing from August 2018 through April 2019 to allow the soil to spin up before the first date used for the LASSO-CACTI atmospheric simulations. The ERA5-Land data set is used to initialize and drive WRF-Hydro.

Four inert tracers have been added to WRF to assist with feature tracking and other applications. These tracers are defined as follows:

  • Tracer 1: a conservative scalar initialized to have the value of the height above mean sea level for each grid point
  • Tracer 2: a conservative scalar initialized to have the value of the height above the surface for each grid point
  • Tracer 3: a unitless scalar (with zero initial concentration) that is emitted at the surface with a flux that is constant in both time and space, and decays with a time scale of 30 minutes
  • Tracer 4: a conservative “PBL” tracer, with zero initial concentration, that is switched on to be 1 at 14 UTC for each case date for vertical levels below 1 km above the surface
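To give a feel for the Tracer 3 decay, here is a minimal sketch that assumes the 30-minute "time scale" is an e-folding time for exponential decay. That interpretation, the well-mixed box model, and the flux and layer-depth values are assumptions made for illustration and are not taken from the model code.

```python
import math

TAU_S = 30 * 60  # decay time scale from the tracer definition, in seconds


def remaining_fraction(elapsed_s: float, tau_s: float = TAU_S) -> float:
    """Fraction of tracer left after elapsed_s, assuming exponential (e-folding) decay."""
    return math.exp(-elapsed_s / tau_s)


def box_concentration(elapsed_s: float, flux: float = 1.0,
                      depth_m: float = 1000.0, tau_s: float = TAU_S) -> float:
    """Concentration in a hypothetical well-mixed layer of depth depth_m.

    Solves dC/dt = flux / depth_m - C / tau_s with C(0) = 0, i.e. a constant
    surface flux balanced by decay. The flux and depth values are placeholders.
    """
    return flux / depth_m * tau_s * (1.0 - math.exp(-elapsed_s / tau_s))


if __name__ == "__main__":
    for minutes in (0, 30, 60, 120):
        print(minutes, "min:", round(remaining_fraction(minutes * 60), 3))
```

Under these assumptions the box concentration approaches flux × tau / depth, so the tracer mostly marks air that has been in contact with the surface within roughly the last hour.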

Additional online-calculated diagnostic values are included in the output for users. Primarily, these consist of radiation fluxes and tendencies, convection parameterization tendencies, and microphysics process rates. Further offline-calculated diagnostics, such as combining WRF’s base and perturbation values for height and pressure, will be made available in the subset files, described below, in the final LASSO-CACTI product. During the beta period, an example suite of subset data is made available, and users will need to calculate these offline quantities themselves for other simulations, either by using traditional WRF processing techniques or with the LASSO-CACTI subsetting code, which can be made available upon request.
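For users computing the offline quantities themselves, the sketch below shows the conventional way to combine WRF's base and perturbation fields. It assumes the standard WRF output variable names (P and PB for pressure in Pa, PH and PHB for geopotential in m2 s-2 on vertically staggered levels). It is not the LASSO-CACTI subsetting code, and the function names are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m s-2), the value conventionally used with WRF


def total_pressure(p, pb):
    """Full pressure (Pa) from perturbation (P) and base-state (PB) pressure."""
    return np.asarray(p) + np.asarray(pb)


def destagger(field, axis):
    """Average adjacent staggered levels along axis onto mass (unstaggered) levels."""
    field = np.asarray(field)
    lower = [slice(None)] * field.ndim
    upper = [slice(None)] * field.ndim
    lower[axis] = slice(None, -1)
    upper[axis] = slice(1, None)
    return 0.5 * (field[tuple(lower)] + field[tuple(upper)])


def height_msl(ph, phb, vertical_axis=1):
    """Height above mean sea level (m) on mass levels from PH and PHB.

    vertical_axis=1 assumes the usual (Time, level, south_north, west_east) layout.
    """
    z_staggered = (np.asarray(ph) + np.asarray(phb)) / G
    return destagger(z_staggered, vertical_axis)


if __name__ == "__main__":
    # Synthetic example: three staggered levels at 0, 100, and 200 m.
    phb = np.zeros((1, 3, 2, 2))
    phb[:, 1] = 100.0 * G
    phb[:, 2] = 200.0 * G
    ph = np.zeros_like(phb)
    print(height_msl(ph, phb)[0, :, 0, 0])
```

In practice the arrays would be read from the wrfout files with a NetCDF reader rather than built by hand as in this synthetic example.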

Observations and mesoscale simulation skill scores

Infrared (IR) and visible satellite data loops are available for all 20 dates for which we ran mesoscale simulations:

Of these dates, nine have been identified as candidates for LES runs based on different factors—most important of which is that deep convection initiates near the AMF such that it would be observable by the C-Band Scanning ARM Precipitation Radar (CSAPR). The list of candidate LES dates is given below. We are conducting LES for selected ensemble members for these dates to further assess their viability based on agreement with observations.


The quality of the mesoscale and LES runs will be assessed through comparisons with satellite brightness temperature and ARM observations such as sounding profiles, radar echo-top height, and surface precipitation. We use skill scores to quantify the level of agreement between some observed and simulated variables.

For this beta release, we use the equitable threat score (ETS) to quantify how well the simulated deep convective regions agree with satellite brightness temperatures for all mesoscale runs. In the simulations and satellite data, deep convective regions are determined using an “anvil” IR brightness temperature threshold of 240 K. The quality of the simulation agreement is quantified in terms of an ETS Skill and a Bias Skill, which are combined into a Net ETS Skill. Descriptions of these skill scores can be found in the LASSO shallow convection documentation. In short, the values range from 0 to 1, where 1 means perfect agreement. The location of the convection is best characterized by the ETS Skill and the total area by the Bias Skill.
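To illustrate the idea behind these scores, the sketch below computes the textbook equitable threat score and frequency bias from two brightness temperature fields. It assumes that pixels colder than the 240 K threshold count as deep convection. It does not reproduce the LASSO-specific mapping to the ETS Skill, Bias Skill, or Net ETS Skill, which is defined in the LASSO shallow convection documentation. The function names are illustrative.

```python
import numpy as np


def convective_mask(tb_k, threshold_k=240.0):
    """True where brightness temperature is colder than the anvil threshold (assumed <)."""
    return np.asarray(tb_k) < threshold_k


def ets_and_bias(obs_mask, sim_mask):
    """Textbook equitable threat score and frequency bias for two boolean masks."""
    obs = np.asarray(obs_mask, dtype=bool)
    sim = np.asarray(sim_mask, dtype=bool)
    hits = int(np.sum(obs & sim))
    misses = int(np.sum(obs & ~sim))
    false_alarms = int(np.sum(~obs & sim))
    total = obs.size

    # Hits expected by random chance given the observed and simulated areas.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    ets = (hits - hits_random) / denom if denom > 0 else float("nan")

    # Frequency bias: simulated convective area relative to observed area.
    observed = hits + misses
    bias = (hits + false_alarms) / observed if observed > 0 else float("nan")
    return ets, bias


if __name__ == "__main__":
    obs_tb = np.array([[230.0, 235.0], [250.0, 260.0]])
    sim_tb = np.array([[230.0, 250.0], [235.0, 260.0]])
    print(ets_and_bias(convective_mask(obs_tb), convective_mask(sim_tb)))
```

In this synthetic case the simulated convective area matches the observed area (bias of 1) but only half of it overlaps, which is the kind of location error the ETS is meant to penalize.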

Plots of the ETS skill scores for mesoscale simulations are given here. For each date, the link with the form YYYYmmdd.masks.html, e.g., 20190129.masks.html, takes you to a site where the skill scores for all ensemble member simulations are plotted for that date and provided digitally in .csv files. Below the overview skill-based plots are links to plots for each ensemble member. The plots show, for each 15-minute satellite image, masks of the deep convective area from the observations, simulations, and their intersection. Each image is accompanied by the ETS, Bias, and Net ETS Skill scores. Scrolling through the plots shows the time evolution of the skill scores for the simulation to enable users to assess simulation quality.

Beta release data format

The beta release consists primarily of raw model output and skill scores for users to examine and provide feedback. An example is also provided of the subset files for one simulation, which aim to provide the model variables in more manageable chunks. The approach to bundling the files and updates to the Bundle Browser are still in development, and thus are not currently part of the beta release. The full suite of mesoscale ensembles is made available along with a subset of the LES simulations. Specifically, the LES run for 29-Jan-2019 forced by the EDA09 initial and boundary conditions is currently available. Others have been run, and interested users can contact the LASSO team for more information.

The files are located on the Wolf file system mounted to ARM’s Cumulus cluster. Users can gain access to the files via the compute allocation request noted above. A small amount of computational capacity exists for doing analysis on Cumulus. Alternatively, users can transfer the data elsewhere using Globus. This approach is tenable for some of the mesoscale runs, but it quickly becomes problematic with the LES data, so users should be aware of the data sizes as they plan their workflows. The input and output data for a single mesoscale simulation total 326 GB, and an LES simulation is about 37 TB.
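When planning a transfer, a rough time estimate helps show why the LES data are hard to move. The sketch below uses the sizes quoted above. The sustained transfer rate is a hypothetical placeholder, and decimal units (1 TB = 1000 GB) are assumed.

```python
MESO_RUN_GB = 326.0    # input plus output for one mesoscale simulation
LES_RUN_GB = 37_000.0  # about 37 TB for one LES simulation (decimal units assumed)


def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate of rate_mb_per_s."""
    seconds = size_gb * 1000.0 / rate_mb_per_s
    return seconds / 3600.0


if __name__ == "__main__":
    assumed_rate = 100.0  # MB/s, hypothetical; actual Globus rates vary by endpoint
    print("Mesoscale run: %.1f h" % transfer_hours(MESO_RUN_GB, assumed_rate))
    print("LES run: %.1f h" % transfer_hours(LES_RUN_GB, assumed_rate))
```

Whatever rate is assumed, one LES simulation takes more than 100 times longer to transfer than one mesoscale simulation.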

A tree structure is used to organize the files into tiers by 1) date, 2) ensemble member, 3) model configuration, and then the output associated with a given simulation. For example, /gpfs/wolf/atm123/world-shared/cacti/beta_release/runs/20181129/eda05/base/run_meso is for the mesoscale simulation using the fifth ensemble member from the EDA as input for the 29-Nov-2018 case date.
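The directory tiers described above can be assembled programmatically. The following is a minimal sketch whose function and argument names are illustrative. Only the run_meso and subset directory names appear in this documentation, so any other final directory name would need to be checked on the file system.

```python
from datetime import datetime
from pathlib import PurePosixPath

BETA_ROOT = PurePosixPath("/gpfs/wolf/atm123/world-shared/cacti/beta_release/runs")


def run_dir(date: str, member: str, config: str = "base",
            output: str = "run_meso") -> PurePosixPath:
    """Build the path date/member/config/output under the beta release root.

    date must be in YYYYmmdd form, e.g. "20181129"; member is an ensemble
    label such as "eda05".
    """
    datetime.strptime(date, "%Y%m%d")  # raises ValueError for a malformed date
    return BETA_ROOT / date / member / config / output


if __name__ == "__main__":
    print(run_dir("20181129", "eda05"))
    print(run_dir("20190129", "eda09", output="subset"))
```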

In the final release, the data files will be renamed to assist with differentiating between simulations, but for the beta release, the default WRF output file names are retained. As such, the mesoscale simulations will have file names such as wrfout_d01_ and wrfout_d02_. Likewise, the LES simulations also have the d01 and d02 domain monikers due to the use of ndown between the two levels of simulations. Differentiating between files with identical file names is done via the full path of the file (e.g., the path distinguishes the ensemble member for the same date). Restart files are available for LES simulations but not for the mesoscale simulations, which were run as one continuous model run.

To simplify user access to the final LASSO-CACTI release, subsets of variables extracted from the raw WRF output will be made available with a selection of diagnostic variables calculated offline. These subset files will be much smaller than the wrfout files, and many users will likely not need most variables, so only a portion of the subset files will be needed for many applications. This will greatly reduce file transfer requirements and speed analysis software. The subsets will also provide data interpolated to pressure and height levels in addition to WRF’s terrain-following coordinates. The processing to generate these subsets is somewhat intensive for the LES domains, and thus we are not making the subsets available for all runs in the beta release. Instead, for now, an example of what these subsets would contain is being made available across all four domains for the 29-Jan-2019 case for the EDA09 ensemble member. These can be found at /gpfs/wolf/atm123/world-shared/cacti/beta_release/runs/20190129/eda09/base/subset.

We encourage users to check out these example files and provide feedback as to their usefulness, additional variables that might be included, and whether the provided levels are appropriate. This will greatly help improve the final data product by capturing feedback prior to beginning post-processing of all the simulations.

Feedback requested

This is one of the last opportunities for users to provide feedback as we begin finalizing the LASSO-CACTI data set. If you would like to suggest changes to the file formats and/or contents, or if you notice issues or have other suggestions, please post feedback in the LASSO-CACTI forum.

Thank you for taking time to check out the new scenario, and we hope it will be a great resource for your research.
