Python, netCDF, and Model Results Visualization

Intro to Python

Most of the analysis and tools in the Salish Sea MEOPAR project are written in Python, though Matlab makes occasional guest appearances.

This slide deck from a physics course at Cornell University provides a good, fairly detailed introduction to Python for people who already know at least one programming language. Of course, no two groups make exactly the same choices within a language, and the few places where our choices differ are detailed below. Also, don’t get too bogged down in the details of object-oriented and functional programming (especially slides 18 through 22) as we don’t use those aspects much.

A few differences you will see compared to our Python code:

  • The Cornell course uses the Python 2 print statement and an older syntax for string interpolation:

    print 'value of %s = %s' % (name, val)
    

    In our notebooks and code you are more likely to see that spelled like:

    print('value of {n} = {v}'.format(n=name, v=val))
    

    or perhaps:

    print('value of {0} = {1}'.format(name, val))
    
  • The scipy.array syntax discussed on slides 25 through 28 is a synonym for numpy.array, which creates numpy.ndarray objects. You will see it used in our code as:

    import numpy as np
    
    a = np.array([[1,2,3], [4,5,6], [7,8,9]])
    ...
    p = np.arange(0.,1.,0.1)
    # etc.
    
  • The pylab namespace mentioned on slide 31 is a Matlab-like interface to the Matplotlib library. In our code we try to use the pyplot object-oriented interface, so you will see things like:

    import matplotlib.pyplot as plt
    import numpy as np
    
    xvals = np.linspace(-10., 10., 100)
    yvals = xvals**3
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
    ax1.plot(xvals, yvals)
    ax2.plot(xvals, yvals, 'r.')
    ax3.hist(yvals)
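
As a quick check on the string interpolation bullet above, the sketch below builds the same message with all three spellings. The values assigned to name and val are made up for illustration, and the code is written for Python 3, where print is a function.

```python
# Hypothetical values, chosen only for illustration
name = 'depth'
val = 4.5

# Older %-style interpolation, as used in the Cornell slides
old_style = 'value of %s = %s' % (name, val)
# str.format() with keyword placeholders
by_keyword = 'value of {n} = {v}'.format(n=name, v=val)
# str.format() with positional placeholders
by_position = 'value of {0} = {1}'.format(name, val)

# All three spellings produce the same string
print(old_style)
print(by_keyword)
print(by_position)
```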
    

Jupyter Notebook, netCDF, and Model Results

We have an ongoing project to develop a collection of Jupyter Notebooks that provide discussion, examples, and best practices for plotting various kinds of model results from netCDF files. There are code examples in the notebooks and also examples of the use of functions from the SalishSeaTools Package.

If you are new to the Salish Sea project, or to Jupyter Notebook, netCDF, and Matplotlib you should read the notebooks in the following order:

The links here are to static renderings of the notebooks via nbviewer.jupyter.org. The notebook source files are in the analysis_tools directory of the tools repo.

ERDDAP and xarray

From late 2013 until early 2016 we used the netCDF4-python library to open locally stored files. The notebooks above describe that way of working. In early 2016 we set up an ERDDAP server to provide public access to our model results. The netCDF4-python library can open datasets from ERDDAP URLs just as easily as it can open them from local files. So, here is a reworking of the Exploring netCDF Files.ipynb notebook using ERDDAP:

One reason that you might want to use ERDDAP to access our model results is if you don’t have access to our results files stored on the UBC EOAS Ocean cluster. Our ERDDAP server is public.

Another reason to use ERDDAP is that it provides access to the daily model results as continuous data streams, hiding the fact that they are stored in per-day files. ERDDAP makes it much easier to work with a dataset that spans multiple days because it removes the task of opening each day’s file(s) and splicing the variable values into arrays. You can just ask for a slice of the dataset in time and space and ERDDAP takes care of the slicing and splicing (provided that the resulting dataset is less than 2 GB in size).
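
To make the splicing point concrete, here is a minimal sketch of the kind of work that ERDDAP takes off your hands. It assumes two days of hourly results have already been read into NumPy arrays. The arrays are synthetic stand-ins for per-day files, and the helper name splice_days and the (time, y, x) shape are hypothetical choices for illustration, not part of our tools.

```python
import numpy as np


def splice_days(daily_arrays):
    """Join per-day arrays along the time axis (axis 0)."""
    return np.concatenate(daily_arrays, axis=0)


# Synthetic stand-ins for one variable from two per-day files,
# each with shape (time, y, x) = (24, 3, 4)
day1 = np.zeros((24, 3, 4))
day2 = np.ones((24, 3, 4))

two_days = splice_days([day1, day2])

# A time slice that crosses the day boundary:
# the last 6 hours of day 1 and the first 6 hours of day 2
across_midnight = two_days[18:30]
print(two_days.shape, across_midnight.shape)
```

With ERDDAP the request for the slice across midnight is made against a single dataset, so the concatenation step disappears from your code.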

Another new development is the xarray package. Quoting from the introduction to its documentation:

xarray … is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures.

Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self-describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.

Here is a reworking of the Exploring netCDF Files.ipynb notebook using xarray:

xarray uses the netCDF4-python library so it is capable of accessing netCDF datasets from either local files or from ERDDAP servers. The xarray.Dataset object hides many of the low-level details of the netCDF4.Dataset objects to provide a more Pythonic interface to the dataset that is heavily inspired by pandas. Like pandas variables, xarray variables have a plot() method that makes quick visualization of datasets very easy.

xarray provides sophisticated handling of the time coordinate of datasets. In combination with ERDDAP, that feature makes accessing arbitrary-length time slices from the daily Salish Sea Nowcast system results collection very easy.
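
The time handling can be sketched without touching any real dataset. The example below builds a small in-memory xarray.Dataset with an hourly time coordinate and selects one day by date string. The variable name temperature, the dimension names, and the values are all made up for illustration; a dataset opened from a local file or an ERDDAP URL is assumed to behave the same way once it is open.

```python
import numpy as np
import xarray as xr

# Three days of hourly time stamps (72 values); synthetic, for illustration
times = np.arange(
    '2016-01-01', '2016-01-04', dtype='datetime64[h]'
).astype('datetime64[ns]')
depths = np.array([0.5, 1.5, 2.5])
values = np.arange(times.size * depths.size, dtype=float).reshape(
    times.size, depths.size
)

# Hypothetical variable and dimension names
ds = xr.Dataset(
    {'temperature': (('time', 'depth'), values)},
    coords={'time': times, 'depth': depths},
)

# Label-based selection: all records that fall on 2016-01-02
one_day = ds.sel(time=slice('2016-01-02', '2016-01-02'))
# Then pick a single depth by its coordinate value
surface = one_day.temperature.sel(depth=0.5)
print(one_day.temperature.shape, surface.shape)
```

With Matplotlib installed, calling surface.plot() on the selection would draw a quick line plot against time.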

In summary, you can think of ERDDAP as a higher-level abstraction for storage of our model results, and xarray as a higher-level abstraction for working with the results as Python objects. The ERDDAP abstraction hides some of the details of how the discrete daily runs are stored, and the xarray abstraction hides some of the netCDF4 file structure details.

Here is a notebook that demonstrates some of the features of xarray combined with accessing model results from our ERDDAP server: