Getting started

A quick overview of FLEXPART data

reflexible was originally developed for working with FLEXPART V8.x, which introduced some fairly new features in how the output data is created. The latest versions of FLEXPART also have functionality for saving output directly to NetCDF, and reflexible works with netCDF4 data internally (see below); the raw unformatted binary Fortran data that FLEXPART has traditionally used for output can be converted with the create_ncfile utility. See the FLEXPART documentation for more information.

A user's guide for FLEXPART is available that explains the model output.

Note If you are interested in contributing functionality for other FLEXPART versions, please contact me.

reflexible was originally released as ‘pflexpart’, but as the goal is to be more generic, the package was renamed. The current release is still focused on FLEXPART, but some generalizations are starting to make their way into the code base.

reflexible is undergoing constant modification and is not particularly stable or backward-compatible code. I am trying to move in the right direction, and the code now lives on GitHub. If you are interested in contributing, feel free to contact me: John F. Burkhart


Example data

Several example simulations are available for testing. The simulations contain a simple backward run case and a forward case, and are suitable for testing some of the unique functions of reflexible for analysis and creation of the retroplumes.

The data is distributed with the repository:

$ ls reflexible/reflexible/uio_examples

Converting FLEXPART output to netCDF4 format

Reflexible uses netCDF4 internally for its analysis and plotting duties. Not all the example data sets contain netCDF output, and several FLEXPART users are not yet using this functionality. This section demonstrates how to convert the FLEXPART output to netCDF4 format. To do that, the create_ncfile script is invoked. This script is copied into a directory in your path when reflexible is installed, so you do not need to copy it manually.

The example data we will use is in a directory named uio_examples/Bwd1_V9.02. It contains the result of processing a simple backward run case with FLEXPART. Next we can execute the fprun script:

$ fprun Bwd1_V9.02/pathnames
UserWarning: NetCDF4 files not found in output directory 'Bwd1_V9.02/outputs'.
You can always generate them from data there with the `create_ncfile` command line utility.
Read b'FLEXPART V9.0' Header: Bwd1_V9.02/outputs/header
Flexpart('Bwd1_V9.02/pathnames', nested=False)

Note that you pass the pathnames file of a FLEXPART run. The pathnames file has a simple structure. For example, in our case it looks like this:

/site/opt/flexpart/9.02/examples/Bwd1/options/
./outputs/

/site/opt/flexpart/WIND_FIELDS/AVAILABLE_ECMWF_EI_fields_global
============================================

So, basically, the first line indicates the <options> directory for the FLEXPART run, whereas the second line specifies the <output> directory. With this, you can easily mix and match different <options> and <output> directories for your analysis.

If we want to select the nested data instead, we can look at the Fwd2_V9.02/pathnames simulation and pass the -n flag:

$ fprun -n Fwd2_V9.02/pathnames
UserWarning: NetCDF4 files not found in output directory 'Fwd2_V9.02/outputs'.
You can always generate them from data there with the `create_ncfile` command line utility.
Read b'FLEXPART V8.2' Header: Fwd2_V9.02/outputs/header_nest
Flexpart('Fwd2_V9.02/', nested=True)

And if you want to get some info on the COMMANDS file:

$ fprun -n -C Fwd2_V9.02/pathnames
Read b'FLEXPART V8.2' Header: Fwd2_V9.02/outputs/header_nest
Flexpart('Fwd2_V9.02/pathnames', nested=True)
Command: OrderedDict([('AVG_CNC_INT', 3600), ('AVG_CNC_TAVG', 3600), ('CNC_SAMP_TIME', 300), ('CTL', 3.0), ('IFINE', 4), ('IFLUX', 0), ('IND_RECEPTOR', 1), ('IND_SOURCE', 1), ('IOUT', 5), ('IPIN', 0), ('IPOUT', 0),  ('LAGESPECTRA', 0), ('LCONVECTION', 0), ('LINIT_COND', 2), ('LSUBGRID', 1), ('MDOMAINFILL', 0), ('MQUASILAG', 0), ('NESTED_OUTPUT', 1), ('OUTPUTFOREACHRELEASE', 0), ('SIM_DIR', 1), ('SIM_END', ['20070122', '180000']), ('SIM_START', ['20070121', '090000']), ('SYNC', 300), ('T_PARTSPLIT', 999999999)])

You can get more info on the supported flags by passing the -h flag to the fprun command line utility:

$ fprun -h
usage: fprun [-h] [-n] [-C] [-R] [-S] [-H HEADER_KEY] [-K] [pathnames]

positional arguments:
  pathnames             The Flexpart pathnames file stating where options and
                        output are. If you pass a dir, a 'pathnames' file will
                        be appended automatically. If not found yet, a FP
                        output dir is assumed.

optional arguments:
  -h, --help            show this help message and exit
  -n, --nested          Use a nested output.
  -C, --command         Print the COMMAND contents.
  -R, --releases        Print the RELEASES contents.
  -S, --species         Print the SPECIES contents.
  -H HEADER_KEY, --header-key HEADER_KEY
                        Print the contents of H[HEADER_KEY].
  -K, --header-keys     Print all the HEADER keys.

Reading data out of a FLEXPART run

Newer versions of FLEXPART can generate convenient NetCDF4 files as output, so let's take a quick look at how you can access the different data in them.

Note In case you have FLEXPART output that is not in NetCDF4 format, you can always make use of the create_ncfile command line utility.

Open the file and print meta-information for the run:

In [1]: from netCDF4 import Dataset

In [2]: rootgrp = Dataset('./Fwd1_V9.02/outputs/grid_conc_2007012190000_nest.nc', 'r')

In [3]: print(rootgrp)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CF-1.6
    title: FLEXPART model output
    institution: NILU
    source: V8 model output
    history: 2016-10-28 16:43 NA created by faltet on faltet-Latitude-E6430
    references: Stohl et al., Atmos. Chem. Phys., 2005, doi:10.5194/acp-5-2461-200
    outlon0: 1.0
    outlat0: 39.5
    dxout: 0.019999999553
    dyout: 0.019999999553
    ldirect: 1
    ibdate: 20070121
    ibtime: 100000
    iedate: 20070122
    ietime: 180000
    loutstep: 3600
    loutaver: 3600
    loutsample: 300
    lsubgrid: 1
    lconvection: 0
    ind_source: 1
    ind_receptor: 1
    itsplit: 999999999
    linit_cond: 2
    lsynctime: 300
    ctl: 3.0
    ifine: 4
    iout: 5
    ipout: 0
    lagespectra: 0
    ipin: 0
    ioutputforeachrelease: 0
    iflux: 0
    mdomainfill: 0
    mquasilag: 0
    nested_output: 1
    surf_only: 0
    dimensions(sizes): time(33), longitude(220), latitude(220), height(1), numspec(1), pointspec(1), nageclass(1), nchar(45), numpoint(1)
    variables(dimensions): int32 time(time), float32 longitude(longitude), float32 latitude(latitude), float32 height(height), <class 'str'> RELCOM(numpoint), float32 RELLNG1(numpoint), float32 RELLNG2(numpoint), float32 RELLAT1(numpoint), float32 RELLAT2(numpoint), float32 RELZZ1(numpoint), float32 RELZZ2(numpoint), int32 RELKINDZ(numpoint), int32 RELSTART(numpoint), int32 RELEND(numpoint), int32 RELPART(numpoint), float32 RELXMASS(numspec,numpoint), int32 LAGE(nageclass), int32 ORO(latitude,longitude), float32 spec001_mr(nageclass,pointspec,time,height,latitude,longitude), float32 WD_spec001(nageclass,pointspec,time,latitude,longitude), float32 DD_spec001(nageclass,pointspec,time,latitude,longitude)
    groups:

We can get the info for a specific attribute just by referencing it like this:

In [4]: print(rootgrp.loutstep)
3600
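
If you want to dump all the global attributes at once, you can use the ncattrs() method of the Dataset object. A minimal sketch using the rootgrp opened above:

for name in rootgrp.ncattrs():
    # getattr retrieves each global attribute by name
    print(name, "=", getattr(rootgrp, name))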

We can have a look at the different dimensions and variables in the file:

In [5]: print(rootgrp.dimensions)
OrderedDict([('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 33
), ('longitude', <class 'netCDF4._netCDF4.Dimension'>: name = 'longitude', size = 220
), ('latitude', <class 'netCDF4._netCDF4.Dimension'>: name = 'latitude', size = 220
), ('height', <class 'netCDF4._netCDF4.Dimension'>: name = 'height', size = 1
), ('numspec', <class 'netCDF4._netCDF4.Dimension'>: name = 'numspec', size = 1
), ('pointspec', <class 'netCDF4._netCDF4.Dimension'>: name = 'pointspec', size = 1
), ('nageclass', <class 'netCDF4._netCDF4.Dimension'>: name = 'nageclass', size = 1
), ('nchar', <class 'netCDF4._netCDF4.Dimension'>: name = 'nchar', size = 45
), ('numpoint', <class 'netCDF4._netCDF4.Dimension'>: name = 'numpoint', size = 1
)])


In [6]: rootgrp.variables.keys()
Out[11]: odict_keys(['time', 'longitude', 'latitude', 'height', 'RELCOM', 'RELLNG1', 'RELLNG2', 'RELLAT1', 'RELLAT2', 'RELZZ1', 'RELZZ2', 'RELKINDZ', 'RELSTART', 'RELEND', 'RELPART', 'RELXMASS', 'LAGE', 'ORO', 'spec001_mr'])

The netCDF4 Python wrapper allows you to easily slice and dice variables:

In [15]: longitude = rootgrp.variables['longitude']

In [16]: print(longitude)
<class 'netCDF4._netCDF4.Variable'>
float32 longitude(longitude)
    long_name: longitude in degree east
    axis: Lon
    units: degrees_east
    standard_name: grid_longitude
    description: grid cell centers
unlimited dimensions:
current shape = (220,)
filling on, default _FillValue of 9.969209968386869e+36 used

We see that ‘longitude’ is a unidimensional variable with shape (220,). Let’s read just the first 10 elements:

In [20]: longitude[:10]
Out[20]:
array([ 1.00999999,  1.02999997,  1.04999995,  1.07000005,  1.09000003,
    1.11000001,  1.13      ,  1.14999998,  1.16999996,  1.18999994], dtype=float32)

As only the first 10 elements are brought into memory, this reduces the memory needed for your analysis.

Also, what you get from slicing netCDF4 variables are always NumPy arrays:

In [21]: type(longitude[:10])
Out[21]: numpy.ndarray

which, besides being memory-efficient, are what you normally use in your analysis tasks.
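
The same slicing works for the multidimensional variables listed in the file dump above. For example, here is a minimal sketch that reads one horizontal slab of the spec001_mr field; the index order follows its dimensions (nageclass, pointspec, time, height, latitude, longitude):

spec = rootgrp.variables['spec001_mr']
# first age class, first release point, first time step, lowest output level
slab = spec[0, 0, 0, 0, :, :]
print(slab.shape)   # (220, 220) for this nested output
print(slab.max())   # peak value in that slab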

Also, each variable can have different attributes attached that provide more information about its contents:

In [23]: longitude.ncattrs()
Out[23]: ['long_name', 'axis', 'units', 'standard_name', 'description']

In [24]: longitude.long_name
Out[24]: u'longitude in degree east'

In [25]: longitude.units
Out[25]: u'degrees_east'

That is basically all you need to know to access the on-disk data. Feel free to play a bit more with the netCDF4 interface; you will find it very convenient when combined with reflexible.
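
When you are done with a file, it is good practice to close it, either explicitly or with a context manager. A minimal sketch, reusing the same path opened above:

rootgrp.close()

# or, equivalently, let the context manager close the file for you
with Dataset('./Fwd1_V9.02/outputs/grid_conc_2007012190000_nest.nc', 'r') as ds:
    print(ds.variables['time'][:5])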


Basic reflexible functionality

Once you have checked out the code and have a sufficient FLEXPART dataset to work with, you can begin to use the module. The first step is to load the package. Depending on how you checked out the code, you can accomplish this in a few different ways, but the preferred way is as follows:

In [1]: import reflexible as rf

The next step is to create the accessor to the FLEXPART run. In this first example we’ll work with a forward simulation, which is conceptually easier and a common FLEXPART use case. You pass the location of the ‘pathnames’ file to the Flexpart constructor:

In [2]: fprun = rf.Flexpart("Fwd2_V9.02/pathnames")

In [3]: type(fprun)
Out[3]: reflexible.flexpart.Flexpart

So, fprun is an instance of the Flexpart class that allows you to easily access different parts of the FLEXPART run. For example, we can access the COMMAND like this:

In [4]: fprun.Command
Out[4]:
OrderedDict([('AVG_CNC_INT', 3600),
             ('AVG_CNC_TAVG', 3600),
             ('CNC_SAMP_TIME', 300),
             ('CTL', 3.0),
             ('IFINE', 4),
             ('IFLUX', 0),
             ('IND_RECEPTOR', 1),
             ('IND_SOURCE', 1),
             ('IOUT', 5),
             ('IPIN', 0),
             ('IPOUT', 0),
             ('LAGESPECTRA', 0),
             ('LCONVECTION', 0),
             ('LINIT_COND', 2),
             ('LSUBGRID', 1),
             ('MDOMAINFILL', 0),
             ('MQUASILAG', 0),
             ('NESTED_OUTPUT', 1),
             ('OUTPUTFOREACHRELEASE', 0),
             ('SIM_DIR', 1),
             ('SIM_END', ['20070122', '180000']),
             ('SIM_START', ['20070121', '090000']),
             ('SYNC', 300),
             ('T_PARTSPLIT', 999999999)])

the SPECIES:

In [5]: fprun.Species
Out[5]:
defaultdict(list,
            {'decay': [-999.9],
             'dquer': [-9.9],
             'dryvel': [-9.99],
             'dsigma': [-9.9],
             'f0': [-9.9],
             'henry': [-9.9],
             'kao': [-99.99],
             'ohreact': [-9.9e-09],
             'reldiff': [-9.9],
             'spec_ass': [-9],
             'weightmolar': [350.0],
             'weta': [-9.9e-09],
             'wetb': [-9.9e-09]})

But perhaps the most important accessor is the Header:

In [8]: H = fprun.Header

In [9]: type(H)
Out[9]: reflexible.data_structures.Header

Now we have a variable ‘H’ which has all the information about the run that is available from the header file. This ‘Header’ is a class instance, so the first step may be to explore some of the attributes:

In [12]: print(H.keys())
['C', 'FD', 'Heightnn', 'ORO', 'absolute_path', 'alt_unit', 'area', 'available_dates', 'available_dates_dt', 'direction', 'dxout', 'dyout', 'fill_grids', 'fp_path', 'ibdate', 'ibtime', 'iedate', 'ietime', 'ind_receptor', 'ind_source', 'iout', 'ireleaseend', 'ireleasestart', 'latitude', 'lconvection', 'ldirect', 'longitude', 'loutaver', 'loutsample', 'loutstep', 'lsubgrid', 'nageclass', 'nc', 'ncfile', 'nested', 'nspec', 'numageclasses', 'numpoint', 'numpointspec', 'numxgrid', 'numygrid', 'numzgrid', 'options', 'outheight', 'outlat0', 'outlon0', 'output_unit', 'pointspec', 'releaseend', 'releasestart', 'releasetimes', 'species', 'zpoint1', 'zpoint2']
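
Many of these keys are also exposed as plain attributes of H, so you can query run metadata directly. A minimal sketch (the attribute names are taken from the list above):

print(H.nspec, H.numpoint)        # number of species and release points
print(H.outlon0, H.outlat0)       # lower-left corner of the output grid
print(H.available_dates[:3])      # the first few output timestamps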

Working with reflexible in depth

Assuming the above steps worked out, then we can proceed to understand the tools in a bit more detail.

Okay, let’s take a look at the example code above line by line. The first line imports the module, giving it a namespace “rf” – this is the preferred approach.

The next line creates a “fprun” instance of Flexpart by passing the pathnames file of a FLEXPART run:

In [24]: fprun = rf.Flexpart("Fwd2_V9.02/pathnames")

and from there, we can easily have access to the Header container:

In [25]: H = fprun.Header

The Header is central to reflexible. It contains much information about the FLEXPART run and enables plotting, labeling of plots, looking up dates of runs, coordinates for mapping, etc. See for example:

In [27]: print(dir(H))
['C', 'FD', 'Heightnn', 'ORO', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_gridarea', 'absolute_path', 'add_trajectory', 'alt_unit', 'area', 'available_dates', 'available_dates_dt', 'direction', 'dxout', 'dyout', 'fill_grids', 'fp_path', 'ibdate', 'ibtime', 'iedate', 'ietime', 'ind_receptor', 'ind_source', 'iout', 'ireleaseend', 'ireleasestart', 'keys', 'latitude', 'lconvection', 'ldirect', 'longitude', 'loutaver', 'loutsample', 'loutstep', 'lsubgrid', 'nageclass', 'nc', 'ncfile', 'nested', 'nspec', 'numageclasses', 'numpoint', 'numpointspec', 'numxgrid', 'numygrid', 'numzgrid', 'options', 'outheight', 'outlat0', 'outlon0', 'output_unit', 'pointspec', 'releaseend', 'releasestart', 'releasetimes', 'species', 'zpoint1', 'zpoint2']

This will show you all the attributes associated with the Header.


H is now an object in your workspace. Using IPython you can explore its methods and attributes.

Data Containers: H.FD and H.C

There are a couple of attributes of the Header that contain data. These are important to understand in order to work with reflexible.

Naturally, you will now want to read in some of the data from your run. At this point, the Header, or H variable, should have an attribute ‘FD’, which is a dictionary of the FLEXPART grids (think ‘Flexpart Data’):

In [13]: H.FD
Out[13]: <reflexible.data_structures.FD at 0x7f83bc4c5898>

In [15]: H.FD.keys()[:3]  # for only the 3 first entries
Out[15]: [(0, '20070121100000'), (0, '20070121110000'), (0, '20070121120000')]

Look at the keys of the dictionary to see what information is stored. The actual data is keyed by tuples: (nspec, datestr) where nspec is the species number and datestr is a YYYYMMDDHHMMSS string for the grid timestep:

In [16]: fd = H.FD[(0, '20070121100000')]

In [17]: fd.data_cube.shape
Out[18]: (60, 40, 1, 1, 1)

And we can see what the dimensions are as they relate to the shape:

In [19]: fd.data_cube.dims
Out[20]: ('longitude', 'latitude', 'height', 'pointspec', 'nageclass')

Some information about the individual data for the timestep is provided:

In [33]: fd.keys()
Out[33]:
['data_cube',
 'gridfile',
 'itime',
 'timestamp',
 'species',
 'rel_i',
 'spec_i',
 'dry',
 'wet',
 'slabs',
 'shape',
 'max',
 'min']

These are available as attributes of the flexpart data class:

In [97]: fd.timestamp
Out[97]: datetime.datetime(2007, 1, 21, 10, 0)

In [102]: fd.species
Out[102]: ['TRACER']
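
Putting this together, you can walk over all the (species, date) entries in H.FD and inspect each grid. A minimal sketch using the keys and attributes shown above:

for key in list(H.FD.keys())[:5]:     # first few (nspec, datestr) entries
    fd = H.FD[key]
    print(fd.timestamp, fd.data_cube.shape, fd.max)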

There is a second attribute, H.C, that provides the ‘cumulative’ sensitivity at each time step, as well as some aggregate values such as a total_column attribute. H.C is similar to the H.FD object described above, but because it holds the cumulative sensitivity you can use it for plotting retroplumes.

It is important to understand the differences between H.FD and H.C while working with reflexible. If we look closely at the keys of H.FD:

In [29]: H.FD.keys()[:3]
Out[29]: [(0, '20070121100000'), (0, '20070121110000'), (0, '20070121120000')]

You’ll see that the dictionary is primarily keyed by a set of tuples. These tuples represent (s, date), where s is the species ID and date is the date of a grid in FLEXPART.

However, if we look at the keys of the H.C dictionary:

In [30]: H.C.keys()[:3]
Out[30]: [(0, 0)]

We still see tuples, but now keyed by (s, rel_id), where s is still the species ID and rel_id is the release ID. These release IDs correspond to the entries in H.releasetimes, which is a list of the release times. In a forward run these will be the same, but in a backward run there will be differences.

Each tuple is a key to another dictionary that contains the data. Currently there are differences in the way the data is stored in H.FD and in H.C, but future versions aim to make these two data stores consistent.

So we now know that H.C is keyed by (s, k), where s is an integer for the species ID and k is an integer for the release ID.
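
For example, you can map each release ID back to its release time through H.releasetimes. A minimal sketch:

for (s, k) in H.C.keys():
    print('species', s, 'release', k, 'released at', H.releasetimes[k])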

Working with Backward Simulations

The biggest difference for the H.C object arises when you have a backward simulation. For a backward simulation, the attribute will not exist unless you explicitly ask for it:

In [135]: bwrun = rf.Flexpart('Bwd1_V9.02/pathnames')
/Data/johnbur/.conda/envs/shyft/lib/python3.6/site-packages/reflexible-0.5.0-py3.6.egg/reflexible/flexpart.py:45: UserWarning: NetCDF4 files not found in output directory 'Bwd1_V9.02/outputs'.  You can always generate them from data there with the `create_ncfile` command line utility.
  self.fp_output))
Read b'FLEXPART V9.0' Header: Bwd1_V9.02/outputs/header

In [136]: bwH = bwrun.Header

In [137]: bwH.C
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-137-71db37c7653c> in <module>()
----> 1 bwH.C

~/.conda/envs/shyft/lib/python3.6/site-packages/reflexible-0.5.0-py3.6.egg/reflexible/conv2netcdf4/legacy_structures.py in __getattr__(self, attr)
    178         if attr == "__getstate__":
    179             return lambda: None
--> 180         return self[attr]
    181
    182     def __setattr__(self, attr, value):

KeyError: 'C'

Because reading all the data for a backward run is time and memory intensive, it will not be done automatically. Instead, you need to explicitly ask for it:

In [138]: bwH.fill_backward()
getting grid for:  ['20070121090000', '20070121100000', '20070121110000', '20070121120000', '20070121130000', '20070121140000', '20070121150000', '20070121160000', '20070121170000', '20070121180000', '20070121190000', '20070121200000', '20070121210000', '20070121220000', '20070121230000', '20070122000000', '20070122010000', '20070122020000', '20070122030000', '20070122040000', '20070122050000', '20070122060000', '20070122070000', '20070122080000', '20070122090000', '20070122100000', '20070122110000', '20070122120000', '20070122130000', '20070122140000', '20070122150000', '20070122160000', '20070122170000']
Assumed V8 Flexpart
Using readgrid from FortFlex
60 40 1 [0] 0 0 33 1
20070121090000
20070121100000
20070121110000
20070121120000
20070121130000
20070121140000
20070121150000
20070121160000
20070121170000
20070121180000
20070121190000
20070121200000
20070121210000
20070121220000
20070121230000
20070122000000
20070122010000
20070122020000
20070122030000
20070122040000
20070122050000
20070122060000
20070122070000
20070122080000
20070122090000
20070122100000
20070122110000
20070122120000
20070122130000
20070122140000
20070122150000
20070122160000
20070122170000
[0]

At which point, reflexible will run through all the releasetimes and calculate the cumulative sensitivity so that retroplumes may be calculated.

Note that the shapes of the data returned when running backward simulations are different:

In [154]: bw_fd = bwH.FD[(0, '20070121090000')]

In [155]: c = bwH.C[(0,0)]

In [156]: bw_fd.shape
Out[156]: (60, 40, 1, 1, 1)

In [157]: c.shape
Out[157]: (60, 40, 1)
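
As mentioned earlier, the cumulative object also carries aggregate values such as a total_column attribute. A hedged sketch (the attribute name is taken from the description above; check dir(c) if it is not present in your version):

tc = c.total_column     # cumulative total column, as described above
print(tc.shape)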

Adding Trajectories

I use the read_trajectories() function to read the trajectories.txt file and get the trajectories from the run output directory:

T = rf.read_trajectories(H)

Note that the only required parameter is the Header “H”; it provides all the metadata for the function to read the trajectories. The function accepts either the “H” instance or a path to a trajectories file.
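
Equivalently, and as noted above, you can pass a path to a trajectories file instead of the Header. A minimal sketch (the exact location of trajectories.txt inside your run's output directory is an assumption here):

T = rf.read_trajectories('Bwd1_V9.02/outputs/trajectories.txt')  # hypothetical path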

Now we can see how we might batch-process a backward run to create total column plots and add the trajectory information to them. The following lines plot the data sets using the plot_totalcolumn(), plot_trajectory(), and plot_footprint() functions.

Warning

The plotting routines rely heavily on the mapping module. If you have problems, see the mapping.py file or the mapping docstrings. Documentation of this module is presently incomplete, but I am working on it.

In order to reuse figures, which is much faster when working with the basemap module, I create a “None” object for passing the figure instances around:

TC = None

After that we loop over the keys (s = species, k = rel_i) of the H.C attribute we created by calling fill_backward. Note that I named this attribute C for “Cumulative”. In each iteration, for a new combination of (s, k), we pull the data object out of the dictionary. The “data” object is returned by the readgridV8() function and has some attributes that we can use later in conjunction with the plot_totalcolumn() function and for saving and naming the figures. See for example the following lines:

for s, k in H.C:
    data = H.C[(s, k)]
    TC = rf.plot_totalcolumn(H, data, map_region='Europe', FIGURE=TC)
    TC = rf.plot_trajectory(H, T, k, FIGURE=TC)
    filename = '%s_tc_%s.png' % (data.species, data.timestamp)
    TC.fig.savefig(filename)

This will create filenames based on the data’s metadata and save each figure to the path defined by filename. You should now have several images looking like this:

[Image: sample total column plot (_images/sample_totalcolumn.png)]
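
The text above also mentions plot_footprint(). Here is a hedged sketch of how a footprint loop could look, assuming it follows the same (H, data, ...) calling convention as plot_totalcolumn(); check its docstring before relying on this:

FP = None
for s, k in H.C:
    data = H.C[(s, k)]
    # assumption: plot_footprint mirrors plot_totalcolumn's signature
    FP = rf.plot_footprint(H, data, FIGURE=FP)
    FP.fig.savefig('%s_fp_%s.png' % (data.species, data.timestamp))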

The next step is to use the source and learn more about the functionality of the module. I highly recommend the IPython interpreter and use of the Tab key to explore the module’s methods.

Enjoy!