Storing results

In most experiments, an IterationTree is used to represent the configurations of the instruments. For each configuration, one or more measurements are carried out. The results should then be stored alongside the corresponding configuration.

Phileas can be used to configure the instruments, but it does not carry out the measurements or handle their results, which includes storing them: this is left to the user. It does, however, provide utility functions that prepare data structures (xarray datasets and pandas dataframes) for storing those results.

Gridded datasets with xarray

If the experiment's parameter space is the Cartesian product of the parameter spaces of all the instruments, a gridded dataset is an appropriate way to store the results of the experiment. xarray datasets are suitable for this purpose.
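To get a sense of the grid size, the sketch below enumerates the configurations of the example that follows using itertools.product. The parameter values are written by hand, which is an assumption made for illustration, since Phileas itself iterates the tree. With three values for each of the three varying parameters, there are 3 × 3 × 3 = 27 configurations. Each float64 data variable therefore takes 27 × 8 = 216 bytes, which matches the size reported in the dataset summary further down.

```python
import itertools

import numpy as np

# Values of the three iterated parameters of the example below (written by hand).
ins_a_param1 = ["a1-0", "a1-1", "a1-2"]
ins_a_param2 = ["a2-0", "a2-1", "a2-2"]
ins_b_param1 = ["b1-0", "b1-1", "b1-2"]

# Every combination of values: the Cartesian product that a grid stores.
configurations = list(itertools.product(ins_a_param1, ins_a_param2, ins_b_param1))

# One float64 cell per configuration, for a single measured field.
grid = np.full((len(ins_a_param1), len(ins_a_param2), len(ins_b_param1)), np.nan)
print(len(configurations), grid.size, grid.nbytes)  # 27 27 216
```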

The function iteration_tree_to_xarray_parameters() can be used to initialize a Dataset or a DataArray. Given an iteration tree, it returns a tuple (coords, dims_name, dims_shape) that holds the coordinates of the dataset, the names of its dimensions and its shape. The user then only needs to define the number and types of the measurements, as data variables.

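import numpy as np
import xarray as xr

# load_iteration_tree_from_yaml_file() and iteration_tree_to_xarray_parameters()
# are Phileas functions, assumed to be imported beforehand.

config_file = """
ins_a:
    param1: !sequence [a1-0, a1-1, a1-2]
    param2: !sequence [a2-0, a2-1, a2-2]
ins_b:
    param1: !sequence [b1-0, b1-1, b1-2]
    param2: b2
"""
tree = load_iteration_tree_from_yaml_file(config_file)

# Coordinates, dimension names and shape of the grid spanned by the tree
coords, dims_name, dims_shape = iteration_tree_to_xarray_parameters(tree)

# One data variable per measured field, initialized to NaN
results = xr.Dataset(
    data_vars=dict(
        field1=(dims_name, np.full(dims_shape, np.nan)),
        field2=(dims_name, np.full(dims_shape, np.nan)),
    ),
    coords=coords
)
results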
<xarray.Dataset> Size: 584B
Dimensions:       (ins_a.param1: 3, ins_a.param2: 3, ins_b.param1: 3)
Coordinates:
  * ins_a.param1  (ins_a.param1) <U4 48B 'a1-0' 'a1-1' 'a1-2'
  * ins_a.param2  (ins_a.param2) <U4 48B 'a2-0' 'a2-1' 'a2-2'
  * ins_b.param1  (ins_b.param1) <U4 48B 'b1-0' 'b1-1' 'b1-2'
    ins_b.param2  <U2 8B 'b2'
Data variables:
    field1        (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B nan...
    field2        (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B nan...

In this simple example, the experiment uses two instruments. The parameter space has three dimensions, which correspond to the two iterated parameters of ins_a and the single iterated parameter of ins_b. The dimensions are named "ins_a.param1", "ins_a.param2" and "ins_b.param1", and each is indexed by the corresponding parameter values. The constant parameter ins_b.param2 is kept as a scalar, non-indexed coordinate. The user wants to carry out two measurements with float values for each configuration.
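The tuple returned by iteration_tree_to_xarray_parameters() can be understood by rebuilding the same Dataset with xarray and NumPy alone. In the sketch below, the structure of coords, dims_name and dims_shape is an assumption inferred from how they are used above. coords is taken to be a mapping from coordinate names to values, dims_name a list of dimension names, and dims_shape a tuple of sizes.

```python
import numpy as np
import xarray as xr

# Hand-written stand-ins for the tuple returned by
# iteration_tree_to_xarray_parameters() (assumed structure).
coords = {
    "ins_a.param1": ["a1-0", "a1-1", "a1-2"],
    "ins_a.param2": ["a2-0", "a2-1", "a2-2"],
    "ins_b.param1": ["b1-0", "b1-1", "b1-2"],
    "ins_b.param2": "b2",  # constant parameter: scalar, non-indexed coordinate
}
dims_name = ["ins_a.param1", "ins_a.param2", "ins_b.param1"]
dims_shape = tuple(len(coords[dim]) for dim in dims_name)

# Same construction as above: one NaN-filled variable per measured field.
results = xr.Dataset(
    data_vars={
        "field1": (dims_name, np.full(dims_shape, np.nan)),
        "field2": (dims_name, np.full(dims_shape, np.nan)),
    },
    coords=coords,
)
```

Only the iterated parameters become dimensions. The constant one is attached to every data variable without adding an axis to the grid.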

See also

If the measurements dataset is sparse, i.e. if only a few of the configurations are actually measured, dense NumPy arrays waste memory. Consider using sparse containers instead.

Filling in the results requires another utility function, flatten_datatree(). It converts a hierarchical data tree into a flat dictionary whose keys are the dot-separated parameter paths (such as ins_a.param1), which are the coordinate names used by the dataset. The acquisition loop therefore usually has the following structure:

for config in tree:
    flat_config = {
        coord: v for coord, v in flatten_datatree(config).items()
        if coord in dims_name  # keep indexed coordinates only
    }

    # Acquire measurements ...
    measurements = np.random.rand(2)

    # Store each measurement at the current point of the grid
    results.field1.loc[flat_config] = measurements[0]
    results.field2.loc[flat_config] = measurements[1]

results
<xarray.Dataset> Size: 584B
Dimensions:       (ins_a.param1: 3, ins_a.param2: 3, ins_b.param1: 3)
Coordinates:
  * ins_a.param1  (ins_a.param1) <U4 48B 'a1-0' 'a1-1' 'a1-2'
  * ins_a.param2  (ins_a.param2) <U4 48B 'a2-0' 'a2-1' 'a2-2'
  * ins_b.param1  (ins_b.param1) <U4 48B 'b1-0' 'b1-1' 'b1-2'
    ins_b.param2  <U2 8B 'b2'
Data variables:
    field1        (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B 0.8...
    field2        (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B 0.9...

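Notice how the output of flatten_datatree() is filtered at line 4 (if coord in dims_name). Only the indexed coordinates, which are the dimensions of the dataset, are kept. This is necessary because xarray refuses to index with non-indexed coordinates, such as the constant ins_b.param2.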


from pprint import pprint

print("config:")
pprint(config)
print("\nflat_config:")
pprint(flat_config)
config:
{'ins_a': {'param1': 'a1-2', 'param2': 'a2-2'},
 'ins_b': {'param1': 'b1-2', 'param2': 'b2'}}

flat_config:
{'ins_a.param1': 'a1-2', 'ins_a.param2': 'a2-2', 'ins_b.param1': 'b1-2'}
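The flattening and filtering steps can be reproduced without Phileas. In the sketch below, flatten_nested() is a hypothetical stand-in for flatten_datatree() and not its actual implementation. It joins nested keys with dots, producing the same result as the flat_config output above. The dataset is rebuilt by hand with a single data variable. The sketch shows that the filtered configuration works with .loc, while the unfiltered one is rejected because ins_b.param2 has no index.

```python
import numpy as np
import xarray as xr


def flatten_nested(tree: dict, prefix: str = "") -> dict:
    """Hypothetical stand-in for flatten_datatree(): join nested keys with dots."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_nested(value, name))  # recurse into sub-trees
        else:
            flat[name] = value  # leaf: store it under its dotted path
    return flat


dims_name = ["ins_a.param1", "ins_a.param2", "ins_b.param1"]
results = xr.Dataset(
    data_vars={"field1": (dims_name, np.full((3, 3, 3), np.nan))},
    coords={
        "ins_a.param1": ["a1-0", "a1-1", "a1-2"],
        "ins_a.param2": ["a2-0", "a2-1", "a2-2"],
        "ins_b.param1": ["b1-0", "b1-1", "b1-2"],
        "ins_b.param2": "b2",  # scalar, non-indexed coordinate
    },
)

config = {"ins_a": {"param1": "a1-2", "param2": "a2-2"},
          "ins_b": {"param1": "b1-2", "param2": "b2"}}
flat = flatten_nested(config)

# Keep only the dimensions, then write one value at that grid point.
flat_config = {k: v for k, v in flat.items() if k in dims_name}
results.field1.loc[flat_config] = 0.5

# The unfiltered dict is rejected: ins_b.param2 cannot be used for indexing.
try:
    results.field1.loc[flat] = 0.5
    rejected = False
except (KeyError, ValueError):
    rejected = True
```

The exact exception type depends on the xarray version, which is why both KeyError and ValueError are caught.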

Tabular datasets with pandas

Alternatively, you can choose to store the measurements in a tabular dataset, such as a pandas DataFrame. This can be convenient when the experiment parameter space is not a Cartesian product.

The iteration_tree_to_multiindex() function builds a MultiIndex from an IterationTree, which can then be used as the index of a DataFrame.

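import numpy as np
import pandas as pd

# load_iteration_tree_from_yaml_file() and iteration_tree_to_multiindex()
# are Phileas functions, assumed to be imported beforehand.

config_file = """
ins_a: !union
    _reset: last
    param1: !sequence [a1-0, a1-1, a1-2]
    param2: !sequence [a2-0, a2-1, a2-2]
ins_b:
    param1: !sequence [b1-0, b1-1, b1-2]
    param2: b2
"""
tree = load_iteration_tree_from_yaml_file(config_file)

# One index level per flattened parameter, one row per configuration
index = iteration_tree_to_multiindex(tree)

# One column per measured field, initialized to NaN
results = pd.DataFrame(
    np.full((len(index), 2), np.nan),
    index=index,
    columns=["field1", "field2"]
)
results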

This example is a simple variation of the previous one. Notice the iteration method of ins_a: it is now a Union, not a CartesianProduct. The configurations therefore no longer form a grid, so a tabular storage format is more appropriate than a gridded one.
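Because a MultiIndex is built row by row, it does not require the configurations to form a grid. The sketch below builds a comparable index directly with pandas from a hypothetical list of flattened configurations. It illustrates the idea and is not the implementation of iteration_tree_to_multiindex(). The configuration values are made up and are not the exact sequence produced by the !union tree above.

```python
import numpy as np
import pandas as pd

# Hypothetical flat configurations, shaped like the output of flatten_datatree().
# They do not form a full grid: only ins_a changes, one parameter at a time.
flat_configs = [
    {"ins_a.param1": "a1-0", "ins_a.param2": "a2-0", "ins_b.param1": "b1-0", "ins_b.param2": "b2"},
    {"ins_a.param1": "a1-1", "ins_a.param2": "a2-0", "ins_b.param1": "b1-0", "ins_b.param2": "b2"},
    {"ins_a.param1": "a1-1", "ins_a.param2": "a2-1", "ins_b.param1": "b1-0", "ins_b.param2": "b2"},
]

# Fix the order of the levels once, then read every configuration in that order.
names = list(flat_configs[0])
index = pd.MultiIndex.from_tuples(
    [tuple(config[name] for name in names) for config in flat_configs],
    names=names,
)

# One row per configuration, one column per measured field.
results = pd.DataFrame(np.full((len(index), 2), np.nan), index=index, columns=["field1", "field2"])
```

Each row stores one configuration, so the table grows with the number of configurations actually visited rather than with the size of a full grid.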

The results are once again stored using the output of flatten_datatree(), this time reordered to match the levels of the index:

for config in tree:
    flat_config = flatten_datatree(config)
    # Read the parameter values in the order of the index levels
    current_index = tuple(flat_config[param] for param in results.index.names)

    # Acquire measurements ...
    measurements = np.random.rand(2)

    # Store each measurement in the row of the current configuration
    results.loc[current_index, "field1"] = measurements[0]
    results.loc[current_index, "field2"] = measurements[1]

results
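The main subtlety in this loop is the order of the values. A flat configuration is a dict, and its keys need not follow the order of the index levels. The sketch below uses a hand-written index and a hypothetical configuration whose keys are deliberately shuffled. It shows that building the key from results.index.names makes the row lookup independent of the dict order.

```python
import numpy as np
import pandas as pd

names = ["ins_a.param1", "ins_a.param2", "ins_b.param1", "ins_b.param2"]
index = pd.MultiIndex.from_tuples(
    [("a1-0", "a2-0", "b1-0", "b2"), ("a1-1", "a2-0", "b1-0", "b2")],
    names=names,
)
results = pd.DataFrame(np.full((len(index), 2), np.nan), index=index, columns=["field1", "field2"])

# Hypothetical flat configuration whose keys are NOT in the order of the levels.
flat_config = {"ins_b.param2": "b2", "ins_b.param1": "b1-0",
               "ins_a.param2": "a2-0", "ins_a.param1": "a1-1"}

# Reorder the values to match the index levels, as in the loop above.
current_index = tuple(flat_config[name] for name in results.index.names)
results.loc[current_index, "field1"] = 0.25
results.loc[current_index, "field2"] = 0.75
```

Building the tuple with tuple(flat_config.values()) would instead produce ("b2", "b1-0", "a2-0", "a1-1"), which matches no row of the index.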