Storing results#

In most experiments, an IterationTree is used to represent the configurations of the instruments. For each configuration, one or more measurements are carried out. They should then be stored alongside the corresponding configuration.

Phileas can be used to configure the instruments. However, it is not responsible for carrying out the measurements and handling their results, which should be done by the user. This includes storing the results of the experiment. Yet, Phileas provides some utility functions that can be used to prepare dataframes for storing those results.

Gridded datasets with xarray#

If the experiment's parameter space is the Cartesian product of the parameter spaces of its instruments, then a gridded dataset is appropriate for storing the results of the experiment. xarray datasets are suitable for this purpose.
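To make the Cartesian-product condition concrete, here is a minimal sketch using only the standard library. The helper `is_cartesian_product` and the sample configurations are hypothetical and are not part of Phileas; they only illustrate when a set of flat configurations fills a complete grid.

```python
from itertools import product


def is_cartesian_product(configs):
    """Return True if the flat configurations cover every combination of values.

    Hypothetical helper for illustration, not a Phileas function.
    """
    names = sorted(configs[0])
    # Every configuration actually present, as a tuple of parameter values.
    points = {tuple(config[name] for name in names) for config in configs}
    # The distinct values taken by each parameter, one set per axis.
    axes = [{config[name] for config in configs} for name in names]
    return points == set(product(*axes))


# A full 2 x 2 grid of (assumed) parameter values.
grid = [{"a": x, "b": y} for x in (1, 2) for y in ("u", "v")]
# The same grid with one point missing: no longer a Cartesian product.
partial = grid[:3]

print(is_cartesian_product(grid))
print(is_cartesian_product(partial))
```

When the check fails, some grid cells would never be visited, which is one reason a tabular layout can be preferable in that case.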

The function iteration_tree_to_xarray_parameters() can be used to initialize a Dataset or DataArray. Given an iteration tree, it returns a tuple that can be used to define the coordinates of the dataset. The user now just needs to define the number and types of the measurements.

In this simple example, the experiment uses two instruments. The parameter space contains three dimensions, corresponding to the two parameters of ins_a and the single parameter of ins_b. The returned coordinates are named "ins_a.param1", "ins_a.param2", "ins_b.param2", each containing the corresponding parameter values. The user wants to carry out two float-valued measurements for each configuration.
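The layout described above can be sketched with xarray alone. This is only an illustration under stated assumptions: the coordinates are written by hand rather than obtained from iteration_tree_to_xarray_parameters(), the parameter values are placeholders, and the measurement names `field1` and `field2` are hypothetical.

```python
import numpy as np
import xarray as xr

# Hand-written coordinates (assumed values), standing in for the ones
# that would be derived from an iteration tree.
coords = {
    "ins_a.param1": ["a1-0", "a1-1", "a1-2"],
    "ins_a.param2": ["a2-0", "a2-1", "a2-2"],
    "ins_b.param2": ["b2-0", "b2-1", "b2-2"],
}
dims = list(coords)
shape = tuple(len(values) for values in coords.values())

# Two float measurements per configuration, initialized to NaN.
results = xr.Dataset(
    data_vars={
        "field1": (dims, np.full(shape, np.nan)),
        "field2": (dims, np.full(shape, np.nan)),
    },
    coords=coords,
)

# Store one measurement at a given configuration, addressed by label.
point = {"ins_a.param1": "a1-1", "ins_a.param2": "a2-0", "ins_b.param2": "b2-2"}
results["field1"].loc[point] = 0.5
```

Label-based assignment through `.loc` keeps the storage code independent of the order in which configurations are visited.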

Tabular datasets with pandas#

Alternatively, you can choose to store measurements in a tabular dataset, like the DataFrame provided by pandas. This can be convenient when the experiment parameter space is not a Cartesian product.

The iteration_tree_to_multiindex() function builds a MultiIndex from an IterationTree. It can then be used as a dataframe index.

config_file = """
ins_a: !union
    _reset: last
    param1: !sequence [a1-0, a1-1, a1-2]
    param2: !sequence [a2-0, a2-1, a2-2]
ins_b:
    param1: !sequence [b1-0, b1-1, b1-2]
    param2: b2
"""
tree = load_iteration_tree_from_yaml_file(config_file)

index = iteration_tree_to_multiindex(tree)

results = pd.DataFrame(
    np.full((len(index), 2), np.nan),
    index=index,
    columns=["field1", "field2"]
)
results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 12
      1 config_file = """
      2 ins_a: !union
      3     _reset: last
   (...)
      8     param2: b2
      9 """
     10 tree = load_iteration_tree_from_yaml_file(config_file)
---> 12 index = iteration_tree_to_multiindex(tree)
     14 results = pd.DataFrame(
     15     np.full((len(index), 2), np.nan),
     16     index=index,
     17     columns=["field1", "field2"]
     18 )
     19 results

File ~/checkouts/readthedocs.org/user_builds/phileas/checkouts/latest/phileas/iteration/utility.py:295, in iteration_tree_to_multiindex(tree)
    292     raise ValueError(error_message, datatree) from e
    294 if current_keys != names:
--> 295     raise ValueError(error_message, names, current_keys)
    297 assert isinstance(fd, dict)
    298 tuples.append(tuple(fd[key] for key in names))

ValueError: ('Iteration tree that change shape are not supported.', ['ins_a.param1', 'ins_a.param2', 'ins_b.param1', 'ins_b.param2'], {'ins_a.param2', 'ins_a.param1', 'ins_b.param1', 'ins_b.param2'})

This example is a simple variation of the last one. Notice the iteration method of ins_a: it is a Union, and not a CartesianProduct anymore. Thus, a tabular storage format might be appropriate.

The results are then stored, once again, by using a slightly modified version of the output of flatten_datatree():

for config in tree:
    flat_config = flatten_datatree(config)
    current_index = tuple(flat_config[param] for param in results.index.names)

    # Acquire measurements ...
    measurements = np.random.rand(2)

    results.loc[current_index, "field1"] = measurements[0]
    results.loc[current_index, "field2"] = measurements[1]

results
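The same storage pattern can also be sketched with pandas alone, which may help when experimenting without an iteration tree. Everything below is an assumption made for illustration: the configurations are written by hand, and `flatten` is a hypothetical stand-in for flatten_datatree() that turns nested dictionaries into dotted keys.

```python
import numpy as np
import pandas as pd


def flatten(config, prefix=""):
    """Hypothetical stand-in for flatten_datatree(): nested dict -> dotted keys."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hand-written configurations forming a non-Cartesian parameter space.
configs = [
    {"ins_a": {"param1": "a1-0", "param2": "a2-0"}, "ins_b": {"param1": "b1-0"}},
    {"ins_a": {"param1": "a1-1", "param2": "a2-0"}, "ins_b": {"param1": "b1-0"}},
    {"ins_a": {"param1": "a1-1", "param2": "a2-1"}, "ins_b": {"param1": "b1-1"}},
]

flat_configs = [flatten(config) for config in configs]
names = list(flat_configs[0])

# One index row per configuration, one level per parameter.
index = pd.MultiIndex.from_tuples(
    [tuple(fc[name] for name in names) for fc in flat_configs], names=names
)
results = pd.DataFrame(
    np.full((len(index), 2), np.nan), index=index, columns=["field1", "field2"]
)

rng = np.random.default_rng(seed=0)
for fc in flat_configs:
    current_index = tuple(fc[name] for name in results.index.names)
    measurements = rng.random(2)  # placeholder for real acquisitions
    results.loc[current_index, "field1"] = measurements[0]
    results.loc[current_index, "field2"] = measurements[1]
```

Only the configurations that are actually visited get a row, so no storage is spent on combinations that the experiment never reaches.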
