Storing results
In most experiments, an IterationTree is used to
represent the configurations of the instruments. For each configuration, one or
multiple measurements are carried out. They should then be stored, alongside
the corresponding configuration.
Phileas can be used to configure the instruments. However, it is not responsible for carrying out the measurements and handling their results, which should be done by the user. This includes storing the results of the experiment. Yet, Phileas provides some utility functions that can be used to prepare dataframes for storing those results.
Gridded datasets with xarray
If the experiment parameter space is the Cartesian product of the parameter spaces of the individual instruments, then a gridded dataset is appropriate to store the results of the experiment. xarray datasets are suitable for this purpose.
The function
iteration_tree_to_xarray_parameters() can
be used to initialize a Dataset or
DataArray. Given an iteration tree, it returns a tuple that
can be used to define the coordinates of the dataset. The user now just needs
to define the number and types of the measurements.
import numpy as np
import xarray as xr

# The Phileas helpers used below (load_iteration_tree_from_yaml_file,
# iteration_tree_to_xarray_parameters) are assumed to be imported already.
config_file = """
ins_a:
  param1: !sequence [a1-0, a1-1, a1-2]
  param2: !sequence [a2-0, a2-1, a2-2]
ins_b:
  param1: !sequence [b1-0, b1-1, b1-2]
  param2: b2
"""
tree = load_iteration_tree_from_yaml_file(config_file)

# One NaN-filled array per measurement, shaped like the parameter grid.
coords, dims_name, dims_shape = iteration_tree_to_xarray_parameters(tree)
results = xr.Dataset(
    data_vars=dict(
        field1=(dims_name, np.full(dims_shape, np.nan)),
        field2=(dims_name, np.full(dims_shape, np.nan)),
    ),
    coords=coords,
)
results
<xarray.Dataset> Size: 584B
Dimensions: (ins_a.param1: 3, ins_a.param2: 3, ins_b.param1: 3)
Coordinates:
* ins_a.param1 (ins_a.param1) <U4 48B 'a1-0' 'a1-1' 'a1-2'
* ins_a.param2 (ins_a.param2) <U4 48B 'a2-0' 'a2-1' 'a2-2'
* ins_b.param1 (ins_b.param1) <U4 48B 'b1-0' 'b1-1' 'b1-2'
ins_b.param2 <U2 8B 'b2'
Data variables:
field1 (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B nan...
field2 (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B nan...
In this simple example, the experiment uses two instruments. The parameter space
contains three dimensions, corresponding to the two parameters of
ins_a and the single varying parameter of ins_b. The dimension
coordinates are named "ins_a.param1", "ins_a.param2" and
"ins_b.param1", each containing the corresponding parameter values. The constant
ins_b.param2 is kept as a scalar coordinate, without a dimension of its own.
The user wants to carry out two measurements for each configuration, each
containing a float value.
See also
If the measurements dataset is sparse, i.e. if only a few measurements are carried out, using numpy arrays is not optimal. Instead, consider using sparse containers.
Modifying results requires using another utility function:
flatten_datatree(), which converts a
hierarchical data tree to a flat one that can be used for xarray indexing.
Thus, the acquisition loop usually has this structure:
for config in tree:
    flat_config = {
        coord: v for coord, v in flatten_datatree(config).items()
        if coord in dims_name
    }

    # Acquire measurements ...
    measurements = np.random.rand(2)

    results.field1.loc[flat_config] = measurements[0]
    results.field2.loc[flat_config] = measurements[1]

results
<xarray.Dataset> Size: 584B
Dimensions: (ins_a.param1: 3, ins_a.param2: 3, ins_b.param1: 3)
Coordinates:
* ins_a.param1 (ins_a.param1) <U4 48B 'a1-0' 'a1-1' 'a1-2'
* ins_a.param2 (ins_a.param2) <U4 48B 'a2-0' 'a2-1' 'a2-2'
* ins_b.param1 (ins_b.param1) <U4 48B 'b1-0' 'b1-1' 'b1-2'
ins_b.param2 <U2 8B 'b2'
Data variables:
field1 (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B 0.7...
field2 (ins_a.param1, ins_a.param2, ins_b.param1) float64 216B 0.4...
Notice how the output of
flatten_datatree() is filtered by the "if coord in dims_name" condition (line 4 of the loop):
the goal is to keep indexed coordinates only. Indeed, xarray refuses to use
non-indexed coordinates for indexing.
config:
{'ins_a': {'param1': 'a1-2', 'param2': 'a2-2'},
'ins_b': {'param1': 'b1-2', 'param2': 'b2'}}
flat_config:
{'ins_a.param1': 'a1-2', 'ins_a.param2': 'a2-2', 'ins_b.param1': 'b1-2'}
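The flattening and filtering step can be reproduced without Phileas or xarray. The following is a minimal sketch, assuming that flattening joins nested keys with a dot, as the output above suggests. The `flatten` helper is hypothetical and written for illustration only; it is not the implementation of flatten_datatree().

```python
def flatten(tree, prefix=""):
    """Flatten nested dicts into one dict with dotted keys (illustrative helper)."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the dotted prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


config = {
    "ins_a": {"param1": "a1-2", "param2": "a2-2"},
    "ins_b": {"param1": "b1-2", "param2": "b2"},
}
# Names of the indexed dimensions, as in the dataset above.
dims_name = ["ins_a.param1", "ins_a.param2", "ins_b.param1"]

flat = flatten(config)
# Keep only the entries that correspond to indexed dimensions.
flat_config = {coord: v for coord, v in flat.items() if coord in dims_name}
print(flat_config)
```

The constant ins_b.param2 entry is present in `flat` but dropped from `flat_config`, because it has no dimension to index.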
Tabular datasets with pandas
Alternatively, you can choose to store measurements in a tabular dataset, like
the DataFrame provided by pandas. This can be convenient
when the experiment parameter space is not a Cartesian product.
The iteration_tree_to_multiindex() function
builds a MultiIndex from an
IterationTree. It can then be used as a
dataframe index.
import numpy as np
import pandas as pd

# The Phileas helpers used below (load_iteration_tree_from_yaml_file,
# iteration_tree_to_multiindex) are assumed to be imported already.
config_file = """
ins_a: !union
  _reset: last
  param1: !sequence [a1-0, a1-1, a1-2]
  param2: !sequence [a2-0, a2-1, a2-2]
ins_b:
  param1: !sequence [b1-0, b1-1, b1-2]
  param2: b2
"""
tree = load_iteration_tree_from_yaml_file(config_file)

# One row per configuration, one column per measurement.
index = iteration_tree_to_multiindex(tree)

results = pd.DataFrame(
    np.full((len(index), 2), np.nan),
    index=index,
    columns=["field1", "field2"],
)
results
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 12
1 config_file = """
2 ins_a: !union
3 _reset: last
(...)
8 param2: b2
9 """
10 tree = load_iteration_tree_from_yaml_file(config_file)
---> 12 index = iteration_tree_to_multiindex(tree)
14 results = pd.DataFrame(
15 np.full((len(index), 2), np.nan),
16 index=index,
17 columns=["field1", "field2"]
18 )
19 results
File ~/checkouts/readthedocs.org/user_builds/phileas/checkouts/stable/phileas/iteration/utility.py:295, in iteration_tree_to_multiindex(tree)
292 raise ValueError(error_message, datatree) from e
294 if current_keys != names:
--> 295 raise ValueError(error_message, names, current_keys)
297 assert isinstance(fd, dict)
298 tuples.append(tuple(fd[key] for key in names))
ValueError: ('Iteration tree that change shape are not supported.', ['ins_a.param1', 'ins_a.param2', 'ins_b.param1', 'ins_b.param2'], {'ins_a.param2', 'ins_a.param1', 'ins_b.param1', 'ins_b.param2'})
This example is a simple variation of the previous one. Notice the iteration method
of ins_a: it is a Union, and not a
CartesianProduct anymore. The configurations therefore no longer fill a grid, and a tabular
storage format might be more appropriate.
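To see why a non-product parameter space fits a table better than a grid, here is a generic illustration in plain Python. The one-at-a-time sweep below is a hypothetical iteration order chosen for the example; it is not a description of how Phileas's Union or its _reset option behave.

```python
from itertools import product

a1 = ["a1-0", "a1-1", "a1-2"]
a2 = ["a2-0", "a2-1", "a2-2"]

# Full grid: every combination of param1 and param2 is visited.
grid = list(product(a1, a2))

# Hypothetical one-at-a-time sweep (NOT Phileas's Union implementation):
# vary param1 while param2 stays at its first value, then vary param2
# while param1 stays at its first value.
sweep = [(v, a2[0]) for v in a1] + [(a1[0], v) for v in a2]

visited = set(sweep)
missing = [point for point in grid if point not in visited]

print(len(grid))     # 9 grid points
print(len(visited))  # 5 distinct points visited by the sweep
print(len(missing))  # 4 grid cells that would stay empty
```

Storing such a sweep in a gridded array would leave the unvisited cells as NaN, whereas a table holds exactly one row per visited configuration.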
Results are once again stored using a modified form of the
output of flatten_datatree(): its values are gathered into a tuple, in the order of the index levels:
for config in tree:
    flat_config = flatten_datatree(config)
    current_index = tuple(flat_config[param] for param in results.index.names)

    # Acquire measurements ...
    measurements = np.random.rand(2)

    results.loc[current_index, "field1"] = measurements[0]
    results.loc[current_index, "field2"] = measurements[1]

results
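The indexing pattern of this loop can be tried with pandas alone. The sketch below builds a small MultiIndex by hand with `pd.MultiIndex.from_tuples` instead of calling iteration_tree_to_multiindex(); the level names and values are made up for the example, and the flat configuration is written out literally rather than produced by flatten_datatree().

```python
import numpy as np
import pandas as pd

# Hand-written index standing in for the one Phileas would build.
names = ["ins_a.param1", "ins_b.param1"]
tuples = [("a1-0", "b1-0"), ("a1-1", "b1-0"), ("a1-1", "b1-1")]
index = pd.MultiIndex.from_tuples(tuples, names=names)

# One row per configuration, one NaN-initialized column per measurement.
results = pd.DataFrame(np.nan, index=index, columns=["field1", "field2"])

# A flat configuration may hold more entries than the index has levels;
# only the values matching the index level names are used, in level order.
flat_config = {"ins_a.param1": "a1-1", "ins_b.param1": "b1-0", "ins_b.param2": "b2"}
current_index = tuple(flat_config[name] for name in results.index.names)

results.loc[current_index, "field1"] = 0.5
results.loc[current_index, "field2"] = 1.5
print(results)
```

Building the tuple from `results.index.names` keeps the values aligned with the index levels, whatever the key order of the flat configuration.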