--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 jupytext_version: 1.17.1 kernelspec: name: python3 display_name: Python 3 (ipykernel) language: python --- # Storing results ```{code-cell} ipython3 :tags: [remove-cell] from pprint import pprint import numpy as np import xarray as xr import pandas as pd from phileas.iteration.utility import ( flatten_datatree, iteration_tree_to_multiindex, iteration_tree_to_xarray_parameters, ) from phileas.parsing import load_iteration_tree_from_yaml_file ``` In most experiments, a {py:class}`~phileas.iteration.IterationTree` is used to represent the configurations of its instruments. For each of them, one or multiple measurements are carried out. They should then be stored, alongside the corresponding configuration. Phileas can be used to configure the instruments. However, it is not responsible for carrying out the measurements and handling their results, which should be done by the user. This includes storing the results of the experiment. Yet, Phileas provides some utility functions that can be used to prepare dataframes for storing those results. ## Gridded datasets with [xarray](https://docs.xarray.dev/en/stable/index.html) If the experiment parameter space is a cartesian product of each of the instruments' parameters' space, then a gridded dataset is appropriate to store the results of the experiment. [xarray](https://docs.xarray.dev/en/stable/index.html) datasets are suitable for this purpose. The function {py:func}`~phileas.iteration.utility.iteration_tree_to_xarray_parameters` can be used to initialize a {py:class}`~xarray.Dataset` or {py:class}`~xarray.DataArray`. Given an iteration tree, it returns a tuple that can be used to define the coordinates of the dataset. The user now just needs to define the number and types of the measurements. ```{code-cell} ipython3 config_file = """ ins_a: param1: !sequence [a1-0, a1-1, a1-2] param2: !sequence [a2-0, a2-1, a2-2] ins_b: param1: !sequence [b1-0, b1-1, b1-2] param2: b2 """ tree = load_iteration_tree_from_yaml_file(config_file) coords, dims_name, dims_shape = iteration_tree_to_xarray_parameters(tree) results = xr.Dataset( data_vars=dict( field1=(dims_name, np.full(dims_shape, np.nan)), field2=(dims_name, np.full(dims_shape, np.nan)), ), coords=coords ) results ``` In this simple example, the experiment uses two instruments. The parameter space contains three dimensions, corresponding to the two parameters of `ins_a` and the single parameter of `ins_b`. The returned coordinates are named `"ins_a.param1"`, `"ins_a.param2"`, `"ins_b.param2"`, each containing the corresponding parameter values. The user wants to carry out two measurements for each configuration, containing `float` values. :::{seealso} If the measurements dataset is sparse, *ie.* if only a few measurements are carried out, using numpy arrays is not optimal. Instead, consider using [sparse](http://sparse.pydata.org) containers. ::: Modifying `results` requires using another utility function: {py:func}`~phileas.iteration.utility.flatten_datatree`, which converts a hierarchical data tree to a flat one that can be used for xarray indexing. Thus, the acquisition loop usually has this structure: ```{code-cell} ipython3 for config in tree: flat_config = { coord: v for coord, v in flatten_datatree(config).items() if coord in dims_name } # Acquire measurements ... measurements = np.random.rand(2) results.field1.loc[flat_config] = measurements[0] results.field2.loc[flat_config] = measurements[1] results ``` Notice how the output of {py:func}`~phileas.iteration.utility.flatten_datatree` is filtered at line 4: the goal is to keep indexed coordinates only. Indeed, xarray refuses to use non-indexed coordinates for indexing. ```{code-cell} ipython3 :tags: [hide-input] print("config:") pprint(config) print("\nflat_config:") pprint(flat_config) ``` ## Tabular datasets with [pandas](http://pandas.pydata.org) Alternatively, you can chose to store measurements in a tabular dataset, like the {py:class}`~pandas.DataFrame` provided by pandas. This can be convenient when the experiment parameter space is not a cartesian product. The {py:func}`~phileas.iteration.utility.iteration_tree_to_multiindex` function builds a{py:class}`~pandas.MultiIndex` from an {py:class}`~phileas.iteration.IterationTree`. It can then be used as a dataframe index. ```{code-cell} ipython3 config_file = """ ins_a: !union _reset: last param1: !sequence [a1-0, a1-1, a1-2] param2: !sequence [a2-0, a2-1, a2-2] ins_b: param1: !sequence [b1-0, b1-1, b1-2] param2: b2 """ tree = load_iteration_tree_from_yaml_file(config_file) index = iteration_tree_to_multiindex(tree) results = pd.DataFrame( np.full((len(index), 2), np.nan), index=index, columns=["field1", "field2"] ) results ``` This example is a simple variation of the last one. Notice the iteration method of `ins_a`: it is a {py:class}`~phileas.iteration.Union`, and not a {py:class}`~phileas.iteration.CartesianProduct` anymore. Thus, a tabular storage format might be appropriate. Then, storage of the results is once again done with a modification of the output of {py:func}`~phileas.iteration.utility.flatten_datatree`: ```{code-cell} ipython3 for config in tree: flat_config = flatten_datatree(config) current_index = tuple(flat_config[param] for param in results.index.names) # Acquire measurements ... measurements = np.random.rand(2) results.loc[current_index, "field1"] = measurements[0] results.loc[current_index, "field2"] = measurements[1] results ```