Track input data generation process in the repo #236

Merged
merged 27 commits into from
Apr 25, 2023
Commits
27 commits
- 2e767ac Update .gitignore to ony ignore data/*.zip rather than whole director… (willGraham01, Feb 21, 2023)
- 8888e82 Relocate common MATLAB functionality (willGraham01, Feb 21, 2023)
- d718c35 Clarify what we're generating here (willGraham01, Feb 21, 2023)
- 5d90f8d Create script for generating test data (willGraham01, Feb 22, 2023)
- 74e8a31 Add a readme so other people know what's going on (willGraham01, Feb 22, 2023)
- bffb08c Fix now-broken pathing on MATLAB tests (willGraham01, Feb 22, 2023)
- b93d2db Add regenerate_all script (willGraham01, Feb 22, 2023)
- 1f9b147 More generality for the classes (willGraham01, Feb 22, 2023)
- fa53176 1/2: new tests regenerate input data (willGraham01, Feb 23, 2023)
- 3a3600a Regen tests now work! (willGraham01, Feb 23, 2023)
- db765e2 Remove Will's quick-run hack (willGraham01, Feb 23, 2023)
- 252feaa Update README.md now PoConcept is complete (willGraham01, Feb 23, 2023)
- 11133a4 Mark test_regen for skipping ATM (willGraham01, Feb 24, 2023)
- 87ccfac Update so the mark is actually applied now (willGraham01, Feb 24, 2023)
- fcbad8f Update to use matlabengine for cleaner calls (willGraham01, Feb 24, 2023)
- ebfab3f Update README to reflect use of matlabengine. Mark test_regen.py to b… (willGraham01, Feb 24, 2023)
- 4240fa0 Update ci.yml to have pytest ignore test_regen.py on GH runners. (willGraham01, Feb 24, 2023)
- 85b5e80 Force test_regen file ignore rather than test ignore due to MATLABEng… (willGraham01, Feb 27, 2023)
- dcbc87b Apply suggestions from code review (willGraham01, Mar 13, 2023)
- 544582e Add Sam's suggestions from code review (2) (willGraham01, Mar 13, 2023)
- 6943561 Apply suggestions from code review (willGraham01, Apr 21, 2023)
- a52492d Apply doc updates that were hanging from GitHub (willGraham01, Apr 21, 2023)
- 86f7cbb Simpler python plz: request granted (willGraham01, Apr 21, 2023)
- a209b25 Remove duplicate functions and wrappers (willGraham01, Apr 24, 2023)
- 5aa7cd2 Update paths and move regenerate_all script to top-level (willGraham01, Apr 24, 2023)
- 67a80a9 Function and file renames accordingly (willGraham01, Apr 24, 2023)
- e202ba1 Update tdms/tests/system/run_system_test.py (willGraham01, Apr 24, 2023)
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -189,4 +189,4 @@ jobs:
shell: bash
run: |
export OMP_NUM_THREADS=2
pytest ${GITHUB_WORKSPACE}/tdms/tests/system/ -s -x
pytest ${GITHUB_WORKSPACE}/tdms/tests/system/ --ignore=${GITHUB_WORKSPACE}/tdms/tests/system/test_regen.py -s -x
2 changes: 1 addition & 1 deletion .gitignore
@@ -12,7 +12,7 @@ html/
# (py)tests and test data
**.pyc
**/.pytest_cache/
tdms/tests/system/data
tdms/tests/system/data/*.zip
**.mat

# text editor files
75 changes: 67 additions & 8 deletions doc/developers.md
@@ -222,19 +222,78 @@ It's good practice, and reassuring for your pull-request reviewers, if new C++ f
The full system tests are written in Python 3, and call the `tdms` executable for known inputs and compare to expected outputs.
We use [pytest](https://docs.pytest.org) and our example data is provided as zip files on [zenodo](https://zenodo.org/).

There are a few [python packages you will need](https://github.com/UCL/TDMS/blob/main/tdms/tests/requirements.txt) before running the tests so run:
Before running the tests you will need a few [python packages](https://github.com/UCL/TDMS/blob/main/tdms/tests/requirements.txt), which can be installed by executing:
```{.sh}
python -m pip install -r tdms/tests/requirements.txt
```
if you don't already have them.
You'll then need to [compile](#compiling) `tdms` with `-DBUILD_TESTING=ON`.
Once compiled, the system tests can be run by invoking `pytest` and pointing it to the `tdms/tests/system` directory.
For example, from the build directory:
```{.sh}
$ pwd
/path/to/repository/TDMS/tdms/build
$ pytest ../tests/system/
```
The [`test_system.py`](https://github.com/UCL/TDMS/blob/main/tdms/tests/system/test_system.py) script runs each system test in sequence.

When you run the tests for the first time, the example data will be downloaded to `tdms/tests/system/data` (which is [ignored by git](https://github.com/UCL/TDMS/blob/main/.gitignore)).
Subsequent runs of the test will not re-download unless you manually delete the zip file.
**[`test_regen.py`](https://github.com/UCL/TDMS/blob/main/tdms/tests/system/test_regen.py) and [`tdms_testing_class.py`](https://github.com/UCL/TDMS/blob/main/tdms/tests/system/tdms_testing_class.py) will replace `test_system.py` and `read_config.py` when the [input overhaul](https://github.com/UCL/TDMS/issues/70) is complete.**

A good example of running the `tdms` executable for a given input and expected output is [test_arc01.py](https://github.com/UCL/TDMS/blob/main/tdms/tests/system/test_arc01.py)
When you run the tests for the first time, test data is downloaded to `tdms/tests/system/data` (which is [ignored by git](https://github.com/UCL/TDMS/blob/main/.gitignore)).
These reference input files contain the arrays needed by the simulation (the incident electric field, the computational grid, etc.) and were generated by a trusted version of the relevant MATLAB scripts.
Subsequent runs of the tests will not re-download unless you manually delete the zip file(s).
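The download-once behaviour amounts to a cache check on the zip file's existence. As a minimal sketch only (the helper name `fetch_zip_if_missing` and the caller-supplied `download` function are hypothetical, not the test suite's actual API):

```python
from pathlib import Path
from typing import Callable


def fetch_zip_if_missing(zip_path: Path, download: Callable[[Path], None]) -> bool:
    """Download zip_path only if it is not already present (hypothetical helper).

    Returns True if a download was performed, False if the cached copy was reused.
    """
    if zip_path.exists():
        # Cached from a previous run: delete the file manually to force a re-download.
        return False
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    download(zip_path)
    return True
```

Calling it twice with the same path triggers only one download, which is why stale data persists until the zip file is deleted.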

You need to [compile](#compiling) `tdms`, then the system tests can be run, e.g. from the build directory:
The system tests for `tdms` are configured with yaml files in the `data/input_generation/` directory.
Each file is named `config_XX.yaml`, where `XX` is the ID of the system test; the system tests themselves are named `arc_XX` by historical convention.
`XX` should also match the `test_id` field in the configuration file itself.
The _reference outputs_ or _reference data_ are a collection of `.mat` files, produced from the _reference inputs_ by a trusted version of the `tdms` executable.
We test for regression using these reference files.
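The naming convention can be checked mechanically. A sketch, assuming the config has already been loaded into a dictionary (the helper `arc_name_from_config` is hypothetical and deliberately not prefixed with `test_`, so pytest would not collect it):

```python
import re
from pathlib import Path

# Filenames look like config_XX.yaml, where XX is the system test ID.
CONFIG_NAME = re.compile(r"^config_(?P<test_id>\w+)\.yaml$")


def arc_name_from_config(config_path: Path, config: dict) -> str:
    """Return the arc_XX name for a parsed config, checking the naming convention.

    Raises ValueError if the filename is malformed or disagrees with test_id.
    """
    match = CONFIG_NAME.match(config_path.name)
    if match is None:
        raise ValueError(f"{config_path.name} is not named config_XX.yaml")
    file_id = match.group("test_id")
    if str(config["test_id"]) != file_id:
        raise ValueError(
            f"test_id {config['test_id']!r} does not match filename ID {file_id!r}"
        )
    return "arc_" + file_id
```

The quotes around `'01'` in the YAML matter here: PyYAML would load an unquoted `01` as the integer `1`, which no longer matches the `01` in the filename.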

```{.sh}
pytest ../tests/system/
```
A given system test typically makes two calls to the `tdms` executable: one with no scattering object present, and one with an obstacle.
A test may make more than two runs, for example when each of these calls is repeated with both band-limited and cubic interpolation.
Each call to the executable has a reference input and a reference output.
In the scripts, each execution is performed by `tests.utils.run_tdms`, which wraps a [subprocess.Popen](https://docs.python.org/3/library/subprocess.html#subprocess.Popen).
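`tests.utils.run_tdms` itself is not shown in this PR. Purely as an illustration of the kind of `subprocess.Popen` wrapper described, here is a sketch; the name `run_executable`, its arguments and the timeout are hypothetical, and the usage line runs the Python interpreter in place of `tdms` so the example works anywhere.

```python
import subprocess
import sys
from typing import List, Tuple


def run_executable(args: List[str], timeout: float = 600.0) -> Tuple[int, str]:
    """Run a command, wait for it, and return (exit code, combined stdout/stderr).

    Hypothetical stand-in for tests.utils.run_tdms.
    """
    with subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        try:
            output, _ = proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Do not leave a hung executable behind.
            proc.kill()
            proc.communicate()
            raise
    return proc.returncode, output


# A real call would pass the tdms executable and its arguments instead.
code, out = run_executable([sys.executable, "-c", "print('ok')"])
```

Merging stderr into stdout keeps the executable's diagnostics in one place when a run fails.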

#### Workflow of a System Test

The workflow of a particular system test `arc_XX` is:
- Locally generate the reference inputs using `data/input_generation/WHATISTHESCRIPTNAME.py`.
  - `arc_XX` fails if its reference input cannot be successfully generated.
    This indicates a failure in the scripts and/or functions in the `data/input_generation/{bscan,matlab}` directories.
- Fetch the reference outputs from [Zenodo](https://zenodo.org/record/7440616/files).
- For each run, named `run_id`, in `arc_XX`:
  - Execute the call to `tdms` corresponding to `run_id`.
  - Compare the output of the run to the corresponding reference data.
  - `run_id` fails if the output produced differs significantly from the reference data.
  - Outputs produced by `run_id` are cleaned up.
- `arc_XX` fails if any one of its runs fails. Failed runs are reported by name.
- Reference inputs are cleaned up.
- `arc_XX` passes if this step is reached successfully.
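The pass/fail rules in this workflow can be sketched as follows. This is an illustration under stated assumptions, not the suite's comparison code: the real tests compare arrays from `.mat` files, whereas here each output is a flat list of floats, and the tolerances and helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def differs_significantly(
    output: Sequence[float], reference: Sequence[float], rel_tol: float = 1e-6
) -> bool:
    """Hypothetical comparison: True if shapes differ or any element is out of tolerance."""
    if len(output) != len(reference):
        return True
    return not all(
        math.isclose(o, r, rel_tol=rel_tol, abs_tol=1e-12)
        for o, r in zip(output, reference)
    )


def failed_runs(
    outputs: Dict[str, Sequence[float]], references: Dict[str, Sequence[float]]
) -> List[str]:
    """Names of runs whose output differs from its reference; arc_XX passes iff empty."""
    return [
        run_id
        for run_id, output in outputs.items()
        if differs_significantly(output, references[run_id])
    ]
```

Collecting every failing `run_id`, rather than stopping at the first, matches the requirement that failed runs are reported by name.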

Due to [licensing issues regarding running `MATLAB` on GitHub runners](https://github.com/matlab-actions/setup-matlab/issues/13), we cannot use `matlabengine` to regenerate the reference input data during CI, so `test_regen.py` is ignored by `pytest` on the GitHub runners. (We are considering removing the `MATLAB` dependency, which would resolve this issue.) The work-in-progress `test_regen.py` workflow can still be run locally through `pytest`; however, in addition to `requirements.txt`, you will also need to [install `matlabengine`](https://uk.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html). The MathWorks page lists the detailed requirements; you will need (at least) `Python >= 3.8` and a licensed version of `MATLAB`.
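The `ci.yml` change in this PR handles this with pytest's `--ignore` flag. An alternative, sketched here as an assumption rather than what the PR does, is pytest's `collect_ignore` list in a `conftest.py` placed in `tdms/tests/system/`, which skips the file whenever the `matlab` package is not installed. Ignoring at collection time matters because a module that imports `matlab.engine` at the top (as `generate_test_input.py` does) fails on import when `matlabengine` is missing, before any skip marker could take effect.

```python
# conftest.py (hypothetical): do not collect test_regen.py when matlabengine is absent.
import importlib.util

# Paths in collect_ignore are relative to the directory containing this conftest.py.
collect_ignore = []

# find_spec returns None when the top-level `matlab` package cannot be found,
# which is the situation on runners without a MATLAB installation.
if importlib.util.find_spec("matlab") is None:
    collect_ignore.append("test_regen.py")
```

This keeps the CI command unchanged, at the cost of also silently skipping the file on any local machine without `matlabengine`.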

### Generating Input Data for the System Tests

The system tests rely on `.mat` input files that are generated through a series of MATLAB function calls and scripts. The `tdms/tests/system/data/input_generation` directory contains the functionality to automatically (re)generate this input data, which serves two purposes:
- The `.mat` _input_ files do not need to be uploaded to and fetched from Zenodo each time the system tests are run; they can be generated locally instead.
  - Note that the reference output files corresponding to these inputs still need to be downloaded from Zenodo.
- Changes to the way we handle inputs to `tdms` and to the system tests are tracked in the repository, so the tests catch unexpected behaviour caused by input changes.

#### (Re)generation of the Data

(Re)generating the input data for a particular test case, `arc_XX`, is a three-step process:
1. Determine variables, filenames, and the particular setup of `arc_XX`. This information is stored in the corresponding `config_XX.yaml` file. For example, is an illumination file required? What are the spatial obstacles? What is the solver method?
1. Call the `run_bscan.m` function (and sub-functions in `./matlab`) using the information in `config_XX.yaml` to produce the `.mat` input files. Each test case requires an input file (`input_file_XX.m`) which defines test-specific variables (domain size, number of period cells, material properties, etc.) that are too complex to specify in a `.yaml` file.
1. Clean up the auxiliary `.mat` files that are generated by this process; in particular, any `gridfiles.mat`, illumination files, or other `.mat` files that are temporarily created when generating the input `.mat` file.
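Steps 1 and 3 above are plain Python bookkeeping around the MATLAB call in step 2. The sketch below shows them on an already-parsed config dictionary; `generation_plan` and `clean_auxiliary_mats` are hypothetical names, and the `keep` list (so genuine inputs such as `pstd_cyl_input.mat` survive) is an assumption. `generate_test_input.py` instead removes every `.mat` file in its own directory and raises `RuntimeError` for a missing input file.

```python
from pathlib import Path
from typing import Any, Dict, List, Tuple


def generation_plan(config: Dict[str, Any], config_dir: Path) -> Tuple[Path, List[str]]:
    """Step 1 (hypothetical helper): read the MATLAB input file and obstacles from a config."""
    info = config["input_generation"]
    input_file = config_dir / info["input_file"]
    if not input_file.exists():
        raise FileNotFoundError(input_file)
    return input_file, list(info["spatial_obstacles"])


def clean_auxiliary_mats(directory: Path, keep: List[str]) -> List[str]:
    """Step 3 (hypothetical helper): delete .mat files in directory except those in keep."""
    removed = []
    for mat in sorted(directory.glob("*.mat")):
        if mat.name not in keep:
            mat.unlink()
            removed.append(mat.name)
    return removed
```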

#### Contents of the `data/input_generation` Directory (and subdirectories)

The `run_bscan` function is inside the `bscan/` directory.

The `matlab/` directory contains functions that `run_bscan` calls during the creation of the input data. In particular, this includes the `iteratefdtd_matrix` function, which does the majority of the work in setting up gridfiles, illumination files, and the `.mat` inputs themselves.

The `generate_test_input.py` file contains the Python functions that the system tests can invoke to regenerate the input data. Since the system test framework uses `pytest` but the data generation requires `MATLAB` (for now), we use `Python` to read in and process the information that each test requires, and then call `run_bscan` with the appropriate arguments from within Python.

The `regenerate_all.py` file works through all of the `config_XX.yaml` files in the directory and regenerates the input `.mat` data corresponding to each.
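A sketch of that loop, assuming only that configs match `config_*.yaml`; the function below is hypothetical and takes the per-config action as a callable so it can run without MATLAB. Since `generate_test_input` accepts an optional `engine`, a caller could start one session with `start_MatlabEngine_with_extra_paths` and pass `lambda p: generate_test_input(p, engine)`, so MATLAB starts once rather than once per test.

```python
from pathlib import Path
from typing import Callable, List


def regenerate_all(directory: Path, generate: Callable[[Path], None]) -> List[str]:
    """Call generate() on every config_*.yaml in directory, in sorted order.

    Hypothetical sketch: returns the names of the configs that were processed.
    """
    done = []
    for config in sorted(directory.glob("config_*.yaml")):
        generate(config)
        done.append(config.name)
    return done
```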

The remaining `config_XX.yaml` and `input_file_XX.m` files are as mentioned in [the previous section](#regeneration-of-the-data). These contain the information about each test that Python and `run_bscan` will need to regenerate the input files.
2 changes: 1 addition & 1 deletion tdms/tests/matlab/test_fdtdduration.m
@@ -1,7 +1,7 @@

% Define the tests as all the locally defined functions
function tests = test_iteratefdtd_matrix_function
addpath('../../matlab/', 'data/');
addpath('../system/data/input_generation/matlab', 'data/');
tests = functiontests(localfunctions);
end

Empty file.
File renamed without changes.
Empty file.
46 changes: 46 additions & 0 deletions tdms/tests/system/data/input_generation/bscan/run_bscan.m
@@ -0,0 +1,46 @@
function [] = run_bscan(test_directory, input_filename)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%This function generates the files used as input to the executable
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Create directory into which to place the input files, if it doesn't exist already
dir_to_place_input_mats = test_directory;
if ~exist(dir_to_place_input_mats, 'dir')
    mkdir(dir_to_place_input_mats);
end

%% Generate the input file

%start by defining the coordinates of the computational grid
[x,y,z,lambda] = fdtd_bounds(input_filename);

%15 micron radius cylinder
rad = 15e-6;
%refractive index of cylinder
refind = 1.42;

%insert a cylinder at the origin
y = 0;
[X,Y,Z] = ndgrid(x,y,z);

%generate scattering matrix
I = zeros(size(X));
%set all Yee cells within the cylinder to have index of 1
I( (X.^2 + Z.^2) < rad^2 ) = 1;
I( (end-3):end,1,:) = 0;
I( :, 1, (end-3):end) = 0;
inds = find(I(:));
[ii,jj,kk] = ind2sub(size(I), inds);
composition_matrix = [ii jj kk ones(size(ii))];
material_matrix = [1 refind^2 1 0 0 0 0 0 0 0 0];

save('gridfile_cyl', 'composition_matrix', 'material_matrix');
%setup free space matrix and save
composition_matrix = [];
save('gridfile_fs', 'composition_matrix', 'material_matrix');

%generate tdms executable input files
iteratefdtd_matrix(input_filename,'filesetup',strcat(dir_to_place_input_mats,'/pstd_cyl_input'),'gridfile_cyl.mat','');
iteratefdtd_matrix(input_filename,'filesetup',strcat(dir_to_place_input_mats,'/pstd_fs_input'),'gridfile_fs.mat','');

end
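The indexing in `run_bscan.m` is compact, so here is a pure-Python sketch (not part of this PR) of how the cylinder's `composition_matrix` rows are formed. It assumes the same single y-plane (`y = 0`) and the same zeroing of the last four x- and z-planes, and emits 1-based `[i j k 1]` rows in MATLAB's column-major `find` order. The function name is hypothetical; the trailing `1` appears to select the material defined by the first entry of `material_matrix`.

```python
from typing import List, Sequence


def cylinder_composition_matrix(
    x: Sequence[float], z: Sequence[float], radius: float, material_id: int = 1
) -> List[List[int]]:
    """Rows [i, j, k, material_id] (1-based) for grid points inside the cylinder.

    Follows find(I(:)) order for an nx-by-1-by-nz array: i varies fastest, then k.
    """
    nx, nz = len(x), len(z)
    rows = []
    for k in range(nz):
        for i in range(nx):
            inside = x[i] ** 2 + z[k] ** 2 < radius ** 2
            # run_bscan.m clears the last four planes in x and in z.
            trimmed = i >= nx - 4 or k >= nz - 4
            if inside and not trimmed:
                rows.append([i + 1, 1, k + 1, material_id])
    return rows
```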
willGraham01 marked this conversation as resolved.
20 changes: 20 additions & 0 deletions tdms/tests/system/data/input_generation/config_01.yaml
@@ -0,0 +1,20 @@
test_id: '01'
tests:
  fs_bli:
    input_file: pstd_fs_input.mat
    reference: pstd_fs_bli_reference.mat
  fs_cubic:
    input_file: pstd_fs_input.mat
    reference: pstd_fs_cubic_reference.mat
    cubic_interpolation: True
  cyl_bli:
    input_file: pstd_cyl_input.mat
    reference: pstd_cyl_bli_reference.mat
  cyl_cubic:
    input_file: pstd_cyl_input.mat
    reference: pstd_cyl_cubic_reference.mat
    cubic_interpolation: True
input_generation:
  input_file: input_file_01.m
  spatial_obstacles: ["fs", "cyl"]
  illumination_input_file:
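Once parsed, each entry under `tests:` describes one `tdms` run. A sketch of enumerating them, assuming a missing `cubic_interpolation` key means band-limited interpolation (the `Run` type and `runs_from_config` helper are hypothetical; the dictionary mirrors two of the four runs in `config_01.yaml`):

```python
from typing import Any, Dict, List, NamedTuple


class Run(NamedTuple):
    run_id: str
    input_file: str
    reference: str
    cubic_interpolation: bool


def runs_from_config(config: Dict[str, Any]) -> List[Run]:
    """List the runs declared under `tests:`; absent cubic_interpolation is taken as False."""
    return [
        Run(
            run_id,
            spec["input_file"],
            spec["reference"],
            bool(spec.get("cubic_interpolation", False)),
        )
        for run_id, spec in config["tests"].items()
    ]


# Parsed form of config_01.yaml, abridged to two of its four runs.
CONFIG_01 = {
    "test_id": "01",
    "tests": {
        "fs_bli": {"input_file": "pstd_fs_input.mat", "reference": "pstd_fs_bli_reference.mat"},
        "fs_cubic": {
            "input_file": "pstd_fs_input.mat",
            "reference": "pstd_fs_cubic_reference.mat",
            "cubic_interpolation": True,
        },
    },
}
```

Both runs share `pstd_fs_input.mat`, so one generated input serves the band-limited and the cubic run.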
102 changes: 102 additions & 0 deletions tdms/tests/system/data/input_generation/generate_test_input.py
@@ -0,0 +1,102 @@
from __future__ import annotations

import os
from glob import glob
from pathlib import Path

import matlab.engine as matlab
import yaml
from matlab.engine import MatlabEngine

LOCATION_OF_THIS_FILE = os.path.dirname(os.path.abspath(__file__))
# Additional options for running matlab on the command-line
MATLAB_OPTS_LIST = ["-nodisplay", "-nodesktop", "-nosplash", "-r"]
MATLAB_STARTUP_OPTS = " ".join(MATLAB_OPTS_LIST)
# Paths to matlab functions not in LOCATION_OF_THIS_FILE
MATLAB_EXTRA_PATHS = [
    os.path.abspath(LOCATION_OF_THIS_FILE + "/bscan"),
    os.path.abspath(LOCATION_OF_THIS_FILE + "/matlab"),
]


def run_bscan(
    test_directory: Path | str, input_filename: Path | str, engine: MatlabEngine
) -> None:
    """Wrapper for running the run_bscan MATLAB function in the MATLAB engine provided.

    MatlabEngine cannot parse Path objects, so file and directory paths are cast to strings when calling.

    The bscan/ and matlab/ directories are assumed to already be on the
    MATLABPATH of the engine instance, so that run_bscan and the supporting
    MATLAB files can be called.
    """
    # function [] = run_bscan(test_directory, input_filename)
    engine.run_bscan(str(test_directory), str(input_filename), nargout=0)


def start_MatlabEngine_with_extra_paths(
    working_directory: str | Path | None = None,
) -> MatlabEngine:
    """Starts a new MatlabEngine and adds the bscan/ and matlab/ folders to its path, which are required to be in scope when regenerating the input data.

    :param working_directory: The working directory to start the MatlabEngine in. Should be an absolute path. Defaults to the working directory of the currently executing script if not passed.
    :returns: MatlabEngine instance with the additional bscan/ and matlab/ files on the MATLABPATH.
    """
    engine = matlab.start_matlab(MATLAB_STARTUP_OPTS)
    # Change to requested working directory if provided
    if working_directory:
        engine.cd(str(working_directory))
    # Append tdms scripts and functions to MATLABPATH
    for path in MATLAB_EXTRA_PATHS:
        engine.addpath(path)
    return engine


def generate_test_input(
    config_filepath: Path | str, engine: MatlabEngine | None = None
) -> None:
    """(Re)generates the input data (.mat files) described by the config file, using the MATLAB session provided.

    This function is equivalent to running the run_{pstd,fdtd}_bscan.m scripts on the test corresponding to the config file in question.

    :param config_filepath: The path to the config file containing information about this system test
    :param engine: The MATLAB session to run the run_bscan function within. A session will be created and quit() if one is not provided.
    """
    with open(config_filepath, "r") as file:
        config_data = yaml.safe_load(file)

    # ID of the test we are generating input data for
    test_id = config_data["test_id"]
    # Absolute path to the directory into which the input data should be placed
    test_dir = Path(LOCATION_OF_THIS_FILE, "arc_" + test_id)
    # Ensure that the directory to place the output into exists, or create it otherwise
    if not test_dir.exists():
        print(f"The Path {test_dir} does not exist - creating now")
        os.mkdir(test_dir)
    elif not test_dir.is_dir():
        raise RuntimeError(f"{test_dir} is not a directory!")
    # else: the directory already exists, we don't need to do anything

    # Extract necessary input data generation information
    generation_info = config_data["input_generation"]
    # Fetch the location of the input file that generates the binary .mat input
    input_file = Path(LOCATION_OF_THIS_FILE, generation_info["input_file"])
    if not input_file.exists():
        raise RuntimeError(f"{input_file} does not exist")
    # Fetch the spatial obstacles (not currently passed on to run_bscan)
    obstacles = generation_info["spatial_obstacles"]

    # Determine if we need to create our own MATLAB session
    # Explicit instance check since MatlabEngine may not have implicit casts/ interpretations
    engine_provided = isinstance(engine, MatlabEngine)
    if not engine_provided:
        # Start a new Matlab engine operating in the test directory
        engine = start_MatlabEngine_with_extra_paths(working_directory=test_dir)

    try:
        run_bscan(test_dir, input_file, engine)
    finally:
        # Quit our temporary MATLAB session, if we started one, even if run_bscan failed
        if not engine_provided:
            engine.quit()
    # Clean up auxiliary .mat files that are placed into this directory
    for aux_mat in sorted(glob(LOCATION_OF_THIS_FILE + "/*.mat")):
        os.remove(aux_mat)