-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track input data generation process in the repo #236
Conversation
I like this more than #229, indeed. Thanks for the (moderately) drastic reworking ❤️ . You might be able to avoid some of your python classes altogether. You can take advantage of python's implicit tuple (un)packing and maybe parameters as a vanilla python dictionary. |
The bscan/run_bscan.m file contains the matlab function which generates the input data. Regrettably, we need to specify particular inputs to this script for each test, which requires us to translate the argument values as read from the config.yaml file into a long string of values in the correct order, which can in turn be called from matlab. | ||
|
||
BScanArguments is essentially a glorified dictionary, it's members sharing the names of the input arguments to run_bscan. It's create_bscan_argument can be used to convert the values that need to be passed into a string of the form: | ||
run_bscan(input_arguments_in_the_correct_order). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does black
allow this? I'm expecting a forced wrap at 88.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, as I said in person: no big structural problems with this, but I do have niggly comments.
The main ones:
- Can haz simpler python pls?
Simple is better than complex
- I think tests/system/README.md should be integrated into doc/developers.md.
- I'm pretty sure
sys.path.insert
followed by furtherimport
s is bad python style ([citation needed]).
# Run the test workflow, for this test | ||
workflow(test_id) | ||
# End of system test | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole file is a lot of WET from test_system.test_system
and test_system.workflow
👎 . Why is it duplicated?
If we really need a second function, I'd rename it and merge the comments into the docstring. But you don't need to repeat your code. Just import the functionality from test_system
and wrap the extra setup (or whatever).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I anticipated that test_system
and test_regen
will diverge very quickly as more tests (requiring different input methods) are added, which they do: final version of test_regen
is here. I'd argue that the workflows are sufficiently divergent as to warrant two separate files, but there are some other supporting reasons:
test_regen
's workflow is already different totest_system
's. Additionally, thepytest
parametrisation also changes to use theconfig
files rather than relying on the manually-typedZenodo
listworkflow
exists to provide some distinction between the fetch-request for the data and the actual steps in the test. Granted, not the most informative name, but I would want to separate it from the setup steps regardless.test_system.py
(and it's supporting classes) will be made obsolete if we ever convert the system tests entirely into Python or get MATLAB to behave on runners. In which case, I'd rather havetest_regen
working stand-alone so we don't get surprised when other files get deleted. (Also applies to a couple of the other classes in the files below)
Although I'm aware that I'm very much grounding my "justification" in the future. Importing from test_system
would've been nicer to look at, at this stage of the chain.
"""One run, or execution, of the TDMS executable. That is, one call to | ||
|
||
tdms [OPTIONS] [input_file] [gridfile] [output_file] | ||
|
||
that is required as part of a single system test. | ||
The run() command executes the above call to TDMS | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need another whole class to wrap this? I'm lost as to why what we had before is insufficient.
We now have TDMSRun
wrapping utils.run_tdms
wrapping a relatively simple subprocess.Popen
. Can we stick with utils.run_tdms
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd much rather reduce the wrapping by having what is presently the content of utils.run_tdms
as part of this class' method. We only use utils.run_tdms
in test_system
and test_regen
(technically in the classes they rely on), one of which we want to get rid of eventually.
Besides run_tdms
, utils.py
itself only really contains:
- The
H5File
class relative_mean_squared_difference
function (which is only called once, within theH5File
class)work_in_zipped_dir
, which onlytest_system
usesdownload_data
, whichtest_regen
&test_system
use.
So I'd rather breakutils
into a separate files:H5File
.py
file,- Move
download_data
intotest_regen
(ortest_system
?) - Move the others into
test_system
so they disappear with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid, I'm not convinced. But you can at least make this a dataclasses.dataclass
and use minimal post_init to check the file exists.
I would probably err on the side of having Path
s rather than Union[Path, str]
. Just because less code is better and we're the only people using this.
(Tangential: I modern python allows, and maybe even prefers, the |
or... flibble: Path | str
).
@dataclass | ||
class TDMSRunAndReference: | ||
tdms_run: TDMSRun # The object that handles the tdms call itself | ||
ref_data: str # The name of the reference data within the .zip archive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't I have a plain python tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can, but it makes this typedef look nasty and on this line, I'd rather write
run.tdms_run.run()
# OK, maybe I should name things
# run_and_ref.tdms_run.run()
# instead
over
run[0].run()
Because the latter looks like I've forgotten to loop through an index of runs, which I'm not doing here.
class TDMSSystemTest: | ||
"""Instance of one tdms system test. This object controls all the tdms executations that required for the system test arc_{test_id} to run, and handles the setup and tear-down of each run. | ||
|
||
The runs themselves are performed by TDMSRun instances. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐢
🐢
🐢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still uneasy that there are now three (four?) python classes to wrap: calling MATLAB to generate an input array, and calling tdms.
... sorry for the slow review 😔
…e skipped by pytest on CI
Co-authored-by: Sam Cunliffe <[email protected]>
- Move content of tdms/tests/system/README.md into doc/developers.md to avoid duplication - Remove unnecessary spaces in run_bscan script - Type-hinting guards in BScanArguments (cast to str to account for Path inputs) - Remove _stop_engine method and just write the .quit() command explicitly - Proper imports with __init__.py's in test_regen.py
Co-authored-by: Sam Cunliffe <[email protected]>
7967bbd
to
5aa7cd2
Compare
💪 |
Have rebased this guy onto
Have included actual links here since the old comments will be out of date with the combination of a rebase + conflict fix + class -> function reworking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for all of the back and forth. And sorry again for the epic review.
Also, I managed to avoid conflicts in the merge of #265. ... I leave you the joy of clicking merge. |
Fix typo in refactoring
Closes #240 if the branch chain below is completely merged into
main
|Part of a series of PRs that progressively adds the system tests to the new workflow piece by piece:
main
<- #236 <- #241 <- #242 <- #243 <- #244Ongoing conversations:
test_regen
importing fromtest_system
dataclass
Changes
Introduces the new workflow (for a system test) of:
.mat
filestdms
executable.mat
outputs to the reference material from ZenodoThis is just a PoC for now, so only
arc_01
has the updated workflow (and associated files) currently.tdms/tests/system/data/input_generation
directory has been created. Thesrc/matlab
folder has been moved to be thematlab/
subdirectory of this new directory, since these functions are only used in the generation of the input files.tdms/tests/system
to outline the system test workflowgenerate_test_input.py
is a python file that defines the objects which will allow the system tests (written in Python) to call the necessaryMATLAB
scripts/functions and regenerate the input data.There are also two futher files,
config_01.yaml
andinput_file_01.m
, whose purpose is explained in the README. Theconfig_01.yaml
files can also double-up as the config files thattest_system.py
already reads from when working in the zipped directories.Testing
Sadly, #239 and #238 show that it is impossible for us to regenerate the
.mat
inputs on GitHub runners all the while MATLAB files are required. As such, we are forced to make use of the currenttest_system.py
andread_config.py
supporting files.However,
test_regen.py
and the supportingtdms_testing_class.py
are available in the repository to be run locally on a machine that has the requirements installed. These can be run locally before pushing whenever we need to alter the format of the inputs totdms
themselves.As we remove the dependency on MATLAB (#70), one of the task might be to translate the input generation into Python anyway. In which case,
test_system.py
andread_config.py
would be made redundant.