Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect bounds checking on conductive_aux.rho_* #350

Open
prmunro opened this issue Jan 16, 2024 · 22 comments
Open

Incorrect bounds checking on conductive_aux.rho_* #350

prmunro opened this issue Jan 16, 2024 · 22 comments
Assignees
Labels
bug Something isn't working

Comments

@prmunro
Copy link
Collaborator

prmunro commented Jan 16, 2024

Describe the bug
When the multilayer variable in the input file is not set to [], there are two errors, one within iteratefdtd_matrix.m and one within the TDMS executeable. The first error (iteratefdtd_matrix) is due to a check to ensure the entries in multilayer increase monotomically, however, this test was specified incorrectly. The second error assumes that resulting data structures (conductive_aux.rho_x and conductive_aux.rho_y) should be vectors, which is incorrect.

To Reproduce

  1. Run gen_input_file.m in the attached, path to TDMS will need updating.
  2. This will generate an error in iteratefdtd_matrix.m, which may be fixed using the following change:

line 299 of iteratefdtd_matrix.m, currently has:

elseif all(sort(multilayer)==multilayer)

whereas is should be

elseif ~all(sort(multilayer)==multilayer)
  1. With the above correction made, tdms can be run, which will generate this error:
terminate called after throwing an instance of 'std::runtime_error'
  what():  Incorrect dimension conductive_aux.rho_x. Required 1D

Inspecting the mat file produced by iteratefdtd_matrix reveals:

>> dat.conductive_aux

ans = 

  struct with fields:

    rho_x: [41x61 double]
    rho_y: [41x61 double]
    rho_z: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... ]

meaning that rho_x and rho_y are not vectors, which is the expected behavior. Looking at the original version of the PSTD/FDTD code, we see the following error checking:


  /*Get conductive_aux */

  if(mxIsStruct(prhs[input_counter])){
    num_fields = mxGetNumberOfFields(prhs[input_counter]);
    if(num_fields != 3){
      sprintf(message_buffer, "conductive_aux should have 3 members, it has %d",num_fields);
      mexfprintf(message_buffer);
    }
    for(int i=0;i<3;i++){
      element = mxGetField( (mxArray *)prhs[input_counter], 0, conductive_aux_elements[i]);
      ndims = mxGetNumberOfDimensions(element);
      if( ndims == 2 ){
	dimptr_out = mxGetDimensions(element);
	if( !(dimptr_out[0] == 1 ||dimptr_out[0] == 0) ){ 
	  //sprintf(message_buffer, "Incorrect dimension on conductive_aux.%s",conductive_aux_elements[i]);
	  //mexfprintf(message_buffer);
	}
      }
      if(!strcmp(conductive_aux_elements[i],"rho_x")){
	rho_x = mxGetPr(element);
      }
      else if(!strcmp(conductive_aux_elements[i],"rho_y")){
	rho_y = mxGetPr(element);
      }
      else if(!strcmp(conductive_aux_elements[i],"rho_z")){
	rho_z = mxGetPr(element);
      }
      else{
	sprintf(message_buffer, "element conductive_aux.%s not handled",conductive_aux_elements[i]);
	mexfprintf(message_buffer);
      }
    }
  }
  else{
    sprintf(message_buffer, "Argument %d was expected to be a structure (conductive_aux)",input_counter);
    mexfprintf(message_buffer);
  }
  input_counter++;
  /*Get conductive_aux */
************************************************************** 

which does not require elements to be vectors. I am happy to not check the dimension of these variables since they should be set correctly in iteratefdtd_matrix

input_file_ml.zip

This bug is platform independent.

@prmunro prmunro added the bug Something isn't working label Jan 16, 2024
@willGraham01
Copy link
Collaborator

Hi Peter - Sam and I are taking a look at this today.

We've managed to get TDMS to run without an error (and without editing the C++/executable code) after applying your suggested change to iteratefdtd_matrix: the following output file is produced. @samcunliffe & I of course don't know if this is the output you'd expect, so please let us know if that output is wrong.

output-on-main-with-ifdtd-patch.zip

This output the result of running the "development version" of TDMS - it includes the partial switch from the MATLAB reader to the HDF5 reader. It seems that these changes (which aren't in version 1.0.1) at least prevent the segmentation fault (likely because we're no longer relying on MATLAB-allocated memory) - but as I mentioned we don't know if it actually fixes the bug or just prevents the seg-fault and produces an incorrect output!

@samcunliffe
Copy link
Member

Hey @prmunro,

We travelled back in time to the very first version of the code you gave us in Jan 2022, and we think the above output file is consistent with what was being done back then.

If the above file is wrong, could you give us an example pair of input/output files that you are confident in the output's correctness, please?

@prmunro
Copy link
Collaborator Author

prmunro commented Jan 22, 2024

Hi @samcunliffe , sorry for the delay in responding. I'll have a look at the development version, however, the main problem with v 1.0.1, or at least the initial problem, is that it is generating an error when dat.conductive_aux.rho_x and dat.conductive_aux.rho_y are matrices. Are you able to see in v 1.0.1 that this is occuring?

@willGraham01
Copy link
Collaborator

Hi @prmunro,

is that it is generating an error when dat.conductive_aux.rho_x and dat.conductive_aux.rho_y are matrices. Are you able to see in v 1.0.1 that this is occuring?

Yes we are able to see this occurring in version 1.0.1 on our machines too. We also have an idea as to why this bug/error occurs, given the fact that the "development" version is working and 1.0.1 is not:

rho_x is read into a "flattened vector", but it looks like the TDMS executable is setup so as to expect the input to have already been flattened by MATLAB. EG If rho_x is 5-by-10, tdms is expecting to read a 50-by-1 input from the file (which incidentally is why rho_x & rho_y being row vectors has not thrown the error). This is assuming that the output from the development version (attached above) is correct - otherwise, it is likely a slightly more complex error in how the MATLAB memory is being read.

@prmunro
Copy link
Collaborator Author

prmunro commented Jan 22, 2024

Hi @willGraham01 , my understanding is that TDMS should not be expecting a 50-by-1 vector. I think in the first version of the code, rho_x could be, for exampl, 5-by-10, and TDMS was also expecting that.

Please tell me if I'm missing your point above :-)

@willGraham01
Copy link
Collaborator

willGraham01 commented Jan 22, 2024

I think in the first version of the code, rho_x could be, for exampl, 5-by-10, and TDMS was also expecting that.

Hi Peter - Sam & I came to this conclusion as well; apologies my explanation above wasn't super clear.

The short version is: we (incorrectly) added a check to rho_x/y that they should be vectors, as we were reorganising the code. We didn't catch this because there is no test data on CI that tries to use these values as matrices. My hypothesis: removing this "check" (which is what the development version effectively does) once again lets TDMS run as it did before. Removing the check in version 1.0.1's code I also think would do the same thing (but haven't tested it).

The long version:

TDMS originally read the (say 5-by-10) matrix into a pointer (float* to be precise), and then manually worked out the appropriate indices to respect the "shape" that's lost when using a single-index pointer. So in the code, rho_x is only ever accessed as rho_x[array_index], where array_index is something like K_tot * k + j mapping to the original (k,j) index in the 5-by-10 structure.

Flash forward to our first steps refactoring / reorganisation of the code. rho_x is still read into a flattened vector, but it now contains a hard check for the input data to be 1D (see here for the offending function and its "error check"). This has slipped under the radar because none of the tests give rho_x as a 2D array. However, rho_x is still indexed in the same way (array_ind = K_tot * k + j, etc) - so I suspect "removing" this error check will result in the original behaviour coming back (though haven't tried this myself). This is why v1.0.1 throws the error.

Now on the "development version" of TDMS, this bug is not seen because we use the HDF5 reader functions. In particular, we replaced the individual "read vector/matrix/tensor" etc methods with a general "read input data and convert to a flattened array". As such, whatever shape rho_x comes in as (vector or matrix) no longer matters from the point of view of the reader, and thus TDMS runs and produces an output. Again, rho_x is still being indexed in the same way, it's just the reader that has changed but the output should be consistent with the original.

@prmunro
Copy link
Collaborator Author

prmunro commented Feb 5, 2024

Thanks for the above, sorry it's taking me ages to get to this.

Can you please remind me how to access the development version of TDMS?

@willGraham01
Copy link
Collaborator

Hi Peter,

The development version of TDMS is the latest version on main. If you open a terminal and run the following;

git clone [email protected]:UCL/TDMS.git  # Don't need these two commands if you've already 
cd TDMS                                # got the TDMS repository cloned from GitHub, just
                                       # cd into the appropriate folder
git checkout main
git pull

you'll have the development version of the code, which you can then compile with the usual instructions on the website.

You can also download the code for the development version as a zip file from github, but the clone method is recommended if you've used the GitHub clone before.

@prmunro
Copy link
Collaborator Author

prmunro commented Feb 27, 2024

Hi @willGraham01 , sorry it's taking me a long time to get to this. I just noticed that test arc_15 tests this functionality, so I'm confused as to how the test was passing. Could you please check that this test was indeed passing on the stable version of TDMS? This test can be used to check the correctness of the development version too.

By the way, do we test for the creation of the .mat input file? Just asking as when I first noticed this problem, iteratefdtd_matrix.m would generate an error when multilayer was non-empty. So test arc_15 should also generate an error...unless I'm missing something!

Peter

@willGraham01
Copy link
Collaborator

Hi @prmunro,

I just noticed that test arc_15 tests this functionality, so I'm confused as to how the test was passing.

It looks like arc_15 (and arc_14 for that matter, if it exists) aren't tracked in the GitHub repository, so aren't being run when new versions are uploaded. See tests run on development version and tests run on stable version. This test data is also not on Zenodo. This would explain why we didn't catch this sooner.

By the way, do we test for the creation of the .mat input file?

We do when running the system tests (those linked to above). The first stage of these tests is to take the pstd/fdtd_input_file_XX.m and run it through the run_bscan.m script, to generate the input .mat file. Then that file gets passed to TDMS and the output is checked against the Zenodo data (expected output).

If you have arc_15 to hand (the input file, the particular version of run_bscan that it gets passed through, the expected .mat file that iteratefdtd_matrix should produce, and the TDMS output file) then I can try out running these on the stable and development versions of TDMS.

@prmunro
Copy link
Collaborator Author

prmunro commented Feb 29, 2024

Hi @willGraham01 , thanks for this. Here is a onedrive link to arc_14 and arc_15. I hope you can follow the logic of run_pstd_bscan.m. It generates the input .mat files (if recalc is set to 1) and then performs a test against analytically calculated fields. This test code uses an input .m file that is not compatible with TDMS, i.e., it's using my original file format. I know you have a way to deal with this, but let me know if that's a problem.

https://liveuclac-my.sharepoint.com/:f:/g/personal/rmaprtm_ucl_ac_uk/EoNTGsRrw4xHmYPA2KS7G-ABcnZzDM-RyCBplq5DTwESrA?email=william.graham%40ucl.ac.uk&e=0pZa7m

Cheers - Peter

@willGraham01
Copy link
Collaborator

Hi @prmunro -

I've just run these examples through the development version of TDMS. A mixture of good news and bad news with these tests as detailed below. (Applies to both arc_14 and arc_15).

For both tdms version 1.0.1 & the development version, the only way to generate the in/pstd_fs.mat files correctly is to set use_pstd = 0 and compactsource = 0. Doing so, then running the bscan script will produce an input file that is identical to the one provided in the zip files above. However passing this input into tdms results in use of the fdtd solver (since use_pstd = 0) which produces an incorrect out/pstd_fs.mat. Interestingly enough though, if I generate in/pstd_fs.mat as described above, then manually change use_pstd to 1 in the generated in/pstd_fs.mat, then run tdms, the resulting out/pstd_fs.mat seems to be within good (numerical) agreement of the output provided.

Attempting to generate in/pstd_fs.mat as I would expect (with use_pstd = 1, which requires compactsource = 1) results in in/pstd_fs.mat being incorrect (again compared to the provided file in the zip folder). The Ksource term in particular is generated incorrectly in this case, which is of course the one affected by the compactsource condition in this case.

So a couple of things here:

  • Are arc_14 and arc_15 meant to be PSTD solver simulations?
    • If so, then it looks like the development version of tdms is producing the correct result given the input file and reference outputs you've provided.
    • I haven't checked this for the 1.0.1 version, but I imagine they will fail for the reason you described in the original bug.
  • If this is meant to be a PSTD solver simulation, we have to be using the compact source condition. Is it possible that - because these input files are from an old version of tdms - that the zip file data is affected by the dz/2 offset bug (link)?

@prmunro
Copy link
Collaborator Author

prmunro commented Mar 4, 2024

HI @willGraham01 , thanks for looking in to this. The PSTD algorithm should be used, so we should have use_pstd = 1 and compactsource = 1. However, this also means we need to set hfname = ''. There is also another thing to be mindful of, in particular, efield_gauss contains the following lines:

dz = 1300e-9/4;
vertices = [X(:) Y(:) Z(:)-dz/2];

these lines should not be there for TDMS, as TDMS takes care of the -dz/2 term.

All of this makes sense in light of your comments. I'm unsure about the best way to proceed. Should I generate input files that should work with TDMS and provide the output from my original version, albeit generated using an old .m input format?

Does that make sense?

ps: You're very good at being able to work out what's going on here given the intricacy of the code and that you are focused on other projects!

Peter

@willGraham01
Copy link
Collaborator

willGraham01 commented Mar 5, 2024

Hi @prmunro,

Should I generate input files that should work with TDMS and provide the output from my original version, albeit generated using an old .m input format?

Yes I think this is the way to go if we want to debug the original "rho_x and rho_y are not vectors". If you can produce an input .mat file (regardless of how it's generated) alongside the corresponding output .mat file, we can use this as a regression test. You'll have to let me know which of pstd or fdtd you're using (I assume pstd) so I can tell the more recent versions of TDMS which method to use of course.

Doing this, we should observe that:

  • Version 1.0.1 of TDMS throws the error reported above,
  • The development version of TDMS does not throw the error above, and is consistent with the provided .mat file. (This is assuming my earlier investigations didn't miss something else).

From there, we can then work towards releasing a new TDMS version with a fix from the development branch, or identify the root cause of the bug!

There is also another thing to be mindful of, in particular, efield_gauss contains the following lines:

dz = 1300e-9/4;
vertices = [X(:) Y(:) Z(:)-dz/2];

I'd missed this - thanks for pointing it out. Looks like MATLAB was picking up this version of efield_gauss rather than the version we have saved in the repository (which would explain why "turning off" pstd gives the correct result!). Not that it affects our plan above, but I'll double-check that accounting for this inconsistency means that the input generation is working as intended. Update: Making appropriate changes results in correct input generation with use_pstd=1 and compactsource=1.

@prmunro
Copy link
Collaborator Author

prmunro commented Mar 7, 2024

Hi @willGraham01 , I have created a TDMS compatible test, which could can download here:

https://liveuclac-my.sharepoint.com/:u:/g/personal/rmaprtm_ucl_ac_uk/EaD52JmjMkVDtFZyKQdU_n8BDKAutazedYsJYmuctJt0xw?email=william.graham%40ucl.ac.uk&e=a8SAwx

Running run_pstd_bscan.mwill perform a simulation and compare it with a field calculated analytically. This will tell us if the code is working.

Cheers - Peter

@willGraham01
Copy link
Collaborator

Hi @prmunro - apologies, the .zip file you shared is missing the efield_gauss_check function, so the script encounters an error after running the tdms executable (the in/pstd_fs.mat input file is generated correctly of course and matches the one provided). Could you share the efield_gauss_check function?

Regardless;

  • The development version of tdms happily runs to completion with the input .mat file - so that's promising. I can share the output .mat file too if you would like it (though I don't know if it's correct or not without efield_gauss_check).
  • v1.0.1 of tdms indeed throws original error(s), as expected.

(Also the out/pstd_fs.mat file that is in the arc_15_tdms_format.zip folder is 0 bytes in size - it was possibly corrupted when creating the zip folder?)

@prmunro
Copy link
Collaborator Author

prmunro commented Mar 12, 2024

Hi @willGraham01 , I have generated another zip file, not with efield_gauss_check.m included. I also removed out/pstd_fs.mat as this should be generated by running_pstd_bscan.m and then tdms itself. It was empty because I ran tdms without letting it finish.

https://liveuclac-my.sharepoint.com/:u:/g/personal/rmaprtm_ucl_ac_uk/EaD52JmjMkVDtFZyKQdU_n8BDKAutazedYsJYmuctJt0xw?email=william.graham%40ucl.ac.uk&e=cV8y9h

Can you please try running it again?

Peter

@willGraham01
Copy link
Collaborator

willGraham01 commented Mar 13, 2024

Hi @prmunro,

Thanks - script is now running and the development TDMS version reports the error values at the bottom of this post. (If I'm reading the script correctly, these are %-relative errors being reported, but correct me if that's wrong).

With this in mind, it looks like the fields are very much out of agreement with the reference data (relative error 1 translates to errors of the same order as the absolute values themselves?). The fact that the relative error is so close to 1 makes me wonder if we're still hitting the dx/2, 2*field magnitude bug somewhere (100% error would result from one field being double the other), but I don't think we are as the analytic field is computed from the x_i and z_i coordinates that the TDMS run outputs.

In terms of absolute values; the field values themselves have a maximum (absolute) difference around 2, average difference of the order 0.1, and in places a minimum difference of 10^-7 (essentially identical). This is when compared across the particular plane we are checking - which from reading the script I inferred was the y==0 plane but correct me if I am wrong.

Next Steps

I think the next step is to take the input file in/pstd_fd.mat that this script produces, and attempt to use it to debug the current 1.0.1 version.

  • It should be a fairly easy adaptation to prevent the C++ error that's getting thrown.
  • We can then see if, with this change, TDMS is in agreement with the analytic result. If it is, then there's likely a bug in the development version. If we make the expected necessary changes and we get a spurious result, we'll have to look further into the git history and do some in-depth debugging.

Error values reported for development version of TDMS.

err_fsample = 0.1162

err_csample (3 by 16 array)

  • Min value is 0.1131
  • Max 0.3579
  • Avg 0.1948

err_surf (6 by 16 array)

  • Min 0.0929
  • Max 0.5027
  • Avg 0.2220

erran (1 by 3 array)
0.977754538612314 0.977360297407077 0.952807222709342

errnum (1 by 6 array, split onto two lines for ease)
0.0835853336901602 0.0854787661542632 0.156599207597277
0.131404750763419 0.132037636655904 0.146028239606973

@prmunro
Copy link
Collaborator Author

prmunro commented Apr 5, 2024

Hi @willGraham01 , I found some time over Easter to look at this. I found that I had made a mistake in the analysis script, and that the analytic comparison was incorrect. I have included the expected output of erran and errnum in run_pstd_bscan.m (which are absolute and not percentages).

Can you see if the development version of TDMS returns the same values?

update_mfiles.zip

@willGraham01
Copy link
Collaborator

willGraham01 commented Apr 5, 2024

Hi @prmunro, the development version of TDMS still returns the values I quoted above:

erran = 
0.977754538612314	0.977360297407077	0.952807222709342

errnum = 
0.0835853336901602	0.0854787661542632	0.156599207597277
0.131404750763419	0.132037636655904	0.146028239606973

even when using the new version of efield_guass_ml.m and run_pstd_bscan.m provided in your comment above. These are different to the values reported in the comments of the new script:

%>> erran
%
%erran =
%
%    0.0798    0.0809    0.3320
%
%>> errnum
%
%errnum =
%
%    0.0535    0.0536    0.3155    0.3414    0.3409    0.0619

(I noticed the zip file you've attached doesn't seem to change the refdata.mat file, which is what the reference data is loaded from after tdms runs. If that file was also meant to have been updated then things might be different, though I'll need such an updated version to confirm this)

@prmunro
Copy link
Collaborator Author

prmunro commented Apr 5, 2024

Thanks @willGraham01 , the erran relies on reference data calculated analytically within run_pstd_bscan.m, which now calls efield_guass_ml.m. So I am a bit suspicious that the development version is returning identical values for errnum, compared to when we were calling efield_gauss.m.

Would you mind double checking that efield_gauss_ml.m is being called?

Thanks

@willGraham01
Copy link
Collaborator

Hi @prmunro, I can confirm efield_guass_ml.m is being called and used to create the reference data on lines 82-89 in run_pstd_bscan.m:

Ean = efield_gauss_ml(X,Y,Z);
erran = zeros(1,3);
fields = {'Ex_i','Ey_i','Ez_i','Hx_i','Hy_i','Hz_i'};
for i=1:3
    com = sprintf('testfield=dat_test.%s;',fields{i});
    eval(com);
    erran(i) = sum(sqrt(abs(squeeze(Ean{i})-squeeze(testfield(indsx,find(dat_test.y_i==0),indsz))).^2))/sum(sqrt(abs(squeeze(Ean{i})).^2));
end

The numbers remain the same as when efield_guass_check was being used. (Using Ean = efield_gauss(X,Y,Z); throws an error due to array sizes).


I have pushed the 350-investigation branch branch which contains the development version of TDMS (+ the multilayer fix above) to GitHub.
You should be able to pull this branch down to your machine and build the development version of TDMS, if you want to run things yourself to see what's going on:

$ cd /path/to/TDMS/repo
$ pwd
/path/to/TDMS/repo
$ ls -a
.git doc examples tdms README.md <other top-level files>
$ git checkout main
$ git pull
$ git switch 350-investigation

This will fetch the development branch from github and checkout your local repository to that state (you can use git checkout v1.0.1 to go back to version 1.0.1, for example). From there, the build instructions are the same as for the current version of TDMS, with the addition of needing hdf5 before you run the cmake instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants