Multiple Data Files of The Same Type Will Only Have 1 Name in Assay Conversion #509
Comments
Should fix issue ISA-tools#509.
@ptth222 Thank you for the PR. The following would be the expected way to represent more than one output of a 'data acquisition' event.
What the PR does is to generate the following output:
This is not allowed and would require changing the isatab load component. We now need to check the initial behavior and why only the last output file is kept. This will require adding new tests to the testing suite and possibly amending the parser.
I made new commits to #510 to address what you said. I hope it is better. I also discovered another issue while making these changes. There are some inconsistencies between validation and the ProcessSequenceFactory that parses things. There is a defaults.py file in the isatab module that has a list of acceptable column headers, and these are imported for use in the ProcessSequenceFactory, but not in the validation. The validation often uses its own sets of column headers for each rule instead of pulling from defaults or some other unified source. I discovered this because the column name "Derived Data File" was causing a validation error that wouldn't let the conversion continue. This was in the load_table_checks function in the rules_40xx.py file, and I added "Derived Data File" to the list in the function. It might be worthwhile to unify the code so that it pulls column headers from one place.
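To make the unification idea concrete, here is a minimal sketch, assuming a single shared set of data file labels that both the parser and the validator read from. All names below (`DATA_FILE_LABELS`, `parse_headers`, `validate_headers`, `OTHER_ALLOWED`) are hypothetical and are not the actual contents of defaults.py or rules_40xx.py.

```python
# Hypothetical sketch: these names are illustrative, not the isatools API.

# One shared definition that both parsing and validation would import.
DATA_FILE_LABELS = frozenset({
    "Raw Data File",
    "Derived Data File",
})

# Other headers accepted in this toy example (assumed for illustration).
OTHER_ALLOWED = frozenset({"Sample Name", "Protocol REF"})


def parse_headers(headers):
    """Parser side: return the columns treated as data file columns."""
    return [h for h in headers if h in DATA_FILE_LABELS]


def validate_headers(headers):
    """Validator side: return headers that are not recognised."""
    return [
        h for h in headers
        if h not in DATA_FILE_LABELS and h not in OTHER_ALLOWED
    ]


headers = ["Sample Name", "Protocol REF", "Raw Data File", "Derived Data File"]
print(parse_headers(headers))     # data file columns seen by the parser
print(validate_headers(headers))  # empty: the validator accepts the same labels
```

Because both functions read from the same set, adding a label such as "Derived Data File" in one place makes it valid for parsing and validation at once.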
Testing that the changes fix what was raised in ISA-tools#509.
If you try to create 2 files of the same type in the same assay in a JSON to Tab conversion, only the last file will appear as the name in both columns. For example, if you have two Raw Data Files, 'data_file1' and 'data_file2', only 'data_file2' will appear in both Raw Data File columns (assuming data_file2 is later in the process sequence).
Example to reproduce:
The above example modifies the "BII-I-1" example. I basically delete the transcriptome processSequence and replace it with a simpler one.
The issue appears to be in the isatools\isatab\dump\write.py file, in the write_assay_table_files function. It is similar to issue #500 where multiple data file type column names are not being tracked. I have adjusted the code so it will track the names and the file names appear as expected. I created a PR, #510.
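As a rough illustration of this kind of failure, the sketch below shows how keying cell values by column label alone lets a later file overwrite an earlier one, and how tracking each column occurrence separately avoids it. This is a simplified, hypothetical model (`build_row_buggy` and `build_row_tracked` are made-up names); it is not the code in write_assay_table_files or the change in #510.

```python
# Hypothetical illustration only; not the code in isatools/isatab/dump/write.py.

def build_row_buggy(data_files):
    """Store values by column label alone, so same-type files collide."""
    row = {}
    for label, filename in data_files:
        # A later file with the same label overwrites the earlier one.
        row[label] = filename
    # One cell per file, but every lookup uses the shared label.
    return [row[label] for label, _ in data_files]


def build_row_tracked(data_files):
    """Store values by (position, label) so each column keeps its own name."""
    row = {}
    for index, (label, filename) in enumerate(data_files):
        row[(index, label)] = filename
    return [row[(index, label)] for index, (label, _) in enumerate(data_files)]


files = [("Raw Data File", "data_file1"), ("Raw Data File", "data_file2")]
print(build_row_buggy(files))    # ['data_file2', 'data_file2']
print(build_row_tracked(files))  # ['data_file1', 'data_file2']
```

The first output matches the symptom described above: both Raw Data File columns show 'data_file2'.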