Fix parsing of experiment IDs and names #38

irisdianauy · 2022-03-18T01:48:18Z

The previous parser uses , as the field separator, which breaks when an experiment name contains a comma, as reported here. This fix replaces the field separator with a regular expression (FPAT) that describes the desired field patterns. A , is now accepted only inside the second and last element of the parsed string, the experiment name.

As far as tested, this works with gawk (the default awk on codon for fg_atlas and fg_atlas_sc), but not with other awk implementations such as mawk or the original awk, because FPAT is a gawk extension.
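To illustrate the failure described above, here is a minimal sketch of the old comma-separated parsing applied to a single record. The accession E-MTAB-0000, the description text, and the function name format_with_comma_fs are made up for illustration; only the awk and sed calls are taken from the script.

```shell
# Hypothetical CSV record, shaped like jq's @csv output for [accession, description].
record='"E-MTAB-0000","Response to heat, cold and drought"'

# Old approach: split on every comma, then strip the quotes.
# The description is cut at its first comma because $2 ends there.
format_with_comma_fs() {
    awk -v FS="," '{ printf "- [%s](https://www.ebi.ac.uk/gxa/experiments/%s)\n", $2, $1 }' | sed 's/"//g'
}

printf '%s\n' "$record" | format_with_comma_fs
```

The link text comes out as "Response to heat", so everything after the first comma in the experiment name is lost.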

pinin4fjords

I can confirm that this does not work on Mac. I would suggest a much simpler solution: switch the jq filter to emit TSV and have Awk split fields on tabs. Could you confirm that the suggestion works, and do the same for the other one?
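The TSV suggestion can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged change: the function name format_tsv, the accession E-MTAB-0000, and the sample JSON payload are hypothetical, and the jq step is skipped when jq is not installed.

```shell
# Format "accession<TAB>description" lines as Markdown links.
# Tabs are the only separator, so commas in the description are left alone.
format_tsv() {
    awk -F'\t' '{ printf "- [%s](https://www.ebi.ac.uk/gxa/experiments/%s)\n", $2, $1 }'
}

# Hypothetical record, shaped like jq's @tsv output for [accession, description].
printf 'E-MTAB-0000\tResponse to heat, cold and drought\n' | format_tsv

# With jq available, the same kind of record can be produced from JSON.
# The payload below is made up; the real script reads it from the experiments endpoint.
if command -v jq >/dev/null 2>&1; then
    echo '{"experiments":[{"experimentAccession":"E-MTAB-0000","experimentDescription":"Response to heat, cold and drought"}]}' |
        jq -r '.experiments | .[] | [.experimentAccession, .experimentDescription] | @tsv' |
        format_tsv
fi
```

Because @tsv does not wrap fields in quotes, the trailing sed call that strips " characters is not needed in this variant, and the tab separator is understood by gawk, mawk, and BSD awk alike.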

pinin4fjords · 2022-03-18T08:48:11Z

gxa_helper/bin/gxa_release_data_stats.sh

@@ -91,7 +91,7 @@ echo -e "\n#### Selected differential experiments\n" >> $releaseNotesFile
 ## parse list of new differential studies to get write experiment titles
 curl 'https://wwwdev.ebi.ac.uk/gxa/json/experiments' | \
    jq -r '.experiments | .[] | select(.loadDate | strptime("%d-%m-%Y") | mktime > '$last_release_epoch_time') | select(.rawExperimentType | test("DIFFERENTIAL"; "i")) | [.experimentAccession, .experimentDescription] | @csv' | \
-    awk -v FS="," '{ printf "- [%s](https://www.ebi.ac.uk/gxa/experiments/%s)\n", $2, $1}' | sed s/\"//g >> $releaseNotesFile
+    awk -v FPAT="^([^,]+)|(\"[^\"]+\")$" '{ printf "- [%s](https://www.ebi.ac.uk/gxa/experiments/%s)\n", $2, $1}' | sed s/\"//g >> $releaseNotesFile


Suggested change

awk -v FPAT="^([^,]+)|(\"[^\"]+\")$" '{ printf "- [%s](https://www.ebi.ac.uk/gxa/experiments/%s)\n", $2, $1}' | sed s/\"//g >> $releaseNotesFile

Done in edd0bde, thanks!

gxa_helper/bin/gxa_release_data_stats.sh

Co-authored-by: Jonathan Manning <[email protected]>

pinin4fjords

Looks good!

Use regex instead of field separator in parsing experiments

a917e10

irisdianauy requested review from anjaf, pinin4fjords and pcm32 March 18, 2022 01:56

pinin4fjords reviewed Mar 18, 2022


irisdianauy and others added 2 commits March 18, 2022 17:27

Use jq tsv instead of csv

288e6f0

Co-authored-by: Jonathan Manning <[email protected]>

Use jq tsv instead of csv, and a tab field separator

edd0bde

pinin4fjords approved these changes Mar 18, 2022


irisdianauy changed the title ~~Use regex instead of field separator in parsing experiments~~ Fix parsing of experiment IDs and names Mar 18, 2022

irisdianauy merged commit 09c3e63 into master Mar 18, 2022
