
equivalent to jolt recursivelySquashNulls #340

Open
samer1977 opened this issue May 20, 2024 · 10 comments

Comments

@samer1977

Hi,

I'm coming from a Jolt background and now find myself learning JSLT, because Apache NiFi introduced a new JSON transformation using JSLT, and I want to see if I can get the best of both worlds. It's a totally different mindset, but I can see how close it is to XQuery for XML. I'm surprised no one has asked this, because getting rid of all null values is a common problem in JSON transformation. Jolt has a function called recursivelySquashNulls that removes all nulls in nested JSON recursively, but I could not find anything similar in JSLT. Can someone please write me the spec for it in JSLT? I spent the whole day trying to figure it out, but it's not that easy, especially when the nested value is a complex object, an array of complex objects, or even an array of simple types. I would like to see if JSLT can address all these scenarios without a convoluted spec.

Thanks

@samer1977
Author

samer1977 commented May 23, 2024

I asked this question above but got no answer, and I'm not sure why. If you come from a Jolt background you have used this function, and though it doesn't work perfectly, it helps sometimes and it's good to have as an option. I started learning JSLT a couple of days ago and it caught my interest. I can see cases where JSLT can be a better option than Jolt and might simplify things. Performance I'm not sure about, though: I made a comparison using NiFi, running both specs on the same input to produce the same output, and Jolt always had a slight edge. Regarding the question above, here is what I was able to come up with, and I hope I was successful:

def squashNullsRecursive(obj)
  let simple = {for ($obj) .key : .value if (.value != null and not(is-object(.value)))}
  let complex = {for ($obj) .key : squashNullsRecursive(.value) if (is-object(.value))}
  let array = {for ($obj) .key : [for (.value) . if (not(is-object(.)) and . != null)] +
                                 [for (.value) squashNullsRecursive(.) if (is-object(.) or is-array(.))]
               if (is-array(.value))}

  $array + $complex + $simple

Input:


{
  "x": "x1",
  "y": "y2",
  "z": {
    "z1": "z11",
    "z2": null,
    "z3": [
      1,
      {
        "zzz": "skid",
        "zzz1": null
      },
      2
    ]
  }
}

squashNullsRecursive(.)

@larsga
Collaborator

larsga commented May 23, 2024

I didn't answer because I don't have time to write this function from scratch.

You're on the right track, but at the top level of your function I'd use an if and test the input with is-array and is-object to separate the three cases: object, array, and anything else. You can write it much more simply and cleanly that way.
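A minimal sketch of the structure described above, assuming the goal is Jolt-like behaviour: drop null object values and null array elements at every level of nesting. The function name `squash-nulls` is hypothetical, and this is one way to write it, not an official JSLT function. Because the top-level `if` separates the object, array and scalar cases, arrays keep their element order and nested arrays go through the same recursion.

```jslt
// Hypothetical helper, a sketch of the "if on the input type" approach.
// Removes null-valued object keys and null array elements, recursively.
def squash-nulls(json)
  if (is-object($json))
    // rebuild the object, recursing into each value and skipping null values
    {for ($json) .key : squash-nulls(.value) if (.value != null)}
  else if (is-array($json))
    // rebuild the array in its original order, dropping null elements
    [for ($json) squash-nulls(.) if (. != null)]
  else
    // strings, numbers and booleans are returned unchanged
    $json

squash-nulls(.)
```

Removing the `if (. != null)` filter in the array branch keeps null array elements, so array lengths stay the same. As far as I know, JSLT object constructors also leave out keys whose value is null, `[]` or `{}` by default, so the explicit null test in the object branch mostly documents intent, and an object that becomes empty after squashing may disappear from its parent; worth checking against the JSLT version you run.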

@samer1977
Author

Can you please give an example of the simplification? I'm not sure what you mean by "if and test". Thanks

@larsga
Collaborator

larsga commented May 23, 2024

You know what an if statement is, right? What's inside the () is the test.
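A small illustration of the shape being described, not taken from the thread: in JSLT, `if` is an expression, the test goes inside the parentheses, and `else if` chains let a function branch on the type of its input. The string results are placeholders chosen for the example.

```jslt
// Illustration only: each branch is an expression that produces the result.
if (is-object(.))
  "object"
else if (is-array(.))
  "array"
else
  "something else"
```

Applied to `{"a": 1}` it yields `"object"`, to `[1, 2]` it yields `"array"`, and to `42` it yields `"something else"`.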

@samer1977
Author

Sorry, I still don't get it. I thought I was using an if statement with a for loop, and I thought that was the clean way per the documentation. I know what an if statement is. I might be slow and not as smart as you are, but I know I can write a better flatten-object than yours ;)

@catull

catull commented May 23, 2024

This one works

def squashNulls (obj)
  from-json (replace (replace (replace (replace (string ($obj), "\\\"[^\"]+\\\":null", ""), ",,", ","), ",}", "}"), ",]", "]"))

squashNulls (.)

It could be reduced to a simpler replace if that function supported positional patterns (references to capture groups in the replacement string).

The last two replacement patterns could then be collapsed into a single pattern, "," followed by either } or ], written as
replace (s, ",([}\\]])", "$1") or
replace (s, ",([}\\]])", "\1") or
replace (s, ",([}\\]])", "&1"),

or whichever mechanism there is.
What is used underneath replace? Is it plain Java?

It works on RegexPlanet, see https://www.regexplanet.com/share/index.html?share=yyyyf6v7w2d
Click on 'Java'.

I am aware this is not what @samer1977 asked for.

@catull

catull commented May 23, 2024

I checked, see https://github.com/schibsted/jslt/blob/master/core/src/main/java/com/schibsted/spt/data/jslt/impl/BuiltinFunctions.java#L931

Java regex Patterns are used internally, but replace copies the replacement string literally, so positional (capture-group) references such as $1 are not supported.
I'll open an issue for that.

@samer1977
Copy link
Author

String replace? That looks scary from a performance perspective, but I guess I need to do some testing to find out.

@catull

catull commented May 23, 2024

My original algorithm did not handle an object whose first property is null.
It only worked if the null property was in the middle or at the end.

> string replace? that looks scary from performance perspective but I guess I need to do some testing and find out

I also got rid of two nested replace calls, going from 4 calls to 2.

Better performance, right?

This version now supports objects whose first property is null:

def squashNulls (obj)
  from-json (
    replace (
      replace (
          string ($obj),
          ",?\\\"[^\"]+\\\":null",
          ""
      ),
      "\\{,",
      "{"
    )
  )

squashNulls (.)

Tested on this input:

{
  "w": null,
  "x": "x1",
  "y": "y2",
  "z": {
    "z1": "z11",
    "z2": null,
    "z3": [
      1,
      {
        "zzz": "skid",
        "zzz1": null
      },
      2
    ]
  }
}

@catull

catull commented May 23, 2024

@samer1977
By the way, what is the policy on null values in arrays?

[ null, 2, 7, { "a": 1, "b": 2 }]

Should that null be dropped?
The size of the array would change.

My algorithm only drops attributes that are null.
Dropping them does not change the objects, i.e. I consider { "a": null, "b": 7 } and { "b": 7 } structurally equivalent.
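A sketch of the policy described above, assuming only null object attributes should be removed and arrays should keep every element, nulls included, so their size never changes. The function name `squash-null-attributes` is hypothetical. Unlike the string-replace approach, it recurses over the parsed JSON, so it does not depend on how keys or string values happen to be escaped in the serialized text.

```jslt
// Hypothetical helper: removes null-valued object attributes at every level,
// but leaves array elements (including nulls) in place.
def squash-null-attributes(json)
  if (is-object($json))
    // keep only non-null attributes, recursing into their values
    {for ($json) .key : squash-null-attributes(.value) if (.value != null)}
  else if (is-array($json))
    // no filter here, so array length and element positions are preserved
    [for ($json) squash-null-attributes(.)]
  else
    $json

squash-null-attributes(.)
```

On `[ null, 2, 7, { "a": 1, "b": 2 }]` this should return the array unchanged, assuming JSLT array comprehensions keep null values, which is my understanding of the default behaviour.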
