I tried to implement some functionality that would give me the previously seen value inside a window, and I realised that this is currently quite cumbersome to do in Pathway.
I think this feature would be useful for anyone trying to do not-so-trivial aggregations.
To add more context, I was looking for a way to do an operation similar to diff: take the previous row's value, manipulate it, and then multiply it by the next row's value. I defined a window with hop=1, duration=1, but I could only get those values after an additional join, which I found cumbersome.
This is a great point. diff is actually a particularly unpleasant (slow and memory-inefficient) operation to perform in full generality in a streaming or distributed system, and it may cause unexpected performance bottlenecks. At the same time, it can be extremely useful. Whether it should be available as a built-in keyword is a recurring topic among the repo maintainers.
If by any chance you are in the lucky position that your data has a column with sequential row numbers (seq = 1, 2, 3, ...), compute seq_next = seq + 1 and join the table with itself on table_copy1.seq_next = table_copy2.seq. It's not beautiful, but it's (fairly) fast.
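To make the self-join trick concrete, here is a plain-Python sketch of the same logic. It is not Pathway code. It assumes each row is a dict with an integer `seq` and a numeric `value`. The names `pair_with_previous`, `previous_times_next` and `transform` are hypothetical and only illustrate the "previous value, manipulated, times next value" computation described earlier in the thread. The dict lookup plays the role of the equi-join on `seq_next = seq`.

```python
from __future__ import annotations

from typing import Callable


def pair_with_previous(rows: list[dict]) -> list[tuple[dict, dict]]:
    """Pair each row with its predecessor via an equi-join on seq_next == seq."""
    # "table_copy1": every row keyed by seq_next = seq + 1,
    # i.e. by the seq of the row that should follow it.
    by_seq_next = {row["seq"] + 1: row for row in rows}
    # "table_copy2": match each row against table_copy1.seq_next.
    # The first row has no predecessor, so this inner join drops it.
    return [(by_seq_next[row["seq"]], row) for row in rows if row["seq"] in by_seq_next]


def previous_times_next(rows: list[dict], transform: Callable[[float], float]) -> list[float]:
    """For each adjacent pair, compute transform(previous value) * current value."""
    return [transform(prev["value"]) * cur["value"] for prev, cur in pair_with_previous(rows)]


rows = [
    {"seq": 1, "value": 2.0},
    {"seq": 2, "value": 3.0},
    {"seq": 3, "value": 5.0},
]
print(previous_times_next(rows, transform=lambda v: v + 1))  # [9.0, 20.0]
```

Because the match is a key lookup rather than a search for "the row before me", it needs no global sort. That is roughly why the join is reasonably fast when a gap-free `seq` column exists.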
The next best case: if you can localize all the adjacent rows inside a small window, a reducer running a UDF over this window is a good idea.
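The window-plus-UDF approach can be sketched in plain Python as well. Again, this is not Pathway's API. It assumes rows are `(time, value)` pairs and groups them into tumbling windows of a hypothetical `duration`. A per-window function (`diff_udf`, also hypothetical) then plays the role of the reducer: it sees every row in its window, so it can sort by time and diff each value against its predecessor.

```python
from __future__ import annotations

from collections import defaultdict


def window_diffs(rows: list[tuple[int, float]], duration: int) -> dict[int, list[float]]:
    """Group (time, value) rows into tumbling windows of `duration` time units,
    then apply a per-window "reducer" that diffs consecutive values."""
    windows: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for t, v in rows:
        # Tumbling-window start: round t down to a multiple of duration.
        windows[t - t % duration].append((t, v))

    def diff_udf(window_rows: list[tuple[int, float]]) -> list[float]:
        # The reducer has the whole window at hand, so it can order rows by
        # time and look at each row's predecessor.
        values = [v for _, v in sorted(window_rows)]
        return [cur - prev for prev, cur in zip(values, values[1:])]

    return {start: diff_udf(rs) for start, rs in sorted(windows.items())}


rows = [(0, 1.0), (1, 4.0), (2, 9.0), (3, 16.0), (4, 25.0)]
print(window_diffs(rows, duration=3))  # {0: [3.0, 5.0], 3: [9.0]}
```

In this sketch the pair (9.0, 16.0) straddles the boundary between windows and is never diffed. That is why the approach only works when adjacent rows can be localized inside one window. Overlapping (sliding) windows would reduce the problem, at the cost of processing each row several times.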