Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tidy] Return empty data frame from AgGrid, Table and Figure build methods #644

Merged
merged 22 commits into from
Aug 27, 2024

Conversation

petar-qb
Copy link
Contributor

@petar-qb petar-qb commented Aug 21, 2024

Description

References:


Why are we rethinking the AgGrid/Table build method?

Short reminder:

There are two phases that happen when the page is loaded:

  1. build phase -> renders all page components by returning the content made inside the components build method.
  2. on_page_load phase -> triggers a process that calculates the final appearance of all targeted figures based on the selected filters/parameters/filter_interactions.

Even when the big data is handled and showed with different figures, the build phase has to be completed quickly, so Vizro users don't stuck with the "blank white page". Since the calculation for the final figures look could take a lot of time (e.g. if the data loading process is slow or if big data has to be transported through the network), we should return some light-weight placeholder components that will be replaced with the real components afterwards.

After the build phase, the on_page_load mechanism is triggered which immediately puts all the targeted figures in the "loading state" and after some time it returns and renders filtered & parametrised figures to the UI.

How it works now?

Currently the vm.Graph.build method works as expected. It returns a placeholder graph from the build phase which is later replaced with the actual (filtered and parametrised) Graph object (result of the on_page_load mechanism) later.
However, the vm.AgGrid.build and the vm.Table.build methods returns objects made of entire (non-filtered) data_frame and original parameters from the build phase which becomes replaced with the actual AgGrid/Table objects (result of the on_page_load mechanism).

(We decided to return entire data_frames for ag_grid/tables just to quick fix the bugs mentioned at the beginning of this description.)

What do we want to achieve?

We want to optimise the AgGrid and Table build methods so they do not return an entire data_frame in build phase. Except the response time optimisation (which is the most important I guess), we also want to align them with the Graph build method that also does not return an entire data_frame from the build method. In this process we shouldn't recreate any of the persistence (or similar) bugs we solved with the #439.

Vizro UX will hugely benefit from this "relieving the initial page loading" process, and this step is crucial for all the upcoming backend changes.


Task acceptance criteria:

  • Relieve the page.build phase:
    • Ideally by returning a hardcoded placeholder component from the AgGrid/Table build methods (similarly like it's done for the vm.Graph)
    • OR by returning a real (fully configured) AgGrid/Table object but with an empty data_frame. (see in the table below why this is not optimal solution. This option is described within the "self.call(data_frame=pd.DataFrame())" column) .
  • Do not introduce any persistence (or similar) inconsistencies like:
    • Reseting AG grid resets column widths when reloading the screen (if the columnSize="autoSize"),
    • Column filter persistence,
    • Not sending an empty data_frame into figure functions,
    • Make the pagination and selectedRows to work smoothly (together with all other possible arguments).
  • Optimise the number of figure function calls. Ideally it should be called only once from the on_page_load phase, with the filtered data_frame and parameterised function arguments.

The table below shows how different objects returned from the vm.AgGrid build method (columns) match different acceptance criteria (rows):

Returned objects like dag.AgGrid() / html.Div() are not considered at all as they raise the following error (even with suppress_callback_exceptions=True set) -> "A nonexistent object was used in an State of a Dash callback...".

dag.AgGrid(id=self._input_component_id) self.call(data_frame=pd.DataFrame()) self.call() html.Div(id=self._input_component_id)
Returns a light-weigh component from the build phase
Keeps column widths on page refresh *❌ (we thought we fixed it)
Keeps column widths on page nav *❌ (we thought we fixed it)
Keeps column widths on filter changes *❌ (probably internal dash-ag-grid bug)
"Column filter" persistence
"pagination" persistence
"selectedRows" persistence
Not sending an empty data_frame into figure functions
number of figure function calls - build phase 0 - ✅ 2 - ❌ 2 - ❌ 0 - ✅
number of figure function calls - on_page_load phase 1 - ✅ 2 - ❌ 2 - ❌ 1 - ✅

Legend:

  1. *❌ (we thought we fixed it) - points out that the https://github.com/McK-Internal/vizro-internal/issues/670 is not actually fully solved. To prove it, run the scratch_dev example, select filter [1, 3], and refresh the page.
  2. *❌ (probably internal dash-ag-grid bug) - See the issue Column widths incorrectly calculated based on previous rowData when using autoSize in AgGrid plotly/dash-ag-grid#318

Conclusions:

Based on all the knowledge gathered so far (which is mainly presented in the table above), we will go with the returned object html.Div(id=self._input_component_id) option from the vm.AgGrid/vm.Table build methods.


Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

    • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
    • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorized to submit this contribution on behalf of the original creator(s) or their licensees.
    • I certify that the use of this contribution as authorized by the Apache 2.0 license does not violate the intellectual property rights of anyone else.
    • I have not referenced individuals, products or companies in any commits, directly or indirectly.
    • I have not added data or restricted code in any commits, directly or indirectly.

vizro-core/src/vizro/models/_components/figure.py Outdated Show resolved Hide resolved
vizro-core/src/vizro/_vizro.py Show resolved Hide resolved
vizro-core/src/vizro/actions/_actions_utils.py Outdated Show resolved Hide resolved
vizro-core/src/vizro/actions/_actions_utils.py Outdated Show resolved Hide resolved
@petar-qb petar-qb changed the title Feat/return empty data frame from tables build [Tidy] Return empty data frame from tables build Aug 21, 2024
@petar-qb petar-qb changed the title [Tidy] Return empty data frame from tables build [Tidy] Return empty data frame from AgGrid, Table and Figure build methods Aug 21, 2024
@petar-qb petar-qb self-assigned this Aug 22, 2024
@petar-qb petar-qb mentioned this pull request Aug 22, 2024
1 task
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏 👏 Really awesome work - great investigation and write-up. I didn't actually test it but I trust your manual tests + if @maxschulz-COL can give it a quick spin!

I would love to get to the bottom of the question I had about why this doesn't trigger the filter_interaction since as you know it influences other things like the dynamic controls. But if it all works then happy to merge this without fully understanding it!


Just for other reviewers and to have it all in one place, there's a couple of things worth thinking about here.

Firstly, consider the case when you build a dashboard using dynamic data and it's running for 5 years. The data could have completely changed between the time you start the dashboard (run pre_build and then first run build) and the time you visit it (re-run build). Hence any kind of pre-calculated AgGrid or Graph or whatever might be completely irrelevant and should not be cached between pre-build and build time.

Secondly, operations that run when you go to a page with an AgGrid are basically:

  1. data loading
  2. backend running of dag.AgGrid
  3. network time
  4. frontend rendering

(plus some overhead with server handling routing etc.)

Previously I had always assumed that (1) would be the bottleneck here. This was wrong since in @maxschulz-COL and @huong-li-nguyen examples the network time was actually very significant. So In actual fact:

  • 1 is quick for static data, potentially slow for dynamic data
  • 3 is quick for small data, potentially slow for large data
  • 2 and 4 are quick

What this means is that in order for build methods to me fast they need to avoid both 1 and 3. What that boils down to is a very important rule that build methods must not load data.

I'm slightly sad that that the more we work on this, the less "useful content" is in AgGrid and similar models and the more it just becomes the same as Figure + the "action info" type stuff. My original intention was that the build method would be a bit more "real" in encoding the actual Dash component... but I am 100% in agreement now that this is the right way to do it. So it's great that we now have a good solution and a very clear direction and it's obvious what the right split between build vs. __call__ is, which has been an open question for a loooong time. Definitely we're making good progress here! 💯

Copy link
Contributor

@maxschulz-COL maxschulz-COL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐ 🕵️ Fantastic investigation and great implementation!!

I reviewed the code in detail, and I like it, well done! I have not tested the entire table you showed, I take it you did that plenty, but trust that this is therefore the best solution. Giving it a short spin, this is quite nice!

I added another optiimzation that I could find, it is quite significant. If you agree, then let's add that one as well 💪

After that consideration, good to merge from my side

@petar-qb
Copy link
Contributor Author

@antonymilne

I'm slightly sad that that the more we work on this, the less "useful content" is in AgGrid and similar models and the more it just becomes the same as Figure + the "action info" type stuff

Just to say that absence of custom code across different models makes me really happy. This enhances the genericity of the codebase, making it more maintainable, easier to understand and easier to customise from the users' side.

The "actions_info" is an idea I got earlier this year while documenting ActionsV2 process and is placed on the Miro board as the first ticket on the AV2 Timeline.

@petar-qb
Copy link
Contributor Author

@antonymilne

I would love to get to the bottom of the question I had about why this doesn't trigger the filter_interaction since as you know it influences other things like the dynamic controls. But if it all works then happy to merge this without fully understanding it!

After the on_page_load action returns figures on the page, the trigger_to_global_store is triggered. This client-side callback is triggered as expected according to -> https://dash.plotly.com/advanced-callbacks#prevent-callback-execution-upon-initial-component-render, because the callback Input is generated by the on_page_load and its Output is a global local store (that already exists in the client side of the app).

This should cause the gateway client-side callback to be triggered as well, since the trigger_to_global_store Output is the Input of the gateway callback (which would cause filter_interaction to be triggered). 🤔 However, I didn't succeed to figure out why the gateway callback is not triggered (as it should be), and wasn't able to trigger the gateway callback even when I removed all the prevent_initial_call=True parameters.

@antonymilne
Copy link
Contributor

After the on_page_load action returns figures on the page, the trigger_to_global_store is triggered. This client-side callback is triggered as expected according to -> https://dash.plotly.com/advanced-callbacks#prevent-callback-execution-upon-initial-component-render, because the callback Input is generated by the on_page_load and its Output is a global local store (that already exists in the client side of the app).

This should cause the gateway client-side callback to be triggered as well, since the trigger_to_global_store Output is the Input of the gateway callback (which would cause filter_interaction to be triggered). 🤔 However, I didn't succeed to figure out why the gateway callback is not triggered (as it should be), and wasn't able to trigger the gateway callback even when I removed all the prevent_initial_call=True parameters.

Thanks for looking into it. I wonder if it's related to this issue? https://github.com/McK-Internal/vizro-internal/issues/1105#issue-2378867831

No need to solve it now, but let's make a note of it so we don't lose track of it.

@petar-qb
Copy link
Contributor Author

I wonder if it's related to this issue?

Great point! It seems like we're dealing with a similar scenario here.

My main goal is to prevent any action from triggering another implicitly (which seems like it's the case right now). Instead, all actions should be explicitly defined using action chains. Given that, I wouldn't suggest altering the current behaviour. However, I'd like to dedicate more time to investigating this further and making the action loop more robust. We can address this in a separate PR when we find more time.

@petar-qb petar-qb merged commit 84c0ffc into main Aug 27, 2024
32 checks passed
@petar-qb petar-qb deleted the feat/return_empty_data_frame_from_tables_build branch August 27, 2024 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants