[Tidy] Remove component to data mapping from data manager #451

antonymilne · 2024-05-02T16:13:52Z

Description

Small refactor as part of https://github.com/McK-Internal/vizro-internal/issues/714 in preparation for more exciting changes such as https://github.com/McK-Internal/vizro-internal/issues/753. Commit history also contains some POCs for other parts of that ticket such as making parametrised calls to dynamic data. Getting this PR merged first will lay the foundation for that work.

Now that we have a "proper" scheme for caching, there's no need to store links between component ID and data source name in the data manager. Instead, the captured callable itself contains the data source name that is used to fetch data as data_manager[data_source_name].load(). This simplifies things quite a bit.

Tested by running hatch run example features and hatch run example features/yaml_version (after modifying to include some dynamic data examples), and all "figure" components (Graph, AgGrid, Table) work still.

Reviewers

@petar-qb please check the resulting diff and make sure you understand it, but no need to run any examples since not much has really changed here. I'll talk you through the intermediate commits and work with you on https://github.com/McK-Internal/vizro-internal/issues/753 next week.

@huong-li-nguyen or @maxschulz-COL should be easy to skim through and give a ✅ if all looks good.

Notice

I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
- I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
- I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorized to submit this contribution on behalf of the original creator(s) or their licensees.
- I certify that the use of this contribution as authorized by the Apache 2.0 license does not violate the intellectual property rights of anyone else.
- I have not referenced individuals, products or companies in any commits, directly or indirectly.
- I have not added data or restricted code in any commits, directly or indirectly.

… figure["data_frame"]. Seems to work ok

…nable for now

… don't enable for now" This reverts commit 92e6c68.

for more information, see https://pre-commit.ci

antonymilne · 2024-05-02T17:01:55Z

vizro-core/src/vizro/models/_components/graph.py

+ # This default value is not actually used anywhere at the moment since __call__ is always used with data_frame
+ # specified. It's here to match Table and AgGrid and because we might want to use __call__ more in future.
+ # If the functionality of process_callable_data_frame moves to CapturedCallable then this would move there too.
+ kwargs.setdefault("data_frame", data_manager[self["data_frame"]].load())


This bit of code probably won't last very long. After #439 is merged then I don't think any code does __call__ without specifying a data_frame. Removing or at least rewriting this bit of code will then fix the problem @petar-qb pointed out in #398 (comment).

maxschulz-COL

Skimmed through, looks good 💪

petar-qb

Regarding this commit and considering all pros/cons I fully agree that _DynamicData.pd_DataFrameCallable should be a pure function, not a CaptureCallable. 👍

Regardless that CaptureCallable is better for reusing across multiple data sources, and regardless nested function arguments cannot be updated with this approach, does it seem possible to add: _DynamicData.pd_DataFrameCallable as CaptureCallable as an optional approach in the future?
(just to allow advanced users to reuse load data function and nesting capabilities)

vizro-core/src/vizro/models/_components/_components_utils.py

vizro-core/src/vizro/models/_controls/filter.py

antonymilne · 2024-05-14T21:03:37Z

Regarding this commit and considering all pros/cons I fully agree that _DynamicData.pd_DataFrameCallable should be a pure function, not a CaptureCallable. 👍

Regardless that CaptureCallable is better for reusing across multiple data sources, and regardless nested function arguments cannot be updated with this approach, does it seem possible to add: _DynamicData.pd_DataFrameCallable as CaptureCallable as an optional approach in the future? (just to allow advanced users to reuse load data function and nesting capabilities)

Thank you for the review @petar-qb! I was waiting until #479 had been finalised before I answered this, because the answer kept on changing...

As it stands, I think this will definitely be possible (even without adding __name__ and __qualname__ to CapturedCallable, although it might be a good idea to do so anyway) because I always make sure that the cache key includes the data source name. This means that you could do:

# @capture("data") doesn't exist but pretend it does
def load_csv(filename):
     return pd.read_csv(filename)

# These are all dynamic data
data_manager["data_x"] = load_csv("data_x.csv")
data_manager["data_y"] = load_csv("data_y.csv")
data_manager["data_z"] = load_csv("data_x.csv") # even the same value should work indepedently

Probably this will work already actually if you do CapturedCallable(load_data) without the correct @capture decorator existing, because everything just relies on duck typing rather than strict checks.

Let's not promote this as a possible approach until we have a really good reason to do so though. Thinking about it again, even the reusability can now be handled using a partial function ok so the only real reason to use CapturedCallable would be that you prefer the syntax or have the nested keys, which seems pretty unlikely. So you'd already do the above as:

def load_csv(filename):
     return pd.read_csv(filename)

# These are all dynamic data
data_manager["data_x"] = partial(load_csv, "data_x.csv")
data_manager["data_y"] = partial(load_csv, "data_y.csv")
data_manager["data_z"] = partial(load_csv, "data_x.csv")

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

antonymilne added 12 commits May 1, 2024 11:09

Initial attempt to remove component_to_data. Put data_frame string in…

304f058

… figure["data_frame"]. Seems to work ok

POC of dynamic data function with Parameter argument

4161a09

Cache works well!

b5cd1ae

Captured callable works but need to set name and qualname

f4a748b

POC of Parameter(targets=["graph.data_frame.points"])

ca1624d

Simpler version that still works

7662b56

Address nested parameters of captured callable - possible but don't e…

92e6c68

…nable for now

Revert "Address nested parameters of captured callable - possible but…

9442b2b

… don't enable for now" This reverts commit 92e6c68.

Giving CC qualname and name works well

ec988d7

Consider moving DM logic to CC - decided not to

8185617

Revert POC stuff and tidy

a980cec

Finish tidy

8aa743d

antonymilne requested a review from petar-qb May 2, 2024 16:13

pre-commit-ci bot and others added 7 commits May 2, 2024 16:16

[pre-commit.ci] auto fixes from pre-commit.com hooks

e864356

for more information, see https://pre-commit.ci

Finish tidy

96fc5ce

Fix tests

a9d26c6

Fix tests

f7f07cc

Final tidy

d4fbd26

Merge branch 'main' into tidy/data-source-mapping

3b341fd

Revert change to dev app

365b5aa

antonymilne marked this pull request as ready for review May 2, 2024 16:42

antonymilne requested review from Joseph-Perkins, huong-li-nguyen, maxschulz-COL and Anna-Xiong as code owners May 2, 2024 16:42

Add changelog

2ba74cb

antonymilne commented May 2, 2024

View reviewed changes

maxschulz-COL approved these changes May 3, 2024

View reviewed changes

maxschulz-COL added Status: Ready for Review ☑️ Issue/PR is ready for review - all tests have passed Component: Refactoring 🦸 Issue/PR that refactor existing code labels May 6, 2024

Merge branch 'main' into tidy/data-source-mapping

bb9b9fa

petar-qb assigned antonymilne May 9, 2024

antonymilne enabled auto-merge (squash) May 9, 2024 08:16

petar-qb approved these changes May 9, 2024

View reviewed changes

vizro-core/src/vizro/models/_components/_components_utils.py Show resolved Hide resolved

vizro-core/src/vizro/models/_controls/filter.py Show resolved Hide resolved

Merge branch 'main' into tidy/data-source-mapping

1d65558

antonymilne merged commit 5d5d8a8 into main May 14, 2024
34 checks passed

antonymilne deleted the tidy/data-source-mapping branch May 14, 2024 21:06

nadijagraca pushed a commit that referenced this pull request May 15, 2024

[Tidy] Remove component to data mapping from data manager (#451)

a286c8f

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

antonymilne mentioned this pull request May 20, 2024

[Feat] Make dynamic data loading functions accept arguments #479

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Tidy] Remove component to data mapping from data manager #451

[Tidy] Remove component to data mapping from data manager #451

antonymilne commented May 2, 2024 •

edited

antonymilne May 2, 2024

maxschulz-COL left a comment

petar-qb left a comment

antonymilne commented May 14, 2024

[Tidy] Remove component to data mapping from data manager #451

[Tidy] Remove component to data mapping from data manager #451

Conversation

antonymilne commented May 2, 2024 • edited

Description

Reviewers

Notice

antonymilne May 2, 2024

Choose a reason for hiding this comment

maxschulz-COL left a comment

Choose a reason for hiding this comment

petar-qb left a comment

Choose a reason for hiding this comment

antonymilne commented May 14, 2024

antonymilne commented May 2, 2024 •

edited