Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epic: Refactor Arborist for Performance and Flexiblity #638

Open
fritzy opened this issue Jan 27, 2023 · 1 comment
Open

Epic: Refactor Arborist for Performance and Flexiblity #638

fritzy opened this issue Jan 27, 2023 · 1 comment

Comments

@fritzy
Copy link

fritzy commented Jan 27, 2023

Outcomes

  • Performance: Facilitate npm in resource constrained environments (Android, Dev Boards, cheaper CI, better GitHub Actions performance)
  • Composibility: Additional Workflows and Modes (npm add, support more community tool workflows)
  • Better Dev Momentum (composable approach with isolated logic leads to easier feature adds and bugfixes)

Current Shortcomings

  • Package tree objects contain getters and setters with a lot of embedded logic, making the debugging of workflows difficult, and supporting more workflows impossible. refactor(arborist): refactor business logic from graph logic #469
  • State is built across Arborist in a single assumed workflow, making debugging difficult and supporting more features spread everywhere.
  • Coding style has an emphasis on a single mixed-in object with strict access restrictions, leading to difficult to read code. Newer JS features makes much of this unnecessary.
  • Parallelization and resource restrictions are currently unbounded.

Philosophy

A composable function-based approach, avoiding embedded logic, keeping descriptive tasks isolated and small.

  • Tree objects should not have embedded logic beyond initialization. Getters should be used sparingly, and setters not at all. Functionally mutating the tree should be done with outside or static functions, isolated in purpose.
    • Adding or removing nodes from the tree should not mutate other nodes. If mutation logic is required, an external or static function should be used.
  • Tree attributes should have clearly documented single values (for example, the current Node.name could be any number of values symbolizing the path or package name).
  • A dependency tree, disk tree, dependency diff, and disk diff tree serve separate purposes, although may be linked on a node-by-node basis.

The main Arborist class should keep minimal state relating to configuration and command arguments.

  • Workflow functions keep state logic as scoped variables rather than instance-wide.
  • Workflow functions should be clearly labeled by naming convention or comments
  • A functions set of purposes of creating new tree object, mutating a tree, accesses io, mutating the disk should be clearly labeled by naming convention, arguments, or comments
  • A functions purpose must be kept as singular as practical.
    • A function should not both mutate a tree and the disk, for example.

Workflow functions should manage parallelization intelligently through Promises.

  • Gathering promises into arrays should be done when parallelization is possible.
  • Promise.any, all, finally, etc should be used
  • IO operations, specifically package retrieval and extraction, should be done ask Tasks that can be parallelized or broken into separate workers or processes.

The API is organized into two tiers for different audiences. The primary APIs should be Query results, Workflow functions, and CLI. Secondarily, developers should have access to the small functions that make up Workflows as well as Tree objects composed of Nodes and Links for writing their own workflows.

Testing Brainstorm

Remove coverage map, fix coverage (1 week)
Find single point to start new testing. Tests go in net new area (precedent for this is in npm 7). This is where we flesh out the starting points for the rest of the epic. (2 weeks)
The rest of the quarter is spent writing/porting tests to the new format. Refactoring can only happen when the new tests fully cover a file we want to change.

Part of this is going to be defining the surface area of Arborist. This is not easy to do currently and realistically is going to fall out of the new tests. As we write new tests one criteria we need to think about is "Is this part of the surface area of arborist that we plan on supporting going forward?

New testing plan:

Come up with new / existing mocks for how we want to interface w/ arborist. This will need to be only what we need each time, but made in a way we can expand as we go. Mock-registry is the pattern for this.

  • mock registry (exists, will likely need expanding as we iterate)
  • mock packages (mock registry does some of this but not on disk)
  • assertions of state need to be meaningful. What are we testing? formatSnapshot
  • testing disk state. packuments match fixtures? what about script execution?
  • testing logical state. formatSnapshot will be the big thing here. npm query output is maybe a good starting point? how do nodes serialize?

npm snapshot command tests are probably the best latest iteration on this idea, setting t.context. for testing both disk and logical state

Steps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants