
[Fluid AI] Update ai-collab client API to transform TreeEdit types into a consumable version for UI visualizations #23117

Open · wants to merge 2 commits into base: main
Conversation

chentong7 (Contributor)

Description

Currently, the only way to get a set of objects for visualizing each change an LLM makes to your tree via the aiCollab library is to separately use the sharedTreeDiff utility that was created for the implicit strategy. That utility is not perfect: it likely misses many edge cases and is fickle. When using the explicit strategy, however, we maintain an edit log internally that is the source of truth. We currently don't return the editLog because its TreeEdit type is not simple for users to consume and visualize changes in their UI.

The purpose of this ticket is to update the explicit strategy's entry point function, generateTreeEdits, so that it takes the editLog (an array of TreeEdit), transforms it into a new set of "diff" objects that can easily be used for UI visualizations, and returns those diffs along with the success/failure response.
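A minimal sketch of the transformation the description proposes, assuming simplified, hypothetical shapes (the real TreeEdit type in ai-collab carries more detail, and `editLogToDiffs`/`EditLogEntry` are illustrative names, not the library's API):

```typescript
// Hypothetical, simplified shapes — illustrative only.
interface EditLogEntry {
	edit?: { explanation?: string };
	error?: string;
}

interface Diff {
	id: string;
	type: "edit" | "error";
	description: string;
}

// Map the internal edit log into UI-friendly diff objects.
function editLogToDiffs(editLog: EditLogEntry[]): Diff[] {
	return editLog.map((entry, index) => ({
		id: `diff-${index}`,
		type: entry.error === undefined ? "edit" : "error",
		description: entry.error ?? entry.edit?.explanation ?? "No description available",
	}));
}

const diffs = editLogToDiffs([
	{ edit: { explanation: "Inserted job 'QA tester'" } },
	{ error: "Edit failed to apply" },
]);
```

Keeping the mapping a pure function like this also makes it straightforward to unit test without an LLM round trip.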

Test

/workspaces/FluidFramework/node_modules/.bin/mocha ./../../../dist/test/ai-collab/jobBoardBenchmark.spec.js --no-timeouts --exit

  AI Job Listings App Benchmark
    - Create a new Job with the title 'QA tester' and add a candidate named 'John Doe', who is only available on mondays and tuesdays, to the job.

  0 passing (3ms)
  1 pending

@github-actions bot added labels on Nov 15, 2024: base: main (PRs targeted against main branch), area: examples (Changes that focus on our examples), area: framework (Framework is a tag for issues involving the developer framework, e.g. Aqueduct), public api change (Changes to a public API)
Contributor

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> [email protected] ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> [email protected] serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> [email protected] check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  171745 links
    1640 destination URLs
    1840 URLs ignored
       0 warnings
       0 errors


@msfluid-bot (Collaborator) left a comment

Code Coverage Summary

↓ packages.framework.ai-collab.src:
Line Coverage Change: -2.67%    Branch Coverage Change: No change

Metric Name       Baseline coverage   PR coverage   Coverage Diff
Branch Coverage   0.00%               0.00%         → No change
Line Coverage     35.59%              32.92%        ↓ -2.67%

↓ packages.framework.ai-collab.src.explicit-strategy:
Line Coverage Change: -0.40%    Branch Coverage Change: No change

Metric Name       Baseline coverage   PR coverage   Coverage Diff
Branch Coverage   77.98%              77.98%        → No change
Line Coverage     77.60%              77.20%        ↓ -0.40%

Baseline commit: 0b6d14f
Baseline build: 307935
Happy Coding!!

Code coverage comparison check failed!!
More Details: Readme

  • Skip This Check!!

What to do if the code coverage check fails:

  • Ideally, add more tests to increase the code coverage for the package(s) whose code-coverage regressed.

  • If a regression is causing the build to fail for a valid reason (e.g. removal of tests, or removal of code that had lots of tests), there is a checkbox further up in this comment that determines whether the code coverage check should fail the build. You can check the box and trigger the build again. The test coverage analysis will still run, but it will not fail the build if a regression is detected.

  • Unchecking the checkbox and triggering another build should go back to failing the build if a test-coverage regression is detected.

  • You can check which lines are covered or not covered by your tests with these steps:

    • Go to the PR ADO build.
    • Click on the link to see its published artifacts. You will see an artifact named codeCoverageAnalysis, which you can expand to reach a particular source file's coverage HTML, showing which lines are covered or not covered by your tests.
    • You can also run different kinds of tests locally with the :coverage test commands to find out the coverage.

@alexvy86 (Contributor) left a comment

Besides the comments below, a couple of things:

  • It would be nice to see how the React changes look in the running app.
  • Not sure I get what the Test section in the PR description is supposed to convey. 0 passing tests, 1 skipped test?

@@ -32,6 +32,7 @@ import { useSharedTreeRerender } from "@/useSharedTreeRerender";
// Uncomment the import line that corresponds to the server you want to use
// import { createContainer, loadContainer, postAttach, containerIdFromUrl } from "./spe"; // eslint-disable-line import/order
import { createContainer, loadContainer, postAttach, containerIdFromUrl } from "./tinylicious"; // eslint-disable-line import/order
// import { DiffViewer } from "@/components/DiffViewer";
Contributor

Remove?

@@ -342,6 +343,7 @@ describe.skip("AI Job Listings App Benchmark", () => {
}
const foundJohnDoe = createJohnDoeCandidateTask.data;
assert(foundJohnDoe !== undefined);
assert(response.diffs !== undefined);
Contributor

Can we make this test better by asserting what the diffs look like? Or are they non-deterministic?

Contributor

@chentong7

Shouldn't we unskip the whole test suite? (It's currently skipped in line 223.)

Contributor

Error: 401 You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys

This is what I get when I run the test. I'd like to spend some time in tomorrow's parking lot to ramp up on some AI-collab testing library concepts 🤯.

Contributor

Reading through the suite, and given that it requires an API_KEY, this test is not a unit test; it's more of an integration/e2e test. The code lives here now because this is where everything started, but it needs to move to a different location (potentially the e2e tests package?) later. So it should remain skipped as long as it's here.

@seanimam (Contributor) · Nov 19, 2024

If we can extract the mapping of EditLog to Diff into a separate function, we can use unit tests instead of this integration test. It isn't quite deterministic given we haven't set the temperature value on the model, and even when it's set to be the most deterministic, technically it can give a different response.

Comment on lines +185 to +196

/**
* The GenerateTreeEditsResponse interface defines the structure of the response object
*
* @alpha
*/
export interface GenerateTreeEditsResponse {
status: "success" | "failure" | "partial-failure";
errorMessage?: string;
tokensUsed: TokenUsage;
diffs?: Diff[];
}
Contributor

Not sure if we really need a new type; should we just add the diffs to the two existing response types we already have?

Contributor

I agree we should be reusing the existing types here; I'm not seeing value in this new type, and we're exposing an internal naming scheme that doesn't match the public API surface.

@jikim-msft (Contributor)

Adding 21569 to the description would help 😄.

@jikim-msft (Contributor)

When running the example app, I think the LLM agent isn't returning any response (possibly because of missing API_KEY from the above comment?).

[Screenshot: 2024-11-18 at 16:46:43]

@alexvy86 (Contributor)

When running the example app, I think the LLM agent isn't returning any response (possibly because of missing API_KEY from the above comment?).

Yeah, in order for the AI collab to work you need an OpenAI API key for a paid account.


Comment on lines +166 to +171
const diffs: Diff[] = editLog.map((log, index) => ({
id: `diff-${index}`,
// eslint-disable-next-line @typescript-eslint/strict-boolean-expressions
type: log.error ? "error" : "edit",
description: log.error ?? log.edit.explanation ?? "No description available",
}));
Contributor

Recommend making this a separate function. You will be able to unit test it as well this way.

What does the id here represent? This should be the id of the tree node that was edited, which the log object should have. Additionally, these diffs need more information. We need to know what type of edit this was, for example:

  • Is it a modification of an existing object? If so, what fields were modified, and what are the new values, if you can provide that?
  • Is it an array operation, e.g. the insertion of a new object, a deletion, or a move? If so, what index was it inserted at / moved to / removed from?
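One way to carry the per-edit detail this comment asks for is a discriminated union keyed on the kind of edit. This is a sketch only; `RichDiff`, `describeDiff`, and the variant fields are all hypothetical names, not the package's actual API:

```typescript
// Hypothetical richer diff shapes: the edit kind is the discriminant,
// and each variant carries the detail a UI would need to render it.
type RichDiff =
	| { kind: "modify"; nodeId: string; field: string; newValue: unknown }
	| { kind: "insert"; nodeId: string; index: number }
	| { kind: "move"; nodeId: string; fromIndex: number; toIndex: number }
	| { kind: "remove"; nodeId: string; index: number };

// A UI (or a unit test) can switch exhaustively on `kind`.
function describeDiff(diff: RichDiff): string {
	switch (diff.kind) {
		case "modify":
			return `Set ${diff.field} on ${diff.nodeId} to ${JSON.stringify(diff.newValue)}`;
		case "insert":
			return `Inserted ${diff.nodeId} at index ${diff.index}`;
		case "move":
			return `Moved ${diff.nodeId} from index ${diff.fromIndex} to ${diff.toIndex}`;
		case "remove":
			return `Removed ${diff.nodeId} from index ${diff.index}`;
	}
}

const msg = describeDiff({ kind: "insert", nodeId: "candidate-1", index: 0 });
```

A discriminated union also lets the compiler enforce that every edit kind is handled when new kinds are added later.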

}

/**
* The Diff interface defines the structure of the response object
Contributor

This comment needs a more detailed explanation; it doesn't explain the concept of what a diff is.

@@ -46,6 +46,16 @@ export function createMergableDiffSeries(diffs: Difference[]): Difference[];
// @alpha
export function createMergableIdDiffSeries(oldObject: unknown, diffs: Difference[], idAttributeName: string | number): Difference[];

// @alpha
export interface Diff {
// (undocumented)
Contributor

Please be sure to document all new exported members before merging 🙂

@@ -182,3 +182,26 @@ export interface TokenLimits {
*/
readonly outputTokens?: number;
}

/**
* The GenerateTreeEditsResponse interface defines the structure of the response object
Contributor

Response to what? Would be good to be a bit more descriptive here.

* - Primitive root nodes are not supported
*
* @internal
* The editLog is transformed into an array of Diff objects. Each Diff object includes
Contributor

"editLog" is something that's defined within the function - the caller of this function likely won't know what it is. We should express the docs here in terms of what the function means to the caller.

6 participants