Hanging on generateDiffRows #188

Open
markwoon opened this issue May 7, 2024 · 7 comments

Comments

markwoon commented May 7, 2024

The code:

    List<String> original = Files.readAllLines(oldFile);
    List<String> revised = Files.readAllLines(file);

    DiffRowGenerator generator = DiffRowGenerator.create()
        .showInlineDiffs(true)
        .mergeOriginalRevised(true)
        .inlineDiffByWord(true)
        .oldTag(f -> f ? "<s style=\"background-color: #bbbbbb\">" : "</s>")
        .newTag(f -> f ? "<b style=\"background-color: #aaffaa\">" : "</b>")
        .build();

    List<DiffRow> rows = generator.generateDiffRows(original, revised);

The files: test.zip

generator.generateDiffRows is hanging for me with these 2 files.

System

  • Java version: 17
  • java-diff-utils version: 4.12
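Until the hang itself is resolved, a caller can at least bound how long it waits. The following is a minimal sketch using only the JDK; the class name `DiffTimeoutGuard` and the method `runWithTimeout` are hypothetical and not part of java-diff-utils, and the lambdas in `main` are stand-ins for the call to `generator.generateDiffRows(original, revised)`.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of java-diff-utils.
final class DiffTimeoutGuard {

    /** Runs the task on a daemon worker thread; returns empty if it does not finish in time. */
    static <T> Optional<T> runWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "diff-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            // This only requests interruption; code that never checks
            // the interrupt flag keeps running in the background.
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for generator.generateDiffRows(original, revised).
        Optional<List<String>> fast = runWithTimeout(() -> List.of("row"), 1, TimeUnit.SECONDS);
        Optional<List<String>> slow = runWithTimeout(() -> {
            Thread.sleep(5_000);
            return List.of("never");
        }, 100, TimeUnit.MILLISECONDS);
        System.out.println("fast finished: " + fast.isPresent());
        System.out.println("slow finished: " + slow.isPresent());
    }
}
```

This does not make the diff faster. It only lets the caller give up and report the problem, and a computation that ignores interruption may continue to consume CPU on the daemon thread.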

github-actions bot commented Jul 7, 2024

Stale issue message

markwoon (Author) commented Jul 7, 2024

Still hoping for a fix.

github-actions bot commented Sep 6, 2024

Stale issue message

markwoon (Author) commented Sep 7, 2024

Still hoping for a fix.

wumpz (Collaborator) commented Sep 24, 2024

Sorry that I was unavailable for so long.

I built a test around your data. May I add it to the project, or does it contain any non-publishable data?

A first remark: some time ago the library introduced a second diff algorithm, a variant of the Myers algorithm with linear space usage, which usually performs better on large datasets. You could try that one. The following snippet passes this algorithm to the library explicitly:

    Patch<String> patch = DiffUtils.diff(original, revised, new MyersDiffWithLinearSpace<String>());
    List<DiffRow> rows = generator.generateDiffRows(original, patch);

Btw, you could set the default algorithm using DiffUtils.withDefaultDiffAlgorithmFactory().

However, after digging around, I was not able to complete the diff with the original algorithm. The line normalizer and word splitter inflate the dataset so drastically that the original algorithm becomes unusable. Note that the linear space algorithm generates different change sets.
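To illustrate the inflation: with `inlineDiffByWord(true)` the inline comparison works on word tokens rather than whole lines, so the sequence handed to the diff algorithm can be many times longer than the line count. The following is a minimal sketch, not the library's code; the class `TokenInflationSketch`, the `tokenize` helper, and its whitespace-based split pattern are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: this is not how java-diff-utils splits text.
final class TokenInflationSketch {

    // Hypothetical splitter: each run of whitespace and each run of
    // non-whitespace becomes one token, so separators are preserved.
    private static final Pattern TOKEN = Pattern.compile("\\s+|\\S+");

    static List<String> tokenize(List<String> lines) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(String.join("\n", lines));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "alpha beta gamma delta",
                "epsilon zeta eta theta");
        List<String> tokens = tokenize(lines);
        // prints "2 lines -> 15 tokens"
        System.out.println(lines.size() + " lines -> " + tokens.size() + " tokens");
    }
}
```

Two short lines already become 15 tokens here. Since the cost of a Myers-style diff grows with both the sequence length and the number of differences, a large changed block split into words can be far more expensive to compare than the same block compared line by line.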

wumpz (Collaborator) commented Sep 24, 2024

You should know that the linear space algorithm is not as thoroughly tested as the original implementation.

markwoon (Author) commented

Thanks for looking into this.

> I built a test around your data. May I add it to the project, or does it contain any non-publishable data?

Yes, go ahead.

I'll try your suggestion of the alternate algorithm the next time I return to that particular project, but sadly won't get to it any time soon.
