Skip to content

[input-] fix editline() for characters having screen width > 1 #2662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

midichef
Copy link
Contributor

@midichef midichef commented Jan 3, 2025

The input editor assumes characters have a screen width of 1. For double-width characters like , many alignment artifacts show up on screen. For example, edited values can extend 1 character past the edge of a visidata cell. Or after pressing =, pasting several double-width characters into the input field, will cause ellipses to show up on the right side of the input. And scrolling horizontally within a cell, some characters are not visible when the cursor is on them.

To demonstrate some of the issues, open a sample sheet with:
echo "12345678901234567890\nABCDEFGHIJ01-2--3---4--5--6--7--8-9" |vd -f tsv -
The header is single-width ASCII. The row is double-width letters and numbers, with single width dashes.
Shrink the cell with z_ 10. Then e and then press Right many times to see the problems. For example, after scrolling all the way right with End, the last character, , cannot be seen.

This PR fixes all known alignment issues in InputWidget.editline() for such characters. To handle the case where the screen does not have width to show a character with width > 1 at the start or end of the cell, extra characters will appear. For example, if a cell is 1 column too small to show …ABCD…, the ellipses on the right side can be repeated in the column where would otherwise start (because takes up 2 columns and can't fit into the one on screen): …ABC…… Or the ellipses on the left side can be repeated in the column where would otherwise finish: ……BCD…

(I am happy to finally be doing the len() vs dispwidth() audit that I originally scheduled for summer 2023.)

@saulpw
Copy link
Owner

saulpw commented Jan 12, 2025

I'm glad you're doing the dispwidth audit, @midichef. It definitely has been a long time coming.

This particular code though is already pretty complicated, and I'm worried these improvements exceed the code budget here. Is this the essential complexity of accounting for wide characters when truncating a displayed string? I'm wondering if there's a single function like partition that can cover this ground with less cognitive overhead. Do you have a list of the known cases that were problematic? I'd like to put those into a unit test for whatever function(s) we eventually come up with.

@midichef
Copy link
Contributor Author

You're right, this PR is overly complicated. It also does not reproduce the existing desired behavior of the cursor for some corner cases. I've got a draft of a better version I'll submit soon.

@anjakefala
Copy link
Collaborator

@midichef Would you like us to close this one?

@midichef
Copy link
Contributor Author

@anjakefala
No, I'll just submit the modified version here and force-push. Is there a drawback to that I may not be considering?

@anjakefala
Copy link
Collaborator

@midichef Oh, that's fine! I forgot about force-pushing. That keeps the history of the conversation.

@midichef
Copy link
Contributor Author

midichef commented Apr 5, 2025

Okay, I've force-pushed a new version that is much simpler than my previous attempt. It's also more correct: the original commit changed some cursor alignment behavior, but the new commit keeps the existing behavior.

Some comments on my choices:

  1. I created a helper function _calc_display(). Doing that allowed me to simplify some control flow by doing return. Should this function be cached? It will be called often.
  2. The two clipdraw() calls seemed unnecessarily convoluted in drawing dispval in two segments: up to the cursor, and the segment after the cursor. I combined those to draw dispval in one call.
  3. I took advantage of clipdraw's builtin truncation character to avoid having to create one in _calc_display().

@midichef midichef force-pushed the editline_wide_chars branch from 7ca759a to eee33d0 Compare April 11, 2025 05:55
@midichef
Copy link
Contributor Author

midichef commented Apr 11, 2025

I added another commit cleaning up clipstr_start(). It takes out comments I'd left behind that were no longer correct or relevant. And expresses the loop variables a bit more naturally.

To expand on my note in clipstr_start about how the answers are incorrect for some cases: ideally the clipstr* family of functions would take a more sophisticated approach to clipping strings. clipstr_start uses a naive approach: it grows a suffix string backwards from the end of the string, until it's too big to fit in the given width. That general approach is incorrect for say, multiple combining characters. For instance, ỗ, which is the Vietnamese character "o" with two combining characters: a circumflex and a tilde above. The best design should give: clipstr_start('\u006f\u0302\u0303', 1)[0] == ỗ but the result is actually just the final character, the tilde: ̃. That's because dispwidth currently gives 2 when measuring the last two characters, so the substring search stops early, returning just the final tilde.

There's probably a library that handles clipping Unicode strings more thoroughly. Maybe that's a feature offered by uniseg, which has had recent updates this year. I'm not sure. in any case, incorporating that level of sophistication is beyond the scope of this PR.

For this PR, I think the approach used for clipstr_start() and clipstr_middle() is an acceptable match for vd's existing approach to clipping strings.

Copy link
Owner

@saulpw saulpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whew! this has been quite a ride. This looks good to me by reading; @anjakefala can you do a quick perf pass with a large screen size compared to v3.1?

@midichef midichef force-pushed the editline_wide_chars branch from eee33d0 to c81f745 Compare April 14, 2025 07:26
@midichef
Copy link
Contributor Author

Okay, I just cleaned up the commits one last time, squashing most of the changes to clipstr_start() in eee33d0 into the earlier 071db3a that created clipstr_start().

The final state of the codebase is unchanged by this squash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants