Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select.lua: select from the watch history with g-h #15655

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

guidocella
Copy link
Contributor

Implement saving watched paths and selecting them.

--osd-playlist-entry determines whether titles and/or filenames are shown. But unlike in show-text ${playlist} and select-playlist, "file" and "both" print full paths because history is much more likely to have files from completely different directories, so showing the directory conveys where files are located. This is particularly helpful for filenames like 1.jpg.

The last entry in the selector deletes the history file, as requested by Samillion.

The history could be formatted as CSV, but this requires escaping the separator in the fields and doesn't work with paths and titles with newlines, or as JSON, but it is inefficient to reread and rewrite the whole history on each new file, and doing so overwrites the history with an empty file when writing without disk space left. I went with an hybrid of one JSON array per line to get the best of both worlds. And I discovered afterwards that this was an existing thing called NDJSON or JSONL. Since there are these 2 competing standards it is not clear if the file extension should be ndjson or jsonl, so I just used txt.

watch_history_path is awkwardly documented along with the key binding because I don't think it's worth adding a select.lua section to the manual just for this. I will add it and move it there if I add more script-opts in the future.

Copy link

github-actions bot commented Jan 5, 2025

Download the artifacts for this pull request:

Windows
macOS

DOCS/man/mpv.rst Outdated
@@ -328,6 +328,12 @@ g-l
g-d
Select an audio device.

g-h
Select a file from the watch history. Requires
``--script-opt=select-save_watch_history=yes``. The history path is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be native option. The fact that it is implemented in select.lua shouldn't force use to use --script-opt=select- which is little bit verbose. Also I consider script-opts to something that should be used to already specific script behavior. Here we have whole separate feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean only save_watch_history should be an option but watch_history_path and any future config should be a script-opt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean all options should be an option.

Copy link
Contributor

@kasper93 kasper93 Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also on that note, I think the history saving logic should be moved to own script, select should be left for selection only imho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The saving part is just 17 lines out of 85 though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I felt that it didn't make sense to separate them when select.lua has to be perfectly aware of which format it was saved in right above

@@ -18,6 +18,13 @@ License along with mpv. If not, see <http://www.gnu.org/licenses/>.
local utils = require "mp.utils"
local input = require "mp.input"

local options = {
save_watch_history = false,
watch_history_path = "~~state/watch_history.txt",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For symmetry with watch_later, maybe don't use .txt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watch_later is a directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you meant the files inside watch_later. Dunno I think those are a weird case. I think the txt extension makes it easy to open it in a text editor by default e.g. in Windows or with zsh suffix aliases which are what I use. I also used console_history.txt in the other PR so that would have to be changed too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, maybe use .json or .jsonl in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well as I wrote in the commit message I didn't use jsonl only because I can't figure out whether ndjson or jsonl is preferable. Will switch to jsonl.

Comment on lines 366 to 367
local path = mp.command_native({"normalize-path", mp.get_property("path")})
local title = mp.get_property("playlist/" .. mp.get_property("playlist-pos") .. "/title")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move those after if not history_file check.

player/lua/select.lua Outdated Show resolved Hide resolved
player/lua/select.lua Outdated Show resolved Hide resolved
player/lua/select.lua Outdated Show resolved Hide resolved
player/lua/select.lua Outdated Show resolved Hide resolved
player/lua/select.lua Outdated Show resolved Hide resolved
player/lua/select.lua Outdated Show resolved Hide resolved
@guidocella guidocella force-pushed the history branch 2 times, most recently from fd03b7c to d80ee92 Compare January 5, 2025 23:18
@mrfragger
Copy link

mrfragger commented Jan 6, 2025

This works well. Comparing it to memo.lua which is a history script. It appends at end of file. When items hit around 400 history entries or so it won't open in one or two seconds. Sometimes it takes 5 seconds or more. So I trim the history file just keeping the last 333 entries.

# run "bin/bash" "-c" "cd ~/.config/mpv; cat memo-history.log | tail -n 333 > memo.log ;rm memo-history.log; mv memo.log memo-history.log";show-text "Trimmed ~/.config/mpv/memo-history.log to 333 lines to speed up opening History memo" #! [History Chapters Bookmarks] > Trim History log if slow to open

It doesn't show duplicates which is fine. However, a suggestion I have if it doesn't create a problem with long text that won't fit on the osd that is. Not sure if it's wrapped or what. Is to add the current chapter within the file appending it after a # symbol.

In the bookmarker-menu.lua I modified it to save bookmarks with the current chapter. This will help differentiate rather than just timestamps or time position to indicate where one is within the file.

function makeBookmark(bname)
    local chaptername = mp.get_property("chapter-metadata/title") or ""
    if mp.get_property("path") ~= nil then
        if bname == nil then bname = mp.get_property("media-title") .. " #" .. chaptername .. " @ %t" end
        local bookmark = {
            name = parseName(bname),
            pos = mp.get_property_number("time-pos"),
            path = parsePath(mp.get_property("path")),
            version = 2
        }
        return bookmark
    else
        return nil
    end
end

input.conf

g-h script-binding select/select-watch-history #! [Utilities] > [select.lua] > Open History
g-w script-binding select/select-watch-later #! [Utilities] > [select.lua] > Select Watch Later

mpv.conf

save-watch-history=yes
write-filename-in-watch-later-config=yes

@guidocella
Copy link
Contributor Author

guidocella commented Jan 6, 2025

This has a clear history entry and is usable with tens of thousands of lines, in fact I was testing the performance of searching in 150k lines yesterday and it was still usable if we switch to the C version of fzy, though opening it is slow. It opens pretty fast now, 0.2 seconds for 100k lines. Apparently prepending to tables is very slow in Lua.

This doesn't trim duplicates but it doesn't really have duplicates as files are logged with different timestamps every time which we can be useful to check.

Text longer than the OSD is cropped. This appends to history when files are opened so there is no chapter or time position information at all. We have watch later files for resuming from the last position, so I assume you were talking about the watch later selection? That also doesn't show any time position, let alone chapters. I don't know if showing the time position before resuming is needed, I have never felt the need for that as it automatically resumes from that position. If you mean to use it for bookmarks of scenes watch later is not suitable for that since config files are deleted on resume, EDL is.

@guidocella guidocella force-pushed the history branch 2 times, most recently from 423c3ba to 30e0c38 Compare January 6, 2025 09:07
@mrfragger
Copy link

sounds great performance wise. "Clear history" sounds better than "Clear the history"

Oh wasn't aware time it appends upon open so yeah that's a no-go like you say. So watch-later probably wouldn't work since path is there already and text would be too long.

Forgot also on memo.lua I add chapters to the history entries

    if not mark_hidden then
        local chaptername = mp.get_property("chapter-metadata/title") or ""
        local playlist_pos = mp.get_property_number("playlist-pos") or -1
        local title = playlist_pos > -1 and mp.get_property("playlist/"..playlist_pos.."/title") or ""
        local title_length = #title + #chaptername + 2
        local timestamp = os.time()

        entry = timestamp .. "," .. (title_length > 0 and title_length or "") .. "," .. title .. " #" .. chaptername .. "," .. full_path
    elseif last_state then
        last_state.hidden_files[full_path] = last_state.current_page * 10000 + item_index
    end

@kasper93
Copy link
Contributor

kasper93 commented Jan 6, 2025

though opening it is slow.

If opening is slow, we should consider not using json. It brings no value. Simple separated values in each line will work the same and avoid parsing json.

This doesn't trim duplicates but it doesn't really have duplicates as files are logged with different timestamps every time which we can be useful to check.

I'd prefer for it to be de-duplicated in select, else searching for a file, with list this file with multiple dates, which makes searching history more troublesome.

@avih
Copy link
Member

avih commented Jan 6, 2025

I was testing the performance of searching in 150k lines

I don't have a real stake in this, but since it already came up before, IMO the range of input sizes should be discussed, and agreed upon, because this can affect the solution.

Personally I don't care about 150K lines of anything. I don't use playlists with more than 20 items, I don't have thousands of tracks I need to choose from, etc. I'm using my own console with my own history implementation (deduplicatd, recent-sorted), and I know that it just doesn't keep growing beyond orders of magnitude less than 150K, despite me using it very frequently for many years now, without cleaning up the history.

But maybe there could be cases of many many thousands of entries, like playlists, etc, that people agree should be handled well, and in that case the target sizes should be agreed.

If there's an agreement that such sizes are required to be handled well, then IMO more appropriate solutions should also be considered, like sqlite, or some performant key-value store etc.

Maybe specifically 150K can still be handled carefully without external tools/libraries, also on the RPI, but at some stage, if the size being mentioned keeps growing - as it seems to be happening all the time, it can becomes too big to keep handling it internally in some naive fashion IMO.

@guidocella
Copy link
Contributor Author

How can loading 150k lines not be slow JSON or not? CSV will not work or is not simple when paths or titles contain newlines, and a new parser may be slower if not written in C like parse_json. e.g. LuaJIT fzy is over 10 times slower than the C version. It opens perfectly fast with sane history sizes.

@kasper93
Copy link
Contributor

kasper93 commented Jan 6, 2025

How can loading 150k lines not be slow JSON or not?

Well, there are databases that processes billions of rows fast. It's all matter of perspective.

You brought up that it is slow. So I'm offering a different view on this. Whether we want to keep it slow is another question.

not written in C

There is nothing wrong with C. Whole mpv is written in C. watch-later is also C and I believe it correctly escapes filenames, no?

EDIT:

Also loading a full file on every search is slow, which could be easily avoided. Again the question is how much slowness is acceptable.

@guidocella
Copy link
Contributor Author

Databases can search through billions of rows quickly with indexes but if you actually retrieve all of them at once to use them somewhere else that is slow.

watch later indeed escapes incorrectly since it converts newlines to _, such files don't work in the watch later selector. There was also an issue about cookie options with newlines not working in watch later files.

Added deduplication.

@kasper93
Copy link
Contributor

kasper93 commented Jan 6, 2025

watch later indeed escapes incorrectly since it converts newlines to _, such files don't work in the watch later selector. There was also an issue about cookie options with newlines not working in watch later files.

Should this be fixed?

Comment on lines 394 to 398
for i, seen_path in ipairs(paths) do
if seen_path == path then
table.remove(items, i)
table.remove(paths, i)
break
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this n^2 search will be slow. How about we keep a map for such that map[path] = i and get i if exist, and instead of removing, just update the items[i] with the new date?

Copy link
Contributor Author

@guidocella guidocella Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that at first but it doesn't work because i is updated as you prepend items. Also I tried doing this with the console history and it was actually much slower than looping with huge histories, maybe huge tables with long string keys cause issues. I must have hallucinated this, maps are faster.

Copy link
Contributor Author

@guidocella guidocella Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested again with 100k+ lines and performance is almost the same between looping and keeping a map. Also updating items[i] as you said doesn't work because the new watched entry should be at the beginning. So while with the loop so we can just prepend items, with the map we have to decrement every i in the map after the removed one and revert tables at the end, that's probably slower.

Copy link
Contributor

@kasper93 kasper93 Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, I would still like it to be gone from reading loop.

How about we add after for line in history_file:lines()

local seen = {}
local _items = {}
local _paths = {}

for i, path in ipairs(paths) do
    if not seen[path] then
        seen[path] = true
        table.insert(_items, items[i])
        table.insert(_paths, path)
    end
end

EDIT: wait, I'm still thinking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should really use a BST to store history in this case since you want the history to be always sorted by date with frequent search/insert/delete. You can keep a map such that map[path] = date and then finding the entry with such date and moving it to the front would be only O(log n) instead of O(n).

Copy link
Contributor Author

@guidocella guidocella Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested that this is only marginally faster with 100k unique lines and 50% faster by repeatedly concating 600 history lines until you get a 100k lines file.

Copy link
Contributor Author

@guidocella guidocella Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind I was still prepending to the tables. It's 3 faster even with unique lines by just replacing prepending with appending.

Ignore my wrong measurements, this is over 40 times faster than before with unique lines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I like simple "append only" file, because we don't need to modify the rest of it.

Isn't modification needed if you want history deduplication at runtime without reloading the history file? That's where BST is useful here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is reloaded on each search.

Do you think added complexity is worth it? After first file load it would be cached by the kernel anyway, so consecutive loads should be fast, except maybe next time after it is evicted. Keeping local state of history is valid, but not sure it is worth the added complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends whether if the current performance is good enough. I only mentioned the "big O" because there seemed to be some performance problems with a very large history file.

@guidocella
Copy link
Contributor Author

Should this be fixed?

It seems the author already had a fix in #13016, I didn't see that before. Though paths in comments would have to be fixed separately from options, select.lua doesn't have access to the config file parser. At least with JSON we get it working for free.


``--save-watch-history``
Whether to save which files are played (default: no). These can be then
selected with the default ``g-h`` key binding.
Copy link
Member

@sfan5 sfan5 Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documenting it like this might make people think it's an option built into the mpv core, so it should say that it belongs to select.lua (or at least that it requires Lua)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it could have been kept as a script-opt if that's an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it an issue? It's in implementation detail how it's implemented imho.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I think it would be nice to know what exactly you lose when you disable Lua, but it's not of high importance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should remain a script-opt. No other core options do nothing, like this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as many other options that do nothing if mpv is compiled without some library dependency.

See #14624: the direction is that core options should either warn or don't exist at all when compiled without dependency. And there will be even more complexity when dealing with this history feature which can be unloaded at runtime.

Copy link
Contributor Author

@guidocella guidocella Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add a 20 line script just to append to the history? Since the writing code is 20 lines vs 90 lines for the selection code, the selection code needs to know exactly how the writing code above formats the JSON, and they both needed the same watch_history_path script-opt, I thought they might as well be together. And if not using global options it would have been very confusing if users have to configure script-opts related to saving the history in history.lua and script-opts related to reading and formatting the history in select.lua, and that 1 of the 2 scripts has to call read_options with the name of the other script to get shared script-opts like watch_history_path. Not sane.

select.lua doesn't really handle selection, it gathers and formats data to be selected in console. I don't see how gathering and formatting history is different from any of its other selectors.

I don't understand what not overriding OSD styles has to do with options vs script-opts for these history ones. I made scripts respect existing OSD options when feasible to not make users configure the same thing multiple times. Actually this does exactly the same thing with --osd-playlist-entry. But IMO it makes sense that options that only have effect within a script are its script-opts, it makes it clear what script they affect, what script you need to edit if you want to alter the behavior, and that you lose that functionality by disabling Lua. It also avoids bloating global options if this gains several script-opts in the future, imagine if every OSC script-opt was a global option. Maybe there is value in having easier to type shortcuts like we already have in --ytdl-format and --ytdl-raw-options but dunno if that's worth losing clarity and consistency. At that point if we want to make it more usable a better way would be some GUI button that appends the option to enable history in mpv.conf.

Copy link
Contributor Author

@guidocella guidocella Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only the writing part can be converted to C since that's small, after mp_delete_watch_later_conf(). That provides a reason to use global options.

Copy link
Contributor

@kasper93 kasper93 Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to put an end to this ongoing debate about Lua vs. C implementation.

This issue came up in the other PR as well. Initially, it was implemented in C, then it was requested to be reimplemented in Lua. Afterward, a major concern arose: "What if Lua isn't enabled?"... can we not troll ourselves again?

We need to make a clear decision: either we allow "core" functionalities to be implemented in Lua, or we don't. We can't keep standing in the middle of the road, indecisive, every time this comes up.

In my personal opinion which may not reflect opinion of others. I think implementing features like history in Lua is fine and many of mpv features could be implemented as such. It is also valid to implement it in C, in this cases, there is not really a difference, as the code to save/load watch-history is trivial in both cases (it's calling our json parser anyway). I'm fine with having it in Lua. But at the same time, I think it should be considered "core" option, meaning options should be top-level, not script-opt, as the fact it is implemented in Lua shouldn't dictate the user interface to configure such feature. And if this is a problem, we should stop implementing features in Lua.

I'm fine with the current state of this PR. I don't see technical issue with the current implementation. It's small, works. But if others don't agree on this, we will need to think of other implementation. But please make it clear that we don't want to use Lua, so we don't make this mistake again in the future.

The only concern about Lua implementation is that, each new script spawns new mpv_client with own event loop, which may not be a big deal, but the constant overhead is there. I agree though that putting everything in select.lua is bit out of place. But in the same time, spawning clients for 30 lines scripts is little bit meh.

EDIT: Also lua code will be bigger than compiled C code in final binary. So there is that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only the writing part can be converted to C since that's small

Yeah, maybe. I think in this case small save function in C will resolve all concerns about options and need for separate history.lua and select.lua will have only selector, same as for watch-later. The save code in C will be less than it is in Lua actually...

@guidocella guidocella force-pushed the history branch 3 times, most recently from 02ef821 to 6afdfe2 Compare January 6, 2025 13:27
player/lua/select.lua Outdated Show resolved Hide resolved
@guidocella guidocella force-pushed the history branch 2 times, most recently from 382af52 to d0e0679 Compare January 6, 2025 14:56
player/lua/select.lua Show resolved Hide resolved
player/lua/select.lua Show resolved Hide resolved
@verygoodlee
Copy link
Contributor

history file may expose privacy-sensitive information, request to add a blacklist option to prevent some files from being stored in the history file, the syntax can be consistent with ytdl_hook-exclude .

local time, path, title = unpack(entry)

if not path then
mp.msg.warn(line .. " in " .. history_file_path .. " has invalid data.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this in else branch of if entry then in main loop. And instead printing line, which will leak history in log and also if it is malformed it may not be printable, I would log i, number of the line that fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else branch also runs if the entry is well formed but an already seen path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganized it again so what you suggested here can be done and it also no longer keeps the file lines in memory.

Comment on lines 419 to 427
for i, entry in pairs(entries) do
local item = entry.path
if entry.title and osd_playlist_entry == "title" then
item = entry.title
elseif entry.title and osd_playlist_entry == "both" then
item = entry.title .. " (" .. entry.path .. ")"
end
items[i] = entry.date .. item
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this back it the main loop, can be inside if entry then branch. I will see about making it more generic in another PR.

}

char *title = (char *)find_non_filename_media_title(mpctx);
mpv_node items[3] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mpv_node items[3] = {
mpv_node items[] = {

return;

void *ctx = talloc_new(NULL);
char *history_path = mp_get_user_path(ctx, mpctx->global,
Copy link
Contributor

@kasper93 kasper93 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be null when no-config is used. Or empty string, either way it has to be checked, before fopen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is watch_history.jsonl with --no-config --save-watch-history

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens it saves history in cwd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

elseif entry.title and osd_playlist_entry == "both" then
item = entry.title .. " (" .. entry.path .. ")"
end
items[i] = entry.date .. item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
items[i] = entry.date .. item
local status, date = pcall(os.date, "(%Y-%m-%d %H:%M) ", time)
items[i] = (status and date or entry.time) .. item

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is showing UNIX timestamps to end users useful? It would only cause confusion.

player/lua/select.lua Outdated Show resolved Hide resolved
player/lua/select.lua Outdated Show resolved Hide resolved
Comment on lines 384 to 396
return {
["date"] = date,
["path"] = path,
["title"] = title,
}
Copy link
Contributor

@kasper93 kasper93 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return {
["date"] = date,
["path"] = path,
["title"] = title,
}
return {
time = entry[1],
path = entry[2],
title = entry[3],
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or whatever, keep it as simple as possible.

@guidocella guidocella force-pushed the history branch 2 times, most recently from 782f5db to 30134d9 Compare January 7, 2025 16:25
player/lua/select.lua Outdated Show resolved Hide resolved
Watch History
-------------

``--save-watch-history``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the file format should be documented.

Comment on lines 1543 to 1561
mpv_node items[] = {
{.format = MPV_FORMAT_INT64, .u = {.int64 = time(NULL)}},
{.format = MPV_FORMAT_STRING, .u = {.string = mp_normalize_path(ctx, mpctx->filename)}},
{.format = MPV_FORMAT_STRING, .u = {.string = title}},
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the file format:

  • Since JSON is used per line, each line should be an object with meaningful key names rather than an array. This way we can add new fields in any position without breaking existing parsers.
  • There should be a header indicating the format version, in case we need to make incompatible changes later.
  • Time stamp has only second precision, I would like to at least use milliseconds.

Copy link
Contributor

@kasper93 kasper93 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since JSON is used per line, each line should be an object with meaningful key names rather than an array. This way we can add new fields in any position without breaking existing parsers.

I was thinking about this, but this would mean we save a lot of redundant characters, just for the sake of saving them. We use json export mostly to escape paths and filenames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought of saving keys but didn't know if it's worth bloating the file. A version header is overkill, this isn't an API or a database of critical data, users can just clear the history in the unlikely event the format changes. I don't know why you would need milliseconds for watch history, they don't even work with os.date which errors with decimal values and currently not even the seconds are printed since they are noise.

Copy link
Contributor

@kasper93 kasper93 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A version header is overkill, this isn't an API or a database of critical data, users can just clear the history in the unlikely event the format changes.

Header version would be good idea. But I think it will be easy to preserve compatibility with old version. Just add new fields to existing structure, don't change existing ones (even if they become unused). It should be fine.

Also having forward compatible structure allows to mix old and new in one file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this, but this would mean we save a lot of redundant characters, just for the sake of saving them.
A version header is overkill, this isn't an API or a database of critical data

Being a core functionality, the file can be expected to be parsed by third party scripts, and is part of the player interface. Other custom file formats currently used by mpv are written in a higher standard than this, such as EDL which is extensively documented and has a version header.

users can just clear the history in the unlikely event the format changes.

In the current implementation, mpv core isn't the one parsing the history. mpv can change format but it must be easily detectable so external programs can reliably know how to parse them. Users don't need to clear history if the external program supports all past and future formats.

they don't even work with os.date which errors with decimal values

You can just separate the integer part.

and currently not even the seconds are printed since they are noise.

Just because select.lua doesn't need it doesn't mean other programs aren't interested in it. I'm OK if you don't want to do it, but this also brings back to my file format point. If someone wants a high precision later and changes the timestamp unit to milliseconds, it should be clearly indicated in the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being a core functionality, the file can be expected to be parsed by third party scripts, and is part of the player interface.

Even more reason to use extensible structure and only extend it, not break compatibility with existing tools.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think EDL is comparable since users have to generate EDL playlists by themselves so of course they need to know how to format them, I have my own published script generating EDL files to save bookmarks. Question is do we want to use this file to just show recent files or is it supposed to be some kind of documented database format? It's probably convenient to just open the jsonl file and check how each line is formatted and reuse it e.g. to make a script to show recent files in a different menu, but the Lua code to generate was literally 20 lines so it's not like it buys you much compared to creating your own history file. If someone wants to save arbitrary data like milliseconds for arbitrary use cases they're probably better off doing their own saving in file-loaded so they can save whatever they want in any format they want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is still good to document the new format like @na-na-hi requested and once documented with assurances about the stability etc. we can discuss about the details. I think for it being very simple structure there is no need much, just description of the field and how possibly it can be extended/changed in the future.

json_write(&dst, &node);
dst = talloc_strdup_append(dst, "\n");

bool fail = fwrite(dst, strlen(dst), 1, history_file) != 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short write here could leave garbage data behind and cause parsing error even when future writes succeed.

It should try to recover (truncate) the file back to its original state if that happens.

Copy link
Contributor

@kasper93 kasper93 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you do that? ftell before write and truncate on errors? Generally fwrite is buffered so this won't fail on writes unless we are out of space and even then I think it will fail on fclose likely. To make sure it is written we would need to call fflush check that too and try to restore. But it may be likely not possible if storage is gone (in case of network) or remounted as RO. Realistically we could recover out of space error.

Generally speaking we can make it robust, but I don't think the impact is that big given that we will skip broken entries without crashing. IIRC zsh history has similar problem, because I could break it when my filesystem remouts to RO before sync...

EDIT: Generally to ensure consistency we shouldn't update file in-place. We should copy the file, update in the temporary file, close and rename when fully written. Rename is atomic and will replace a file with fully working one or not replace it at all if we had errors. Though this changes the simple "append to file" to read, append, replace.

EDIT2: Of course we should also call sync to ensure the file is actually written to the storage device.

EDIT3: Do you think this is a blocker for this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally to ensure consistency we shouldn't update file in-place. We should copy the file, update in the temporary file, close and rename when fully written.

When the history file is large, copying it every time a new file is played will result in a lot of disk IO, which is not cost-effective.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you do that? ftell before write and truncate on errors? Generally fwrite is buffered so this won't fail on writes unless we are out of space and even then I think it will fail on fclose likely.

ftruncate(2). The file size before modification is already known. fwrite is buffered, but we don't know what's the configured buffer size, and lines can be larger than buffer size.

Generally speaking we can make it robust, but I don't think the impact is that big given that we will skip broken entries without crashing.

This works only because lines are independent in JSONL format. It won't work if some other formats are used, like a regular JSON array for entries. And even in this case it still would be very annoying to see broken line warnings every time the history file is read.

It's better to at least make the warning inform the user to delete the bad lines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftruncate(2). The file size before modification is already known. fwrite is buffered, but we don't know what's the configured buffer size, and lines can be larger than buffer size.

That's what I meant, but like I said before this will not fix any error except no disk space in practice. ftruncate will fail the same way if filesystem becomes unwritable. Also needs wrapper for Windows.

This works only because lines are independent in JSONL format. It won't work if some other formats are used, like a regular JSON array for entries.

Did anyone propose different format?

It's better to at least make the warning inform the user to delete the bad lines.

There is a warning during writting and reading, where else do you want a warning?

@guidocella guidocella force-pushed the history branch 2 times, most recently from a9b847a to 26fe1df Compare January 7, 2025 21:51

This file contains one JSON array per line. The first field is the UNIX
timestamp when the file was opened, the second field is its normalized path,
and the third file is its title, or missing if no title was present. Any new
Copy link
Contributor

@kasper93 kasper93 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

title should be empty string if not available. Or follow nanahi suggestion and use proper key-value json.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also show how it looks like [timestamp, "file_path", "title"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it before reading your edit. I saved null as the 3rd column since that's more correct for unavailable values and easier to check for thrutiness in Lua, also I've had actual videos with an empty string as title. But yeah we can just use keys if predictable column order is an issue.

Copy link
Contributor

@kasper93 kasper93 Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah we can just use keys if predictable column order is an issue.

I think we should try that, it will be easier to read and update this format. (the idea of adding null in every entry is iffy to me)


seen[path] = true

local status, date = pcall(os.date, "(%Y-%m-%d %H:%M) ", time)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This date format should be configurable and should include seconds by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a script-opt but there's nowhere to document it until I add a select.lua section to the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added hide_history_duplicates script-opt that's true by default since I prefer to show them.


This file contains one JSON array per line. The first field is the UNIX
timestamp when the file was opened, the second field is its normalized path,
and the third file is its title, or missing if no title was present. Any new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also show how it looks like [timestamp, "file_path", "title"]

@guidocella guidocella force-pushed the history branch 2 times, most recently from 783f7fa to f92f186 Compare January 9, 2025 10:23
return entry.title .. " (" .. entry.path .. ")"
end

return entry.path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the media-title is not available, it will display only the filename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote in the commit message how are you supposed to understand the history with filenames like 1.jpg?

Copy link
Contributor

@kasper93 kasper93 Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Act like both always if that an issue. Titles are also often completely non informative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both doesn't show paths either?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is both not showing path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both show filenames, wasn't your quote referring to the fact that this shows paths instead of filenames?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return entry.title .. " (" .. entry.path .. ")" shows title and path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean in show-text and select-playlist both shows filenames, while here I made it show the path because you usually have files from completely different directories unlike in those case so the full path is needed to understand the history.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you say that a full path is required to understand the history it should always be shown.

<title|filename> (path)


This file contains one JSON array per line. The first field is the UNIX
timestamp when the file was opened, the second field is its normalized path,
and the third file is its title, or missing if no title was present. Any new
Copy link
Contributor

@kasper93 kasper93 Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah we can just use keys if predictable column order is an issue.

I think we should try that, it will be easier to read and update this format. (the idea of adding null in every entry is iffy to me)

@guidocella guidocella force-pushed the history branch 2 times, most recently from c653be6 to e7fbb73 Compare January 9, 2025 11:18
The history could be formatted as CSV, but this requires escaping the
separator in the fields and doesn't work with paths and titles with
newlines, or as JSON, but it is inefficient to reread and rewrite the
whole history on each new file, and doing so overwrites the history with
an empty file when writing without disk space left. So this uses a
hybrid of one JSON object per line to get the best of both worlds. This
is called NDJSON or JSONL.
@guidocella guidocella force-pushed the history branch 3 times, most recently from 55dff1a to 9f8f9b1 Compare January 9, 2025 14:41
Implement selection of the entries in the watch history.

The last entry in the selector deletes the history file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants