Add support for Linux timerfd #3934

FrankReh · 2024-10-02T21:57:38Z

Does or will Miri have the notion of elapsed time, or simulated elapsed time? Is there already anything close to the clocking necessary to support timerfd_create and the associated _settime and _gettime?

https://man7.org/linux/man-pages/man2/timerfd_create.2.html

The text was updated successfully, but these errors were encountered:

oli-obk · 2024-10-02T23:34:31Z

Yes, we support simulated time in isolation mode and use the real system time outside of it. We probably can only support CLOCK_MONOTONIC unless CLOCK_REALTIME without the ability to set the time is ok. The other clockids seem odd enough to be rarely used, but i'll happily be convinced otherwise and then we can think about what it would take for them

FrankReh · 2024-10-03T01:58:31Z

Thank you. I found some nice examples of features already using the timespecand the MONOTIC and REALTIME clocks.

Let me keep this open for a little while. I may have questions once I understand how the timeout for Futex works. There the thread blocked on a Futex can be awoken if the duration is reached (I'm assuming). Here, the timerfd would become ready to read, and could be associated with an epoll.

FrankReh · 2024-10-04T17:11:55Z

When might level-triggered epoll be supported by Miri? Would it be easy enough for a new-comer to support with what's already in Miri or is it waiting for blocking support or something else that's not there yet? Asking for a friend ...

Actually, asking because it is how the async_io::Timer from the smol projects is built. That's how I got to look at timerfd_create and timerfd_settime (for a monotonic clock) in the first place.

tiif · 2024-10-04T17:49:39Z

When might level-triggered epoll be supported by Miri?

If I were to do it, I am currently left with #3665.

Would it be easy enough for a new-comer to support with what's already in Miri or is it waiting for blocking support or something else that's not there yet?

It should be easier as we already have structure provided for edge-triggered epoll. But I imagine this will still be a big feature that involves quite a bit of change. There will also be a bit of git conflict involved as it is overlapping with #3665. I can probably help with review and answer some questions, but I am not sure how much time I will be able to commit. Any thoughts from @oli-obk as the primary reviewer of epoll architecture?

Actually, asking because it is how the async_io::Timer from the smol projects is built. That's how I got to look at timerfd_create and timerfd_settime (for a monotonic clock) in the first place.

Do they hit unsupported_error for level-triggered epoll? I am interested in seeing how they use it so we can design the test similar to that.

FrankReh · 2024-10-04T20:19:31Z

I see the FileDescription trait has a get_epoll_ready_events method and there are two implementations for it, eventfd and unnamed_socket. The timerfd would be a third. And from my first reading of epoll::check_and_update_one_event_interest, this will be called when there is an epoll_ctl/add and when an fd has had an event that made it ready. There is no mechanism for an event to signal when it is no longer ready, but it wasn't needed for edge-triggered, and it seems it won't be needed for level-triggered either. If the epoll_wait causes each of its events of interest to be polled for readiness, it would seem the model is primed to support edge-triggered too, or that would be pretty easy to add.

I'll look through it again later, and @oli-obk may have weighed in by then. Because at the moment, I don't see how this is edge-triggered. If a read on a fd only reads some of the available data (let's say the buffer provided was too small for all the data), at first blush it would seem the second epoll_wait would return right away because the fd poll (on the unnamed_socket) would have indicated it had more data to read (or more data could be written). I don't see where the fd keeps track of it already having reported that (which would make it edge-triggered I think).

Anyway, I think a small test is in order - I'll look at the epoll tests that are passing. Hmm. I don't see any passing tests that read using a buffer. There is the test_epoll_block_then_unblock, and that writes 5 bytes, but then the test is done when the epoll_wait is tested once. Maybe there could be more tests, to see what happens after a read with a buffer of length 5 is provided, and another where the buffer length was shorter. So we could see that the edge-triggering aspect of the epoll was working as intended.

RalfJung · 2024-10-04T21:14:09Z

Don't we have a test that calls epoll_wait twice and checks we only get events the first time? That checks for edge triggered, I would say. But if you think there is an interesting test that is missing, please make a PR. :)

FrankReh · 2024-10-04T21:38:46Z

I missed the other epoll test file. Yes it does test for ET. So I can figure out how that works. Thx.

FrankReh · 2024-10-10T17:33:40Z

miri/src/helpers.rs defines this helper function for reading a variable of type libc::timespec.

    /// Parse a `timespec` struct and return it as a `std::time::Duration`. It returns `None`
    /// if the value in the `timespec` struct is invalid. Some libc functions will return
    /// `EINVAL` in this case.
    fn read_timespec(&mut self, tp: &MPlaceTy<'tcx>) -> InterpResult<'tcx, Option<Duration>> {
        let this = self.eval_context_mut();
        let seconds_place = this.project_field(tp, 0)?;
        let seconds_scalar = this.read_scalar(&seconds_place)?;
        let seconds = seconds_scalar.to_target_isize(this)?;
        let nanoseconds_place = this.project_field(tp, 1)?;
        let nanoseconds_scalar = this.read_scalar(&nanoseconds_place)?;
        let nanoseconds = nanoseconds_scalar.to_target_isize(this)?;

        interp_ok(
            try {
                // tv_sec must be non-negative.
                let seconds: u64 = seconds.try_into().ok()?;
                // tv_nsec must be non-negative.
                let nanoseconds: u32 = nanoseconds.try_into().ok()?;
                if nanoseconds >= 1_000_000_000 {
                    // tv_nsec must not be greater than 999,999,999.
                    None?
                }
                Duration::new(seconds, nanoseconds)
            },
        )
    }

Would someone advice how to write a corresponding helper function to
read a libc::itimerspec variable, a read_itimerspec()?

libc defines itimerspec like this.

    pub struct itimerspec {
        pub it_interval: ::timespec,
        pub it_value: ::timespec,
    }

While libc defines timespec like this.

    // linux x32 compatibility
    // See https://sourceware.org/bugzilla/show_bug.cgi?id=16437
    pub struct timespec {
        pub tv_sec: time_t,
        #[cfg(all(target_arch = "x86_64", target_pointer_width = "32"))]
        pub tv_nsec: i64,
        #[cfg(not(all(target_arch = "x86_64", target_pointer_width = "32")))]
        pub tv_nsec: ::c_long,
    }

As itimerspec is defined as two timespec, can the helper read_timespec be called twice by read_itimerspec
or do the fields need to be flattened out and read in one go?

Thanks.

RalfJung · 2024-10-10T20:21:15Z

I don't think we need a shared read_itimerspec helper, but the timerfd implementation should call read_timespec twice, yes.

FrankReh · 2024-10-10T20:25:23Z

Okay. Can I be pointed to other code doing something like that? The arg decomposition is still a mystery to me.

RalfJung · 2024-10-11T06:35:39Z

If the argument for the C function has type struct itimerspec *, you begin by calling deref_pointer_as(arg, this.libc_ty_layout("itimerspec")). Then you use project_field to get places corresponding to the two fields, and you pass that to read_timespec.

FrankReh · 2024-10-12T17:04:03Z

How would the maintainers like to see timers added that are not directly tied to a thread blocking itself?

I see how Miri blocks threads with optional timeouts. The timerfd feature would seem to require something be added. The FD can be created and set to have a timeout but there is no user thread that is blocked. Rather the timer firing is reported through a read which may or may not be blocking or with an epoll becoming ready.

There are at least two options I can see. One is that the thread scheduler has a new queue or min-heap introduced where the scheduler asks for the min timeout right before where it already asks for the min blocked-thread timeout. When it finds an ancillary timeout has reached 0, it either runs the callback or puts it onto a runnable queue, depending on whether the context actually allows such callbacks at that point of the scheduler system. Or the notion of a kernel thread could be introduced whose job it was to block until the next ancillary timeout occurred, and run the callback on this kernel thread when it became time. Also the thread would probably have to be awoken whenever a new timeout was being introduced that might have a smaller duration than the existing minimum. So maybe the kernel thread is blocked on a futex with a timeout, using the futex to support the internal timer feature.

Of course there could be a design I don't see and it might already be there.

Thanks for feedback.

RalfJung · 2024-10-12T17:27:31Z

I would think you "just" make a read on timerfd block until the allotted time. No need to change anything about the scheduler or add a new notion of thread or anything like that.

epoll doesn't yet support blocking. I would anyway recommend to first add the new FD type without epoll support.

However, I know absolutely nothing about timerfd. :) So as per our process for larger contributions, please write a little example Rust code, with extensive comments, that showcases the APIs involved and explains what is supposed to happen. It sounds like there is a chance of the timeout changing for a thread that's already sleeping? That's a new pattern, which I think can be modeled by unblocking the thread, and then the thread's unblock handler has to deal with the case of a changed timeout.

The pattern should be: since only read can actually observe that a timer fired, nothing happens until someone calls read, and then in the read call we compute how many expirations there were since the last read call and act accordingly.

FrankReh · 2024-10-13T12:39:00Z

The pattern should be: since only read can actually observe that a timer fired, nothing happens until someone calls read, and then in the read call we compute how many expirations there were since the last read call and act accordingly.

That's not actually true. timerfd can be watched by epoll. That's in fact why it is interesting to me. And as of now epoll may be non-blocking only, but that would likely be expanded to including blocking reads on the epoll device at some point, at least for certain FDs, like timerfd, pipes and files.

And getting timers to work independently of blocked threads in Miri is interesting to me ultimately because timers are integral in getting io_uring to do interesting things and building up io_uring support in Miri is a long term goal of mine. There are a lot of interesting async runtimes being built around io_uring and they all have their own unsafe machinations, and getting those runtimes through Miri tests helps everybody who wants a robust, sound ecosystem.

Anyway, that's more information than you asked for. But sometimes I see the question "why" in this forum so I thought I would provide background one more time.

RalfJung · 2024-10-13T13:02:36Z

That's not actually true. timerfd can be watched by epoll. That's in fact why it is interesting to me. And as of now epoll may be non-blocking only, but that would likely be expanded to including blocking reads on the epoll device at some point, at least for certain FDs, like timerfd, pipes and files.

Yeah but it is futile to try and figure out the design for how timerfd will work with blocking epoll until we have some form of blocking epoll.

io_uring is even more speculative. I am not a fan of designing for entirely speculative constraints where we have no way of evaluating any design decisions.

FrankReh · 2024-10-13T13:05:34Z

Okay. I hope you will let me continue to ask my questions, perhaps on Zulip instead?

Thanks for you patience.

RalfJung · 2024-10-13T13:17:11Z

I think it would still be useful to add basic timerfd support now and slowly expand it, so I'll reopen the issue. :) But if you don't want to work on that, that's fine of course.

RalfJung changed the title ~~Linux timerfd_create(2)~~ Add support for Linux timerfd Oct 3, 2024

RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-shims Area: This affects the external function shims A-linux Area: affects only Linux targets A-files Area: related to files, paths, sockets, file descriptors, or handles labels Oct 3, 2024

FrankReh closed this as completed Oct 13, 2024

RalfJung reopened this Oct 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Linux timerfd #3934

Add support for Linux timerfd #3934

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 3, 2024

FrankReh commented Oct 4, 2024

tiif commented Oct 4, 2024 •

edited

Loading

FrankReh commented Oct 4, 2024

RalfJung commented Oct 4, 2024 via email

FrankReh commented Oct 4, 2024

FrankReh commented Oct 10, 2024

RalfJung commented Oct 10, 2024

FrankReh commented Oct 10, 2024

RalfJung commented Oct 11, 2024

FrankReh commented Oct 12, 2024

RalfJung commented Oct 12, 2024 •

edited

Loading

FrankReh commented Oct 13, 2024 •

edited

Loading

RalfJung commented Oct 13, 2024

FrankReh commented Oct 13, 2024

RalfJung commented Oct 13, 2024

Add support for Linux timerfd #3934

Add support for Linux timerfd #3934

Comments

FrankReh commented Oct 2, 2024

oli-obk commented Oct 2, 2024

FrankReh commented Oct 3, 2024

FrankReh commented Oct 4, 2024

tiif commented Oct 4, 2024 • edited Loading

FrankReh commented Oct 4, 2024

RalfJung commented Oct 4, 2024 via email

FrankReh commented Oct 4, 2024

FrankReh commented Oct 10, 2024

RalfJung commented Oct 10, 2024

FrankReh commented Oct 10, 2024

RalfJung commented Oct 11, 2024

FrankReh commented Oct 12, 2024

RalfJung commented Oct 12, 2024 • edited Loading

FrankReh commented Oct 13, 2024 • edited Loading

RalfJung commented Oct 13, 2024

FrankReh commented Oct 13, 2024

RalfJung commented Oct 13, 2024

tiif commented Oct 4, 2024 •

edited

Loading

RalfJung commented Oct 12, 2024 •

edited

Loading

FrankReh commented Oct 13, 2024 •

edited

Loading