Add "Runtime basics" to the tutorial #408

Open
SeanTAllen opened this issue Oct 22, 2019 · 41 comments
Comments

@SeanTAllen
Member

While discussing ponylang/ponylang-website#502 (add quiescence FAQ).

  • Garbage collector basics (there are 2 entries in the appendix)
  • Quiescence
  • ASIO system
  • Env.exitcode

Perhaps a link to runtime options, also information on how to find more information.

@rhagenson
Member

Any thoughts on where this chapter should go? My opinion is after Packages and before Testing, as the former ends on the Standard Library and the latter starts into sort of advanced Pony usage.

@rhagenson
Member

rhagenson commented Nov 20, 2019

Adding a task list to ensure I stay organized (this task list was built over the course of writing):

@rhagenson
Member

After re-reading both the runtime-related appendices (Memory Allocation at Runtime, and Garbage Collection with Pony-ORCA), I think this information should be moved into the new chapter rather than repeated in the appendix.

Also, Memory Allocation at Runtime ends with an ellipsis. Was there something more intended to go there?

@SeanTAllen
Member Author

@rhagenson I have no idea why the ellipsis is there.

@SeanTAllen
Member Author

@rhagenson do you have all the info you need?

@rhagenson
Member

@SeanTAllen I believe so. I have begun work over on https://github.com/rhagenson/pony-tutorial/tree/runtime-chapter. So far it has mostly been a refactor to move the runtime content from the Appendices into the new chapter and rewrite that content to stress the pertinent details.

@rhagenson
Member

Friendly update that I am back on top of this. Had to take a hiatus while, among other things, I prepared a tutorial submission for the largest conference in my field. That tutorial will include Pony, if accepted.

Sort of preemptive question so feel free to answer solely off the cuff: what have been some past references for any of this information? E.g., blog post on how the ASIO system works, video on quiescence, past issue/PR that involved ample discussion of garbage collection, etc. Looking to make the language consistent between what has been deemed good/helpful in the past and the tutorial.

@SeanTAllen
Member Author

@rhagenson
Member

Already had the ORCA paper, but not the video. Thank you for both.

Any similarly useful references for the other topics?

@EpicEric
Contributor

There are some early discussions from the beginning of the #runtime stream in Zulip (that were actually copied from the Slack) that cover a few runtime topics.

@rhagenson
Member

@EpicEric I had no idea there even was a runtime stream on Zulip. Thank you for pointing me toward it.

@rhagenson
Member

rhagenson commented Dec 19, 2019

Reminder to self (and noted here for others to hold me to it):

  • Find good place to mention the number of scheduler threads is N where N is core count + 1 ASIO thread

The way I have it written now this detail would be forced or detract from the point wherever I put it. I am sure the content will change so not going to force it now when later unforced addition is still possible.

@SeanTAllen
Member Author

Number of scheduler threads is N where N is the core count. There is additionally an asio thread that handles receiving asio messages, however, it never runs any actors. When we say "scheduler threads", the asio thread is not included in that.

@rhagenson
Member

Understand the distinction now. Still will need to find a good place for this detail at a later time to ensure it is not just dropped in somewhere.

@SeanTAllen
Member Author

Couple of things to know about scheduler threads.

You can change the default number using --ponymaxthreads to set it to fewer than N, where N is the number of cores.

By default, the runtime will stop using scheduler threads that "aren't needed". This helps keep excessive work stealing from happening. This can scale down to 0 scheduler threads. At 0, the only thread running will be the ASIO thread that is waiting to receive an event. Once an event is received, at least 1 scheduler thread will be started back up. You can set a minimum number of scheduler threads to always keep running using the --ponyminthreads option.

If you want to, you can turn off scheduler thread scaling by using --ponynoscale.

There is also a --ponysuspendthreshold option that has an impact on scheduler thread scaling.

@rhagenson
Member

Currently thinking, given the distinction just noted, that this might naturally fit into the ASIO system section in the reverse of the way stated here, i.e.: there is one ASIO thread + N scheduler threads...

@rhagenson
Member

For all the runtime-related options, rather than spread them throughout the new chapter, how about a section called "Runtime Options" that is the last section in the chapter and covers these runtime configuration options like pinning ASIO, changing minimum scheduler thread count, etc?

@rhagenson
Member

rhagenson commented Dec 19, 2019

What states can an actor be in besides: alive, blocked, dead, and muted?

  • Alive: running a behavior or processing a message from its queue
  • Blocked: completed execution and no messages waiting in its queue
  • Dead: blocked itself and all actors with a reference to it are blocked
  • Muted: attempted send to overloaded actor and itself is not overloaded (is the result of backpressure and will be scheduled once backpressure decreases)

These are the ones I know of by reading through the #runtime stream and past runtime content from the tutorial. I want to ensure I am not neglecting an actor state.

@SeanTAllen
Member Author

SeanTAllen commented Dec 19, 2019

@rhagenson I'm not aware of Dead being used as a term.

  • Alive -> Scheduled
  • Muted -> Muted
  • Blocked -> Unscheduled
  • Dead -> There is no state for this in the runtime.

For Unscheduled, a distinction could be made between "has no messages and therefore doesn't exist in a queue for a scheduler thread" and "has messages and is waiting in Scheduler thread's queue".

That would give 4 states. But we don't have agreed-upon terminology for those 2 possible "unscheduled" states.

Blocked would be one possible term for the first unscheduled state (and is noted in actor.c as being "logically blocked"). We don't have a name afaik for the 2nd of the 2. Generally "Blocked" is mostly used when the cycle detector is in use. I think it would be reasonable to use as you have defined.

There is also "overloaded" and "under pressure" that could be considered states as well that are separate.

Sorry if this doesn't help much. I'm trying to provide more info. You are asking good questions.

EDIT

I'm realizing that "Unscheduled" might be problematic as there is a flag you can set via the C api called FLAG_UNSCHEDULED to manually remove an actor from scheduling. It isn't used anymore but it exists. This conversation is making me realize that we should definitely discuss an RFC to remove.

EDIT 2

re: C api -> there is a C api that is exposed that allows you to control various parts of the runtime including starting it up, scheduling actors, creating them etc. It isn't used by Pony but could be used to embed the Pony runtime in other systems.

@SeanTAllen
Member Author

> For all the runtime-related options, rather than spread them throughout the new chapter, how about a section called "Runtime Options" that is the last section in the chapter and covers these runtime configuration options like pinning ASIO, changing minimum scheduler thread count, etc?

This sounds reasonable.

@rhagenson
Member

@SeanTAllen Thank you for the information.

For your own knowledge of where "dead" cropped up, it is in the Appendix on GC/ORCA that is being moved to the new chapter.

> When an actor has completed local execution and has no pending messages on its queue, it is _blocked_. An actor is _dead_, if it is blocked and all actors that have a reference to it are blocked, transitively. A collection of dead actors depends on being able to collect closed cycles of blocked actors.

Currently I use the three states: alive, blocked, dead and have not made mention of muted yet (same problem as the scheduler thread problem that I do not have a "natural" place for it yet). I then reuse the term "dead" in the Quiescence section to differentiate between collecting an individual actor and collecting a cycle of actors (i.e., it takes a cycle of dead actors to GC them all at once).

As for "overloaded" and "under pressure" I think given that those are both backpressure related I would categorize them into that system as the cause of muting. Of course given this is the tutorial I am trying to toe that line of just enough information at one time to be understood. Not suggesting it yet, but I would almost "hide" those backpressure states for now and put all three: muted, under pressure, and overloaded together in a backpressure-related chapter/section/FAQ/Appendix/etc.

I will progress with the "Runtime Options" section.

@SeanTAllen
Member Author

@rhagenson well, apparently we are using "Dead" somewhere. I never knew that.

@SeanTAllen
Member Author

> As for "overloaded" and "under pressure" I think given that those are both backpressure related I would categorize them into that system as the cause of muting. Of course given this is the tutorial I am trying to toe that line of just enough information at one time to be understood. Not suggesting it yet, but I would almost "hide" those backpressure states for now and put all three: muted, under pressure, and overloaded together in a backpressure-related chapter/section/FAQ/Appendix/etc.

agreed.

@SeanTAllen
Member Author

SeanTAllen commented Dec 19, 2019

@rhagenson I think that definition of dead is not quite right.

To be dead an actor also can't be registered with the asio system to receive events.

Our current definitions don't really take that into account.

Perhaps

Alive/Dead would be a good distinction

Alive: Has messages in its queue or can receive messages (this includes ASIO events)
Dead: Has no message in its queue nor can it receive messages.

Running or Scheduled or Executing/Blocked/Waiting/Muted

Where "Running or Scheduled or Executing" is "currently processing its message queue"
Blocked is as you said
Waiting is "in the run queue for a scheduler with messages to process"
Muted is "not in a run queue for a scheduler. may or may not have messages in its queue"

An Alive actor can be Running, Blocked, Waiting or Muted.
A Dead actor can only be Blocked or Muted. (Although I'm not sure if the current implementation would consider a muted actor to be able to be collected by the GC- I would have to see what I did when I implemented muted).

@rhagenson
Member

@SeanTAllen My response below just grew and grew here so a lot to respond to here.

First, to be sure there was no typo, did you mean to say Alive is having messages or the ability to receive messages rather than having no messages?

So to summarize my understanding, it would shake down as (borrowing <: "subtype of" notation):

Running <: Alive
Waiting <: Alive
Executing <: Alive
Muted <: Alive
Muted <: Dead
Blocked <: Dead

Therefore, Muted is the only subtype that can be applied to either Alive or Dead actors. From your definitions, I merged Scheduled into Waiting as I do not understand the distinction between "waiting for scheduling" and "being scheduled" (the latter of which I assume places an actor as Running). I make the distinction between Running and Executing as loosely related to semantic "in a behavior" (Executing) and "has control of a scheduler thread" (Running), so an Executing <: Running might then still be technically correct.

Running and Scheduled are not GCed. Blocked and Muted are grounds for GC, Dead is GCed as soon as possible. A backpressure transition due to overload places the actor into Waiting. (I want to say backpressure "kills" actors, but that is not the case given the Alive/Dead supertype names we are using here.)

Anything in here that I missed or is not consistent with your view?

@SeanTAllen
Member Author

@rhagenson yes, Alive should have been "has messages in its queue". I've edited accordingly.

@SeanTAllen
Member Author

SeanTAllen commented Dec 19, 2019

Running and Executing are the same thing.

Blocked is applicable to both Alive and Dead. Either can be blocked. But a blocked actor can be alive in that it can receive messages still.

Running <: Alive
Waiting <: Alive
Muted <: Alive
Muted <: Dead
Blocked <: Alive
Blocked <: Dead

@rhagenson
Member

Got it. Consistent on the view of GCing as well? Dead actors are GCed, while Alive actors in Blocked/Muted are possibly GCed?

@SeanTAllen
Member Author

Only Dead actors can be GCed. Alive means they can't be GCed because they are still capable of receiving messages.

Dead - can be GCed
Alive - can not be GCed
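
One way to read the taxonomy above is as two independent axes: liveness (Alive or Dead) and scheduling state (Running, Waiting, Blocked, Muted). The following is a minimal Pony sketch of that reading, not runtime code. The primitive names simply mirror the terms used in this thread, and `ActorStates` with its `valid` and `collectable` functions is a hypothetical helper invented for illustration.

```pony
// Scheduling states, using the names from this discussion.
primitive Running
primitive Waiting
primitive Blocked
primitive Muted
type SchedState is (Running | Waiting | Blocked | Muted)

// Liveness, which alone decides whether an actor can be collected.
primitive Alive
primitive Dead
type Liveness is (Alive | Dead)

// Hypothetical helper, not part of the Pony runtime or standard library.
primitive ActorStates
  fun valid(l: Liveness, s: SchedState): Bool =>
    // An Alive actor may be in any scheduling state;
    // a Dead actor can only be Blocked or Muted.
    if l is Alive then
      true
    else
      (s is Blocked) or (s is Muted)
    end

  fun collectable(l: Liveness): Bool =>
    // Only Dead actors can be GCed, whatever their scheduling state.
    l is Dead

actor Main
  new create(env: Env) =>
    // A Dead actor cannot be Running, so this combination is rejected.
    env.out.print(ActorStates.valid(Dead, Running).string()) // false
    // A Blocked actor that is still Alive is a valid combination...
    env.out.print(ActorStates.valid(Alive, Blocked).string()) // true
    // ...but it is not collectable, because it can still receive messages.
    env.out.print(ActorStates.collectable(Alive).string()) // false
```

Keeping the two axes separate makes the conclusion of this exchange explicit: Blocked and Muted say nothing on their own about collection, only Dead does.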

@rhagenson
Member

I had a rebuttal based on backpressure along with what I had written so far in the chapter for quiescence, however after reading what I wrote again along with the hierarchy here it all agrees that an actor must be Dead to be GCed. All Alive states are some form of the actor still being active so whether it is actively Waiting due to backpressure or not that will not result in GC to reallocate resources from the cooperative scheduler.

Thank you for helping me clarify these states!

@SeanTAllen
Member Author

@rhagenson you're welcome

rhagenson self-assigned this Dec 23, 2019

@rhagenson
Member

rhagenson commented Dec 30, 2019

Points gained from Andrew Turley VUG video:

  • Actors can send themselves messages, therefore I need to check that Sean's and my Dead/Alive/etc. definitions factor this in (i.e., just because an actor is not referenced by other actors, it can still send a message to itself or send a reference to itself out if it knows other actors, and therefore is not quite dead...yet)
  • The cycle detector is a special actor that receives messages when an actor is blocked (i.e., no more messages in its queue) with a reference to itself and the actors it references, as well as when it is unblocked (messages in the queue, waiting for a thread)
  • GC tracing within an actor has six steps:
    1. All owned objects are marked unreachable
    2. All unowned objects with foreign reference count (FRC) > 0 marked unreachable
    3. Tracing from actor fields, mark reachable objects (owned or not)
    4. All owned objects with local reference count (LRC) > 0 are marked reachable
    5. Unreachable owned objects are collected
    6. Decrement messages are sent for unowned objects that are unreachable, and their FRC is set to 0

The final point above has the subtlety that objects are always owned by the actor that created them, even if that actor no longer has a local reference to the object. Therefore it is possible for an actor to be Alive simply because it created an object that is still referenced by some other actor.

@rhagenson
Member

Per Sean's and my conversation above, where we agreed on temporarily hiding the details of backpressure/muting/overloading/etc. and placing those details into their own cohesive unit, I have decided the section just before Runtime Options (the last section in this Runtime chapter) should be on the backpressure system. This will currently place it after ASIO and Env.exitcode, but I foresee those sections perhaps moving earlier, as the Quiescence chapter mentions ASIO, and Env.exitcode should be movable earlier without causing issue (have not written it yet so time will tell).

@rhagenson
Member

Update: I created/organized all the sections previously discussed in this thread so it is now (checked if fully written):

  • Env.exitcode
  • Garbage Collector
  • ASIO Subsystem
  • Quiescence
  • Backpressure
  • Runtime Options

I feel like I can effectively write the Backpressure and Runtime Options sections. However I have the following questions for the other sections:

Env.exitcode

This is meant to cover Env.exitcode, of course, but I feel like I am missing the importance of having this be its own section. Does it need to be its own section or is the importance of it being in this chapter that the means to set exitcode is introduced in the tutorial?
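
For reference, a minimal sketch of what such a section might show, assuming `env.exitcode` is called as on the standard `Env`. The argument check is only a hypothetical trigger chosen for illustration.

```pony
actor Main
  new create(env: Env) =>
    // env.args includes the program name, so fewer than 2 entries
    // means no argument was supplied.
    if env.args.size() < 2 then
      env.err.print("expected one argument")
      // Record a non-zero exit code. This does not stop the program;
      // the code is reported when the runtime shuts down.
      env.exitcode(1)
    else
      env.out.print("ok")
    end
```

Because the call only records a value, its natural place in the chapter may be next to quiescence: the exit code is what the process reports once the runtime has nothing left to run.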

ASIO Subsystem

I have not yet needed to sink myself into the ASIO subsystem, so I feel unprepared to write anything more than the very generic interplay between it and GC (which I have already written). For introducing the idea of asynchronous IO and its use in Pony, are there existing blog posts or videos on the topic? I read through the #runtime channel on Zulip and still feel I have not yet plucked out the important details.

@rhagenson
Member

Perhaps not a "Runtime Basic" but I had yet to mention that the default options of the runtime can be overwritten via the RuntimeOptions struct in builtin, see here. In brief, the --pony* CLI options can be set to different defaults via adding an FFI function to Main:

actor Main
  ...
  fun @runtime_override_defaults(rto: RuntimeOptions) =>
    ...
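
A slightly fuller sketch of the same idea, assuming the `ponymaxthreads` and `ponynoscale` fields of `RuntimeOptions` mirror the CLI flags of the same names (worth checking against the struct in your ponyc version before relying on them). The values are arbitrary.

```pony
actor Main
  new create(env: Env) =>
    env.out.print("running with overridden runtime defaults")

  // Called by the runtime at startup to adjust its default options.
  fun @runtime_override_defaults(rto: RuntimeOptions) =>
    // Assumed field names; they correspond to --ponymaxthreads
    // and --ponynoscale.
    rto.ponymaxthreads = 2   // default to at most 2 scheduler threads
    rto.ponynoscale = true   // turn off scheduler thread scaling
```

This changes only the defaults compiled into the program, which may be the right framing for a "Runtime Options" section: the flags describe behaviour, and this hook is one way to pick different starting values.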

@rhagenson
Member

rhagenson commented Sep 21, 2020

For when this issue continues: as part of the concept of quiescence, it is important for the reader to understand that Main.create(...) can exit and the program will continue running if there are other actors exchanging messages and doing work.

This misunderstanding of the Pony runtime led to a conversation in Zulip on GC and actors in busy loops which appeared to draw the reasonable, but incorrect conclusion that since the program starts at Main.create(...) that it also exits when Main.create(...) exits.
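
A minimal sketch of that point, using a hypothetical `Counter` actor that keeps sending itself messages after `Main.create` has returned. The actor and behaviour names are invented for illustration.

```pony
actor Counter
  let _out: OutStream

  new create(out: OutStream) =>
    _out = out

  be count_down(n: U32) =>
    if n > 0 then
      _out.print("remaining: " + n.string())
      // A message to self: this actor still has work queued,
      // so the program cannot be quiescent yet.
      count_down(n - 1)
    else
      _out.print("Counter done")
    end

actor Main
  new create(env: Env) =>
    // Sending the message returns immediately; the work happens later.
    Counter(env.out).count_down(3)
    env.out.print("Main.create returned")
```

The program ends only after `Counter` has processed its last message and no actor has anything left to do, not when `Main.create` returns. The relative order of the "Main.create returned" line and the countdown lines is up to the scheduler.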

@SeanTAllen
Member Author

@rhagenson are you working on this or should we be trying to find someone to work on it?

@rhagenson
Member

I am not actively working on it, but I do have a branch in my fork with existing work. If this is a priority, I have no issue with someone taking it over. Otherwise, I will get a PR open before the end of this year. Wish I could say sooner, but I am not able to promise sooner.

@SeanTAllen
Member Author

@rhagenson q... do you want to finish this off or should I take over?

ponylang-main added the "discuss during sync" label Feb 10, 2022

@rhagenson
Member

@SeanTAllen As much as I would like to say I am on top of this, I am not. My existing work exists over on https://github.com/rhagenson/pony-tutorial/commits/runtime-chapter, but has not been updated since early 2020. Please take it over and save as much of my former work as you see fit.

@SeanTAllen
Member Author

I've grabbed Ryan's commits from his fork and pushed them to this repo on the branch runtime-basics.

jemc removed the "discuss during sync" label Mar 8, 2022