Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for embedded mode in the client #8007

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

josevalim
Copy link
Contributor

This ensures that checking for code:ensure_loaded/1 in embedded mode (and other operations) do not send additional messages to the code server.

This pull request fully closes #7889. While previous work has fixed the slow path in the code server, this pull request makes it so the code server is avoided altogether.

Copy link
Contributor

github-actions bot commented Jan 9, 2024

CT Test Results

    2 files     66 suites   1h 1m 9s ⏱️
1 521 tests 1 268 ✅ 253 💤 0 ❌
1 715 runs  1 413 ✅ 302 💤 0 ❌

Results for commit e0ea6f5.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

ensure_modules_loaded_2(OnLoad, Error1);
embedded ->
Error2 = [{M, {error, embedded}} || {M, _} <- OnLoad] ++ Error1,
ensure_modules_loaded_2([], Error2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would just return {error, Error2} right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not have any OnLoad modules on partition, which means that the line above does not add new errors and we call ensure_modules_loaded_2([], []). :)

@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Jan 9, 2024
@josevalim josevalim force-pushed the jv-embedded-client branch 2 times, most recently from d5fc3c6 to 11b050e Compare January 22, 2024 13:03
Comment on lines 621 to 623
Opposite to <seemfa marker="#ensure_loaded/1">ensure_loaded/1</seemfa>,
concurrent calls to this function will attempt to load
modules concurrently. Furthermore, modules are loaded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If find this sentence confusing.

Does that mean that if two processes call this function "at the same time", one or more of the modules can become loaded twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more that the modules will be read and processed more than once but only loaded once. Should we replace "will attempt to load" by "will read and prepare modules concurrently, but load them once"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle there's nothing that prevents us from avoiding this multiple-load issue the same way it's avoided for ensure_loaded/1 - it's just that when I implemented this mechanism for enusre_loaded/1 the multi-load scenario was so complex I couldn't implement the change it in a reasonable way, so I decided to skip it.

@bjorng bjorng self-assigned this Jan 23, 2024
@josevalim
Copy link
Contributor Author

All feedback addressed.

@jhogberg jhogberg added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels Jan 25, 2024
This ensures that checking for code:ensure_loaded/1
in embedded mode (and other operations) do not send
additional messages to the code server.

This pull request fully closes erlang#7889. While previous
work has fixed the slow path in the code server, this
pull request makes it so the code server is avoided
altogether.
@josevalim
Copy link
Contributor Author

Rebased against master. :)

@jhogberg jhogberg added the testing currently being tested, tag is used by OTP internal CI label Jan 25, 2024
@jhogberg jhogberg merged commit b0ea005 into erlang:master Jan 29, 2024
16 checks passed
@jhogberg
Copy link
Contributor

Merged, thanks for the PR! :)

@telphan
Copy link

telphan commented Aug 6, 2024

@josevalim Using elixir 1.17.2 with OTP 27.0.1 (which should include this fix), we are encountering a similar issue and wondering if it is the same cause.
Our case is when running tests we seeing the following error:

Running ExUnit with seed: 41167, max_cases: 20

................................................................................................................................................................................................[test/duffel_white_label_web/live/flights/order_test.exs:52: Duffel.WhiteLabelWeb.Live.Flights.OrderTest."test / when order is fetched successfully shows details about the order (one way)"/1]
System.build_info() #=> %{
  version: "1.17.2",
  date: "2024-07-06T21:21:22Z",
  otp_release: "27",
  build: "1.17.2 (compiled with Erlang/OTP 27)",
  revision: "47abe2d"
}

....{'CRASH',{'DOWN',#Ref<0.1269573508.4118020097.123743>,process,{code_server,'nonode@nohost'},{{badmatch,error},[{code_server,reply_loading,4,[{file,"code_server.erl"},{line,1184}]},{code_server,finish_on_load_2,3,[{file,"code_server.erl"},{line,1415}]},{code_server,finish_on_load,3,[{file,"code_server.erl"},{line,1397}]},{code_server,loop,1,[{file,"code_server.erl"},{line,183}]}]}}}
....................** (EXIT from #PID<0.94.0>) exited in: GenServer.stop(#PID<0.1314.0>, :normal, :infinity)
    ** (EXIT) exited in: :sys.terminate(#PID<0.1314.0>, :normal, :infinity)
        ** (EXIT) an exception was raised:
            ** (ArgumentError) errors were found at the given arguments:

  * 1st argument: invalid destination

                :erlang.send(:code_server, {:code_call, #PID<0.1314.0>, :get_path})
                (kernel 10.0.1) code_server.erl:154: :code_server.call/1
                (kernel 10.0.1) code.erl:783: :code.all_available/0
                (elixir 1.17.2) lib/exception.ex:1661: UndefinedFunctionError.hint/4
                (elixir 1.17.2) lib/exception.ex:1639: UndefinedFunctionError.blame/2
                (ex_unit 1.17.2) lib/ex_unit/formatter.ex:336: ExUnit.Formatter.format_kind_reason/6
                (ex_unit 1.17.2) lib/ex_unit/formatter.ex:274: anonymous fn/7 in ExUnit.Formatter.format_test_failure/5
                (elixir 1.17.2) lib/enum.ex:1804: anonymous fn/2 in Enum.map_join/3
                (elixir 1.17.2) lib/enum.ex:4405: Enum.map_intersperse_list/3
                (elixir 1.17.2) lib/enum.ex:1804: Enum.map_join/3
                (ex_unit 1.17.2) lib/ex_unit/formatter.ex:273: ExUnit.Formatter.format_test_failure/5
                (junit_formatter 3.3.1) lib/formatter.ex:244: JUnitFormatter.generate_test_body/2
                (junit_formatter 3.3.1) lib/formatter.ex:230: JUnitFormatter.generate_testcases/2
                (junit_formatter 3.3.1) lib/formatter.ex:202: anonymous fn/2 in JUnitFormatter.generate_testsuite_xml/2
                (elixir 1.17.2) lib/enum.ex:2531: Enum."-reduce/3-lists^foldl/2-0-"/3
                (junit_formatter 3.3.1) lib/formatter.ex:201: JUnitFormatter.generate_testsuite_xml/2
                (elixir 1.17.2) lib/enum.ex:1711: anonymous fn/3 in Enum.map/2
                (stdlib 6.0.1) maps.erl:860: :maps.fold_1/4
                (elixir 1.17.2) lib/enum.ex:2543: Enum.map/2
                (junit_formatter 3.3.1) lib/formatter.ex:156: JUnitFormatter.handle_suite_finished/1

[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed

Could you point me in the right direction here?

@josevalim
Copy link
Contributor Author

It seems to be a duplicate of #8510. Do you have any native dependencies (NIFs) in your application? Try calling Code.ensure_loaded(ModuleWithNifs) in your test_helper.exs and see if it addresses the issue until there is a fix.

@telphan
Copy link

telphan commented Aug 6, 2024

I tried that but now I have a new crash:

Running ExUnit with seed: 424501, max_cases: 20

.................................................................................................................................{error,badarg,[{io,put_chars,[standard_error,[<<42,42,32,40,69,88,73,84,32,102,114,111,109,32,35,80,73,68,60,48,46,57,52,46,48,62,41,32,101,120,105,116,101,100,32,105,110,58,32,58,103,101,110,95,115,101,114,118,101,114,46,99,97,108,108,40,58,101,108,105,120,105,114,95,99,111,110,102,105,103,44,32,123,58,115,101,114,105,97,108,44,32,35,70,117,110,99,116,105,111,110,60,55,46,53,57,50,56,56,51,53,54,47,48,32,105,110,32,83,121,115,116,101,109,46,117,110,116,114,97,112,95,115,105,103,110,97,108,47,50,62,125,44,32,58,105,110,102,105,110,105,116,121,41,10,32,32,32,32,42,42,32,40,69,88,73,84,41,32,110,111,32,112,114,111,99,101,115,115,58,32,116,104,101,32,112,114,111,99,101,115,115,32,105,115,32,110,111,116,32,97,108,105,118,101,32,111,114,32,116,104,101,114,101,39,115,32,110,111,32,112,114,111,99,101,115,115,32,99,117,114,114,101,110,116,108,121,32,97,115,115,111,99,105,97,116,101,100,32,119,105,116,104,32,116,104,101,32,103,105,118,101,110,32,110,97,109,101,44,32,112,111,115,115,105,98,108,121,32,98,101,99,97,117,115,101,32,105,116,115,32,97,112,112,108,105,99,97,116,105,111,110,32,105,115,110,39,116,32,115,116,97,114,116,101,100>>,10,<<10>>]],[{file,"io.erl"},{line,198},{error_info,#{cause=>{io,arguments},module=>erl_stdlib_errors}}]},{'Elixir.Kernel.CLI',exec_fun,2,[{file,"lib/kernel/cli.ex"},{line,165}]},{'Elixir.Kernel.CLI',run,1,[{file,"lib/kernel/cli.ex"},{line,54}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}
Runtime terminating during boot ({badarg,[{io,put_chars,[standard_error,[<<42,42,32,40,69,88,73,84,32,102,114,111,109,32,35,80,73,68,60,48,46,57,52,46,48,62,41,32,101,120,105,116,101,100,32,105,110,58,32,58,103,101,110,95,115,101,114,118,101,114,46,99,97,108,108,40,58,101,108,105,120,105,114,95,99,111,110,102,105,103,44,32,123,58,115,101,114,105,97,108,44,32,35,70,117,110,99,116,105,111,110,60,55,46,53,57,50,56,56,51,53,54,47,48,32,105,110,32,83,121,115,116,101,109,46,117,110,116,114,97,112,95,115,105,103,110,97,108,47,50,62,125,44,32,58,105,110,102,105,110,105,116,121,41,10,32,32,32,32,42,42,32,40,69,88,73,84,41,32,110,111,32,112,114,111,99,101,115,115,58,32,116,104,101,32,112,114,111,99,101,115,115,32,105,115,32,110,111,116,32,97,108,105,118,101,32,111,114,32,116,104,101,114,101,39,115,32,110,111,32,112,114,111,99,101,115,115,32,99,117,114,114,101,110,116,108,121,32,97,115,115,111,99,105,97,116,101,100,32,119,105,116,104,32,116,104,101,32,103,105,118,101,110,32,110,97,109,101,44,32,112,111,115,115,105,98,1

@josevalim
Copy link
Contributor Author

The error message is "** (EXIT from #PID<0.94.0>) exited in: :gen_server.call(:elixir_config, {:serial, #Function<7.59288356/0 in System.untrap_signal/2>}, :infinity)\n ** (EXIT) no process".

If it cannot find an Elixir process, it means either Erlang's stdlib/kernel or Elixir's application is shutdown, probably because something is going wrong elsewhere. It is hard to say what it is. You can try running tests with elixir --logger-sasl-reports true -S mix test and see if they give a better indication of which processes are terminating and where.

@telphan
Copy link

telphan commented Aug 6, 2024

It does not give me any more information. I can confirm with the core dump that 0.94.0 does not exist. Interestingly (or maybe not) over multiple runs it complains about the same PID, 0.94.0.

@telphan
Copy link

telphan commented Aug 18, 2024

@josevalim I have done a little bit more digging. Here are my findings:

  1. the bug does not seem to reproduce if I target one specific test (i.e. mix test test.exs:34)
  2. the bug always happens after the unit test "completed" (i.e. prints a green dot)
  3. the bug seems to present different errors, but all seems to be related to PID 0.94.0.
  4. when the setup is reduce the only two tests, the bug presents in the same way:
Logger - error: {removed_failing_filter,logger_translator}
** (EXIT from #PID<0.94.0>) exited in: GenServer.stop(#PID<0.1298.0>, :normal, :infinity)
    ** (EXIT) exited in: :sys.terminate(#PID<0.1298.0>, :normal, :infinity)
        ** (EXIT) {:DOWN, :code_server, {:load_module, #Reference<0.1073653998.2470051843.233907>, :xmerl, ~c"/Users/telphan/.asdf/installs/erlang/27.0.1/lib/xmerl-2.0/ebin/xmerl.beam", false, #Reference<0.1073653998.2469920771.233906>}}

Here is the current_stacktrace for 0.94.0 (at the end of test):

 [
   {Task, :await_receive, 3, [file: ~c"lib/task.ex", line: 877]},
   {Mix.Compilers.Test, :require_and_run, 3,
    [file: ~c"lib/mix/compilers/test.ex", line: 79]},
   {Mix.Compilers.Test, :require_and_run, 4,
    [file: ~c"lib/mix/compilers/test.ex", line: 29]},
   {Mix.Tasks.Test, :do_run, 3, [file: ~c"lib/mix/tasks/test.ex", line: 613]},
   {Mix.Task, :"-run_task/5-fun-3-", 3, [file: ~c"lib/mix/task.ex", line: 495]},
   {Mix.Task, :run_alias, 6, [file: ~c"lib/mix/task.ex", line: 574]},
   {Mix.Project, :in_project, 4, [file: ~c"lib/mix/project.ex", line: 463]},
   {File, :cd!, 2, [file: ~c"lib/file.ex", line: 1665]},
   {Mix.Task, :"-run_in_children_projects/2-fun-0-", 4,
    [file: ~c"lib/mix/task.ex", line: 659]},
   {Enum, :"-reduce/3-lists^foldl/2-0-", 3, [file: ~c"lib/enum.ex", line: 2531]},
   {Mix.Task, :run_in_children_projects, 2,
    [file: ~c"lib/mix/task.ex", line: 658]},
   {Mix.ProjectStack, :recur, 1,
    [file: ~c"lib/mix/project_stack.ex", line: 237]},
   {Mix.Task, :run_alias, 6, [file: ~c"lib/mix/task.ex", line: 574]},
   {Mix.CLI, :run_task, 2, [file: ~c"lib/mix/cli.ex", line: 96]},
   {:elixir_compiler_0, :__FILE__, 1,
    [
      file: ~c"/Users/telphan/.asdf/installs/elixir/1.17.2-otp-27/bin/mix",
      line: 2
    ]},
   {:elixir_compiler, :dispatch, 4,
    [file: ~c"src/elixir_compiler.erl", line: 77]},
   {:elixir_compiler, :compile, 4,
    [file: ~c"src/elixir_compiler.erl", line: 52]},
   {:elixir_compiler, :maybe_fast_compile, 2,
    [file: ~c"src/elixir_compiler.erl", line: 39]},
   {:elixir_lexical, :run, 3, [file: ~c"src/elixir_lexical.erl", line: 15]},
   {:elixir_compiler, :quoted, 3, [file: ~c"src/elixir_compiler.erl", line: 17]}
 ]}

Here is the current_stacktrace for 0.1298.0 (at the end of test):

 [
   {:gen_server, :loop, 7, [file: ~c"gen_server.erl", line: 2078]},
   {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}
 ]}

Here is the process info for 0.94.0:

Process.info(:c.pid(0, 94, 0)) #=> [
  current_function: {Task, :await_receive, 3},
  initial_call: {:erlang, :apply, 2},
  status: :waiting,
  message_queue_len: 0,
  links: [#PID<0.1291.0>, #PID<0.95.0>],
  dictionary: [
    {:elixir_checker_info, {#PID<0.94.0>, nil}},
    {{Duffel.Core.Repo, :dynamic_repo}, Duffel.Core.Repo},
    {:rand_seed,
     {%{
        type: :exsss,
        next: #Function<0.40079776/1 in :rand.exsss_next>,
        bits: 58,
        uniform: #Function<1.40079776/1 in :rand.exsss_uniform>,
        uniform_n: #Function<2.40079776/2 in :rand.exsss_uniform>,
        jump: #Function<3.40079776/1 in :rand.exsplus_jump>
      }, [15558275827859381 | 199511458207168151]}},
    {:elixir_module_binaries,
     [
       {SentrySourceCodePaths,
        <<70, 79, 82, 49, 0, 0, 8, 216, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0,
          1, 48, 0, 0, 0, 32, 28, 69, 108, 105, 120, 105, 114, 46, 83, 101, 110,
          ...>>}
     ]}
  ],
  trap_exit: false,
  error_handler: :error_handler,
  priority: :normal,
  group_leader: #PID<0.69.0>,
  total_heap_size: 53194,
  heap_size: 6772,
  stack_size: 126,
  reductions: 21712297,
  garbage_collection: [
    max_heap_size: %{
      error_logger: true,
      include_shared_binaries: false,
      kill: true,
      size: 0
    },
    min_bin_vheap_size: 46422,
    min_heap_size: 233,
    fullsweep_after: 65535,
    minor_gcs: 52
  ],
  suspending: []
]

And here is the process info for 0.1298.0

Process.info(:c.pid(0, 1298, 0)) #=> [
  current_function: {:gen_server, :loop, 7},
  initial_call: {:proc_lib, :init_p, 5},
  status: :waiting,
  message_queue_len: 0,
  links: [#PID<0.1295.0>],
  dictionary: [
    "$initial_call": {JUnitFormatter, :init, 1},
    "$ancestors": [#PID<0.1295.0>, #PID<0.1291.0>, #PID<0.94.0>]
  ],
  trap_exit: false,
  error_handler: :error_handler,
  priority: :normal,
  group_leader: #PID<0.69.0>,
  total_heap_size: 1597,
  heap_size: 987,
  stack_size: 11,
  reductions: 622,
  garbage_collection: [
    max_heap_size: %{
      error_logger: true,
      include_shared_binaries: false,
      kill: true,
      size: 0
    },
    min_bin_vheap_size: 46422,
    min_heap_size: 233,
    fullsweep_after: 65535,
    minor_gcs: 3
  ],
  suspending: []
]

Do you think you can recognise this bug?

@telphan
Copy link

telphan commented Aug 18, 2024

Adding to my helper:

Code.ensure_loaded(:xmerl)

Produces a slightly different error:

Logger - error: {removed_failing_filter,logger_translator}
** (EXIT from #PID<0.94.0>) exited in: GenServer.stop(#PID<0.1298.0>, :normal, :infinity)
    ** (EXIT) exited in: :sys.terminate(#PID<0.1298.0>, :normal, :infinity)
        ** (EXIT) {:DOWN, :code_server, {:get_object_code_for_loading, :xmerl_xml}}

Everything else seems to stay the same.

@josevalim
Copy link
Contributor Author

This should be fixed by #8744.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

erl_prim_loader:loop/3 gets called extensively
6 participants