RFC: ecmascript modules as primary build artifact #1161

Open
cdaringe opened this issue Oct 21, 2021 · 6 comments

@cdaringe
Contributor

cdaringe commented Oct 21, 2021

context

JSOO has long been the sole, de facto tool for compiling OCaml to JavaScript. The project has existed for well over a decade, and its primary function is to take users' ML and convert it into immediately invocable or effectful JavaScript. This continues to be a great accomplishment.

Over the past few years, the ECMAScript spec has evolved and developed a proper module system.

This development is of interest because its stabilization creates a greater opportunity for ML to participate in the JavaScript ecosystem.

proposal

Provide a first class compile target to ESM.

justification

  • JS bundle performance
    • status-quo: the current compiler (bundler) costs >75kb of javascript, just for hello world (with --profile=release). ref project
      • 75kb is a large baseline
      • the full runtime is compiled, even if only a subset is used
    • future: like elm's compiler, the more ocaml features that user code consumes, the more of the runtime we can incrementally include in the output
  • Improved portability
    • ESM output enables OCaml modules to participate in JavaScript norms. Such norms include working with common JavaScript tooling: browsers (e.g. <script type="module" src="..."></script>), bundlers, analyzers, dead-code analyzers, minifiers, pretty printers--the works!

hypothetical

The following hypothetical cases may be completely bogus. Consider these my temporary, envisioned target state, even if they are not necessarily achievable.

case - empty

Input:

(* main.ml *)
(* no content *)

Output:

// main.js
// no content

No source, no dist! 75k worth of savings, vs status quo :)

case - hello world

Input:

(* main.ml *)
let () = print_endline "Hello, world!"

Output (a):

// main.js
console.log("Hello, world!")

Output (b):

// main.js
import { print_endline } from "@jsoo/std";

print_endline("Hello, world!")

case - reduce

Input:

(* main.ml *)
let rec sum = function
  | [] -> 0
  | head::tail -> head + (sum tail)

Output:

// main.js
import { fn, match_case, match } from "@jsoo/fn";
import { pattern_length } from "@jsoo/lists";

export const sum = fn(
  match_case(match(pattern_length(0, true)), () => 0),
  match_case(match(pattern_length(1, false)), ([head, ...tail]) => head + sum(tail))
);
// ^or whatever the equivalent output would need to be,
// as this is just pseudo-code

Where the runtime is partitioned into ESM modules as well. E.g.:

// @jsoo/caml_runtime
export const caml_lists_iter_whatever = (a, b, c) => { /*  */ };
// ... all the caml_ stuff!

// @jsoo/lists
export const pattern_length = (len, exact) => x => exact 
  ? x.length === len 
  : x.length >= len;

// @jsoo/fn
export const match = x => pattern => pattern(x);

// @jsoo/operators
export const caml_neg_float = x => (-x);
// ...
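To make the pseudo-code above concrete, here is a minimal runnable sketch of how such helpers could fit together. Everything in it is hypothetical: the helper shapes are my own guess, JavaScript arrays stand in for OCaml lists, the helpers are defined inline rather than imported from "@jsoo/fn" and "@jsoo/lists", and the separate match wrapper is folded into fn for brevity. It is not what jsoo emits today.

```javascript
// Hypothetical helpers; in the proposal they would live in "@jsoo/fn" and "@jsoo/lists".
// Assumption: JavaScript arrays stand in for OCaml lists.

// pattern_length(len, exact): predicate on the length of the matched value.
const pattern_length = (len, exact) => (xs) =>
  exact ? xs.length === len : xs.length >= len;

// match_case pairs a pattern predicate with the handler to run when it matches.
const match_case = (pattern, handler) => ({ pattern, handler });

// fn builds a function that tries each case in order, like an OCaml `function`.
const fn = (...cases) => (value) => {
  for (const { pattern, handler } of cases) {
    if (pattern(value)) return handler(value);
  }
  throw new Error("Match_failure");
};

// let rec sum = function [] -> 0 | head :: tail -> head + sum tail
const sum = fn(
  match_case(pattern_length(0, true), () => 0),
  match_case(pattern_length(1, false), ([head, ...tail]) => head + sum(tail))
);

console.log(sum([1, 2, 3])); // 6
```

In an ESM build, each const above would become an export of its own module, so a bundler could drop whichever helpers a program never imports.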

It may be the case that the emitted modules are more along the lines of:

// main.js
import { caml_apply, caml_match_case, caml_match, caml_pat } from "@jsoo/runtime";
// ^ pseudocode, clearly. not an expert in the caml_ bindings, or the
// feasibility of such a mapping :)

export const sum = caml_apply(
  caml_match_case(caml_match(caml_pat(0, true)), () => 0),
  caml_match_case(caml_match(caml_pat(1, false)), ([head, ...tail]) => head + sum(tail))
);

And the full runtime is implemented as one plain, large ES module.

What's nice about this, is that only the bare minimum import graph gets used, versus a full runtime!

omissions

Omitted from this discussion are

  • how the FFI would play a role
  • how local imports would link into module outputs
  • how 3rd party ML code would participate

We could crack into all of these as interested!

user experience

  • Want to use esbuild? Parcel? webpack?
    With an ESModule target, browser-friendly ML code could be achievable anywhere ESM is used! npm install @jsoo/runtime, then set up an ML loader OR precompile .ml to modules!
  • reason/rescript toolchains compile ml-like files with similar output, albeit to commonjs. (esm -> commonjs 😄 , commonjs -> esm 😵‍💫 )

I didn't see such conversations on this topic in github, but may have missed it. Sorry if this is duplicated! If this is duped conversation, please feel free to eagerly close!

@hhugo
Member

hhugo commented Oct 21, 2021

Thanks for taking the time to write all this.

I want to clarify one aspect for people reading this RFC

the full runtime is compiled, even if only a subset is used

What's nice about this, is that only the bare minimum import graph gets used, versus a full runtime!

js_of_ocaml is designed to include only the parts of the runtime that it needs. It does not blindly include the whole runtime for no reason.

$ cat test.ml 
let () = print_endline "Hello, world!"
$ ocamlc test.ml -o test.bc
$ js_of_ocaml test.bc
$ ls -lh test.js 
20K test.js

One of the reasons for the size of the included runtime is that we try hard to keep the same semantics as regular OCaml.
For example, print_endline is not translated to a plain console.log; instead, the output is buffered and flushed when needed, as in regular OCaml.
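To illustrate the difference from a bare console.log, here is a toy buffered channel. It is a sketch only, not js_of_ocaml's actual runtime code: makeChannel, chan, and the written array are names invented for this example. It mirrors the OCaml behavior where print_endline writes the string, then a newline, then flushes standard output.

```javascript
// A toy buffered channel; NOT js_of_ocaml's real runtime (all names are hypothetical).
function makeChannel(write) {
  let buffer = "";
  return {
    // Append to the buffer without writing anything yet.
    output_string(s) {
      buffer += s;
    },
    // Hand the buffered text to the sink and clear the buffer.
    flush() {
      if (buffer.length > 0) {
        write(buffer);
        buffer = "";
      }
    },
  };
}

// The sink records each write so the effect of buffering is visible.
const written = [];
const chan = makeChannel((text) => written.push(text));

// print_endline: output the string, a newline, then flush the channel.
const print_endline = (s) => {
  chan.output_string(s);
  chan.output_string("\n");
  chan.flush();
};

chan.output_string("Hello, "); // buffered only, nothing written yet
print_endline("world!");       // one write containing "Hello, world!\n"
console.log(written);
```

Even this toy version needs state and a flush path, which hints at why the real runtime cannot shrink to a single console.log call.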

@cdaringe
Contributor Author

Thanks for the clarification, I’ll have to go back and revisit why the artifact I produced was so far off.

@hhugo
Member

hhugo commented Oct 21, 2021

There are two ways to compile with jsoo:

Separate compilation (probably what you used), the default when using dune.
Libraries and modules are compiled individually to JavaScript and then linked together. In that mode there is no dead code elimination and the runtime is included in full. This mode should be used during development because of the short feedback loop.

Whole program compilation, used in dune with --profile release.
In that mode, jsoo does dead code elimination and only includes the parts of the runtime it needs.
Depending on the size of the dependencies, compilation can easily take 10s or more.
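As a rough illustration of what dead code elimination means in whole program mode, the sketch below keeps only the names reachable from an entry point. The graph, the function name reachable, and every entry in deps are made up for this example (only caml_neg_float is borrowed from earlier in the thread); jsoo's real analysis works on its own intermediate representation and is more involved.

```javascript
// Hypothetical call graph: each name maps to the names it references.
const deps = {
  main: ["print_endline"],
  print_endline: ["rt_output", "rt_flush"],
  rt_output: [],
  rt_flush: [],
  caml_neg_float: [],            // never referenced from main
  fmt_a: ["fmt_b"],              // fmt_a and fmt_b reference each other
  fmt_b: ["fmt_a", "rt_output"],
};

// Collect everything transitively reachable from the given roots.
function reachable(graph, roots) {
  const seen = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue;
    seen.add(name);
    for (const callee of graph[name] || []) stack.push(callee);
  }
  return seen;
}

// Whole program view: start from the entry point and drop the rest.
const kept = reachable(deps, ["main"]);
const dropped = Object.keys(deps).filter((name) => !kept.has(name));
console.log([...kept].sort(), dropped);

// A group of functions that reference each other is kept as a unit
// as soon as any one of them is reachable.
console.log([...reachable(deps, ["fmt_a"])].sort());
```

Separate compilation cannot do this pruning because each unit is compiled without knowing which roots the final program will have.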

@cdaringe
Contributor Author

I discovered that any use of the Js module adds roughly 50kb+ of JavaScript.

$ ocamlfind ocamlc -package js_of_ocaml -package js_of_ocaml-ppx -linkpkg  js/main.ml -o js/main.bc
$ js_of_ocaml compile --opt=3 js/main.bc
$ ls -h js/

Tried various other optimization flags to js_of_ocaml without getting under 70k w/ the JS module. More investigation needed to better understand why that mod is so problematic in the rendered output.

Nonetheless, I still lobby that yielding ESM would be an improved artifact to offer users :)

@hhugo
Member

hhugo commented Oct 27, 2021

The Js module uses Printexc, which in turn uses Printf.
Here are the sizes of the stdlib modules compiled to JS. Note that jsoo does dead code elimination, so using a module doesn't mean including all of it. However, CamlinternalFormat contains a large number of mutually recursive functions, which can explain why dead code elimination doesn't do a good job on it.

$ js_of_ocaml ~/.opam/4.12.0/lib/ocaml/stdlib.cma  --keep-unit-names -o .
$ wc -c ./*.js | sort -n -r
273406 total
 54197 ./CamlinternalFormat.js
 19626 ./Stdlib__scanf.js
 16547 ./Stdlib__format.js
 11169 ./Stdlib__ephemeron.js
 10577 ./Stdlib__list.js
  8916 ./Stdlib__hashtbl.js
  8870 ./CamlinternalOO.js
  8753 ./Stdlib__set.js
  8318 ./Stdlib__arg.js
  8242 ./Stdlib__map.js
  8139 ./Stdlib__float.js
  7911 ./Stdlib__bytes.js
  7693 ./Stdlib__filename.js
  6989 ./Stdlib__array.js
  6367 ./Stdlib__printexc.js
  6357 ./Stdlib__buffer.js
  6297 ./Stdlib.js
  5707 ./Stdlib__bigarray.js
  5549 ./Stdlib__weak.js
  5233 ./Stdlib__genlex.js
  4269 ./Stdlib__string.js
  4159 ./Stdlib__stream.js
  3335 ./Stdlib__random.js
  2981 ./Stdlib__gc.js
  2795 ./Stdlib__lexing.js
  2420 ./Stdlib__parsing.js
  2332 ./Stdlib__obj.js
  2274 ./CamlinternalFormatBasics.js
  1719 ./Stdlib__queue.js
  1619 ./Stdlib__seq.js
  1604 ./Stdlib__digest.js
  1483 ./Stdlib__int64.js
  1459 ./Stdlib__complex.js
  1377 ./Stdlib__result.js
  1212 ./Stdlib__stack.js
  1206 ./Stdlib__char.js
  1175 ./Stdlib__uchar.js
  1146 ./Stdlib__int32.js
  1110 ./Stdlib__printf.js
  1077 ./Stdlib__sys.js
  1006 ./Stdlib__option.js
   973 ./Stdlib__nativeint.js
   949 ./Stdlib__either.js
   903 ./Stdlib__marshal.js
   883 ./Stdlib__fun.js
   793 ./Stdlib__pervasives.js
   727 ./CamlinternalLazy.js
   612 ./Stdlib__bytesLabels.js
   538 ./Stdlib__listLabels.js
   449 ./Stdlib__lazy.js
   430 ./CamlinternalAtomic.js
   398 ./Stdlib__stringLabels.js
   366 ./Stdlib__arrayLabels.js
   358 ./Stdlib__int.js
   352 ./Stdlib__bool.js
   279 ./Stdlib__callback.js
   238 ./Stdlib__unit.js
   217 ./Stdlib__atomic.js
   208 ./Stdlib__moreLabels.js
   199 ./CamlinternalMod.js
   185 ./Stdlib__oo.js
   134 ./Stdlib__stdLabels.js

As you can see, things related to format are rather big.
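To put a number on that, the byte counts from the listing can be summed for the format-related units. Which units count as "format-related" is an assumption of this sketch (the five below all deal with format strings); the figures themselves are copied from the wc -c output.

```javascript
// Byte counts copied from the `wc -c` listing above.
// Assumption: these five units are the "format-related" ones.
const formatSizes = {
  CamlinternalFormat: 54197,
  Stdlib__scanf: 19626,
  Stdlib__format: 16547,
  CamlinternalFormatBasics: 2274,
  Stdlib__printf: 1110,
};
const totalBytes = 273406; // the "total" line of the listing

const formatBytes = Object.values(formatSizes).reduce((a, b) => a + b, 0);
const share = ((100 * formatBytes) / totalBytes).toFixed(1);

console.log(formatBytes, share + "%"); // 93754 34.3%
```

Under that grouping, roughly a third of the compiled stdlib is format machinery, with CamlinternalFormat alone accounting for about 54 KB.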

@cdaringe cdaringe reopened this Nov 13, 2021
@hhugo
Member

hhugo commented Dec 22, 2021

related to #551
