Tracking new HTML parser work #272

Open
3 of 4 tasks
tbranyen opened this issue Apr 14, 2022 · 3 comments
Labels
diffhtml Core API

Comments

@tbranyen
Owner

tbranyen commented Apr 14, 2022

I am rewriting the existing HTML parser, which is more than five years old. It is a stripped-down fork of node-fast-html-parser; unfortunately, its regexes are unnecessarily complex and the code is hard to work on. The new parser uses a modern tokenizer approach and is as close to zero-copy as possible for large payloads. I'm iterating on the design with strict TDD, so I anticipate hundreds of new unit tests by the time it is complete.
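
To illustrate what a tokenizer that avoids copying can look like, here is a minimal sketch. It is not the diffHTML parser: the `tokenize` and `slice` names and the token shape are hypothetical, chosen only for this example. Each token records `start`/`end` offsets into the original string, so no substring is produced until a consumer asks for one.

```javascript
// Hypothetical sketch, NOT diffHTML's actual parser.
// Tokens store [start, end) offsets into `source` instead of copied substrings.
function tokenize(source) {
  const tokens = [];
  let i = 0;

  while (i < source.length) {
    if (source.startsWith('<!--', i)) {
      // Comment: runs until the closing '-->' (or end of input if unterminated).
      const close = source.indexOf('-->', i + 4);
      const end = close === -1 ? source.length : close + 3;
      tokens.push({ type: 'comment', start: i, end });
      i = end;
    } else if (source[i] === '<') {
      // Tag: scan to the closing '>', ignoring any '>' inside a quoted
      // attribute value. Newlines inside quotes need no special handling.
      let j = i + 1;
      let quote = null;
      while (j < source.length) {
        const ch = source[j];
        if (quote) {
          if (ch === quote) quote = null;
        } else if (ch === '"' || ch === "'") {
          quote = ch;
        } else if (ch === '>') {
          break;
        }
        j++;
      }
      const end = Math.min(j + 1, source.length);
      tokens.push({ type: 'tag', start: i, end });
      i = end;
    } else {
      // Text: everything up to the next '<'.
      const next = source.indexOf('<', i);
      const end = next === -1 ? source.length : next;
      tokens.push({ type: 'text', start: i, end });
      i = end;
    }
  }

  return tokens;
}

// Materialize a token's text only when it is actually needed.
const slice = (source, token) => source.slice(token.start, token.end);
```

Calling `tokenize('<button title="a\nb\nc">x</button>')` under these assumptions yields three tokens (tag, text, tag), and `slice` recovers each one from the untouched input string.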

Feature progress:

  • Significantly more reliable: fixes bugs that currently exist in the parser, with lots of unit tests
  • Support HTML comments
  • Smaller code footprint that is more specific towards VDOM
  • Better middleware introspection for the parser, helping the linter plugin

Future of the parser:

After the 1.0 launch, I want to invest time in planning and building a parser compiled to WebAssembly that can be plugged into any framework or runtime. It will not use regular expressions or anything hacky like the current parser. I think I'll need to solicit donations for that particular project or find some really passionate engineers who can help. Update: this turned out to be easier than anticipated and will be added to the 1.0 slate.

tbranyen added the diffhtml Core API label Jan 8, 2023
@CetinSert
Contributor

@tbranyen – does the new parser support multiline attributes? I remember that being an issue with the current one, though I am not 100% sure.

<button title="a
b
c">x</button>

@tbranyen
Owner Author

tbranyen commented Jan 23, 2023

Looks good with the latest parser:

tim in ~/git/diffhtml/packages/diffhtml on fix-createstate-between-render (home) cat test.js
import { innerHTML, html, Internals } from './index.js';
//import { parse } from '../diffhtml-rust-parser/dist/parser.js';

//Internals.parse = parse;

console.log(html`
<button title="a
b
c">x</button>
`);
tim in ~/git/diffhtml/packages/diffhtml on fix-createstate-between-render (home) node test.js
{
  rawNodeName: 'button',
  nodeName: 'button',
  nodeValue: '',
  nodeType: 1,
  key: '',
  childNodes: [
    {
      rawNodeName: '#text',
      nodeName: '#text',
      nodeValue: 'x',
      nodeType: 3,
      key: '',
      childNodes: [],
      attributes: {}
    }
  ],
  attributes: { title: 'a\nb\nc' }
}
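
For readers wondering why a multiline value comes through intact, here is a rough sketch of a quote-delimited attribute reader. It is an assumption-laden illustration, not code from diffHTML: `readAttributes` is a hypothetical helper, and it skips cases a real parser must handle (for example, whitespace around `=`). The point is only that scanning to the matching quote treats a newline like any other character.

```javascript
// Hypothetical helper, NOT part of diffHTML: reads attributes from the
// text of a single opening tag such as '<button title="a">'.
function readAttributes(tag) {
  const attributes = {};

  // Skip '<' and the tag name.
  let i = 1;
  while (i < tag.length && !/[\s/>]/.test(tag[i])) i++;

  while (i < tag.length) {
    // Skip whitespace (including newlines) between attributes.
    while (i < tag.length && /\s/.test(tag[i])) i++;
    if (i >= tag.length || tag[i] === '>' || tag[i] === '/') {
      i++;
      continue;
    }

    // Attribute name runs until whitespace, '=', '/' or '>'.
    const nameStart = i;
    while (i < tag.length && !/[\s=/>]/.test(tag[i])) i++;
    const name = tag.slice(nameStart, i);

    let value = '';
    if (tag[i] === '=') {
      i++;
      const quote = tag[i];
      if (quote === '"' || quote === "'") {
        // Quoted value: everything up to the matching quote, newlines included.
        const close = tag.indexOf(quote, i + 1);
        const end = close === -1 ? tag.length : close;
        value = tag.slice(i + 1, end);
        i = end + 1;
      } else {
        // Unquoted value: runs until whitespace or '>'.
        const valueStart = i;
        while (i < tag.length && !/[\s>]/.test(tag[i])) i++;
        value = tag.slice(valueStart, i);
      }
    }

    attributes[name] = value;
  }

  return attributes;
}
```

With the button from the question, `readAttributes('<button title="a\nb\nc">')` gives a `title` whose value still contains both newlines, matching the `attributes` object in the output above.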

@tbranyen
Owner Author

Looks good with the WASM Rust parser as well.

2 participants