Feature request: add WARC output option #412
Hello, in the meantime you could perhaps use https://github.com/steffenfritz/html2warc.
Hi @midaspt, I'm very glad to learn that you're finding monolith useful! WARC should be simple to add; I'll likely implement it around the same time as MHTML. Long story short, I'll make monolith first crawl the target document and download all assets into a store of sorts (a cache), then build either a monolithic HTML file, an MHTML file, or a WARC file from that store. This way it won't require much redundant code, and the process will be essentially the same for every output format. The first step right now is to revamp the caching mechanism; I'll work on it ASAP.

Hi there @hugo-akaora, thank you for the link! It's in Python, but I'll use it as a reference; it seems like a straightforward format. Cheers,
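The "crawl once into a cache, then serialize" flow described above could look roughly like this. It is a hypothetical Python illustration (html2warc, mentioned above, is also Python), not monolith's actual Rust code: `Asset`, `AssetCache`, `build_monolithic_html` and `list_records` are invented names, and the URL substitution is deliberately naive.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One fetched resource: its MIME type and raw bytes."""
    content_type: str
    body: bytes


@dataclass
class AssetCache:
    """Hypothetical store, filled once by the crawl step and shared by every writer."""
    assets: dict[str, Asset] = field(default_factory=dict)

    def put(self, url: str, content_type: str, body: bytes) -> None:
        self.assets[url] = Asset(content_type, body)


def to_data_url(asset: Asset) -> str:
    """Encode an asset as a data: URL, the way a single-file HTML output embeds it."""
    encoded = base64.b64encode(asset.body).decode("ascii")
    return f"data:{asset.content_type};base64,{encoded}"


def build_monolithic_html(html: str, cache: AssetCache) -> str:
    """Writer 1: inline every cached asset into the page.

    Naive: plain string replacement. A real implementation would rewrite parsed
    src/href attributes so that one URL being a prefix of another cannot break things.
    """
    for url, asset in cache.assets.items():
        html = html.replace(url, to_data_url(asset))
    return html


def list_records(cache: AssetCache) -> list[tuple[str, str, int]]:
    """What an MHTML or WARC writer would iterate over: one entry per cached resource."""
    return [(url, a.content_type, len(a.body)) for url, a in cache.assets.items()]
```

Because every writer reads from the same store, adding MHTML or WARC output would mean adding another function over `cache.assets` rather than another crawler, which is the redundancy the comment above aims to avoid.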
Hello @snshn, it would be nice if this could be implemented directly in monolith! <3 It would be really great to be able to output multiple formats at the same time :) I'll definitely use that feature!
+1 on MHTML. I've been waiting for this feature for almost a full year. BTW, I have a very strong working understanding of MIME and am happy to provide advice or perspective. Feel free to reach out about this. For example, I wrote https://github.com/jchook/mime-php
Hi all, I have been using monolith more and more for webpage capture but couldn't find a way to save pages in WARC format (as documented at https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/). I believe such an option would greatly extend the reach of monolith as a general-purpose utility. Anyway, thanks for your great work as it is. 😎
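For reference, a single record in the WARC 1.0 format linked above consists of a `WARC/1.0` version line, named header fields, a blank line, the content block, and two trailing CRLFs. Below is a minimal Python sketch, assuming each cached asset is stored as a `resource` record (the spec's type for content captured without the full HTTP response) rather than a `response` record; the function name `warc_resource_record` is hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_resource_record(
    target_uri: str,
    content_type: str,
    body: bytes,
    when: Optional[datetime] = None,
    record_id: Optional[str] = None,
) -> bytes:
    """Serialize one WARC/1.0 'resource' record as bytes."""
    # WARC-Date is UTC in W3C ISO 8601 form, e.g. 2021-01-02T03:04:05Z.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Every record needs a globally unique ID; a urn:uuid in angle brackets is the usual choice.
    record_id = record_id or f"<urn:uuid:{uuid.uuid4()}>"
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: {record_id}",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        # Content-Length counts only the block (the body), not the header.
        f"Content-Length: {len(body)}",
    ]
    # Header lines end in CRLF, and one extra CRLF separates the header from the block.
    header = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8")
    # Each record is terminated by two CRLFs after the block.
    return header + body + b"\r\n\r\n"
```

A complete `.warc` file would simply concatenate such records, typically starting with a `warcinfo` record that describes the capture.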