
Generation of Html Documentation #82

Open
rubenssoto opened this issue Oct 31, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@rubenssoto

Hello :)

There is a great piece of software that I use at my company called Great Expectations; it's a tool for checking data quality. It has a feature called Data Docs, which is HTML documentation about the data quality checks. I host all the HTML in an S3 bucket so the whole company can access it.

https://greatexpectations.io/

Whale could have a feature like this: simple HTML containing all the table documentation, plus a few simple fields for searching it.
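
To make the request concrete, here is a rough sketch of what such a generator might look like. It is not part of whale: the function names `render_page`, `load_docs`, and `write_site` are hypothetical, and it assumes the table documentation already exists as one markdown file per table somewhere under a single directory. It shows the raw markdown in a `<pre>` block rather than rendering it, and the search box is a plain client-side text filter.

```python
import html
from pathlib import Path

# Page template. The doubled braces are literal braces for str.format.
PAGE = """<!doctype html>
<html><head><meta charset="utf-8"><title>Table documentation</title></head>
<body>
<input id="q" placeholder="Search tables" oninput="filterTables(this.value)">
{sections}
<script>
function filterTables(query) {{
  const q = query.toLowerCase();
  document.querySelectorAll("section").forEach(function (s) {{
    s.hidden = !s.textContent.toLowerCase().includes(q);
  }});
}}
</script>
</body></html>
"""


def render_page(docs):
    """Build one HTML page from a mapping of table name -> markdown text."""
    sections = []
    for name in sorted(docs):
        # Escape everything so markdown containing '<' cannot break the page.
        sections.append(
            "<section><h2>{}</h2><pre>{}</pre></section>".format(
                html.escape(name), html.escape(docs[name])
            )
        )
    return PAGE.format(sections="\n".join(sections))


def load_docs(root):
    """Read every *.md file under root (assumed layout: one file per table)."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in Path(root).rglob("*.md")
    }


def write_site(root, out_file="index.html"):
    """Write the generated page to out_file; hosting it is a separate step."""
    Path(out_file).write_text(render_page(load_docs(root)), encoding="utf-8")
```

Calling `write_site("path/to/table/docs")` would produce a single `index.html` that could be uploaded to an S3 bucket in the same way as the Data Docs output described above.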

Thank you!

@rubenssoto
Author

https://docs.greatexpectations.io/en/latest/reference/core_concepts/data_docs.html

@rsyi added the "enhancement" (New feature or request) label on Oct 31, 2020
@rsyi
Owner

rsyi commented Oct 31, 2020

Hm I'll look into how feasible this might be in a low-effort way!

If the goal is just to make a basic interface available to others, I recently discovered gotty, which allows you to serve terminal apps on the web. It basically just lets users access the whale CLI from their browser (and it seems to support concurrent usage). I did some basic tests and it seems to work pretty nicely. If this sort of thing is sufficient, I can write up some quick docs. 😛

I'll look into rendering options as well, but until I/someone can get around to this, here are a few other options (@rubenssoto I think I mentioned these to you, so I'm guessing they're probably not satisfactory, but listing them here in case others are interested 😉 ):

  • If you use GitHub, GitLab, AWS CodeCommit, etc., you can push your whale files there and then leverage their markdown rendering and search capabilities (for instructions on how to automate this with CI/CD pipelines, see the docs here).
  • Whale's parent company, Dataframe, has a hosted platform in the works (the catalog will be free for our early users) -- you can sign up for the waitlist here, and you'll be able to have a nice GUI with much richer collaborative functionality in the next few months.
  • A final option is Amundsen, which has a GUI, but it'll be quite a lot more work to set up (you'll need to set up a scheduler like Airflow, write and manage the code to run the scraping job yourself, and manage around 6 or 7 microservices). Keeping your data backed up and stable in these sorts of self-hosted platforms will also require a bit of work.

(I'll start learning react in the meantime 😄 )

@rubenssoto
Author

No problem @rsyi, I will use Git for now until the data catalog interface is ready 👍
I like Amundsen, but it is a lot to take care of. My team is only 3 people, and our goal is to keep things simple and automated.

I think that you already registered me in a beta list, [email protected].

I have some suggestions. If they make sense, please tell me and I will create issues for them.

1 - Today all tables are stored in the same directory, so I think it would be more organized if there were an option to create one directory per database.
2 - I don't know if other sources have it, but Glue has location information, which is useful, for example, for letting people know where a table lives in the data lake.
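
As an illustration of the first suggestion, here is a minimal sketch of how a flat file name might be mapped to a per-database path. It rests on an assumption that is not confirmed anywhere in this thread: that each table file is named with the database as the first dot-separated part (for example `sales.orders.md`). The function name `per_database_path` and the `_unknown` fallback folder are both hypothetical.

```python
from pathlib import PurePosixPath


def per_database_path(filename):
    """Map 'sales.orders.md' to 'sales/orders.md'.

    Assumes (hypothetically) that the text before the first dot
    is the database name.
    """
    name = PurePosixPath(filename).name
    stem = name[:-3] if name.endswith(".md") else name
    database, sep, rest = stem.partition(".")
    if not sep:
        # No database prefix found, so park the file in a catch-all folder.
        return PurePosixPath("_unknown") / name
    return PurePosixPath(database) / (rest + ".md")
```

Under that naming assumption, `per_database_path("sales.public.orders.md")` yields `sales/public.orders.md`, so everything belonging to one database ends up in one directory.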

@rsyi
Owner

rsyi commented Nov 1, 2020

Ah, I didn't know Glue had additional info! Yeah, both of those suggestions sound feasible. Open some issues and I'll take a look :)
