create-a-pipeline walkthrough inaccuracies #1521

@jorritsandbrink

Documentation description

The text in https://dlthub.com/docs/walkthroughs/create-a-pipeline is not completely accurate:

  • "Run the github_api.py pipeline script to test that authentication headers look fine:"
    • The script does not test authentication headers: it yields dummy data and never calls the GitHub API. The user has to modify the script before it accesses the GitHub API, and that step comes only later in the walkthrough.
  • "Your API key should be printed out to stdout along with some test data."
    • The API key is not printed (the print statement is commented out).
  • "Modify github_api_resource in github_api.py to request issues data from your GitHub project's API:"
    • The code in the docs and the actual code do not match.

Code in docs:

@dlt.resource(write_disposition="replace")
def github_api_resource(api_secret_key: str = dlt.secrets.value):
    url = "https://api.github.com/repos/dlt-hub/dlt/issues"
    ...

Actual code:

@dlt.resource(write_disposition="append")
def github_api_resource(
    api_secret_key: str = dlt.secrets.value,
    org: str = "dlt-hub",
    repository: str = "dlt",
):
    ...
    api_url = f"https://api.github.com/repos/{org}/{repository}/issues"
    ...
  • "Uncomment the commented out code in main function in github_api.py, so that running the python github_api.py command will now also run the pipeline:"
    • The code in the docs and the actual code do not match. The user has to replace source() with github_api_source().

Code in docs:

if __name__=='__main__':
    ...
    load_info = pipeline.run(github_api_source())
    ...

Actual code:

if __name__=='__main__':
    ...
    load_info = pipeline.run(source())
    ...
  • "Let's explore the loaded data with the command dlt pipeline <pipeline_name> show."
    • The given command dlt pipeline github_api_pipeline show fails with an error: the pipeline name used in the docs, github_api_pipeline, does not match the actual pipeline name, github_api.
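
The name passed to the CLI has to match the pipeline_name set in the script. As a rough sketch, one could look that name up with the standard library's ast module before running dlt pipeline <pipeline_name> show. The helper find_pipeline_names and the SCRIPT text are hypothetical; only pipeline_name="github_api" comes from this issue, and the destination and dataset_name values are placeholder assumptions, not taken from the template.

```python
import ast

# Hypothetical stand-in for the relevant part of github_api.py.
# Only pipeline_name="github_api" comes from this issue; the other
# arguments are placeholder assumptions.
SCRIPT = '''
import dlt

pipeline = dlt.pipeline(
    pipeline_name="github_api",
    destination="duckdb",
    dataset_name="github_api_data",
)
'''


def find_pipeline_names(source: str) -> list[str]:
    """Return every literal pipeline_name passed to a call named `pipeline`."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both `dlt.pipeline(...)` and a bare `pipeline(...)`.
        called = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if called != "pipeline":
            continue
        for kw in node.keywords:
            if kw.arg == "pipeline_name" and isinstance(kw.value, ast.Constant):
                names.append(kw.value.value)
    return names


if __name__ == "__main__":
    # Print the CLI command that matches the name found in the script.
    for name in find_pipeline_names(SCRIPT):
        print(f"dlt pipeline {name} show")
```

The script text is only parsed, never executed, so this runs without dlt installed.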

These inaccuracies lead to friction when going through the walkthrough. This walkthrough is probably one of the early touchpoints for new (potential) users, and should be as seamless as possible.
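
One way to catch this kind of drift early might be an automated comparison of the snippet shown in the docs with the code the template actually generates. The sketch below is only an illustration using difflib from the standard library. DOCS_SNIPPET, TEMPLATE_SNIPPET, and drift are hypothetical names, and the two snippets are the __main__ excerpts quoted earlier in this issue rather than files read from the repository.

```python
import difflib

# Excerpts quoted in this issue; the "..." lines are elisions, kept as-is.
DOCS_SNIPPET = """\
if __name__=='__main__':
    ...
    load_info = pipeline.run(github_api_source())
    ...
"""

TEMPLATE_SNIPPET = """\
if __name__=='__main__':
    ...
    load_info = pipeline.run(source())
    ...
"""


def drift(docs: str, template: str) -> list[str]:
    """Return only the differing lines: '- ' marks docs, '+ ' marks template."""
    diff = difflib.ndiff(docs.splitlines(), template.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    for line in drift(DOCS_SNIPPET, TEMPLATE_SNIPPET):
        print(line)
```

An empty result would mean the docs snippet and the template agree; here the check reports the pipeline.run line from each side.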

Are you a dlt user?

Yes, I'm already a dlt user.
