Skip to content

refactor common sources to work with dlt+ #624

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

djudjuu
Copy link
Contributor

@djudjuu djudjuu commented May 22, 2025

  • improving, documenting, or customizing an existing source (please link an issue or describe below)

Related Issues

note for reviever

assuming you have checked out verified-sources at the same height as dlt-plus you can do this in the playground folder to test them

- ➜  playground git:(feat/runDLT/betterRunnerv0) ✗ uv run dlt project init kafka duckdb --location ../../verified-sources --force 

then check dlt.yml and .dlt/secrets.toml for changes

@djudjuu djudjuu force-pushed the feat/623/refactor-sources-to-work-with-dlt-project branch from 8bfdf17 to e50a666 Compare May 22, 2025 17:14
@@ -28,7 +29,7 @@


@dlt.source
def asana_source() -> Any: # should be Sequence[DltResource]:
def asana_source(access_token: str = dlt.secrets.value) -> Sequence[DltResource]:
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can just add the access token here, even though its not used, so that it shows up in the config
or remove it from the resources too and pass it down? but that seems less elegant

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is not used, why add it? it will confuse users that do not look into the code and set a value there. Why do you want to add it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every resource needs this secret, so if they dont find it they wont run.
but as the code derives the secrets and configs from the source that gets linked (the method above) it won't pick it up otherwise. (or is that a bug? )

marcin mentioned the clean way to write sources would be to give the secret to the source and pass it to each resrouce, but I think in this example it's actually cleaner as it is

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you should forward the token to the resources by instantiating them. Just putting it there without consuming it is an antipattern and I don't think it will work if users create named sources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you like the idea of moving all resource definitions into the source method?
That was Alona's idea, who said we want to encourage users to group resources anyways...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that way access_token can just be a property of the source and all methods inside have access to it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing this. I think with the newest changes from @rudolfix on the core devel branch, inner resources now also pick up configuration values automatically, but let's keep it like this for now.

Copy link
Collaborator

@sh-rp sh-rp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I have some small quesionts

@@ -28,7 +29,7 @@


@dlt.source
def asana_source() -> Any: # should be Sequence[DltResource]:
def asana_source(access_token: str = dlt.secrets.value) -> Sequence[DltResource]:
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is not used, why add it? it will confuse users that do not look into the code and set a value there. Why do you want to add it?

from dlt.sources import DltResource

from .helpers import get_stats_with_retry, parse_response


@dlt.source(name="bing_webmaster")
def source(
site_urls: List[str] = None, site_url_pages: Iterable[DictStrStr] = None
site_urls: List[str] = dlt.config.value,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be "optional", no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. If they were not present, what page would you want to get the traffic statistics for then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right. But we have to update the docstring of this source then, it makes it seem as though they are optional.

Copy link
Collaborator

@sh-rp sh-rp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, but there is at least one test you need to adapt, please check the output. Can you also talk to @anuunchin to figure out which test failures are expected because we do not have credentials and which ones you can't ignore?

Also did you test these changes out with dlt+ to see wether all verified sources can be at least created?

@djudjuu djudjuu force-pushed the feat/623/refactor-sources-to-work-with-dlt-project branch from 6d9da63 to 40e0f16 Compare June 4, 2025 09:51
@djudjuu
Copy link
Contributor Author

djudjuu commented Jun 4, 2025

Also did you test these changes out with dlt+ to see wether all verified sources can be at least created?

so this works for all:
entity_factory.get_source(source)

this works for some:
entity_factory.get_source(source)()

=> results in
image

@djudjuu
Copy link
Contributor Author

djudjuu commented Jun 5, 2025

on the failing tests:
postgres/duckdb:
failing due to rate limit
redshift failed due to some really long event, -> went from airflow apache repo to duckdb for github events api https://api.github.com/repos/duckdb/duckdb/events?per_page=100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sources should show up nicely in dlt project
2 participants