Replies: 4 comments 1 reply
-
@cstruct here your answers:
We have a kick off session with the maintainers, me or some other maintainers will write back here after we discuss some open points. |
Beta Was this translation helpful? Give feedback.
-
Hi 👋 new-ish dbt-athena user here. I've been having some problems with dbt-athena updates lately, which I think boil down to that "future of the project" point in respect of what kind of PRs you accept, how they are implemented, and how you classify and handle resulting breaking changes. It's a bit of a convoluted and multi-issue story so I'll try and set it out for context before making some suggestions. ContextI'm a consultant working with clients in AWS at the moment (more about that at https://tempered.works/ if it's useful) and using dbt with Athena - had a good experiences with dbt on BigQuery over the past couple of years. I'm currently using [email protected] for multiple pipelines that have over 1k operations in a I always advocate for least-privilege permissions, in line with AWS best practice. Most if not all clients I'm aware of explicitly require a least-privilege approach and have processes in place to agree changes to permissions - typically these are time-consuming and there is always the chance that the client will decline the change. When we set up the first dbt pipeline, we figured out quite quickly what permissions were needed to run dbt-athena, agreed the permissions and got up and running. All was well for a couple of months. We try to keep our supply chains up to date (another typical infosec requirement), including packages like dbt-athena. When I tried to update from 1.3.5 a few weeks ago, my pipeline stopped working, throwing a permissions issue. The timing was problematic and there were no known vulnerabilities to address so we raised a ticket to look at as soon as time allowed. Issues Since v1.3.5Spin forward to now when I picked up that ticket a couple of weeks ago. I found the issue was down to a permission
Seems fair - I can probably make that argument without any problems. It's a PITA and I'd rather not spend time on it but OK. As it happened, I'd already forked the repo wanting to understand the problem and anticipating submitted a PR and I disabled the problematic call. Then my seeds blew up. Now I need S3 list and create permissions. Hence issue #279, tracking down the problem to PR#161 - to support large seed files. A tricky one to fix as the behaviour had been completely rewritten but I did my best and added some tests in PR#283. Next, issue #285 to deal with the need for S3 delete permissions to clean up iceberg tables. I haven't looked into it yet but I have a feeling that PR#249 in v1.4.5 will break me again as it'll need permissions to drop tables via Glue instead of via Athena. 😭 Point is - I'm not whining over a one-off. Each time I fixed something and rolled my fork forward I found a new issue. Had I taken the original issue and recommendation at face value, I would simply have run into more permissions issues once Wider Supply Chain ProblemsAs I mentioned, I also raised PR#277 to make the dbt-athena maintainers aware of a dependency I'd introduced in dbt-external-tables. I was unsure how best to handle the coupling (I picked up and old PR against the original SuggestionsI'm aware of and very sympathetic to the challenges of open source maintanance and I 👏 🙇 ❤️ you folks for taking that on. I want you to be successful and I applaud your goal of avoiding a need for any forks in the future. Breaking Changes PolicyIs there any doubt about whether these permissions changes are "breaking changes"? For me - if my pipeline works under one version of dbt-athena and breaks under a future version then it's a breaking change. The most important element of software that I (and by extension my clients) need is reliability - far more important than new features. A simple policy around how breaking changes are handled could be helpful - my original assumption was that an issue highlighting a breaking change and a PR to fix it would be accepted (subject to normal PR quality considerations) as a priority without argument. I have no problem raising PRs to patch up the problems but I've stopped as I became unconvinced they would be accepted. On that subject - I'm lucky in having a software development background, so rolling up my sleeves and diving in to fix stuff is no problem (subject to time constraints). When I move on to my next engagement, there may not be anyone remaining who has that skillset, which is a significant consideration for the advice I give to clients. I think it's fair to say that use of dbt-athena implies knowledge of SQL and how Athena works, but does not imply any skills in Python or macro debugging. In the event of a breaking change, that might be catastrophic for a team lacking those skills. I'm not sure what specific actions I suggest here - but prioritising fixes to breaking changes over new functionality seems sensible? Explicit Docs and Testing of IAM roleSpecifying an IAM role that should run dbt-athena in the codebase should be pretty straightforward. That policy can be used in testing to ensure the permissions footprint hasn't moved. I'll raise an issue for this and I can use the policy my team uses to define that minimal permissions footprint - it'll help with my PRs! (DONE #291) Automated Integration TestingI can't see evidence of automated integration testing taking place. Guessing that's because an AWS account for that purpose would be risky given the inability to set an actual billing limit? I can raise an issue for that too and look into how to solve it safely (curious about this myself!) Thanks!Sorry about the word count here - aside from my own problems, I can see downloads of dbt-athena steadily increasing and a distribution of versions being downloaded last week all the way from 1.3.1 to 1.4.5. I suspect these challenges are coming sooner or later and hopefully I'm helping you get ahead of it. Thanks for your hard work on dbt-athena! |
Beta Was this translation helpful? Give feedback.
-
Sorry, missed a suggestion Documented Feature Acceptance CriteriaHaving some criteria over whether a new feature (as opposed to a fix) will be accepted might be a good idea, not just for consistency, but to minimise risk of breaks and keep the complexity under control. Also useful to help avoid contributors investing time in something that won't be accepted. I think I see a pattern in dbt-athena where new features are added with the best of intentions, but for convenience or based on a perception without evidence. In adding those features, there are often subtle creeps in the permissions required or unintended consequences leading to subsequent conditional processing and exception handling. I know this will come off as critical and I apologise - but I feel I need to call out an example to justify the claim. I'll call out the recent switch to use Glue APIs to drop tables instead of dropping via Athena SQL. As best I can trace the lineage of that change, I can't find any evidence of a real-world problem that is addressed (I can't think of a pipeline I've been exposed to where dropping tables was a performance problem - it's the tests and materalizations that take the time). It did add a need for both Glue and S3 permissions - even in the case of Iceberg tables where AWS assert that:
This led to a confusing conversion for me with @nicor88 here because I expected dbt-athena to mirror Athena's behaviour, not Glue's. So to my understanding, a feature was added that didn't add any needed functionality or address any concrete issue, but will break permissions-sensitive consumers (i.e. me) and did add a need for confusing code to clean up Iceberg tables when they would be cleaned up by AWS if dropped via Athena SQL. |
Beta Was this translation helpful? Give feedback.
-
@nicor88 @thenaturalist I noticed that this project seems to have moved to the @dbt-labs org, and |
Beta Was this translation helpful? Give feedback.
-
First I'd like to open with that it's great that someone has taken the initiative to fork and revive this adapter under an organization. Hopefully this can sustain active maintenance and all of us avoid multiple abandoned forks.
@nicor88 requested me to submit my quoting support PR from the old repo here which I have done, this got me thinking about the governance of this project. As it stands there are no public members of this organization, there is no clear contribution guidelines or any code of conduct.
If this is to become a healthy project that people will adopt instead of their own forks I think there are some things that would be good to address:
Beta Was this translation helpful? Give feedback.
All reactions