Fix joins to include partial null results #767

Open
wants to merge 3 commits into base: main

Conversation

jbrooks-stripe
Collaborator

@jbrooks-stripe jbrooks-stripe commented May 24, 2024

Summary

We found online/offline inconsistencies in GroupBys that use multiple keys when some of the key values are null. Online, the aggregation is computed on the partial key, e.g. (key1, null), whereas offline the result was null whenever any key was null.

This PR corrects the offline joins to match the online behavior of partial aggregations.
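
To make the difference described in the summary concrete, here is a minimal, self-contained sketch in plain Python. It is not Chronon code: the function names (`aggregate`, `join_strict`, `join_null_safe`) and the sample data are hypothetical and exist only for illustration. It contrasts a join where any null key component yields a null result (the old offline behavior as described above) with a null-safe join where `(key1, null)` matches its own aggregate (the online behavior).

```python
from collections import defaultdict


def aggregate(events):
    """Sum `amount` per (key1, key2) tuple; None is kept as a valid key component."""
    totals = defaultdict(int)
    for key1, key2, amount in events:
        totals[(key1, key2)] += amount
    return dict(totals)


def join_strict(left_keys, totals):
    """Assumed model of the old offline behavior: any None key gives a None result."""
    out = []
    for key in left_keys:
        if any(part is None for part in key):
            out.append((key, None))
        else:
            out.append((key, totals.get(key)))
    return out


def join_null_safe(left_keys, totals):
    """Assumed model of the online behavior: None matches None, so partial keys resolve."""
    return [(key, totals.get(key)) for key in left_keys]


# Hypothetical sample data: two events share the partial key ("a", None).
events = [("a", None, 5), ("a", None, 7), ("a", "x", 1)]
totals = aggregate(events)
left_keys = [("a", None), ("a", "x")]

strict = join_strict(left_keys, totals)
null_safe = join_null_safe(left_keys, totals)
```

Under these assumptions, `strict` returns no value for `("a", None)`, while `null_safe` returns the partial aggregate of 12. Fully populated keys behave the same in both.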

Why / Goal

Test Plan

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested

Checklist

  • Documentation update

Reviewers

@jbrooks-stripe jbrooks-stripe force-pushed the jbrooks-partial-null-results branch from 8f4f080 to f9a70ee Compare May 24, 2024 20:39
Contributor

@nikhilsimha nikhilsimha left a comment

Looks great! Thanks for the fix! There are some UT errors.

@jbrooks-stripe jbrooks-stripe marked this pull request as ready for review May 28, 2024 16:45
smcnamara2-stripe added a commit to smcnamara2-stripe/chronon that referenced this pull request Mar 10, 2025
* [PyApi] Fix group by validation NoAgg (#620)

* [PyApi] Fix group by validation NoAgg

* More test

* Add Flink module (#606)

* [wip] Add FlinkSource

# Conflicts:
#	build.sbt

* current wip

* Minimal working solution with tests

* Refactor + add SparkEval tests

* Scalafmt fixes

* Update Flink to only build on 2.12

* Tweaks to make build happy

* Try using getExecutionEnv to see if it passes CI

* Yank flaky test for now

* Refactor GroupByServingInfoParsed

# Conflicts:
#	flink/src/main/scala/ai/chronon/flink/AvroCodecFn.scala
#	flink/src/main/scala/ai/chronon/flink/FlinkJob.scala

* Fix build

* Use version matrix for Flink

* Address PR comments

* Add scaladocs, fix more review comments

# Conflicts:
#	flink/src/main/scala/ai/chronon/flink/AsyncKVStoreWriter.scala

* DailyFileCountEstimate: Smooth denominator to avoid being zero (#622)

* smooth the denominator

* comment

* Changes to allow PRs from forks

Signed-off-by: Nikhil <[email protected]>

* Update config.yml

Signed-off-by: Nikhil <[email protected]>

* Update config.yml

Signed-off-by: Nikhil <[email protected]>

* Update config.yml

Signed-off-by: Nikhil <[email protected]>

* Update config.yml

Signed-off-by: Nikhil <[email protected]>

* Update config.yml

Signed-off-by: Nikhil <[email protected]>

* Update config.yml

Signed-off-by: Nikhil <[email protected]>

* Update config.yml

Signed-off-by: Nikhil <[email protected]>

* Address slashes in branch names

* Create readme and other OSS material (#624)

* Creating docs and runnable quickstart

* Adding fake generated data to the quickstart (#628)

Adding data to quickstart and a few bugfixes.

* WIP (#631)

* add bulk merge for simple aggregations (#608)

* add bulk merge for simple aggregations

* default bulkMerge and fix test

* update map column agg

* use iterator

* remove arraybuffer

* use foldleft

* remove row agg

* follow up on zero partitions (#625)

* Update to not use $SHELL_FILE variable (#632)

Quickfix to readme instructions

Signed-off-by: Varant Zanoyan <[email protected]>

* Override start partition for backfill jobs for customized range backfill (#611)

* wip

* override start partition

* fix flake8 and test

* null check

* address comments.

* comments

* Replace println statements with a real logger (#636)

* Replacing println statements with a logger

* change log level to error (#639)

* change log level to error

* address comments.

* scalafmt

* clean up

* [Chronon][Logging] Remove logging backend (#640)

* Remove logging backend

* Remove trailing , for 2.11

* Setting version to 0.0.60

* Setting version to 0.0.61-SNAPSHOT

* Fix staging query early exit (#643)

* remove system exit

* clean up

* Setting version to 0.0.61

* Setting version to 0.0.62-SNAPSHOT

* set default

* [QuickStart] Docker Compose to create full prod environment. (#637)

* [QuickStart] Docker Compose for Chronon Components

* Add env

* Attempt to setup warehouseDir and metastore in spark env variables

* Load data with schema

* More updates

* Oh

* test

* New approach

* Remote working

* Only local

* Local

* Noentry

* OnlineImpl

* WIP

* Online Impl: coded

* More utils

* WIP

* WIP

* Provide

* Logging

* WIP

* Add readme

* End2End

* WIP

* Remove compiled files

* Prod quickstart files

* Simplify

* Delete confs for quickstart

* Remove from gitignore

* Move run.sh

* Quickstart V0

* Adding log flattener

* Stats + OOC + logStats + Logged table

* Update README

* Streaming WIP

* Online Streaming

* Log chronon

* typo

* Update with readme instructions

* Add more instructions

* remove build.sbt changes

* Fix port

* Load data on init

* Remove versions

* Additional comment on sbt assembly for online jar

* Update the number of sources available for Chronon in the doc. (#618)

Co-authored-by: Henry Saputra <[email protected]>

* Fixing command in quickstart README (#647)

* Adding log4j properties (#649)

* Update main readme to use docker flow (#648)

* Making README use dockerized flow

* Adding admin docs and examples

* adding more docs

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* Update api/py/test/sample/group_bys/quickstart/purchases.py

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

* Update docs/source/Aggregations.md

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

* Update docs/source/Aggregations.md

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

* Update docs/source/authoring_features/Source.md

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

* Update docs/source/authoring_features/Source.md

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

* Update docs/source/authoring_features/StagingQuery.md

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

* Update docs/source/getting_started/Concepts.md

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

* WIP

* Fixing todos in docs

* Fixing links

* Update docs/source/authoring_features/StagingQuery.md

Signed-off-by: Nikhil <[email protected]>

* WIP

* [Stats] derivations (#476)

* [Stats] Derivations

* 2.11 friendly

* Docsite - new theme sphinx book theme and updated logo

* toc sections

* css clean up

* fixes

* fixes

* Fixing a number minor issues in documentation (#656)

* Fixing a number minor issues in documentation

* CHIP-1 Online IR and GetRequest Caching (#629)

* Add CHIP-1

* Update

* Small tweaks, fixes

* Small tweaks, fixes

* Add diagrams

* tweak diagrams

* Update date

* Change: also apply caching to no-agg and snapshot accurate code paths.

* Add Step 1 Prod results

* Minor fixes to documentation (#659)

Minor documentation fixes

* Proofread edits to docs made while reading through docs (#658)

* Proof-reading docs

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: mears-stripe <[email protected]>


---------

Signed-off-by: mears-stripe <[email protected]>
Co-authored-by: Varant Zanoyan <[email protected]>

* Vz  update main readme (#661)

* Making main project README consistent with quickstart

* Modifying the readme pointer

* [Release] 0.0.62 (#660)

* Release

* Version bump

* Improving Documentation for StagingQuery (#663)

* Update docs/source/authoring_features/StagingQuery.md

Co-authored-by: Pengyu Hou <[email protected]>
Signed-off-by: Varant Zanoyan <[email protected]>

---------

Signed-off-by: Varant Zanoyan <[email protected]>
Co-authored-by: Pengyu Hou <[email protected]>

* [Stats] Include left on backfill stats (#664)

* [Stats] Include left columns on backfill

* Update test

* More cleanup

* Reset 2.12 as main build

* Add forcebackfill mechanism

* Scalafmt

* Version compat exporter test

* Add tiling to fetcher codepaths (#531)

* wip

tiled ir basic impl

change stripe wording

rebase

wip

wip

use groupby metadata to check instead of param

cleanup

test added and working

rm unnecessary json in tests

failing test

fix col agg access

add more comprehensive tests

address pr feedback

* clean up indices defined in tests

* move key encoding to function in kvstore

* fix tilecodec linter error

* fix linter error asJava->toJava

* rm comma from row end

* fix tilecodec tests

* rm comma

* clean up comment

* fixes after rebase

* add dataset distinction

* clean up naming

* fix build issues

* scalafmt

* fix build issues

* rm old func

* address most pr feedback

* [Doc Bash] Add Derived Features section  (#665)

* start

* add

* refine

* Bring back mis-typed-key handling + add tests

* fix up mis-typed key in responses

* set table properties and output namespace for join source

* pep8

* wip

* clean up

* clean up

* wip

* fix failed ut

* clean up

* use TsUtils

* fix failed ut

* Default log level should be info

* Add odm & label doc

* fix

* doc update

* fix link

* Update docs/source/authoring_features/Bootstrap.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Sophie <[email protected]>

* Update docs/source/ODMScenarios.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Sophie <[email protected]>

* Update docs/source/ODMScenarios.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Sophie <[email protected]>

* Update docs/source/ODMScenarios.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Sophie <[email protected]>

* Update docs/source/LabelJoin.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Sophie <[email protected]>

* Update docs/source/ODMScenarios.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Sophie <[email protected]>

* fixes

* merge docs to Join.md

* update

* more update

* remove case number

* Early exit on pre-filled ranges + repl testing

* exception handling fix

* Update twine command to be more precise

With the new pip resolve, uploading tar balls will cause inconsistency, so we have to upload the wheel only.

In theory twine should throw an error when tar is being uploaded, but instead it silently allows the upload.

Signed-off-by: Nikhil <[email protected]>

* Setting version to 0.0.64

* Setting version to 0.0.65-SNAPSHOT

* Add more logging and error handling to partition pruning

* [StatsFetch] Ignore bad decodes (#677)

* use FastDateFormat in TsUtils.scala (#676)

* [DocBash] adding chaining feature documentation (#669)

* doc bash

* address comments.

* Update docs/source/authoring_features/ChainingFeatures.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Pengyu Hou <[email protected]>

* Update docs/source/authoring_features/ChainingFeatures.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Pengyu Hou <[email protected]>

* Update docs/source/authoring_features/ChainingFeatures.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Pengyu Hou <[email protected]>

* Update docs/source/authoring_features/ChainingFeatures.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Pengyu Hou <[email protected]>

* Update docs/source/authoring_features/ChainingFeatures.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Pengyu Hou <[email protected]>

* Update docs/source/authoring_features/ChainingFeatures.md

Co-authored-by: Varant Zanoyan <[email protected]>
Signed-off-by: Pengyu Hou <[email protected]>

* added in index.rst

---------

Signed-off-by: Pengyu Hou <[email protected]>
Co-authored-by: Varant Zanoyan <[email protected]>

* Enable parallel backfill in run.py (#672)

* wip

* wip

* wip

* added ut

* fix indent

* added blank line

* simplification

* Update run.py

update description. 

Signed-off-by: Pengyu Hou <[email protected]>

---------

Signed-off-by: Pengyu Hou <[email protected]>

* [Release] 66

* prep for renaming master branch to main

* undo changes to non chronon refs

* undo changes to non chronon refs

* convert assignment to equality check

* Fix some Markdown violations in CONTRIBUTE (#687)

As above, I noticed some rendering issues in the GitHub UI and followed
(most of) the linter recommendations to address. Also the link from the
README was broken.

* Remove github actions - we have circle ci (#689)

* Deprecate spark 2_11 tests (#691)

* Deprecate spark 2_11 tests

* remove ref

* Remove embedded from our build (#692)

* Remove embedded from our build

* nits

* Revert "Remove embedded from our build (#692)" (#693)

This reverts commit 596dacc03b04c46d62e949712930aa538351a93f.

* Update run.py (#696)

add missing space

Signed-off-by: Pengyu Hou <[email protected]>

* Add tiled implementation of the Flink app (#627)

* Add custom triggers

* Move triggers

* Add KeySelector

* Comments

* Rename tiling package to window

* WIP runTiledGroupByJob

* Comment-out AsyncKVStoreWriterTest.scala ? question mark ?

* Add ChrononFlinkRowAggregators

* Refactor AvroCodec slightly

* Add TiledAvroCodecFn

* Add LateEventCounter

* Finish runTiledGroupByJob

* Add ChrononFlinkRowAggregationFunctionTest

* Add missing @Test decorator

* Add KeySelector tests

* Add e2e tiled test

* Scalafmt

* Comments

* Uncomment AsyncKVStoreWriterTest

* Remove slot sharing so that test finally halts

* Tweak strings in key selector test

* Rename files, change comments

* keyToBytes in process function should convert to array first

* Refactor tiled Flink test, use watermark strategy

* Improve e2e test so that we check actual tile IRs

* rm debug=true

* Use log4j

* Remove comment

* Minor clean up, change comments

* scalafmt

* Add missing getSmallestWindowResolutionInMillis

* Add missing tiledCodec

* Enable debug logs in tests

* Info instead of debug

* Fix lack of isolation in test sink

* Make BaseAvroCodecFn abstract

* Update FlinkJob comments

* Comment

* Move getSmallestWindowResolutionInMillis to GroupByOps

* Use new GroupByOps method, fix mistake

* Revert "Move getSmallestWindowResolutionInMillis to GroupByOps"

* Use toScala, use multiline strings

* Use logger.debug

* Scalafmt

* Fix compile error

* Add documentation on the tiled architecture and the Flink job (#657)

* WIP tiled architecture

* Add diagrams

* First draft done

* Typos

* Type

* Update Tiled_Architecture.md based on comments

* Split into two docs, Flink and Tiling

* Add numbers for latency decrease

* Minor changes addressing PR comments

* Proofreading

* feat: isolating Join Part computation (#684)

* feat: allow only backfill selected join parts

* unit test

* add logging

---------

Co-authored-by: Donghan Zhang <[email protected]>

* Rename CONTRIBUTE to CONTRIBUTING (#703)

Trivial change so GitHub picks it up and shows the guidelines e.g.
on github.com/airbnb/chronon/contribute

https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors

* [TableUtils] Prevent hard failures on duplicate setups (#681)

* [TableUtils] Prevent hard failures on duplicate setups

* Return a dataframe?

* Add brickhouse resource

* Resource: jar renaming

* Use a local UDF

* Spark 2.4/Spark 3 friendly solution

* Extraneous input

* [Driver] Allow AtMillis on fetch CLI (#697)

* [StagingQuery] allow java serializer for staging queries (#694)

* [StagingQuery] allow java serializer for staging queries

* Add argument to javaFetcher

* Update version.sbt (#707)

SNAPSHOT should be in the suffix

Signed-off-by: Cristian Figueroa <[email protected]>

* Add local env runtime variables override for local testing  (#698)

* add env variable

* remove unused change

* remove unused change

* add new line to the end of the file

* update unit tests

* fix bugs in run.py

* fix bug

* use prod env for default team

* Update api/py/ai/chronon/repo/run.py

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: hanyuli1995 <[email protected]>

* Update api/py/ai/chronon/repo/run.py

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: hanyuli1995 <[email protected]>

* Update api/py/ai/chronon/repo/run.py

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: hanyuli1995 <[email protected]>

* Update api/py/ai/chronon/repo/run.py

Co-authored-by: Nikhil <[email protected]>
Signed-off-by: hanyuli1995 <[email protected]>

---------

Signed-off-by: hanyuli1995 <[email protected]>
Co-authored-by: yuli_han <[email protected]>
Co-authored-by: Nikhil <[email protected]>

* Fix build by removing trailing whitespace (#714)

* Join Spark optimization (#705)

* optimizations for small joins, and option for iceberg table creation

* Use python3 explicitly for twine uploads + Update package version (#715)

* chore: add pre-commit and the hooks (#717)

* feat: add pre commit

* requirements

* documentation

* blank line

---------

Co-authored-by: Donghan Zhang <[email protected]>

* Update readme.md

Signed-off-by: Nikhil <[email protected]>

* Setting version to 0.0.68

* Setting version to 0.0.69-SNAPSHOT

* Add Flink and Tiled Architecture docs to the website (#708)

Add Flink and Tiled Architecture to the website

* Updating Chronon site with the new logo

* dark bg logo

* into and index should be same

* _static prefix to images

* normalize intro and index

* py lint

* Fix CatalystUtil conversions to handle array inputs (#711)

# Conflicts:
#	online/src/test/scala/ai/chronon/online/test/CatalystUtilTest.scala

* Addressing feedback from Stripe's OSS team (#713)

* OSS prep

* Fixes

* Fix

* Fixes

* Add reference to chronon.ai

* Rephrase

* correctly link to chronon.ai

* Add Caio and Divya to AUTHORS

* Derivation error handling to be non-blocking and return partial responses  (#716)

* derivation partial response

* add a comment

* wip

* fix compilation error

* clean up

* renaming

* wip

* scalafmt

* more logging

* apply rename when derivation fails

* Fix code formatting in Flink.md (#720)

Fix formatting of code in Flink.md

* Setting version to 0.0.69

* Setting version to 0.0.70-SNAPSHOT

* Update devnotes.md (#722)

Signed-off-by: Pengyu Hou <[email protected]>

* [Fetcher] Don't throw for old query time; Use datadog to count it instead (#724)

use datadog to measure old query timestamp

* Update version to 0.0.70 (#726)

* resolve

* Setting version to 0.0.71-SNAPSHOT

---------

Signed-off-by: Pengyu Hou <[email protected]>

* [StagingQuery] Set sparkSession to not use Kryo in Driver. (#710)

* Final fix

* added comments.

Signed-off-by: Pengyu Hou <[email protected]>

* scalafmt

---------

Signed-off-by: Pengyu Hou <[email protected]>
Co-authored-by: Pengyu Hou <[email protected]>

* [Fetcher] Adding feature name to the exception message (#727)

resolve conflicts

* Support derivation for groupBy online fetch (#712)

* add initial code for group by derivation

* resolve conflict

* fix bug

* resolve conflict

* format

* remove unused log

* resolve conflict

* fix bug

* add tests

* fix bugs

* resolve conflict

* resolve conflict

* remove duplicate code

* reformat

* fix unit tests

* update the error handling

* format

* error handling behavior update

* fix bug

* reformat

* fix bug

* remove conflict code

* fix bug

* resolve conflict

* format change

* fix bug

* add exception metrics logging

---------

Co-authored-by: yuli_han <[email protected]>
Co-authored-by: yuli_han <[email protected]>

* Refactor Driver.scala and expose skip first hole as param  (#729)

* driver refactor

* adding group by name

* BaseJoin -> JoinBase

* chipping away at compile issues

* more changes, tests still failing

* some logs, analyzer failures

* explicitly passing in tableutils to DfWithStats

* debugging

* i guess keymapping isn't getting applied unless renamedLeftDf is invoked?

* ran down the analyzer issue

* reduce row count in fetchertest

* fixing lag and chainingfetchertest

* reducing mutation count in test

* still struggling to reproduce

* halfway figured out the validation data availability issue

* bump version

* Fix where clauses for LHS non partitioned temporal events render query (#211)

* Fix where clauses for LHS temporal events render query

* Add unit test to verify result

* fix exception if no wheres provided

* Filter out India data (#210)

* Filter out India data

* empty -> null

* temporarily disable testing in CI

* Undo temporarily disabling tests in CI

* Remove unused imports

* Add APPROX_HISTOGRAM_K Operation (#207)

Add APPROX_HISTOGRAM_K Operation

* Update date range test (#213)

* Update date range test

* Fix broken test

* rename namespace

* Fix bug in CatalystUtil causing `where`s to not be applied (#739) (#218)

* Add catalyst util test

* Apply Piyush's patch

* No-op changes

* Make test succeed & update name

* Reduce repetition

* Add test for correct filtering

* Add unsafe projection to fix tests

* Add comment

Co-authored-by: Caio Camatta (Stripe) <[email protected]>

* Add skew and kurtosis operations (#217)

* Update ApproxHistogram operation to only use approx values above k (#219)

* Databricks Integration V0 (#206)

* Add some hacks to get databricks notebooks working

* Add run join to pysparkutls

* Allow time column to be overrideable

* add new table utils temporarily to runJoin

* add new table utils temporarily to runJoin

* parsed strings have quotes in them for some reason

* add checks to python side to verify setup is correct for running in a notebook

* Add run methods for group bys + joins

* Finished adding analyzer + validator

* Remove unnecessary imports

* Fix comments in PySparkUtils

* Revert unneeded change to TableUtils

* Fix team + name for join

* Fix typo

* Init constants provider when the Notebooks wrappers are created

* Spark 3.3.0 changes. Also cleaned up databricks extensions.

* Successfully validating + analyzing + running a gb

* Push changes to databricks extensions + pyspark utils b4 opening separate pr to have analyzer + validator return case class

* Stop returning java obj for validate

* Add print stmts to hold us over until we figure out logging

* Make requested changes from @mears

* Change output namespace to

* switch to base table utils

* All set for merge

* bump version

* Update spark/src/main/scala/ai/chronon/spark/PySparkUtils.scala

Co-authored-by: Ben Mears <[email protected]>

* Update spark/src/main/scala/ai/chronon/spark/PySparkUtils.scala

Co-authored-by: Ben Mears <[email protected]>

---------

Co-authored-by: Ben Mears <[email protected]>

* Add metrics around constructGroupByResponse ops (#223)

* one more fix

* saveWithTableUtils

* missed a python change

* adding else

* tests back in

* didn't pass stuff all the way in

* testing bootstrap issues

* undoing some debugging

* one more log

* integrating FlagStore (https://github.com/airbnb/chronon/pull/686/)

* don't try to find module name when name is set (#225)

* Don't try to get the module_name if name is specified

* Bump version

* Remove use of DBUtils (#226)

* Have Shepherd Databricks use DatabricksTableUtils so that we can write iceberg tables to `chronon_poc_usertables` (#228)

* Have Join use DatabricksTableUtils

* Successfully writing iceberg tables to chronon_poc_usertables for Joins

* Clean up code

* Remove comment

* Update table utils to point at DatabricksTableUtils and update print stmts for validate + see cluster details

* Remove unused imports

* Update version + namespace assertion

* Stop using implicit base table utils in group by backfill

* Handle case of no table properties

* All working for GroupBys + Joins

* Fix typo

* Shepherd Databricks Raw S3 Prefix Support (#229)

* Shepherd Databricks Raw S3 Prefix Support

* Create S3Utils

* Shepherd Databricks Staging Query Support  (#230)

* Shepherd Databricks Raw S3 Prefix Support

* Add Staging Query for Databricks Notebooks

* Create S3Utils

* Drop the table before the run to ensure we arent having two different queries populate the same staging query table

* Add assertion for the namespace

* Move databricks items to new package + create a constant provider for databricks + have databricks extensions .py point at the new constant provider (#232)

* Update approx histogram outputs to use Java Maps (#231)

* Bump shared Skycfg libraries to cadaa835ca180a13057f0922c7c050e99484f234 (#233)

Updated automatically using `/Users/prashanthpai/stripe/pay-server/config/kubernetes/bin/vendor-skycfg`
Upgrading shared Skycfg libraries in chronon.

Old version: e9700f69822c0e874b86380fe59ce400d404e34f
New version: cadaa835ca180a13057f0922c7c050e99484f234

Changes:
cadaa835ca180 Ban Consul for all Shared MSP services (#811568)
5037391ee62f4 Add mspaa to config_srv (#812783)
094df68881786 [sn-proxyless] implement inplace registration skyconfig (#812296)
7e98c8169ac59 Introduce skycfg interface to gem overrides (#811653)
d0fb2e3a25ae2 [go-profiler-sidecar] support multiple prof modes (#811639)
2101df0a1b281 [issuing] try updating mb per worker to fix oom in QA and Preprod monster-issuing-msp (#810958)
da4469de2517b Increase memory per worker pod for monster risk (#811919)
9db38d35b1fe2 [charges-render-srv] Enable host isolation (#788728)
bfea2ec8a9f9e allow users to specify the prof_modes (#810512)
b8c3a0fe71f30 Reapply "Scale Monster in CMH for HADR's full failover launch" (#804710) (#810503)
76b9ec9345d93 Add missing bapi pieces to bapling skycfg (#809089)
e778bacae7e08 Monster scaleup and transaction ack consumer scaleup  (#810160)
24472cc98a200 Simplify granularity for host isolation allowlist (#810352)
526cef201d10c [faas] Set max_surge to 20% for faas worker rolling deployments (#809952)
00ce282c00294 [kproxy-batch] bump Envoy file descriptor usage to 1.5X of fleet default (#809936)
bff0a5062668c Revert "Revert "Revert "Allow metrics-sidecar to accept a batch_write_size and num_workers parameter""" (#809894)
2949989baa619 Adding 50 pods to monster general in northwest due to increased traffic (#809448)
214d62341c54d Revert "Revert "Allow metrics-sidecar to accept a batch_write_size and num_workers parameter"" (#808509)
b1992991ea3d5 [consumer-bridge-srv] Enable host isolation (#790519)
184fbee70a3e9 [Event Manager] scale up Event Manager concurrency to gain parity with the new Revolve topic (#807195)
58e92ad366954 Increase memory on sigma workers (#808183)
22e7affc13fad Scaled Object Fixes + Immediate Scaling Support + Default Fallback (#806914)
72e3de977ca14 Revert "Allow metrics-sidecar to accept a batch_write_size and num_workers parameter" (#806163)
b77661286edd8 Handle None in Golden Config 'platform_options.disable_consul' (#805385)
57d90c80cd456 Allow metrics-sidecar to accept a batch_write_size and num_workers parameter (#805268)
95f41cab771b7 Enroll admin-srv in weekend down scaling (#805636)
aca841637cb40 Revert "Scale Monster in CMH for HADR's full failover launch" (#804710)
ed6888eb0c3eb Use KEDA Paused-Replicas annotation to pause scaling during deployments (#804612)
6558b706df7c9 Add new supported instance types to shared MSP (#803001)
cdd741983a614 don't let any non-QA small worker cluster go below nominal (#803942)
7536b05861d0e avoid harm to QA Monster general during deployments (#803967)
c19fe14217311 scale up Monster workers in QA northwest (#803832)
33a624c62be67 don't let preprod consume worker clusters go below nominal (#803773)
bd091ef838522 Increase replica count for consumer replicas in monster general (#803601)
406f32c9f9caf Scale Monster in CMH for HADR's full failover launch (#803379)
86c31c92c9a01 config/kubernetes: add cronjob.active_deadline_seconds (#802391)
d132b36afb943 Add volume mount for /pay/conf in memcontainer (#802674)
41d0df68fff41 Scale down monster webhooks control plane in preprod (#802635)
4d9d73d8ee842 Update memcontainer bootstrap command to use proxy image. (#802198)

* Set output types for approx histograms to LongType (#234)

* Update output types of ApproxHistograms (#235)

* Fix params order for databricks join assertions (#236)

* Fix params order for databricks join assertions

* Bump version

* Andrewlee/ir coffee originate split up gbu (#237)

* grabbing changes from https://git.corp.stripe.com/stripe-private-oss-forks/chronon/pull/220/

* adding spark constant

* forgot to remove local spark hack

* only cache if we're splitting

* unpersist keyed before generating KvRdd

* no tests

* put tests back in

* fixing hourlyjointest

* fixes from merge

* adding debug logs for async feature logging

* [Revert] always print fetcher debug logs

* fixes from https://github.com/airbnb/chronon/pull/767

* initial PR feedback

* more PR feedback

* wip, working on catching exceptions

* Revert "wip, working on catching exceptions"

This reverts commit b4a55093c671162611d8fae7b7314c2c545eab49.

* Remove some todos

* Fix GroupByUpload rebase miss

* Remove tile layering feature flag check and enable it by default (#254)

enable tile layering by default

* Add unit test for tile aggregation with [6hr, 20m] tiles (#257) (#259)

* Add test

* use long

* Port over fix from #198 to new rebase location (#261)

* Remove feature flag check for online IR caching (#260)

Remove feature flag check for online IR caching (#247)

* Remove flag check for caching

* Fix tests

* Fix silly mistake

* fix the missing external features in the _comparison_v2 table

* Remove notebook name from prefix  (#264)

* Stop prefixing table name with the notebook name. It's causing too many issues.

* bump version

* don't need re anymore

* fix org.apache.spark.sql.AnalysisException: cannot resolve '' given input columns:  bug

* Revert "Merge remote-tracking branch 'origin/andrewlee/rebase-2024-03-25' into andrewlee/rebase-2024-03-25"

This reverts commit d403755b3b6ac884bf63e0fc84632ba808e346ca, reversing
changes made to a445d7072a37906b67d61e33ff6732e4fdb5687f.

Revert "fix org.apache.spark.sql.AnalysisException: cannot resolve '' given input columns:  bug"

This reverts commit a445d7072a37906b67d61e33ff6732e4fdb5687f.

Revert "fix the missing external features in the _comparison_v2 table"

This reverts commit d315bd7f4ca04f1797ed970b17fae93f925cdc6f.

* Remove notebook name from prefix  (#264)

* Stop prefixing table name with the notebook name. It's causing too many issues.

* bump version

* don't need re anymore

* fix consistencyJob external features always diffing at 100% (#271)

use comparisonDfNoExternalCols instead of comparisonDf
fix the missing external features in the _comparison_v2 table

* Audit Fetcher metrics - Add tags and remove redundant or unused metri… (#273)

Audit Fetcher metrics - Add tags and remove redundant or unused metrics (#269)

* tag kv store latency and join latency

* remove unnecessary kv store metrics

* remove more unnecessary kv store metrics

* Tag derivations

* More changes

* Fix compile error

* Change how join metrics are collected in fetchGroupBys

* Use ns

* rename var

* refactor

* add log to help debug schemahash missing issue (#279)

* Set default value of offline_schedule to None.

* Add model transformation python and thrift to rebase branch (#299)

* Add model transformation thrift objects + python constructors

* bump version

* won't use the python version for this branch so revert unnecessary changes

* [rebase] Create and traverse avro schemas once per task (#301)

* Cleanup and isolate the schema perf changes

* add an _

* Also update inmemorystream

* Add option to force bloom filters (#320)

* Use String instead of Array comparisons in bootstrap covering sets (#323)

* Update Model Transformation thrift for rebase (#328)

* Update thrift file for rebase

* Move sha to inference spec

* Remove sha - have it be part of the model backend params

* [rebase] Custom spark scan parallelism (#317)

* Allow configuration of spark scan parallelism

* Use 1 task for meta rows

* Enable forcing additional bloom filters (#332)

* Fix join test flaky bug and add log (#331)

* add log for assertions in JoinTest to debug flaky tests

* fix the flaky test due to dfTemp not being used

* revert the change on ds <= '2021-01-01'

* Rebase branch gbu avoid count (#318)

* avoid calling count()

address comments

use Long instead of Int. Fix build error

fix build

fix build. use Long

fix build. Use different table name in tests

* try to avoid JoinTest failure

* add comments in JoinTest

* less pref version

* revert the stats.get.count

* add log for TableUtils.scala

* enhance the unit test assertions to debug the random failures

* fix the param readout when its set to an invalid value

* add log for debugging the local run

* fix bug. add println log

* add unit tests address comments

* remove tests that rely on println. CI/CD Build will not capture the printed lines

* fix unit tests using unique table names

* typo

* remove duplicated loglines

* Update date logic on rebase branch (#339)

* Update date logic in S3Utils

* Update date logic in S3Utils

* fixes

* Camweston/python api rebase (#333)

* Audit Fetcher metrics - Add tags and remove redundant or unused metrics (#269)

* tag kv store latency and join latency

* remove unnecessary kv store metrics

* remove more unnecessary kv store metrics

* Tag derivations

* More changes

* Fix compile error

* Change how join metrics are collected in fetchGroupBys

* Use ns

* rename var

* refactor

* Wrap BOUNDED_UNIQUE_COUNT Operation to take arguments (#280)

* Wrap BOUNDED_UNIQUE_COUNT Operation to take arguments

* Bump version

* Update Henson ownership info (#281)

* TTL cache retries more frequently on failures (#275)

* TTL cache retries more frequently on failures

* Testing

* Merge everything into one class

* Test with current thread executor
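
The idea behind "TTL cache retries more frequently on failures" can be sketched as below. This is only an illustration under stated assumptions, not the code from #275: the class name `TTLCache`, its parameters, and the `loader` callback are all hypothetical. It shows a cache that remembers a failed load for a shorter time than a successful one, so the next retry happens sooner.

```python
import time


class TTLCache:
    """Hypothetical cache: successes live for ttl_seconds, failures for failure_ttl_seconds."""

    def __init__(self, loader, ttl_seconds=60.0, failure_ttl_seconds=5.0,
                 clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._failure_ttl = failure_ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, error, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now >= entry[2]:
            # Missing or expired: reload, caching failures with the shorter TTL.
            try:
                entry = (self._loader(key), None, now + self._ttl)
            except Exception as exc:
                entry = (None, exc, now + self._failure_ttl)
            self._entries[key] = entry
        value, error, _ = entry
        if error is not None:
            raise error
        return value
```

Passing the clock in as a parameter is a choice made for this sketch so the expiry logic can be exercised without sleeping.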

* Rollout new failure TTL (#288)

* Guard changed failure TTL behind feature flag

* Debug logs

* Change FF call to match Java API

* Add python + thrift changes for model transformations (#303)

* Add python + thrift changes for model transformations

* Fix thrift typo

* Enable skip validations for bootstrapping new tables that are leaving behind feature groups. (#302)

* Bump shared Skycfg libraries to f60cb6a0cb84b95aa6f63094df4d92f1c806a7db (#309)

Updated automatically using `/Users/hans/stripe/pay-server/config/kubernetes/bin/vendor-skycfg`
Upgrading shared Skycfg libraries in chronon.

Old version: 36eb4d2c0117e1b4d5dec4699313702ac37618a6
New version: f60cb6a0cb84b95aa6f63094df4d92f1c806a7db

Changes:
 * f60cb6a0cb84b Increase Ruby 3.3 VWA slot 1 -> 5M slots (#880645)
 * adcf09c400f0b Use next ruby for 100% of MSP services (#882695)
 * 254659a7e15c2 Finish confidant-sidecar cleanup (#881201)
 * 59f30141541b7 Revolve Monster Autoscaling: Set immediate autoscaling and lower minimum replica count to make difference more pronounced (#881862)
 * 1b3cde15f3f51 Scale down monster mobile hostset to reduce costs (#881981)
 * 76715884589e0 Revolve Monster: Add test scheduled scaling for CMH consume replicas (#881658)
 * 946e850e0a7c5 Shift 50% of upper priority m-b-s/a-r-r-s hosts to Ruby 3.3 (#880582)
 * ecbb87f1c4227 Remove confidant-init container (#879728)
 * 57386efb8fd88 Mark shards RC (#880469)
 * 665b93fc1834f Lower half of m-b-s/a-r-r-s -> 50% Ruby 3.3 (#880466)
 * 3bc80c21783fa [skycfg] Add label: henson.stripe.io/onebox-original-max-replicas (#879263)
 * 3c547f367a060 Remove MSP doc references to Confidant (#879654)
 * ec9c10f66b5fe Revert "Remove Confidant sidecar and confidant.sky" (#879603)
 * 78019d1941a4d Remove Confidant sidecar and confidant.sky (#878893)
 * 4ab03020c1bbd Add preferred tier to bom mspbb (#879112)
 * 43c9b9accf508 Remove confidant.sky secret usages (#877099)
 * 7efc01dcf61ae Add preferred availability tier to msp_shards.sky (#876868)
 * 487fd39a2b784 Reduce monster-sweeper instance size to r5_large in QA (#877007)
 * c744483d39d62 Add support for annotating sirn on MSP resources (#875749)
 * caa5b5fa9457a Remove unused flags from pandora.sky (#876834)
 * ea452726ae0c5 monster-general scaling (#876512)
 * 78c4c90e99ff0 Do not vendor kubernetes config in stripe-js-v3. (#876441)
 * 589c014b355aa Increase max for model-ingestion in bom (#876083)
 * d2467afe85fa0 Remove unused confidant-sidecar flags in `pay-server-cron_test.sky` (#875741)
 * 1cfa94d4214f5 Update responsible_team from async-processing to core-events (#875955)
 * e4fa41914cd65 Mark prod/east/mspaa GA (#875663)
 * b3416e4e243d3 Add workers label to bapi_service pods (#874057)
 * 66d971c6e8542 Enable autoscaling for monster workers general/qa/bom (#873836)
 * 1ad9ac1743cff Set up autoscaling for model-ingestion workers in BOM (#874515)
 * c0b238871fdb8 All A1xx (except super important) -> 100% Ruby 3.3 (#874039)
 * 8ef5d1371d5ff Mark preprod/cmh/mspcc RC (#873243)
 * fd894c3722ba1 Bump a1xx final slice to 50% Ruby 3.3 (#873366)
 * fe83130a3f639 Fix monster shard migrations in QA (#872949)
 * 9517a02fe6c12 Move final set of a1xx priority tiers -> 25% Ruby 3.3 (#872797)
 * d65dff5dbbfe0 Revolve Monster Config: Add a comment asking people to let us know if they change revolve monster host set config (#872549)
 * c6ca379580401 Super important p5 performance -> 100% Ruby 3.3 (#872444)
 * 0ea8d27e5724b Add priority tier labels to containers when a service uses host isolation (#871764)
 * 205adf26a2047 A1xx livemode et al -> 100% Ruby 3.3 (#872406)
 * 8ac3f65ce7fab Remove new shards to unlock deploys (#872241)
 * 21e8d0fe032cd Generate unique daily branches when vendoring skycfg (#871241)
 * e9d8fcd758fde [skycfg] Update skycfg libraries to support onebox deploys (#868398)
 * 6ebab71e856c6 Enabling bin-packing in Prod/NW/mspruby (#871649)
 * ef03f43dd76e5 Enable binpacking in test shards (#871525)
 * 9a2d35ed2fcd1 Default value of priority_tier should be 'livemode' (#871389)
 * 165862bb76934 Mark shards RC (#871195)
 * ae2ac7b4a8768 Bump a1xx livemode et al to 50% (#870835)
 * 86d17c45df0a6 25% livemode/p4/p3/p2 for a1xx (important but not super important) services (#870387)
 * 919314a2952ef Dedupe groupNames when building SecurityGroupPolicy CRD (#867914)
 * 14401add0a333 [IAM-CONTROL-PLANE-BRIDGE] Enable Host Isolation (#864791)
 * 69d090abcc1b8 [envoy-config-srv] allow enabling outbound websockets in skyconfig (#870114)
 * f339b05402573 Mark shards GA (#870223)
 * 70208c59c1328 Add new shards to all available shards (#869633)
 * d77404931180f Ruby 3.3 a1xx testmode -> 100%, super important p5 -> 5% (#868472)
 * fb0f57024d39b Address more QA Monster Consumer Delivery Lag Issue (#868291)
 * 874ff197776f5 prod `monster-general` YJIT 100% -> 0% (#868215)
 * eca57d1b7a937 Enable bin-packing only in mspaa across QA, Preprod and Prod (#863052)
 * 8724ea14c2fbd Add map on service-name for SecurityGroupPolicy matchLabels (#867889)
 * 8dd9c1681709b Standardize another batch of baplings to use standardized provisioning (#867415)
 * c2bb1ad629dab testmode/p5 A1xx -> 50% Ruby 3.3 (#867536)
 * 4f1c57eb1e316 Disable bin-packing in prod/cmh,bom shards (#867488)
 * 5fbd580db36d7 Standardize bapling skycfg customization (#865383)
 * 0bad721bf1ef4 Remove preprod workers from consumer-intel, checkout (#867084)
 * cdcd76a5f570e Add availability tier to metrics sidecar (#867088)
 * 4a2880f3ec233 [PPRO-FPI-SRV] Enable Host Isolation (#865099)
 * dd36257996a9b [RUN_OBS-109034] Disable `sox_control` for `production-profiler-delete-old-profiles` (#866117)
 * 963c3dbfb9946 Remove skycfg unvendored to zoolander (#866607)
 * 7825df6c0e226 A1xx batch/batch-critical priority tier -> 100% Ruby 3.3 (#866536)
 * ca13dfdb45fbf [sn-proxyless] add proxyless service registration validations (#865794)
 * e7fb20509bf3a exclude growth i18n crons from SOX (#864975)
 * 65f5a2d36e29c Finalize min counts for new autoscaling (#865596)
 * 1903b325e0e88 A1xx batch/batch-critical -> 50% Ruby 3.3 (#865552)
 * fb0e79035f8dc exclude filesforusers jobs from sox compliance (#864785)
 * e47a10486eb18 Mark new shards RC (#864406)
 * 0d3220cc79956 Remove myself from pay-server priority tier reviewers (#864970)
 * f68609c6f3b65 Add explicit replica counts for preprod (#864892)
 * 870413bd85e3c A1xx synthetics priority tier -> 100% Ruby 3.3 (#864685)
 * 865ee3ae6ccb7 Set up new autoscaling for model-ingestion (#862874)
 * 1c9d1b2b3d916 Set up new autoscaling for monster-sigma in prod (#862899)
 * fd6be108854f7 All A1xx services -> 100% preprod Ruby 3.3 (#864585)
 * 45bc6d5e2f7e1 Bump puma to 6.4.2 for 10% of capital-api-srv hosts (#863116)
 * 9e1a5bbc03efe A200 100% Ruby 3.3 (#862848)
 * 66640e8034354 Add new shards to msp-shards-config (#863575)
 * c4f9f0da418d5 Monster autoscaling migration: set model-ingestion and sigma to max replicas (#862839)
 * 398e373b81b60 Enable top half of A200 50% (#861524)
 * 2d6dd9d40b6fa Set up new autoscaling for sigma in QA (#860576)
 * b3b31af58ee4e Mark shards GA (#861480)
 * c9345eb328c98 YJIT 10% -> 50% in prod `monster-general` (#861064)
 * cf251c08aa82d Save some money in monster-general (#861078)
 * 7f06e1b8edda8 Return mainland by default in get_account_for_shard (#861002)
 * 453ec8a7e43c3 Disable old Sigma autoscaling in QA (#860563)
 * 2deb9623d83f9 Enable Security Group Policies for non-mainland shards by default (#860413)
 * 16bd2b211cb43 [API-6908] Bulk provision many baplings (#860193)
 * 20cf7198056ee RUN_OBS-107834: disable sox for production-profiler-delete-old-profiles (#860210)
 * eea403bc4cebe YJIT 0 -> 10% prod in `monster-general` (#860048)
 * 74d5e0af1ed81 Mark new shards as RC (#859263)
 * c1311be83339d [API-6908] bapling scaffolding for accounts-bapi-srv (#859439)
 * bbd874fc98f7d Update QA nw and cmh link monster replicas (#859087)
 * eee6963aa6f58 YJIT 50% -> 100%  in `monster-general` in `preprod` and `qa` (#859155)
 * a43547d532a52 Add new shards to all available shards (#857826)
 * dc1589ad5771f Address QA Monster Consumer Delivery Lag Issue (#858710)
 * 30d1483d12ca2 Enable YJIT to 50% in `monster-general` in `preprod` and `qa` (#844517)

* Bump shared Skycfg libraries to 27d5aa12e4005340d8d6c7af14c47fb3a890093e (#316)

Updated automatically using `/Users/lanfeng/stripe/pay-server/config/kubernetes/bin/vendor-skycfg`
Upgrading shared Skycfg libraries in chronon.

Old version: f60cb6a0cb84b95aa6f63094df4d92f1c806a7db
New version: 27d5aa12e4005340d8d6c7af14c47fb3a890093e

Changes:
 * 27d5aa12e4005 [skycfg] read shards from config-srv (#887403)
 * cf4881ef83154 Define configs for new `monster-link-preference` service (#886430)
 * 62da4597c1dfc Remove unused flags from pay-server cron (#887880)
 * ed249513afb64 Removed plee from priority tiering reviewers (#887370)
 * 60908160e1190 remove references of consul-agent-init-image from skycfg (#887983)
 * 33a1854d9d436 Rollout yjit to 100% for baplings (#887205)
 * bf560466ad6bd Allow manual credentials-proxy setting overrides for Pandora (#885508)
 * 81b5278a38100 Add `ronaldso` as a reviewer in training for Connections FDP (#884200)
 * 314a14f1eef2e Do not add external scaling control annotation for bg services (#885405)
 * 1b8624cc3ab27 Revert "Label stripe_pod with cost attribution team" (#884891)
 * 2ba4d29107318 Turn yjit on for 50% of baplings (#884787)
 * 11ebd7633dbaa Forbid adding account-cluster SG to deployments (#884366)
 * 27ed8ec4738c8 Set aside memory overhead for g4dn.xlarge ASG (#884265)
 * 6ea6a23dcf77c Turn on yjit for 10% of pods in all baplings (#883821)
 * e810dba271756 Revolve Monster: Fix scheduled scaling cron rules, always scale down on 2nd (#884008)
 * f841423d05f98 Scale up fanout workers and reduce workers per pod for general (#884002)
 * 6e5fd8f656a79 Remove new shards (#883950)
 * 0bf369678928e [FUNDING-1418] Create  Confirmation of Payee Requester Service (#883660)
 * 219aa5c64f66d Revolve Monster: Setup scheduled scaling for CMH and NW (#882785)

* [master] Create and traverse avro schemas once per task (#296)

* Cleanup and isolate the schema perf changes

* add an _

* Add yesterday no dash to S3 Utils (#325)

* Add logging to S3Utils

* Add ability to use yesterday_ds_nodash

* Fix end date - 1

* Fix end date - 1

* Fix spacing

* Add metadata to model transformations (#321)

* Add metadata to model transformations

* Bump version

* Add logic to handle metadata for MT

* Add util scripts

* Add sha

* Remove print stmts

* Move sha to inference spec

* Remove sha and have it be part of the backend params for inference spec

* Fix typo

* [master] Custom spark scan parallelism (#330)

* Allow configuration of spark scan parallelism

* Use 1 task for meta rows

* Fix the flaky test and add log for assertions in JoinTest for future debug of flaky tests (#326)

* add log for assertions in JoinTest to debug flaky tests

* fix the flaky test due to dfTemp not being used

* Experiment

* Update version

* Fixed all tests except for pyspark test

* Revert dev.in

* revert pyspark change

* Revert GBU

* Revert GBU

---------

Co-authored-by: Caio Camatta <[email protected]>
Co-authored-by: Jeffrey Brooks <[email protected]>
Co-authored-by: Yi Zhao <[email protected]>
Co-authored-by: David Han <[email protected]>
Co-authored-by: Hans Nielsen <[email protected]>
Co-authored-by: Lanfeng Sun <[email protected]>
Co-authored-by: Spencer McNamara <[email protected]>
Co-authored-by: Hai Wang <[email protected]>

* Add pyspark converter to rebase (#346)

* Experiment

* Added converter to rebase

* Reset GBU (pulled in a different commit by accident)

* Restore GBU

---------

Co-authored-by: Spencer McNamara <[email protected]>

* Wrap where clauses with parentheses in CatalystUtil (#354)
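
A minimal sketch of why wrapping each where clause matters, assuming the clauses are combined with `AND` (the commit title does not say how CatalystUtil combines them, and `combine_where_clauses` is a hypothetical helper, not Chronon code). In SQL, `AND` binds more tightly than `OR`, so an unwrapped clause containing `OR` can change meaning once it is joined with others.

```python
def combine_where_clauses(clauses):
    """Join filter expressions with AND, wrapping each one in parentheses."""
    return " AND ".join(f"({clause})" for clause in clauses)


clauses = ["a = 1 OR b = 2", "ds >= '2024-01-01'"]

# Without parentheses the result parses as: a = 1 OR (b = 2 AND ds >= '2024-01-01')
naive = " AND ".join(clauses)

# With parentheses each clause keeps its intended grouping.
wrapped = combine_where_clauses(clauses)
print(naive)
print(wrapped)
```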

* Store evaluation of empty results during write to compute it once (#356)

Store evaluation of empty results to compute it once

* 20240926 master rebase (#360)

* Rollout new failure TTL (#288)

* Guard changed failure TTL behind feature flag

* Debug logs

* Change FF call to match Java API

* TTL cache retries more frequently on failures (#275)

* TTL cache retries more frequently on failures

* Testing

* Merge everything into one class

* Test with current thread executor

* Add yesterday no dash to S3 Utils (#325)

* Add logging to S3Utils

* Add ability to use yesterday_ds_nodash

* Fix end date - 1

* Fix end date - 1

* Fix spacing

* Enable skip validations for bootstrapping new tables that are leaving behind feature groups. (#302)

* Fix bug with bootstrap partition column overrides (#334)

* Merge conflicts

---------

Co-authored-by: Yi Zhao <[email protected]>
Co-authored-by: Cam Weston <[email protected]>
Co-authored-by: David Han <[email protected]>

* 20240927 rebase (#361)

* Audit Fetcher metrics - Add tags and remove redundant or unused metrics (#269)

* tag kv store latency and join latency

* remove unnecessary kv store metrics

* remove more unnecessary kv store metrics

* Tag derivations

* More changes

* Fix compile error

* Change how join metrics are collected in fetchGroupBys

* Use ns

* rename var

* refactor

* Wrap BOUNDED_UNIQUE_COUNT Operation to take arguments (#280)

* Wrap BOUNDED_UNIQUE_COUNT Operation to take arguments

* Bump version

* Update Henson ownership info (#281)

* TTL cache retries more frequently on failures (#275)

* TTL cache retries more frequently on failures

* Testing

* Merge everything into one class

* Test with current thread executor

* Rollout new failure TTL (#288)

* Guard changed failure TTL behind feature flag

* Debug logs

* Change FF call to match Java API

* Add python + thrift changes for model transformations (#303)

* Add python + thrift changes for model transformations

* Fix thrift typo

* Enable skip validations for bootstrapping new tables that are leaving behind feature groups. (#302)

* Bump shared Skycfg libraries to f60cb6a0cb84b95aa6f63094df4d92f1c806a7db (#309)

Updated automatically using `/Users/hans/stripe/pay-server/config/kubernetes/bin/vendor-skycfg`
Upgrading shared Skycfg libraries in chronon.

Old version: 36eb4d2c0117e1b4d5dec4699313702ac37618a6
New version: f60cb6a0cb84b95aa6f63094df4d92f1c806a7db

Changes:
 * f60cb6a0cb84b Increase Ruby 3.3 VWA slot 1 -> 5M slots (#880645)
 * adcf09c400f0b Use next ruby for 100% of MSP services (#882695)
 * 254659a7e15c2 Finish confidant-sidecar cleanup (#881201)
 * 59f30141541b7 Revolve Monster Autoscaling: Set immediate autoscaling and lower minimum replica count to make difference more pronounced (#881862)
 * 1b3cde15f3f51 Scale down monster mobile hostset to reduce costs (#881981)
 * 76715884589e0 Revolve Monster: Add test scheduled scaling for CMH consume replicas (#881658)
 * 946e850e0a7c5 Shift 50% of upper priority m-b-s/a-r-r-s hosts to Ruby 3.3 (#880582)
 * ecbb87f1c4227 Remove confidant-init container (#879728)
 * 57386efb8fd88 Mark shards RC (#880469)
 * 665b93fc1834f Lower half of m-b-s/a-r-r-s -> 50% Ruby 3.3 (#880466)
 * 3bc80c21783fa [skycfg] Add label: henson.stripe.io/onebox-original-max-replicas (#879263)
 * 3c547f367a060 Remove MSP doc references to Confidant (#879654)
 * ec9c10f66b5fe Revert "Remove Confidant sidecar and confidant.sky" (#879603)
 * 78019d1941a4d Remove Confidant sidecar and confidant.sky (#878893)
 * 4ab03020c1bbd Add preferred tier to bom mspbb (#879112)
 * 43c9b9accf508 Remove confidant.sky secret usages (#877099)
 * 7efc01dcf61ae Add preferred availability tier to msp_shards.sky (#876868)
 * 487fd39a2b784 Reduce monster-sweeper instance size to r5_large in QA (#877007)
 * c744483d39d62 Add support for annotating sirn on MSP resources (#875749)
 * caa5b5fa9457a Remove unused flags from pandora.sky (#876834)
 * ea452726ae0c5 monster-general scaling (#876512)
 * 78c4c90e99ff0 Do not vendor kubernetes config in stripe-js-v3. (#876441)
 * 589c014b355aa Increase max for model-ingestion in bom (#876083)
 * d2467afe85fa0 Remove unused confidant-sidecar flags in `pay-server-cron_test.sky` (#875741)
 * 1cfa94d4214f5 Update responsible_team from async-processing to core-events (#875955)
 * e4fa41914cd65 Mark prod/east/mspaa GA (#875663)
 * b3416e4e243d3 Add workers label to bapi_service pods (#874057)
 * 66d971c6e8542 Enable autoscaling for monster workers general/qa/bom (#873836)
 * 1ad9ac1743cff Set up autoscaling for model-ingestion workers in BOM (#874515)
 * c0b238871fdb8 All A1xx (except super important) -> 100% Ruby 3.3 (#874039)
 * 8ef5d1371d5ff Mark preprod/cmh/mspcc RC (#873243)
 * fd894c3722ba1 Bump a1xx final slice to 50% Ruby 3.3 (#873366)
 * fe83130a3f639 Fix monster shard migrations in QA (#872949)
 * 9517a02fe6c12 Move final set of a1xx priority tiers -> 25% Ruby 3.3 (#872797)
 * d65dff5dbbfe0 Revolve Monster Config: Add a comment asking people to let us know if they change revolve monster host set config (#872549)
 * c6ca379580401 Super important p5 performance -> 100% Ruby 3.3 (#872444)
 * 0ea8d27e5724b Add priority tier labels to containers when a service uses host isolation (#871764)
 * 205adf26a2047 A1xx livemode et al -> 100% Ruby 3.3 (#872406)
 * 8ac3f65ce7fab Remove new shards to unlock deploys (#872241)
 * 21e8d0fe032cd Generate unique daily branches when vendoring skycfg (#871241)
 * e9d8fcd758fde [skycfg] Update skycfg libraries to support onebox deploys (#868398)
 * 6ebab71e856c6 Enabling bin-packing in Prod/NW/mspruby (#871649)
 * ef03f43dd76e5 Enable binpacking in test shards (#871525)
 * 9a2d35ed2fcd1 Default value of priority_tier should be 'livemode' (#871389)
 * 165862bb76934 Mark shards RC (#871195)
 * ae2ac7b4a8768 Bump a1xx livemode et al to 50% (#870835)
 * 86d17c45df0a6 25% livemode/p4/p3/p2 for a1xx (important but not super important) services (#870387)
 * 919314a2952ef Dedupe groupNames when building SecurityGroupPolicy CRD (#867914)
 * 14401add0a333 [IAM-CONTROL-PLANE-BRIDGE] Enable Host Isolation (#864791)
 * 69d090abcc1b8 [envoy-config-srv] allow enabling outbound websockets in skyconfig (#870114)
 * f339b05402573 Mark shards GA (#870223)
 * 70208c59c1328 Add new shards to all available shards (#869633)
 * d77404931180f Ruby 3.3 a1xx testmode -> 100%, super important p5 -> 5% (#868472)
 * fb0f57024d39b Address more QA Monster Consumer Delivery Lag Issue (#868291)
 * 874ff197776f5 prod `monster-general` YJIT 100% -> 0% (#868215)
 * eca57d1b7a937 Enable bin-packing only in mspaa across QA, Preprod and Prod (#863052)
 * 8724ea14c2fbd Add map on service-name for SecurityGroupPolicy matchLabels (#867889)
 * 8dd9c1681709b Standardize another batch of baplings to use standardized provisioning (#867415)
 * c2bb1ad629dab testmode/p5 A1xx -> 50% Ruby 3.3 (#867536)
 * 4f1c57eb1e316 Disable bin-packing in prod/cmh,bom shards (#867488)
 * 5fbd580db36d7 Standardize bapling skycfg customization (#865383)
 * 0bad721bf1ef4 Remove preprod workers from consumer-intel, checkout (#867084)
 * cdcd76a5f570e Add availability tier to metrics sidecar (#867088)
 * 4a2880f3ec233 [PPRO-FPI-SRV] Enable Host Isolation (#865099)
 * dd36257996a9b [RUN_OBS-109034] Disable `sox_control` for `production-profiler-delete-old-profiles` (#866117)
 * 963c3dbfb9946 Remove skycfg unvendored to zoolander (#866607)
 * 7825df6c0e226 A1xx batch/batch-critical priority tier -> 100% Ruby 3.3 (#866536)
 * ca13dfdb45fbf [sn-proxyless] add proxyless service registration validations (#865794)
 * e7fb20509bf3a exclude growth i18n crons from SOX (#864975)
 * 65f5a2d36e29c Finalize min counts for new autoscaling (#865596)
 * 1903b325e0e88 A1xx batch/batch-critical -> 50% Ruby 3.3 (#865552)
 * fb0e79035f8dc exclude filesforusers jobs from sox compliance (#864785)
 * e47a10486eb18 Mark new shards RC (#864406)
 * 0d3220cc79956 Remove myself from pay-server priority tier reviewers (#864970)
 * f68609c6f3b65 Add explicit replica counts for preprod (#864892)
 * 870413bd85e3c A1xx synthetics priority tier -> 100% Ruby 3.3 (#864685)
 * 865ee3ae6ccb7 Set up new autoscaling for model-ingestion (#862874)
 * 1c9d1b2b3d916 Set up new autoscaling for monster-sigma in prod (#862899)
 * fd6be108854f7 All A1xx services -> 100% preprod Ruby 3.3 (#864585)
 * 45bc6d5e2f7e1 Bump puma to 6.4.2 for 10% of capital-api-srv hosts (#863116)
 * 9e1a5bbc03efe A200 100% Ruby 3.3 (#862848)
 * 66640e8034354 Add new shards to msp-shards-config (#863575)
 * c4f9f0da418d5 Monster autoscaling migration: set model-ingestion and sigma to max replicas (#862839)
 * 398e373b81b60 Enable top half of A200 50% (#861524)
 * 2d6dd9d40b6fa Set up new autoscaling for sigma in QA (#860576)
 * b3b31af58ee4e Mark shards GA (#861480)
 * c9345eb328c98 YJIT 10% -> 50% in prod `monster-general` (#861064)
 * cf251c08aa82d Save some money in monster-general (#861078)
 * 7f06e1b8edda8 Return mainland by default in get_account_for_shard (#861002)
 * 453ec8a7e43c3 Disable old Sigma autoscaling in QA (#860563)
 * 2deb9623d83f9 Enable Security Group Policies for non-mainland shards by default (#860413)
 * 16bd2b211cb43 [API-6908] Bulk provision many baplings (#860193)
 * 20cf7198056ee RUN_OBS-107834: disable sox for production-profiler-delete-old-profiles (#860210)
 * eea403bc4cebe YJIT 0 -> 10% prod in `monster-general` (#860048)
 * 74d5e0af1ed81 Mark new shards as RC (#859263)
 * c1311be83339d [API-6908] bapling scaffolding for accounts-bapi-srv (#859439)
 * bbd874fc98f7d Update QA nw and cmh link monster replicas (#859087)
 * eee6963aa6f58 YJIT 50% -> 100%  in `monster-general` in `preprod` and `qa` (#859155)
 * a43547d532a52 Add new shards to all available shards (#857826)
 * dc1589ad5771f Address QA Monster Consumer Delivery Lag Issue (#858710)
 * 30d1483d12ca2 Enable YJIT to 50% in `monster-general` in `preprod` and `qa` (#844517)

* Bump shared Skycfg libraries to 27d5aa12e4005340d8d6c7af14c47fb3a890093e (#316)

Updated automatically using `/Users/lanfeng/stripe/pay-server/config/kubernetes/bin/vendor-skycfg`
Upgrading shared Skycfg libraries in chronon.

Old version: f60cb6a0cb84b95aa6f63094df4d92f1c806a7db
New version: 27d5aa12e4005340d8d6c7af14c47fb3a890093e

Changes:
 * 27d5aa12e4005 [skycfg] read shards from config-srv (#887403)
 * cf4881ef83154 Define configs for new `monster-link-preference` service (#886430)
 * 62da4597c1dfc Remove unused flags from pay-server cron (#887880)
 * ed249513afb64 Removed plee from priority tiering reviewers (#887370)
 * 60908160e1190 remove references of consul-agent-init-image from skycfg (#887983)
 * 33a1854d9d436 Rollout yjit to 100% for baplings (#887205)
 * bf560466ad6bd Allow manual credentials-proxy setting overrides for Pandora (#885508)
 * 81b5278a38100 Add `ronaldso` as a reviewer in training for Connections FDP (#884200)
 * 314a14f1eef2e Do not add external scaling control annotation for bg services (#885405)
 * 1b8624cc3ab27 Revert "Label stripe_pod with cost attribution team" (#884891)
 * 2ba4d29107318 Turn yjit on for 50% of baplings (#884787)
 * 11ebd7633dbaa Forbid adding account-cluster SG to deployments (#884366)
 * 27ed8ec4738c8 Set aside memory overhead for g4dn.xlarge ASG (#884265)
 * 6ea6a23dcf77c Turn on yjit for 10% of pods in all baplings (#883821)
 * e810dba271756 Revolve Monster: Fix scheduled scaling cron rules, always scale down on 2nd (#884008)
 * f841423d05f98 Scale up fanout workers and reduce workers per pod for general (#884002)
 * 6e5fd8f656a79 Remove new shards (#883950)
 * 0bf369678928e [FUNDING-1418] Create  Confirmation of Payee Requester Service (#883660)
 * 219aa5c64f66d Revolve Monster: Setup scheduled scaling for CMH and NW (#882785)

* [master] Create and traverse avro schemas once per task (#296)

* Cleanup and isolate the schema perf changes

* add an _

* Add yesterday no dash to S3 Utils (#325)

* Add logging to S3Utils

* Add ability to use yesterday_ds_nodash

* Fix end date - 1

* Fix end date - 1

* Fix spacing

* Add metadata to model transformations (#321)

* Add metadata to model transformations

* Bump version

* Add logic to handle metadata for MT

* Add util scripts

* Add sha

* Remove print stmts

* Move sha to inference spec

* Remove sha and have it be part of the backend params for inference spec

* Fix typo

* [master] Custom spark scan parallelism (#330)

* Allow configuration of spark scan parallelism

* Use 1 task for meta rows

* Fix the flaky test and add log for assertions in JoinTest for future debug of flaky tests (#326)

* add log for assertions in JoinTest to debug flaky tests

* fix the flaky test due to dfTemp not being used

* Fix bug with bootstrap partition column overrides (#334)

* avoid calling count() for GBU if not necessary (#315)

* avoid calling count()

address comments

use Long instead of Int. Fix build error

fix build

fix build. use Long

fix build. Use different table name in tests

* add log line

* sync changes from rebase branch into master branch

* fix tests using unique table names

* enhance log

* add log

* Update date logic (#335)

* Add converter (#341)

* Automatically add 'day' + '_internal_time_column' to pass through fields + ensure model_sha is set in backend params (#347)

* Camweston/change mt transformation deps (#350)

* Stop setting deps in join creation

* Stop setting deps in join creation

* Add metadata (#351)

* Wrap where clauses with parentheses in CatalystUtil (#353)

* Bump shared Skycfg libraries to db55b3d4e48f4ef94155134857d9af496d24998f (#355)

Updated automatically using `/Users/emanolios/stripe/pay-server/config/kubernetes/bin/vendor-skycfg`
Upgrading shared Skycfg libraries in chronon.

Old version: 27d5aa12e4005340d8d6c7af14c47fb3a890093e
New version: db55b3d4e48f4ef94155134857d9af496d24998f

Changes:
 * db55b3d4e48f4 Add hadoop-capacity-agent crons to SOX exempt list (#904682)
 * 4a390c758dde6 Scale up webhooks instances for 9/24 BFCM Load Test (#905379)
 * 8cf3d491c1621 Created a mapping from instance types to their override instance type, and stripe_pod overrides the instance type if the override flag is present (#901737)
 * 6d8d27fb50325 Plumb through the params for hack_kick jitter, and set it to 15m for experiments (#904248)
 * 297b1026e6dfc Onboard monster-capital to autoscaling in CMH, fix autoscaling config for monster-billing (#904054)
 * 31b676a9afacd [Workflow Engine] Roll out Einhorn v10.2.0.stripe to all MSP-based QA services (#903320)
 * 2ecf621ed3c0f Bump critical crons -> Ruby 3.3 (#901042)
 * 655818c530b20 [sn-proxyless] fix bug in proxyless registration filter match (#903662)
 * 4c0fc7a635eed Scale down CMH `payouts` fanout Monster consumers  (#902361)
 * 4fb544440b641 Tune high_priority_webhook_consumer memory usage on BOM (#903408)
 * 4f17ce1298e75 Add autsocaling to billing workers + Scale down CMH (#903336)
 * fe27925f114dc BFCM Post load test scale down (#903272)
 * feecd7d970c7c Scale down CMH `connect` Monster consumers (#901750)
 * 8f9ce509717f3 add new instance in skycfg (#901314)
 * ba920392f3434 Scale down CMH `cards_authz` Monster consumers (#902341)
 * 378ab423c8c2b Scale up webhooks instances for 9/18 BFCM Load Test (#901867)
 * 1929b5f34bdd0 Migrate monster-test to Ubuntu Noble container images (#901934)
 * a16d262fd1772 Revolve Monster: Setup QA NW scheduled scaling config for testing (#901999)
 * 1a6e225652098 Fix cron enabled shard list (#900210)
 * 69949768383f6 Scale down CMH `review-tools` Monster consumers  (#901818)
 * 29e1bb1188d0a Scale down CMH `connections` Monster consumers (#901709)
 * abb6e90628f30 Scale down CMH `local-payment-methods` Monster consumers (#901196)
 * 903351bd2c364 Scale down CMH `payouts` Monster consumers  (#901249)
 * b04155ee58b89 Scale down CMH `verifications` Monster consumers (#901136)
 * df70ba679ae6b Scale down CMH Messaging Monster consumers (#901028)
 * 545f23c96d519 Scale link-preference monster host set (#900763)
 * d533db2643d13 Add global annotation plugin to skycfg templates to support golden config (#890935)
 * 573d4412a413b Scale down CMH `webhooks-low-cpu-canary` Monster host set (#900971)
 * 244b666726e0f Scale down CMH Issuing Monster host set (#900824)
 * 717d3b4f4836a Bump mb per worker from 2.2 to 2.6GB (#900803)
 * 66bd00f4a50db Scale down `search-desc-connect` Monster host set (#900755)
 * 944a4f9dab54f Scale down CMH `merchant-fraud-ml` Monster host set (#900674)
 * 82fb783753780 [Workflow Engine] Roll out Einhorn v10.2.0.stripe to A400 MSP-based QA services (#898805)
 * b65e28c361490 Update monster-general autoscale policies (#900067)
 * 0bb3dd3bf2358 ORCH_RUNTIME-107: Add shared_msp flag to the Shard Configuration (#900328)
 * e9cbed262393a Scale monster general workers to 35% of NW (#899700)
 * 20fd90d59be43 low&standard priority crons -> Ruby 3.3 (#898792)
 * e0ac1ce4e4106 [RUN_DATADISC-360] remove datahub from the list of vendored repos. (#895630)
 * 572774324d853 Skycfg plugin to manage de-regionalize role annotation (#890899)
 * 993b2cdf77a02 [skycfg] Skip creating a scaled object for isolated deploys (#898806)
 * df9cfb224acbd ORCH_RUNTIME-46: Add shard_msp flag to the all_deployable_shards_info function for returns shared MSP shards (#898550)
 * 01bb44402b6ff ORCH_RUNTIME-46: Do not return decommissioning MSP shards to Henson from skycfg (#898484)
 * 13063e8510466 Revert "ORCH_RUNTIME-57: Scale deployment to zero for decommisioning … (#898480)
 * 9227140d77364 Add worker cycle options to issuing-auth-grpc-srv config (#897876)
 * 017595d7eca8a QA&preprod cron -> Ruby 3.3 (take 2) (#893559)
 * 1eafdca9ed57c Update team slug f…