Releases: cisco-open/flame
v0.4.0
What's Changed
- misc: bump fiab image version to v0.3.0 by @myungjin in #452
- feat: FedGFT implementation by @GustavBaumgart in #453
- misc: update flame bibtex by @myungjin in #456
- feat: SCAFFOLD implementation by @GustavBaumgart in #454
- Added dummy UI files and changed Dockerfile by @raresgaia123 in #457
- Added UI yaml files by @raresgaia123 in #458
- Added http support for apiserver. by @raresgaia123 in #460
- Added React app debug logs. by @raresgaia123 in #459
- feat: handle cors by @myungjin in #461
- misc: update addlicense file for react app by @myungjin in #464
- feat: tasklet start time by @GustavBaumgart in #465
- Replaced dummy react app with dashboard react. by @raresgaia123 in #463
- fix: clear metrics and tasklet alias by @GustavBaumgart in #466
- docs: remove MacOS from fiab system documentation by @GustavBaumgart in #467
- Updated UI files, configmap MLFlow api URL. by @raresgaia123 in #468
- Updated documentation for Dashboard. by @raresgaia123 in #469
New Contributors
- @raresgaia123 made their first contribution in #457
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's Changed
- docs: restructuring, mac OS quickstart addition, and fix by @jaemin-shin in #445
- misc: hierarchical mnist for pytorch by @myungjin in #446
- fix: flamectl job update by @myungjin in #447
- misc: update hier mnist example by @myungjin in #448
- fix: recv_fifo for asyncfl by @myungjin in #449
- feat+refactor: simplify schema and code api by @myungjin in #450
- doc: control plane api document updated by @myungjin in #451
Full Changelog: v0.2.4...v0.3.0
v0.2.4
What's Changed
- misc: update image version in fiab by @myungjin in #432
- refactor: weight_decay moved to config.hyperparameters section by @GustavBaumgart in #433
- fix: trainer hang in coordinated syncfl by @myungjin in #435
- fix: incorrect schema for coordinated syncfl example by @myungjin in #436
- feat: allow misc field in config file by @GustavBaumgart in #434
- fix: async FL algorithm to follow FedBuff in aggregation by @jaemin-shin in #438
- fix: pydantic version specification by @GustavBaumgart in #439
- fix: hybrid FL aggregation process without member check by @jaemin-shin in #440
- misc: refresh rest api document by @myungjin in #442
- docs: quickstart and FL basics by @GustavBaumgart in #443
- feature: task rest api revision by @myungjin in #444
Full Changelog: v0.2.3...v0.2.4
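For readers unfamiliar with the FedBuff-style aggregation mentioned in #438, the following is a minimal sketch of buffered asynchronous aggregation, not flame's implementation. The class name `BufferedAggregator`, its parameters, and the convention that a delta is "local weights minus the global weights the client started from" are assumptions made for illustration; staleness weighting is omitted.

```python
from typing import List


class BufferedAggregator:
    """Toy FedBuff-style server (hypothetical, not flame's API).

    Client deltas are buffered as they arrive; once `buffer_size` deltas
    are collected, their mean is applied to the global weights.
    """

    def __init__(self, weights: List[float], buffer_size: int, server_lr: float = 1.0):
        self.weights = list(weights)
        self.buffer_size = buffer_size
        self.server_lr = server_lr
        self._buffer: List[List[float]] = []

    def receive(self, delta: List[float]) -> bool:
        """Store one client delta; return True if a server update was applied."""
        self._buffer.append(list(delta))
        if len(self._buffer) < self.buffer_size:
            return False  # keep waiting; the global model is unchanged
        k = len(self._buffer)
        # Element-wise mean of the buffered deltas.
        mean_delta = [sum(col) / k for col in zip(*self._buffer)]
        self.weights = [w + self.server_lr * d for w, d in zip(self.weights, mean_delta)]
        self._buffer.clear()
        return True
```

With `buffer_size=2`, the first call to `receive` only buffers the delta, and the second call triggers one server update from the mean of both deltas.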
v0.2.3
Key Changes
- New approaches: feddyn, oort, FedBalancer, synchronous hierarchical FL with coordinator, coordinated asyncfl, and differential privacy
- Metrics collection support: collect cpu/gpu utilization, ram and vram usage, bytes sent and received and the time to execute functions
- Local registry support: model and metrics are saved in a local directory, which allows experiments without mlflow
- misc improvements:
  - support for channel leave functionality in the p2p backend
  - support for simplifying tasklet composition primitives
  - protobuf package version upgrade to be compatible with tensorflow v2.12.0
  - deployer: fix a bug that updated pod status incorrectly
  - fix a bug that removes model updates from a disk cache when the total cache data is larger than its size
  - fix openapi specification mismatch between a released image and the source
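As an illustration of the "time to execute functions" metric listed above, here is a minimal sketch of how function runtimes could be recorded with a decorator. It is not flame's metrics API: the names `timed`, `RUNTIMES`, and `train_step` are hypothetical and exist only for this example.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: function name -> list of runtimes in seconds.
RUNTIMES = defaultdict(list)


def timed(func):
    """Record the wall-clock runtime of every call to `func`."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the wrapped function raises.
            RUNTIMES[func.__name__].append(time.perf_counter() - start)

    return wrapper


@timed
def train_step(n):
    """Stand-in workload for a training function."""
    return sum(i * i for i in range(n))
```

Each call to `train_step` appends one measurement to `RUNTIMES["train_step"]`, which could then be reported to whatever metrics backend is in use.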
What's Changed
- example/implementation for Oort by @jaemin-shin in #369
- feddyn implementation pytorch by @GustavBaumgart in #370
- update for Oort: overcommitment support by @jaemin-shin in #372
- fix: remove non-orchestration mode by @openwithcode in #374
- update for Oort: accurate calculation of round duration with timestamp by @jaemin-shin in #375
- feat: replica support for roles by @openwithcode in #376
- fix: remove errors regarding group association to SDK config files by @jaemin-shin in #378
- doc: update supported algorithms/mechanisms by @myungjin in #377
- fix: missing delta weights computation in middle aggregator by @myungjin in #382
- fix: incorrect groupAssociation in lib example configs by @myungjin in #383
- refactor: move syncfl files by @myungjin in #384
- Synchronous orchestrator architecture by @elqurio in #379
- feat: per-channel backend support by @myungjin in #381
- fix: hardening coordinated syncfl by @myungjin in #385
- refactor: auto-formatting sdk files with black by @myungjin in #386
- misc: example for synchronous hierarchical FL with coordinator by @myungjin in #388
- feat+refactor+fix: per-channel backend support in SDK by @myungjin in #387
- fix: zero weight upload from trainer in hybrid mode by @jaemin-shin in #389
- feat: optimize composer's extensibility by @myungjin in #390
- Remove unnecessary config examples by @elqurio in #391
- example/implementation for FedBalancer, with a new sampler category by @jaemin-shin in #380
- refactor: remove legacy code by @myungjin in #393
- feat: coordination revision by @myungjin in #394
- updated feddyn implementation pytorch by @GustavBaumgart in #392
- chore: syncfl_hier_coord_mnist schema update by @myungjin in #395
- feat: channel leave functionality for p2p backend by @myungjin in #398
- feat: additional optimization for channel leave by @myungjin in #399
- Improve tasklet composition extensibility by @elqurio in #397
- Move examples to the root of the lib/python dir by @elqurio in #400
- feddyn algorithm correction by @GustavBaumgart in #401
- fix: slow tx task termination by @myungjin in #402
- Fix alias already exists for coord_syncfl by @elqurio in #406
- Monitor deployed tasks and update status in case of crashes by @openwithcode in #404
- new feature: add differential privacy by @jaemin-shin in #403
- feat: coordinated asyncfl by @myungjin in #407
- fix: Resolve the get tasks command by job ID to return the expected tasks details of a job by @openwithcode in #412
- fix: refactor examples tests by @openwithcode in #409
- bump: upgrade version of go to 1.18 by @openwithcode in #411
- feat: track tasklet runtime by @GustavBaumgart in #405
- feat: communication cost metric collection by @GustavBaumgart in #414
- documentation: fix typos in README.md and docs by @jaemin-shin in #416
- doc: add citation for flame arxiv paper by @myungjin in #417
- refactor: optimize create job task of control plane by @jaemin-shin in #415
- fix: disable automatic eviction of diskcache at aggregation by @jaemin-shin in #419
- feat: utilization/memory usage on CPU and GPU by @GustavBaumgart in #420
- fix: GPU/CPU monitoring termination by @GustavBaumgart in #424
- fix: protobuf version error by @GustavBaumgart in #425
- feat: local registry by @GustavBaumgart in #421
- fix: fix oort utility calculation and add medmnist example for oort by @jaemin-shin in #423
- refactor: monitoring scope adjustment in deployer by @myungjin in #427
- feat: local registry timestamp by @GustavBaumgart in #426
- misc: add debug message in deployer by @myungjin in #428
- fix: incorrect task status update by deployer by @myungjin in #429
- fix: check payload before accumulating bytes received by @GustavBaumgart in #430
- fix: deployer pod state update by @myungjin in #431
New Contributors
- @jaemin-shin made their first contribution in #369
Full Changelog: v0.2.2...v0.2.3
v0.2.2
Key Changes
- Control Plane & SDK: Use the previously defined groupAssociation role information to handle all task creation. This change makes it easier to identify the connections between roles and to define the topology correctly
- Control Plane: split the job.json file in the examples into dataSpec.json and modelSpec.json to make it easier to understand which fields are used and for what purpose
- Helpers: Created a minikube script to start the cluster on macOS
- Fiab: enforce the local cluster to use the currently released flame version
What's Changed
- Fix linting issue by @openwithcode in #367
- Create a run minikube script by @alexandruuBytex in #363
- misc: config update on async mnist example by @myungjin in #364
- add download file function by @GustavBaumgart in #365
- Handle role group associations into control plane and sdk by @openwithcode in #366
- build: use the flame version for which the code was released by @openwithcode in #371
Full Changelog: v0.2.1...v0.2.2
v0.2.1
Key Changes
- Control Plane: add initial support for group association for roles in order to identify the connections through different channels
- Flame SDK: Configuration handling has been refactored
- Utils: Created a diagnose script to make debugging easier
- Flame SDK: Implemented pre-commit configuration for the project
- Control Plane: refactoring to align the code with the code generated from OpenAPI
What's Changed
- fix: out of order message delivery by @myungjin in #354
- gpu and cpu compatibility pytorch by @GustavBaumgart in #350
- Development by @openwithcode in #361
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Key Changes
- Asynchronous FL (SDK, control plane): Asynchronous FL algorithm/mechanism is implemented. Examples for Async FL are added in the library (lib/python/flame/examples/{async_hier_mnist, async_mnist}) and in the control plane example folder (flame/examples/asyncfl_hier_mnist).
- FedProx (SDK): FedProx algorithm is implemented in the python library.
- Add tensorflow support for FedYogi, FedAdam, and FedAdaGrad
- The flame base image build changed from the CPU version to the CUDA 11.3 version
- Fiab diagnosis script (control plane): A basic script to obtain the log information on fiab environment’s status is implemented.
- Bugfix on local DNS setup in linux (fiab): Local DNS setup failed under some distributions. A bugfix for this is added and was tested on ubuntu and archlinux.
- Configurable Deployer (control plane): deployer can take a different job template file via configuration. All the command-line arguments are replaced with a configuration file.
- Documentation (doc): update the documentation for setting up a fiab cluster locally and the flame SDK guide.
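To make the FedProx item above concrete, the sketch below shows the proximal term that FedProx adds to each client's local objective, (mu / 2) * ||w - w_global||^2. This is a plain-Python illustration under assumed names (`proximal_term`, `fedprox_loss`, `mu`), not the code in flame's python library.

```python
from typing import Sequence


def proximal_term(local: Sequence[float], global_: Sequence[float], mu: float) -> float:
    """Return (mu / 2) * squared L2 distance between local and global weights."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local, global_))


def fedprox_loss(task_loss: float, local: Sequence[float],
                 global_: Sequence[float], mu: float) -> float:
    """Local objective: the ordinary task loss plus the proximal penalty."""
    return task_loss + proximal_term(local, global_, mu)
```

With `mu = 0` the objective reduces to the plain task loss, which corresponds to ordinary FedAvg-style local training; larger values of `mu` keep local weights closer to the global model.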
What's Changed
- misc: base image update by @myungjin in #302
- misc: fix flame slack workspace link error by @myungjin in #303
- doc: fiab guide revision by @myungjin in #304
- doc: refactor fiab instructions by @myungjin in #307
- documentation for running python locally relocated by @GustavBaumgart in #310
- medmnist example update by @GustavBaumgart in #312
- misc: add license header by @myungjin in #314
- misc: refactor restapi error message handling by @myungjin in #313
- example and documentation for medmnist keras/pytorch by @GustavBaumgart in #320
- feat+fix: grpc support for hierarchical fl by @myungjin in #321
- documentation for metaserver by @GustavBaumgart in #322
- feat: asynchronous fl by @myungjin in #323
- fix+refactor: asyncfl loss divergence by @myungjin in #330
- fix: conflict between integer tensor and float tensor by @myungjin in #335
- refactor: config for hybrid example in library by @myungjin in #334
- misc: asynchronous hierarchical fl example by @myungjin in #340
- chore: clean up examples folder by @myungjin in #336
- fix: workaround for hybrid mode with two p2p backends by @myungjin in #345
- fix: distributed mode by @myungjin in #344
- example/implementation for fedprox by @GustavBaumgart in #339
- Create diagnose script by @alexandruuBytex in #348
- refactor+fix: configurable deployer / lib regularizer fix by @myungjin in #351
New Contributors
- @GustavBaumgart made their first contribution in #310
Full Changelog: v0.1.7...v0.2.0
v0.1.7
Key Changes
- Installation steps for ubuntu, amazon linux 2 and mac OS are automated via a script.
- Flame-in-a-Box (FIAB) env (a small-scale dev env) is improved to use the latest official flame image (from docker hub) without the need to build a flame image locally. This feature is mainly for users who wish to try out flame quickly.
- An mlflow version upgrade caused an incompatibility issue with the keras module, which made metrics reporting to mlflow throw an error. This issue was fixed.
- The job template file is updated to prefer nodes with GPU resources. If no GPU resource is available in a cluster, a job is scheduled on non-GPU nodes.
What's Changed
- build(deps): bump docker/metadata-action from 3.3.0 to 4.1.1 by @dependabot in #269
- build(deps): bump github.com/cenkalti/backoff/v4 from 4.1.2 to 4.1.3 by @dependabot in #99
- build(deps): bump golangci/golangci-lint-action from 3.3.0 to 3.3.1 by @dependabot in #266
- build(deps): bump github.com/cbroglie/mustache from 1.3.1 to 1.4.0 by @dependabot in #177
- build(deps): bump google.golang.org/protobuf from 1.27.1 to 1.28.1 by @dependabot in #199
- build(deps): bump go.uber.org/zap from 1.21.0 to 1.23.0 by @dependabot in #219
- build(deps): bump protobuf from 3.19.4 to 3.19.5 in /lib/python by @dependabot in #236
- chore: comparison of openapi directories by @myungjin in #273
- fix: mlflow flavor "keras" not found by @myungjin in #274
- Add --local-img flag to the shell script to use dev tag for local testing by @Divlik in #275
- Installation steps for ec2 instance by @RaviKhandavilli in #280
- chore: a skeleton script to configure minikube by @myungjin in #284
- updated fiab documentation by @RaviKhandavilli in #285
- chore: automate minikube installation in amzn2 by @myungjin in #287
- ubuntu fiab prerequisites installer by @RaviKhandavilli in #288
- fix: installation script for amzn2 by @myungjin in #290
- chore: add badge for flame slack channel by @myungjin in #291
- updated ubuntu fiab documentation by @RaviKhandavilli in #289
- mac fiab prerequisites automation by @RaviKhandavilli in #294
- updated fiab installation documents by @RaviKhandavilli in #295
- updated deployer script to use public image by @RaviKhandavilli in #296
- misc: bump up mongodb chart version in fiab by @myungjin in #300
- conf: update for gpu in job template by @myungjin in #301
Full Changelog: v0.1.6...v0.1.7
v0.1.6
What's Changed
- grpc max message length update by @myungjin in #227
- nginx proxy-body-size update in fiab by @myungjin in #228
- build(deps): bump actions/setup-go from 3.2.1 to 3.3.0 by @dependabot in #218
- build(deps): bump codecov/codecov-action from 3.1.0 to 3.1.1 by @dependabot in #231
- pyenv installation steps by @RaviKhandavilli in #234
- event driven approach by @myungjin in #237
- delete code api endpoint by @RaviKhandavilli in #240
- Update setup guideline for Ubuntu 20.04 by @myungjin in #241
- Chore: Update support info by @myungjin in #246
- delete design code by @RaviKhandavilli in #245
- Add delete schema endpoint by @Divlik in #247
- Deletedesign by @RaviKhandavilli in #251
- fix: typo and logic error by @GaoxiangLuo in #255
- Chore: update coding style in CONTRIBUTING.md by @myungjin in #256
- delete design by @RaviKhandavilli in #254
- build(deps): bump golangci/golangci-lint-action from 3.2.0 to 3.3.0 by @dependabot in #257
- build(deps): bump actions/setup-go from 3.3.0 to 3.3.1 by @dependabot in #252
- build(deps): bump step-security/harden-runner from 1.4.5 to 1.5.0 by @dependabot in #239
- build(deps): bump actions/checkout from 3.0.2 to 3.1.0 by @dependabot in #242
- Add mongo document deletion for delete schema by @Divlik in #253
- feat: 'continue' primitive in tasklet by @myungjin in #261
- deployment status by @RaviKhandavilli in #262
- CI/CD add github publish action on release to push docker image by @Divlik in #268
New Contributors
- @RaviKhandavilli made their first contribution in #234
- @Divlik made their first contribution in #247
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's Changed
- Multi-cluster support: Added skeleton code for computes endpoints by @dhruvsgarg in #159
- misc: minor update of api spec by @myungjin in #160
- feature: fetch latest weights in distributed learning by @GaoxiangLuo in #148
- optimize: reduce distributed learning code complexity by @GaoxiangLuo in #161
- feature: minio support in fiab by @myungjin in #163
- typo: use correct func tags for distributed learning by @GaoxiangLuo in #166
- fix: message mixed up in distributed learning mode by @myungjin in #167
- build(deps): bump step-security/harden-runner from 1.4.3 to 1.4.4 by @dependabot in #164
- feature: hybrid mode with ring allreduce by @myungjin in #170
- misc: add missing license by @myungjin in #173
- Making docker driver default. by @ritvik-verma in #171
- Initial implementation for deployer registration by @dhruvsgarg in #179
- build(deps): bump actions/setup-go from 3.2.0 to 3.2.1 by @dependabot in #176
- Establishing deployer notifier connection by @dhruvsgarg in #183
- fix: deadlock in hybrid fl mode by @myungjin in #172
- Misprint by @ritvik-verma in #180
- harden flame.sh stop process by @myungjin in #184
- docs: run python library solely by @GaoxiangLuo in #187
- Mongodb impl for deployer registration by @dhruvsgarg in #181
- Redefine proto for job notification by @dhruvsgarg in #188
- Adding proto messages for deployer events by @dhruvsgarg in #189
- build(deps): bump github.com/stretchr/testify from 1.7.1 to 1.8.0 by @dependabot in #162
- Minor update to job notification in proto by @dhruvsgarg in #192
- Initial end-to-end pipeline for deployer by @dhruvsgarg in #193
- Adding few restAPIs for deployer by @dhruvsgarg in #194
- Implement basic pipeline for deployer by @dhruvsgarg in #197
- Adding apikey to deployer apis by @dhruvsgarg in #196
- Populating deployment config for deployer by @dhruvsgarg in #198
- Added checks for dataset registration by @dhruvsgarg in #200
- Minor formatting updates by @dhruvsgarg in #204
- Added deployer for default compute by @dhruvsgarg in #203
- Automated deployment notification to computes by @dhruvsgarg in #202
- Duplicating controllers deployment code into deployer by @dhruvsgarg in #205
- Added servAcc and roleBinding for deployer by @dhruvsgarg in #207
- Enabled deployment of resources by computes by @dhruvsgarg in #208
- Separated flame-control and flame-deployer by @dhruvsgarg in #210
- Removed deprecated io-ioutil API calls by @dhruvsgarg in #211
- Updated spec for deploymentConfig by @dhruvsgarg in #212
- Initial error handling for deployer by @dhruvsgarg in #213
- Bug fix for deployment of tasks by @dhruvsgarg in #216
- build(deps): bump step-security/harden-runner from 1.4.4 to 1.4.5 by @dependabot in #214
- build(deps): bump actions/checkout from 629c2de402a417ea7690ca6ce3f33229e27606a5 to 3.0.2 by @dependabot in #215
- support for gRPC-based p2p backend (EXPERIMENTAL) by @myungjin in #222
- p2p backend: heart beat by @myungjin in #223
Full Changelog: v0.1.4...v0.1.5