From 1c32daf1787e9e42fe741916490d2bd3727c35d1 Mon Sep 17 00:00:00 2001 From: Igor Beliakov <46579601+weisdd@users.noreply.github.com> Date: Mon, 4 Apr 2022 19:39:07 +0200 Subject: [PATCH] Added support for assumed roles (#14) - Key changes: - Added support for autoconfiguration through Assumed roles (disabled by default, can be enabled through `ASSUMED_ROLES: true`): - In environments, where OIDC-role names match names of namespaces, ACLs can be constructed on the fly (e.g. `["role1", "role2"]` will give access to metrics from namespaces `role1` and `role2`; `kube.*` - to namespaces starting with `kube.*`, `.*` - to all metrics). The roles specified in `acl.yaml` are still considered and get merged with assumed roles; - Thanks to [@aberestyak](https://github.com/aberestyak/) for the idea; - Logs: - Log OIDC roles when debug is enabled. The field will contain all roles present in the token, not only those that are considered during ACL generation process. --- .golangci.yml | 6 ++- CHANGELOG.md | 9 ++++ README.md | 20 +++++---- cmd/lfgw/acl.go | 47 ++++++++++++-------- cmd/lfgw/acl_test.go | 94 +++++++++++++++++++++++++++++++++++++--- cmd/lfgw/main.go | 9 ++++ cmd/lfgw/middleware.go | 2 + docs/filtering.md | 7 +++ docs/similar-projects.md | 2 + 9 files changed, 163 insertions(+), 33 deletions(-) create mode 100644 docs/filtering.md diff --git a/.golangci.yml b/.golangci.yml index b2b3a9d..84bf4b5 100644 --- a/.golangci.yml +++ b/.golangci.yml @@ -1,3 +1,7 @@ +# TODO: remove this section once all linters support go 1.18 +run: + go: "1.17" + linters: enable: - gocritic @@ -13,7 +17,5 @@ linters: # - unparam linters-settings: - gosimple: - go: "1.18" gocyclo: min-complexity: 16 diff --git a/CHANGELOG.md b/CHANGELOG.md index 33458f2..9691886 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,14 @@ # CHANGELOG +## 0.10.0 + +- Key changes: + - Added support for autoconfiguration through Assumed roles (disabled by default, can be enabled through `ASSUMED_ROLES: true`): + - In environments, where OIDC-role names match names of namespaces, ACLs can be constructed on the fly (e.g. `["role1", "role2"]` will give access to metrics from namespaces `role1` and `role2`; `kube.*` - to namespaces starting with `kube.*`, `.*` - to all metrics). The roles specified in `acl.yaml` are still considered and get merged with assumed roles; + - Thanks to [@aberestyak](https://github.com/aberestyak/) for the idea; + - Logs: + - Log OIDC roles when debug is enabled. The field will contain all roles present in the token, not only those that are considered during ACL generation process. + ## 0.9.0 - Key changes: diff --git a/README.md b/README.md index f1a485c..63f61c4 100644 --- a/README.md +++ b/README.md @@ -2,20 +2,20 @@ LFGW is a trivial reverse proxy based on `httputil` and `VictoriaMetrics/metricsql` with a purpose of dynamically rewriting requests to Prometheus-like backends. -More specifically, it manipulates label filters in metric expressions to reduce the scope of metrics exposed to an end user based on user's OIDC-roles. +More specifically, it manipulates label filters in metric expressions to reduce the scope of metrics exposed to an end user based on user's OIDC-roles. The process is described in more details [here](docs/filtering.md). + +Target setup: `grafana -> lfgw -> Prometheus/VictoriaMetrics`. ## Key features -* a user is not restricted to just one LFGW-role, you are free to specify a few; -* non-opinionated proxying of requests with valid jwt tokens - there's Prometheus to decide whether request is valid or not; * ACL-based request rewrites with implicit deny; -* supports Victoria Metrics' PromQL extensions; -* since it's based on `VictoriaMetrics/metricsql` library, which has way simpler interface than `prometheus`, there is no need to write a separate implementation for every type of MetricExpr on the planet; +* a user can have multiple roles; +* support for autoconfiguration in environments, where OIDC-role names match names of namespaces ("assumed roles" mode; thanks to [@aberestyak](https://github.com/aberestyak/) for the idea); * [automatic expression optimizations](https://pkg.go.dev/github.com/VictoriaMetrics/metricsql#Optimize) for non-full access requests; -* it's based on the middleware pattern, so it's easy to implement other, non-OIDC, modes should the need be; * support for different headers with access tokens (`X-Forwarded-Access-Token`, `X-Auth-Request-Access-Token`, `Authorization`); +* requests to both `/api/*` and `/federate` endpoints are protected (=rewritten); * requests to sensitive endpoints are blocked by default; -* requests to both `/api/*` and `/federate` endpoints are protected (=rewritten). +* compatible with both [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) and [MetricsQL](https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/MetricsQL). ## Similar projects @@ -27,13 +27,17 @@ Docker images are published on [ghcr.io/weisdd/lfgw](https://github.com/weisdd/l ## Configuration -OIDC roles are expected to be present in `roles` within a jwt token. +### Requirements for jwt-tokens + +* OIDC-roles must be present in `roles` claim; +* Client ID specified via `OIDC_CLIENT_ID` must be present in `aud` claim (more details in [environment variables section](#Environment variables)), otherwise token verification will fail. ### Environment variables | Module | Variable | Default Value | Description | | -------------------- | --------------------------- | ------------- | ------------------------------------------------------------ | | **General settings** | | | | +| | `ASSUMED_ROLES` | `false` | In environments, where OIDC-role names match names of namespaces, ACLs can be constructed on the fly (e.g. `["role1", "role2"]` will give access to metrics from namespaces `role1` and `role2`). The roles specified in `acl.yaml` are still considered and get merged with assumed roles. Role names may contain regular expressions, including the admin definition `.*`. | | | `ENABLE_DEDUPLICATION` | `true` | Whether to enable deduplication, which leaves some of the requests unmodified if they match the target policy. Examples can be found in the "acl.yaml syntax" section. | | | `OPTIMIZE_EXPRESSIONS` | `true` | Whether to automatically optimize expressions for non-full access requests. [More details](https://pkg.go.dev/github.com/VictoriaMetrics/metricsql#Optimize) | | | | | | diff --git a/cmd/lfgw/acl.go b/cmd/lfgw/acl.go index 2bf7460..84186ca 100644 --- a/cmd/lfgw/acl.go +++ b/cmd/lfgw/acl.go @@ -154,13 +154,20 @@ func (app *application) rolesToRawACL(roles []string) (string, error) { for _, role := range roles { acl, exists := app.ACLMap[role] - if !exists { - return "", fmt.Errorf("Some of the roles are unknown") - } - if acl.RawACL == "" { - return "", fmt.Errorf("%s role contains empty rawACL", role) + if exists { + // NOTE: You should never see an empty definitions in .RawACL as those should be removed by toSlice further down the process. The error check below is not necessary, is left as an additional safeguard for now and might get removed in the future. + if acl.RawACL == "" { + return "", fmt.Errorf("%s role contains empty rawACL", role) + } + rawACLs = append(rawACLs, acl.RawACL) + } else { + // NOTE: Strictly speaking, it's getACL who is expected to pass a filtered list of roles in case Assumed roles are disabled. The error check below is not necessary, is left as an additional safeguard for now and might get removed in the future. + if !app.AssumedRoles { + return "", fmt.Errorf("Some of the roles are unknown and assumed roles are not enabled") + } + // NOTE: Role names are not linted, so they may contain regular expressions, including the admin definition: .* + rawACLs = append(rawACLs, role) } - rawACLs = append(rawACLs, acl.RawACL) } rawACL := strings.Join(rawACLs, ", ") @@ -171,31 +178,37 @@ func (app *application) rolesToRawACL(roles []string) (string, error) { return rawACL, nil } -// getACL takes a list of roles found in an OIDC claim, removes those that the app is not aware of (not present in app.ACLMap) and then constructs an ACL based on the remaining ones. +// getACL takes a list of roles found in an OIDC claim and constructs and ACL based on them. If assumed roles are disabled, then only known roles (present in app.ACLMap) are considered. func (app *application) getACL(oidcRoles []string) (ACL, error) { - var roles []string + roles := []string{} + assumedRoles := []string{} - // Filter out unknown OIDC roles for _, role := range oidcRoles { _, exists := app.ACLMap[role] if exists { + if app.ACLMap[role].Fullaccess { + return *app.ACLMap[role], nil + } roles = append(roles, role) + } else { + assumedRoles = append(assumedRoles, role) } } + + if app.AssumedRoles { + roles = append(roles, assumedRoles...) + } + if len(roles) == 0 { return ACL{}, fmt.Errorf("no matching roles found") } - // One role, so we can return existing ACL + // We can return a prebuilt ACL if there's only one role and it's known if len(roles) == 1 { role := roles[0] - return *app.ACLMap[role], nil - } - - // If a user has a fullaccess role, there's no need to check any other one. - for _, role := range roles { - if app.ACLMap[role].Fullaccess { - return *app.ACLMap[role], nil + acl, exists := app.ACLMap[role] + if exists { + return *acl, nil } } diff --git a/cmd/lfgw/acl_test.go b/cmd/lfgw/acl_test.go index 329d04b..592e0e9 100644 --- a/cmd/lfgw/acl_test.go +++ b/cmd/lfgw/acl_test.go @@ -383,18 +383,18 @@ func TestApplication_rolesToRawACL(t *testing.T) { t.Run("1 known role", func(t *testing.T) { roles := []string{"multiple-values"} - got, err := app.rolesToRawACL(roles) - want := "ku.*, min.*" + + got, err := app.rolesToRawACL(roles) assert.Nil(t, err) assert.Equal(t, want, got) }) t.Run("multiple known roles", func(t *testing.T) { roles := []string{"multiple-values", "single-value"} - got, err := app.rolesToRawACL(roles) - want := "ku.*, min.*, default" + + got, err := app.rolesToRawACL(roles) assert.Nil(t, err) assert.Equal(t, want, got) }) @@ -410,6 +410,28 @@ func TestApplication_rolesToRawACL(t *testing.T) { _, err := app.rolesToRawACL(roles) assert.NotNil(t, err) }) + + // Assumed roles enabled + app.AssumedRoles = true + + t.Run("0 known roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) { + roles := []string{"unknown-role"} + want := "unknown-role" + + got, err := app.rolesToRawACL(roles) + assert.Nil(t, err) + assert.Equal(t, want, got) + }) + + t.Run("multiple roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) { + roles := []string{"multiple-values", "single-value", "unknown-role"} + want := "ku.*, min.*, default, unknown-role" + + got, err := app.rolesToRawACL(roles) + assert.Nil(t, err) + assert.Equal(t, want, got) + }) + } func TestApplication_GetACL(t *testing.T) { @@ -476,7 +498,7 @@ func TestApplication_GetACL(t *testing.T) { assert.Equal(t, want, got) }) - t.Run("multiple roles, no full access", func(t *testing.T) { + t.Run("multiple roles, 1 is unknown, no full access", func(t *testing.T) { roles := []string{"single-value", "multiple-values", "unknown-role"} knownRoles := []string{"single-value", "multiple-values"} @@ -485,13 +507,73 @@ func TestApplication_GetACL(t *testing.T) { want := ACL{ Fullaccess: false, - RawACL: rawACL, LabelFilter: metricsql.LabelFilter{ Label: "namespace", Value: "default|ku.*|min.*", IsRegexp: true, IsNegative: false, }, + RawACL: rawACL, + } + + got, err := app.getACL(roles) + assert.Nil(t, err) + assert.Equal(t, want, got) + }) + + // Assumed roles enabled + app.AssumedRoles = true + + t.Run("0 known roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) { + roles := []string{"unknown-role"} + + want := ACL{ + Fullaccess: false, + LabelFilter: metricsql.LabelFilter{ + Label: "namespace", + Value: "unknown-role", + IsRegexp: false, + IsNegative: false, + }, + RawACL: "unknown-role", + } + + got, err := app.getACL(roles) + assert.Nil(t, err) + assert.Equal(t, want, got) + }) + + t.Run("multiple roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) { + roles := []string{"multiple-values", "single-value", "unknown-role"} + + want := ACL{ + Fullaccess: false, + LabelFilter: metricsql.LabelFilter{ + Label: "namespace", + Value: "ku.*|min.*|default|unknown-role", + IsRegexp: true, + IsNegative: false, + }, + RawACL: "ku.*, min.*, default, unknown-role", + } + + got, err := app.getACL(roles) + assert.Nil(t, err) + assert.Equal(t, want, got) + }) + + t.Run("multiple roles, 1 is unknown, 1 gives full access (assumed roles enabled)", func(t *testing.T) { + roles := []string{"multiple-values", "admin", "unknown-role"} + + want := ACL{ + Fullaccess: true, + LabelFilter: metricsql.LabelFilter{ + Label: "namespace", + Value: ".*", + IsRegexp: true, + IsNegative: false, + }, + RawACL: ".*", } got, err := app.getACL(roles) diff --git a/cmd/lfgw/main.go b/cmd/lfgw/main.go index 55a4f56..f5c718c 100644 --- a/cmd/lfgw/main.go +++ b/cmd/lfgw/main.go @@ -32,6 +32,7 @@ type application struct { SafeMode bool `env:"SAFE_MODE" envDefault:"true"` SetProxyHeaders bool `env:"SET_PROXY_HEADERS" envDefefault:"false"` ACLPath string `env:"ACL_PATH" envDefault:"./acl.yaml"` + AssumedRoles bool `env:"ASSUMED_ROLES" envDefault:"false"` OIDCRealmURL string `env:"OIDC_REALM_URL,required"` OIDCClientID string `env:"OIDC_CLIENT_ID,required"` Port int `env:"PORT" envDefault:"8080"` @@ -74,6 +75,14 @@ func main() { zerolog.SetGlobalLevel(zerolog.DebugLevel) } + if app.AssumedRoles { + app.logger.Info().Caller(). + Msg("Assumed roles mode is on") + } else { + app.logger.Info().Caller(). + Msg("Assumed roles mode is off") + } + app.ACLMap, err = app.loadACL() if err != nil { app.logger.Fatal().Caller(). diff --git a/cmd/lfgw/middleware.go b/cmd/lfgw/middleware.go index d87ccdd..01303a0 100644 --- a/cmd/lfgw/middleware.go +++ b/cmd/lfgw/middleware.go @@ -140,6 +140,8 @@ func (app *application) oidcModeMiddleware(next http.Handler) http.Handler { } app.enrichLogContext(r, "email", claims.Email) + // NOTE: The field will contain all roles present in the token, not only those that are considered during ACL generation process + app.enrichDebugLogContext(r, "roles", strings.Join(claims.Roles, ", ")) acl, err := app.getACL(claims.Roles) if err != nil { diff --git a/docs/filtering.md b/docs/filtering.md new file mode 100644 index 0000000..43d3090 --- /dev/null +++ b/docs/filtering.md @@ -0,0 +1,7 @@ +# How filtering works + +By default, Grafana works with data sources in a so called [server mode](https://grafana.com/docs/grafana/latest/datasources/prometheus/#prometheus-settings). It means that a user doesn't have a direct access to a data source, thus all requests are proxied through Grafana. When this mode is combined with [OAuth authentication](https://grafana.com/docs/grafana/latest/auth/generic-oauth/) and [Forward Oauth Identity](https://grafana.com/docs/grafana/latest/developers/plugins/add-authentication-for-data-source-plugins/#forward-oauth-identity-for-the-logged-in-user) option is enabled, requests sent by Grafana to a Prometheus-like backend contain user's access token. + +In an identity provider such as Keycloak, we can add custom client roles and pass them in, say, `roles` claim (claim name could be different, but lfgw does not currently allow any other name). That's where lfgw comes into play. By tying roles to a list of namespaces (either full names or regexps), we can tell lfgw which metric expressions have to be modified (to reduce the scope) and which are allowed to be passed as is. + +When a metric expression is extracted from GET-parameters or a POST-form that Grafana sends, lfgw manipulates `namespace` label in each selector according to an acl. Once it's done, the updated request is forwarded to the Prometheus-like backend. Examples of ACL can be found in [README.md](../README.md#aclyaml-syntax). diff --git a/docs/similar-projects.md b/docs/similar-projects.md index 0151106..866e88a 100644 --- a/docs/similar-projects.md +++ b/docs/similar-projects.md @@ -9,6 +9,7 @@ Link: [Prometheus filter proxy](https://github.com/hoffie/prometheus-filter-prox Minuses: - based on Prometheus library, so might not support some of Victoria Metrics' extensions; +- does not support autoconfiguration; - it's an overly simple implementation that relies only on URI paths to define the scope of available metrics, so a user might potentially get access to any metrics should the URL become exposed; - it's based on http client, thus unlikely to be ready for high volumes of requests; - does not allow to filter out requests to sensitive endpoints (like `/admin/tsdb`); @@ -31,6 +32,7 @@ Pluses: Minuses: - a user cannot have multiple roles (`When you have multiple roles, the first one that is mentioned in prometheus-acls will be used.`); +- does not support autoconfiguration; - based on Prometheus library, so might not support some of Victoria Metrics' extensions; - does not allow to filter out requests to sensitive endpoints (like `/admin/tsdb`); - does not rewrite requests to `/federate` endpoint (at least, at the time of writing);