Skip to content

Commit

Permalink
Added support for assumed roles (#14)
Browse files Browse the repository at this point in the history
- Key changes:
  - Added support for autoconfiguration through Assumed roles (disabled by default, can be enabled through `ASSUMED_ROLES: true`):
    - In environments, where OIDC-role names match names of namespaces, ACLs can be constructed on the fly (e.g. `["role1", "role2"]` will give access to metrics from namespaces `role1` and `role2`; `kube.*` - to namespaces starting with `kube.*`, `.*` - to all metrics). The roles specified in `acl.yaml` are still considered and get merged with assumed roles;
    - Thanks to [@aberestyak](https://github.com/aberestyak/) for the idea;
  - Logs:
    - Log OIDC roles when debug is enabled. The field will contain all roles present in the token, not only those that are considered during ACL generation process.
  • Loading branch information
weisdd authored Apr 4, 2022
1 parent 057a0b3 commit 1c32daf
Show file tree
Hide file tree
Showing 9 changed files with 163 additions and 33 deletions.
6 changes: 4 additions & 2 deletions .golangci.yml
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
# TODO: remove this section once all linters support go 1.18
run:
go: "1.17"

linters:
enable:
- gocritic
Expand All @@ -13,7 +17,5 @@ linters:
# - unparam

linters-settings:
gosimple:
go: "1.18"
gocyclo:
min-complexity: 16
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
# CHANGELOG

## 0.10.0

- Key changes:
- Added support for autoconfiguration through Assumed roles (disabled by default, can be enabled through `ASSUMED_ROLES: true`):
- In environments, where OIDC-role names match names of namespaces, ACLs can be constructed on the fly (e.g. `["role1", "role2"]` will give access to metrics from namespaces `role1` and `role2`; `kube.*` - to namespaces starting with `kube.*`, `.*` - to all metrics). The roles specified in `acl.yaml` are still considered and get merged with assumed roles;
- Thanks to [@aberestyak](https://github.com/aberestyak/) for the idea;
- Logs:
- Log OIDC roles when debug is enabled. The field will contain all roles present in the token, not only those that are considered during ACL generation process.

## 0.9.0

- Key changes:
Expand Down
20 changes: 12 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,20 +2,20 @@

LFGW is a trivial reverse proxy based on `httputil` and `VictoriaMetrics/metricsql` with a purpose of dynamically rewriting requests to Prometheus-like backends.

More specifically, it manipulates label filters in metric expressions to reduce the scope of metrics exposed to an end user based on user's OIDC-roles.
More specifically, it manipulates label filters in metric expressions to reduce the scope of metrics exposed to an end user based on user's OIDC-roles. The process is described in more details [here](docs/filtering.md).

Target setup: `grafana -> lfgw -> Prometheus/VictoriaMetrics`.

## Key features

* a user is not restricted to just one LFGW-role, you are free to specify a few;
* non-opinionated proxying of requests with valid jwt tokens - there's Prometheus to decide whether request is valid or not;
* ACL-based request rewrites with implicit deny;
* supports Victoria Metrics' PromQL extensions;
* since it's based on `VictoriaMetrics/metricsql` library, which has way simpler interface than `prometheus`, there is no need to write a separate implementation for every type of MetricExpr on the planet;
* a user can have multiple roles;
* support for autoconfiguration in environments, where OIDC-role names match names of namespaces ("assumed roles" mode; thanks to [@aberestyak](https://github.com/aberestyak/) for the idea);
* [automatic expression optimizations](https://pkg.go.dev/github.com/VictoriaMetrics/metricsql#Optimize) for non-full access requests;
* it's based on the middleware pattern, so it's easy to implement other, non-OIDC, modes should the need be;
* support for different headers with access tokens (`X-Forwarded-Access-Token`, `X-Auth-Request-Access-Token`, `Authorization`);
* requests to both `/api/*` and `/federate` endpoints are protected (=rewritten);
* requests to sensitive endpoints are blocked by default;
* requests to both `/api/*` and `/federate` endpoints are protected (=rewritten).
* compatible with both [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) and [MetricsQL](https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/MetricsQL).

## Similar projects

Expand All @@ -27,13 +27,17 @@ Docker images are published on [ghcr.io/weisdd/lfgw](https://github.com/weisdd/l

## Configuration

OIDC roles are expected to be present in `roles` within a jwt token.
### Requirements for jwt-tokens

* OIDC-roles must be present in `roles` claim;
* Client ID specified via `OIDC_CLIENT_ID` must be present in `aud` claim (more details in [environment variables section](#Environment variables)), otherwise token verification will fail.

### Environment variables

| Module | Variable | Default Value | Description |
| -------------------- | --------------------------- | ------------- | ------------------------------------------------------------ |
| **General settings** | | | |
| | `ASSUMED_ROLES` | `false` | In environments, where OIDC-role names match names of namespaces, ACLs can be constructed on the fly (e.g. `["role1", "role2"]` will give access to metrics from namespaces `role1` and `role2`). The roles specified in `acl.yaml` are still considered and get merged with assumed roles. Role names may contain regular expressions, including the admin definition `.*`. |
| | `ENABLE_DEDUPLICATION` | `true` | Whether to enable deduplication, which leaves some of the requests unmodified if they match the target policy. Examples can be found in the "acl.yaml syntax" section. |
| | `OPTIMIZE_EXPRESSIONS` | `true` | Whether to automatically optimize expressions for non-full access requests. [More details](https://pkg.go.dev/github.com/VictoriaMetrics/metricsql#Optimize) |
| | | | |
Expand Down
47 changes: 30 additions & 17 deletions cmd/lfgw/acl.go
Original file line number Diff line number Diff line change
Expand Up @@ -154,13 +154,20 @@ func (app *application) rolesToRawACL(roles []string) (string, error) {

for _, role := range roles {
acl, exists := app.ACLMap[role]
if !exists {
return "", fmt.Errorf("Some of the roles are unknown")
}
if acl.RawACL == "" {
return "", fmt.Errorf("%s role contains empty rawACL", role)
if exists {
// NOTE: You should never see an empty definitions in .RawACL as those should be removed by toSlice further down the process. The error check below is not necessary, is left as an additional safeguard for now and might get removed in the future.
if acl.RawACL == "" {
return "", fmt.Errorf("%s role contains empty rawACL", role)
}
rawACLs = append(rawACLs, acl.RawACL)
} else {
// NOTE: Strictly speaking, it's getACL who is expected to pass a filtered list of roles in case Assumed roles are disabled. The error check below is not necessary, is left as an additional safeguard for now and might get removed in the future.
if !app.AssumedRoles {
return "", fmt.Errorf("Some of the roles are unknown and assumed roles are not enabled")
}
// NOTE: Role names are not linted, so they may contain regular expressions, including the admin definition: .*
rawACLs = append(rawACLs, role)
}
rawACLs = append(rawACLs, acl.RawACL)
}

rawACL := strings.Join(rawACLs, ", ")
Expand All @@ -171,31 +178,37 @@ func (app *application) rolesToRawACL(roles []string) (string, error) {
return rawACL, nil
}

// getACL takes a list of roles found in an OIDC claim, removes those that the app is not aware of (not present in app.ACLMap) and then constructs an ACL based on the remaining ones.
// getACL takes a list of roles found in an OIDC claim and constructs and ACL based on them. If assumed roles are disabled, then only known roles (present in app.ACLMap) are considered.
func (app *application) getACL(oidcRoles []string) (ACL, error) {
var roles []string
roles := []string{}
assumedRoles := []string{}

// Filter out unknown OIDC roles
for _, role := range oidcRoles {
_, exists := app.ACLMap[role]
if exists {
if app.ACLMap[role].Fullaccess {
return *app.ACLMap[role], nil
}
roles = append(roles, role)
} else {
assumedRoles = append(assumedRoles, role)
}
}

if app.AssumedRoles {
roles = append(roles, assumedRoles...)
}

if len(roles) == 0 {
return ACL{}, fmt.Errorf("no matching roles found")
}

// One role, so we can return existing ACL
// We can return a prebuilt ACL if there's only one role and it's known
if len(roles) == 1 {
role := roles[0]
return *app.ACLMap[role], nil
}

// If a user has a fullaccess role, there's no need to check any other one.
for _, role := range roles {
if app.ACLMap[role].Fullaccess {
return *app.ACLMap[role], nil
acl, exists := app.ACLMap[role]
if exists {
return *acl, nil
}
}

Expand Down
94 changes: 88 additions & 6 deletions cmd/lfgw/acl_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -383,18 +383,18 @@ func TestApplication_rolesToRawACL(t *testing.T) {

t.Run("1 known role", func(t *testing.T) {
roles := []string{"multiple-values"}
got, err := app.rolesToRawACL(roles)

want := "ku.*, min.*"

got, err := app.rolesToRawACL(roles)
assert.Nil(t, err)
assert.Equal(t, want, got)
})

t.Run("multiple known roles", func(t *testing.T) {
roles := []string{"multiple-values", "single-value"}
got, err := app.rolesToRawACL(roles)

want := "ku.*, min.*, default"

got, err := app.rolesToRawACL(roles)
assert.Nil(t, err)
assert.Equal(t, want, got)
})
Expand All @@ -410,6 +410,28 @@ func TestApplication_rolesToRawACL(t *testing.T) {
_, err := app.rolesToRawACL(roles)
assert.NotNil(t, err)
})

// Assumed roles enabled
app.AssumedRoles = true

t.Run("0 known roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) {
roles := []string{"unknown-role"}
want := "unknown-role"

got, err := app.rolesToRawACL(roles)
assert.Nil(t, err)
assert.Equal(t, want, got)
})

t.Run("multiple roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) {
roles := []string{"multiple-values", "single-value", "unknown-role"}
want := "ku.*, min.*, default, unknown-role"

got, err := app.rolesToRawACL(roles)
assert.Nil(t, err)
assert.Equal(t, want, got)
})

}

func TestApplication_GetACL(t *testing.T) {
Expand Down Expand Up @@ -476,7 +498,7 @@ func TestApplication_GetACL(t *testing.T) {
assert.Equal(t, want, got)
})

t.Run("multiple roles, no full access", func(t *testing.T) {
t.Run("multiple roles, 1 is unknown, no full access", func(t *testing.T) {
roles := []string{"single-value", "multiple-values", "unknown-role"}
knownRoles := []string{"single-value", "multiple-values"}

Expand All @@ -485,13 +507,73 @@ func TestApplication_GetACL(t *testing.T) {

want := ACL{
Fullaccess: false,
RawACL: rawACL,
LabelFilter: metricsql.LabelFilter{
Label: "namespace",
Value: "default|ku.*|min.*",
IsRegexp: true,
IsNegative: false,
},
RawACL: rawACL,
}

got, err := app.getACL(roles)
assert.Nil(t, err)
assert.Equal(t, want, got)
})

// Assumed roles enabled
app.AssumedRoles = true

t.Run("0 known roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) {
roles := []string{"unknown-role"}

want := ACL{
Fullaccess: false,
LabelFilter: metricsql.LabelFilter{
Label: "namespace",
Value: "unknown-role",
IsRegexp: false,
IsNegative: false,
},
RawACL: "unknown-role",
}

got, err := app.getACL(roles)
assert.Nil(t, err)
assert.Equal(t, want, got)
})

t.Run("multiple roles, 1 is unknown (assumed roles enabled)", func(t *testing.T) {
roles := []string{"multiple-values", "single-value", "unknown-role"}

want := ACL{
Fullaccess: false,
LabelFilter: metricsql.LabelFilter{
Label: "namespace",
Value: "ku.*|min.*|default|unknown-role",
IsRegexp: true,
IsNegative: false,
},
RawACL: "ku.*, min.*, default, unknown-role",
}

got, err := app.getACL(roles)
assert.Nil(t, err)
assert.Equal(t, want, got)
})

t.Run("multiple roles, 1 is unknown, 1 gives full access (assumed roles enabled)", func(t *testing.T) {
roles := []string{"multiple-values", "admin", "unknown-role"}

want := ACL{
Fullaccess: true,
LabelFilter: metricsql.LabelFilter{
Label: "namespace",
Value: ".*",
IsRegexp: true,
IsNegative: false,
},
RawACL: ".*",
}

got, err := app.getACL(roles)
Expand Down
9 changes: 9 additions & 0 deletions cmd/lfgw/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ type application struct {
SafeMode bool `env:"SAFE_MODE" envDefault:"true"`
SetProxyHeaders bool `env:"SET_PROXY_HEADERS" envDefefault:"false"`
ACLPath string `env:"ACL_PATH" envDefault:"./acl.yaml"`
AssumedRoles bool `env:"ASSUMED_ROLES" envDefault:"false"`
OIDCRealmURL string `env:"OIDC_REALM_URL,required"`
OIDCClientID string `env:"OIDC_CLIENT_ID,required"`
Port int `env:"PORT" envDefault:"8080"`
Expand Down Expand Up @@ -74,6 +75,14 @@ func main() {
zerolog.SetGlobalLevel(zerolog.DebugLevel)
}

if app.AssumedRoles {
app.logger.Info().Caller().
Msg("Assumed roles mode is on")
} else {
app.logger.Info().Caller().
Msg("Assumed roles mode is off")
}

app.ACLMap, err = app.loadACL()
if err != nil {
app.logger.Fatal().Caller().
Expand Down
2 changes: 2 additions & 0 deletions cmd/lfgw/middleware.go
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,8 @@ func (app *application) oidcModeMiddleware(next http.Handler) http.Handler {
}

app.enrichLogContext(r, "email", claims.Email)
// NOTE: The field will contain all roles present in the token, not only those that are considered during ACL generation process
app.enrichDebugLogContext(r, "roles", strings.Join(claims.Roles, ", "))

acl, err := app.getACL(claims.Roles)
if err != nil {
Expand Down
7 changes: 7 additions & 0 deletions docs/filtering.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# How filtering works

By default, Grafana works with data sources in a so called [server mode](https://grafana.com/docs/grafana/latest/datasources/prometheus/#prometheus-settings). It means that a user doesn't have a direct access to a data source, thus all requests are proxied through Grafana. When this mode is combined with [OAuth authentication](https://grafana.com/docs/grafana/latest/auth/generic-oauth/) and [Forward Oauth Identity](https://grafana.com/docs/grafana/latest/developers/plugins/add-authentication-for-data-source-plugins/#forward-oauth-identity-for-the-logged-in-user) option is enabled, requests sent by Grafana to a Prometheus-like backend contain user's access token.

In an identity provider such as Keycloak, we can add custom client roles and pass them in, say, `roles` claim (claim name could be different, but lfgw does not currently allow any other name). That's where lfgw comes into play. By tying roles to a list of namespaces (either full names or regexps), we can tell lfgw which metric expressions have to be modified (to reduce the scope) and which are allowed to be passed as is.

When a metric expression is extracted from GET-parameters or a POST-form that Grafana sends, lfgw manipulates `namespace` label in each selector according to an acl. Once it's done, the updated request is forwarded to the Prometheus-like backend. Examples of ACL can be found in [README.md](../README.md#aclyaml-syntax).
2 changes: 2 additions & 0 deletions docs/similar-projects.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ Link: [Prometheus filter proxy](https://github.com/hoffie/prometheus-filter-prox
Minuses:

- based on Prometheus library, so might not support some of Victoria Metrics' extensions;
- does not support autoconfiguration;
- it's an overly simple implementation that relies only on URI paths to define the scope of available metrics, so a user might potentially get access to any metrics should the URL become exposed;
- it's based on http client, thus unlikely to be ready for high volumes of requests;
- does not allow to filter out requests to sensitive endpoints (like `/admin/tsdb`);
Expand All @@ -31,6 +32,7 @@ Pluses:
Minuses:

- a user cannot have multiple roles (`When you have multiple roles, the first one that is mentioned in prometheus-acls will be used.`);
- does not support autoconfiguration;
- based on Prometheus library, so might not support some of Victoria Metrics' extensions;
- does not allow to filter out requests to sensitive endpoints (like `/admin/tsdb`);
- does not rewrite requests to `/federate` endpoint (at least, at the time of writing);
Expand Down

0 comments on commit 1c32daf

Please sign in to comment.