
Governance Architecture -- Three-Dimension Access Model

AKKO implements enterprise-grade data governance with three orthogonal dimensions: Platform Role, Business Team, and Project. A single Keycloak JWT token carries all governance information, enforced consistently across every service in the platform.

+------------------------------------------------------------------+
|                       Keycloak JWT Token                         |
|                                                                  |
|  preferred_username: "carol"                                     |
|  groups: ["akko-analyst", "equipe-marketing", "projet-scoring"]  |
|                                                                  |
|  Dimension 1: Platform Role   --> What TOOLS you can use         |
|  Dimension 2: Business Team   --> What TEAM resources you share  |
|  Dimension 3: Project         --> What DATA you can access       |
+------------------------------------------------------------------+
         |                  |                     |
         v                  v                     v
   +-----------+    +----------------+    +----------------+
   |   Trino   |    | object storage |    |   OPA Policy   |
   |  Superset |    |  Shared/       |    |  Row filters   |
   |  Airflow  |    |  Team dirs     |    |  Col masking   |
   |  LiteLLM  |    |                |    |  Schema access |
   +-----------+    +----------------+    +----------------+

Why three dimensions?

Traditional platforms conflate roles and data access into a single hierarchy. AKKO separates them: a marketing analyst and a fraud analyst may have the same platform role (both are akko-analyst) but see completely different data (one works on projet-scoring, the other on projet-risk). Meanwhile, they share team storage with their respective departments. This model scales from 5 users to 5,000 without restructuring.


The Three Dimensions

Dimension 1: Platform Role

Platform roles control what tools and capabilities a user can access across all AKKO services.

| Role | Persona | Tools | Resource Tier | AI Model Access |
|------|---------|-------|---------------|-----------------|
| akko-admin | Platform administrator | All services, admin consoles, DDL/DML | Unlimited | All models |
| akko-engineer | Data engineer | Spark, Airflow, Trino DDL, MLflow, notebooks | High (16 GB, 8 CPU) | All models |
| akko-analyst | Senior data analyst | Trino SELECT, Superset, notebooks, dashboards | Medium (8 GB, 4 CPU) | Standard models |
| akko-user | Compliance analyst | Trino SELECT (masked), OpenMetadata, audit logs | Medium (8 GB, 4 CPU) | Standard models |
| akko-viewer | Executive / dashboard viewer | Superset dashboards, filtered views | Low (4 GB, 2 CPU) | Chat models only |

Platform roles are Keycloak realm roles, emitted in the JWT groups claim. They are orthogonal to data access -- an akko-analyst in the fraud team sees different data than an akko-analyst in marketing.

Dimension 2: Business Team

Business teams represent organizational structure. They control shared resources, not data access.

| Team | Example Members | Shared Resources |
|------|-----------------|------------------|
| equipe-marketing | Carol, Frank | shared/equipe-marketing/ in object storage |
| equipe-data-eng | Bob, Grace | shared/equipe-data-eng/ in object storage |
| equipe-fraude | Eve, Hector | shared/equipe-fraude/ in object storage |

Teams are directory service groups synced to Keycloak. They appear in the JWT groups claim alongside platform roles.

Teams are NOT access control

A team membership alone grants no data access. It provides shared team storage in object storage and a logical grouping in OpenMetadata. Data access is controlled exclusively by project membership (Dimension 3).

Dimension 3: Project

Projects control data access. They determine which Trino schemas, Iceberg tables, object storage prefixes, and OpenMetadata domains are accessible to a user.

| Project | Data Scope | Members (cross-team) |
|---------|------------|----------------------|
| projet-scoring | iceberg.scoring.*, object storage projects/scoring/ | Carol (marketing), Bob (data-eng) |
| projet-risk | iceberg.risk.*, object storage projects/risk/ | Eve (fraude), Grace (data-eng) |

Projects are cross-team: members from different business teams collaborate on the same data. Each project has:

  • A directory service group (e.g., projet-scoring)
  • A Keycloak service account for batch pipelines
  • An object storage prefix (projects/{name}/)
  • Trino schema access rules in OPA
  • An OpenMetadata domain
  • Column masking and row filter policies
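
As a minimal sketch of how a service might read the three dimensions out of one `groups` claim, the snippet below sorts group names by prefix. Treating `akko-`, `equipe-`, and `projet-` as a strict naming convention is an assumption drawn from the examples on this page, and `split_groups` and `Dimensions` are hypothetical names, not part of AKKO.

```python
from dataclasses import dataclass, field

# Prefixes taken from the group names used on this page; treating them as a
# fixed naming convention is an assumption of this sketch.
ROLE_PREFIX = "akko-"
TEAM_PREFIX = "equipe-"
PROJECT_PREFIX = "projet-"


@dataclass
class Dimensions:
    roles: list = field(default_factory=list)
    teams: list = field(default_factory=list)
    projects: list = field(default_factory=list)


def split_groups(groups):
    """Sort the entries of a JWT `groups` claim into the three dimensions."""
    dims = Dimensions()
    for group in groups:
        if group.startswith(ROLE_PREFIX):
            dims.roles.append(group)
        elif group.startswith(TEAM_PREFIX):
            dims.teams.append(group)
        elif group.startswith(PROJECT_PREFIX):
            dims.projects.append(group)
        # Anything else (for example "svc-accounts") stays unclassified here.
    return dims


carol = split_groups(["akko-analyst", "equipe-marketing", "projet-scoring"])
print(carol)
```

Keeping the three lists separate mirrors the model: the role decides tooling, the team decides shared storage, and only the project list feeds data-access decisions.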

Storage Model -- object storage JWT Variables

AKKO uses runtime JWT policy evaluation in object storage -- no per-user bucket creation, no static IAM policies.

Design Principles

| Principle | Implementation |
|-----------|----------------|
| No per-user bucket creation | Industry best practice -- buckets are structural, not identity-scoped |
| Runtime JWT evaluation | Object storage evaluates ${jwt:preferred_username} and ${jwt:groups} at request time |
| Lazy user prefix creation | Per-user directories created on first upload, not at provisioning time |
| Zero provisioning on user creation | Adding a user to Keycloak + directory service group is sufficient |

Bucket Layout

minio/
├── analytics/              # Iceberg warehouse (managed by Polaris)
│   ├── scoring/            # projet-scoring tables
│   ├── risk/               # projet-risk tables
│   └── shared/             # Cross-project reference data
├── production/             # Service accounts only (no interactive access)
│   ├── airflow/            # DAG artifacts, logs
│   └── mlflow/             # Model artifacts
├── projects/
│   ├── scoring/            # projet-scoring workspace
│   │   ├── raw/            # Raw data uploads
│   │   ├── processed/      # Intermediate results
│   │   └── exports/        # Deliverables
│   └── risk/               # projet-risk workspace
├── shared/
│   ├── equipe-marketing/   # Team-shared files
│   ├── equipe-data-eng/    # Team-shared files
│   └── equipe-fraude/      # Team-shared files
└── users/
    ├── carol/              # Personal workspace (created on first upload)
    ├── bob/                # Personal workspace
    └── eve/                # Personal workspace

JWT Policy Variables

Object storage policies reference JWT claims directly:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::users/${jwt:preferred_username}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::shared/${jwt:groups}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::projects/${jwt:groups}/*"
      ]
    }
  ]
}
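
To make the runtime substitution concrete, here is a rough model of how the `${jwt:...}` placeholders in the policy above could expand for Carol's token. It only illustrates the idea; it is not how the storage service implements policy evaluation, and `expand_resources` is a hypothetical helper. Note that a literal substitution of the group name yields `projects/projet-scoring/`, whereas the bucket layout shows `projects/scoring/`, so a real deployment presumably needs the prefix names and group names to line up.

```python
def expand_resources(pattern, claims):
    """Expand ${jwt:<claim>} placeholders in a resource ARN.

    A list-valued claim (such as `groups`) yields one ARN per value.
    """
    results = [pattern]
    for name, value in claims.items():
        placeholder = "${jwt:" + name + "}"
        values = value if isinstance(value, list) else [value]
        expanded = []
        for resource in results:
            if placeholder in resource:
                expanded.extend(resource.replace(placeholder, str(v)) for v in values)
            else:
                expanded.append(resource)
        results = expanded
    return results


claims = {
    "preferred_username": "carol",
    "groups": ["akko-analyst", "equipe-marketing", "projet-scoring"],
}

user_arns = expand_resources("arn:aws:s3:::users/${jwt:preferred_username}/*", claims)
project_arns = expand_resources("arn:aws:s3:::projects/${jwt:groups}/*", claims)
print(user_arns)
print(project_arns)
```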

No static user management in object storage

When Carol joins projet-scoring, an admin adds her to the directory service group. At her next login, her JWT contains projet-scoring in the groups claim. Object storage evaluates this at request time -- zero object storage configuration required.


Service Accounts

Interactive vs Automated Access

| Aspect | Interactive (Human) | Automated (Pipeline) |
|--------|---------------------|----------------------|
| Identity | Personal JWT (Keycloak login) | Service account (client_credentials) |
| Authentication | Browser OIDC flow | client_id + client_secret |
| Token source | Keycloak user session | K8s Secret mounted in pod |
| Resource quota | Per-role tier (4-16 GB) | Batch tier (64 GB, 20 executors) |
| Lifecycle | Tied to human employment | Tied to project lifecycle |
| Audit trail | preferred_username in logs | svc-projet-scoring in logs |

Service Account Architecture

Project: projet-scoring
    |
    +-- Keycloak Client: svc-projet-scoring
    |   ├── Grant type: client_credentials
    |   ├── Groups: ["projet-scoring", "svc-accounts"]
    |   └── Scope: project data only
    |
    +-- K8s Secret: svc-projet-scoring-credentials
    |   ├── client_id: svc-projet-scoring
    |   └── client_secret: <generated>
    |
    +-- Airflow Connection: projet_scoring_trino
    |   └── Uses svc-projet-scoring token
    |
    +-- Spark Application:
        └── Mounts K8s Secret, obtains JWT at runtime
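
The "obtains JWT at runtime" step can be sketched as a standard OAuth 2.0 client_credentials request against Keycloak's token endpoint. The snippet only builds the request and does not send it; the host name and realm are taken from the examples elsewhere on this page, the secret value is a placeholder, and `build_token_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url, realm, client_id, client_secret):
    """Build (but do not send) a client_credentials token request."""
    url = f"{base_url}/realms/{realm}/protocol/openid-connect/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# In a pod, client_id and client_secret would be read from the mounted
# K8s Secret; the values below are placeholders.
request = build_token_request(
    "http://akko-keycloak:8080", "akko", "svc-projet-scoring", "dummy-secret"
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; the JSON response of a
# token endpoint carries the token in its "access_token" field.
```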

Resource Quotas

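| Quota Type | Interactive | Batch (Service Account) |
|------------|-------------|-------------------------|
| Max memory | 8 GB | 64 GB |
| Max CPU | 4 cores | 16 cores |
| Spark executors | 4 | 20 |
| Concurrent queries | 5 | 20 |
| LiteLLM RPM | 60 | 300 |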
LiteLLM RPM 60 300

Why Service Accounts Matter

  1. Pipeline continuity: When Carol leaves projet-scoring, her personal access is revoked. The svc-projet-scoring service account continues running nightly pipelines uninterrupted.
  2. Audit separation: Interactive queries are logged under the human username. Batch operations are logged under the service account. Clear audit trail.
  3. Resource isolation: Interactive users get modest quotas for exploration. Batch jobs get generous quotas for production workloads. No resource contention.

Column Masking (Starburst-Inspired)

AKKO provides 8 predefined column masks plus custom expressions, inspired by Starburst's masking model but fully managed through OPA.

Predefined Masks

| Mask Name | Expression | Example Input | Example Output | Compatible Types |
|-----------|------------|---------------|----------------|------------------|
| SHA-256 | to_hex(sha256(cast(col as varbinary))) | alice@akko.io | a1b2c3d4e5... | VARCHAR, CHAR |
| SHA-512 | to_hex(sha512(cast(col as varbinary))) | alice@akko.io | f6e5d4c3b2... | VARCHAR, CHAR |
| MD5 | to_hex(md5(cast(col as varbinary))) | alice@akko.io | d41d8cd98f... | VARCHAR, CHAR |
| NULL | CAST(NULL AS <type>) | 2024-01-15 | NULL | All types |
| Redact | '***REDACTED***' | +33612345678 | ***REDACTED*** | VARCHAR, CHAR |
| First 4 | substr(col, 1, 4) \|\| '****' | 4242424242 | 4242**** | VARCHAR, CHAR |
| Last 4 | '****' \|\| substr(col, -4) | 4242424242 | ****4242 | VARCHAR, CHAR |
| Year only | date_trunc('year', col) | 1990-05-23 | 1990-01-01 | DATE, TIMESTAMP |
| Custom | User-defined Trino SQL expression | (varies) | (varies) | (depends on expression) |
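
To see what a few of these masks do to a value, the sketch below mirrors four of the expressions in plain Python. It is an illustration for reasoning about outputs, not the Trino implementation: the hex digest's letter case may differ from what `to_hex` returns, and edge cases such as inputs shorter than four characters are not modeled. All function names are hypothetical.

```python
import hashlib
from datetime import date


def mask_sha256(value):
    """Hash the UTF-8 bytes of the value and return the hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_first4(value):
    """Keep the first four characters, hide the rest."""
    return value[:4] + "****"


def mask_last4(value):
    """Hide everything except the last four characters."""
    return "****" + value[-4:]


def mask_year_only(value):
    """Truncate a date to the first day of its year."""
    return value.replace(month=1, day=1)


print(mask_first4("4242424242"))
print(mask_last4("4242424242"))
print(mask_year_only(date(1990, 5, 23)))
print(len(mask_sha256("alice@akko.io")))
```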

Mask Properties

  • Reusable: A mask is defined once and applied to many columns across many tables. Changing the mask definition updates all columns that use it.
  • Type-aware: The cockpit admin panel only shows compatible masks for each column type. You cannot apply First 4 to a DATE column.
  • Per-project: Masks are scoped to projects. projet-scoring may mask email with SHA-256, while projet-risk masks it with Redact.
  • Priority: When a user belongs to multiple projects with different masks on the same column, the most restrictive mask wins.

OPA Column Masking Example

package trino

import rego.v1

# Load masking rules from OPA data
masks := data.governance.column_masks

columnMask := {"expression": expression, "identity": identity} if {
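    # For column-mask requests, Trino's OPA plugin sends the table and column
    # names together under resource.column
    table := input.action.resource.column.tableName
    column := input.action.resource.column.columnName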
    user_groups := input.context.identity.groups

    # Find applicable mask for this column in user's projects
    some project in user_groups
    mask_rule := masks[project][table][column]

    # User is not exempt (admins see clear data)
    not "akko-admin" in user_groups

    expression := mask_rule.expression
    identity := concat("_", ["mask", project, table, column])
}

Managing Masks in Cockpit

The cockpit admin panel provides a visual interface for managing column masks:

  1. Navigate to Administration > Governance > Column Masking
  2. Select a project (e.g., projet-scoring)
  3. Browse available tables and columns
  4. For each column, select a mask from the dropdown (filtered by column type)
  5. Click Apply -- the OPA policy data is updated in real time

Row-Level Security

Row-level security (RLS) restricts which rows a user can see, based on project membership.

Design Principles

| Principle | Implementation |
|-----------|----------------|
| Per-project, not per-user | Filters are defined at the project level. All project members see the same rows. |
| Visual builder | Cockpit admin panel provides a no-code filter builder |
| Type-aware operators | UI only shows compatible operators for each column type |
| AND combination | Multiple filters on the same table are combined with AND |
| User overrides | Individual exceptions can be granted for specific users |

Filter Builder

The visual builder in the cockpit admin panel:

+--------------------------------------------------------------+
|  Row Filter: projet-scoring / iceberg.scoring.transactions   |
|                                                              |
|  Table:    [transactions    v]                               |
|  Column:   [region          v]                               |
|  Operator: [IN              v]                               |
|  Value:    [EMEA, APAC         ]                             |
|                                                              |
|  + Add another filter                                        |
|                                                              |
|  Active filters:                                             |
|  1. region IN ('EMEA', 'APAC')                               |
|  2. amount > 100                                             |
|  Combined: region IN ('EMEA','APAC') AND amount > 100        |
+--------------------------------------------------------------+

Operators by Column Type

| Column Type | Available Operators |
|-------------|---------------------|
| VARCHAR / CHAR | =, !=, IN, NOT IN, LIKE, IS NULL, IS NOT NULL |
| INTEGER / BIGINT / DOUBLE | =, !=, <, >, <=, >=, BETWEEN, IN, IS NULL |
| DATE / TIMESTAMP | =, !=, <, >, <=, >=, BETWEEN, IS NULL |
| BOOLEAN | =, IS NULL, IS NOT NULL |
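
A filter builder of this kind could turn UI selections into a SQL predicate roughly as follows. This is a sketch under assumptions: the function names are hypothetical, column names are assumed to come from the catalog rather than free text, and typed literals such as `DATE '...'` are not handled.

```python
_TEXT = {"=", "!=", "IN", "NOT IN", "LIKE", "IS NULL", "IS NOT NULL"}
_NUMERIC = {"=", "!=", "<", ">", "<=", ">=", "BETWEEN", "IN", "IS NULL"}
_TEMPORAL = {"=", "!=", "<", ">", "<=", ">=", "BETWEEN", "IS NULL"}
_BOOLEAN = {"=", "IS NULL", "IS NOT NULL"}

# Operator sets copied from the table above, keyed by column type.
OPERATORS_BY_TYPE = {
    "VARCHAR": _TEXT, "CHAR": _TEXT,
    "INTEGER": _NUMERIC, "BIGINT": _NUMERIC, "DOUBLE": _NUMERIC,
    "DATE": _TEMPORAL, "TIMESTAMP": _TEMPORAL,
    "BOOLEAN": _BOOLEAN,
}


def sql_literal(value):
    """Render a Python value as a SQL literal (strings are quote-escaped)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_condition(column, column_type, operator, value=None):
    """Build one predicate, rejecting operators the column type does not allow."""
    if operator not in OPERATORS_BY_TYPE[column_type]:
        raise ValueError(f"{operator} is not available for {column_type} columns")
    if operator in ("IS NULL", "IS NOT NULL"):
        return f"{column} {operator}"
    if operator in ("IN", "NOT IN"):
        items = ", ".join(sql_literal(v) for v in value)
        return f"{column} {operator} ({items})"
    if operator == "BETWEEN":
        low, high = value
        return f"{column} BETWEEN {sql_literal(low)} AND {sql_literal(high)}"
    return f"{column} {operator} {sql_literal(value)}"


def combine(conditions):
    """Multiple filters on the same table are combined with AND."""
    return " AND ".join(conditions)


combined = combine([
    build_condition("region", "VARCHAR", "IN", ["EMEA", "APAC"]),
    build_condition("amount", "INTEGER", ">", 100),
])
print(combined)
```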

OPA Row Filter Example

package trino

import rego.v1

# Load row filter rules from OPA data
filters := data.governance.row_filters

rowFilters contains {"expression": expression, "identity": identity} if {
    table := input.action.resource.table.tableName
    user_groups := input.context.identity.groups

    # Find applicable filter for this table in user's projects
    some project in user_groups
    filter_rule := filters[project][table]

    # User is not exempt (admins see all rows)
    not "akko-admin" in user_groups

    expression := filter_rule.expression
    identity := concat("_", ["filter", project, table])
}

User Overrides

In exceptional cases, individual users can be granted broader access than their project default:

{
  "user_overrides": {
    "carol": {
      "projet-scoring": {
        "transactions": {
          "override": "region IN ('EMEA', 'APAC', 'NA')",
          "reason": "Cross-region analysis approved by DPO on 2026-03-10",
          "expires": "2026-06-10"
        }
      }
    }
  }
}

Overrides are:

  • Audited: Every override records who approved it and why
  • Time-limited: Overrides have an expiration date
  • Visible in cockpit: The admin panel shows active overrides with a warning badge
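
The time limit on an override could be enforced with a check like the one below. It is a sketch, assuming the override record has the shape shown in the JSON above and that the expiry date itself is still valid (inclusive); `effective_filter` is a hypothetical helper.

```python
from datetime import date


def effective_filter(project_filter, override, today):
    """Return the override expression while it is unexpired, else the project filter."""
    if override is not None and today <= date.fromisoformat(override["expires"]):
        return override["override"]
    return project_filter


project_filter = "region IN ('EMEA', 'APAC')"
carol_override = {
    "override": "region IN ('EMEA', 'APAC', 'NA')",
    "reason": "Cross-region analysis approved by DPO on 2026-03-10",
    "expires": "2026-06-10",
}

print(effective_filter(project_filter, carol_override, date(2026, 4, 1)))
print(effective_filter(project_filter, carol_override, date(2026, 6, 11)))
```

Falling back to the project filter after expiry means a forgotten override narrows access again on its own instead of lingering.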

OPA Policy Model

OPA is the central policy engine for all data governance decisions in AKKO. It evaluates Trino access, column masking, and row filtering at runtime using data from three layers.

Three Data Layers

+------------------------------------------------------------------+
|                          OPA Data Store                          |
|                                                                  |
|  Layer 1: group_policies     (primary access rules)              |
|  ├── projet-scoring                                              |
|  │   ├── schemas: ["scoring"]                                    |
|  │   ├── column_masks: {transactions.email: "sha256", ...}       |
|  │   └── row_filters: {transactions: "region IN ('EMEA')"}       |
|  └── projet-risk                                                 |
|      ├── schemas: ["risk"]                                       |
|      ├── column_masks: {customers.ssn: "redact", ...}            |
|      └── row_filters: {customers: "risk_score > 50"}             |
|                                                                  |
|  Layer 2: user_overrides     (individual exceptions)             |
|  └── carol                                                       |
|      └── projet-scoring.transactions: expanded region filter     |
|                                                                  |
|  Layer 3: service_accounts   (batch pipeline rules)              |
|  └── svc-projet-scoring                                          |
|      ├── schemas: ["scoring"]                                    |
|      └── column_masks: {}   (no masking for ETL)                 |
+------------------------------------------------------------------+

Scope Merging Rules

When a user belongs to multiple projects, schema and table access is the UNION of all project scopes, while column masks and row filters are combined restrictively:

| Rule | Behavior | Example |
|------|----------|---------|
| Schema access | UNION | Carol in projet-scoring + projet-risk sees both scoring.* and risk.* schemas |
| Column masking | Most restrictive wins | If projet-scoring masks email with SHA-256 and projet-risk masks it with Redact, Carol sees Redact |
| Row filters | AND combination | If both projects filter transactions, both filters apply |
| Table access | UNION | Carol sees tables from both projects |
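
The merging rules can be sketched as one function over per-project policy records. Two things here are assumptions rather than documented behavior: the `MASK_RANK` ordering used to decide which mask is "most restrictive", and the extra `transactions.email` mask on projet-risk, which is added only to reproduce the conflict from the table. `merge_scopes` is a hypothetical name.

```python
# Hypothetical ranking: a higher number means a more restrictive mask.
MASK_RANK = {"sha256": 1, "first4": 2, "redact": 3, "null": 4}


def merge_scopes(policies, groups):
    """Merge the policies of every project found in the user's groups."""
    schemas = set()
    masks = {}
    filters = {}
    for group in groups:
        policy = policies.get(group)
        if policy is None:
            continue  # roles and teams carry no data policy
        schemas |= set(policy["schemas"])  # schema access: UNION
        for column, mask in policy["column_masks"].items():
            current = masks.get(column)
            if current is None or MASK_RANK[mask] > MASK_RANK[current]:
                masks[column] = mask  # column masking: most restrictive wins
        for table, expression in policy["row_filters"].items():
            filters.setdefault(table, []).append(f"({expression})")
    return {
        "schemas": sorted(schemas),
        "column_masks": masks,
        # row filters: AND combination
        "row_filters": {t: " AND ".join(parts) for t, parts in filters.items()},
    }


policies = {
    "projet-scoring": {
        "schemas": ["scoring"],
        "column_masks": {"transactions.email": "sha256"},
        "row_filters": {"transactions": "region IN ('EMEA')"},
    },
    "projet-risk": {
        "schemas": ["risk"],
        "column_masks": {"transactions.email": "redact", "customers.ssn": "redact"},
        "row_filters": {"customers": "risk_score > 50"},
    },
}

merged = merge_scopes(policies, ["akko-analyst", "projet-scoring", "projet-risk"])
print(merged)
```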

Evaluation Flow

User Query: SELECT email FROM scoring.transactions WHERE amount > 1000

  1. JWT Extraction
     ├── user: carol
     └── groups: [akko-analyst, equipe-marketing, projet-scoring]

  2. Authorization Check (allow/deny)
     ├── Platform role: akko-analyst --> SELECT allowed
     └── Project scope: projet-scoring --> scoring.* allowed
     Result: ALLOWED

  3. Column Masking
     ├── Column: email
     ├── Project mask: projet-scoring.transactions.email = SHA-256
     └── Role exemption: akko-analyst is NOT exempt
     Result: to_hex(sha256(cast(email as varbinary)))

  4. Row Filtering
     ├── Table: scoring.transactions
     ├── Project filter: region IN ('EMEA', 'APAC')
     └── User override: none
     Result: WHERE region IN ('EMEA','APAC') AND amount > 1000

  5. Final Query (rewritten by Trino):
     SELECT to_hex(sha256(cast(email as varbinary))) AS email
     FROM scoring.transactions
     WHERE region IN ('EMEA','APAC') AND amount > 1000
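
The flow above can be traced end to end with a small model. Trino applies masks and filters inside the engine rather than by editing SQL text, so the string assembly below only illustrates the outcome. `ROLE_OPERATIONS`, `PROJECT_POLICIES`, and `evaluate` are hypothetical, and only the single-project case is handled.

```python
# Hypothetical policy data reproducing the worked example above.
ROLE_OPERATIONS = {"akko-analyst": {"SELECT"}}
PROJECT_POLICIES = {
    "projet-scoring": {
        "schemas": ["scoring"],
        "column_masks": {
            "transactions.email": "to_hex(sha256(cast(email as varbinary)))"
        },
        "row_filters": {"transactions": "region IN ('EMEA','APAC')"},
    }
}


def evaluate(groups, schema, table, column, predicate):
    """Return the effective query for a single-column SELECT, or None if denied."""
    # Step 2: the platform role must allow SELECT and a project must grant the schema.
    may_select = any("SELECT" in ROLE_OPERATIONS.get(g, set()) for g in groups)
    projects = [
        g for g in groups
        if schema in PROJECT_POLICIES.get(g, {}).get("schemas", [])
    ]
    if not (may_select and projects):
        return None
    policy = PROJECT_POLICIES[projects[0]]

    # Step 3: column masking.
    mask = policy["column_masks"].get(f"{table}.{column}")
    select_expr = f"{mask} AS {column}" if mask else column

    # Step 4: row filtering, ANDed with the user's own predicate.
    row_filter = policy["row_filters"].get(table)
    predicates = [p for p in (row_filter, predicate) if p]
    where = " WHERE " + " AND ".join(predicates) if predicates else ""

    # Step 5: the effective query.
    return f"SELECT {select_expr} FROM {schema}.{table}{where}"


carol_groups = ["akko-analyst", "equipe-marketing", "projet-scoring"]
effective_query = evaluate(carol_groups, "scoring", "transactions", "email", "amount > 1000")
print(effective_query)
```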

Project Lifecycle

Create a Project

When an administrator creates a new project from the cockpit admin panel, the following resources are provisioned automatically:

| Step | Action | Service |
|------|--------|---------|
| 1 | Create directory service group projet-{name} | directory service |
| 2 | Sync group to Keycloak | Keycloak (LDAP federation) |
| 3 | Create service account svc-projet-{name} | Keycloak (client_credentials) |
| 4 | Store client credentials as K8s Secret | Kubernetes |
| 5 | Create object storage prefix projects/{name}/ | object storage |
| 6 | Create Trino schema iceberg.{name} | Trino (via admin query) |
| 7 | Register OPA policy data for the project | OPA |
| 8 | Create OpenMetadata domain | OpenMetadata |

# Example: cockpit triggers this sequence via API calls
# 1. Create directory service group
curl -X POST http://akko-lldap:17170/api/graphql \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"mutation { createGroup(name: \"projet-scoring\") { id } }"}'

# 2. Keycloak syncs automatically via LDAP federation

# 3. Create service account
curl -X POST http://akko-keycloak:8080/admin/realms/akko/clients \
  -H "Authorization: Bearer $KC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"clientId":"svc-projet-scoring","serviceAccountsEnabled":true,...}'

# 4. Store in K8s
kubectl create secret generic svc-projet-scoring-credentials \
  --from-literal=client_id=svc-projet-scoring \
  --from-literal=client_secret=$GENERATED_SECRET

# 5. Create object storage prefix (lazy -- created on first write)

# 6. Create Trino schema
trino --execute "CREATE SCHEMA IF NOT EXISTS iceberg.scoring"

# 7. Register OPA policy data
curl -X PUT http://akko-opa:8181/v1/data/governance/group_policies/projet-scoring \
  -H "Content-Type: application/json" \
  -d '{"schemas":["scoring"],"column_masks":{},"row_filters":{}}'

# 8. Create OpenMetadata domain
curl -X POST http://openmetadata:8585/api/v1/domains \
  -H "Content-Type: application/json" \
  -d '{"name":"scoring","description":"Scoring project domain"}'
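
Every resource in this sequence is named after the project, so the names can be derived from one input. The sketch below computes them without calling any service. The naming patterns are taken from this page, while the validation rule for project names and the `plan_project` helper are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectResources:
    group: str
    service_account: str
    k8s_secret: str
    storage_prefix: str
    trino_schema: str
    opa_data_path: str
    opa_payload: dict


def plan_project(name):
    """Derive the resource names provisioned for a project called `name`."""
    # Assumed rule: lowercase letters, digits and underscores, so the same
    # name works as a schema name and as a storage prefix.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
        raise ValueError(f"invalid project name: {name!r}")
    group = f"projet-{name}"
    service_account = f"svc-{group}"
    return ProjectResources(
        group=group,
        service_account=service_account,
        k8s_secret=f"{service_account}-credentials",
        storage_prefix=f"projects/{name}/",
        trino_schema=f"iceberg.{name}",
        opa_data_path=f"/v1/data/governance/group_policies/{group}",
        opa_payload={"schemas": [name], "column_masks": {}, "row_filters": {}},
    )


plan = plan_project("scoring")
print(plan)
```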

Add a Member

  1. Admin adds user to the directory service group projet-{name}
  2. At the user's next login, the JWT includes projet-{name} in the groups claim
  3. OPA evaluates the new group membership at query time
  4. object storage grants access to projects/{name}/ based on JWT groups
  5. No restart of any service required

Remove a Member

  1. Admin removes user from the directory service group
  2. At the user's next login, the JWT no longer includes the project group
  3. OPA denies access to project schemas and applies no project-specific masks
  4. object storage denies access to project prefix
  5. Pipelines are unaffected -- the service account svc-projet-{name} continues running

Token expiry delay

Existing JWT tokens remain valid until they expire (default: 5 minutes). For immediate revocation, force-logout the user via the Keycloak Admin API.
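
To estimate how long a removed member could still use an already issued token, the `exp` claim can be read from the token payload. The sketch below decodes the payload without verifying the signature, which is acceptable for diagnostics only and never for authorization. The demo token is built locally and unsigned, and both helper names are hypothetical.

```python
import base64
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def seconds_until_expiry(token, now=None):
    """Read the `exp` claim of a JWT and return the remaining lifetime in seconds.

    The signature is NOT verified here.
    """
    payload_segment = token.split(".")[1]
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    return claims["exp"] - current


# Unsigned demo token with a fixed expiry, for illustration only.
header = _b64url(json.dumps({"alg": "none"}).encode("utf-8"))
payload = _b64url(json.dumps({"preferred_username": "carol", "exp": 1_000_300}).encode("utf-8"))
demo_token = f"{header}.{payload}."

remaining = seconds_until_expiry(demo_token, now=1_000_000)
print(remaining)
```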

Archive a Project

| Step | Action |
|------|--------|
| 1 | Set Trino schemas to read-only in OPA (remove INSERT/DELETE) |
| 2 | Set object storage prefix to read-only (remove PutObject) |
| 3 | Disable service account in Keycloak |
| 4 | Mark OpenMetadata domain as archived |
| 5 | Retain data for compliance (configurable retention period) |

Enterprise Patterns

Industry Research

AKKO's governance model draws on how leading enterprise analytics platforms structure access control:

| Platform | Approach | AKKO Equivalent |
|----------|----------|-----------------|
| Databricks Unity Catalog | Metastore > Catalog > Schema > Table hierarchy | Project > Schema > Table with OPA enforcement |
| Snowflake | Account > Database > Schema with role hierarchy | Platform Role + Project with JWT-based scoping |
| Starburst | Built-in access control with column masking and row filtering | OPA policies with 8 predefined masks |
| Cloudera | Ranger policies with tag-based access control | OPA data layers with project-scoped tags |

Industry Consensus

The data governance industry has converged on several best practices that AKKO implements:

| Best Practice | AKKO Implementation |
|---------------|---------------------|
| Runtime policy evaluation (not pre-provisioning) | OPA evaluates JWT groups at query time |
| Separation of identity and authorization | Keycloak (identity) + OPA (authorization) |
| Service accounts for automation | Keycloak client_credentials + K8s Secrets |
| Column masking as reusable objects | 8 predefined masks, applied declaratively |
| Row-level security per data scope | Per-project row filters, AND-combined |
| Centralized audit trail | All access decisions logged in OPA decision logs |

AKKO Differentiator

Single JWT controls all services. In competing platforms, each service has its own access control system. In AKKO, one Keycloak JWT token carries platform role, team membership, and project scope -- and every service respects it:

                   Single JWT Token
                           |
        +------------------+-------------------+
        |                  |                   |
        v                  v                   v
   +---------+    +----------------+    +-------------+
   |  Trino  |    | object storage |    | JupyterHub  |
   |   OPA   |    |    JWT eval    |    | Spawner env |
   +---------+    +----------------+    +-------------+
        |                  |                   |
        v                  v                   v
   +---------+    +----------------+    +-------------+
   | LiteLLM |    |     Spark      |    |  OpenMeta   |
   | API key |    |     Submit     |    |  Domain     |
   | by role |    |     as svc     |    |  visibility |
   +---------+    +----------------+    +-------------+

No other open-source platform achieves this level of cross-service governance with a single identity token.


Summary

| Dimension | Source | JWT Claim | Controls |
|-----------|--------|-----------|----------|
| Platform Role | Keycloak realm role | groups: ["akko-analyst"] | Tool access, resource tier, AI model access |
| Business Team | directory service group | groups: ["equipe-marketing"] | Shared team storage in object storage |
| Project | directory service group | groups: ["projet-scoring"] | Data access (schemas, tables, rows, columns) |

| Component | Role in Governance |
|-----------|--------------------|
| Keycloak | Identity provider, JWT issuer, service account manager |
| directory service | Group management (teams and projects) |
| OPA | Policy engine (authorization, masking, filtering) |
| object storage | Storage with runtime JWT policy evaluation |
| Trino | Query engine with OPA-delegated access control |
| OpenMetadata | Data catalog with domain-scoped visibility |
Cockpit Admin UI for project/mask/filter management