Governance Architecture -- Three-Dimension Access Model
AKKO implements enterprise-grade data governance with three orthogonal dimensions: Platform Role, Business Team, and Project. A single Keycloak JWT token carries all governance information, enforced consistently across every service in the platform.
+------------------------------------------------------------------+
|                        Keycloak JWT Token                        |
|                                                                  |
|  preferred_username: "carol"                                     |
|  groups: ["akko-analyst", "equipe-marketing", "projet-scoring"]  |
|                                                                  |
|  Dimension 1: Platform Role --> What TOOLS you can use           |
|  Dimension 2: Business Team --> What TEAM resources you share    |
|  Dimension 3: Project       --> What DATA you can access         |
+------------------------------------------------------------------+
         |                      |                       |
         v                      v                       v
   +-----------+       +----------------+       +----------------+
   | Trino     |       | Object storage |       | OPA Policy     |
   | Superset  |       | Shared/        |       | Row filters    |
   | Airflow   |       | Team dirs      |       | Col masking    |
   | LiteLLM   |       |                |       | Schema access  |
   +-----------+       +----------------+       +----------------+
Why three dimensions?
Traditional platforms conflate roles and data access into a single hierarchy. AKKO separates them: a marketing analyst and a fraud analyst may have the same platform role (both are akko-analyst) but see completely different data (one works on projet-scoring, the other on projet-risk). Meanwhile, they share team storage with their respective departments. This model scales from 5 users to 5,000 without restructuring.
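As a minimal sketch of how a consumer of the token might separate the three dimensions, the snippet below sorts the `groups` claim by name prefix. The prefixes (`akko-`, `equipe-`, `projet-`) are inferred from the group names used in this document; the `Dimensions` class and `split_dimensions` function are hypothetical names, not part of AKKO.

```python
from dataclasses import dataclass, field

# Prefixes inferred from the example group names in this document (assumption).
ROLE_PREFIX = "akko-"
TEAM_PREFIX = "equipe-"
PROJECT_PREFIX = "projet-"


@dataclass
class Dimensions:
    roles: list[str] = field(default_factory=list)
    teams: list[str] = field(default_factory=list)
    projects: list[str] = field(default_factory=list)


def split_dimensions(groups: list[str]) -> Dimensions:
    """Sort the entries of a JWT `groups` claim into the three dimensions."""
    dims = Dimensions()
    for group in groups:
        if group.startswith(ROLE_PREFIX):
            dims.roles.append(group)
        elif group.startswith(TEAM_PREFIX):
            dims.teams.append(group)
        elif group.startswith(PROJECT_PREFIX):
            dims.projects.append(group)
        # Any other group (for example "svc-accounts") is ignored in this sketch.
    return dims


carol = split_dimensions(["akko-analyst", "equipe-marketing", "projet-scoring"])
print(carol)
```

Keeping the three lists separate mirrors the model: the role list decides tooling, the team list decides shared storage, and the project list decides data scope.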
The Three Dimensions
Dimension 1: Platform Role
Platform roles control what tools and capabilities a user can access across all AKKO services.
| Role | Persona | Tools | Resource Tier | AI Model Access |
|---|---|---|---|---|
| `akko-admin` | Platform administrator | All services, admin consoles, DDL/DML | Unlimited | All models |
| `akko-engineer` | Data engineer | Spark, Airflow, Trino DDL, MLflow, notebooks | High (16GB, 8 CPU) | All models |
| `akko-analyst` | Senior data analyst | Trino SELECT, Superset, notebooks, dashboards | Medium (8GB, 4 CPU) | Standard models |
| `akko-user` | Compliance analyst | Trino SELECT (masked), OpenMetadata, audit logs | Medium (8GB, 4 CPU) | Standard models |
| `akko-viewer` | Executive / dashboard viewer | Superset dashboards, filtered views | Low (4GB, 2 CPU) | Chat models only |
Platform roles are Keycloak realm roles, emitted in the JWT groups claim. They are orthogonal to data access -- an akko-analyst in the fraud team sees different data than an akko-analyst in marketing.
Dimension 2: Business Team
Business teams represent organizational structure. They control shared resources, not data access.
| Team | Example Members | Shared Resources |
|---|---|---|
| `equipe-marketing` | Carol, Frank | `shared/equipe-marketing/` in object storage |
| `equipe-data-eng` | Bob, Grace | `shared/equipe-data-eng/` in object storage |
| `equipe-fraude` | Eve, Hector | `shared/equipe-fraude/` in object storage |
Teams are directory service groups synced to Keycloak. They appear in the JWT groups claim alongside platform roles.
Teams are NOT access control
A team membership alone grants no data access. It provides shared team storage in object storage and a logical grouping in OpenMetadata. Data access is controlled exclusively by project membership (Dimension 3).
Dimension 3: Project
Projects control data access. They determine which Trino schemas, Iceberg tables, object storage prefixes, and OpenMetadata domains are accessible to a user.
| Project | Data Scope | Members (cross-team) |
|---|---|---|
| `projet-scoring` | `iceberg.scoring.*`, object storage `projects/scoring/` | Carol (marketing), Bob (data-eng) |
| `projet-risk` | `iceberg.risk.*`, object storage `projects/risk/` | Eve (fraude), Grace (data-eng) |
Projects are cross-team: members from different business teams collaborate on the same data. Each project has:
- A directory service group (e.g., `projet-scoring`)
- A Keycloak service account for batch pipelines
- An object storage prefix (`projects/{name}/`)
- Trino schema access rules in OPA
- An OpenMetadata domain
- Column masking and row filter policies
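The naming conventions used throughout this document can be collected in one place. The helper below is a hypothetical sketch (the function name and dictionary keys are illustrative); the name patterns themselves are the ones that appear in this document.

```python
def project_resources(name: str) -> dict[str, str]:
    """Derive the per-project resource names from a short project name."""
    return {
        "group": f"projet-{name}",                       # directory service group
        "service_account": f"svc-projet-{name}",         # Keycloak client
        "k8s_secret": f"svc-projet-{name}-credentials",  # mounted by pipelines
        "storage_prefix": f"projects/{name}/",           # object storage workspace
        "trino_schema": f"iceberg.{name}",               # Iceberg schema
        "openmetadata_domain": name,                     # catalog domain
    }


scoring = project_resources("scoring")
print(scoring["group"], scoring["storage_prefix"])
```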
Storage Model -- Object Storage JWT Variables
AKKO uses runtime JWT policy evaluation in object storage -- no per-user bucket creation, no static IAM policies.
Design Principles
| Principle | Implementation |
|---|---|
| No per-user bucket creation | Industry best practice -- buckets are structural, not identity-scoped |
| Runtime JWT evaluation | Object storage evaluates `${jwt:preferred_username}` and `${jwt:groups}` at request time |
| Lazy user prefix creation | Per-user directories created on first upload, not at provisioning time |
| Zero provisioning on user creation | Adding a user to Keycloak + directory service group is sufficient |
Bucket Layout
minio/
├── analytics/ # Iceberg warehouse (managed by Polaris)
│ ├── scoring/ # projet-scoring tables
│ ├── risk/ # projet-risk tables
│ └── shared/ # Cross-project reference data
├── production/ # Service accounts only (no interactive access)
│ ├── airflow/ # DAG artifacts, logs
│ └── mlflow/ # Model artifacts
├── projects/
│ ├── scoring/ # projet-scoring workspace
│ │ ├── raw/ # Raw data uploads
│ │ ├── processed/ # Intermediate results
│ │ └── exports/ # Deliverables
│ └── risk/ # projet-risk workspace
├── shared/
│ ├── equipe-marketing/ # Team-shared files
│ ├── equipe-data-eng/ # Team-shared files
│ └── equipe-fraude/ # Team-shared files
└── users/
    ├── carol/ # Personal workspace (created on first upload)
    ├── bob/ # Personal workspace
    └── eve/ # Personal workspace
JWT Policy Variables
Object storage policies reference JWT claims directly:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::users/${jwt:preferred_username}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::shared/${jwt:groups}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::projects/${jwt:groups}/*"
      ]
    }
  ]
}
No static user management in object storage
When Carol joins projet-scoring, an admin adds her to the directory service group. At her next login, her JWT contains projet-scoring in the groups claim. Object storage evaluates this at request time -- zero object storage configuration required.
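To make the variable substitution concrete, here is a small sketch that expands `${jwt:...}` placeholders against a set of claims. It assumes a list-valued claim such as `groups` produces one candidate resource per value; how the storage engine actually handles multi-valued claims is not specified here, and `expand_resource` is a hypothetical helper.

```python
def expand_resource(pattern: str, claims: dict) -> list[str]:
    """Expand ${jwt:<claim>} placeholders; list claims yield one result per value."""
    results = [pattern]
    for claim, value in claims.items():
        placeholder = "${jwt:" + claim + "}"
        values = value if isinstance(value, list) else [value]
        expanded = []
        for resource in results:
            if placeholder in resource:
                expanded.extend(resource.replace(placeholder, str(v)) for v in values)
            else:
                expanded.append(resource)
        results = expanded
    return results


claims = {
    "preferred_username": "carol",
    "groups": ["akko-analyst", "equipe-marketing", "projet-scoring"],
}
user_arns = expand_resource("arn:aws:s3:::users/${jwt:preferred_username}/*", claims)
project_arns = expand_resource("arn:aws:s3:::projects/${jwt:groups}/*", claims)
print(user_arns)
print(project_arns)
```

Under this literal substitution the project statement resolves to `projects/projet-scoring/*`, whereas the bucket layout above uses `projects/scoring/`. The document does not say how the `projet-` prefix is reconciled, so treat that mapping as something to verify against the deployed policy.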
Service Accounts
Interactive vs Automated Access
| Aspect | Interactive (Human) | Automated (Pipeline) |
|---|---|---|
| Identity | Personal JWT (Keycloak login) | Service account (client_credentials) |
| Authentication | Browser OIDC flow | client_id + client_secret |
| Token source | Keycloak user session | K8s Secret mounted in pod |
| Resource quota | Per-role tier (4-16 GB) | Batch tier (64 GB, 20 executors) |
| Lifecycle | Tied to human employment | Tied to project lifecycle |
| Audit trail | `preferred_username` in logs | `svc-projet-scoring` in logs |
Service Account Architecture
Project: projet-scoring
|
+-- Keycloak Client: svc-projet-scoring
| ├── Grant type: client_credentials
| ├── Groups: ["projet-scoring", "svc-accounts"]
| └── Scope: project data only
|
+-- K8s Secret: svc-projet-scoring-credentials
| ├── client_id: svc-projet-scoring
| └── client_secret: <generated>
|
+-- Airflow Connection: projet_scoring_trino
| └── Uses svc-projet-scoring token
|
+-- Spark Application:
└── Mounts K8s Secret, obtains JWT at runtime
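The "obtains JWT at runtime" step is a standard OAuth 2.0 client-credentials request. The sketch below only builds that request; it assumes the Keycloak host and `akko` realm that appear in the admin URL later in this document, plus Keycloak's standard OpenID Connect token path. `build_token_request` is a hypothetical helper, and the secret shown is a placeholder for the value read from the mounted K8s Secret.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: host and realm taken from this document, path is the
# standard Keycloak OIDC token endpoint.
TOKEN_URL = "http://akko-keycloak:8080/realms/akko/protocol/openid-connect/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client_credentials token request."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_token_request("svc-projet-scoring", "placeholder-secret")
# Inside the cluster, a pipeline would send it and read the token, e.g.:
#   json.load(urllib.request.urlopen(request))["access_token"]
```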
Resource Quotas
| Quota Type | Interactive | Batch (Service Account) |
|---|---|---|
| Max memory | 8 GB | 64 GB |
| Max CPU | 4 cores | 16 cores |
| Spark executors | 4 | 20 |
| Concurrent queries | 5 | 20 |
| LiteLLM RPM | 60 | 300 |
Why Service Accounts Matter
- Pipeline continuity: When Carol leaves `projet-scoring`, her personal access is revoked. The `svc-projet-scoring` service account continues running nightly pipelines uninterrupted.
- Audit separation: Interactive queries are logged under the human username. Batch operations are logged under the service account, which keeps the audit trail clear.
- Resource isolation: Interactive users get modest quotas for exploration. Batch jobs get generous quotas for production workloads. No resource contention.
Column Masking (Starburst-Inspired)
AKKO provides 8 predefined column masks plus custom expressions, inspired by Starburst's masking model but fully managed through OPA.
Predefined Masks
| Mask Name | Expression | Example Input | Example Output | Compatible Types |
|---|---|---|---|---|
| SHA-256 | `to_hex(sha256(cast(col as varbinary)))` | `alice@akko.io` | 64-character hex digest | VARCHAR, CHAR |
| SHA-512 | `to_hex(sha512(cast(col as varbinary)))` | `alice@akko.io` | 128-character hex digest | VARCHAR, CHAR |
| MD5 | `to_hex(md5(cast(col as varbinary)))` | `alice@akko.io` | 32-character hex digest | VARCHAR, CHAR |
| NULL | `CAST(NULL AS <type>)` | `2024-01-15` | `NULL` | All types |
| Redact | `'***REDACTED***'` | `+33612345678` | `***REDACTED***` | VARCHAR, CHAR |
| First 4 | `substr(col, 1, 4) \|\| '****'` | `4242424242` | `4242****` | VARCHAR, CHAR |
| Last 4 | `'****' \|\| substr(col, -4)` | `4242424242` | `****4242` | VARCHAR, CHAR |
| Year only | `date_trunc('year', col)` | `1990-05-23` | `1990-01-01` | DATE, TIMESTAMP |
| Custom | User-defined Trino SQL expression | (varies) | (varies) | (depends on expression) |
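For readers who want to check what a mask produces without a Trino session, the following sketch mirrors four of the predefined expressions in plain Python. The function names are hypothetical, and the behavior is an approximation: for example, `to_hex` in Trino emits uppercase hex digits (hence the `.upper()`), and inputs shorter than four characters are not handled the same way as `substr`.

```python
import hashlib
from datetime import date


def mask_sha256(value: str) -> str:
    """Approximates to_hex(sha256(cast(col as varbinary))) for UTF-8 text."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest().upper()


def mask_first4(value: str) -> str:
    """Approximates substr(col, 1, 4) || '****'."""
    return value[:4] + "****"


def mask_last4(value: str) -> str:
    """Approximates '****' || substr(col, -4) for inputs of length >= 4."""
    return "****" + value[-4:]


def mask_year_only(value: date) -> date:
    """Approximates date_trunc('year', col) for DATE values."""
    return value.replace(month=1, day=1)


print(mask_first4("4242424242"))
print(mask_last4("4242424242"))
print(mask_year_only(date(1990, 5, 23)))
print(len(mask_sha256("alice@akko.io")))
```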
Mask Properties
- Reusable: A mask is defined once and applied to many columns across many tables. Changing the mask definition updates all columns that use it.
- Type-aware: The cockpit admin panel only shows compatible masks for each column type. You cannot apply `First 4` to a DATE column.
- Per-project: Masks are scoped to projects. `projet-scoring` may mask `email` with SHA-256, while `projet-risk` masks it with Redact.
- Priority: When a user belongs to multiple projects with different masks on the same column, the most restrictive mask wins.
OPA Column Masking Example
package trino

import rego.v1

# Load masking rules from OPA data
masks := data.governance.column_masks

# Returns the mask expression for the requested column, if one applies.
# The optional "identity" field is left out: in Trino's OPA integration it names
# the user the expression is evaluated as, not a label for the mask.
columnMask := {"expression": expression} if {
    # Column mask requests describe the target under resource.column
    table := input.action.resource.column.tableName
    column := input.action.resource.column.columnName
    user_groups := input.context.identity.groups

    # Find applicable mask for this column in user's projects
    some project in user_groups
    mask_rule := masks[project][table][column]

    # User is not exempt (admins see clear data)
    not "akko-admin" in user_groups

    expression := mask_rule.expression
}
Managing Masks in Cockpit
The cockpit admin panel provides a visual interface for managing column masks:
- Navigate to Administration > Governance > Column Masking
- Select a project (e.g., `projet-scoring`)
- Browse available tables and columns
- For each column, select a mask from the dropdown (filtered by column type)
- Click Apply -- the OPA policy data is updated in real time
Row-Level Security
Row-level security (RLS) restricts which rows a user can see, based on project membership.
Design Principles
| Principle | Implementation |
|---|---|
| Per-project, not per-user | Filters are defined at the project level. All project members see the same rows. |
| Visual builder | Cockpit admin panel provides a no-code filter builder |
| Type-aware operators | UI only shows compatible operators for each column type |
| AND combination | Multiple filters on the same table are combined with AND |
| User overrides | Individual exceptions can be granted for specific users |
Filter Builder
The visual builder in the cockpit admin panel:
+--------------------------------------------------------------+
| Row Filter: projet-scoring / iceberg.scoring.transactions |
| |
| Table: [transactions v] |
| Column: [region v] |
| Operator: [IN v] |
| Value: [EMEA, APAC ] |
| |
| + Add another filter |
| |
| Active filters: |
| 1. region IN ('EMEA', 'APAC') |
| 2. amount > 100 |
| Combined: region IN ('EMEA','APAC') AND amount > 100 |
+--------------------------------------------------------------+
Operators by Column Type
| Column Type | Available Operators |
|---|---|
| VARCHAR / CHAR | =, !=, IN, NOT IN, LIKE, IS NULL, IS NOT NULL |
| INTEGER / BIGINT / DOUBLE | =, !=, <, >, <=, >=, BETWEEN, IN, IS NULL |
| DATE / TIMESTAMP | =, !=, <, >, <=, >=, BETWEEN, IS NULL |
| BOOLEAN | =, IS NULL, IS NOT NULL |
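The operator table and the AND-combination rule are simple enough to express as data plus two small functions. This is an illustrative sketch of what the filter builder could validate, not the cockpit's actual code; `operator_allowed` and `combine_filters` are hypothetical names.

```python
_TEXT = {"=", "!=", "IN", "NOT IN", "LIKE", "IS NULL", "IS NOT NULL"}
_NUMERIC = {"=", "!=", "<", ">", "<=", ">=", "BETWEEN", "IN", "IS NULL"}
_TEMPORAL = {"=", "!=", "<", ">", "<=", ">=", "BETWEEN", "IS NULL"}
_BOOLEAN = {"=", "IS NULL", "IS NOT NULL"}

# Transcribed from the operator table above.
OPERATORS_BY_TYPE = {
    "VARCHAR": _TEXT, "CHAR": _TEXT,
    "INTEGER": _NUMERIC, "BIGINT": _NUMERIC, "DOUBLE": _NUMERIC,
    "DATE": _TEMPORAL, "TIMESTAMP": _TEMPORAL,
    "BOOLEAN": _BOOLEAN,
}


def operator_allowed(column_type: str, operator: str) -> bool:
    """True if the operator is offered for the given column type."""
    return operator.upper() in OPERATORS_BY_TYPE.get(column_type.upper(), set())


def combine_filters(expressions: list[str]) -> str:
    """Join several filters on the same table with AND."""
    return " AND ".join(expressions)


combined = combine_filters(["region IN ('EMEA','APAC')", "amount > 100"])
print(combined)
```

The join is deliberately plain so the result matches the "Combined" line in the builder mock-up; filters that contain `OR` would need to be wrapped in parentheses first.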
OPA Row Filter Example
package trino

import rego.v1

# Load row filter rules from OPA data
filters := data.governance.row_filters

# Each applicable project contributes one filter expression.
# The optional "identity" field is left out: in Trino's OPA integration it names
# the user the expression is evaluated as, not a label for the filter.
rowFilters contains {"expression": expression} if {
    table := input.action.resource.table.tableName
    user_groups := input.context.identity.groups

    # Find applicable filter for this table in user's projects
    some project in user_groups
    filter_rule := filters[project][table]

    # User is not exempt (admins see all rows)
    not "akko-admin" in user_groups

    expression := filter_rule.expression
}
User Overrides
In exceptional cases, individual users can be granted broader access than their project default:
{
  "user_overrides": {
    "carol": {
      "projet-scoring": {
        "transactions": {
          "override": "region IN ('EMEA', 'APAC', 'NA')",
          "reason": "Cross-region analysis approved by DPO on 2026-03-10",
          "expires": "2026-06-10"
        }
      }
    }
  }
}
Overrides are:
- Audited: Every override records who approved it and why
- Time-limited: Overrides have an expiration date
- Visible in cockpit: The admin panel shows active overrides with a warning badge
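Because overrides are time-limited, whatever consumes them has to compare `expires` with the current date. The sketch below shows one way to do that against the JSON structure shown earlier in this section; it assumes the expiry date is inclusive, which the document does not state, and `active_override` is a hypothetical helper.

```python
from datetime import date
from typing import Optional

OVERRIDES = {
    "user_overrides": {
        "carol": {
            "projet-scoring": {
                "transactions": {
                    "override": "region IN ('EMEA', 'APAC', 'NA')",
                    "reason": "Cross-region analysis approved by DPO on 2026-03-10",
                    "expires": "2026-06-10",
                }
            }
        }
    }
}


def active_override(data: dict, user: str, project: str, table: str,
                    today: date) -> Optional[str]:
    """Return the override expression if one exists and has not expired."""
    entry = (
        data.get("user_overrides", {})
        .get(user, {})
        .get(project, {})
        .get(table)
    )
    if entry is None:
        return None
    # Assumption: the override is still valid on its expiry date.
    if today > date.fromisoformat(entry["expires"]):
        return None
    return entry["override"]


print(active_override(OVERRIDES, "carol", "projet-scoring", "transactions", date(2026, 4, 1)))
```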
OPA Policy Model
OPA is the central policy engine for all data governance decisions in AKKO. It evaluates Trino access, column masking, and row filtering at runtime using data from three layers.
Three Data Layers
+------------------------------------------------------------------+
| OPA Data Store |
| |
| Layer 1: group_policies (primary access rules) |
| ├── projet-scoring |
| │ ├── schemas: ["scoring"] |
| │ ├── column_masks: {transactions.email: "sha256", ...} |
| │ └── row_filters: {transactions: "region IN ('EMEA')"} |
| └── projet-risk |
| ├── schemas: ["risk"] |
| ├── column_masks: {customers.ssn: "redact", ...} |
| └── row_filters: {customers: "risk_score > 50"} |
| |
| Layer 2: user_overrides (individual exceptions) |
| └── carol |
| └── projet-scoring.transactions: expanded region filter |
| |
| Layer 3: service_accounts (batch pipeline rules) |
| └── svc-projet-scoring |
| ├── schemas: ["scoring"] |
| └── column_masks: {} (no masking for ETL) |
+------------------------------------------------------------------+
Scope Merging Rules
When a user belongs to multiple projects, schema and table access is the UNION of all project scopes, while column masks and row filters combine restrictively:
| Rule | Behavior | Example |
|---|---|---|
| Schema access | UNION | Carol in projet-scoring + projet-risk sees both scoring.* and risk.* schemas |
| Column masking | Most restrictive wins | If projet-scoring masks email with SHA-256 and projet-risk masks it with Redact, Carol sees Redact |
| Row filters | AND combination | If both projects filter transactions, both filters apply |
| Table access | UNION | Carol sees tables from both projects |
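The merging rules can be sketched as a single fold over the user's projects. The ranking in `MASK_RANK` is purely hypothetical: this document says the most restrictive mask wins but does not define the ordering, so the numbers below are an assumption for illustration. Filters are parenthesized before joining so that an expression containing `OR` cannot change the meaning of the combination.

```python
# Hypothetical ranking (higher = more restrictive); not defined by this document.
MASK_RANK = {"sha256": 1, "redact": 2, "null": 3}

POLICIES = {
    "projet-scoring": {
        "schemas": ["scoring"],
        "column_masks": {"transactions.email": "sha256"},
        "row_filters": {"transactions": "region IN ('EMEA')"},
    },
    "projet-risk": {
        "schemas": ["risk"],
        "column_masks": {"transactions.email": "redact"},
        "row_filters": {"transactions": "risk_score > 50"},
    },
}


def merge_scopes(policies: dict, projects: list[str]) -> dict:
    """Union schemas, keep the most restrictive mask, AND the row filters."""
    schemas: set[str] = set()
    masks: dict[str, str] = {}
    filters: dict[str, list[str]] = {}
    for project in projects:
        policy = policies.get(project, {})
        schemas.update(policy.get("schemas", []))
        for column, mask in policy.get("column_masks", {}).items():
            current = masks.get(column)
            if current is None or MASK_RANK[mask] > MASK_RANK[current]:
                masks[column] = mask
        for table, expression in policy.get("row_filters", {}).items():
            filters.setdefault(table, []).append(expression)
    return {
        "schemas": sorted(schemas),
        "column_masks": masks,
        "row_filters": {
            table: " AND ".join(f"({e})" for e in exprs)
            for table, exprs in filters.items()
        },
    }


merged = merge_scopes(POLICIES, ["projet-scoring", "projet-risk"])
print(merged)
```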
Evaluation Flow
User Query: SELECT email FROM scoring.transactions WHERE amount > 1000
1. JWT Extraction
├── user: carol
└── groups: [akko-analyst, equipe-marketing, projet-scoring]
2. Authorization Check (allow/deny)
├── Platform role: akko-analyst --> SELECT allowed
└── Project scope: projet-scoring --> scoring.* allowed
Result: ALLOWED
3. Column Masking
├── Column: email
├── Project mask: projet-scoring.transactions.email = SHA-256
└── Role exemption: akko-analyst is NOT exempt
Result: to_hex(sha256(cast(email as varbinary)))
4. Row Filtering
├── Table: scoring.transactions
├── Project filter: region IN ('EMEA', 'APAC')
└── User override: none
Result: WHERE region IN ('EMEA','APAC') AND amount > 1000
5. Final Query (rewritten by Trino):
SELECT to_hex(sha256(cast(email as varbinary))) AS email
FROM scoring.transactions
WHERE region IN ('EMEA','APAC') AND amount > 1000
Project Lifecycle
Create a Project
When an administrator creates a new project from the cockpit admin panel, the following resources are provisioned automatically:
| Step | Action | Service |
|---|---|---|
| 1 | Create directory service group `projet-{name}` | directory service |
| 2 | Sync group to Keycloak | Keycloak (LDAP federation) |
| 3 | Create service account `svc-projet-{name}` | Keycloak (client_credentials) |
| 4 | Store client credentials as K8s Secret | Kubernetes |
| 5 | Create object storage prefix `projects/{name}/` | object storage |
| 6 | Create Trino schema `iceberg.{name}` | Trino (via admin query) |
| 7 | Register OPA policy data for the project | OPA |
| 8 | Create OpenMetadata domain | OpenMetadata |
# Example: cockpit triggers this sequence via API calls
# 1. Create directory service group
curl -X POST http://akko-lldap:17170/api/graphql \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"mutation { createGroup(name: \"projet-scoring\") { id } }"}'
# 2. Keycloak syncs automatically via LDAP federation
# 3. Create service account (JSON bodies need an explicit Content-Type)
curl -X POST http://akko-keycloak:8080/admin/realms/akko/clients \
  -H "Authorization: Bearer $KC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"clientId":"svc-projet-scoring","serviceAccountsEnabled":true,...}'
# 4. Store in K8s (quote the variable so special characters survive)
kubectl create secret generic svc-projet-scoring-credentials \
  --from-literal=client_id=svc-projet-scoring \
  --from-literal=client_secret="$GENERATED_SECRET"
# 5. Create object storage prefix (lazy -- created on first write)
# 6. Create Trino schema
trino --execute "CREATE SCHEMA IF NOT EXISTS iceberg.scoring"
# 7. Register OPA policy data
curl -X PUT http://akko-opa:8181/v1/data/governance/group_policies/projet-scoring \
  -H "Content-Type: application/json" \
  -d '{"schemas":["scoring"],"column_masks":{},"row_filters":{}}'
# 8. Create OpenMetadata domain
curl -X POST http://openmetadata:8585/api/v1/domains \
  -H "Content-Type: application/json" \
  -d '{"name":"scoring","description":"Scoring project domain"}'
Add a Member
- Admin adds user to the directory service group `projet-{name}`
- At the user's next login, the JWT includes `projet-{name}` in the `groups` claim
- OPA evaluates the new group membership at query time
- Object storage grants access to `projects/{name}/` based on JWT groups
- No restart of any service required
Remove a Member
- Admin removes user from the directory service group
- At the user's next login, the JWT no longer includes the project group
- OPA denies access to project schemas and applies no project-specific masks
- Object storage denies access to the project prefix
- Pipelines are unaffected -- the service account `svc-projet-{name}` continues running
Token expiry delay
Existing JWT tokens remain valid until they expire (default: 5 minutes). For immediate revocation, force-logout the user via the Keycloak Admin API.
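To see how long an already-issued token would keep working after a membership change, one can read its `exp` claim. The sketch below decodes the payload segment without verifying the signature, which is fine for inspection but never for authorization decisions; `jwt_payload` and `seconds_remaining` are hypothetical helpers.

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    segment = token.split(".")[1]
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_remaining(token: str, now: Optional[float] = None) -> float:
    """Seconds until the token's `exp` claim, or 0 if it has already expired."""
    current = time.time() if now is None else now
    return max(0.0, jwt_payload(token)["exp"] - current)
```

With the 5-minute default mentioned above, a value close to 300 right after login is the worst-case window during which a removed member can still reach project data, unless the session is force-logged-out.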
Archive a Project
| Step | Action |
|---|---|
| 1 | Set Trino schemas to read-only in OPA (remove INSERT/DELETE) |
| 2 | Set object storage prefix to read-only (remove PutObject) |
| 3 | Disable service account in Keycloak |
| 4 | Mark OpenMetadata domain as archived |
| 5 | Retain data for compliance (configurable retention period) |
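Step 2 amounts to stripping write actions from the project's storage policy. A minimal sketch, assuming the policy has the same shape as the JWT policy shown earlier: the table only mentions `PutObject`, so also removing `s3:DeleteObject` is an assumption here, and `to_read_only` is a hypothetical helper.

```python
import copy

# Assumption: deletes are treated as writes alongside PutObject.
WRITE_ACTIONS = {"s3:PutObject", "s3:DeleteObject"}


def to_read_only(policy: dict) -> dict:
    """Return a copy of the policy with write actions removed."""
    archived = copy.deepcopy(policy)
    kept = []
    for statement in archived["Statement"]:
        statement["Action"] = [a for a in statement["Action"] if a not in WRITE_ACTIONS]
        if statement["Action"]:  # drop statements that no longer allow anything
            kept.append(statement)
    archived["Statement"] = kept
    return archived


project_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::projects/scoring/*"],
        }
    ],
}
archived_policy = to_read_only(project_policy)
print(archived_policy["Statement"][0]["Action"])
```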
Enterprise Patterns
Industry Research
AKKO's governance model draws on a review of leading enterprise analytics stacks:
| Platform | Approach | AKKO Equivalent |
|---|---|---|
| Databricks Unity Catalog | Metastore > Catalog > Schema > Table hierarchy | Project > Schema > Table with OPA enforcement |
| Snowflake | Account > Database > Schema with role hierarchy | Platform Role + Project with JWT-based scoping |
| Starburst | Built-in access control with column masking and row filtering | OPA policies with 8 predefined masks |
| Cloudera | Ranger policies with tag-based access control | OPA data layers with project-scoped tags |
Industry Consensus
The data governance industry has converged on several best practices that AKKO implements:
| Best Practice | AKKO Implementation |
|---|---|
| Runtime policy evaluation (not pre-provisioning) | OPA evaluates JWT groups at query time |
| Separation of identity and authorization | Keycloak (identity) + OPA (authorization) |
| Service accounts for automation | Keycloak client_credentials + K8s Secrets |
| Column masking as reusable objects | 8 predefined masks, applied declaratively |
| Row-level security per data scope | Per-project row filters, AND-combined |
| Centralized audit trail | All access decisions logged in OPA decision logs |
AKKO Differentiator
Single JWT controls all services. In competing platforms, each service has its own access control system. In AKKO, one Keycloak JWT token carries platform role, team membership, and project scope -- and every service respects it:
Single JWT Token
|
+----------------+------------------+
| | |
v v v
+--------+ +-----------+ +-------------+
| Trino | | object storage | | JupyterHub |
| OPA | | JWT eval | | Spawner env |
+--------+ +-----------+ +-------------+
| | |
v v v
+--------+ +-----------+ +-------------+
|LiteLLM | | Spark | | OpenMeta |
| API key | | Submit | | Domain |
| by role | | as svc | | visibility |
+--------+ +-----------+ +-------------+
No other open-source platform achieves this level of cross-service governance with a single identity token.
Summary
| Dimension | Source | JWT Claim | Controls |
|---|---|---|---|
| Platform Role | Keycloak realm role | `groups: ["akko-analyst"]` | Tool access, resource tier, AI model access |
| Business Team | directory service group | `groups: ["equipe-marketing"]` | Shared team storage in object storage |
| Project | directory service group | `groups: ["projet-scoring"]` | Data access (schemas, tables, rows, columns) |
| Component | Role in Governance |
|---|---|
| Keycloak | Identity provider, JWT issuer, service account manager |
| directory service | Group management (teams and projects) |
| OPA | Policy engine (authorization, masking, filtering) |
| object storage | Storage with runtime JWT policy evaluation |
| Trino | Query engine with OPA-delegated access control |
| OpenMetadata | Data catalog with domain-scoped visibility |
| Cockpit | Admin UI for project/mask/filter management |
Related Documentation
- RBAC -- Role-Based Access Control -- Detailed role definitions, OPA policies, and user management
- OPA Service -- OPA deployment and configuration
- Keycloak Service -- Keycloak realm configuration
- Object storage -- S3-compatible storage layer configuration