OpenDPI Specification
The open standard for describing data product interfaces — what it is, how it's structured, and why it exists.
What is OpenDPI?
OpenDPI (Open Data Product Interface) is an open, vendor-neutral specification for describing how data products expose their interfaces. It is to data products what OpenAPI is to REST APIs: a machine-readable contract that tools, teams, and AI agents can reason about without needing live access to your warehouse.
An OpenDPI document answers three questions:
- What data does this product expose? — ports define named interfaces (tables, topics, files, API endpoints)
- Where does the data live? — connections declare the infrastructure endpoints
- What shape is the data? — schemas describe the structure using JSON Schema
Document structure
Every OpenDPI document is a YAML or JSON file with an opendpi version field followed by four top-level sections (info, connections, ports, and the optional components):
opendpi: "1.0.0"  # Spec version, always "1.0.0" for now
info:  # Product metadata
  title: string  # Required
  version: string  # Required, semantic versioning
  description: string
connections:  # Infrastructure endpoints
  <name>:
    type: string
    host: string
    description: string
    variables: {}
ports:  # Data interfaces
  <name>:
    description: string
    connections:
      - connection: "#/connections/<name>"
        location: string
    schema: {}  # Inline JSON Schema or $ref
components:  # Optional reusable schemas
  schemas:
    <name>: {}
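The layout above can be checked mechanically. The following is a minimal sketch, assuming the document has already been parsed into a Python dict (for example with `json.load` for the JSON form). `check_structure` is a hypothetical helper, not part of OpenDPI or Daco, and it only checks the fields the outline marks as required.

```python
def check_structure(doc: dict) -> list:
    """Return human-readable problems with the top-level layout."""
    problems = []

    # The spec version is always "1.0.0" for now.
    if doc.get("opendpi") != "1.0.0":
        problems.append('opendpi must be "1.0.0"')

    # info.title and info.version are the two fields marked Required.
    info = doc.get("info")
    if not isinstance(info, dict):
        problems.append("info section is missing")
    else:
        for field in ("title", "version"):
            value = info.get(field)
            if not isinstance(value, str) or not value:
                problems.append(f"info.{field} is required")

    # Assumption: when present, these sections are mappings keyed by name.
    for section in ("connections", "ports", "components"):
        if section in doc and not isinstance(doc[section], dict):
            problems.append(f"{section} must be a mapping")

    return problems


doc = {"opendpi": "1.0.0", "info": {"title": "Customer Analytics"}}
print(check_structure(doc))  # ['info.version is required']
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a document in a single pass.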
info
The info section identifies the product and its current version. The version follows semantic versioning and is managed with daco product upgrade.
info:
  title: Customer Analytics
  version: "2.1.0"
  description: Aggregated metrics for the customer lifecycle team.
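Because version is stored as a string, comparing two versions as plain text gives the wrong order once a component reaches two digits. The sketch below parses only the numeric MAJOR.MINOR.PATCH core and ignores pre-release and build suffixes. `parse_version` is a hypothetical helper and says nothing about how daco product upgrade works internally.

```python
import re

# Core MAJOR.MINOR.PATCH only; pre-release and build suffixes are not handled.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_version(text: str) -> tuple:
    """Split "2.1.0" into (2, 1, 0), or raise ValueError."""
    match = SEMVER_CORE.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


print(parse_version("2.1.0"))                            # (2, 1, 0)
print(parse_version("2.10.0") > parse_version("2.9.0"))  # True
print("2.10.0" > "2.9.0")                                # False (text comparison)
```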
connections
Each connection represents an infrastructure endpoint where data can be found. The type is a free-form string — it exists for human readability and downstream tooling, not for validation.
connections:
  analytics_db:
    type: postgresql
    host: analytics.db.example.com
    description: Production analytics warehouse
    variables:
      database: analytics
      schema: public
  events_kafka:
    type: kafka
    host: kafka.internal:9092
The variables map holds any additional configuration your downstream consumers need — database names, schemas, buckets, prefixes. Values are always strings.
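The strings-only rule is easy to break in YAML, where an unquoted scalar such as 5432 or true is parsed as a number or boolean. A minimal sketch of a check over the parsed connections section follows. `non_string_variables` is a hypothetical helper, and the `port` variable in the sample data is invented for illustration.

```python
def non_string_variables(connections: dict) -> list:
    """Return (connection, key, value) triples whose value is not a string."""
    offenders = []
    for name, connection in connections.items():
        for key, value in (connection.get("variables") or {}).items():
            if not isinstance(value, str):
                offenders.append((name, key, value))
    return offenders


connections = {
    "analytics_db": {
        "type": "postgresql",
        "host": "analytics.db.example.com",
        # Hypothetical variable: unquoted `port: 5432` in YAML parses as an integer.
        "variables": {"database": "analytics", "port": 5432},
    },
    # A connection without a variables map is fine.
    "events_kafka": {"type": "kafka", "host": "kafka.internal:9092"},
}
print(non_string_variables(connections))  # [('analytics_db', 'port', 5432)]
```

Quoting the value in the source file (`port: "5432"`) is enough to satisfy the check.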
ports
A port is a named data interface your product exposes. It references one or more connections and carries a JSON Schema describing the data's shape.
ports:
  daily_metrics:
    description: Daily aggregated customer metrics
    connections:
      - connection: "#/connections/analytics_db"
        location: customer_daily_metrics
    schema:
      type: object
      properties:
        customer_id: { type: string }
        date: { type: string, format: date }
        total_orders: { type: integer }
      required: [customer_id, date, total_orders]
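A consumer typically needs to follow each "#/connections/<name>" reference from a port back to the connection it names. The sketch below does that on a parsed document. `port_endpoints` is a hypothetical helper, and it assumes every reference uses the "#/connections/" prefix shown in the example.

```python
PREFIX = "#/connections/"


def port_endpoints(doc: dict, port_name: str) -> list:
    """Return (type, host, location) for each connection a port references."""
    endpoints = []
    for entry in doc["ports"][port_name].get("connections", []):
        ref = entry["connection"]
        if not ref.startswith(PREFIX):
            raise ValueError(f"unsupported connection reference: {ref}")
        name = ref[len(PREFIX):]
        if name not in doc.get("connections", {}):
            raise KeyError(f"port {port_name!r} references unknown connection {name!r}")
        connection = doc["connections"][name]
        endpoints.append(
            (connection.get("type"), connection.get("host"), entry.get("location"))
        )
    return endpoints


doc = {
    "connections": {
        "analytics_db": {"type": "postgresql", "host": "analytics.db.example.com"}
    },
    "ports": {
        "daily_metrics": {
            "connections": [
                {
                    "connection": "#/connections/analytics_db",
                    "location": "customer_daily_metrics",
                }
            ]
        }
    },
}
print(port_endpoints(doc, "daily_metrics"))
# [('postgresql', 'analytics.db.example.com', 'customer_daily_metrics')]
```

Raising on a dangling reference, rather than skipping it, surfaces a renamed or deleted connection early.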
Location syntax depends on the connection type:
| Connection type | Location format | Example |
|---|---|---|
| postgresql / mysql | schema.table or table | public.orders |
| kafka | topic name | customer-events |
| s3 | bucket/path/ | analytics/metrics/ |
| http | path | /api/v1/customers |
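The formats in the table can be turned into a small parser. This is a sketch only: `split_location` and the key names it returns are hypothetical, and it covers just the four connection types listed above. When a postgresql or mysql location omits the schema, the sketch reports it as None rather than guessing where a default would come from.

```python
def split_location(connection_type: str, location: str) -> dict:
    """Break a location string into parts, following the table above."""
    if connection_type in ("postgresql", "mysql"):
        # "schema.table" or just "table"
        schema, dot, table = location.rpartition(".")
        return {"schema": schema if dot else None, "table": table}
    if connection_type == "kafka":
        return {"topic": location}
    if connection_type == "s3":
        # The first path segment is the bucket, the rest is the key prefix.
        bucket, _, prefix = location.partition("/")
        return {"bucket": bucket, "prefix": prefix}
    if connection_type == "http":
        return {"path": location}
    # type is a free-form string, so unknown types pass through untouched.
    return {"location": location}


print(split_location("postgresql", "public.orders"))
# {'schema': 'public', 'table': 'orders'}
print(split_location("postgresql", "orders"))
# {'schema': None, 'table': 'orders'}
print(split_location("s3", "analytics/metrics/"))
# {'bucket': 'analytics', 'prefix': 'metrics/'}
```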
Schemas can also reference a separate file to keep the spec readable:
ports:
  users:
    schema:
      $ref: schemas/users.schema.yaml
components
Reusable schema definitions go under components.schemas. Reference them from any port with $ref:
components:
  schemas:
    Address:
      type: object
      properties:
        street: { type: string }
        city: { type: string }
        country: { type: string }
ports:
  customers:
    schema:
      type: object
      properties:
        id: { type: string }
        address:
          $ref: "#/components/schemas/Address"
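To see what a port schema looks like once its component references are expanded, the references can be inlined. The sketch below is a simplified illustration, not how Daco resolves references: `lookup` and `inline_refs` are hypothetical helpers, file references such as schemas/users.schema.yaml are left untouched, keywords that sit next to a $ref are dropped, JSON Pointer escape sequences are not handled, and circular references would recurse forever.

```python
import copy


def lookup(doc: dict, ref: str):
    """Follow a local reference such as "#/components/schemas/Address"."""
    node = doc
    for segment in ref[2:].split("/"):  # strip the leading "#/"
        node = node[segment]
    return node


def inline_refs(node, doc: dict):
    """Return a copy of node with local $ref entries replaced by their targets."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            # Copy the target so the shared component is never mutated.
            return inline_refs(copy.deepcopy(lookup(doc, ref)), doc)
        return {key: inline_refs(value, doc) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, doc) for item in node]
    return node


doc = {
    "components": {
        "schemas": {
            "Address": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            }
        }
    },
    "ports": {
        "customers": {
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "address": {"$ref": "#/components/schemas/Address"},
                },
            }
        }
    },
}
resolved = inline_refs(doc["ports"]["customers"]["schema"], doc)
print(resolved["properties"]["address"]["properties"]["city"])  # {'type': 'string'}
```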
Schemas
Daco uses JSON Schema Draft 2020-12 for all schema definitions. The translator understands the following groups of keywords:
- Scalar types: string, integer, number, boolean, null
- Composite: object with properties, required, additionalProperties
- Arrays: array with items
- Composition: allOf, anyOf, oneOf
- Constraints: enum, pattern, minimum, maximum, minLength, maxLength
- Definitions: $defs and $ref
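To make a few of these keywords concrete, here is a deliberately tiny checker covering only type, required, properties, items, and enum. It is an illustration of what the keywords mean, not the Daco translator and not a conforming JSON Schema validator. `type_matches` and `validate` are hypothetical names, and keywords such as format, pattern, and allOf are ignored. For real validation, a maintained Draft 2020-12 implementation is the better choice.

```python
def type_matches(value, type_name: str) -> bool:
    """Check a Python value against one JSON Schema type name."""
    if type_name == "null":
        return value is None
    if type_name == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool is a subclass of int in Python; keep them apart
    if type_name == "integer":
        return isinstance(value, int) or (
            isinstance(value, float) and value.is_integer()
        )
    if type_name == "number":
        return isinstance(value, (int, float))
    checks = {"string": str, "array": list, "object": dict}
    return type_name in checks and isinstance(value, checks[type_name])


def validate(value, schema: dict, path: str = "$") -> list:
    """Return a list of error strings; an empty list means the value passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        names = [expected] if isinstance(expected, str) else expected
        if not any(type_matches(value, name) for name in names):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} is not one of {schema['enum']}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], subschema, f"{path}.{key}"))
    if isinstance(value, list) and isinstance(schema.get("items"), dict):
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "total_orders": {"type": "integer"},
    },
    "required": ["customer_id", "date", "total_orders"],
}
for error in validate({"customer_id": "c-1", "total_orders": "3"}, schema):
    print(error)
# $: missing required property 'date'
# $.total_orders: expected integer, got str
```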
Full example
opendpi: "1.0.0"
info:
  title: Order Pipeline
  version: "1.3.0"
connections:
  warehouse:
    type: postgresql
    host: dwh.example.com
    variables:
      database: production
      schema: orders
ports:
  raw_orders:
    description: Raw order events from the ingestion layer
    connections:
      - connection: "#/connections/warehouse"
        location: raw_orders
    schema:
      $ref: schemas/raw_orders.schema.yaml
  order_summary:
    description: Aggregated daily order totals
    connections:
      - connection: "#/connections/warehouse"
        location: order_summary
    schema:
      type: object
      properties:
        date: { type: string, format: date }
        total_revenue: { type: number }
        order_count: { type: integer }
      required: [date, total_revenue, order_count]