OpenDPI: A Standard for Data Product Interfaces
By Daco Team
OpenAPI changed how we build and consume REST APIs. Before it, integrating with an API meant reading through pages of documentation, hoping it was up to date, and writing boilerplate code by hand. OpenAPI made APIs machine-readable, and that unlocked an entire ecosystem: code generators, validators, documentation tools, and SDK builders.
Data products deserve the same. That is why we built OpenDPI.
Why Another Standard?
We are not the first to recognize that data products need better interfaces. There have been other initiatives in this space, and we studied them carefully. What we found was that many of these efforts try to solve everything at once: governance, lineage, quality, access control, and metadata all bundled together.
The result is specifications that are complex, hard to adopt, and difficult to build tooling around.
We took a different approach. OpenDPI focuses on one thing: describing what a data product exposes and how to connect to it. Nothing more. This constraint is intentional. A simple, focused standard is easier to implement, easier to validate, and easier to build tools for.
Tooling First
A standard without tooling is just documentation. We designed OpenDPI with tooling in mind from day one.
Because OpenDPI documents are machine-readable and follow a strict schema, you can:
- Generate code from data product definitions
- Validate that your data matches its contract
- Build catalogs that automatically discover and index data products
The standard exists to enable tooling. That is the real value.
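To make the "validate that your data matches its contract" point concrete, here is a minimal sketch in Python. It is not the Daco CLI or any official OpenDPI tooling: `validate_record` and `JSON_TYPES` are hypothetical names, and the check covers only the `type` and `properties` keywords of JSON Schema. Real tooling would likely lean on a complete JSON Schema validator.

```python
# Minimal contract check, covering only a few JSON Schema keywords.
# Hypothetical helper; not part of the Daco CLI or the OpenDPI specification.

# Map JSON Schema type names to the Python types that satisfy them.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
}


def validate_record(record, schema):
    """Return a list of error messages; an empty list means the record conforms."""
    if schema.get("type") == "object" and not isinstance(record, dict):
        return ["record is not an object"]
    errors = []
    for name, prop in schema.get("properties", {}).items():
        if name not in record:
            continue  # properties are optional unless listed under "required"
        value = record[name]
        expected = JSON_TYPES.get(prop.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and prop.get("type") in ("integer", "number")
        if expected is not None and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"{name}: expected {prop['type']}, got {type(value).__name__}")
    return errors


schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "total_orders": {"type": "integer"},
        "revenue": {"type": "number"},
    },
}

good = {"customer_id": "c-42", "total_orders": 3, "revenue": 99.5}
bad = {"customer_id": "c-42", "total_orders": "three", "revenue": 99.5}

print(validate_record(good, schema))  # []
print(validate_record(bad, schema))   # ['total_orders: expected integer, got str']
```

Because the schema is plain data, the same check works for any port without port-specific code.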
That is why we built the Daco CLI. It handles project scaffolding, connection management, port definitions, and schema translation to over 12 formats including Avro, Protobuf, PySpark, Pydantic, and Go types.
Install it with Homebrew:
    brew install dacolabs/tap/daco
or with Scoop:
    scoop bucket add dacolabs https://github.com/dacolabs/scoop-bucket.git
    scoop install daco
Then initialize your first data product:
    daco init
Check out the Daco CLI documentation for more.
What Does It Look Like?
Here is a simple OpenDPI document describing a customer analytics data product:
    opendpi: "1.0.0"
    info:
      title: Customer Analytics
      version: "2.1.0"
      description: Aggregated customer behavior metrics
    connections:
      analytics_db:
        type: postgresql
        host: analytics.db.example.com
        variables:
          database: analytics
          schema: public
    ports:
      daily_metrics:
        description: Daily aggregated customer metrics
        connections:
          - connection: "#/connections/analytics_db"
            location: customer_daily_metrics
        schema:
          type: object
          properties:
            customer_id:
              type: string
            date:
              type: string
              format: date
            total_orders:
              type: integer
            revenue:
              type: number
Three sections, and you have a complete description of what this data product exposes:
- Connections define where the data lives. They can point at any infrastructure: PostgreSQL, Kafka, S3, BigQuery, or your own custom system.
- Ports define what data is exposed. Each port names the connection it is served from, the location of the data within that connection, and a JSON Schema describing the shape of the data.
- Info provides metadata about the product itself.
That is it. Simple, focused, and ready for tooling to build on top of.
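As an illustration of how tooling might walk such a document, the sketch below resolves a port to the connection it points at. It makes two assumptions that the example does not confirm: that the YAML has already been parsed into plain Python dicts, and that a reference like "#/connections/analytics_db" is a path from the document root. The helpers `resolve_ref` and `port_targets` are hypothetical and not part of the Daco CLI.

```python
# A trimmed copy of the example document, as it might look after YAML parsing.
doc = {
    "opendpi": "1.0.0",
    "connections": {
        "analytics_db": {"type": "postgresql", "host": "analytics.db.example.com"},
    },
    "ports": {
        "daily_metrics": {
            "connections": [
                {
                    "connection": "#/connections/analytics_db",
                    "location": "customer_daily_metrics",
                },
            ],
        },
    },
}


def resolve_ref(doc, ref):
    """Follow a local reference such as '#/connections/analytics_db' (assumed semantics)."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled in this sketch: {ref}")
    node = doc
    for key in ref[2:].split("/"):
        node = node[key]  # raises KeyError if the reference dangles
    return node


def port_targets(doc, port_name):
    """Yield (connection_type, host, location) for each connection entry of a port."""
    for binding in doc["ports"][port_name]["connections"]:
        conn = resolve_ref(doc, binding["connection"])
        yield conn["type"], conn["host"], binding["location"]


print(list(port_targets(doc, "daily_metrics")))
# [('postgresql', 'analytics.db.example.com', 'customer_daily_metrics')]
```

A catalog or code generator could start from the same traversal: list the ports, resolve each connection, and read the schema next to it.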
Get Started
OpenDPI is in its early stages and we are building in the open. We would love your feedback and contributions:
- The Specification: Explore OpenDPI on GitHub.
- The CLI: Get the Daco CLI to start building.
- Contribute: Open an issue, suggest improvements, or help shape the standard.
Visit dacolabs.com to explore the OpenDPI specification, try the Daco CLI, and join our community.