Why Your Data Catalog Should Live Next to Your Code
Most data catalogs feel like they were built for a sales demo, not for the people actually working with data.
They promise a single source of truth. What you get is a stale portal that nobody trusts, maintained by a team that can't keep up with the pace of engineering.
The Problem with Traditional Data Catalogs
Traditional solutions suffer from four compounding problems.
Constant maintenance. Catalogs drift from reality despite automation attempts. Stale descriptions and broken lineage erode trust until the tool becomes shelfware.
Slow onboarding. Implementation requires weeks of configuration and connecting data sources before anyone sees value.
Vendor lock-in. Metadata gets trapped in proprietary platforms. When you want to switch tools — or when the vendor raises prices — you lose everything.
Late feedback loops. Stakeholders only review completed production work. Catching a naming inconsistency after a pipeline ships costs ten times more than catching it in a pull request.
A Different Approach: Code-First
What if your data catalog lived in the same repository as your code?
With Daco, you define data products using OpenDPI format alongside your pipelines. The Daco CLI enforces consistency across schemas. Definitions are version-controlled, reviewable in pull requests, and always in sync with what's actually running.
opendpi: "1.0.0"
info:
title: Customer Analytics
version: "2.1.0"
ports:
daily_metrics:
description: "Daily aggregated customer metrics"
schema:
type: object
properties:
customer_id: { type: string }
date: { type: string, format: date }
revenue: { type: number }
Enter Daco Studio
Daco Studio reads your repository and builds a live catalog automatically. Three steps:
- Connect your repository
- Studio reads your
opendpi.yamlfiles - Non-technical users get a searchable discovery interface — no manual entry required
When you push a change, the catalog updates. No synchronization scripts. No maintenance burden.
Why This Matters
A code-adjacent catalog gives you:
- Accuracy — reflects the actual state of your repository, not what someone remembered to document
- Speed — minutes to deploy, not weeks
- Early feedback — stakeholders review definitions in PRs, before anything ships to production
- Portability — OpenDPI is open and vendor-neutral; your metadata is yours
Built for Real Teams
The teams we work with don't want another tool to maintain. They want catalog updates to follow code changes automatically, and they want new hires to understand the data estate without asking ten people.
That's what we built.
Try Daco Studio or read the docs to get started.