Building Intelligence Week 7: the inventory schema stands up as a real Postgres database on Neon. Plus a Rabbit Hole on OpenClaw and the due-diligence questions an introduction isn't responsible for answering.
A note that came to the HootClub directly: Clayton Custer of gener8tor reached out to invite our community to the gALPHA program kicking off at WCTC's Applied AI Lab. I'm passing it along because it's a genuinely good fit for the people who read this — and because the registration window is short.
Register by Saturday, May 30 — the program begins Monday, June 1.
gALPHA is a free, four-week venture-creation workshop run by gener8tor (a nationally ranked startup accelerator) and sponsored by WCTC. The Applied AI Lab version takes a "back to basics" approach to building a business, but with AI woven through every stage — using generative tools to rethink the business model canvas, validate ideas, prototype with no-code workflow tools, and pitch. No prior AI experience is required, and it's open to anyone in the Greater Waukesha area, not just WCTC students.
What the four weeks cover:
The format is a weekly Monday Lunch & Learn (streamable over Zoom), Wednesday-evening group collaboration sessions, and flexible one-on-one coaching. Throughout, teams take on an AI Workflow Challenge — build an AI-enabled workflow that improves your business, pitch it at the showcase, and the best one wins six months (or $300) of a free AI service, sponsored by gener8tor.
If you've been sitting on an idea, this is a low-cost, high-support way to actually move on it — and a good chance to meet other technologists in the area. I'd encourage you to take a look.
Apply: gener8tor.com/galpha/applied-ai-lab · Contact: Clayton Custer, Program Manager, clayton.custer@gener8tor.com
Last week I drew the schema — six core entities, the junction tables between them, the rules each one encodes. It was a plan. A diagram. Something that existed in dbdiagram.io and in my head, but nowhere a row of real data could ever land. This week it became real. The schema is now a Postgres database, the tables exist, and they are sitting there empty, waiting for Milwaukee.
That's the whole milestone, and I want to be honest that it's a small one on purpose. I didn't load any data. I didn't pick a model. I didn't write a single line of the code that will eventually fill these tables. I translated the design into a real structure and confirmed it holds together — the order of operations this series keeps insisting on: schema before code before content. The structure exists now. Next week it starts to fill.
But standing it up surfaced a question I didn't expect, and it's the more interesting story. The database lives on Neon — managed Postgres that my hardais.com site already connects to. When I went to create it, I found I already had a Neon project there: hardais-catalog, the database behind a LLM catalog the site is intending to offer, where visitors search and sort information about language models. That stopped me. Because the thing I'm building — a searchable, sortable inventory of entities with relationships and provenance — is also a catalog. And the site already has a lab where you can ask a question of a model that uses vector-based retrieval versus one that doesn't, which is almost exactly what this whole series is demonstrating. (The components themselves — Neon, Postgres, the migration tools, pgvector — get the full what-and-why treatment in Under the Hood this week, if you want to go a level deeper.)
So the question wasn't "where do I put this." It was something bigger: is this build a separate thing, or is it the prototype for how all of my catalogs should eventually work? The LLM catalog, the retrieval lab, the visitor chat that will someday want a real knowledge base — they're all variations on the same shape this Milwaukee inventory is taking. I'm not answering that yet. I built the inventory in its own clean, separate Neon project for now, which keeps the work isolated and keeps every door open. But I'm flagging it out loud, because it's the kind of realization that only shows up once you stop drawing and start building.
Next week is the one I've been pointing at for a while: getting the data. The structure is built. Now comes the hard, honest part — filling it with real Milwaukee tech organizations, events, and people, with sources attached and the right level of verification for each. That's where this stops being architecture and starts being an answer to the question that started it all.
Last week's Under the Hood described a stack I was going to use: Postgres running in a Docker container on my machine, verified with a tool called DBeaver. When I actually sat down to build it, I used neither. That's not a reversal — it's what happens when an abstract plan meets a real environment, and the reasoning behind each cut is worth more than the original plan was. So here's the stack I actually built on, what each piece is, and why two of last week's pieces turned out to be solving problems I don't have.
Postgres, and where it actually lives
The foundation is PostgreSQL — the relational database the whole design assumes. That part didn't change. What changed is where it runs, and there's a common misconception buried in here worth clearing up. I'd been thinking of the database as "hosted on Vercel," because hardais.com runs on Vercel. That's not how it works. Vercel hosts the website — the front end and the small server functions behind it — but Vercel does not run a database. Instead, the site connects to a database that lives somewhere else, over a connection string. That somewhere else, for me, is Neon: a managed Postgres provider built to pair with exactly this kind of setup. Managed simply means I don't run or patch the database server myself — Neon does — and I get a connection string the site (and my build) point at. Same Postgres, just run for me instead of by me.
Why I dropped Docker
Docker is a tool for packaging software — here, a Postgres database plus its exact version and settings — into a self-contained "container" that runs identically on any machine. Last week I planned to run Postgres in a Docker container locally. The reason I dropped it is clarifying: Docker's whole value is making a local environment reproducible and portable. But I'm building directly against Neon, where the database already exists and is already reproducible — that's what managed hosting is. There's no local environment for Docker to package, so Docker would be ceremony, not substance. And if I ever did want to work locally, I already have Postgres installed on my machine; I'd build there and copy the result up to Neon, still never needing the container layer. Docker is a genuinely good tool. It was just answering a question my setup doesn't ask.
How the schema actually got built: ORMs and migrations
The design lived in a .dbml file — the text version of last week's diagram. Turning that into real tables is
the job of two tools that work together. The first is SQLAlchemy, an ORM —
Object-Relational Mapper. An ORM lets me describe database tables as objects in Python code instead of writing raw SQL by hand;
it's the translation layer between "code I write" and "tables the database understands." The second is Alembic,
which handles migrations. A migration is a versioned, recorded change to the database structure — "add this
table, this column, this relationship" — saved as a file you can review, replay, and roll back. The principle is the one
that runs through this whole project: schema changes deserve version control just like code does. You don't quietly edit a
database and hope you remember what you did. You write the change down, run it, and have a record. That's how the six entities and
their junctions went from a diagram into a structure that exists.
Why I dropped DBeaver too
DBeaver is a database tool that can, among other things, render an entity-relationship diagram from a live database, and last week I'd planned to use it to verify the schema. Dropping it was an honest redundant-tool call. I'd already built and rendered the ERD in dbdiagram.io, which served that purpose, and the migration running cleanly is what actually confirms the tables match the design. DBeaver may be a bit more robust, but it would just produce a second picture of something I'd already verified — so it's no longer needed.
The piece I deliberately did not build: pgvector
pgvector is a Postgres extension that lets the database store and search embeddings — the numerical representations of text that make semantic, meaning-based retrieval (the "RAG" in retrieval-augmented generation) possible. It is going to matter. Not someday in the abstract: three things on my site already point straight at it — the LLM catalog, the retrieval lab that demonstrates RAG-versus-no-RAG, and the visitor chat that will eventually want a real knowledge base — and the Milwaukee inventory will want it too. So why didn't I build it now? Because there is nothing to embed yet. Embeddings are for text that exists, and right now the tables are empty. Standing up vector search this week would be building a tool with nothing to point it at.
This is what forethought actually looks like, and it's the reason the foundation choices matter. pgvector isn't a separate system
I'd have to bolt on later — it's an extension you enable with a single command, CREATE EXTENSION vector;, on
the Postgres database I already have. Choosing managed Postgres on Neon now means the vector piece is one migration away when
there's finally data to embed, not a re-platform. I'm not provisioning it prematurely. I'm provisioning the ground it will
stand on, and leaving the rest for the week it's actually needed — which, fittingly, is the week the data starts
arriving.
A while back, a talk at a local meetup introduced me to OpenClaw — a self-hosted platform for running an AI agent you can message from chat apps. It was a genuinely useful introduction, and it did what a good introduction does: it left me with questions. Specifically, the questions I'd want answered before putting something like this anywhere near real work. So I went and researched them. This is the conversation that came out of it — the due-diligence layer an introduction isn't responsible for providing.
Q: OpenClaw is "self-hosted." Doesn't that mean my data stays private — nothing leaves my machine?
A: This was the first thing I had to get precise about, because "self-hosted" is true but narrower than it sounds. What's self-hosted is the orchestration — the gateway process, the routing, the session history, the tool execution. That all runs on your hardware. But the thinking happens wherever the model lives. If you point OpenClaw at a frontier model like GPT or Claude, every prompt still gets sent to that provider's servers — and in an agent setup, "the prompt" isn't just your message. It's your message plus whatever the agent's tools pulled in to answer it: files it read, command output it captured, pages it fetched. So the data crossing the boundary can actually be larger than a normal chatbot query. Self-hosting the gateway doesn't change that one bit. The only configuration where data genuinely never leaves is when the model itself runs locally too. "Self-hosted gateway" and "private inference" are two different claims, and only the second one is about where your data goes.
Q: Fine — but I can lock it down so only I can message it. Doesn't that close the security problem?
A: It closes one door, and it's an important one. You can absolutely configure OpenClaw so only you can send it messages — allowlists, pairing for unknown senders, all of it. That stops a stranger from instructing the agent directly. But here's the distinction that took me a minute to see clearly: controlling who can message it is not the same as controlling what it reads while working. The moment you ask it to do something useful — "summarize this thread," "check this repo's issues," "read this page" — the agent ingests content written by people who aren't on your allowlist. And if that content contains hidden instructions, the agent encounters them regardless of how tightly you locked the front door. This is prompt injection, and the unsettling part is it doesn't need a malicious sender. It rides in on the very material you asked the agent to look at. The allowlist guards the request. It does nothing to guard the material the request pulls in.
Q: Isn't this the same risk as any AI coding tool? Why single out OpenClaw?
A: The injection vector is the same — any agent that reads outside content can be fed a hidden instruction, and that includes the everyday tools many of us already use. So no, OpenClaw isn't uniquely dangerous in kind. Where it differs is degree, along three axes. First, attendance: an interactive tool acts while you're watching and you can stop a weird step; OpenClaw is designed to run unattended, on a schedule, so the same bad step can land at 3 a.m. with no one in the loop. Second, the inbound channel: connecting it to messaging apps expands who could put content in front of it from "things I chose to open" to "anyone who can reach the channel." Third, standing access: it's a persistent process holding standing credentials, not a session that ends when you close the laptop. None of these is a flaw to patch.
Q: So what's the actual takeaway — is it safe to use or not?
A: The takeaway is the thing that reframed it for me, and it's not "safe" or "unsafe." It's that the features being sold are the same features that carry the risk. Unattended operation, inbound chat channels, standing autonomous access — those three things are exactly why OpenClaw is appealing, and they're exactly the three things that turn an ordinary injection risk into a serious one. You can't keep the appeal and engineer the risk away separately, because they're the same property. That doesn't mean don't use it. It means the right question isn't "is it secure" — it's "what is the worst thing this agent could do if it were turned against me, and have I made that worst case survivable?" The defenses that actually matter follow from that: run the model locally if your data can't leave, give the agent its own scoped credentials rather than your real ones, keep it away from anything you can't afford to lose, and treat everything it reads as potentially hostile. The platform can be configured responsibly. But it's the kind of tool where you have to decide the posture deliberately, because the convenient default and the safe default are not the same default.
None of this is a knock on OpenClaw or on the talk that pointed me toward it — both did exactly what they were supposed to. The lesson I came away with is bigger than one platform: as agents move from "tool you invoke" to "service that runs," the interesting questions stop being about capability and start being about blast radius. What can it reach, who can steer it, and what happens if it's wrong. Those are the questions worth asking before you adopt any of this — and they're the ones an enthusiastic introduction will rarely answer for you.
“There is no doubt that the best technique is the one which is not noticeable.”
— Satyajit Ray — He was an Indian filmmaker, screenwriter, and author widely regarded as one of the greatest auteurs in cinema history. He received an Academy Honorary Award in 1992 for his profound influence on the art of motion pictures. His work often focused on the humanistic details of daily life and utilized a masterful economy of storytelling.