7/11/26

AIStorming: A Process-Oriented Framework for AI-Assisted Discovery, Analysis, and Specification

Oscar Garcia @ozkary 7/11/2026

Overview

As the software development lifecycle shifts toward autonomous vibe coding, rapid AI prototyping, and Software Design Description (SDD) driven workflows, engineering teams face a critical structural bottleneck: the discovery and analysis gap. Generative AI tools excel at emitting syntactically valid code, yet they consistently fail when handed vague, ungrounded, or fragmented domain specifications.

To bridge the gap between initial domain exploration and deterministic engineering execution, I introduce AIStorming—a process-oriented framework born out of my engineering practice and foundational work in enterprise data pipelines and cloud-native application architectures.

Throughout my career designing high-scale cloud-native solutions, event-driven microservices, and modern data platforms, domain research, requirement validation, and exploratory design depended heavily on manual web searches, static documentation reviews, and disconnected whiteboarding sessions. Over years of hands-on practice, my workflow naturally evolved from manual research into AI-driven, interactive, and code-centric discovery. By leveraging AI coding assistants directly inside the development workspace to profile sample datasets, stress-test business rules, probe API interfaces, and validate system constraints in real time, I transformed traditional brainstorming into an active, deterministic engineering phase: AIStorming.

AIStorming formalizes the process of using AI coding tools (e.g., GitHub Copilot, Google Antigravity) to conduct real-time domain discovery, exploratory analysis, and system boundary mapping across both cloud-native software and data engineering domains. Rather than relying on abstract, non-executable IT discovery frameworks, AIStorming anchors domain analysis directly in executable code, version control, and data validation. This yields structured problem statements, refined use cases, quantifiable technical requirements, and machine-readable specifications optimized for vibe coding, cloud-native design, and automated SDD execution.

🚀 Featured Open Source Projects

Explore these curated resources to level up your engineering skills. If you find them helpful, a ⭐️ is much appreciated!

🏗️ Data Engineering

Focus: Real-world ETL & MTA Turnstile Data

🤖 Artificial Intelligence

Focus: LLM Patterns and Agentic Workflows

📉 Machine Learning

Focus: Introduction to machine learning

💡 Contribute: Found a bug or have a suggestion? Open an issue and be part of the open source project.

1. The Core Engineering Challenge: The Discovery-to-Implementation Gap

In traditional enterprise software engineering, product discovery is often disconnected from actual system implementation. Product managers, business analysts, and architects spend weeks producing static requirements documents, wireframes, or text-heavy stories. When these artifacts are passed down to software engineers—or fed as prompt context into AI coding tools—the implementation breaks down due to implicit assumptions, unverified data schemas, unexpected edge cases, and missing operational constraints.

In an era dominated by AI-assisted synthesis, garbage context in means garbage execution out.

+-----------------------------------------------------------------------------------+
|                            TRADITIONAL DISCOVERY GAP                              |
|                                                                                   |
|  [ Domain & Business ] ---> ( Static Docs / Whiteboards ) ---> [ AI Assistant ]   |
|  Ideation & Concepts          Unverified Context                 Hallucinated     |
|                                                                  Implementation   |
+-----------------------------------------------------------------------------------+
                                        VS.
+-----------------------------------------------------------------------------------+
|                             AISTORMING FRAMEWORK                                  |
|                                                                                   |
|  [ Domain Input ]  ---> { AIStorming: Code-Centric Analysis } ---> [ SDD & Vibe ] |
|  Raw Data / Rules /     Exploratory Scripts, Profiling,         Deterministic     |
|  API Contracts          & Requirement Synthesis in IDE             Builds         |
+-----------------------------------------------------------------------------------+

To leverage AI coding assistants safely and at enterprise quality, the Discovery Phase must be reinvented. It can no longer be a passive ideation exercise; it must become an active, code-centric discovery engine that validates domain facts, data structures, and service behaviors before full-scale software architecture and implementation begin.

2. What is AIStorming?

AIStorming is the process-oriented methodology of conducting domain discovery, exploratory data/logic analysis, and system boundary definition by pairing human engineering expertise with AI coding tools inside an Integrated Development Environment (IDE).

The objective of AIStorming is not to generate production application code or deploy live systems immediately. Instead, its primary output is a structured set of verified, code-grounded engineering artifacts:

Grounded Problem Statements: Unambiguous scope boundaries, operational objectives, and domain invariants.
Exploratory Data & Logic Findings: Code-verified insights into payload structures, edge cases, integration contracts, API behaviors, and processing constraints.
Formal Functional & Technical Requirements: Explicit inputs, state behaviors, domain events, security boundaries, and non-functional targets (throughput, latency, reliability).
Machine-Readable Specifications (SDD): Domain models, schemas, contract interfaces, and prompt-context files tailored for downstream vibe coding, agentic orchestration, and automated pipeline execution.

3. Foundational Principles of AIStorming

Derived from the universal tenets of my Data Engineering Process Fundamentals (DEP) and expanded across cloud-native software architecture, AIStorming adapts proven discovery mechanics to AI-driven developer tooling.

I. The Code-Centric Paradigm

Traditional discovery relies on prose descriptions that fail upon first contact with compiler logic, schema validation, or streaming engines. AIStorming dictates that discovery must be code-centric from day one.

Exploratory data analysis (EDA), API probing, schema validation, and domain logic simulations are written as executable scripts (Python, TypeScript, SQL, Go, Jupyter Notebooks) inside the IDE workspace.
AI tools are forced to interact with concrete execution outputs, stack traces, and SDK responses rather than abstract concepts.

II. Dataset & Domain Grounding

AI models hallucinate when operating in an isolated context vacuum. AIStorming requires immediate grounding against real-world sample datasets, schema definitions, domain models, and API interfaces.

Evaluates data quality, structural variance, integration frequencies (batch vs. streaming event-driven), and service boundaries.
Uses AI assistants to parse, profile, and transform sample payloads and domain entities live during the discovery window.
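
As a minimal sketch of this kind of grounding, the script below profiles a few sample payloads using only the Python standard library. The sample records and the `profile_payloads` helper are invented for illustration; a real session would load actual sample data and would likely use Pandas or a similar library.

```python
from collections import Counter, defaultdict
from typing import Any


def profile_payloads(payloads: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize, per top-level field, how often it is missing and which types appear."""
    total = len(payloads)
    present: Counter = Counter()
    types: dict[str, Counter] = defaultdict(Counter)

    for payload in payloads:
        for field, value in payload.items():
            if value is None:
                # Explicit nulls count as missing, but the type is still recorded.
                types[field]["NoneType"] += 1
                continue
            present[field] += 1
            types[field][type(value).__name__] += 1

    return {
        field: {
            "missing_pct": round(100 * (total - present[field]) / total, 1),
            "types": dict(types[field]),
        }
        for field in types
    }


# Hypothetical sample records, invented for this sketch.
SAMPLE = [
    {"device_id": "a1", "reading": 20.5, "ts": "2026-01-01T00:00:00"},
    {"device_id": "a2", "reading": "21.0", "ts": None},
    {"device_id": "a3", "reading": 19.8},
    {"device_id": "a4", "reading": 22.1, "ts": "2026-01-01T00:00:03"},
]

report = profile_payloads(SAMPLE)
for field, stats in report.items():
    print(field, stats)
```

The report surfaces two facts that a prose description would likely miss: `ts` is absent or null in half the records, and `reading` arrives as both a float and a string.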

III. Source Control and Auditable Iteration

Prompt histories floating in web interfaces are disposable, non-reproducible, and unmaintainable. AIStorming enforces that every discovery artifact—exploratory notebooks, schema models, interface definitions, and context files—is checked directly into a Git repository (e.g., GitHub).

Enables collaboration between software engineers, data architects, and AI agents.
Maintains a versioned, auditable history of how business domain rules evolved into formal technical specifications.

IV. Bridge to Vibe Coding and SDD

AIStorming serves as the explicit precursor to Vibe Coding (rapid, intent-driven application development using AI models) and Software Design Description (SDD) generation. By translating loose ideas into code-verified constraints during AIStorming, downstream AI models receive high-fidelity, code-grounded prompts during full-scale development, which leaves less room for hallucinated assumptions.

4. The 4 Universal Steps of the AIStorming Process

Whether applied to cloud-native microservices, event-driven streaming applications, serverless architectures, or modern enterprise data platforms, the AIStorming framework executes across four process-oriented phases.

+-------------------+      +-------------------+      +-------------------+      +-------------------+
|  STEP 1:          | ---> |  STEP 2:          | ---> |  STEP 3:          | ---> |  STEP 4:          |
|  Problem          |      |  Exploratory      |      |  Use Case &       |      |  SDD & Vibe       |
|  Framing &        |      |  Data & Domain    |      |  Requirement      |      |  Specification    |
|  Context Prime    |      |  Analysis         |      |  Synthesis        |      |  Output           |
+-------------------+      +-------------------+      +-------------------+      +-------------------+

Step 1: Problem Framing & Context Priming

Objective: Establish domain boundaries and prime the AI workspace with domain context.
Process:
1. Initialize a dedicated discovery branch in the code repository.
2. Create a core workspace context file (CONTEXT.md or system prompt boundaries) containing raw business objectives, domain rules, SLA requirements, and target application guidelines.
3. Engage the AI coding assistant to challenge the problem scope, identifying missing assumptions, unstated edge cases, or domain contradictions.

Step 2: Exploratory Data & Domain Logic Analysis

Objective: Verify domain facts and data behaviors using runnable code inside the IDE.
Process:
1. Load representative data samples, schema models, domain event payloads, or third-party API contracts into VS Code or Jupyter Notebooks.
2. Pair with the AI assistant to write exploratory scripts using standard manipulation libraries, data frames, or contract interfaces (Pandas, Pydantic, Zod, Spark/PySpark, SQL).
3. Perform data profiling, test payload transformations, inspect edge cases, validate event structures, and evaluate performance/throughput assumptions.
4. Commit all exploratory scripts, test executions, and output traces to Git.
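
The kind of exploratory script this step produces might look like the sketch below. It checks sample records against a small contract and lists the violations. The `CONTRACT` mapping, the field names, and the records are assumptions made for this example. The step mentions Pydantic and Zod; the sketch uses only the standard library so it runs without extra dependencies.

```python
from datetime import datetime
from typing import Any, Callable

# Hypothetical contract: field name -> parser that raises on bad input.
CONTRACT: dict[str, Callable[[Any], Any]] = {
    "order_id": str,
    "amount": float,
    "created_at": datetime.fromisoformat,
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return human-readable contract violations for one record."""
    violations = []
    for field, parse in CONTRACT.items():
        if field not in record:
            violations.append(f"{field}: missing")
            continue
        try:
            parse(record[field])
        except (ValueError, TypeError) as exc:
            violations.append(f"{field}: {type(exc).__name__}")
    return violations


# Invented sample records covering a clean case and three edge cases.
RECORDS = [
    {"order_id": "o-1", "amount": "19.99", "created_at": "2026-01-05T10:00:00"},
    {"order_id": "o-2", "amount": "n/a", "created_at": "2026-01-05T10:01:00"},
    {"order_id": "o-3", "amount": 5},
    {"order_id": "o-4", "amount": None, "created_at": "01/05/2026"},
]

findings = {r["order_id"]: check_record(r) for r in RECORDS}
for order_id, violations in findings.items():
    print(order_id, violations or "ok")
```

Committing both the script and its printed output gives Step 3 concrete evidence to synthesize from, rather than a recollection of what the data looked like.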

Step 3: Use Case & Requirement Synthesis

Objective: Extract structured engineering requirements from discovery findings.
Process:
1. Prompt the AI assistant to analyze the commit history, exploratory scripts, and execution outputs generated in Step 2.
2. Synthesize findings into formalized Use Cases (actor/system interactions, event triggers, preconditions, happy path flows, failover behaviors).
3. Categorize non-negotiable Technical Requirements:
  - Data Integration & Schema Transformation rules
  - Cloud-Native Application Patterns (Event-driven, REST/GraphQL APIs, Pub/Sub boundaries)
  - Performance, Scalability & Latency SLA targets
  - Security, Identity, Governance, and Compliance boundaries

Step 4: SDD & Vibe Coding Specification Output

Objective: Produce machine-readable software specifications for autonomous AI execution.
Process:
1. Compile discovery outputs into a standardized Software Design Description (SPEC.md / ARCHITECTURE.md).
2. Generate baseline interface contracts, schemas, and domain types (TypeScript interfaces, OpenAPI specs, Pydantic data models, Avro/Protobuf schemas, ERD models).
3. Define precise context prompts and agent instructions to drive subsequent implementation phases (Vibe Coding, Automated Test Generation, and CI/CD development).
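
One way to produce a baseline machine-readable contract is to derive it from a typed model. The sketch below builds a small JSON Schema document from a dataclass. The `DeviceReading` model and the `to_json_schema` helper are hypothetical, and the mapping covers only four primitive types; libraries such as Pydantic generate far more complete schemas.

```python
import json
from dataclasses import MISSING, dataclass, fields
from typing import Any

# Only primitive field types are handled in this sketch.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class DeviceReading:
    """Hypothetical domain type used to illustrate schema generation."""
    device_id: str
    temperature: float
    sequence: int
    location: str = "unknown"


def to_json_schema(model: type) -> dict[str, Any]:
    """Build a minimal JSON Schema object from a dataclass definition."""
    properties = {}
    required = []
    for f in fields(model):
        properties[f.name] = {"type": _JSON_TYPES[f.type]}
        # Fields without a default must be supplied by the producer.
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {
        "title": model.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
    }


schema = to_json_schema(DeviceReading)
print(json.dumps(schema, indent=2))
```

The printed document can be committed under `schemas/` and referenced from the specification, so coding agents receive the contract as data instead of prose.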

5. Architectural Deliverables Matrix

To maintain enterprise quality, an AIStorming session must terminate in a concrete set of repository artifacts:

| Deliverable Phase | File Artifact | Content & Purpose | Target Consumer |
| --- | --- | --- | --- |
| Problem Definition | `PROBLEM.md` | Domain boundaries, business goals, explicit out-of-scope declarations | Lead Engineers & Architects |
| Exploratory Analysis | `analysis/.ipynb`, `scripts/` | Executable data profiling, edge-case tests, API contract probes | Engineering Team & AI Agents |
| Use Cases | `USE_CASES.md` | System interactions, domain event triggers, failure modes, retry logic | Product Owners & Testing Engines |
| Technical Specs | `REQUIREMENTS.md`, `SPEC.md` | Data models, storage schemas, security boundaries, SLA performance targets | Vibe Coding AI Tools (Cursor, Copilot, Claude) |
| Architecture Contract | `ARCHITECTURE.md`, `schemas/*` | System topology, ERD models, OpenAPI/AsyncAPI specs, CI/CD pipeline rules | SDD Generators & Developers |

6. Practical Application: Enterprise Cloud-Native & Data Platform Build

To demonstrate the versatility of AIStorming across cloud-native microservices and high-scale data platforms, consider the execution path of a modern enterprise solution:

The Human Intent: An architect needs to design a high-throughput, event-driven data ingestion platform that processes live streaming payloads into an analytical data lake.
The AIStorming Phase:
- Instead of asking an AI tool to "write a real-time data ingestion application," the team opens the IDE and loads representative stream logs and target schemas.
- The architect uses AIStorming to draft and run exploratory Python/PySpark scripts inside the IDE, testing deserialization speed, schema drift, and payload validation rules.
- The AI assistant identifies that 6% of incoming event payloads contain missing nested timestamp attributes and schema variations that would break downstream parquet writes.
Requirement Synthesis: The team updates REQUIREMENTS.md via AIStorming to explicitly mandate upstream payload sanitization, dead-letter routing (DLQ), and a strict contract enforcement layer.
SDD & Vibe Implementation: The resulting SPEC.md, Pydantic models, and OpenAPI/AsyncAPI specs are fed into coding agents. The agents write production-grade microservices and pipeline code on the first pass because the domain edge cases were caught during AIStorming.
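
The sanitization and dead-letter requirement from this scenario can be sketched in a few lines. The events below are synthetic and deliberately constructed so that 3 of 50 lack the nested timestamp, mirroring the 6% figure in the example; they are not real measurements. The `meta.timestamp` field name and the `route` helper are assumptions for illustration.

```python
from typing import Any


def has_event_timestamp(event: dict[str, Any]) -> bool:
    """True when the nested meta.timestamp attribute is present and non-empty."""
    meta = event.get("meta")
    return isinstance(meta, dict) and bool(meta.get("timestamp"))


def route(events: list[dict[str, Any]]) -> tuple[list, list]:
    """Split events into those safe to write and those bound for the dead-letter queue."""
    valid, dead_letter = [], []
    for event in events:
        (valid if has_event_timestamp(event) else dead_letter).append(event)
    return valid, dead_letter


# Synthetic events: 50 records, three of them damaged in different ways.
events = [
    {"id": i, "meta": {"timestamp": f"2026-01-01T00:00:{i:02d}"}} for i in range(50)
]
events[7]["meta"].pop("timestamp")      # attribute missing
events[19]["meta"]["timestamp"] = None  # attribute null
events[33].pop("meta")                  # whole nested object missing

valid, dead_letter = route(events)
missing_pct = 100 * len(dead_letter) / len(events)
print(f"{missing_pct:.1f}% of events routed to the DLQ")
```

Running a check like this during discovery is what turns "some payloads look incomplete" into a measurable requirement for the contract enforcement layer.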

7. Conclusion: AIStorming as an Enterprise Engineering Standard

As artificial intelligence shifts software and data engineering from manual syntax writing to high-level system orchestration, the role of the engineer evolves from code writer to system architect and discovery strategist.

AIStorming bridges the foundational gap between raw domain ideas and deterministic execution. By marrying the process-oriented discipline of Data Engineering Process Fundamentals with cloud-native software architecture patterns and modern AI coding tools, AIStorming transforms discovery from a passive, unverified discussion into a repeatable, code-centric, and auditable engineering standard.

By adopting AIStorming as a formal phase prior to vibe coding and SDD execution, engineering organizations can reduce AI hallucination risks, enforce enterprise domain integrity, and accelerate software delivery while keeping architectural control.

References & Foundational Frameworks

Garcia, Oscar D. (Ozkary). "From Raw Data to Roadmap: The Discovery Phase in Data Engineering Process Fundamentals." Ozkary Technologies
Garcia, Oscar D. (Ozkary). "Data Engineering Process Fundamentals - Design and Planning."
Garcia, Oscar D. (Ozkary). "Architecting an Agentic Data Pipeline - From Data Lake Discovery to Managed Orchestration."

🌟 Let's Connect & Build Together

Thanks for reading! 😊 If you enjoyed these resources, let's stay in touch! I share deep-dives into AI/ML patterns and host community events here:

GDG Broward: Join our local dev community for meetups and workshops.
Global AI Events: Join Global AI Events.
LinkedIn: Let's connect professionally! I share insights on engineering.
GitHub: Follow my open-source journey and star the repos you find useful.
YouTube: Watch step-by-step tutorials on the projects listed above.
BlueSky / X / Twitter: Daily tech updates and quick engineering tips.

👉 *Originally published at ozkary.com*

6/24/26

Building Reusable & Extendable Agents with the Google ADK

Oscar Garcia @ozkary 6/24/2026 ai-agents, ai-governance, data, devops, google-adk, python, youtube

Overview

The goal of this presentation is to introduce the audience to an agentic SDK, specifically the Google Agent Development Kit (ADK), while addressing a critical trap in modern AI engineering. It is incredibly easy to fall into the habit of building AI agents using basic procedural code or siloed Jupyter Notebooks. While these approaches work for initial validation, they fail to scale in an enterprise environment.

Instead, this session demonstrates how to leverage robust software design patterns and foundational architectural principles. By building an abstraction layer over the SDK, we can create a mature, enterprise-ready library. This approach allows us to decouple our core business logic from third-party frameworks, making our codebase completely agnostic to any single SDK and giving us the flexibility to swap underlying tools as the AI ecosystem evolves.

Follow the next sections for the main points of the presentation, and then take a look at the video presentation to dive deeper into the concepts.

Presentation Summary

Discover how to transition from building monolithic, single-prompt chatbots to designing highly modular, scalable, and extendable enterprise agents using the Google Agent Development Kit (ADK). This presentation provides a hands-on architectural deep dive into building process-oriented workflows, implementing the Model Context Protocol (MCP) for cloud data platform integrations, and utilizing automated DevOps tooling to eliminate technical debt in your AI engineering pipelines.

The Monolithic Prompt & SDK Trap

When developers start building AI agents, the initial momentum is almost always driven by quick prototyping. You pull down a hot new SDK, run a quick pip install, and start hardcoding prompts directly into your files.

While this works for a weekend hobby project, it quickly collapses under its own weight in an enterprise ecosystem. You end up with multiple developers writing siloed, inconsistent code, duplicating core tasks like error handling, and introducing massive technical debt. Even worse, your entire system becomes tightly coupled to a single third-party framework. If you ever need to pivot or replace that framework, you are looking at a complete rewrite.

Production-grade engineering requires moving away from spaghetti code toward process-oriented, SDK-agnostic architecture.

Architectural Blueprint: Layered Agent Design

To achieve true reusability and governance, we must build a core architectural foundation that abstracts third-party dependencies away. Instead of letting an external SDK dictate our application structure, we stack agents in specialized layers via inheritance.

Building Reusable & Extendable Agents with the Google ADK - Architecture

1. The Base Agent (The Foundation)

The BaseAgent is an abstract base class responsible for handling cross-cutting concerns that every enterprise agent needs:

Consistent logging structures.
Centralized security and exception handling.
Standardized interface definitions.

By encapsulating these inside a base layer, any new agent you spin up automatically inherits these core enterprise features.
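
A minimal sketch of such a base layer follows. It is not the code from the presentation's repository and it does not call the Google ADK; `AgentError`, `_invoke`, and `EchoAgent` are hypothetical names, with `EchoAgent` standing in for a subclass that would wrap a real SDK call.

```python
import logging
from abc import ABC, abstractmethod


class AgentError(Exception):
    """Single exception type that callers handle, whatever the underlying SDK raises."""


class BaseAgent(ABC):
    """Owns logging and exception handling so subclasses only implement the model call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.logger = logging.getLogger(f"agent.{name}")

    def run(self, prompt: str) -> str:
        self.logger.info("request received (%d chars)", len(prompt))
        try:
            return self._invoke(prompt)
        except Exception as exc:
            # Centralized handling: log once, then re-raise as a uniform error type.
            self.logger.exception("agent call failed")
            raise AgentError(f"{self.name} failed: {exc}") from exc

    @abstractmethod
    def _invoke(self, prompt: str) -> str:
        """Subclasses place the SDK-specific call here."""


class EchoAgent(BaseAgent):
    """Stand-in subclass; a real one would delegate to the chosen SDK."""

    def _invoke(self, prompt: str) -> str:
        if not prompt:
            raise ValueError("empty prompt")
        return f"echo: {prompt}"


reply = EchoAgent("demo").run("hello")
print(reply)
```

Because `run` is the only public entry point, swapping the SDK means rewriting `_invoke` in the subclasses while callers and logging stay unchanged.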

2. The Basic Agent (Configuration over Hardcoding)

The BasicAgent extends the base layer to introduce configuration management. To keep our code robust and maintainable, prompts should never be hardcoded.

Instead, the BasicAgent pulls details—like the target Gemini model or project IDs—from environment variables. Simple instruction hooks can be fed via configurations, allowing your DevOps pipeline to deploy behavior updates or model rollouts without requiring a single line of code to change.
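
The configuration-over-hardcoding idea can be sketched as a small settings object loaded from the environment. The variable names (`AGENT_MODEL`, `AGENT_PROJECT_ID`, `AGENT_INSTRUCTION`) and the `AgentConfig` class are assumptions for this example, not names taken from the demo code or the ADK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    """Settings an agent needs at startup, sourced from the environment."""
    model: str
    project_id: str
    instruction: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "AgentConfig":
        # Accept an explicit mapping so the loader can be tested without touching os.environ.
        env = os.environ if env is None else env
        missing = [k for k in ("AGENT_MODEL", "AGENT_PROJECT_ID") if not env.get(k)]
        if missing:
            raise KeyError(f"missing required settings: {', '.join(missing)}")
        return cls(
            model=env["AGENT_MODEL"],
            project_id=env["AGENT_PROJECT_ID"],
            instruction=env.get("AGENT_INSTRUCTION", "You are a helpful assistant."),
        )


# Placeholder values; a pipeline would inject the real ones at deploy time.
config = AgentConfig.from_env(
    {"AGENT_MODEL": "example-model", "AGENT_PROJECT_ID": "example-project"}
)
print(config)
```

Failing fast on missing settings keeps a misconfigured deployment from reaching the first model call.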

3. The Tool Agent (Advanced Governance & MCP)

The ToolAgent introduces external capabilities through the Model Context Protocol (MCP). For complex enterprise needs, configuration files aren’t enough. The ToolAgent uses file pointers to read advanced Markdown documents detailing strict system instructions, safety limitations, data boundaries, and governance rules.
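
A hedged sketch of the file-pointer idea: the agent receives a relative path and loads its instructions from a Markdown file at startup. The `load_instructions` helper, the file name, and the validation rules are assumptions for illustration, not the implementation shown in the presentation.

```python
import tempfile
from pathlib import Path


def load_instructions(pointer: str, base_dir: Path) -> str:
    """Read a Markdown instruction file, refusing paths outside base_dir."""
    base = base_dir.resolve()
    path = (base / pointer).resolve()
    if base not in path.parents:
        raise ValueError(f"{pointer!r} escapes the instructions directory")
    if path.suffix.lower() != ".md":
        raise ValueError("instruction files must be Markdown (.md)")
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{pointer!r} is empty")
    return text


# Demonstration against a temporary directory with an invented governance file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "governance.md").write_text(
        "# Rules\n- Read-only access to the dataset.\n", encoding="utf-8"
    )
    instructions = load_instructions("governance.md", root)
    try:
        load_instructions("../outside.md", root)
        escaped = True
    except ValueError:
        escaped = False

print(instructions)
```

Keeping governance text in versioned Markdown files means a rule change shows up as a reviewable diff rather than an edit buried in code.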

Extending Capability with Custom & Native MCP Tools

An agent on its own is just an engine that knows how to talk to a Large Language Model. To make it useful, it needs a way to interact with the outside world. This presentation highlighted two separate paradigms for handling tools:

Abstracting Built-in SDK Tools

The Google ADK provides out-of-the-box tools for major platforms like BigQuery. However, to maintain code isolation, we shouldn’t map those tools blindly. In the demo code, we extended the native tool using a custom authorization class (AuthorizationContext). This separation of concerns ensures that token generation, credential refreshing, and cloud authentication happen entirely independent of the agent’s reasoning loop.
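
The source names the class `AuthorizationContext` but does not show its body, so the following is only a guess at the shape of such a component: a token cache that refreshes shortly before expiry and knows nothing about the agent. The injected `fetch_token` callable, the `refresh_margin` parameter, and the fake clock are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class AuthorizationContext:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.time,
        refresh_margin: float = 60.0,
    ) -> None:
        # fetch_token returns (token, expiry as epoch seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._refresh_margin = refresh_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def token(self) -> str:
        expiring = self._clock() >= self._expires_at - self._refresh_margin
        if self._token is None or expiring:
            self._token, self._expires_at = self._fetch_token()
        return self._token


# Fake clock and fetcher so the refresh behavior is visible without a cloud call.
now = [1000.0]
calls = []


def fake_fetch() -> Tuple[str, float]:
    calls.append(now[0])
    return f"token-{len(calls)}", now[0] + 3600.0


auth = AuthorizationContext(fake_fetch, clock=lambda: now[0])
first = auth.token()
second = auth.token()   # served from cache
now[0] += 3590.0        # inside the refresh margin
third = auth.token()    # triggers a refresh
print(first, second, third)
```

A tool wrapper would call `auth.token()` just before each request, so the reasoning loop never sees credentials or refresh logic.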

Building Custom MCP Tools from Scratch

When an SDK lacks a tool for your specific business requirements—such as interacting with a custom file bucket or specific database—you can build your own using frameworks like FastMCP. The presentation demonstrated a custom Google Cloud Storage (GCS) tool capable of:

Listing target bucket contents.
Generating compressed file previews.
Executing self-diagnostic checks to validate connections before a workflow begins.

The Unsung Hero: The Agent Runner Runtime

One of the least understood components of agentic design is the Agent Runner. While the Google ADK provides an excellent local web-based playground that automatically bootstraps your environment for rapid testing, production environments require you to explicitly script this runtime pipeline.

The Agent Runner acts as the orchestrator of your system, managing three vital elements:

Orchestration: Connecting multi-agent workflows (e.g., passing tasks seamlessly between a storage agent and a BigQuery data agent).
Session Management: Directing the active state of an execution path.
Memory Management: Maintaining persistence. While short-term tasks can run on fast, volatile in-memory sessions, complex industrial or manufacturing pipelines require long-term history to monitor trends, catch system drift, and diagnose process failures over time. The presentation demonstrated wiring an isolated SQLite engine into the runner to handle this tracking cleanly.
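
To illustrate the persistence side of the runner, here is a small SQLite-backed history store built on the standard library. It is a sketch under assumptions: the `SessionStore` class, table layout, and column names are invented, and it does not use the ADK's own session classes.

```python
import sqlite3
import time
from typing import List, Tuple


class SessionStore:
    """Append-only event history per session, kept in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS events (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   session_id TEXT NOT NULL,
                   role TEXT NOT NULL,
                   content TEXT NOT NULL,
                   created_at REAL NOT NULL
               )"""
        )

    def append(self, session_id: str, role: str, content: str) -> None:
        # The connection context manager commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT INTO events (session_id, role, content, created_at) "
                "VALUES (?, ?, ?, ?)",
                (session_id, role, content, time.time()),
            )

    def history(self, session_id: str, limit: int = 50) -> List[Tuple[str, str]]:
        """Return the most recent events for a session, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM events WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return list(reversed(rows))


store = SessionStore()
store.append("s1", "user", "List the bucket contents")
store.append("s2", "user", "Unrelated session")
store.append("s1", "agent", "Found 3 files")
print(store.history("s1"))
```

Passing a file path instead of `":memory:"` keeps the history across restarts, which is the property long-running monitoring workflows depend on.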

Modern DevOps for AI: UV and Makefiles

Enterprise code demands automated quality gates. Rather than relying on standard global package structures, the project repository leverages modern Python tooling to accelerate developer onboarding:

uv (Python package and project manager): A fast, modern alternative to legacy virtualenv and pip workflows. Committing a uv.lock file means every developer on your team resolves identical dependency versions, which removes most “it works on my machine” problems.
Makefiles as CI/CD Blueprints: Instead of manually typing tedious execution commands, a standard Makefile orchestrates development tasks. Running make lint catches configuration errors and unused imports (like an imported but unused session config) before the code ever reaches a code review, while commands like make run-tool handle local testing.

Key Takeaways for Enterprise Developers

Isolate the SDK: Treat third-party agent frameworks as pluggable libraries, not foundational pillars. Hide them behind abstract base classes so you can swap frameworks with minimal friction.
Configuration Wins Over Code: Keep your agent identities, target models, and governance logic inside Markdown and environment variables. Let your DevOps pipelines drive system behavior.
Rely on Runtimes for Memory: Keep your agents lean. Let specialized agent runners handle the operational state, session history, and database logging.

Resources & Next Steps

Get the Code: Explore the foundational structures, base classes, and custom MCP tool definitions by visiting the official GitHub Repository. (Don’t forget to star the repo if you find the patterns helpful!)
Dive Deeper into Engineering Processes: For a comprehensive guide on building scalable, process-oriented architectural systems, check out my book, “Data Engineering Process Fundamentals”.

YouTube Video

Take a look at this video and learn about building and testing agents using the Google Agent Development Kit (ADK) by leveraging its CLI and web tool. We’ll start from the absolute basics, learning how to build a simple agent and test it instantly.

From there, we will move on to extend our agents by building custom Model Context Protocol (MCP) tools. Throughout this session, we will focus on staying away from hardcoded prompts. Instead, you’ll learn how to leverage clean software design patterns to build truly reusable, extendable agents that can adapt dynamically with the use of configurable prompts and plug-and-play MCP tools, shifting your development loop from a one-off script or simple notebook into a scalable, production library.

👍 Subscribe to the channel to get notified about new events!

👉 Originally published at ozkary.com

4/30/26

From Passive Dashboards to Active Agents: Real-Time Reasoning over Data Streams

Oscar Garcia @ozkary 4/30/2026 ai-agents, angularjs, javascript, nodejs, typescript, websockets

Overview

Dashboards are effective at showing us that something is breaking, but they usually rely on a human to watch the screen and decide what to do next. In this session, we will look at how to take a standard real-time telemetry dashboard and make it autonomous.

We will walk through a practical implementation using an Angular frontend and a Node.js server. We’ll look at how the system leverages a relational database for persistence and a Redis in-memory cache to handle high-frequency, low-volume data feeds. From there, we incorporate an AI Agent that follows three core principles: perceiving the data stream through a sliding window, reasoning against statistical control limits, and acting by sending real-time analysis back to the user. This session is focused on bridging the gap between raw data streams and automated decision-making.

📅 Agenda

The Real-Time Feed: Monitoring Device Telemetry

An introduction to the live system where devices emit high-frequency data for quality monitoring and 3-sigma control limit oversight.

System Architecture: From Ingestion to Persistence

A deep dive into the technical stack, mapping out the data journey through Node.js, Redis in-memory caching, and relational database storage.

The Human in the Loop: Cognitive Limitations

Exploring the “Passive Monitoring” challenge—why relying on human interpretation of real-time alerts creates a bottleneck in process control.

The AI Agent: Applying the 3 Principles

Implementing the “Active Observer” using the three core pillars of AI Agents: Perception (the window), Reasoning (the limits), and Action (the analysis).

The Intelligent Journey: Summary & Advantages

Recapping our transition from a passive monitor to a smart system and discussing the advantages of automated, agentic data interpretation.

⭐ Why Attend?

The industry is moving beyond simple “Chat” interfaces and into the realm of Agentic Observability. By attending this session, you will see a practical blueprint for integrating intelligence directly into a high-velocity data stack.

👥 Who Is This For?

Junior Developers: Learn the fundamentals of real-time streaming, how to manage state in Node.js, and how to interact with AI APIs in a professional environment.
Senior Engineers & Architects: See a robust architectural pattern for integrating Redis, relational databases, and AI Agents while maintaining system stability and security.
Decision Makers (VPs/Directors): Understand the ROI of “Active Monitoring”—how AI Agents can act as a force multiplier for your engineering teams by automating the first layer of data interpretation.
Quality & Reliability Engineers: Explore how to digitize the control limit logic you already use into an autonomous 24/7 “Digital Twin.”

Presentation

The Real-Time Feed Challenge

High-velocity telemetry defines the “digital pulse” of modern industrial systems.

Telemetry Streams: Industrial devices emit continuous telemetry (Temperature, Sound, Humidity). These data points require sub-second processing to maintain operational stability.
Control Limits: Quality engineers rely on statistical boundaries to define “normal” operation. Detecting drift is the critical first step in identifying risks before failure occurs.
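
As a worked example of the control limits described above, the sketch below derives 3-sigma limits from a baseline sample. The baseline readings are invented, and using the sample standard deviation is a simplification; classical control charts often estimate sigma from moving ranges instead.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def control_limits(baseline: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) at three standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma


def is_out_of_control(value: float, lcl: float, ucl: float) -> bool:
    """A reading outside the limits signals a likely special cause."""
    return value < lcl or value > ucl


# Invented baseline temperature readings from a stable period.
BASELINE = [20.0, 20.2, 19.8, 20.1, 19.9, 20.0, 20.3, 19.7]

lcl, center, ucl = control_limits(BASELINE)
print(f"LCL={lcl:.2f} center={center:.2f} UCL={ucl:.2f}")
print(is_out_of_control(21.0, lcl, ucl))
```

For this baseline the mean is 20.0 and the sample standard deviation is 0.2, so the limits land at 19.4 and 20.6; a reading of 21.0 falls outside them.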

Scalable System Architecture

A multi-layered stack designed to bridge the gap between ingestion and intelligence.

Redis Cache: Manages the “Live State” or Digital Twin. Provides sub-millisecond access for immediate AI perception.
Relational Data Warehouse: Powers the historical persistence layer for long-term trend analysis and compliance auditing.
Node Controller: The orchestration hub. Manages telemetry ingestion, state updates, and the execution of the agentic reasoning loop.

The “Human in the Loop” Trap

Passive observability relies on human interpretation, creating a critical bottleneck.

Cognitive Overload: Humans struggle to interpret hundreds of concurrent streams, missing subtle patterns.
Alert Fatigue: Constant threshold violations desensitize responders, causing critical 3σ violations to be ignored.
Response Latency: The time required for a human to interpret a dashboard often exceeds the window for effective corrective action.

The 3 Principles of AI Agents

Automating observability requires a system that can perceive, reason, and act independently.

Perception: Maintaining a stateful sliding window of the Digital Twin to understand temporal context.
Reasoning: Evaluating the stream against statistical 3σ limits and physical engineering constraints.
Action: Closing the loop by emitting real-time narratives or triggering autonomous safety protocols.
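
The three principles can be sketched as a small loop. The session's stack is Node.js, but the sketch uses Python for consistency with the other examples, and it replaces any model call with plain rules. The `WindowAgent` class, the "strictly rising window" drift rule, and the sample stream are all assumptions for illustration.

```python
from collections import deque
from typing import List, Optional, Tuple


class WindowAgent:
    """Perceive readings into a sliding window, reason against limits, act with a message."""

    def __init__(self, lcl: float, ucl: float, window_size: int = 5) -> None:
        self.lcl = lcl
        self.ucl = ucl
        self.window: deque = deque(maxlen=window_size)

    def perceive(self, value: float) -> None:
        # The deque drops the oldest reading once the window is full.
        self.window.append(value)

    def reason(self) -> str:
        values = list(self.window)
        latest = values[-1]
        if latest < self.lcl or latest > self.ucl:
            return "violation"
        full = len(values) == self.window.maxlen
        # Hypothetical drift rule: every reading in a full window is higher than the last.
        if full and all(b > a for a, b in zip(values, values[1:])):
            return "drift"
        return "normal"

    def act(self, status: str) -> Optional[str]:
        if status == "violation":
            return f"Reading {self.window[-1]} is outside [{self.lcl}, {self.ucl}]"
        if status == "drift":
            return f"Upward drift across the last {len(self.window)} readings"
        return None

    def step(self, value: float) -> Tuple[str, Optional[str]]:
        self.perceive(value)
        status = self.reason()
        return status, self.act(status)


# Invented stream: stable, then a steady climb, then a limit breach.
STREAM = [20.0, 20.1, 19.9, 20.0, 20.1, 20.2, 20.3, 20.4, 20.7]

agent = WindowAgent(lcl=19.4, ucl=20.6)
results: List[Tuple[str, Optional[str]]] = [agent.step(v) for v in STREAM]
for value, (status, message) in zip(STREAM, results):
    print(value, status, message or "")
```

The drift messages appear two readings before the limit breach, which is the advantage a stateful window has over a simple threshold alert.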

The Intelligent Journey: Comparison

| Feature | Passive Monitor | Smart Monitor |
| --- | --- | --- |
| Interpretation | Requires human "eyes-on-glass" | Autonomous, semantic interpretation |
| Response | Reactive to simple thresholds | Proactive identification of drift |
| Reliability | High risk of missed signals | Intelligent filtering of noise |
| Data Context | Disconnected historical vs. live | Stateful perception of "Live Device" |
| Intervention | Significant human latency | Automated safety & audit narratives |

🌟 Let’s Connect & Build Together

Thanks for reading! 😊 If you enjoyed these resources, let’s stay in touch! I share deep-dives into AI/ML patterns and host community events here:

GDG Broward: Join our local dev community for meetups and workshops.
Global AI Events: Join Global AI Events.
LinkedIn: Let’s connect professionally! I share insights on engineering.
GitHub: Follow my open-source journey and star the repos you find useful.
YouTube: Watch step-by-step tutorials on the projects listed above.
BlueSky / X / Twitter: Daily tech updates and quick engineering tips.

👉 Originally published at ozkary.com

3/31/26

Architecting an Agentic Data Pipeline - From Data Lake Discovery to Managed Orchestration

Oscar Garcia @ozkary 3/31/2026 AI , data , data pipelines , data lake , data warehouse , python No comments

Overview

This session explores the strategy of leveraging AI to move beyond manual implementation and into the next level of data engineering. We dive into a process that positions the AI not as a syntax generator, but as a cognitive partner in the engineering lifecycle. We will examine the architectural shift required to transform raw data lake assets into high-performance, orchestrated systems, focusing on the strategic collaboration between human intent and agentic design.

🚀 Featured Open Source Projects

Explore these curated resources to level up your engineering skills. If you find them helpful, a ⭐️ is much appreciated!

🏗️ Data Engineering

Focus: Real-world ETL & MTA Turnstile Data

🤖 Artificial Intelligence

Focus: LLM Patterns and Agentic Workflows

📉 Machine Learning

Focus: MLOps and Productionizing Models

💡 Contribute: Found a bug or have a suggestion? Open an issue and be part of the open source project.

🔗 Related Repository: AI Agents for Data Engineering

Explore the full implementation of the AI Agents used in this workflow:
https://github.com/ozkary/data-engineering-mta-turnstile/tree/main/ai-agents

YouTube Video

👍 Subscribe to the channel to get notified of new events!

📅 Agenda

Data Lake Discovery: The strategy of deploying discovery agents to autonomously identify patterns and define the foundation of the data grain.
Governance & Requirements: Establishing the strategic guardrails and requirements that empower an "Architect" agent to maintain system consistency.
Logical Design for the Staging Area: A process dive into using AI to propose and build a logical abstraction layer, separating raw sources from core business logic.
Designing and Implementing the Physical Model: How agents navigate the transition to physical storage, building Dimension and Fact tables while maintaining referential integrity.
Incremental Update Strategy: Developing a sustainable approach to support continuous data feeds from the data lake using idempotent, self-healing processes.
Pipeline Design and Orchestration: The coordination of complex tasks to manage the relationship between dimensions and facts, ensuring strict lineage and integrated observability.

⭐ Why Attend?

Elevate Your Role: Learn how to shift your focus from writing repetitive code to defining high-level architectural intent and performing strategic design reviews.
Master Systemic Reasoning: Understand how to leverage AI to solve complex engineering challenges like referential integrity and dependency management at scale.
Build for Operations: Move toward a model where system health and observability are built-in byproducts of the design process, not afterthoughts.

👥 Who Is This For?

Data Engineers & Architects: Looking to evolve their workflow from manual scripting to high-level systemic design.
Engineering Leaders: Interested in the ROI and reliability of integrating autonomous agents into the development lifecycle.
AI Enthusiasts: Wanting to see a practical, "beyond-the-chatbot" application of agentic reasoning in a production environment.
Technical Decision Makers: Seeking a strategy for maintaining governance and referential integrity in an AI-augmented organization.

Presentation

Automating the Data Engineering Lifecycle

We are running a modern Data Engineering process by combining the reasoning power of AI Agents with the standardized connectivity of MCP tools.

Goal: Move from manual scripting to an intelligent, agent-led pipeline.
Outcome: A system that can discover, map, and orchestrate data across the cloud.

How do we leverage these tools?

The "Brains" and the "Hands" of the process.

AI Agents: Use Large Language Models (LLMs) to understand complex system instructions and specific user prompts. They provide the "logic" behind the process.
MCP Tools: Provide the "connectivity." They expose metadata to the agent, which allows the AI to understand exactly what actions are available and how to execute them correctly.

How does this all work?

The Execution Loop

The Model: The agent calls a managed LLM service in the cloud (Gemini) for high-level reasoning.
Discovery: The agent "sees" the available MCP tools and automatically understands how to use them to interact with GCS or BigQuery.
Governance: System Prompts provide the guardrails, core requirements, and engineering standards the agent must follow.
Action: The User Prompt provides the specific task (e.g., "Find today's files"). The agent then executes the work.
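
The execution loop above can be sketched without any SDK. Everything here is a stand-in: `fake_model` replaces the managed LLM call, `list_files` replaces a storage tool, and the `Tool` registry only imitates how tool metadata is exposed to the agent. None of these names come from MCP or from the repository.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str  # metadata the model reads to decide what to call
    run: Callable[[dict], object]

def list_files(args):
    # Stand-in for a storage listing tool; returns canned data
    files = ["us/2026-03-31/a.csv", "us/2026-03-30/b.csv"]
    return [f for f in files if args["date"] in f]

TOOLS = {"list_files": Tool("list_files", "List data lake files for a date", list_files)}

# Governance: guardrails that accompany every request
SYSTEM_PROMPT = "Only call registered tools. Never delete data."

def fake_model(system_prompt, user_prompt, tool_catalog):
    """Stand-in for the managed LLM; returns a tool request as a dict."""
    return {"tool": "list_files", "args": {"date": "2026-03-31"}}

def run_agent(user_prompt, model=fake_model):
    # Discovery: the model sees tool names and descriptions only
    catalog = {name: tool.description for name, tool in TOOLS.items()}
    request = model(SYSTEM_PROMPT, user_prompt, catalog)
    tool = TOOLS.get(request["tool"])
    if tool is None:
        raise ValueError(f"Unknown tool: {request['tool']}")
    # Action: execute the requested tool with the model's arguments
    return tool.run(request["args"])

found = run_agent("Find today's files")
```

Rejecting unknown tool names in `run_agent` is one simple way to keep the model inside the registered tool set.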

Intelligent Orchestration

We build an AI-powered Data Engineering process that successfully handles:

Data Lake Discovery: Automatically identifying patterns and namespaces in GCS.
Data Warehouse Orchestration: Mapping those discoveries directly into BigQuery and creating the data models for analysis.

AI-Driven Data Engineering

Agents can connect to a data lake and run discovery on its files.
Agents can use the discovery results to build external tables, views, tables, and even stored procedures for the incremental update process.
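
To make the discovery-to-DDL step concrete, here is a rough sketch, assuming discovery amounts to grouping object paths by folder prefix. The bucket `example-lake`, the table name, and both helper functions are hypothetical; the agents in the repository may work differently.

```python
from collections import defaultdict
import posixpath

def discover_namespaces(object_paths):
    """Group object paths by their folder prefix (the 'namespace')."""
    namespaces = defaultdict(list)
    for path in object_paths:
        namespaces[posixpath.dirname(path)].append(posixpath.basename(path))
    return dict(namespaces)

def external_table_ddl(table, bucket, namespace, extension):
    """Build an external table statement over every file in a namespace."""
    uri = f"gs://{bucket}/{namespace}/*{extension}"
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (format = 'CSV', uris = ['{uri}'], skip_leading_rows = 1);"
    )

# Hypothetical object listing (assumption, for illustration only)
paths = ["turnstile/2026/03/a.csv", "turnstile/2026/03/b.csv", "stations/c.csv"]
namespaces = discover_namespaces(paths)
ddl = external_table_ddl("staging.ext_turnstile", "example-lake", "turnstile/2026/03", ".csv")
```

The generated statement is only text; a human or a governed agent still decides whether to run it.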

2/25/26

AI Driven App Architecture - Smart Development Life Cycle Governance

Oscar Garcia @ozkary 2/25/2026 AI , ai-agents , ai-governance , typescript No comments

Overview

As development teams scale, maintaining architectural consistency becomes the biggest bottleneck. Documents are ignored, and linters only catch syntax errors, not design patterns.

In this session, we will demonstrate how to transform AI from a passive coding assistant into an active Architectural Enforcer. By embedding your "unwritten rules" directly into the repository configuration, you create a developer experience where the AI enforces your patterns in real-time.

We will explore how this shifts the workflow: new developers are guided by the AI from day one, preventing architectural leakage before a pull request is ever opened.

YouTube Video

👍 Subscribe to the channel to get notified of new events!

Video Agenda

The Problem: Architectural Drift

Why strict rules (Controller-View, Pascal/camelCase) degrade over time and how AI can fix it.

The Intelligence Engine

Breakdown of the core components: Global Rules, Contextual Guardrails, Agent Tools, and Directory Structure.

Configuration: Global Governance

Setting up global "system prompts" for the repository to enforce tech stack and naming conventions.

Configuration: Contextual Guardrails

Creating "firewalls" for specific folders (e.g., preventing logic in views, preventing API calls in Controllers).

Configuration: The Tooling

Building custom Slash Commands (/new-module) to automate "Vertical Slice" scaffolding.

Configuration: The Auditor Agent

Implementing a specialized "Gatekeeper" persona that scans imports to ensure strict layer separation.

Agent Mapping

A conceptual framework comparing repository configuration to autonomous agent architecture.

💡 Why Attend?

Stop writing boilerplate: Learn to automate complex folder structures with one command.
Reduce PR Reviews: Shift governance "left" by having the AI catch architectural errors instantly.
Interactive Demo: See the .github configuration in action on a real codebase.
Takeaway Code: Leave with the copy-paste markdown templates to implement this in your own repo tomorrow.

Target Audience

Tech Leads & Architects who need to enforce standards across scaling teams.
Developers who are tired of correcting the same patterns in code reviews.
DevOps Engineers interested in "Governance as Code."
Leadership teams that are trying to raise standards and productivity in their organizations.

Presentation

SETTING THE STAGE

The Context

We enforce a strict pattern using the ViCSA (View, Controller, Service, API) architecture:
PascalCase for UI Components.
camelCase for Logic & Services.
Separation of Concerns (SoC) is non-negotiable.

The Problem

Architectural Drift: Patterns degrade over time.
Passive Docs: Wiki pages are ignored.
Linter Limits: Linters catch syntax, not architecture.
Solution: Active Governance via AI.

THE INTELLIGENCE ENGINE

Core AI Policies

Centralized Config: Rules live in the repo, not the user's IDE.
Global Rules: Applied to every interaction (System Prompt).
Contextual Rules: Triggered only when specific files are opened.
Agent Tools: Custom commands to scaffold new components, controllers or services.

AI Driven App Architecture - Smart Development Life Cycle Governance - Project Structure

CONFIGURATION: GLOBAL GOVERNANCE

Global Instructions

File: .github/copilot-instructions.md

This acts as the System Prompt for the entire repository. It is silently added to every interaction.

Tech Stack: TS, Tailwind, Hooks.
Naming: PascalCase vs. camelCase.
Flow: View → Controller → Service → API.

AI Driven App Architecture - Smart Development Life Cycle Governance - Global Governance

DEV EXPERIENCE: THE SILENT ENFORCER

Without Config

A developer asks:

How do I create a new service?

AI suggests a generic Class-based service.
Suggests creating a utils.js file.
Ignores project folder structure.

With Config

A developer asks: "How do I create a new service?"

AI reads the Governance.
Response: Create src/services/userAuth/index.ts using a functional export, as per project standards.

CONFIGURATION: CONTEXTUAL GUARDRAILS

View Layer Rules

File: .github/instructions/view-layer.md

Trigger: Opening any **/*.tsx file.

"You are a View."
"No Logic allowed."
"No direct API calls."

Controller Layer Rules

File: .github/instructions/controller-layer.md

Trigger: Opening any **/controller.ts file.

"You are a Controller."
"Use Services, NOT Fetch."
"Manage State here."

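The trigger patterns above can be pictured as a lookup from file path to rule set. This is only an approximation for illustration: the `GUARDRAILS` table and `rules_for` function are hypothetical, and Python's `fnmatch` is used as a stand-in for whatever glob matching the editor applies (here `*` also matches `/`, so `**/` simply requires at least one folder).

```python
from fnmatch import fnmatchcase

# Pattern -> instructions, mirroring the two guardrail files described above
GUARDRAILS = [
    ("**/*.tsx", ["You are a View.", "No Logic allowed.", "No direct API calls."]),
    ("**/controller.ts", ["You are a Controller.", "Use Services, NOT Fetch.", "Manage State here."]),
]

def rules_for(path):
    """Collect the instructions whose pattern matches the opened file."""
    rules = []
    for pattern, instructions in GUARDRAILS:
        if fnmatchcase(path, pattern):
            rules.extend(instructions)
    return rules

view_rules = rules_for("src/components/SalesDashboard/index.tsx")
controller_rules = rules_for("src/components/SalesDashboard/controller.ts")
service_rules = rules_for("src/services/salesDashboard/index.ts")
```

A file that matches no pattern, such as the service above, gets only the global instructions.
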
DEV EXPERIENCE: REAL-TIME INTERVENTION

The Scenario

A developer tries to write fetch() inside a UI Component (index.tsx).
They ask Copilot: "Write a fetch call here for me."

The Intervention

Ghost Text: Copilot refuses to autocomplete the network call.

Chat Reply:

I cannot. This is a View file. Please move this logic to the sibling Controller (controller.ts) and import it.

CONFIGURATION: THE TOOLING

Prompt Library

File: .github/prompts/new-module.md

These act as Agent Tools or "Slash Commands".

Goal: Automate the "Vertical Slice".
Benefit: Complex scaffolding logic is stored in the repo, not in the developer's head.
Usage: /new-module

# Prompt Library (The Scaffolder)
File: `.github/prompts/new-component.md`
Goal: Automate the creation of a standalone UI Component with optional Service/API layers.

# Create New Component
I need to generate a new component following our **Folder-as-Namespace** pattern.
**Command:** `/new-component:{{componentName}} {{args}}`

Please generate the code blocks for the layers requested in the arguments (service, api). 
*Note: Logic folders must be camelCase. UI folders must be PascalCase.*

---

### Component Layer (Required)
**Folder:** `src/components/{{componentName (PascalCase)}}/`
- **File:** `controller.ts` (Controller): Logic and State only.
- **File:** `index.tsx` (View): Pure UI. Imports Controller.

---

### Service Layer (Optional)
*Condition: Generate only if 'service' is present in {{args}}.*

**File:** `src/services/{{componentName (camelCase)}}/index.ts`
- **Role:** Business logic and data transformation.
- **Code:** Import the API (if requested). Export a service object or functional exports.

---

### API Layer (Optional)
*Condition: Generate only if 'api' is present in {{args}}.*

**File:** `src/apis/{{componentName (camelCase)}}/index.ts`
- **Role:** Define specific endpoints.
- **Code:** Import `coreClient` from `src/apis/index.ts`. Export async functions with typed responses.

---

### Style Guidelines
- **Typing:** Use TypeScript interfaces for all Props and Data models.
- **Separation:** Logic stays in `controller.ts`, JSX stays in `index.tsx`.
- **Naming:** Components use PascalCase; Services/APIs use camelCase.

DEV EXPERIENCE: THE SCAFFOLDING

The Command

Starting a new feature called "Sales Dashboard".

Action:

/new-module featureName:Sales Dashboard

The Execution

Analyzes the request.
Applies PascalCase to Containers/Components folders.
Applies camelCase to api/service folders.
Generates the Controller-View pair instantly.
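
The naming step of the scaffold can be sketched as follows. The helper names are hypothetical; the folder layout follows the prompt file shown earlier (`src/components`, `src/services`, `src/apis`).

```python
import re

def words(name):
    """Split a feature name into alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", name)

def pascal_case(name):
    # Upper-case the first letter of each word and join them
    return "".join(w[:1].upper() + w[1:] for w in words(name))

def camel_case(name):
    pascal = pascal_case(name)
    return pascal[:1].lower() + pascal[1:]

def scaffold_paths(feature_name):
    """Return the files of one vertical slice with the naming rules applied."""
    ui, logic = pascal_case(feature_name), camel_case(feature_name)
    return [
        f"src/components/{ui}/index.tsx",      # View
        f"src/components/{ui}/controller.ts",  # Controller
        f"src/services/{logic}/index.ts",      # Service
        f"src/apis/{logic}/index.ts",          # API
    ]

paths = scaffold_paths("Sales Dashboard")
```

For "Sales Dashboard" this yields a `SalesDashboard` component folder and `salesDashboard` service and API folders.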

THE RESULT: GENERATED ARCHITECTURE

The Results

Layers generated instantly.
Correct naming conventions applied.
Zero manual boilerplate.

AI Driven App Architecture - Smart Development Life Cycle Governance - Project Structure

CONFIGURATION: THE AUDITOR AGENT

Specialized Persona

File: .github/agents/arch-auditor.md

This creates a named Agent that acts as a Gatekeeper. It doesn't write features; it verifies them.

Role: Architecture Enforcer.
Task: Scans imports to ensure strict layer separation.
Rule: "Views never talk to APIs."

# Custom AI Agent (The Reviewer)
Agent ID: `@vicsa-auditor`

Context: A bot that ensures the chain of command is respected using the ViCSA architecture (View Controller Service API)

## Primary Objective
name: Architecture Auditor
description: Verifies strict separation of Controller, Service, and View layers.
tools: [code-search]

---
## Role
You ensure the integrity of the data flow: View -> Controller -> Service -> API.

## Audit Logic
When asked to "Audit this feature":

1. **Check the View (.tsx):**
   - FAIL if it imports `src/services`.
   - FAIL if it imports `src/apis`.
   - PASS only if it imports `./controller`.

2. **Check the Controller (.ts):**
   - FAIL if it uses `fetch` or `axios`.
   - PASS only if it delegates to `src/services`.

3. **Check the Service:**
   - FAIL if it defines its own URL logic.
   - PASS only if it imports `src/apis/index.ts`.
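
The View check in the audit logic can also be expressed deterministically. The sketch below is an assumption about how such a check might look, not the agent's actual behavior: `audit_view` and its regular expression are hypothetical, and the pattern only handles single-line `import` statements.

```python
import re

# Captures the module path of single-line ES import statements
IMPORT_RE = re.compile(
    r"""^\s*import\s+(?:.+?\s+from\s+)?['"]([^'"]+)['"]""", re.MULTILINE
)

def audit_view(source):
    """Return the layer violations found in a View (.tsx) file's source text."""
    violations = []
    for module in IMPORT_RE.findall(source):
        segments = module.split("/")
        if "services" in segments:
            violations.append(f"FAIL: View imports a service ({module})")
        if "apis" in segments:
            violations.append(f"FAIL: View imports an API ({module})")
    return violations

good_view = 'import React from "react";\nimport { useSales } from "./controller";\n'
bad_view = 'import { fetchSales } from "../../apis/salesDashboard";\n'
good_result = audit_view(good_view)
bad_result = audit_view(bad_view)
```

A scripted check like this could run in CI alongside the conversational auditor, so the rule "Views never talk to APIs" does not depend on someone remembering to ask.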

DEV EXPERIENCE: THE CODE REVIEW

The Interaction

Before raising a pull request, the developer invokes the auditor.

Prompt:

@vicsa-auditor check this component for violations.

Response:

✅ PASS: SalesDashboard/index.tsx imports only from its sibling controller. No direct API calls found.

AI Driven App Architecture - Smart Development Life Cycle Governance - Review Process

THE AUTONOMY ADVANTAGE

AI enforces the ViCSA architecture through continuous observation and autonomous execution.

Perception: Continuously observes the active workspace, file paths (e.g., src/components/), and context to understand the developer's structural intent.
Reasoning: Evaluates the perceived context against the repository's .github guardrails, determining whether a View is bypassing a Controller or violating Separation of Concerns (SoC).
Action: Executes autonomous scaffolding, enforces strict ViCSA governance, and provides feedback with recommended fixes.

SUMMARY & AGENT MAPPING

Embedding governance directly into the repository transforms the development lifecycle. It replaces passive wiki pages with active, real-time enforcement, ensuring that every AI suggestion aligns with architectural standards. This eliminates "drift", accelerates onboarding, and turns Copilot into a domain-expert partner.

Agent Component	GitHub Implementation
System Prompt	Global Instructions (copilot-instructions.md)
Context / RAG	Modular Instructions (instructions/*.md)
Tools / Functions	Prompt Library (prompts/*.md)
Human Prompt	Chat Window
Persona	Agent Personas (e.g., agents/arch-auditor.md)

RAG: Retrieval augmented generation

1/21/26

The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment

Oscar Garcia @ozkary 1/21/2026 AI , cloud-engineering , data , data lake , data warehouse No comments

Overview

In the modern data landscape, the wall between "where data lives" and "how we get insights" is crumbling. This session focuses on the Cognitive Data Lakehouse, a paradigm shift that allows developers to treat a fragmented data lake as a unified, high-performance warehouse.

We will explore how to move beyond brittle ETL pipelines using Zero-ETL architecture in the cloud. The core of our discussion will center on using integrated AI capabilities and semantic modeling to solve the "Metadata Mess" inherent in global manufacturing feeds without moving a single byte of data. From raw telemetry in object storage to semantic intelligence via large language models, we’ll show you the real-world application of AI in modern data engineering.

YouTube Video

Video Agenda

Phase 1: Foundations & The Zero-ETL Strategy

We kick off with the infrastructure layer. We'll discuss the design of cross-region telemetry tables and how modern cloud engines allow us to query raw files in object storage with the performance of a native table. We’ll establish why "0x data movement" is the goal for modern scalability.

Phase 2: Confronting the Metadata Mess

Schema drift and inconsistent naming across global regions are the enemies of unified analytics. We will look at why traditional manual mapping fails and how we can use AI inference to bridge these gaps and standardize naming conventions automatically.

Phase 3: AI-Driven Unification & Semantic Modeling

The "Cognitive" part of the Lakehouse. We’ll dive into the technical implementation of registering AI models directly within your data warehouse environment. You'll see how to create an abstraction layer that uses AI to normalize data on the fly, creating a robust semantic model.

Phase 4: Scaling to a Global Feed

Finally, we’ll demonstrate the DevOps workflow for integrating a new international factory feed into a global telemetry view. We'll show how to maintain a "Single Source of Intelligence" that BI tools and analysts can consume without needing to know the complexities of the underlying lake.

💡 Why Attend?

Master Modern Architecture: Learn the "Abstraction Layer" design pattern that is replacing traditional, slow ETL/ELT processes.
Hands-on AI for Data Ops: See exactly how to use AI and semantic modeling within SQL-based workflows to automate data cleaning and schema mapping.
Scale Without Pain: Discover how to manage global data sources (multi-region, multi-format) through a single governing layer.
Developer Networking: Connect with other data architects, engineering leaders, and professionals solving similar scale and complexity challenges.

Target Audience: Data Engineers, Analytics Architects, Cloud Developers, and anyone interested in the intersection of Big Data and Generative AI.

Presentation

Phase 1: The Zero-ETL Strategy

INFRASTRUCTURE: DATA STAYS LOCAL

Architecting for Scale

Storage Decoupling: Raw files remain in the Data Lake, eliminating replication overhead.
Virtual Access: Data Warehouse external tables allow immediate querying of CSV, Parquet, and JSON.
Minimal Latency: No waiting for ingest pipelines; analysis starts upon file arrival.

The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment - Medallion Architecture Design Diagram

UNMATCHED STORAGE EFFICIENCY

Zero Data Replication

Traditional ETL requires moving data across multiple tiers. Our architecture ensures a single source of truth with zero data movement between GCS and BigQuery compute.
This is similar to the Bronze Zone in a Medallion Architecture.

Phase 2: The Metadata Mess

CHALLENGES OF UNIFICATION

Schema Friction

Feeds arrive with inconsistent headers (e.g., 'Device Number' vs 'deviceNo'). Manual aliasing is fragile and slow.

Entity Drift

Names and IDs vary across systems, preventing standard joins from matching records effectively.

Type Mismatches

Varying data types for the same concept (Integer vs String) crash standard SQL aggregation views.
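
For contrast, this is roughly what the manual approach looks like. The sketch assumes a snake_case target convention and a hand-maintained alias table; `ALIASES` and both functions are hypothetical. Every new header variant needs another alias entry, which is the fragility described above.

```python
import re

# Hand-maintained aliases (assumption): each new variant needs a new entry
ALIASES = {"device_no": "device_number"}

def to_snake_case(header):
    """Convert 'Device Number' or 'deviceNo' style headers to snake_case."""
    # Insert an underscore at lower-to-upper boundaries (camelCase)
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", header.strip())
    # Collapse spaces and punctuation into single underscores
    return re.sub(r"[^A-Za-z0-9]+", "_", spaced).strip("_").lower()

def normalize_header(header):
    name = to_snake_case(header)
    return ALIASES.get(name, name)

unified = [normalize_header(h) for h in ["Device Number", "deviceNo", "Bay ID"]]
```

Both 'Device Number' and 'deviceNo' resolve to `device_number` here only because the alias was added by hand.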

Phase 3: The AI Solution

BIGQUERY STUDIO: THE AI INTERFACE

Remote AI Registration

CREATE MODEL `gemini_remote`
REMOTE WITH CONNECTION `bq_connection`
OPTIONS(endpoint = 'gemini-1.5-pro');

Automated Inference

AI "reads" information schemas to infer mapping logic, moving you from Code Author to Logic Approver.

SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `gemini_remote`,
  (SELECT "Compare Source A and B schemas. Write a SQL view to unify them." AS prompt)
);

AI-ASSISTED SCHEMA DISCOVERY

Prompting for Base Tables

Using AI to generate the DDL for external tables by pointing to compressed feeds in the lake (USA & MEX factories).

SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `gemini_remote`,
  (SELECT "Create External Tables as smart_factory.us_telemetry with path 'gs://factory-dl/us/dev-540/telemetry-*.csv.gz'. Include option CSV, GZIP compression and skip 1 row. Infer and add the schema using lower case" AS prompt));

SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `gemini_remote`,
  (SELECT "Create External Tables as smart_factory.mx_telemetry with path 'gs://factory-dl/mx/dev-940/telemetry-*.csv.gz'. Include option CSV, GZIP compression and skip 1 row. Use schema device_number STRING, bay_id INT64, factory STRING, created STRING" AS prompt));

Generated BigLake DDL

-- USA Factory Feed
CREATE OR REPLACE EXTERNAL TABLE `smart_factory.us_telemetry` (
  device_number STRING,
  bay_id INT64,
  factory STRING,
  created STRING
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://factory-dl/us/dev-540/telemetry*.csv.gz'],
  skip_leading_rows = 1,
  compression = 'GZIP'
);

-- MEX Factory Feed
CREATE OR REPLACE EXTERNAL TABLE `smart_factory.mx_telemetry` (
  device_number STRING,
  bay_id INT64,
  factory STRING,
  created STRING
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://factory-dl/mx/dev-940/telemetry*.csv.gz'],
  skip_leading_rows = 1,
  compression = 'GZIP'
);

AI-ABSTRACTION: THE VIEW LAYER

Generating the Interface

AI creates a clean abstraction view for each external table, decoupling raw storage from the analytics model.

-- AI Instruction
"Create a view named 
smart_factory.vw_us_telemetry 
selecting all columns from the
us_telemetry table. Safe cast the created column as datetime."

Abstraction Layer DDL

-- Semantic Abstraction Layer
CREATE OR REPLACE VIEW `smart_factory.vw_us_telemetry` AS
SELECT 
  device_number,
  bay_id,
  factory,
  SAFE_CAST(created as DATETIME) AS created
FROM `smart_factory.us_telemetry`;

COGNITIVE UNIFICATION

The Multi-Region Model

The unified view now consumes from the abstraction layer, ensuring that changes to raw storage don't break the downstream views.

-- AI Instruction
"Create a view with name
smart_factory.vw_telemetry that creates a union of all the fields from the views vw_[region]_telemetry. The regions include us and mx. List out all the field names. Never use * for field names"

Unified Global View

-- Semantic Abstraction Layer
CREATE OR REPLACE VIEW `smart_factory.vw_telemetry` AS
SELECT 
  device_number,
  bay_id,
  factory,
  created
FROM `smart_factory.vw_us_telemetry`
UNION ALL
SELECT 
  device_number,
  bay_id,
  factory,
  created
FROM `smart_factory.vw_mx_telemetry`;

SCALING TO CHINA FACTORY

Evolving the Model

Adding the new China feed by generating the External Table definition via AI.

CREATE OR REPLACE EXTERNAL TABLE `smart_factory.cn_telemetry` (
  device_number STRING,
  bay_id INT64,
  factory STRING,
  created STRING
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://factory-dl/cn/dev-900/telemetry*.csv.gz'],
  skip_leading_rows = 1,
  compression = 'GZIP'
);

Human-in-the-Loop DevOps

Use AI to update the unified view with the new data feed. The DevOps team then reviews and applies the change, since modifications to a production view require approval.
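
To show what the reviewed change amounts to, the sketch below regenerates the unified view from a list of regions. It assumes each region already has a `vw_[region]_telemetry` view with the same four columns (so a `vw_cn_telemetry` view is assumed to exist); the function name is hypothetical.

```python
COLUMNS = ("device_number", "bay_id", "factory", "created")

def unified_view_sql(regions, dataset="smart_factory", columns=COLUMNS):
    """Build the unified view as a UNION ALL over the regional views."""
    select_list = ",\n  ".join(columns)  # explicit field names, never *
    selects = [
        f"SELECT\n  {select_list}\nFROM `{dataset}.vw_{region}_telemetry`"
        for region in regions
    ]
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE OR REPLACE VIEW `{dataset}.vw_telemetry` AS\n{body};"

sql = unified_view_sql(["us", "mx", "cn"])
```

Because the output is plain SQL text, it can go through the same pull request and approval flow as any hand-written change.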

Manufacturing SPC & Root Cause Analysis

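This query calculates a rolling mean and standard deviation for each machine over a sliding window of 21 readings (the current row plus the 20 preceding), limited to the last hour of telemetry, to flag “Out of Control” conditions: readings more than 3σ from the rolling mean.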

WITH TelemetryStats AS (
  SELECT
    machine_id,
    timestamp,
    sensor_reading,
    -- Calculate rolling stats for the "Control Chart"
    AVG(sensor_reading) OVER(PARTITION BY machine_id ORDER BY timestamp ROWS BETWEEN 20 PRECEDING AND CURRENT ROW) as rolling_avg,
    STDDEV(sensor_reading) OVER(PARTITION BY machine_id ORDER BY timestamp ROWS BETWEEN 20 PRECEDING AND CURRENT ROW) as rolling_stddev
  FROM `production_data.mx_telemetry_stream`
  WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
),
Anomalies AS (
  SELECT *,
    -- Define "Out of Control" (Reading > 3 Sigma from mean)
    ABS(sensor_reading - rolling_avg) > (3 * rolling_stddev) AS is_out_of_control
  FROM TelemetryStats
)
SELECT * FROM Anomalies WHERE is_out_of_control = TRUE;

Control Chart Visualization

The Cognitive Data Lakehouse: AI-Driven Unification and Semantic Modeling in a Zero-ETL Environment - Control Charts

ADVANTAGE COMPARISON MATRIX

Metric	Manual Data Engineering	AI-Augmented Zero-ETL
Unification Speed	Days/Weeks per Source	Minutes via Generative AI
Schema Drift	Manual Script Rewrites	Adaptive AI View Discovery
Infrastructure Cost	High (Data Redundancy)	Minimal (In-place on GCS)

Strategic Intelligence ROI:

ROI(ai) = Insights Velocity / (Movement Cost + Labor Hours)

FINAL THOUGHTS: STRATEGIC SUMMARY

Legacy Challenges

Brittle ETL: Manual pipelines break with every schema change.
Cost Inefficiency: Redundant storage for processed data.
Semantic Silos: Hard-coded aliases for disparate naming conventions.
Slow Time-to-Insight: Weeks spent on manual schema alignment.

AI-Assisted Solutions

Zero-ETL Arch: Cost-effective storage with Data Lake virtual access.
Automated Inference: Vertex AI handles the "heavy lifting" of mapping.
Adaptive DevOps: Scalable model evolution (USA → MEX → China).
Unified Intelligence: One virtual source of truth for global analytics.

Moving from Data Reporting to Active Semantic Intelligence.

We've covered a lot today, but this is just the beginning!

If you're interested in learning more about building cloud data pipelines, I encourage you to check out my book, 'Data Engineering Process Fundamentals,' part of the Data Engineering Process Fundamentals series. It provides in-depth explanations, code samples, and practical exercises to help in your learning.

📅 Upcoming Sessions

Our upcoming series expands beyond data engineering to bridge the gap between AI, Machine Learning, and modern cloud architecture. Using our Data, AI, and ML GitHub blueprints, we provide the code-first patterns needed to build everything from Zero-ETL pipelines to scalable LLM-powered systems. Join us to explore how these integrated disciplines work together to turn raw data into production-ready intelligence.

I am Oscar Garcia, Ozkary™. I author this site, speak at conferences and events, contribute to OSS, and mentor people. I use this blog to post ideas and experiences about software development, with the goal to both learn from and help the technology communities around the world.
