Adal Adal CLI
NVIDIA

Architecture & Tech Stack

How 5 AI agents, 2 NVIDIA models, and real PostgreSQL turn spreadsheets into answers

4,060 Lines of TypeScript
31 TypeScript Files
5 AI Agents
4 API Endpoints
2 Database Tables
3 React Components
16 Dependencies

Tech Stack

Frontend
Next.js 15 React 19 TypeScript Tailwind CSS 4 Framer Motion React Flow
AI & Data
NVIDIA NIM API OpenAI SDK Plotly.js SheetJS PapaParse csv-parse
Backend & Infra
Neon PostgreSQL Drizzle ORM Clerk Auth Vercel SSE Streaming

Built with Adal CLI

MergeAI was scaffolded and developed using Adal CLI — a developer productivity tool for rapid AI application development. Adal CLI provided the project structure, deployment pipeline, and development workflow that made it possible to build a 5-agent, cross-file analytics platform in a single hackathon weekend.

Request Flow — From Question to Answer

Every question flows through a self-correcting pipeline. If the Validator catches an error, the entire pipeline retries — up to 3 rounds — automatically.

User Question
Schema Agent
Nano 8B
SQL Agent
Ultra 253B
Execute on Neon
Validator Agent
Deterministic
Summary Agent
Nano 8B
Chart Agent
Nano 8B
Results + Chart
Self-Correcting Loop: If the Validator detects errors (bad SQL, high null %, zero rows), it sends feedback to the Schema + SQL agents and the pipeline retries. Up to 3 rounds — no human intervention needed.

5 AI Agents — Orchestrated Pipeline

Each agent is a specialist. The Orchestrator coordinates them in sequence, streaming real-time progress events to the frontend via SSE (Server-Sent Events).

🔍
Schema Agent
Nemotron Nano 8B
Analyzes file schemas, discovers join keys between files, identifies metrics and dimensions. Decides if query needs 1 file or cross-file join.
🔨
SQL Agent
Nemotron Ultra 253B
Generates production-grade SQL with CTEs, JSONB extraction, type casting, and cross-file JOINs. Handles follow-up context for conversational queries.
Validator Agent
Deterministic
Checks SQL results for errors: zero rows, high null percentages, SQL exceptions. Sends fix suggestions back to Schema + SQL agents for retry.
💬
Summary Agent
Nemotron Nano 8B
Converts query results into a natural language summary. "Engineering department has the highest avg salary at $142K, followed by Sales at $118K."
📊
Chart Agent
Nano 8B + Deterministic
Selects optimal chart type using keyword detection (deterministic) with LLM fallback. Generates Plotly.js configs for bar, line, pie, scatter, and heatmap.

NVIDIA Models — NIM API

All inference runs through NVIDIA NIM API using the OpenAI-compatible SDK. Zero self-hosted GPU — pure cloud inference with enterprise reliability.

NVIDIA Nemotron Ultra
253B
SQL Agent — Most capable model for production-grade SQL generation. Uses "detailed thinking off" mode for clean output without <think> tags. Temperature 0 for deterministic results.
NVIDIA Nemotron Nano
8B
Schema Agent, Summary Agent, Chart Agent — Fast, structured output for schema analysis, natural language summaries, and chart type selection. 3 agents share this model for low-latency responses.

Cross-File Joins — "The Merge in MergeAI"

Upload 2+ spreadsheets with different columns. Ask a question that spans both. MergeAI automatically discovers the join key and builds a cross-file SQL query.

Example: Upload employees.csv (names, departments, salaries) + training_data.csv (departments, training costs, outcomes).
Ask: "Compare average salary vs training cost by department" — MergeAI joins on Department automatically.
Schema Agent: Join Discovery
AI-Powered (Nano 8B)

Compares column names, types, and sample values across files. Detects exact matches (Employee_ID = Employee_ID), fuzzy matches (Dept = Department), and case-insensitive matches. Returns confidence score and match type.

Visual Schema Map: Join Preview
Client-Side (React Flow)

Interactive diagram showing files as draggable nodes with column lists. Detected join keys shown as color-coded edges. Uses client-side name matching for instant preview. Columns categorized: metrics, dimensions, dates, identifiers.

File A: employees.csv
Department ↔ Dept
fuzzy match (82%)
File B: training_data.csv

JSONB Storage — Schema-Less Architecture

Unlike traditional approaches that create a new SQL table per upload, MergeAI stores ALL data in a single JSONB column. Any CSV/Excel structure is supported — zero schema migrations.

uploaded_files
id PK UUID
user_id VARCHAR
file_name VARCHAR
columns JSONB string[]
column_types JSONB {col: type}
sample_values JSONB {col: vals[]}
row_count INTEGER
is_demo BOOLEAN
created_at TIMESTAMP
1
N CASCADE
uploaded_rows
id PK BIGSERIAL
file_id FK UUID
user_id VARCHAR
row_data JSONB {col: value}
Indexes
idx_rows_file (file_id)
idx_rows_user (user_id)
How SQL Queries Work on JSONB:
SELECT row_data->>'Department' AS dept, AVG((row_data->>'Salary')::numeric) AS avg_salary
FROM uploaded_rows WHERE file_id = '...' GROUP BY 1


The SQL Agent generates JSONB extraction (row_data->>'column') with type casts (::numeric) for aggregations. All queries are read-only — write operations are blocked.

SQL Safety Layer — Defense in Depth

AI generates SQL. That's dangerous without guardrails. MergeAI validates every query through a 6-layer safety system before execution.

Read-Only Allowlist Only SELECT and WITH (CTEs) are allowed. Everything else is rejected.
Keyword Blocklist INSERT, UPDATE, DELETE, DROP, ALTER, TRUNCATE, CREATE, GRANT, REVOKE, COPY, EXECUTE — blocked.
Anti-Injection Semicolons in mid-query blocked. No multi-statement injection. Trailing semicolons stripped.
Row Limit Cap Auto-appends LIMIT 200 to any query missing a LIMIT clause. Prevents full-table scans.
10s Query Timeout AbortController kills any query over 10 seconds. Neon HTTP driver is stateless.
Upload Limits 10MB max file size, 10,000 max rows, 50 max columns. Enforced client-side and server-side.

Real-Time Agent Streaming — SSE Protocol

Users see every agent thinking live. Each step animates in the UI as it happens — not after.

Server: SSE Event Stream

The /api/query endpoint returns a ReadableStream with text/event-stream content type. Each agent event is sent as data: {JSON}\n\n — standard SSE format.

Event types:
• agent_start — Agent begins work
• agent_progress — Intermediate status
• agent_complete — Agent finished
• round_retry — Validator requested retry
• query_complete — Final results
• query_error — Pipeline failed
Client: useAgentStream Hook

Custom React hook (use-agent-stream.ts, 156 lines) consumes SSE events via fetch() with a ReadableStream reader. Parses each event and updates React state. Agent cards animate between states.

UI states per agent:
• Waiting (gray, pulsing dot)
• Active (blue, spinning ring)
• Done (green check, status text)
• Error (red, error message)

Smart Features

Interactive Plotly Charts

Chart Agent picks the right chart type. Supports bar, line, pie, scatter, and heatmap. Rendered with Plotly.js — hover tooltips, zoom, pan, PNG download. Dark theme.

AI-Suggested Queries

Suggestion engine analyzes columns by category (metrics, dimensions, dates) using actual data types. Generates 3-5 smart questions. Cross-file suggestions prioritized.

Follow-Up Context

Chat threads maintain conversation history. "Now break that down by gender" — SQL Agent modifies the previous query instead of starting from scratch.

Data Preview Modal

Click any file to see first 20 rows. Color-coded column headers by type: numbers (blue), dates (purple), text (gray). Sticky headers, horizontal scroll.

Excel & CSV Upload

Upload .xlsx, .xls, or .csv files. Excel parsed client-side with SheetJS (dynamic import for bundle optimization). Auto type inference from first 20 rows.

Export Results (Excel & CSV)

Download query results as Excel or CSV. CSV uses pure string manipulation. Excel uses SheetJS. Both run client-side — zero server involvement.

API Endpoints (5 Routes)

POST /api/query Run agent pipeline — returns SSE stream with live agent events + results
POST /api/upload Upload file — receives JSON (columns, types, rows), stores in Neon JSONB
GET /api/files List user's files — returns file metadata (columns, types, sample values)
GET /api/files/[id]/preview Preview file data — returns first 20 rows with ownership verification
DELETE /api/files/[id] Delete uploaded file — removes file record and all rows, ownership verified

Project Structure (31 TypeScript Files)

src/ ├── app/ │ ├── api/ │ │ ├── query/ │ │ │ └── route.ts 89 lines — SSE agent pipeline │ │ ├── upload/ │ │ │ └── route.ts 91 lines — JSONB file storage │ │ └── files/ │ │ ├── route.ts 43 lines — file listing │ │ └── [id]/preview/ │ │ └── route.ts 62 lines — data preview │ ├── dashboard/ │ │ └── page.tsx 1,119 lines — main app UI │ ├── page.tsx 333 lines — landing page │ ├── layout.tsx root layout + Clerk provider │ ├── loading.tsx loading state │ ├── globals.css Tailwind + animations │ ├── settings/ │ │ └── page.tsx settings page │ ├── sign-in/ │ │ └── page.tsx Clerk auth │ └── sign-up/ │ └── page.tsx Clerk auth ├── lib/ │ ├── agents/ │ │ ├── orchestrator.ts 203 lines — agent coordination + retry loop │ │ ├── schema-agent.ts 75 lines — join key discovery │ │ ├── sql-agent.ts 74 lines — SQL generation + follow-up │ │ ├── validator-agent.ts 62 lines — result validation + fix │ │ ├── chart-agent.ts 225 lines — chart type selection │ │ └── summary-agent.ts 22 lines — NL summary generation │ ├── db.ts 96 lines — Neon connection + SQL safety │ ├── nvidia.ts 54 lines — NVIDIA NIM API via OpenAI SDK │ ├── types.ts 90 lines — TypeScript interfaces │ ├── join-detector.ts 264 lines — client-side join detection │ ├── suggestion-engine.ts 239 lines — AI-suggested queries │ └── session.ts 16 lines — per-browser UUID ├── components/ │ ├── SchemaMap.tsx 256 lines — React Flow diagram │ ├── PlotlyChart.tsx 153 lines — Plotly.js rendering │ └── DataPreview.tsx 129 lines — file preview modal ├── hooks/ │ └── use-agent-stream.ts 156 lines — SSE consumer + state └── drizzle/ └── schema.ts 41 lines — Drizzle ORM definitions