MergeAI — Architecture & Tech Stack

Tech Stack

Frontend

Next.js 15 React 19 TypeScript Tailwind CSS 4 Framer Motion React Flow

AI & Data

NVIDIA NIM API OpenAI SDK Plotly.js SheetJS PapaParse csv-parse

Backend & Infra

Neon PostgreSQL Drizzle ORM Clerk Auth Vercel SSE Streaming

Built with Adal CLI

      MergeAI was scaffolded and developed using Adal CLI — a developer productivity tool for rapid AI application development. Adal CLI provided the project structure, deployment pipeline, and development workflow that made it possible to build a 5-agent, cross-file analytics platform in a single hackathon weekend.
    

Request Flow — From Question to Answer

Every question flows through a self-correcting pipeline. If the Validator catches an error, the entire pipeline retries — up to 3 rounds — automatically.

User Question

→

Schema Agent
Nano 8B

→

SQL Agent
Ultra 253B

→

Execute on Neon

→

Validator Agent
Deterministic

→

Summary Agent
Nano 8B

→

Chart Agent
Nano 8B

→

Results + Chart

      Self-Correcting Loop: If the Validator detects errors (bad SQL, high null %, zero rows), it sends feedback to the Schema + SQL agents and the pipeline retries. Up to 3 rounds — no human intervention needed.
    

5 AI Agents — Orchestrated Pipeline

Each agent is a specialist. The Orchestrator coordinates them in sequence, streaming real-time progress events to the frontend via SSE (Server-Sent Events).

🔍

Schema Agent

Nemotron Nano 8B

Analyzes file schemas, discovers join keys between files, identifies metrics and dimensions. Decides if query needs 1 file or cross-file join.

🔨

SQL Agent

Nemotron Ultra 253B

Generates production-grade SQL with CTEs, JSONB extraction, type casting, and cross-file JOINs. Handles follow-up context for conversational queries.

✓

Validator Agent

Deterministic

Checks SQL results for errors: zero rows, high null percentages, SQL exceptions. Sends fix suggestions back to Schema + SQL agents for retry.

💬

Summary Agent

Nemotron Nano 8B

Converts query results into a natural language summary. "Engineering department has the highest avg salary at $142K, followed by Sales at $118K."

📊

Chart Agent

Nano 8B + Deterministic

Selects optimal chart type using keyword detection (deterministic) with LLM fallback. Generates Plotly.js configs for bar, line, pie, scatter, and heatmap.

NVIDIA Models — NIM API

All inference runs through NVIDIA NIM API using the OpenAI-compatible SDK. Zero self-hosted GPU — pure cloud inference with enterprise reliability.

NVIDIA Nemotron Ultra

253B

SQL Agent — Most capable model for production-grade SQL generation. Uses "detailed thinking off" mode for clean output without <think> tags. Temperature 0 for deterministic results.

NVIDIA Nemotron Nano

8B

Schema Agent, Summary Agent, Chart Agent — Fast, structured output for schema analysis, natural language summaries, and chart type selection. 3 agents share this model for low-latency responses.

Cross-File Joins — "The Merge in MergeAI"

      Upload 2+ spreadsheets with different columns. Ask a question that spans both. MergeAI automatically discovers the join key and builds a cross-file SQL query.

      Example: Upload employees.csv (names, departments, salaries) + training_data.csv (departments, training costs, outcomes).

      Ask: "Compare average salary vs training cost by department" — MergeAI joins on Department automatically.

Schema Agent: Join Discovery

AI-Powered (Nano 8B)

Compares column names, types, and sample values across files. Detects exact matches (Employee_ID = Employee_ID), fuzzy matches (Dept = Department), and case-insensitive matches. Returns confidence score and match type.

Visual Schema Map: Join Preview

Client-Side (React Flow)

Interactive diagram showing files as draggable nodes with column lists. Detected join keys shown as color-coded edges. Uses client-side name matching for instant preview. Columns categorized: metrics, dimensions, dates, identifiers.

File A: employees.csv

↔

Department ↔ Dept
fuzzy match (82%)

↔

File B: training_data.csv

JSONB Storage — Schema-Less Architecture

Unlike traditional approaches that create a new SQL table per upload, MergeAI stores ALL data in a single JSONB column. Any CSV/Excel structure is supported — zero schema migrations.

uploaded_files

id PK UUID

user_id VARCHAR

file_name VARCHAR

columns JSONB string[]

column_types JSONB {col: type}

sample_values JSONB {col: vals[]}

row_count INTEGER

is_demo BOOLEAN

created_at TIMESTAMP

1

N CASCADE

uploaded_rows

id PK BIGSERIAL

file_id FK UUID

user_id VARCHAR

row_data JSONB {col: value}

Indexes

idx_rows_file (file_id)

idx_rows_user (user_id)

      How SQL Queries Work on JSONB:

      SELECT row_data->>'Department' AS dept, AVG((row_data->>'Salary')::numeric) AS avg_salary
FROM uploaded_rows WHERE file_id = '...' GROUP BY 1

      The SQL Agent generates JSONB extraction (row_data->>'column') with type casts (::numeric) for aggregations. All queries are read-only — write operations are blocked.

SQL Safety Layer — Defense in Depth

AI generates SQL. That's dangerous without guardrails. MergeAI validates every query through a 6-layer safety system before execution.

Read-Only Allowlist Only SELECT and WITH (CTEs) are allowed. Everything else is rejected.

Keyword Blocklist INSERT, UPDATE, DELETE, DROP, ALTER, TRUNCATE, CREATE, GRANT, REVOKE, COPY, EXECUTE — blocked.

Anti-Injection Semicolons in mid-query blocked. No multi-statement injection. Trailing semicolons stripped.

Row Limit Cap Auto-appends LIMIT 200 to any query missing a LIMIT clause. Prevents full-table scans.

10s Query Timeout AbortController kills any query over 10 seconds. Neon HTTP driver is stateless.

Upload Limits 10MB max file size, 10,000 max rows, 50 max columns. Enforced client-side and server-side.

Real-Time Agent Streaming — SSE Protocol

Users see every agent thinking live. Each step animates in the UI as it happens — not after.

Server: SSE Event Stream

The /api/query endpoint returns a ReadableStream with text/event-stream content type. Each agent event is sent as data: {JSON}\n\n — standard SSE format.

          Event types:

          • agent_start — Agent begins work

          • agent_progress — Intermediate status

          • agent_complete — Agent finished

          • round_retry — Validator requested retry

          • query_complete — Final results

          • query_error — Pipeline failed

Client: useAgentStream Hook

Custom React hook (use-agent-stream.ts, 156 lines) consumes SSE events via fetch() with a ReadableStream reader. Parses each event and updates React state. Agent cards animate between states.

          UI states per agent:

          • Waiting (gray, pulsing dot)

          • Active (blue, spinning ring)

          • Done (green check, status text)

          • Error (red, error message)

Smart Features

Interactive Plotly Charts

Chart Agent picks the right chart type. Supports bar, line, pie, scatter, and heatmap. Rendered with Plotly.js — hover tooltips, zoom, pan, PNG download. Dark theme.

AI-Suggested Queries

Suggestion engine analyzes columns by category (metrics, dimensions, dates) using actual data types. Generates 3-5 smart questions. Cross-file suggestions prioritized.

Follow-Up Context

Chat threads maintain conversation history. "Now break that down by gender" — SQL Agent modifies the previous query instead of starting from scratch.

Data Preview Modal

Click any file to see first 20 rows. Color-coded column headers by type: numbers (blue), dates (purple), text (gray). Sticky headers, horizontal scroll.

Excel & CSV Upload

Upload .xlsx, .xls, or .csv files. Excel parsed client-side with SheetJS (dynamic import for bundle optimization). Auto type inference from first 20 rows.

Export Results (Excel & CSV)

Download query results as Excel or CSV. CSV uses pure string manipulation. Excel uses SheetJS. Both run client-side — zero server involvement.

API Endpoints (5 Routes)

POST /api/query Run agent pipeline — returns SSE stream with live agent events + results

POST /api/upload Upload file — receives JSON (columns, types, rows), stores in Neon JSONB

GET /api/files List user's files — returns file metadata (columns, types, sample values)

GET /api/files/[id]/preview Preview file data — returns first 20 rows with ownership verification

DELETE /api/files/[id] Delete uploaded file — removes file record and all rows, ownership verified

Project Structure (31 TypeScript Files)

src/ ├── app/ │ ├── api/ │ │ ├── query/ │ │ │ └── route.ts 89 lines — SSE agent pipeline │ │ ├── upload/ │ │ │ └── route.ts 91 lines — JSONB file storage │ │ └── files/ │ │ ├── route.ts 43 lines — file listing │ │ └── [id]/preview/ │ │ └── route.ts 62 lines — data preview │ ├── dashboard/ │ │ └── page.tsx 1,119 lines — main app UI │ ├── page.tsx 333 lines — landing page │ ├── layout.tsx root layout + Clerk provider │ ├── loading.tsx loading state │ ├── globals.css Tailwind + animations │ ├── settings/ │ │ └── page.tsx settings page │ ├── sign-in/ │ │ └── page.tsx Clerk auth │ └── sign-up/ │ └── page.tsx Clerk auth ├── lib/ │ ├── agents/ │ │ ├── orchestrator.ts 203 lines — agent coordination + retry loop │ │ ├── schema-agent.ts 75 lines — join key discovery │ │ ├── sql-agent.ts 74 lines — SQL generation + follow-up │ │ ├── validator-agent.ts 62 lines — result validation + fix │ │ ├── chart-agent.ts 225 lines — chart type selection │ │ └── summary-agent.ts 22 lines — NL summary generation │ ├── db.ts 96 lines — Neon connection + SQL safety │ ├── nvidia.ts 54 lines — NVIDIA NIM API via OpenAI SDK │ ├── types.ts 90 lines — TypeScript interfaces │ ├── join-detector.ts 264 lines — client-side join detection │ ├── suggestion-engine.ts 239 lines — AI-suggested queries │ └── session.ts 16 lines — per-browser UUID ├── components/ │ ├── SchemaMap.tsx 256 lines — React Flow diagram │ ├── PlotlyChart.tsx 153 lines — Plotly.js rendering │ └── DataPreview.tsx 129 lines — file preview modal ├── hooks/ │ └── use-agent-stream.ts 156 lines — SSE consumer + state └── drizzle/ └── schema.ts 41 lines — Drizzle ORM definitions