An end-to-end Next.js agent for building company-intelligence experiences. The app maps a target domain, scrapes high-signal pages, generates structured insights with GPT-5.1 via the OpenAI Responses API, streams progress over Server-Sent Events (SSE), persists the run, and exports a finished PDF. Everything ships with strict TypeScript, deterministic contracts, and public-only dependencies so you can drop it straight into your onboarding flow.
What the agent extracts
- Structured profile with companyName, tagline, up to 10 value props, key offerings (title + optional description), up to 6 primary industries, and page-level source citations so analysts can audit the output.
- Narrative executive overview that streams token-by-token and powers the PDF, dashboard copy, and run summaries.
- Model metadata for both GPT-5.1 runs (responseId, model id, token usage, raw text, reasoning headline/summary) to support observability and incident review.
- Provenance bundle covering sitemap selections, mapped links, raw scrape payloads (markdown/text/media), and the XML snapshot of pages handed to the models.
- Snapshot lifecycle data including run status, stage progress, favicon, timestamps, and counts of successful vs. failed pages for reporting and retries.
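The structured profile described above can be sketched as a TypeScript shape. Field names beyond those listed (such as `sources` for the page-level citations) are illustrative assumptions, not the exact schema:

```typescript
// Hedged sketch of the extracted profile; the real contract is the
// Zod-validated CompanyIntelStructuredOutputSchema.
interface KeyOffering {
  title: string;
  description?: string;
}

interface CompanyProfileSketch {
  companyName: string;
  tagline: string;
  valueProps: string[];        // up to 10
  keyOfferings: KeyOffering[];
  primaryIndustries: string[]; // up to 6
  sources: { url: string; title?: string }[]; // page-level citations for auditing
}

const example: CompanyProfileSketch = {
  companyName: 'Acme Corp',
  tagline: 'Anvils as a service',
  valueProps: ['Fast delivery'],
  keyOfferings: [{ title: 'Anvil API' }],
  primaryIndustries: ['Manufacturing'],
  sources: [{ url: 'https://acme.example/about' }],
};
```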
- Full pipeline: `map → scrape → structured outputs → overview → SSE stream → persist → export PDF`, with a demo UI and API surface ready for production.
- GPT-5.1 structured intelligence: Dual GPT-5.1 models produce a normalized profile and a narrative overview, validated with Zod before anything is stored or emitted.
- Streaming UX: Deterministic SSE frames (`text/event-stream`) let the front end surface deltas, reasoning, and completion status in real time, even if the client reconnects mid-run.
- Durable runs: Active collections survive refreshes via a runtime coordinator backed by Redis. Clients can resume a stream or cancel the run with dedicated APIs.
- Historical snapshots: Analysts can load any prior completed run back into the editor or chat via the snapshot detail API, hydrate drafts, and re-export PDFs without re-running the pipeline.
- Swappable persistence: Choose in-memory for demos, Redis for fast prototyping, or Postgres + Drizzle for production auditing, all wired through the same `CompanyIntelPersistence` interface and selectable via `PERSISTENCE_BACKEND`.
- Operational guardrails: JSON-friendly logging, configurable origins, secret scanning, strict ESLint/Prettier, Vitest, and CI workflows baked in.
Prerequisites: Node.js ≥ 20.11, pnpm ≥ 9.
- Copy the environment template and add your keys:

  ```bash
  cp .env.example .env
  ```

  You must provide `OPENAI_API_KEY` (Responses API) and `TAVILY_API_KEY` to run live collections.

- Install dependencies and start the dev server:

  ```bash
  pnpm install
  pnpm dev
  ```

- Visit `http://localhost:3000` to use the demo workflow. By default the app uses in-memory persistence; set `REDIS_URL` to enable Redis-backed storage. For a production build, run `pnpm build` followed by `pnpm start`.

- Launch Storybook (optional) to explore isolated UI states:

  ```bash
  pnpm storybook
  ```

  Storybook runs at `http://localhost:6007` and uses MSW fixtures to simulate streaming states.
The Postgres adapter work (MILESTONE_POSTGRES_IMPLEMENTATION) ships with Drizzle migrations so schema changes stay deterministic:
- `pnpm db:generate`: diff `database/schema.ts` into `database/migrations/` (requires `DATABASE_URL` for Drizzle config).
- `pnpm db:migrate`: apply migrations to the database referenced by `DATABASE_URL`.
You only need these when working on the Postgres backend; Redis/memory flows remain unchanged.
Spin up a disposable Postgres the same way CI will run it:
```bash
docker compose up postgres -d
# waits until pg_isready passes, creates companyintel/companyintel db/user mapped to localhost:5432

# Run migrations once the container is healthy
DATABASE_URL=postgres://companyintel:companyintel@localhost:5432/companyintel pnpm db:migrate:docker
```

With that in place you can set `DATABASE_URL` + `PERSISTENCE_BACKEND=postgres` in `.env` and run `pnpm dev` normally.
When you want to prove the Postgres adapter matches in-memory behavior, make sure Docker postgres is running (see above) and then run:
```bash
TEST_DATABASE_ALLOW_DROP=true pnpm test:postgres
```

The script defaults `TEST_DATABASE_URL` to `postgres://companyintel:companyintel@localhost:5432/companyintel_test`, aligns `DATABASE_URL` with it, and requires `TEST_DATABASE_ALLOW_DROP=true` so the helper can recreate schemas safely. CI can override `TEST_DATABASE_URL` to its own throwaway DB if needed.
Once Postgres persistence is on, every run populates:
- `company_profiles`: the live profile (company name, tagline, value props, industries, favicon).
- `company_snapshots`: structured summaries, overview text, vector-store metadata, timestamps.
- `company_snapshot_pages`: normalized scrape payloads handed to the LLMs.
Recommended workflow:
- Set `PERSISTENCE_BACKEND=postgres` and run collections normally.
- Watch `company_snapshots` for new `status = 'complete'` rows.
- Sync the relevant JSON fields into your own CRM/agent tables and mark the snapshot as consumed.
Example poller:
```ts
import postgres from 'postgres';

const sql = postgres(process.env.DATABASE_URL!);
const sinceId = Number(process.env.LAST_SYNCED_SNAPSHOT_ID ?? 0);

const pending = await sql`
  select id, summaries, vector_store_id
  from company_snapshots
  where status = 'complete' and id > ${sinceId}
  order by id asc
`;

for (const snapshot of pending) {
  // hydrate downstream agents (CRM, article writer, etc.)
  process.env.LAST_SYNCED_SNAPSHOT_ID = String(snapshot.id);
}

// close the connection pool when the poller exits
await sql.end();
```

Because the Postgres adapter already normalizes JSON/Zod output, you can reference those fields directly inside other agents ("Resume-ready" talking point: company intel snapshots stream into Postgres via Drizzle, then hydrate downstream prompts via a background sync job).
All UI, server, and API contracts live inside the workspace package `@company-intel/feature` (see `features/company-intel`). Install it via npm or a local tarball, then import only the pieces you need without relying on the demo app:
```ts
// UI
import { CompanyIntelPanel, CompanyIntelProviders } from '@company-intel/ui/company-intel';

// Server + persistence bootstrap
import { createCompanyIntelEnvironment } from '@company-intel/server/bootstrap';

// HTTP handlers (framework agnostic)
import { handleCompanyIntelGet, handleCompanyIntelPatch } from '@company-intel/feature/api/companyIntelRest';
import { createRunStream } from '@company-intel/feature/api/runStream';
import { createChatStream } from '@company-intel/feature/api/chatStream';

// Next.js adapters (optional)
import { toNextResponse, createSseResponse } from '@company-intel/feature/adapters/next/http';
```

Call the config helpers once during bootstrap so the feature never reaches into global `process.env` or console logging implicitly:
```ts
import { configureCompanyIntelFeature, createLogger } from '@company-intel/feature/config';

configureCompanyIntelFeature({
  env: process.env, // replace with your own env loader in serverless contexts
  logger: createLogger({ service: 'my-app' }), // optional
});
```

For finer-grained control (e.g., tests), use `configureCompanyIntelEnv(customSource)` or `configureCompanyIntelLogger(customLogger)` directly.
- npm registry (recommended): `pnpm add @company-intel/feature`. Publish updates with `npm publish --access public` from `features/company-intel`.
- Tarball / file install: `pnpm pack --filter @company-intel/feature` produces `company-intel-feature-X.Y.Z.tgz`, which consumers can install via `pnpm add ./company-intel-feature-X.Y.Z.tgz`.
- Workspace link (monorepo consumers): add `"@company-intel/feature": "workspace:*"` to your root `package.json`.
The integration steps below are identical regardless of distribution method.
For a detailed integration guide (including server bootstrap, API mounting, and SSE expectations) see docs/integration.md.
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI Responses API key used for both structured profile and overview runs. | — |
| `TAVILY_API_KEY` | Tavily site-mapping and extraction key. | — |
| `TAVILY_EXTRACT_DEPTH` | Default Tavily Extract depth (`basic` or `advanced`). | `basic` |
| `OPENAI_MODEL_STRUCTURED` | GPT-5.1 model id for structured profile output. | `gpt-5.1` |
| `OPENAI_MODEL_OVERVIEW` | GPT-5.1 model id for narrative overview output. | `gpt-5.1` |
| `OPENAI_MODEL_CHAT` | GPT-5.1 model id for the knowledge chat agent. | `gpt-5.1` |
| `STRUCTURED_REASONING_EFFORT` | Default reasoning effort (`low`/`medium`/`high`) for the structured profile agent. | `medium` |
| `OVERVIEW_REASONING_EFFORT` | Default reasoning effort (`low`/`medium`/`high`) for the overview agent. | `medium` |
| `CHAT_REASONING_EFFORT` | Default reasoning effort (`low`/`medium`/`high`) for the snapshot chat agent. | `low` |
| `REDIS_URL` | Optional Redis connection string. When unset, memory persistence is used. | — |
| `DATABASE_URL` | Optional Postgres connection string used by the Drizzle adapter. Required when `PERSISTENCE_BACKEND=postgres`. | — |
| `PERSISTENCE_BACKEND` | Force a backend (`postgres`, `redis`, or `memory`). When unset, the app defaults to Postgres if only `DATABASE_URL` is set, Redis if only `REDIS_URL` is set, else memory. When both `DATABASE_URL` and `REDIS_URL` are set, the app logs a warning and defaults to Postgres, so setting this explicitly is recommended. | — |
| `ALLOW_ORIGINS` | Comma-separated list of allowed origins for downstream clients. | `http://localhost:3000` |
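The backend-selection rules in the table can be sketched as a tiny resolver. This mirrors only the documented precedence and is not the feature's actual bootstrap code:

```typescript
type Backend = 'postgres' | 'redis' | 'memory';

// Documented precedence: explicit override first, then Postgres when
// DATABASE_URL is set, then Redis, then in-memory. (Illustrative sketch.)
function resolveBackend(env: Record<string, string | undefined>): Backend {
  if (env.PERSISTENCE_BACKEND) return env.PERSISTENCE_BACKEND as Backend;
  if (env.DATABASE_URL) return 'postgres'; // also the default when Redis is configured too
  if (env.REDIS_URL) return 'redis';
  return 'memory';
}
```

Note that when both stores are configured, relying on the implicit Postgres default triggers a warning, so production deployments should pin `PERSISTENCE_BACKEND` explicitly.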
- Mapping: Tavily collects candidate URLs for the target domain and ranks them.
- Scraping: High-signal pages are fetched and normalized for downstream processing.
- Structured profile: GPT-5.1 (Responses API) generates a typed company profile using the `CompanyIntelStructuredOutputSchema` and streams partial deltas.
- Narrative overview: A second GPT-5.1 run produces long-form narrative context with reasoning summaries.
- Persistence: Final payloads, snapshots, and page excerpts are stored via the configured `CompanyIntelPersistence` implementation.
- Export: A PDF renderer (React-PDF) builds a branded deliverable available at `/api/company-intel/snapshots/:id/export`.
- Durability: The runtime coordinator keeps the active snapshot id and progress in Redis so clients can resume streaming or cancel mid-run.
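The stages above surface to clients as `status` frames carrying optional progress counts. A minimal sketch of rendering them (the stage ids here are assumptions based on the pipeline order, not the exact identifiers the server emits):

```typescript
// Hypothetical stage ids, ordered to match the pipeline description above.
type Stage = 'map' | 'scrape' | 'structured' | 'overview' | 'persist' | 'export';

interface StatusFrame {
  stage: Stage;
  completed?: number; // e.g. pages scraped so far
  total?: number;     // e.g. pages selected for scraping
}

// Turn a status frame into a short progress label for the UI.
function describeProgress(frame: StatusFrame): string {
  if (frame.completed !== undefined && frame.total) {
    const pct = Math.round((frame.completed / frame.total) * 100);
    return `${frame.stage}: ${pct}%`;
  }
  return `${frame.stage}: in progress`;
}
```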
Need a visual? Check docs/ops/workflow.md for the Mermaid systems diagram that walks through the frontend, API, orchestrator, and RAG subsystems end-to-end.
- UI (`app/**`, `features/company-intel/src/ui/company-intel/**`): Next.js App Router screens, TanStack Query providers, and shadcn-style UI shims. UI never imports server code directly; it talks to the API via fetchers in `CompanyIntelClientProvider`.
- API (`app/api/company-intel/**`): Route handlers run in the Node.js runtime, coerce HTTP inputs into typed server calls, stream SSE frames, and sanitize responses for the client.
- Server (`features/company-intel/src/server/**`): `createCompanyIntelServer` orchestrates the collection workflow, coordinates Tavily/OpenAI integrations, enforces Zod validation, and drives PDF generation.
- Persistence (`features/company-intel/src/server/persistence/**`): Memory, Redis, and Postgres adapters all satisfy `CompanyIntelPersistence`, so you can pick your durability story without touching the server or UI layers. Dates are serialized to ISO at boundaries, and `replaceSnapshotPages` keeps page blobs in sync.
- Runtime coordinator (`features/company-intel/src/server/runtime/**`): Manages single-flight execution, SSE subscriptions, cancellation, and recovery so runs survive disconnects.
- Bootstrap (`features/company-intel/src/server/bootstrap.ts`): `getCompanyIntelEnvironment()` resolves config, logging, persistence, Tavily, and OpenAI clients once per process and caches the singleton.
- Logging & metrics: `lib/logging.ts` emits JSON-friendly logs for stage transitions, including model ids, response ids, and usage metadata when available.
- Both runs use the OpenAI Responses API with GPT-5.1 models configured through environment variables.
- Structured profile responses stream token-level deltas (`response.output_text.delta`), which are accumulated and validated against the `CompanyIntelStructuredOutputSchema` before persistence or emission.
- Overview responses stream `delta` and `reasoning_summary_text.delta` events; handlers emit `overview-delta` and `overview-reasoning-delta` frames so the UI can present long-form copy and short headlines.
- On any validation failure the run aborts fast, emits a `run-error` SSE event, and the snapshot is marked `failed`.
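The accumulate-then-validate pattern above can be sketched dependency-free. A hand-rolled type guard stands in for the real `CompanyIntelStructuredOutputSchema` Zod validation, and the field names are illustrative:

```typescript
// Illustrative subset of the structured profile; not the exact schema.
interface StructuredProfile {
  companyName: string;
  valueProps: string[];
}

// Stand-in for schema.safeParse(); the real code uses Zod.
function isStructuredProfile(value: unknown): value is StructuredProfile {
  const v = value as StructuredProfile;
  return (
    typeof v === 'object' && v !== null &&
    typeof v.companyName === 'string' &&
    Array.isArray(v.valueProps) && v.valueProps.length <= 10
  );
}

// Join the response.output_text.delta events in arrival order, then validate
// before anything is persisted or emitted downstream.
function finalizeRun(deltas: string[]): StructuredProfile {
  const accumulated = deltas.join('');
  const parsed: unknown = JSON.parse(accumulated);
  if (!isStructuredProfile(parsed)) {
    // mirrors the run-error path: abort fast, mark the snapshot failed
    throw new Error('structured output failed validation');
  }
  return parsed;
}
```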
All routes live under /api/company-intel and run on the Node.js runtime.
| Method | Route | Notes |
|---|---|---|
| `GET` | `/` | Returns `{ data: { profile, snapshots } }` with ISO timestamps. |
| `PATCH` | `/` | Applies sanitized updates (`companyName`, `tagline`, `overview`, `primaryIndustries`, `valueProps`, `keyOfferings`). |
| `POST` | `/preview` | Maps a domain and returns recommended selections before scraping. |
| `POST` | `/` | Triggers a run. Requires `Accept: text/event-stream`; the response is an SSE feed that ends with `[DONE]`. Returns `406` if the header is missing and `409` if a run is already active for the domain. |
| `GET` | `/runs/:snapshotId/stream` | Reconnects to an active run, replays buffered frames, and resumes the live SSE stream. |
| `DELETE` | `/runs/:snapshotId` | Cancels the active run (idempotent). On success the stream emits `run-cancelled` and the snapshot is pruned. |
| `GET` | `/snapshots/:id` | Returns a fully serialized snapshot (profile summaries, overview text, scrape stats) so the UI can hydrate historical runs or gate chat readiness. |
| `POST` | `/snapshots/:id/chat` | Streams chat completions for a snapshot. Requires `Accept: text/event-stream`; responds with `[DONE]` upon completion. |
| `GET` | `/snapshots/:id/export` | Generates a PDF export (`Content-Disposition: attachment`). |
Each frame is emitted as `data: <json>\n\n` and the stream terminates with `data: [DONE]\n\n`.
- `snapshot-created`: `{ status }`
- `status`: `{ stage, completed?, total? }`
- `structured-delta`: `{ delta, accumulated, summary? }`
- `structured-reasoning-delta`: `{ delta, headlines: string[] }`
- `structured-complete`: `{ payload }` (payload metadata includes `reasoningHeadlines: string[]`)
- `overview-delta`: `{ delta, displayText? }`
- `overview-reasoning-delta`: `{ delta, headlines: string[] }`
- `overview-complete`: `{ overview, headlines: string[] }`
- `vector-store-status`: `{ status: 'publishing' | 'ready' | 'failed', error?, vectorStoreId?, fileCounts? }`
- `run-complete`: `{ result }`
- `run-error`: `{ message }`
- `run-cancelled`: `{ reason? }`
- `[DONE]` terminates the stream.
The UI hooks in features/company-intel/src/ui/company-intel/hooks assume this order and will fail fast if malformed frames are encountered.
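As a sketch, the framing above can be consumed by buffering text and splitting on blank lines. This is an illustrative client-side parser (frame payloads are treated as opaque JSON), not the library's hook implementation:

```typescript
// Split a buffered SSE chunk into parsed frames plus the unconsumed remainder.
// Frames follow the `data: <json>\n\n` convention; the literal `[DONE]`
// sentinel is surfaced as the string 'DONE'.
function parseSseChunk(buffer: string): { frames: Array<unknown | 'DONE'>; rest: string } {
  const frames: Array<unknown | 'DONE'> = [];
  let rest = buffer;
  let sep: number;
  while ((sep = rest.indexOf('\n\n')) !== -1) {
    const raw = rest.slice(0, sep);
    rest = rest.slice(sep + 2);
    if (!raw.startsWith('data: ')) continue; // skip comments / keep-alive lines
    const payload = raw.slice('data: '.length);
    frames.push(payload === '[DONE]' ? 'DONE' : JSON.parse(payload));
  }
  return { frames, rest };
}
```

A real consumer would feed each decoded `ReadableStream` chunk through this, dispatch on the frame's event name, and stop once `'DONE'` appears.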
- `pnpm lint`: ESLint + Prettier with strict rules for server and client bundles.
- `pnpm typecheck`: `tsc --noEmit` against the entire workspace.
- `pnpm test`: Vitest suites covering serialization, persistence parity, and SSE framing.
- `pnpm build`: `next build` for production bundles.
- `pnpm scan`: gitleaks secret scan (required to be clean before merge).
- `pnpm prune:exports`: runs `ts-prune` with Storybook stories and Next.js entrypoints filtered out so the signal focuses on actionable unused exports.
- Integration tests mock Tavily and OpenAI; real runs require valid API keys.
- Redis persistence uses basic JSON serialization. Add TTLs, indices, or sharding strategies before serving heavy multi-tenant traffic.
- PDF generation runs server-side via `@react-pdf/renderer`. Consider queuing long-running exports in production environments.
