Product Requirements Document: StemFlow

Version: 1.0 Type: Open Source / Local-First Web Application Core Concept: AI-Assisted Spatial Reasoning for Scientific Research

1. Executive Summary

StemFlow is a web-based “Visual Lab Partner” designed for individual academic researchers. Unlike linear chatbots (ChatGPT), StemFlow offers an infinite canvas where research is mapped out as a branching tree of logical nodes.

It enforces a specific scientific workflow—the “Observation-Mechanism-Validation” (OMV) Triad—to structure the AI’s reasoning. By allowing users to rate the significance of their findings (“Research Episodes”), StemFlow transforms from a simple diagramming tool into a narrative engine capable of drafting research papers based on the user’s most critical discoveries.

Primary Goal: To reduce the cognitive load of complex research by spatializing logic and automating the suggestion of next steps.

2. The Core Philosophy: “Guided Flexibility”

The application relies on two structural pillars to manage AI context and user focus.

A. The OMV Logic Loop

The AI does not just “chat”; it follows the scientific method loop:

Observation Node (Fact): Input data (Images, PDFs, Text).
Mechanism Node (Hypothesis): AI suggests why the observation happened (citing literature).
Validation Node (Experiment): AI suggests how to test the mechanism.
Result (New Observation): The loop closes and restarts.

B. The “Research Episode” & Narrative Rating

The Unit: A completed Mechanism -> Validation -> Result chain is grouped into an “Episode.”
The Rating: The user rates each Episode from 1 to 5 stars based on scientific significance (not just success).
The Payoff:
- AI Attention: Future suggestions prioritize logic from 5-star episodes.
- Paper Drafting: The “Generate Report” feature uses 5-star episodes as the core content for the Results/Discussion sections.

3. Technical Architecture (Local-First)

Constraint: Zero server-side data storage. Total privacy.

Stack

Frontend Framework: React (Next.js or Vite).
Canvas Engine: React Flow (for node rendering and virtualization).
State Management: Zustand (High performance, transient state).
Local Database: IndexedDB via Dexie.js (Stores full file blobs and text).
File System: File System Access API (Allows “Save Project to Disk”).
AI Layer: Client-side fetch to Anthropic API (Claude 3.5 Sonnet) or OpenAI.
Background Processing: Web Workers (Crucial for PDF parsing and data prep to prevent UI freeze).

Performance Strategy

Bottleneck	Solution
Rendering Lag	Use React Flow’s Viewport Virtualization (only render visible nodes).
Memory Bloat	Store raw images/PDFs in IndexedDB, not React State. Render thumbnails only.
Context Window	Pruning Strategy: Send only the active branch (Ancestors) + Summaries of older nodes.
PDF Latency	Parse text via PDF.js inside a Web Worker.

4. Data Model (JSON Schema)

The Node Object

{
  "id": "node_uuid",
  "type": "OBSERVATION" | "MECHANISM" | "VALIDATION",
  "data": {
    "text_content": "User notes...",
    "attachments": [{ "id": "file_1", "vision_summary": "Auto-generated description of image" }],
    "ai_context_summary": "Compressed summary for history"
  },
  "position": { "x": 0, "y": 0 },
  "parent_ids": ["node_parent"]
}

The Global Project State

{
  "project_id": "stemflow_01",
  "goal": "Cure X via Pathway Y",
  "constraints": ["Low Budget", "No Sequencer"],
  "episodes": [
    {
      "id": "episode_01",
      "node_ids": ["A", "B", "C"],
      "significance_score": 5, // 1-5
      "reasoning": "Confirmed primary hypothesis"
    }
  ]
}

5. Functional Requirements (MVP)

Phase 1: The Canvas (Foundation)

Infinite Whiteboard: Pan, zoom, create nodes.
Node Types: distinct UI styling (color/icon) for O, M, V nodes.
Connection Logic: Drag-and-drop connections.
Local Persistence: Auto-save to IndexedDB; Export to .json.

Phase 2: The Brain (AI Integration)

BYO Key: Settings panel to input API Key.
Vision-to-Text: On image upload, AI generates a text description (saved to vision_summary).
“Generate Next Step”:
- Backend logic (in browser) traverses the graph to find Ancestors.
- Constructs prompt with Global State + Ancestor Summaries + Current Node.
- Renders 3 “Ghost Node” options.

Phase 3: The Scientist (Advanced Logic)

Episode Grouping: Auto-detect M->V->O loops and draw a bounding box.
Rating UI: 5-star widget on the Episode box.
Context Pruning: Implement the “Summary” background worker to compress old nodes.
PDF Parsing: Drag-and-drop PDF to extract text for the context window.

Phase 4: Launch Polish

Export to Report: A “Drafting” button that compiles 5-star episodes into a Markdown document (Abstract/Results).
Templates: Pre-loaded graphs for “Literature Review” or “Wet Lab Experiment.”

6. UX/UI Flow (The “Happy Path”)

Setup: User opens StemFlow → Settings → Pastes API Key → Creates “New Project.”
Input: User drags a PDF onto the canvas. It becomes an Observation Node.
Analysis: User clicks “Analyze.” AI reads PDF text and summaries it.
Ideation: User drags a line out → selects “Suggest Mechanisms.”
Selection: 3 Ghost Nodes appear. User confirms one: “Hypothesis: Protein Misfolding.”
Planning: User clicks “Suggest Validation.” AI proposes “Western Blot.”
Execution (Offline): User performs the blot in real life.
Result: User uploads the photo of the blot to the Validation node.
Rating: System detects the loop is closed. User rates it 5/5.
Writing: User clicks “Draft Paper.” AI uses this 5-star loop to write the core argument.

7. Next Steps for You

Repository Setup: Initialize a Next.js + React Flow project.
Dependencies: Install reactflow, dexie, zustand, lucide-react, pdfjs-dist.
Prototype: Build the “Silent Canvas” first (just nodes and connections, no AI) to test the UI feel.

Antigravity Status: Session Concluded. You have a complete blueprint for StemFlow. Good luck building the future of research interfaces.

Harvey's Garden

Explorer

StemFlow