2024–2025 · Personal project

AI Learning Platform

Upload your notes, build question banks, study adaptively, learn with friends. A social learning platform with a multi-provider LLM router, spaced-repetition scoring, and collaborative sharing built on Supabase RLS.

Next.js · Supabase · TypeScript · Multi-LLM

The problem

Students accumulate lecture slides, notes, and readings across a term but rarely have a structured way to test themselves against that material. Flashcard apps require manual entry. Past papers cover the wrong syllabus. The gap between “I've read this” and “I know this” goes unmeasured until the exam.

The idea was simple: upload whatever material you have, specify how many questions and at what difficulty, and get a ready-to-use question bank back. Over a term you accumulate banks per module. Then the app helps you study — not just by testing you, but by noticing which areas you actually struggle with and pushing those back to the surface.

My role

Solo build from idea through to production. I designed the data model, built the document ingestion and LLM generation pipeline, wrote the adaptive scoring algorithm, and implemented the sharing system. The project was also a deliberate exercise in working with multiple LLM providers and building resilience at that layer rather than assuming any single provider is always available.

— Architecture

Frontend

Framework: Next.js 14 · TypeScript · App Router · Tailwind CSS · Radix UI primitives
State: React Server Components + client islands
File upload: Supabase Storage (PDF, DOCX, TXT)
Auth: Supabase Auth (email, magic link, OAuth)

Backend

API: Next.js API routes · Edge-compatible
Database: Supabase PostgreSQL (RLS for sharing)
LLM routing: Custom router (GPT-4o, Claude, Gemini)
Queue: Supabase pg_cron (async generation jobs)
Parsing: Document text extraction pipeline

Adaptive engine

Signals: Response time + correct/incorrect history
Algorithm: Weighted error rate with time-decay factor
Resurfacing: Priority queue (weak questions requeued)
Persistence: Per-user, per-question stats in Postgres
Sharing: Row-level security (read/write per user)

— Key decisions

LLM router with automatic provider fallback

Context

Coupling to a single provider meant one rate limit or outage could break the entire generation pipeline. Different models also have different cost/quality trade-offs for different question types.

Outcome

A thin routing layer maps question-generation tasks to the most suitable available model. If the primary provider is slow or rate-limited, the router falls back to the next in the preference list transparently. No generation request surfaces a provider error to the user.

Supabase RLS for social sharing instead of a permissions service

Context

Users needed to share individual questions and entire banks with friends — read-only or editable. A separate permissions service would add significant infrastructure for what is fundamentally a data-access problem.

Outcome

Postgres row-level security policies enforce share permissions at the database layer. A shares table records (owner, recipient, resource_id, access_level). Every query is automatically scoped — no application-layer permission checks needed, no risk of forgetting one.

Adaptive selection based on time and accuracy, not just accuracy

Context

Tracking only right/wrong misses questions a user answers correctly but slowly — a sign they're not yet confident. Time alone is noisy (distractions, re-reads). Neither signal is sufficient on its own.

Outcome

A weighted score combines normalised response time with a rolling error rate. Questions above the threshold are promoted into a high-priority pool and resurface more frequently. The effect is that genuinely weak areas receive more repetitions without the user having to identify them.

Question banks as a first-class entity

Context

Early designs treated question banks as simple lists — a folder of questions. That made cross-bank quizzes, sharing, and collaborative editing awkward to model.

Outcome

Banks are their own database entity with many-to-many membership to questions. A question can live in multiple banks. Quiz sessions are composed from one or more banks. Sharing a bank grants access to its member questions without duplicating data.
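As a rough illustration of that model, the sketch below shows banks, questions, and a join row between them, plus a helper that composes a session from several banks. The interface and field names (`BankMembership`, `questionsForSession`, and so on) are hypothetical; the write-up does not give the real schema.

```typescript
// Hypothetical shapes: the real table and column names are not given in the write-up.
interface Question {
  id: string;
  prompt: string;
}

interface Bank {
  id: string;
  ownerId: string;
  title: string;
}

// Join row: one per (bank, question) pair, so a question can sit in many banks.
interface BankMembership {
  bankId: string;
  questionId: string;
}

// Compose a quiz session from one or more banks, returning each question once
// even when it belongs to several of the selected banks.
function questionsForSession(
  bankIds: string[],
  memberships: BankMembership[],
  questions: Map<string, Question>,
): Question[] {
  const selected = new Set(bankIds);
  const seen = new Set<string>();
  const result: Question[] = [];
  for (const m of memberships) {
    if (!selected.has(m.bankId) || seen.has(m.questionId)) continue;
    const q = questions.get(m.questionId);
    if (q) {
      seen.add(m.questionId);
      result.push(q);
    }
  }
  return result;
}
```

Because membership is a separate row rather than a copy, sharing a bank or adding a question to a second bank never duplicates the question itself.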

— Technical depth

Document ingestion pipeline

Users upload PDFs, Word documents, or plain text. The pipeline extracts text, chunks it into context-window-safe segments, and passes each chunk to the LLM with a structured prompt that specifies question count, difficulty level, and output format. Questions are returned as structured JSON, validated against a schema, and written to Postgres in a single transaction — either the entire bank lands or nothing does.

01 Upload: PDF · DOCX · TXT
02 Extract: raw text
03 Chunk: context windows
04 Generate: LLM → JSON
05 Validate: schema check
06 Store: Postgres transaction
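The Validate step could look something like the sketch below, which checks model output before anything reaches the database. The `GeneratedQuestion` shape (multiple choice, three difficulty levels) and the function names are assumptions made for illustration; the project's real schema and validation library are not described.

```typescript
// Hypothetical question shape: the real schema is not shown in the write-up.
interface GeneratedQuestion {
  prompt: string;
  options: string[];
  answerIndex: number;
  difficulty: "easy" | "medium" | "hard";
}

const DIFFICULTIES = ["easy", "medium", "hard"];

// Runtime type guard: true only if every field is present and well-formed.
function isGeneratedQuestion(value: unknown): value is GeneratedQuestion {
  if (typeof value !== "object" || value === null) return false;
  const { prompt, options, answerIndex, difficulty } = value as Record<string, unknown>;
  if (typeof prompt !== "string" || prompt.trim() === "") return false;
  if (!Array.isArray(options) || options.length < 2) return false;
  if (!options.every((o) => typeof o === "string")) return false;
  if (typeof answerIndex !== "number" || !Number.isInteger(answerIndex)) return false;
  if (answerIndex < 0 || answerIndex >= options.length) return false;
  return typeof difficulty === "string" && DIFFICULTIES.indexOf(difficulty) !== -1;
}

// Parse one model response. Throws on any malformed item so the caller can
// abort before opening the database transaction (all-or-nothing).
function parseQuestions(raw: string, expectedCount: number): GeneratedQuestion[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (!Array.isArray(parsed)) throw new Error("Expected a JSON array of questions");
  if (parsed.length !== expectedCount) {
    throw new Error(`Expected ${expectedCount} questions, got ${parsed.length}`);
  }
  const bad = parsed.findIndex((item) => !isGeneratedQuestion(item));
  if (bad !== -1) throw new Error(`Question ${bad} failed validation`);
  return parsed as GeneratedQuestion[];
}
```

Rejecting the whole response on the first bad item keeps the validation step consistent with the single-transaction write that follows it.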


LLM routing

The router maintains a ranked list of providers with their current status. On each generation request it selects the highest-ranked available provider, sends the request, and monitors the response time. If the provider exceeds a latency threshold or returns a rate-limit error, the router marks it degraded and retries immediately against the next in the list. The caller never sees a provider-specific error — only success or a single unified failure if all providers are unavailable.

This also made it straightforward to route different task types to different models: cheaper, faster models for simple factual questions; stronger models for analytical or application-level difficulty.

Router status: all providers healthy (incoming request → ranked providers → response to caller)

GPT-4o: analytical · complex questions (340ms)
Claude Sonnet: factual · reasoning tasks (290ms)
Gemini Pro: broad coverage · fallback (410ms)

The caller receives success or a single unified error, never a provider-specific message.
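The fallback behaviour described in this section can be sketched as follows. This is a minimal illustration rather than the project's router: the `Provider` interface, `LlmRouter` class, and `AllProvidersFailedError` are hypothetical names, real SDK calls would sit behind `generate`, and trying degraded providers last (with no timed recovery) is an assumption.

```typescript
type TaskType = "factual" | "analytical";

// Hypothetical interface: each real SDK (OpenAI, Anthropic, Google) would be wrapped behind it.
interface Provider {
  name: string;
  generate(prompt: string): Promise<string>;
}

// The single unified failure the caller sees when every provider is unavailable.
class AllProvidersFailedError extends Error {
  constructor() {
    super("Question generation is temporarily unavailable");
    this.name = "AllProvidersFailedError";
  }
}

// Reject if the work does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

class LlmRouter {
  private readonly degraded = new Set<string>();
  private readonly preferences: Record<TaskType, Provider[]>;
  private readonly latencyLimitMs: number;

  constructor(preferences: Record<TaskType, Provider[]>, latencyLimitMs: number) {
    this.preferences = preferences;
    this.latencyLimitMs = latencyLimitMs;
  }

  async generate(task: TaskType, prompt: string): Promise<string> {
    const ranked = this.preferences[task];
    // Assumption: healthy providers are tried in rank order, degraded ones last.
    const order = [
      ...ranked.filter((p) => !this.degraded.has(p.name)),
      ...ranked.filter((p) => this.degraded.has(p.name)),
    ];
    for (const provider of order) {
      try {
        const text = await withTimeout(provider.generate(prompt), this.latencyLimitMs);
        this.degraded.delete(provider.name); // a success clears the degraded mark
        return text;
      } catch {
        // Timeout, rate limit, or outage: mark degraded and move to the next provider.
        this.degraded.add(provider.name);
      }
    }
    throw new AllProvidersFailedError();
  }
}
```

Keeping a separate preference list per task type is one way to express "cheaper models for factual questions, stronger models for analytical ones" without changing the fallback loop.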

Adaptive scoring

Each quiz session records two signals per attempt: whether the answer was correct, and the response time normalised against the user's median for that difficulty band. These combine into a confidence score, where a higher value marks a weaker question:

score = (error_rate × 0.7) + (slow_rate × 0.3)

Questions above a score threshold are promoted into a high-priority pool. When the quiz engine selects the next question, it samples from the high-priority pool with a higher probability than from the general pool. This means weak questions resurface without any explicit scheduling — the distribution does the work.
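A minimal sketch of that two-pool selection is shown below. The 0.40 threshold is the one quoted in this write-up; the 70% chance of drawing from the high-priority pool and the names `QuestionStat` and `pickNextQuestion` are assumptions, since the actual sampling bias is not stated.

```typescript
interface QuestionStat {
  questionId: string;
  score: number; // higher means weaker
}

const PROMOTION_THRESHOLD = 0.4; // threshold quoted in the write-up
const HIGH_POOL_CHANCE = 0.7;    // assumption: the real bias is not stated

// Pick the next question: choose a pool first (biased towards the
// high-priority pool), then choose uniformly within that pool.
// `random` is injectable so the behaviour can be tested deterministically.
function pickNextQuestion(
  stats: QuestionStat[],
  random: () => number = Math.random,
): string | undefined {
  const high = stats.filter((s) => s.score >= PROMOTION_THRESHOLD);
  const general = stats.filter((s) => s.score < PROMOTION_THRESHOLD);

  let pool: QuestionStat[];
  if (high.length === 0) pool = general;
  else if (general.length === 0) pool = high;
  else pool = random() < HIGH_POOL_CHANCE ? high : general;

  if (pool.length === 0) return undefined;
  const index = Math.min(Math.floor(random() * pool.length), pool.length - 1);
  return pool[index]?.questionId;
}
```

No per-question schedule is stored here: a question simply moves between pools as its score changes, and the sampling bias determines how often it comes back.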

Confidence score formula: score = (error_rate × 0.7) + (slow_rate × 0.3). Questions scoring ≥ 0.40 are promoted to the high-priority pool.

Weak question (missed 3 of 4 attempts, consistently slow): error_rate × 0.7 = 0.50, slow_rate × 0.3 = 0.17, score 0.68 against threshold 0.40 → promoted

Borderline (some errors, moderate response time): error_rate × 0.7 = 0.28, slow_rate × 0.3 = 0.09, score 0.37 against threshold 0.40 → general pool

Strong question (rarely wrong, quick and confident): error_rate × 0.7 = 0.07, slow_rate × 0.3 = 0.04, score 0.11 against threshold 0.40 → general pool
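The formula above can be written out as a small function. The weights and the 0.40 threshold come from this write-up; how `slow_rate` is derived is not specified, so the rule used here (an attempt counts as slow when it takes more than 1.5 times the user's median) and the names `Attempt`, `rates`, and `weaknessScore` are assumptions. The time-decay factor mentioned in the architecture summary is not modelled.

```typescript
interface Attempt {
  correct: boolean;
  responseMs: number;
}

const ERROR_WEIGHT = 0.7; // from the formula above
const SLOW_WEIGHT = 0.3;  // from the formula above
const THRESHOLD = 0.4;    // promotion threshold from the write-up
const SLOW_FACTOR = 1.5;  // assumption: "slow" means over 1.5x the user's median

// Turn raw attempt history into the two rates the formula expects.
function rates(attempts: Attempt[], medianMs: number): { errorRate: number; slowRate: number } {
  if (attempts.length === 0) return { errorRate: 0, slowRate: 0 };
  const wrong = attempts.filter((a) => !a.correct).length;
  const slow = attempts.filter((a) => a.responseMs > SLOW_FACTOR * medianMs).length;
  return { errorRate: wrong / attempts.length, slowRate: slow / attempts.length };
}

// score = (error_rate x 0.7) + (slow_rate x 0.3); higher means weaker.
function weaknessScore(errorRate: number, slowRate: number): number {
  return errorRate * ERROR_WEIGHT + slowRate * SLOW_WEIGHT;
}

function isHighPriority(score: number): boolean {
  return score >= THRESHOLD;
}
```

For instance, an error rate of 0.4 with a slow rate of 0.3 gives roughly 0.28 + 0.09 = 0.37, which lines up with the borderline case and stays just under the threshold.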

Sharing via RLS

Supabase's row-level security lets you attach policies directly to tables. A shares table records who shared what with whom and at what access level. The RLS policies on questions and question_banks check for a matching shares row before allowing a read or write. No application code enforces this; the database does. A leaked API route or a missing auth check can't accidentally expose another user's data, because a query made with that user's credentials returns nothing. The one exception is code that connects with the service-role key, which bypasses RLS entirely.
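The check those policies perform can be modelled in a few lines of TypeScript. This is only a model of the logic for illustration: the real enforcement lives in Postgres policies, and the `canAccess` and `visibleTo` helpers, along with the rule that owners always pass, are assumptions. The `ShareRow` fields mirror the (owner, recipient, resource_id, access_level) columns named earlier.

```typescript
type AccessLevel = "read" | "write";

// Mirrors the columns named in the write-up: (owner, recipient, resource_id, access_level).
interface ShareRow {
  owner: string;
  recipient: string;
  resourceId: string;
  accessLevel: AccessLevel;
}

interface Resource {
  id: string;
  ownerId: string;
}

// The predicate a policy would evaluate per row. Assumption: owners always
// pass; recipients pass when a share row grants at least the requested level.
function canAccess(
  userId: string,
  resource: Resource,
  shares: ShareRow[],
  wanted: AccessLevel,
): boolean {
  if (resource.ownerId === userId) return true;
  return shares.some(
    (s) =>
      s.resourceId === resource.id &&
      s.recipient === userId &&
      (wanted === "read" || s.accessLevel === "write"),
  );
}

// What a scoped SELECT looks like from the caller's side: rows the user may
// not read are simply absent rather than producing an error.
function visibleTo(userId: string, resources: Resource[], shares: ShareRow[]): Resource[] {
  return resources.filter((r) => canAccess(userId, r, shares, "read"));
}
```

The second helper captures the property the section relies on: an unauthorised query yields an empty result, not a permission error that application code has to remember to raise.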

— Outcomes

<3s

avg. bank generation time

— —

provider errors surfaced to users

— —

LLM providers in the router

RLS

enforces all share permissions

— What I'd do differently

The document chunking strategy was naive at first — fixed character windows with no regard for semantic boundaries. Chunks split mid-sentence confused the model and produced malformed questions. I'd start with paragraph-aware chunking and add overlap between chunks from the beginning rather than retrofitting it.
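A paragraph-aware chunker with overlap might look like the sketch below. It is an illustration of the approach described, not the project's code: the function name `chunkByParagraph`, the blank-line paragraph rule, and measuring size in characters rather than tokens are all assumptions.

```typescript
// Pack whole paragraphs into chunks of at most `maxChars` characters and
// repeat the last `overlap` paragraphs of each chunk at the start of the next.
// A single paragraph longer than `maxChars` is kept whole rather than split.
function chunkByParagraph(text: string, maxChars: number, overlap = 1): string[] {
  const SEP = "\n\n";
  const paragraphs = text
    .split(/\n\s*\n/) // assumption: paragraphs are separated by blank lines
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current: string[] = [];

  for (const para of paragraphs) {
    const candidate = [...current, para].join(SEP);
    if (current.length > 0 && candidate.length > maxChars) {
      chunks.push(current.join(SEP));
      // Carry the tail of the finished chunk forward as overlap, unless
      // that would push the new chunk over the limit.
      const carry = overlap > 0 ? current.slice(-overlap) : [];
      current = [...carry, para].join(SEP).length <= maxChars ? carry : [];
    }
    current.push(para);
  }
  if (current.length > 0) chunks.push(current.join(SEP));
  return chunks;
}
```

Because chunks only ever break between paragraphs, the model never sees a sentence cut in half, and the overlap gives each chunk some context from the one before it.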

The adaptive algorithm weights are hand-tuned constants. They work well in practice but have no principled basis. With more usage data I'd run an offline evaluation against known learning curves to find weights that minimise time-to-mastery rather than guessing.
