Talk to us about your test prep platform project.
Tell us the exam you're targeting, your question bank size, your current tech stack, and what your existing tools can't do. We'll scope the right platform and give you a fixed cost.
Building a test prep product for a high-stakes exam but your question bank, adaptive engine, and analytics are three separate tools that don't talk to each other?
Students completing practice tests but unable to understand which specific topics they need to focus on in the time they have left before the exam?
Custom test prep platforms built for standardised exams, professional certifications, and licensing tests -- with the adaptive engine, question bank, and performance analytics your product needs to help students actually improve.
100+ products shipped since 2019. We've built test prep platforms for ed-startups, publishers, and institutions targeting high-stakes exams across academic and professional certification markets.
Structured question bank with topic, difficulty, and question type tagging
Adaptive mock exams that adjust difficulty based on student performance
Per-topic performance analytics with score prediction and weak area identification
Spaced repetition, study plan generation, and mobile app for iOS and Android
RaftLabs builds custom test prep platforms and apps for ed-startups, publishers, and institutions targeting standardised tests, professional certifications, and licensing exams. Custom test prep development covers question bank management, adaptive mock exams, timed practice tests, performance analytics by topic and question type, spaced repetition, weakness identification, score prediction, study plan generation, and mobile apps for iOS and Android. Custom builds make sense when off-the-shelf tools can't support your question types, adaptive logic, or exam-specific interface requirements. Most test prep projects deliver in 10--14 weeks at a fixed cost with full source code ownership.
A student preparing for the GMAT has six weeks, a specific score gap to close, and content they understand vs. content that costs them time on every practice test. Generic quiz tools deliver questions and show a percentage score. That tells the student nothing about which quant topics to prioritise or how their pacing compares to the real exam time limits. Purpose-built test prep platforms connect question performance to study planning, adapt difficulty to current skill level, and show students exactly where their time is best spent.
Custom test prep development builds the adaptive engine, question bank, and analytics layer around the specific exam your product targets -- not around a generic quiz format.
Structured question database with topic, subtopic, difficulty level, and question type tagging for precise filtering and adaptive selection. Each question carries Bloom's taxonomy tags -- Remembering, Understanding, Applying, Analysing, Evaluating, Creating -- so content teams can ensure full cognitive coverage across each exam domain, not just factual recall. Rich text and media support handles diagram-based, graph-based, passage-based, and data interpretation questions including embedded images, tables, and mathematical expressions via MathJax or KaTeX rendering.
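A minimal sketch of how the tagging scheme above could be modelled and queried in Python. The field names, the 1-5 difficulty scale, and the helper functions are hypothetical. A production bank would live in a database rather than an in-memory list.

```python
from dataclasses import dataclass
from enum import Enum


class Bloom(Enum):
    REMEMBERING = 1
    UNDERSTANDING = 2
    APPLYING = 3
    ANALYSING = 4
    EVALUATING = 5
    CREATING = 6


@dataclass(frozen=True)
class Question:
    qid: str
    topic: str
    subtopic: str
    difficulty: int          # hypothetical 1-5 scale
    qtype: str               # e.g. "mcq", "passage", "data_interpretation"
    bloom: Bloom
    body_markdown: str = ""  # may contain $...$ math for MathJax/KaTeX rendering


def filter_questions(bank, topic=None, max_difficulty=None, bloom_levels=None):
    """Return questions matching every filter that is not None."""
    out = []
    for q in bank:
        if topic is not None and q.topic != topic:
            continue
        if max_difficulty is not None and q.difficulty > max_difficulty:
            continue
        if bloom_levels is not None and q.bloom not in bloom_levels:
            continue
        out.append(q)
    return out


def bloom_coverage(bank, topic):
    """Count questions per Bloom level within one topic, to expose coverage gaps."""
    counts = {level: 0 for level in Bloom}
    for q in bank:
        if q.topic == topic:
            counts[q.bloom] += 1
    return counts
```

A content team could run `bloom_coverage` per exam domain to spot levels (often Evaluating or Creating) that have no questions yet.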
Version control for question updates ensures edits don't affect historical performance records: students who completed earlier attempts see the question as it appeared at the time. Bulk import from existing content libraries supports CSV, XLSX, and QTI 2.1 format ingestion. Editorial review workflow lets content teams flag, review, and approve questions before they enter the live pool, with reviewer assignment, comment threads per question, and an approval gate before publication. Item-level analytics show which questions have high skip rates, unusual time spent, or poor discrimination, all signals that a question needs revision rather than more exposure.
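One way to keep historical attempts intact is an append-only version history in which each attempt pins the version the student actually saw. The class and method names below are hypothetical and in-memory, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionVersion:
    qid: str
    version: int
    stem: str
    answer: str


class QuestionStore:
    """Append-only store: an edit creates a new version instead of overwriting."""

    def __init__(self):
        self._versions = {}  # qid -> list of QuestionVersion, oldest first

    def publish(self, qid, stem, answer):
        history = self._versions.setdefault(qid, [])
        v = QuestionVersion(qid, len(history) + 1, stem, answer)
        history.append(v)
        return v

    def latest(self, qid):
        return self._versions[qid][-1]

    def get(self, qid, version):
        return self._versions[qid][version - 1]  # versions are numbered from 1


@dataclass(frozen=True)
class Attempt:
    student_id: str
    qid: str
    version: int  # pinned at the moment the student answered
    response: str


def review_view(store, attempt):
    """Return the question exactly as the student saw it during the attempt."""
    return store.get(attempt.qid, attempt.version)
```

New students are served `latest(qid)`, while review screens for old attempts call `review_view`, so an edit never rewrites history.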
Computerized Adaptive Testing (CAT) engine using Item Response Theory -- specifically the 3PL (three-parameter logistic) model -- to estimate each student's ability level (theta) in real time. The 3PL model accounts for question difficulty (b-parameter), discrimination (a-parameter), and guessing probability (c-parameter), which means the engine does not simply increase or decrease difficulty on correct/incorrect responses. It applies the Fisher Information criterion to select the next item that provides maximum information at the student's current estimated ability, narrowing the measurement error as efficiently as possible.
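The item-selection step can be sketched in Python. This is a minimal illustration, assuming item parameters are already calibrated and stored as (a, b, c) tuples. It uses the logistic 3PL form P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))) without the optional D = 1.7 scaling constant, with item information I(θ) = a² · ((1 − P) / P) · ((P − c) / (1 − c))². The item IDs are hypothetical.

```python
import math


def p_correct(theta, a, b, c):
    """3PL probability of a correct response (logistic metric, no D = 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_next_item(theta, items, administered):
    """Pick the unadministered item with maximum information at the current theta.

    items: dict mapping item_id -> (a, b, c)
    administered: set of item_ids already shown in this session
    """
    best_id, best_info = None, -1.0
    for item_id, (a, b, c) in items.items():
        if item_id in administered:
            continue
        info = fisher_information(theta, a, b, c)
        if info > best_info:
            best_id, best_info = item_id, info
    return best_id
```

With a guessing parameter c > 0, an item's information peaks slightly above its difficulty b, because guessing adds noise for low-ability students. Operational CAT engines usually layer content balancing and exposure control on top of pure maximum-information selection.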
For smaller question banks where full IRT calibration isn't feasible, a rule-based adaptive engine with configurable difficulty step logic is available as an alternative. Exam simulation matches the real exam's timing, interface conventions, section structure, and question format so the mock experience transfers directly to the live test day. Section-level timers, passage navigation controls, calculator availability, and scratch pad features match the target exam specification. Mid-exam penalty controls handle negative marking schemes where applicable. Post-exam review presents full answer explanations with step-by-step AI-generated reasoning for each incorrect response, plus a time-per-question breakdown against per-question benchmarks for the target exam.
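A rule-based engine with configurable step logic might look like the sketch below. The five difficulty tiers and the two-correct-to-step-up / one-wrong-to-step-down defaults are hypothetical settings, not values taken from any particular exam.

```python
from dataclasses import dataclass


@dataclass
class StepRule:
    min_tier: int = 1
    max_tier: int = 5
    up_after: int = 2    # consecutive correct answers needed to step up (assumed default)
    down_after: int = 1  # consecutive incorrect answers needed to step down (assumed default)


class RuleBasedEngine:
    def __init__(self, rule, start_tier=3):
        self.rule = rule
        self.tier = start_tier
        self._streak = 0  # positive = run of correct answers, negative = run of misses

    def record(self, correct):
        """Update the difficulty tier after one response and return the new tier."""
        if correct:
            self._streak = self._streak + 1 if self._streak > 0 else 1
            if self._streak >= self.rule.up_after:
                self.tier = min(self.tier + 1, self.rule.max_tier)
                self._streak = 0
        else:
            self._streak = self._streak - 1 if self._streak < 0 else -1
            if -self._streak >= self.rule.down_after:
                self.tier = max(self.tier - 1, self.rule.min_tier)
                self._streak = 0
        return self.tier
```

Requiring two correct answers before stepping up is one way to keep a lucky guess from pushing a student into questions that are too hard.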
Per-student dashboard showing accuracy by topic and subtopic across all practice attempts. Time spent per question is compared against per-question benchmarks for the target exam so pacing problems are visible before test day. Each practice test also carries psychometric indicators computed across every student who took it, including item-total correlations and a Cronbach's alpha reliability coefficient. These measures tell content teams whether a test is measuring consistently, not just whether a student passed or failed.
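Both reliability measures can be computed from a students-by-items score matrix. The sketch below uses population variances consistently and the corrected item-total correlation (each item against the total of the remaining items), which is one common variant. The function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """scores: one row per student, one column per item (e.g. 0/1 for wrong/right)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation; NaN when either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(scores, j):
    """Correlation of item j with the total score on all other items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)
```

A low or negative corrected item-total correlation is the same "poor discrimination" signal that flags a question for revision in the item-level analytics.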
Percentile ranking compares each student's performance against the cohort on the platform -- or against a defined baseline group -- so students understand where they stand relative to peers who are competing for the same scores. Improvement trends track weak area progress across multiple practice attempts so students see whether focused study is moving the needle. Score prediction models the student's current practice performance against the scoring scale of the target exam -- for GMAT, GRE, SAT, and similar standardised tests -- giving a realistic point estimate and confidence interval. Target score gap tracking shows exactly how many points a student needs to gain across which topics to reach their stated goal, updated after each practice session.
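Percentile ranking against a cohort can be as simple as the mid-rank formula below, which counts tied scores as half above and half below. This is an assumed definition, since the section doesn't specify how ties are handled.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, cohort_scores):
    """Mid-rank percentile: share of the cohort below the score, counting ties as half."""
    s = sorted(cohort_scores)
    below = bisect_left(s, score)
    equal = bisect_right(s, score) - below
    return 100.0 * (below + 0.5 * equal) / len(s)
```

For a cohort of [50, 60, 70, 80], a score of 70 lands at the 62.5th percentile: two scores below plus half of the one tie, out of four.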
Vocabulary, formula, and concept decks with spaced repetition scheduling based on a modified SM-2 algorithm that calculates the next review interval from the student's recall quality rating (0-5 scale). Cards rated poorly are surfaced again within 24 hours. Cards rated well have their interval stretched by a multiplier on each successful review (for example 1 day, 3 days, 7 days, then 14 days), so review time is concentrated on material that actually needs work rather than on content the student already knows.
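For reference, the classic SM-2 update is sketched below. The section describes a modified SM-2 with a 1, 3, 7, 14-day progression, but the modification isn't specified. This sketch therefore shows the original rule (1 day, then 6 days, then the previous interval times the easiness factor), not the platform's variant.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0    # successful reviews in a row (n)
    easiness: float = 2.5   # easiness factor (EF), never below 1.3
    interval_days: int = 0  # days until the next review (I)


def sm2_review(state, quality):
    """Apply one review with recall quality 0-5 and return the new card state."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.easiness)
        reps = state.repetitions + 1
    else:
        reps, interval = 0, 1  # lapse: restart the sequence, review again tomorrow
    # EF rises by 0.1 for a perfect recall and falls for weaker ones.
    ef = state.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(reps, max(1.3, ef), interval)
```

Three perfect reviews of a new card give intervals of 1, 6, and 16 days. A lapse then resets the interval to 1 day but keeps a lowered easiness factor, so the card climbs back more slowly.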
Decks written by the content team cover exam-specific vocabulary, mathematical formulas, legal principles, medical mnemonics, or whatever knowledge category the target exam tests. Students can also build their own decks from their notes with a simple card editor. Deck organisation mirrors the exam's domain structure so students can drill by content area aligned to their weak topic list from practice test analytics. Progress tracking by deck and domain shows the percentage of cards in each mastery tier (new, learning, young, mature), so students have a clear picture of knowledge coverage before the exam. Leech cards (cards rated poorly many consecutive times) are flagged for instructor review rather than continuing to frustrate the student with repetition that isn't working.
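One possible way to derive the mastery tiers and leech flags from per-card review state. The 21-day young/mature cut-off follows Anki's convention. The "learning" rule and the leech threshold are hypothetical values chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

MATURE_INTERVAL_DAYS = 21    # assumed cut-off, following Anki's young/mature convention
LEECH_CONSECUTIVE_FAILS = 4  # hypothetical threshold; tune per deck


@dataclass
class Card:
    card_id: str
    reviews: int = 0
    interval_days: int = 0
    consecutive_fails: int = 0


def tier(card):
    """Classify a card into new / learning / young / mature."""
    if card.reviews == 0:
        return "new"
    if card.interval_days <= 1:  # assumed: still on day-to-day relearning steps
        return "learning"
    if card.interval_days < MATURE_INTERVAL_DAYS:
        return "young"
    return "mature"


def deck_progress(cards):
    """Percentage of cards in each mastery tier."""
    counts = Counter(tier(c) for c in cards)
    n = len(cards)
    return {t: 100.0 * counts[t] / n for t in ("new", "learning", "young", "mature")}


def leeches(cards):
    """IDs of cards to route to instructor review instead of further drilling."""
    return [c.card_id for c in cards if c.consecutive_fails >= LEECH_CONSECUTIVE_FAILS]
```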
Personalised study plan generated from four inputs: the student's exam date, current diagnostic or practice test score, target score, and available study hours per week. The plan algorithm calculates the total study hours available before the exam, allocates them across content domains weighted by the student's measured weak areas from analytics, and assigns daily tasks that are achievable within the time blocks the student specified. A student targeting a 700 on the GMAT with 8 weeks and 15 hours per week sees a different daily task sequence than a student with the same target but 4 weeks and 20 hours.
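The first two steps of the plan algorithm (total hours available, then allocation weighted by weak areas) can be sketched as follows. Weighting by error rate (1 minus accuracy) and the small floor weight for strong domains are assumptions. A real plan might also factor in the current-to-target score gap and the exam's own domain weights.

```python
from datetime import date


def total_study_hours(today: date, exam_date: date, hours_per_week: float) -> float:
    """Hours available between today and the exam at the student's weekly budget."""
    days_left = (exam_date - today).days
    return max(days_left, 0) / 7 * hours_per_week


def allocate_hours(total_hours, accuracy_by_domain, floor_weight=0.05):
    """Split hours across domains in proportion to (1 - accuracy).

    floor_weight (assumed) keeps a small maintenance share for strong domains.
    """
    weights = {d: max(1.0 - acc, floor_weight) for d, acc in accuracy_by_domain.items()}
    total_w = sum(weights.values())
    return {d: total_hours * w / total_w for d, w in weights.items()}
```

For the 8-week, 15-hours-per-week student, this gives 120 hours to distribute. With 50% quant accuracy and 75% on two other domains, quant would receive half of those hours.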
Dynamic plan adjustment runs after each practice session -- if a student's accuracy in a topic crosses a mastery threshold, the plan reallocates the hours freed up toward remaining gap areas rather than continuing to drill a topic already under control. This prevents students from over-studying comfortable content while leaving real weaknesses unaddressed. Calendar integration exports study sessions to Google Calendar or Outlook so reminders arrive in the student's normal scheduling workflow. Push notifications and email reminders fire at the scheduled session start time. Progress-to-target tracking shows whether the student is on pace to reach their score goal given current improvement velocity, updated daily.
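The reallocation step might look like this: hours left on topics that cross the mastery threshold are handed to the remaining gap topics in proportion to their error rates. The 0.85 threshold and the function name are hypothetical.

```python
MASTERY_THRESHOLD = 0.85  # hypothetical accuracy at which a topic counts as mastered


def reallocate(remaining_hours, accuracy_by_domain, threshold=MASTERY_THRESHOLD):
    """Move hours from mastered domains to gap domains, weighted by error rate."""
    mastered = {d for d, acc in accuracy_by_domain.items() if acc >= threshold}
    freed = sum(remaining_hours[d] for d in mastered)
    gaps = {d: 1.0 - acc for d, acc in accuracy_by_domain.items() if d not in mastered}
    if not gaps:
        return dict(remaining_hours)  # everything mastered: leave the plan unchanged
    total_gap = sum(gaps.values())
    plan = {d: 0.0 for d in mastered}
    for d, g in gaps.items():
        plan[d] = remaining_hours[d] + freed * g / total_gap
    return plan
```

The total number of planned hours is preserved, so the student's weekly commitment doesn't change. Only its distribution shifts toward the remaining weaknesses.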
Native iOS and Android apps, or a cross-platform React Native app, deliver the full practice test, flashcard, and analytics experience on mobile. For platforms integrating with the Proctorio or ExamSoft APIs, the mobile app also supports proctored practice tests, giving students who want to replicate test-day conditions a controlled-environment practice session. Proctoring features include camera-based identity verification, screen recording, and behavioural monitoring during the session.
Offline content download lets students pre-cache practice questions, flashcard decks, and study plan tasks before going offline -- useful for commute study and low-connectivity situations. Completed offline attempts sync back to the server when connectivity is restored, updating performance analytics and study plan progress without requiring manual action from the student. Push notifications handle daily study reminders, streak maintenance, new content releases, and score milestone celebrations. Cross-device progress sync ensures a student who switches between phone and desktop sees identical analytics, flashcard progress, and study plan status everywhere. Biometric authentication (Face ID, fingerprint) provides fast sign-in without entering credentials on each session.
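The offline-then-sync behaviour usually relies on device-generated IDs so that a retried upload can't double-count an attempt. The sketch below is an in-memory illustration. SyncQueue, Server, and their methods are hypothetical stand-ins for the app's local store and backend API.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class OfflineAttempt:
    attempt_id: str  # generated on the device, so retries are idempotent
    student_id: str
    question_id: str
    correct: bool


class Server:
    """Hypothetical stand-in for the backend."""

    def __init__(self):
        self.attempts = {}

    def submit(self, attempt):
        # Idempotent insert: resending an attempt after a dropped connection is a no-op.
        self.attempts.setdefault(attempt.attempt_id, attempt)


class SyncQueue:
    def __init__(self):
        self.pending = []

    def record(self, student_id, question_id, correct):
        """Store an attempt locally, whether or not the device is online."""
        a = OfflineAttempt(str(uuid.uuid4()), student_id, question_id, correct)
        self.pending.append(a)
        return a

    def flush(self, server, online):
        """Send queued attempts in order when online; keep them queued otherwise."""
        if not online:
            return 0
        sent = 0
        while self.pending:
            server.submit(self.pending[0])  # if this raises, the attempt stays queued
            self.pending.pop(0)
            sent += 1
        return sent
```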
Frequently asked questions
We've built test prep platforms for standardised academic tests, professional certification exams, and licensing tests across multiple markets. Exam categories include graduate admissions tests (GMAT, GRE, LSAT), English proficiency tests (IELTS, TOEFL), medical licensing (USMLE, PLAB), professional certifications (CPA, CFA, PMP, AWS, CompTIA), and national curriculum exams across multiple countries.
The underlying architecture (adaptive engine, question bank, analytics) is the same across exam types. What changes is the question format support, exam interface conventions, and scoring model for each specific test. GMAT adaptive logic targets the official GMAT scoring scale and mirrors the exam's question-level adaptive structure. USMLE platforms require clinical vignette question formatting and a different psychometric calibration profile than multiple-choice factual recall. Professional certification exams from bodies like PMI, AICPA, or AWS have domain weighting specifications that must be reflected in the question sampling logic. We build the question format support, exam interface, and scoring model to match the specific exam your product targets.
An adaptive exam engine selects the next question based on the student's performance on previous questions in the same session. The two main technical approaches are Item Response Theory (IRT) and rule-based adaptation.
IRT, specifically the 3PL model used in high-stakes CAT implementations, models each question's difficulty parameter (b), discrimination parameter (a), and guessing probability (c). During the exam, the engine maintains a running estimate of the student's latent ability (theta) and applies the Fisher Information criterion to select the item from the question bank that provides the most measurement information at the current theta estimate. This lets adaptive tests reach the same measurement precision as fixed-form exams with fewer items. IRT requires a calibrated question bank: typically 300 or more items per domain with pre-equating data from prior administrations.
Rule-based adaptation uses simpler logic: a correct response increases the difficulty tier, and an incorrect response decreases it. Rule-based works well for smaller question banks (under 200 items per domain) or for exams where content coverage across topics matters as much as difficulty targeting. We recommend the right approach based on your question bank size, exam structure, and whether psychometric calibration data is available from previous versions of the assessment.
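The running ability estimate is often computed as an expected a posteriori (EAP) estimate over a grid of theta values. The section doesn't name the estimator, so EAP with a standard normal prior is an assumption here (maximum likelihood is another common choice), and the grid bounds are illustrative. The posterior standard deviation doubles as the measurement error that each new item is meant to shrink.

```python
import math


def p3pl(theta, a, b, c):
    """3PL probability of a correct response (logistic metric)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def eap_theta(responses, n_points=81, lo=-4.0, hi=4.0):
    """EAP ability estimate with an assumed N(0, 1) prior, evaluated on a grid.

    responses: list of ((a, b, c), correct) pairs.
    Returns (theta estimate, posterior standard deviation).
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    log_w = []
    for t in grid:
        lw = -0.5 * t * t  # log of the unnormalised N(0, 1) prior
        for (a, b, c), correct in responses:
            p = p3pl(t, a, b, c)
            lw += math.log(p if correct else 1.0 - p)
        log_w.append(lw)
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]  # rescale to avoid underflow
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)
```

A common design is to alternate this update with maximum-information item selection and stop once the posterior SD falls below a target or a maximum test length is reached.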
Yes. Professional certification and licensing exams often have specific requirements that differ from academic tests: domain weighting specified by the certifying body (for example, the PMP Examination Content Outline allocates defined percentages to the People, Process, and Business Environment domains, with questions spanning predictive, agile, and hybrid approaches), pass/fail scoring with scaled cut scores rather than percentile rankings, scenario-based or performance-based questions that don't fit a standard multiple-choice format, and content security requirements that restrict question exposure.
We build the question bank structure with Bloom's taxonomy tagging aligned to the exam's knowledge framework, the exam interface matching the proctored testing environment as closely as the platform allows, and the scoring model to match the specific pass/fail or scaled score methodology the certifying body uses. For exams with domain weighting specifications, the question sampling logic enforces the required domain distribution in each practice test. We've built platforms for medical, legal, financial, and technology certification exams. Content security controls -- item exposure rate limits, scrambled answer order, and session-level question deduplication -- are included where the certifying body's guidelines require them.
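Enforcing a blueprint's domain distribution means converting percentages into whole item counts that still sum to the test length. The sketch below uses the largest-remainder method for that step, then draws items per domain while skipping IDs already served in the session. The method choice and the function names are assumptions, not a procedure prescribed by any certifying body.

```python
import math
import random


def domain_quotas(weights, n_items):
    """Turn blueprint percentages into integer item counts (largest-remainder method)."""
    total = sum(weights.values())
    raw = {d: n_items * w / total for d, w in weights.items()}
    quotas = {d: math.floor(x) for d, x in raw.items()}
    leftover = n_items - sum(quotas.values())
    # Hand the remaining items to the domains with the largest fractional parts.
    for d in sorted(raw, key=lambda d: raw[d] - quotas[d], reverse=True)[:leftover]:
        quotas[d] += 1
    return quotas


def sample_test(bank_by_domain, weights, n_items, seen_ids=frozenset(), rng=random):
    """Draw a practice test that meets the domain quotas, skipping items
    already served to the student in this session (deduplication)."""
    test = []
    for domain, k in domain_quotas(weights, n_items).items():
        pool = [q for q in bank_by_domain[domain] if q not in seen_ids]
        test.extend(rng.sample(pool, k))  # raises ValueError if the pool is too small
    rng.shuffle(test)  # interleave domains so the blueprint order isn't visible
    return test
```

`random.sample` raising `ValueError` when a domain's pool is smaller than its quota is a useful early signal that the bank is too thin for that blueprint.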
A core test prep platform -- question bank with Bloom's taxonomy tagging, timed practice tests, basic per-topic performance reporting, and a mobile-responsive web app -- typically runs $25,000--$55,000. This delivers a functional product you can use with real students and gather usage data from.
A full-featured platform with a CAT adaptive engine using IRT 3PL scoring, psychometric analytics (Cronbach's alpha, item-total correlation), per-topic accuracy and pacing dashboards, percentile ranking against cohort, AI-generated answer explanations with step-by-step reasoning, spaced repetition with SM-2 scheduling, personalised study plan generation with target score tracking, proctoring integration via Proctorio or ExamSoft API, and native iOS and Android apps typically runs $55,000--$130,000. Cost depends on question type complexity, adaptive engine sophistication, analytics depth, proctoring requirements, and mobile app scope. We scope every project before pricing it and provide a fixed cost proposal before development starts.
What clients say
Three-year average engagement. Founders and operators describing the work in their own words. No marketing varnish.

RaftLabs has been an exceptional partner. From the start, they became more than just a service provider, they embraced our vision with their expertise and dedication.