Undetect Research
Research Intelligence

Decompiling Kasada's JavaScript VM

A thesis-level review of a Kasada JavaScript virtual-machine decompiler, focused on bytecode decoding, dynamic interpreter analysis, IR construction, control-flow recovery, and the limits of the evidence.

Paper analyzed: Decompilation of Kasada's JavaScript Virtual Machine, Bachelor's Thesis Computing Science, January 6, 2025.

Source type: Thesis (Bachelor's thesis in Computing Science, dated January 6, 2025)

PDF pages: 47 (source PDF length reported by pdfinfo)

Figure examples: 5 (CFG, loop, conditional, and traversal figures extracted for review)

Benchmark: None (no corpus size, accuracy rate, or runtime measurement is reported in the PDF)

1. Executive Summary

Patrick van den Bosch's thesis describes a decompiler for Kasada's browser-side JavaScript virtual machine. The work treats the protection as a bytecode and interpreter problem: recover the bytecode parser, map per-instance control signals and opcode order, disassemble into an intermediate representation, reconstruct control flow, and generate readable JavaScript.

The strongest contribution is the decomposition of a VM-obfuscated anti-bot script into reviewable layers. The paper does not present a broad measurement of current Kasada deployments; it presents an implementation narrative with worked examples for decoding, value parsing, CFG construction, loop and conditional structuring, exception handling, and generated output.

Operational takeaway: VM-based bot protection should be reviewed as a moving interpreter plus bytecode system. Fixed opcode numbers, fixed value markers, or linear bytecode scans are brittle assumptions under the thesis model.

2. Research Question

The thesis asks whether classical decompilation techniques can be adapted to Kasada's JavaScript VM so that custom bytecode can be translated back into structured, readable JavaScript.

Interpreter discovery

Locate the bytecode parser, control-signal array, and opcode function array inside the obfuscated JavaScript instance.

Bytecode semantics

Convert encoded bytecode into values, instructions, registers, memory operations, jumps, function boundaries, and exception targets.

Graph recovery

Build function-level CFGs that preserve converged branches, loop edges, conditional exits, returns, catch handlers, and finally handlers.

Readable output

Structure the graph back into JavaScript constructs such as while, do-while, if, if-else, try, catch, and finally.

3. Target System

The paper describes Kasada's client-side protection as a browser VM script that collects signals and posts them for token generation, plus a second script that attaches that token to outgoing requests. This report focuses on the VM and bytecode layer, not on server-side scoring or token validation.

| Component | Paper description | Why it matters for review |
| --- | --- | --- |
| Bytecode array | A sequence of 32-bit signed integers produced from an encoded string. | The decompiler must recover instruction boundaries and literal values before higher-level analysis is possible. |
| Bytecode parser | A JavaScript function that interprets primitive values, strings, booleans, null, undefined, doubles, and register references. | The parser changes across instances, so the analysis must infer behavior from the current script. |
| Control-signal array | A per-instance marker array used by the parser to distinguish value types. | Value markers should be treated as instance-local, not fixed Kasada-wide constants. |
| Opcode array | An array of opcode functions whose order varies by VM instance. | Opcode identification must come from function behavior and position analysis, not stable numeric opcode IDs. |
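
Because handler order varies, one way to picture behavior-based identification is to fingerprint each handler by what it does rather than where it sits. The sketch below is a toy: the four reference operations, the probe inputs, and the two shuffled "instances" are assumptions for illustration. It probes handlers dynamically, whereas the thesis describes analyzing the opcode function array in the parsed script, so treat it as an analogy rather than the author's method.

```javascript
// Reference semantics for a toy VM (hypothetical; not Kasada's instruction set).
const reference = {
  add: (a, b) => a + b,
  sub: (a, b) => a - b,
  mul: (a, b) => a * b,
  eq: (a, b) => a === b
};

// Probe inputs chosen so that every reference operation yields a distinct result list.
const probes = [[6, 3], [5, 5]];

// A handler's fingerprint is the list of results it produces on the probes.
function fingerprint(fn) {
  return JSON.stringify(probes.map(([a, b]) => fn(a, b)));
}

// Map each position in a handler array to the reference operation it behaves like.
function identifyOpcodes(handlers) {
  const known = new Map(
    Object.entries(reference).map(([name, fn]) => [fingerprint(fn), name])
  );
  return handlers.map((fn) => known.get(fingerprint(fn)) ?? "unknown");
}

// Two toy "instances" with the same operations in different positions.
const instanceA = [reference.mul, reference.add, reference.eq, reference.sub];
const instanceB = [(x, y) => x - y, (x, y) => x === y, (x, y) => y + x, (x, y) => x * y];

const mapA = identifyOpcodes(instanceA);
const mapB = identifyOpcodes(instanceB);
console.log({ mapA, mapB });
```

The same numeric position resolves to a different operation in each toy instance, which is why a fixed opcode table would not carry over.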

4. Decompiler Pipeline

The thesis uses SWC to parse the obfuscated VM JavaScript into an AST and then analyzes that AST to locate interpreter components. After the bytecode is decoded and parsed, the disassembler emits a human-readable intermediate representation and builds basic blocks from reachable instruction pointers.

| Stage | Paper method | Review note |
| --- | --- | --- |
| Decode bytecode string | Use a custom alphabet and delimiter-like offset to turn encoded characters into integer bytecode values. | Section 3.3 is a useful example of treating obfuscation as a parsing problem rather than a string-cleanup problem. |
| Parse values | Map integers, doubles, strings, booleans, null, undefined, and register references from control-signal-driven parser branches. | The sample mapping is explicitly an instance mapping, not a fixed signature. |
| Identify opcodes | Analyze the opcode function array and infer opcode assignments for the current VM instance. | This protects the decompiler against the opcode-order randomization described in the thesis. |
| Disassemble to IR | Represent assignments, property operations, memory cells, arrays, functions, branches, returns, throws, and exception operations. | The IR is the bridge between VM bytecode and high-level JavaScript code generation. |
| Discover basic blocks | Traverse reachable targets from the entrypoint, queueing jumps, function definitions, catch targets, and finally targets. | The queue-based discovery avoids assuming source bytecode order is execution order. |
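
The queue-based discovery in the last row can be sketched as a worklist over instruction pointers. This is a minimal illustration, assuming a made-up instruction shape (`op`, `target`, `next`) and only three terminators; function definitions, catch targets, and finally targets from the thesis would be queued the same way but are left out.

```javascript
// Toy program keyed by instruction pointer. The { op, target, next } shape is
// hypothetical and is not the thesis's IR or Kasada bytecode.
const program = new Map([
  [0, { op: "assign", next: 1 }],
  [1, { op: "branch", target: 4, next: 2 }],
  [2, { op: "assign", next: 3 }],
  [3, { op: "jump", target: 5 }],
  [4, { op: "assign", next: 5 }],
  [5, { op: "return" }],
  [6, { op: "assign", next: 7 }], // never referenced, so never visited
  [7, { op: "return" }]
]);

function discoverLeaders(program, entry) {
  const leaders = new Set([entry]); // instruction pointers that start a block
  const seen = new Set();           // every instruction pointer reached
  const queue = [entry];

  while (queue.length > 0) {
    let ip = queue.shift();
    // Follow straight-line code until a terminator or already-seen code.
    while (ip !== undefined && !seen.has(ip)) {
      seen.add(ip);
      const ins = program.get(ip);
      if (!ins) throw new Error(`No instruction at ${ip}`);

      if (ins.op === "jump") {
        leaders.add(ins.target);
        queue.push(ins.target);
        ip = undefined;
      } else if (ins.op === "branch") {
        leaders.add(ins.target).add(ins.next);
        queue.push(ins.target, ins.next);
        ip = undefined;
      } else if (ins.op === "return") {
        ip = undefined;
      } else {
        ip = ins.next;
      }
    }
  }

  const sorted = (set) => [...set].sort((a, b) => a - b);
  return { leaders: sorted(leaders), reachable: sorted(seen) };
}

const discovery = discoverLeaders(program, 0);
console.log(discovery);
// → { leaders: [ 0, 2, 4, 5 ], reachable: [ 0, 1, 2, 3, 4, 5 ] }
```

Instruction 5 becomes a leader only when the jump at 3 is processed, after the walk from 4 has already passed through it. A block builder would split at that point, and instructions 6 and 7 never appear because nothing reachable targets them.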

5. Control Flow Recovery

Control-flow recovery is the center of the thesis. The decompiler builds per-function CFGs and then structures loops, exception handlers, and conditionals before code generation.

Figure 5.1 crop: a function CFG where repeated conditional checks converge on a return node.
Figure 5.2 crop: a CFG for a try-catch-finally structure, showing exception edges, finally flow, and a loop inside the try body.
  • Loop analysis. The thesis starts with Johnson's algorithm for elementary circuits, then classifies loops as while, do-while, or infinite-loop-with-break patterns based on header, latch, condition, and external-exit structure.
  • Exception analysis. Exception handlers are structured before ordinary conditionals because catch and finally edges can otherwise be misread as branch edges.
  • Conditional analysis. The decompiler uses a Cifuentes-style structuring approach and dominance information to find one-way and two-way conditional regions and their follow nodes.
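
To make the loop-analysis step concrete, the sketch below enumerates elementary circuits in a small toy CFG. It is deliberately not Johnson's algorithm: it is a plain depth-first enumeration that reports each cycle once, rooted at its smallest node, and it can take exponential time on larger graphs. The graph, node numbering, and function name are assumptions for illustration.

```javascript
// Toy CFG as an adjacency list: an outer loop 1 -> 2 -> 3 -> 4 -> 1
// with an inner loop 2 -> 3 -> 2. Node 5 is the loop exit.
const cfg = { 0: [1], 1: [2, 5], 2: [3], 3: [2, 4], 4: [1], 5: [] };

// Simplified stand-in for Johnson's algorithm: enumerate elementary cycles by
// DFS, only visiting nodes larger than the start so each cycle appears once.
function elementaryCycles(graph) {
  const nodes = Object.keys(graph).map(Number).sort((a, b) => a - b);
  const cycles = [];

  for (const start of nodes) {
    const path = [start];
    const onPath = new Set([start]);

    (function dfs(node) {
      for (const next of graph[node] ?? []) {
        if (next === start) {
          cycles.push([...path]); // closed a circuit back to the start node
        } else if (next > start && !onPath.has(next)) {
          path.push(next);
          onPath.add(next);
          dfs(next);
          path.pop();
          onPath.delete(next);
        }
      }
    })(start);
  }

  return cycles;
}

const cycles = elementaryCycles(cfg);
console.log(cycles);
// → [ [ 1, 2, 3, 4 ], [ 2, 3 ] ]
```

Each reported cycle is a candidate loop region; deciding whether it is a while, do-while, or infinite loop is a separate classification step.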
Figure 6.1 crop: loop patterns used to classify do-while, infinite, and while loops.
Figure 6.2 crop: one-way and two-way conditional patterns used during structuring.
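
Conditional structuring relies on dominance information, so a small dominator computation helps show what a follow-node search works from. The sketch below follows the iterative scheme usually attributed to Cooper, Harvey, and Kennedy (reverse postorder plus an `intersect` walk). The toy graph and function name are assumptions, and the thesis's own follow-node rules are not reproduced here.

```javascript
// Toy CFG: block 1 is a two-way conditional whose branches (2 and 3) rejoin at 4.
const diamond = { 0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [] };

function immediateDominators(graph, entry) {
  // Number reachable nodes in reverse postorder.
  const order = [];
  const visited = new Set();
  (function dfs(node) {
    visited.add(node);
    for (const next of graph[node] ?? []) if (!visited.has(next)) dfs(next);
    order.push(node);
  })(entry);
  order.reverse();
  const rpo = new Map(order.map((node, index) => [node, index]));

  // Predecessor lists restricted to reachable nodes.
  const preds = new Map(order.map((node) => [node, []]));
  for (const node of order) for (const next of graph[node] ?? []) preds.get(next).push(node);

  const idom = new Map([[entry, entry]]);
  // Walk two candidates up the partial dominator tree until they meet.
  const intersect = (a, b) => {
    while (a !== b) {
      while (rpo.get(a) > rpo.get(b)) a = idom.get(a);
      while (rpo.get(b) > rpo.get(a)) b = idom.get(b);
    }
    return a;
  };

  let changed = true;
  while (changed) {
    changed = false;
    for (const node of order.slice(1)) {
      let candidate;
      for (const pred of preds.get(node)) {
        if (!idom.has(pred)) continue; // predecessor not processed yet
        candidate = candidate === undefined ? pred : intersect(pred, candidate);
      }
      if (idom.get(node) !== candidate) {
        idom.set(node, candidate);
        changed = true;
      }
    }
  }
  return idom;
}

const idom = immediateDominators(diamond, 0);
console.log(Object.fromEntries(idom));
```

In this toy graph the join block 4 is immediately dominated by the conditional block 1 and has two predecessors, which is the kind of node a follow-node search would consider for the `if-else` at block 1.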

6. Evidence

The thesis evidence is qualitative. It includes worked bytecode and parser examples, an IR example, CFG figures, explicit loop and conditional patterns, a try-body traversal figure, and a generated JavaScript listing. It does not include a public benchmark or measured coverage across a corpus of Kasada VM instances.

  • Parser mapping example. The paper maps positions 0-5 to number, true, string, false, null, and undefined for one control-signal example.
  • IR example. A function entry block beginning at bytecode position 53 shows register assignments, memory assignments, branches, jumps, and convergence.
  • Generated output. Listing 1 demonstrates generated JavaScript with register declarations, memory assignment, fallback calls, and a final return.
  • Evaluation scope. Chapter 9 claims a working decompiler and successful reconstruction of nested loops and exception handlers, without quantitative metrics.
Figure 7.1 crop: graph traversal constrained to a try body during code generation, inside try-catch-finally control flow.
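
A traversal constrained to a try body can be pictured as a depth-first walk that refuses to cross into handler blocks. The sketch below is an assumption-heavy illustration: the toy graph, the choice to model the boundary as a set of stop nodes, and the function name are hypothetical, and the thesis's modified DFS may bound the region differently.

```javascript
// Toy CFG: blocks 1-3 form the try body (with a loop 2 <-> 3),
// block 4 is the catch handler, block 5 the finally handler, block 6 the follow.
const tryCfg = { 0: [1], 1: [2], 2: [3, 5], 3: [2], 4: [5], 5: [6], 6: [] };

// Collect the blocks reachable from `entry` without entering any stop node.
function collectRegion(graph, entry, stopNodes) {
  const region = [];
  const seen = new Set(stopNodes); // pre-marking stop nodes keeps the walk inside
  const stack = [entry];

  while (stack.length > 0) {
    const node = stack.pop();
    if (seen.has(node)) continue;
    seen.add(node);
    region.push(node);
    // Push successors in reverse so the first successor is visited first.
    for (const next of [...(graph[node] ?? [])].reverse()) stack.push(next);
  }
  return region;
}

const tryBody = collectRegion(tryCfg, 1, [4, 5]);
console.log(tryBody);
// → [ 1, 2, 3 ]
```

The loop inside the try body stays inside the region, while the edge from block 2 to the finally handler is where generation of the try body stops.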

7. Code Artifacts

These local review helpers are reconstructed or modernized from the thesis descriptions. They are not the author's original decompiler and do not reproduce Kasada production bytecode.

Custom alphabet bytecode decoder (modernized from Section 3.3)
Purpose: Demonstrate the custom alphabet and delimiter-style decoding scheme used to turn an encoded bytecode string into integer values.
Paper basis: Section 3.3, pages 16-17.
Caveat: The sample encoder creates local test input. It does not reproduce Kasada's actual alphabet, delimiter, string handling, or bytecode.
;(async () => {
  const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  const delimiter = 40;
  const base = alphabet.length - delimiter;

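  function encodeNumber(value) {
    // The toy encoder only supports non-negative integers; anything else
    // would index outside the alphabet and silently emit an empty string.
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError("encodeNumber expects a non-negative integer");
    }

    const chars = [];
    let remaining = value;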

    while (remaining >= delimiter) {
      const charIndex = delimiter + ((remaining - delimiter) % base);
      chars.push(alphabet[charIndex]);
      remaining = (remaining - charIndex) / base;
    }

    chars.push(alphabet[remaining]);
    return chars.join("");
  }

  function decodeBytecode(encoded) {
    const values = [];

    for (let i = 0; i < encoded.length; ) {
      let value = 0;
      let multiplier = 1;

      while (i < encoded.length) {
        const charIndex = alphabet.indexOf(encoded[i++]);
        if (charIndex < 0) throw new Error("Character is not in alphabet");

        value += multiplier * (charIndex % delimiter);
        if (charIndex < delimiter) break;

        value += delimiter * multiplier;
        multiplier *= base;
      }

      values.push(value | 0);
    }

    return values;
  }

  const sampleValues = [1, 14, 127, 2048];
  const encoded = sampleValues.map(encodeNumber).join("");
  const result = { encoded, decoded: decodeBytecode(encoded), sampleValues };

  console.log(result);
  return result;
})()
Control-signal value reader (reconstructed from Sections 3.4.1-3.4.3)
Purpose: Show how a bytecode reader can interpret small integers, typed constants, strings, booleans, null, undefined, and register references through a per-instance control-signal array.
Paper basis: Sections 3.4.1-3.4.3, pages 17-20.
Caveat: This uses toy control signals and a toy stream. It demonstrates parser shape, not Kasada's complete bytecode format.
;(async () => {
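  // Toy per-instance markers by position: 0 number (unused by this reader),
  // 1 true, 2 string, 3 false, 4 null, 5 undefined.
  const controls = [6, 18, 14, 2, 20, 40];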
  const stream = [
    (21 << 1) | 1,
    controls[1],
    controls[3],
    controls[4],
    controls[5],
    controls[2],
    2,
    88,
    107,
    7 << 5
  ];

  function decodeStringInteger(k) {
    return (k & 0xffffffc0) | ((k * 59) & 0x3f);
  }

  function readValue(state) {
    const r = state.stream[state.ip++];

    if (r & 1) return { type: "integer", value: r >> 1 };
    if (r === controls[1]) return { type: "boolean", value: true };
    if (r === controls[3]) return { type: "boolean", value: false };
    if (r === controls[4]) return { type: "null", value: null };
    if (r === controls[5]) return { type: "undefined", value: undefined };

    if (r === controls[2]) {
      const length = state.stream[state.ip++];
      let value = "";

      for (let i = 0; i < length; i++) {
        value += String.fromCharCode(decodeStringInteger(state.stream[state.ip++]));
      }

      return { type: "string", value };
    }

    return { type: "register", value: `r${r >> 5}` };
  }

  const state = { ip: 0, stream };
  const decoded = [];
  while (state.ip < state.stream.length) decoded.push(readValue(state));

  const result = { controls, stream, decoded };
  console.log(result);
  return result;
})()
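
The toy reader above does not cover doubles, which the report lists among the value types the parser handles. As a minimal sketch of the arithmetic involved, the block below rebuilds an IEEE 754 double from two 32-bit words, including the NaN and Infinity cases, and cross-checks the result against `DataView`. The word order (high word first) and both function names are assumptions; the thesis's actual double encoding is not reproduced in this report.

```javascript
// Rebuild a double from its high and low 32-bit words using explicit
// sign / exponent / mantissa arithmetic.
function doubleFromWords(high, low) {
  const sign = high >>> 31 ? -1 : 1;
  const exponent = (high >>> 20) & 0x7ff;
  const mantissa = (high & 0xfffff) * 2 ** 32 + (low >>> 0);

  if (exponent === 0x7ff) return mantissa ? NaN : sign * Infinity;
  if (exponent === 0) return sign * mantissa * 2 ** -1074; // zero or subnormal
  return sign * (1 + mantissa * 2 ** -52) * 2 ** (exponent - 1023);
}

// Same conversion through a typed buffer, used here as a cross-check.
function doubleFromWordsViaDataView(high, low) {
  const view = new DataView(new ArrayBuffer(8));
  view.setInt32(0, high); // DataView writes big-endian by default
  view.setInt32(4, low);
  return view.getFloat64(0);
}

const wordPairs = [
  [0x3ff80000, 0],          // 1.5
  [0xc0000000 | 0, 0],      // -2, high word given as a signed 32-bit integer
  [0x400921fb, 0x54442d18], // Math.PI
  [0x7ff00000, 0],          // Infinity
  [0x7ff80000, 0]           // NaN
];

const doubles = wordPairs.map(([high, low]) => ({
  manual: doubleFromWords(high, low),
  viaDataView: doubleFromWordsViaDataView(high, low)
}));
console.log(doubles);
```

The manual version mirrors what a parser working on signed 32-bit bytecode integers has to do; the `DataView` version is only there to confirm the arithmetic.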
Simple loop pattern classifier (inferred from Section 6.1 and Figure 6.1)
Purpose: Convert the loop-pattern descriptions into a compact classifier for already-isolated toy loop shapes.
Paper basis: Section 6.1 and Figure 6.1, pages 31-32.
Caveat: The real decompiler first finds cycles with Johnson's algorithm and reasons over a full CFG. This snippet only classifies simplified patterns.
;(async () => {
  function classifyLoop(pattern) {
    if (pattern.headerHasCondition && pattern.latchJumpsToHeader) return "while";
    if (pattern.latchHasCondition && pattern.latchJumpsToHeader) return "do-while";
    if (pattern.latchJumpsToHeader && pattern.hasExternalBreak) return "infinite-with-break";
    if (pattern.latchJumpsToHeader) return "infinite";
    return "not-a-loop";
  }

  const samples = {
    whileLoop: {
      headerHasCondition: true,
      latchHasCondition: false,
      latchJumpsToHeader: true,
      hasExternalBreak: false
    },
    doWhileLoop: {
      headerHasCondition: false,
      latchHasCondition: true,
      latchJumpsToHeader: true,
      hasExternalBreak: false
    },
    infiniteLoop: {
      headerHasCondition: false,
      latchHasCondition: false,
      latchJumpsToHeader: true,
      hasExternalBreak: true
    }
  };

  const result = Object.fromEntries(
    Object.entries(samples).map(([name, pattern]) => [name, classifyLoop(pattern)])
  );

  console.log(result);
  return result;
})()
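
The classifier above takes its four flags as given. One way those flags might be derived from a CFG cycle is sketched below, assuming the cycle is listed from header to latch and that a "condition" means a block with a successor outside the loop. Both assumptions, and the function name, are illustrative and are not taken from the thesis.

```javascript
// Derive loop-pattern flags from a toy CFG and one cycle through it.
// Assumes cycle[0] is the header and the last entry is the latch.
function describeLoop(graph, cycle) {
  const inLoop = new Set(cycle);
  const header = cycle[0];
  const latch = cycle[cycle.length - 1];
  // A block "exits" if at least one successor lies outside the cycle.
  const exits = (node) => (graph[node] ?? []).some((next) => !inLoop.has(next));

  return {
    headerHasCondition: exits(header),
    latchHasCondition: latch !== header && exits(latch),
    latchJumpsToHeader: (graph[latch] ?? []).includes(header),
    hasExternalBreak: cycle.slice(1, -1).some(exits)
  };
}

// while-shaped loop: the header (1) tests the condition and exits to 4.
const whileCfg = { 1: [2, 4], 2: [3], 3: [1], 4: [] };
// do-while-shaped loop: the latch (2) tests the condition and exits to 3.
const doWhileCfg = { 1: [2], 2: [1, 3], 3: [] };

const whileFlags = describeLoop(whileCfg, [1, 2, 3]);
const doWhileFlags = describeLoop(doWhileCfg, [1, 2]);
console.log({ whileFlags, doWhileFlags });
```

With these toy graphs the derived flags match the `whileLoop` and `doWhileLoop` samples used by the classifier.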

8. Limitations

  • No data flow analysis. The paper's explicit main limitation is the absence of DFA, which would improve data relationship tracking, variable naming, compound expression identification, and generated-code optimization.
  • No quantitative evaluation. The PDF does not report a corpus size, number of VM instances, opcode-identification accuracy, CFG-recovery accuracy, JavaScript equivalence tests, runtime, or scalability measurements.
  • No public implementation link. The thesis says sensitive implementation details are intentionally withheld, so the decompiler cannot be replicated from the PDF alone.
  • Current-production scope is unverified. Treat the Kasada details as the thesis author's analyzed system description, not as an independently validated statement about Kasada's current production behavior.
  • Generated JavaScript remains qualitative evidence. The example output shows feasibility, but the PDF does not prove semantic equivalence across a broad set of bytecode programs.

9. Reviewer Notes

The extracted source identifies the work as Patrick van den Bosch, Decompilation of Kasada's JavaScript Virtual Machine, Bachelor's Thesis Computing Science, January 6, 2025. The PDF has no DOI, arXiv URL, public repository, or complete institution metadata in the extracted title-page text.

  • Keep the page in Draft until a reviewer confirms the thesis label, venue/institution wording, and PDF download availability.
  • Do not convert the working-decompiler claim into a broad statement about Kasada VM coverage or current anti-bot bypass capability.
  • Scope dynamic opcode and control-signal variation to the analyzed VM instances described by the thesis.
  • The source PDF path was normalized to content/pdf/2025 - Decompilation of Kasadas JavaScript Virtual Machine.pdf; the display title preserves the apostrophe in Kasada's.
Appendix
Updated 2026-06-20

Glossary

Linked terms for reviewing the Kasada JavaScript VM decompilation thesis: browser-side signal collection, VM-based obfuscation, bytecode parsing, control-flow recovery, decompilation limits, and tooling.
DOM and Browser APIs
Document Object Model (DOM)
Context: The raw thesis text lists DOM as a browser capability that a JavaScript VM can reach through its host JavaScript interface or FFI.
Meaning: The DOM is the browser document API that exposes HTML, SVG, and XML documents as objects and nodes that scripts can inspect or modify.
HTTP cookies and request headers
Context: The report notes that Kasada's second client-side script attaches a returned token to later requests either as a header or as a cookie.
Meaning: HTTP cookies store small name/value records for later requests, while HTTP headers carry request and response metadata such as authentication or custom token fields.
JavaScript try...catch...finally
Context: The decompiler converts VM exception-handler instruction pointers into JavaScript try, catch, and finally structures.
Meaning: `try...catch...finally` is JavaScript's statement form for running protected code, handling thrown exceptions, and executing cleanup code before control leaves the construct.
JavaScript Number, NaN, and Infinity
Context: The bytecode parser example reconstructs IEEE 754-style floating-point values and handles NaN and Infinity cases explicitly.
Meaning: JavaScript `Number` values are double-precision floating-point values, with special numeric values such as `NaN` and positive or negative `Infinity`.
JavaScript bitwise operators
Context: The reconstructed parser uses bitwise tests and shifts to decode small integers, string characters, and floating-point components from 32-bit integer bytecode values.
Meaning: JavaScript bitwise operators convert Number operands to 32-bit integers and operate on those bits, which makes them useful for compact encodings and binary parsing.
Browser Environment and Obfuscation Concepts
Browser signals and browser fingerprinting
Context: The thesis describes a browser-side VM script that collects signals used to distinguish human visitors from automated systems.
Meaning: Browser fingerprinting combines observable browser, device, network, or environment characteristics; a bot-protection signal can be one such observable input to a server-side decision.
Browser automation / WebDriver
Context: The report tags the thesis as relevant to browser automation and anti-bot research because Kasada's system is framed around automated versus human traffic.
Meaning: Browser automation is programmatic control of a browser; WebDriver is the W3C browser automation protocol and is one common automation surface considered by detection systems.
Custom bytecode
Context: The decompiler decodes an encoded string into an array of 32-bit integers and then interprets those integers as VM instructions and operands.
Meaning: Bytecode is an intermediate instruction encoding for an interpreter or virtual machine; in this thesis it is Kasada-specific rather than browser-engine JavaScript bytecode.
Opcode array and instruction set
Context: The thesis says the order of opcode functions varies between VM instances, so the decompiler infers the current opcode assignments instead of relying on stable opcode numbers.
Meaning: An opcode identifies the operation to execute in an instruction set; an opcode array is a dispatch table from encoded operation identifiers to handler functions.
Control-signal array
Context: The Kasada parser example uses a per-instance array of markers to distinguish numbers, strings, booleans, null, undefined, doubles, and register references.
Meaning: In this report, a control-signal array is the thesis author's name for instance-specific marker values consumed by the bytecode parser; it is not presented as a standard web-platform term.
Needs review: no external canonical reference was found for Kasada-specific control signals; these links explain the related VM-bytecode and JavaScript bit parsing mechanics.
Program Analysis and Decompilation
Abstract Syntax Tree (AST)
Context: The decompiler uses SWC to parse obfuscated JavaScript into an AST and then locates parser, control-signal, and opcode structures in that tree.
Meaning: An AST is a structured tree representation of parsed source code, used by JavaScript tooling to inspect and transform program syntax.
Intermediate Representation (IR)
Context: The thesis converts VM bytecode into a human-readable IR before CFG construction and JavaScript code generation.
Meaning: An IR is a compiler or analysis representation between source and final output; here it models registers, memory cells, expressions, jumps, returns, and exception operations.
Basic block
Context: The disassembler keys basic blocks by bytecode instruction pointer and can split blocks when converged paths share code.
Meaning: A basic block is a straight-line code region with one entry and one exit, commonly used as the node unit for CFG construction.
Control Flow Graph (CFG)
Context: The thesis builds per-function CFGs whose nodes are basic blocks and whose edges represent jumps, conditionals, returns, and exception flows.
Meaning: A CFG is a directed graph representation of possible control transfers through a function or program region.
Control Flow Analysis (CFA)
Context: The control-flow recovery section structures loops, exception handlers, and conditionals so the generator can emit high-level JavaScript constructs.
Meaning: Control-flow analysis reasons over CFG structure to recover or verify possible execution paths and higher-level control regions.
Needs review: the Allen DOI resolves through ACM, which may require library or browser access.
Data Flow Analysis (DFA)
Context: The paper's explicit main limitation is the absence of DFA, which would improve data relationship tracking, variable naming, compound expression recovery, and output optimization.
Meaning: DFA tracks how values, definitions, and uses move through a program; liveness and reaching-definition analyses are common examples.
Dominator tree / dominance
Context: The conditional-structuring algorithm looks for follow nodes using dominance information and Cooper, Harvey, and Kennedy's dominance algorithm.
Meaning: A node dominates another when every path from the function entry to the second node must pass through the first; a dominator tree records immediate dominance relationships.
Breadth-first and depth-first traversal (BFS / DFS)
Context: The disassembler uses a queue/BFS-style traversal to discover reachable instruction pointers, while code generation uses a modified DFS over structured CFG nodes.
Meaning: BFS explores graph nodes by increasing distance from the start node, while DFS follows paths deeply before backtracking.
Johnson's algorithm / elementary circuits
Context: Loop detection starts by finding elementary circuits in the CFG before classifying while, do-while, and infinite-loop patterns.
Meaning: Johnson's algorithm enumerates simple directed cycles; the thesis applies that cycle discovery step to candidate loop regions in a CFG.
Needs review: the SIAM DOI resolves to the publisher landing page, which may challenge or require institutional access.
Cifuentes-style structuring
Context: The conditional reconstruction section describes a Cifuentes-style algorithm for finding one-way and two-way conditionals and their follow nodes.
Meaning: Structuring in decompilation converts low-level jumps and regions back into higher-level constructs such as if, if-else, loops, and exception blocks.
Needs review: this points to Cifuentes' thesis record as a source lead; the specific 1993 structuring paper may require library access.
Tools and Libraries
SWC (Speedy Web Compiler)
Context: The thesis uses SWC to parse obfuscated JavaScript into an AST before identifying bytecode parser and opcode-array structures.
Meaning: SWC is a Rust-based JavaScript and TypeScript compiler platform with parser and transform APIs exposed through packages such as `@swc/core`.
petgraph
Context: The raw thesis text states that the implementation uses Rust's petgraph library for CFG construction and graph manipulation.
Meaning: petgraph is a Rust graph data structure library that provides graph types and traversal/manipulation utilities.