1. Executive Summary
Patrick van den Bosch's thesis describes a decompiler for Kasada's browser-side JavaScript virtual machine. The work treats the protection as a bytecode and interpreter problem: recover the bytecode parser, map per-instance control signals and opcode order, disassemble into an intermediate representation, reconstruct control flow, and generate readable JavaScript.
The strongest contribution is the decomposition of a VM-obfuscated anti-bot script into reviewable layers. The paper does not present a broad measurement of current Kasada deployments; it presents an implementation narrative with worked examples for decoding, value parsing, CFG construction, loop and conditional structuring, exception handling, and generated output.
2. Research Question
The thesis asks whether classical decompilation techniques can be adapted to Kasada's JavaScript VM so that custom bytecode can be translated back into structured, readable JavaScript.
- Interpreter discovery. Locate the bytecode parser, control-signal array, and opcode function array inside the obfuscated JavaScript instance.
- Bytecode semantics. Convert encoded bytecode into values, instructions, registers, memory operations, jumps, function boundaries, and exception targets.
- Graph recovery. Build function-level CFGs that preserve converged branches, loop edges, conditional exits, returns, catch handlers, and finally handlers.
- Readable output. Structure the graph back into JavaScript constructs such as while, do-while, if, if-else, try, catch, and finally.
3. Target System
The paper describes Kasada's client-side protection as a browser VM script that collects signals and posts them for token generation, plus a second script that attaches that token to outgoing requests. This report focuses on the VM and bytecode layer, not on server-side scoring or token validation.
| Component | Paper description | Why it matters for review |
|---|---|---|
| Bytecode array | A sequence of 32-bit signed integers produced from an encoded string. | The decompiler must recover instruction boundaries and literal values before higher-level analysis is possible. |
| Bytecode parser | A JavaScript function that interprets primitive values, strings, booleans, null, undefined, doubles, and register references. | The parser changes across instances, so the analysis must infer behavior from the current script. |
| Control-signal array | A per-instance marker array used by the parser to distinguish value types. | Value markers should be treated as instance-local, not fixed Kasada-wide constants. |
| Opcode array | An array of opcode functions whose order varies by VM instance. | Opcode identification must come from function behavior and position analysis, not stable numeric opcode IDs. |
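The last row of the table can be illustrated with a small sketch. It assumes a made-up `vm` object, four invented handler shapes, and invented opcode names. None of these come from the thesis or from Kasada. It also matches on handler source text for brevity, whereas the thesis works on an SWC-produced AST. The point is only that names are assigned from what a handler does, so the mapping survives a reshuffled array.

```javascript
// Hypothetical opcode handlers shared by two VM instances. The handler shapes,
// the `vm` object, and the opcode names are invented for illustration only.
const handlers = {
  add: function (vm) { vm.regs[vm.read()] = vm.read() + vm.read(); },
  jump: function (vm) { vm.ip = vm.read(); },
  jumpIfFalse: function (vm) { const t = vm.read(); if (!vm.read()) vm.ip = t; },
  getProperty: function (vm) { vm.regs[vm.read()] = vm.read()[vm.read()]; },
};

// Same handlers, different array order per instance.
const instanceA = [handlers.add, handlers.jump, handlers.jumpIfFalse, handlers.getProperty];
const instanceB = [handlers.getProperty, handlers.jumpIfFalse, handlers.add, handlers.jump];

// Ordered fingerprints: the first rule whose predicate matches the handler
// source wins, so the more specific rules come first.
const fingerprints = [
  { name: "JUMP_IF_FALSE", matches: (src) => /if\s*\(\s*!/.test(src) && /vm\.ip\s*=/.test(src) },
  { name: "JUMP", matches: (src) => /vm\.ip\s*=/.test(src) },
  { name: "ADD", matches: (src) => /\)\s*\+\s*vm\.read/.test(src) },
  { name: "GET_PROPERTY", matches: (src) => /\)\s*\[\s*vm\.read/.test(src) },
];

// Map each array position to an opcode name for this instance only.
function identifyOpcodes(opcodeArray) {
  return opcodeArray.map((handler, index) => {
    const src = handler.toString();
    const rule = fingerprints.find((f) => f.matches(src));
    return { index, name: rule ? rule.name : "UNKNOWN" };
  });
}

console.log(identifyOpcodes(instanceA).map((o) => o.name));
console.log(identifyOpcodes(instanceB).map((o) => o.name));
```

Both calls recover the same four names, attached to different indices. Text matching is fragile against real obfuscation, so treat this as a stand-in for structural AST matching, not as a workable detector.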
4. Decompiler Pipeline
The thesis uses SWC to parse the obfuscated VM JavaScript into an AST and then analyzes that AST to locate interpreter components. After the bytecode is decoded and parsed, the disassembler emits a human-readable intermediate representation and builds basic blocks from reachable instruction pointers.
| Stage | Paper method | Review note |
|---|---|---|
| Decode bytecode string | Use a custom alphabet and delimiter-like offset to turn encoded characters into integer bytecode values. | Section 3.3 is a useful example of treating the obfuscation as a parsing problem rather than a string-cleanup problem. |
| Parse values | Map integers, doubles, strings, booleans, null, undefined, and register references from control-signal-driven parser branches. | The sample mapping is explicitly an instance mapping, not a fixed signature. |
| Identify opcodes | Analyze the opcode function array and infer opcode assignments for the current VM instance. | This protects the decompiler against opcode-order randomization described in the thesis. |
| Disassemble to IR | Represent assignments, property operations, memory cells, arrays, functions, branches, returns, throws, and exception operations. | The IR is the bridge between VM bytecode and high-level JavaScript code generation. |
| Discover basic blocks | Traverse reachable targets from the entrypoint, queueing jumps, function definitions, catch targets, and finally targets. | The queue-based discovery avoids assuming source bytecode order is execution order. |
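The queue-based block discovery in the last row can be sketched as follows. The instruction format, opcode names, and field names (`target`, `catchTarget`, `finallyTarget`, `entry`) are hypothetical and chosen only to mirror the edge kinds listed in the table. The thesis does not publish its IR layout. The sketch also assumes that `try` and function-definition instructions fall through to the next instruction.

```javascript
// Hypothetical IR: one instruction per index; opcode and field names are invented.
const program = [
  /* 0 */ { op: "assign" },
  /* 1 */ { op: "jump_if_false", target: 4 },
  /* 2 */ { op: "assign" },
  /* 3 */ { op: "jump", target: 5 },
  /* 4 */ { op: "assign" },
  /* 5 */ { op: "try", catchTarget: 8, finallyTarget: null },
  /* 6 */ { op: "def_function", entry: 10 },
  /* 7 */ { op: "return" },
  /* 8 */ { op: "throw" },
  /* 9 */ { op: "assign" }, // never reached from the entrypoint
  /* 10 */ { op: "return" },
];

const isTerminator = (ins) =>
  ["jump", "jump_if_false", "return", "throw"].includes(ins.op);

// Worklist traversal: start at the entrypoint and queue every discovered target.
function discoverLeaders(code, entry = 0) {
  const leaders = new Set([entry]);
  const visited = new Set();
  const queue = [entry];
  const enqueue = (ip) => {
    if (ip == null || ip < 0 || ip >= code.length) return;
    leaders.add(ip);
    queue.push(ip);
  };
  while (queue.length > 0) {
    let ip = queue.shift();
    // Walk straight-line code until a terminator or already-visited instruction.
    while (ip < code.length && !visited.has(ip)) {
      visited.add(ip);
      const ins = code[ip];
      if (ins.op === "jump") { enqueue(ins.target); break; }
      if (ins.op === "jump_if_false") { enqueue(ins.target); enqueue(ip + 1); break; }
      if (ins.op === "return" || ins.op === "throw") break;
      if (ins.op === "try") { enqueue(ins.catchTarget); enqueue(ins.finallyTarget); }
      if (ins.op === "def_function") enqueue(ins.entry);
      ip += 1;
    }
  }
  return { leaders: [...leaders].sort((a, b) => a - b), visited };
}

// Cut the reachable instructions into blocks at leaders and terminators.
function buildBlocks(code, entry = 0) {
  const { leaders, visited } = discoverLeaders(code, entry);
  const leaderSet = new Set(leaders);
  return leaders.map((start) => {
    let end = start;
    while (
      end + 1 < code.length &&
      visited.has(end + 1) &&
      !leaderSet.has(end + 1) &&
      !isTerminator(code[end])
    ) end += 1;
    return { start, end };
  });
}

console.log(buildBlocks(program));
```

Instruction 9 never appears in any block because nothing reachable points at it. A linear sweep over the array would not give that result, since it assumes array order is execution order.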
5. Control Flow Recovery
Control-flow recovery is the center of the thesis. The decompiler builds per-function CFGs and then structures loops, exception handlers, and conditionals before code generation.
- Loop analysis. The thesis starts with Johnson's algorithm for elementary circuits, then classifies loops as while, do-while, or infinite-loop-with-break patterns based on header, latch, condition, and external-exit structure.
- Exception analysis. Exception handlers are structured before ordinary conditionals because catch and finally edges can otherwise be misread as branch edges.
- Conditional analysis. The decompiler uses a Cifuentes-style structuring approach and dominance information to find one-way and two-way conditional regions and their follow nodes.
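The dominance-based search for follow nodes mentioned in the last bullet can be sketched on a toy graph. The graph, node names, and function names are hypothetical. The follow-node rule is a simplified reading of Cifuentes-style structuring: among nodes whose immediate dominator is the conditional header and that have at least two predecessors, pick the one latest in reverse postorder. The thesis does not spell out its exact rule. The sketch assumes an acyclic region in which loops have already been structured.

```javascript
// Hypothetical CFG as an adjacency list. A and B are two-way conditionals,
// with B nested inside one arm of A.
const cfg = {
  A: ["B", "F"],
  B: ["C", "D"],
  C: ["E"],
  D: ["E"],
  E: ["G"],
  F: ["G"],
  G: [],
};

// Depth-first traversal returning reachable nodes in reverse postorder.
function reversePostorder(succ, entry) {
  const seen = new Set();
  const post = [];
  (function dfs(n) {
    seen.add(n);
    for (const s of succ[n] ?? []) if (!seen.has(s)) dfs(s);
    post.push(n);
  })(entry);
  return post.reverse();
}

// Iterative immediate-dominator computation over reverse postorder.
function analyze(succ, entry) {
  const order = reversePostorder(succ, entry);
  const rank = new Map(order.map((n, i) => [n, i]));
  const preds = new Map(order.map((n) => [n, []]));
  for (const n of order) for (const s of succ[n] ?? []) preds.get(s).push(n);
  const idom = new Map([[entry, entry]]);
  // Walk both nodes up the dominator tree until they meet.
  const intersect = (a, b) => {
    while (a !== b) {
      while (rank.get(a) > rank.get(b)) a = idom.get(a);
      while (rank.get(b) > rank.get(a)) b = idom.get(b);
    }
    return a;
  };
  let changed = true;
  while (changed) {
    changed = false;
    for (const n of order.slice(1)) {
      let candidate = null;
      for (const p of preds.get(n)) {
        if (!idom.has(p)) continue; // predecessor not processed yet
        candidate = candidate === null ? p : intersect(p, candidate);
      }
      if (candidate !== null && idom.get(n) !== candidate) {
        idom.set(n, candidate);
        changed = true;
      }
    }
  }
  return { idom, rank, preds };
}

// Follow node of a conditional header under the simplified rule described above.
function followNode(succ, entry, header) {
  const { idom, rank, preds } = analyze(succ, entry);
  let best = null;
  for (const [node, dom] of idom) {
    if (node === header || dom !== header) continue;
    if (preds.get(node).length < 2) continue;
    if (best === null || rank.get(node) > rank.get(best)) best = node;
  }
  return best;
}

console.log(followNode(cfg, "A", "A"), followNode(cfg, "A", "B"));
```

On this graph the outer conditional A rejoins at G and the nested conditional B rejoins at E, which is where a code generator would close each if-else. One-way conditionals, loop exits, and exception edges need extra cases that this sketch leaves out.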
6. Evidence
The thesis evidence is qualitative. It includes worked bytecode and parser examples, an IR example, CFG figures, explicit loop and conditional patterns, a try-body traversal figure, and a generated JavaScript listing. It does not include a public benchmark or measured coverage across a corpus of Kasada VM instances.
7. Code Artifacts
These local review helpers are reconstructed or modernized from the thesis descriptions. They are not the author's original decompiler and do not reproduce Kasada production bytecode.
Custom alphabet bytecode decoder (modernized from Section 3.3)
;(async () => {
  // Illustrative alphabet and split point, not values taken from a production instance.
  const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  // Characters at an index below `delimiter` end a number; the rest continue it.
  const delimiter = 40;
  const base = alphabet.length - delimiter;
  // Inverse of decodeBytecode for small non-negative integers; only builds sample input.
  function encodeNumber(value) {
    const chars = [];
    let remaining = value;
    while (remaining >= delimiter) {
      const charIndex = delimiter + ((remaining - delimiter) % base);
      chars.push(alphabet[charIndex]);
      remaining = (remaining - charIndex) / base;
    }
    chars.push(alphabet[remaining]);
    return chars.join("");
  }
  function decodeBytecode(encoded) {
    const values = [];
    for (let i = 0; i < encoded.length; ) {
      let value = 0;
      let multiplier = 1;
      while (i < encoded.length) {
        const charIndex = alphabet.indexOf(encoded[i++]);
        if (charIndex < 0) throw new Error("Character is not in alphabet");
        // Low part of the digit, scaled by the current place value.
        value += multiplier * (charIndex % delimiter);
        // An index below the delimiter marks the final character of this number.
        if (charIndex < delimiter) break;
        // Continuation character: add back the offset and move to the next place.
        value += delimiter * multiplier;
        multiplier *= base;
      }
      values.push(value | 0); // coerce to a signed 32-bit integer
    }
    return values;
  }
  const sampleValues = [1, 14, 127, 2048];
  const encoded = sampleValues.map(encodeNumber).join("");
  const result = { encoded, decoded: decodeBytecode(encoded), sampleValues };
  console.log(result);
  return result;
})()
Control-signal value reader (reconstructed from Sections 3.4.1-3.4.3)
;(async () => {
  // Example markers for a single instance; the thesis treats them as instance-local.
  const controls = [6, 18, 14, 2, 20, 40];
  const stream = [
    (21 << 1) | 1, // integer 21
    controls[1], // true
    controls[3], // false
    controls[4], // null
    controls[5], // undefined
    controls[2], // string marker
    2, // string length
    88, // encoded "H"
    107, // encoded "i"
    7 << 5 // register r7
  ];
  // Keep the high bits and unscramble the low six bits (multiply by 59 modulo 64).
  function decodeStringInteger(k) {
    return (k & 0xffffffc0) | ((k * 59) & 0x3f);
  }
  function readValue(state) {
    const r = state.stream[state.ip++];
    // Odd values carry an integer in the upper bits.
    if (r & 1) return { type: "integer", value: r >> 1 };
    if (r === controls[1]) return { type: "boolean", value: true };
    if (r === controls[3]) return { type: "boolean", value: false };
    if (r === controls[4]) return { type: "null", value: null };
    if (r === controls[5]) return { type: "undefined", value: undefined };
    if (r === controls[2]) {
      // A string is a length followed by that many encoded character codes.
      const length = state.stream[state.ip++];
      let value = "";
      for (let i = 0; i < length; i++) {
        value += String.fromCharCode(decodeStringInteger(state.stream[state.ip++]));
      }
      return { type: "string", value };
    }
    // Anything else is treated as a register reference.
    return { type: "register", value: `r${r >> 5}` };
  }
  const state = { ip: 0, stream };
  const decoded = [];
  while (state.ip < state.stream.length) decoded.push(readValue(state));
  const result = { controls, stream, decoded };
  console.log(result);
  return result;
})()
Simple loop pattern classifier (inferred from Section 6.1 and Figure 6.1)
;(async () => {
  // Checks run from most to least specific; the first matching pattern wins.
  function classifyLoop(pattern) {
    if (pattern.headerHasCondition && pattern.latchJumpsToHeader) return "while";
    if (pattern.latchHasCondition && pattern.latchJumpsToHeader) return "do-while";
    if (pattern.latchJumpsToHeader && pattern.hasExternalBreak) return "infinite-with-break";
    if (pattern.latchJumpsToHeader) return "infinite";
    return "not-a-loop";
  }
  const samples = {
    whileLoop: {
      headerHasCondition: true,
      latchHasCondition: false,
      latchJumpsToHeader: true,
      hasExternalBreak: false
    },
    doWhileLoop: {
      headerHasCondition: false,
      latchHasCondition: true,
      latchJumpsToHeader: true,
      hasExternalBreak: false
    },
    // Unconditional back edge whose only exit is a break out of the loop body.
    infiniteLoop: {
      headerHasCondition: false,
      latchHasCondition: false,
      latchJumpsToHeader: true,
      hasExternalBreak: true
    }
  };
  const result = Object.fromEntries(
    Object.entries(samples).map(([name, pattern]) => [name, classifyLoop(pattern)])
  );
  console.log(result);
  return result;
})()
8. Limitations
- No data flow analysis. The paper's explicit main limitation is the absence of DFA, which would improve data relationship tracking, variable naming, compound expression identification, and generated-code optimization.
- No quantitative evaluation. The PDF does not report a corpus size, number of VM instances, opcode-identification accuracy, CFG-recovery accuracy, JavaScript equivalence tests, runtime, or scalability measurements.
- No public implementation link. The thesis says sensitive implementation details are intentionally withheld, so the decompiler cannot be replicated from the PDF alone.
- Current-production scope is unverified. Treat the Kasada details as the thesis author's analyzed system description, not as an independently validated statement about Kasada's current production behavior.
- Generated JavaScript remains qualitative evidence. The example output shows feasibility, but the PDF does not prove semantic equivalence across a broad set of bytecode programs.
9. Reviewer Notes
The extracted source identifies the work as Patrick van den Bosch, Decompilation of Kasada's JavaScript Virtual Machine, Bachelor's Thesis Computing Science, January 6, 2025. The PDF has no DOI, arXiv URL, public repository, or complete institution metadata in the extracted title-page text.
- Keep the page in Draft until a reviewer confirms the thesis label, venue/institution wording, and PDF download availability.
- Do not convert the working-decompiler claim into a broad statement about Kasada VM coverage or current anti-bot bypass capability.
- Scope dynamic opcode and control-signal variation to the analyzed VM instances described by the thesis.
- The source PDF path was normalized to content/pdf/2025 - Decompilation of Kasadas JavaScript Virtual Machine.pdf; the display title preserves the apostrophe in Kasada's.