Part 4, Module 4: Control Flow & Functions
Difficulty: Intermediate
Estimated reading time: ~50 minutes | Exercises: ~3-4 hours
Table of Contents
Control Flow in Yul
- Yul `if` - Conditional Execution
- `switch`/`case`/`default` - Multi-Branch Logic
- `for` Loops - Gas-Efficient Iteration
- `leave` - Early Exit
- Deep Dive: From Yul to JUMP/JUMPI - Bytecode Comparison
Yul Functions (Internal)
- Defining and Calling Yul Functions
- Inlining Behavior - When Functions Become JUMPs
- Stack Depth and Yul Functions
Function Selector Dispatch
Error Handling Patterns in Yul
How to Study
Exercises
Wrap-Up
Control Flow in Yul
Modules 1-3 gave you the building blocks: opcodes and gas costs, memory and calldata layout, storage slots and packing. Now you write programs. In Module 1 you saw `if`, `switch`, and `for` in passing as Yul syntax elements. This module goes deep on each one: how they compile to bytecode, what they cost, and how to use them in production assembly.
By the end of this section, you'll understand why every `require()` in Solidity is an `if iszero(...) { revert }` under the hood, and you'll be able to write complete dispatch tables by hand.
Concept: Yul `if` - Conditional Execution
Why this matters: The `if` statement is the most common control flow in assembly. Every access check, every balance validation, every sanity guard compiles to an `if` in Yul. Mastering its quirks, especially the lack of `else`, is essential for writing correct assembly.
Yul's `if` is simpler than Solidity's:
if condition {
// executed when condition is nonzero
}
Key rules:
- Any nonzero value is true. There is no boolean type. `1`, `42`, and `0xffffffffffffffff` are all true. Only `0` is false.
- There is no `else`. This is by design. You use `switch` for if/else patterns.
- Negation uses `iszero()`. To express "if NOT condition," write `if iszero(condition) { }`.
Pattern: Guard clauses, the bread and butter of assembly:
assembly {
// Ownership check: revert if caller is not owner
if iszero(eq(caller(), sload(0))) { // slot 0 = owner
revert(0, 0)
}
// Zero-address validation
if iszero(calldataload(4)) { // first arg is address
mstore(0x00, 0x00000000) // could store error selector
revert(0x00, 0x04)
}
// Balance check: revert if balance < amount
let bal := sload(balanceSlot)
let amount := calldataload(36)
if lt(bal, amount) {
revert(0, 0)
}
}
Every `require(condition, "message")` in Solidity compiles to this same shape: `if iszero(condition) { /* encode error */ revert(...) }`. When you write assembly, you're writing what the compiler would generate.
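As a minimal sketch of that shape with an actual error payload, the contract below reverts with the 4-byte selector of a custom error instead of empty data. The names `GuardWithError`, `Unauthorized`, and `onlyOwnerValue` are hypothetical, chosen only for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract GuardWithError {
    error Unauthorized(); // hypothetical custom error

    address public owner;

    constructor() { owner = msg.sender; }

    function onlyOwnerValue() external view returns (uint256 value) {
        // bytes4 values are left-aligned in their 32-byte word
        bytes4 sel = Unauthorized.selector;
        assembly {
            if iszero(eq(caller(), sload(owner.slot))) {
                mstore(0x00, sel)   // selector lands in memory bytes 0..3
                revert(0x00, 0x04)  // revert data = the 4-byte selector
            }
            value := 42
        }
    }
}
```

Callers that decode revert data see `Unauthorized()`, the same payload a Solidity `revert Unauthorized();` would produce.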
Quick Try:
Test the iszero pattern in Remix:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;
contract GuardTest {
address public owner;
constructor() { owner = msg.sender; }
function onlyOwnerAction() external view returns (uint256) {
assembly {
if iszero(eq(caller(), sload(0))) {
revert(0, 0)
}
mstore(0x00, 42)
return(0x00, 0x20)
}
}
}
Deploy, call `onlyOwnerAction()` from the deployer (returns 42), then switch accounts and call again (reverts). That `if iszero(eq(...))` pattern is the one you'll write most often.
Common Mistakes
- Forgetting `iszero()` for negation. `if eq(x, 0) { }` runs its block when `x` is 0, because `eq` returns 1 (true) in that case. It does not mean "if x equals 0, do nothing." The trap is reading the zero inside the condition as a false condition. For clarity, write `if iszero(x) { }` when you mean "if x is zero."
- Using `if` when `switch` is clearer. If you have more than two branches, chained `if` statements are harder to read than a `switch`. Prefer `switch` for value-matching dispatch.
- Not masking addresses. `if eq(caller(), addr)` can fail if `addr` has dirty upper bits (bits 160-255 nonzero). Addresses are 20 bytes, but stack values are 32 bytes. Always ensure address values are clean, or mask with `and(addr, 0xffffffffffffffffffffffffffffffffffffffff)`.
- Using `if` for early return. `if` cannot return a value; it only gates a block. For early-return patterns in Yul, you need `leave` inside a Yul function (covered below).
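The address-masking point can be sketched as follows. The contract and function names are hypothetical, and the first argument is typed `uint256` on purpose so that a value with dirty upper bits reaches the assembly block without being rejected by the ABI decoder.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract AddressMask {
    // Compares a raw 32-byte word against a clean address.
    function isSameAddress(uint256 rawA, address b) external pure returns (bool same) {
        assembly {
            // (1 << 160) - 1: a mask that keeps only the low 160 bits
            let mask := sub(shl(160, 1), 1)
            let cleanA := and(rawA, mask)
            // Without the mask, any nonzero bit above bit 159 would make eq() fail
            same := eq(cleanA, b)
        }
    }
}
```

Computing the mask with `sub(shl(160, 1), 1)` is a design choice made here to avoid miscounting the forty `f` digits; the literal form is equivalent.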
Job Market Context
"Why doesn't Yul have `else`?"
- Good: "You use `switch` with two cases instead"
- Great: "Yul is intentionally minimal and maps closely to EVM opcodes. There's no JUMPELSE opcode, only JUMPI (conditional jump). An if-else would compile to JUMPI + JUMP, same as a `switch` with `case 0`/`default`. Yul makes you choose the right construct explicitly rather than hiding the cost. In practice, most assembly code uses guard-clause-style `if iszero(...) { revert }`, and you rarely need `else` because the revert terminates execution"
Red flag: Not knowing `iszero` is the standard negation pattern
Pro tip: Every `require()` in Solidity compiles to `if iszero(condition) { revert }`, the pattern you'll write most often. Interviewers who see you instinctively write `if iszero(...)` instead of struggling with negation know you've written real assembly
Concept: `switch`/`case`/`default` - Multi-Branch Logic
Why this matters: `switch` is how you write if/else logic in Yul, and it's the foundation of function selector dispatch, the most important control flow pattern in smart contracts.
switch expr
case value1 {
// executed if expr == value1
}
case value2 {
// executed if expr == value2
}
default {
// executed if no case matched
}
Key rules:
- No fall-through. Unlike `switch` in C or JavaScript, Yul cases do NOT fall through to the next case. Each case is independent, so no `break` is needed.
- Must have at least one `case` OR a `default`. You can't have an empty switch.
- Cases must be literal values. You can't use variables or expressions as case values, only integer literals or string literals.
- The "else" replacement: Since Yul has no `else`, use a two-branch switch:
// "if condition { A } else { B }" in Yul:
switch condition
case 0 {
// else branch (condition was false/zero)
}
default {
// if branch (condition was nonzero/true)
}
Note the inversion: case 0 is the false branch because 0 means false. default catches all nonzero values (true).
Example: Classify a value into tiers:
assembly {
let amount := calldataload(4)
let tier
// Determine tier based on thresholds
switch gt(amount, 1000000000000000000) // > 1 ETH?
case 0 {
tier := 1 // small
}
default {
switch gt(amount, 100000000000000000000) // > 100 ETH?
case 0 {
tier := 2 // medium
}
default {
tier := 3 // large (whale)
}
}
mstore(0x00, tier)
return(0x00, 0x20)
}
Quick Try:
Rewrite this Solidity if-chain as a Yul switch:
function classify(uint256 x) external pure returns (uint256) {
// Solidity version:
// if (x == 1) return 10;
// else if (x == 2) return 20;
// else if (x == 3) return 30;
// else return 0;
assembly {
switch x
case 1 { mstore(0x00, 10) }
case 2 { mstore(0x00, 20) }
case 3 { mstore(0x00, 30) }
default { mstore(0x00, 0) }
return(0x00, 0x20)
}
}
Deploy, call with different values. Verify the outputs match. At the bytecode level, both the if-chain and `switch` typically compile to the same kind of linear JUMPI sequence, but `switch` makes intent explicit.
Gas comparison: `switch` and chained `if` produce essentially the same bytecode, since both are linear JUMPI chains. The choice is about readability, not performance.
Job Market Context
"When do you use `switch` vs `if` in Yul?"
- Good: "`switch` for matching specific values, `if` for boolean conditions"
- Great: "`switch` when dispatching on a known set of values: selector dispatch, enum handling, error codes. `if` for boolean guards: access control, balance checks, zero-address validation. At the bytecode level they compile to the same JUMPI chains, but `switch` makes the intent explicit, which is especially important in audit-facing code. The Solidity compiler itself uses `switch` internally for selector dispatch in the Yul IR output"
Red flag: Assuming `switch` has fall-through like C
Pro tip: The Solidity compiler uses `switch` internally for selector dispatch, so you're writing what the compiler would generate. Knowing this shows you understand the compilation pipeline, not just the surface syntax
Concept: `for` Loops - Gas-Efficient Iteration
Why this matters: Loops are where assembly gas savings are most dramatic, and where bugs are most dangerous. A single unbounded loop can make a contract DoS-vulnerable. Understanding the exact gas cost per iteration lets you make informed decisions about loop design.
Yul's `for` loop has explicit C-like syntax:
for { /* init */ } /* condition */ { /* post */ } {
/* body */
}
A concrete example, iterating 0 to 9:
for { let i := 0 } lt(i, 10) { i := add(i, 1) } {
// body: runs with i = 0, 1, 2, ..., 9
}
Key differences from Solidity:
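- No `i++` or `++i` syntax. Use `i := add(i, 1)`.
- No `<=` opcode. There's `lt` (less than) and `gt` (greater than), but no `le` or `ge`. For "less than or equal," use `iszero(gt(i, limit))` or restructure as `lt(i, add(limit, 1))` (but watch for overflow if `limit` is `type(uint256).max`).
- `break` and `continue` exist, but only inside the loop body (not in the init or post blocks). `break` exits the loop and `continue` jumps to the post block. To exit the enclosing Yul function from inside a loop, use `leave`. An `if` guard inside the body is a common alternative to `continue`.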
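The `<=` workaround from the list above can be sketched as an inclusive loop. `InclusiveLoop` and `sumUpTo` are hypothetical names, and the sketch assumes `limit` is well below `type(uint256).max` (at the maximum, `i` would wrap to 0 and the loop would never terminate).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract InclusiveLoop {
    // Sums 0 + 1 + ... + limit using a "<=" condition built from iszero(gt(...)).
    function sumUpTo(uint256 limit) external pure returns (uint256 total) {
        assembly {
            // iszero(gt(i, limit)) is true while i <= limit
            for { let i := 0 } iszero(gt(i, limit)) { i := add(i, 1) } {
                total := add(total, i) // unchecked add, as always in Yul
            }
        }
    }
}
```

Calling `sumUpTo(10)` walks `i` from 0 through 10 inclusive, which is the behavior a plain `lt(i, limit)` condition would miss by one.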
Gas-efficient patterns:
// GOOD: Cache array length outside the loop
let len := mload(arr) // read length once
for { let i := 0 } lt(i, len) { i := add(i, 1) } {
let element := mload(add(add(arr, 0x20), mul(i, 0x20)))
// process element
}
// BAD: Read length every iteration (for storage arrays)
// for { let i := 0 } lt(i, sload(lenSlot)) { i := add(i, 1) } {
// ^^^^ SLOAD every iteration = 2100 gas cold, 100 warm per loop!
// }
When loops are safe vs dangerous:
| Pattern | Safety | Why |
|---|---|---|
| Fixed bounds (`i < 10`) | Safe | Gas cost is constant, known at compile time |
| Bounded by constant (`i < MAX_BATCH`) | Safe | Worst case is bounded, auditable |
| Bounded by storage length | Dangerous | Attacker can grow the array to exhaust gas |
| Unbounded iteration | Critical risk | Block gas limit is the only bound, a DoS vector |
Quick Try:
Sum an array of 5 uint256s in Yul and compare gas to Solidity:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;
contract LoopGas {
function sumSolidity(uint256[] calldata arr) external pure returns (uint256 total) {
for (uint256 i = 0; i < arr.length; i++) {
total += arr[i];
}
}
function sumYul(uint256[] calldata arr) external pure returns (uint256) {
assembly {
let total := 0
// Calldata layout: arg slot 0 (at byte 4) holds the offset of the array data, relative to byte 4
// The length word sits at that offset; the elements follow the length
let offset := add(4, calldataload(4)) // skip selector + follow offset
let len := calldataload(offset)
let dataStart := add(offset, 0x20) // elements start after length
for { let i := 0 } lt(i, len) { i := add(i, 1) } {
total := add(total, calldataload(add(dataStart, mul(i, 0x20))))
}
mstore(0x00, total)
return(0x00, 0x20)
}
}
}
Call both with `[10, 20, 30, 40, 50]` and compare gas. The Yul version skips bounds checks and overflow checks, saving roughly 15-20 gas per iteration (the exact figure depends on compiler version and optimizer settings).
Deep Dive: Loop Gas Anatomy
Every loop iteration has fixed overhead from the control flow opcodes, regardless of what the body does:
Per-iteration overhead:
| Opcode | Gas | Role |
|---|---|---|
| JUMPDEST | 1 | loop_start label |
| [condition] | ~6 | e.g., LT(3) on two stack vals |
| ISZERO | 3 | negate for skip pattern |
| PUSH2 loop_end | 3 | destination for exit |
| JUMPI | 10 | conditional jump |
| [body] | ? | your actual work |
| [post] | ~6 | e.g., ADD(3) + DUP/SWAP |
| PUSH2 loop_start | 3 | destination for loop back |
| JUMP | 8 | unconditional jump |
| Total overhead | ~40 | per iteration, excluding body (1 + 6 + 3 + 3 + 10 + 6 + 3 + 8) |
Practical impact:
- 100 iterations x 40 gas overhead = 4,000 gas just for loop control
- If the body does an SLOAD (100 gas warm), total per iteration = ~140 gas
- If the body does an SSTORE (5,000 gas), the 40 gas overhead is negligible
Why `unchecked { ++i }` in Solidity matches Yul's `i := add(i, 1)`:
Both skip the overflow check. In checked Solidity, `i++` adds ~20 gas per iteration for the overflow comparison. Since loop indices almost never overflow (you'd need 2^256 iterations), `unchecked` is standard practice in gas-optimized Solidity. Compilers from 0.8.22 onward already omit this check for simple `for` loop counters, so the explicit `unchecked` block matters mostly on older versions. In Yul, you get this by default: `add` does not check for overflow.
DeFi Pattern Connection
Where loops matter in DeFi:
- Batch operations: Airdrop contracts, multi-transfer, batch liquidation. These iterate over recipients and amounts. Aave V3's multi-asset `flashLoan()`, for example, loops over the asset list the caller passes in, so the iteration count is bounded by the call itself.
- Array iteration: Token allowlist checks, validator set updates, reward distribution. The loop-control overhead of iterating a 100-element array is about 4,000 gas (roughly 40 gas per iteration) plus the body cost, which is manageable for most operations.
- The "bounded loop" audit rule: Auditors flag unbounded loops as high severity. If a user can grow the array (e.g., by calling `addToList()` repeatedly), they can make any function that iterates the list exceed the block gas limit. The standard fix: paginated iteration with `startIndex` and `batchSize` parameters.
- Curve's StableSwap: The `get_D()` function uses a Newton-Raphson loop to find the invariant. It's capped at 255 iterations, so its gas cost has a hard ceiling, and some implementations revert if it has not converged by then. This is the textbook example of a safe math loop.
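The paginated-iteration fix mentioned above (`startIndex` and `batchSize`) could look roughly like this. The contract, the `values` array, and the function names are hypothetical; the sketch assumes a `uint256[]` storage array, whose elements live at `keccak256(slot) + i`.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract PaginatedSum {
    uint256[] public values; // hypothetical user-growable array

    function push(uint256 v) external { values.push(v); }

    // Sums at most `batchSize` elements starting at `startIndex`.
    function sumPage(uint256 startIndex, uint256 batchSize) external view returns (uint256 total) {
        assembly {
            let len := sload(values.slot)          // length read once
            let bound := add(startIndex, batchSize)
            if lt(bound, startIndex) { bound := len } // add wrapped: clamp
            if gt(bound, len) { bound := len }        // never read past the end
            mstore(0x00, values.slot)
            let base := keccak256(0x00, 0x20)      // slot of element 0
            for { let i := startIndex } lt(i, bound) { i := add(i, 1) } {
                total := add(total, sload(add(base, i)))
            }
        }
    }
}
```

The caller controls the worst-case cost through `batchSize`, so growing the array no longer makes any single call exceed the block gas limit.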
Common Mistakes
- Off-by-one with `lt`. `for { let i := 0 } lt(i, len) { i := add(i, 1) }` iterates `0` to `len-1` (correct for array indexing). Using `gt(len, i)` is equivalent but less readable. Using `iszero(eq(i, len))` also works but costs an extra opcode.
- Confusing `break` with `leave`. `break` exits only the loop, and both `break` and `continue` are valid only inside the loop body. To stop the enclosing Yul function as well, use `leave`. You can also fold the exit criteria into the loop condition. Example: `for { let i := 0 } and(lt(i, len), iszero(found)) { ... }`.
- Modifying the loop variable inside the body. `i := add(i, 2)` inside the body, combined with `i := add(i, 1)` in the post block, increments by 3 total. This leads to skipped or repeated iterations. Only modify the loop variable in the post block.
- Not caching storage reads. `for { let i := 0 } lt(i, sload(lenSlot)) { ... }` does an SLOAD every iteration. Cold first access = 2,100 gas, then 100 gas per subsequent check. For a 100-iteration loop, that's roughly 12,000 gas wasted on length reads alone. Always cache: `let len := sload(lenSlot)`.
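Folding the exit criteria into the loop condition, as in the `and(lt(i, len), iszero(found))` example above, might look like this in full. `FindFirst` and `indexOf` are hypothetical names; the sketch assumes a memory array, whose first word is the length.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract FindFirst {
    // Returns the index of the first match and whether a match was found.
    function indexOf(uint256[] memory arr, uint256 target)
        external
        pure
        returns (uint256 idx, bool found)
    {
        assembly {
            let len := mload(arr)        // length word
            let data := add(arr, 0x20)   // first element
            // Loop stops at the end of the array OR as soon as found becomes 1
            for { let i := 0 } and(lt(i, len), iszero(found)) { i := add(i, 1) } {
                if eq(mload(add(data, mul(i, 0x20))), target) {
                    found := 1
                    idx := i
                }
            }
        }
    }
}
```

The loop variable is only touched in the post block; the match index is copied into `idx` before the post block increments `i`.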
Job Market Context
"How do you iterate arrays safely in assembly?"
- Good: "Use a `for` loop with `lt(i, length)`, pre-compute the length"
- Great: "Cache the length in a local variable to avoid repeated SLOAD/MLOAD. Use `lt(i, len)` for the condition. There's no `le` opcode, so `<=` requires `iszero(gt(i, len))` or `lt(i, add(len, 1))`, which can overflow at type max. For storage arrays, load the length once with `sload` and compute element slots with `add(dataSlot, i)`, where `dataSlot` is the `keccak256` of the array's slot. Always ensure the loop is bounded: unbounded loops are an audit finding because an attacker can grow the array to make the function exceed the block gas limit"
Red flag: Writing unbounded loops over user-controlled arrays
Pro tip: In interviews, always mention the DoS vector; it shows security awareness alongside assembly skill. If you can also cite Curve's Newton-Raphson bounded loop or Aave's batch size limits, you demonstrate real protocol knowledge
Concept: `leave` - Early Exit
Why this matters: `leave` is Yul's equivalent of `return` in other languages: it exits the current Yul function immediately. Without it, you'd need deeply nested `if` blocks for guard-then-compute patterns.
function findIndex(arr, len, target) -> idx {
idx := 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff // not found sentinel
for { let i := 0 } lt(i, len) { i := add(i, 1) } {
if eq(mload(add(arr, mul(add(i, 1), 0x20))), target) {
idx := i
leave // exit the function immediately
}
}
// if we get here, target wasn't found; idx is still the sentinel
}
Key rules:
leaveonly works inside Yul functions, not in top-levelassembly { }blocks. If you try to useleaveoutside a function, the compiler will error.- It exits the innermost function β if you have nested Yul functions,
leaveexits the one itβs in, not the outer one. - For top-level assembly blocks, use
return(ptr, size)orrevert(ptr, size)to exit execution entirely.
How leave compiles: Itβs a JUMP to the functionβs exit JUMPDEST β the cleanup point where return values are on the stack and the return program counter is used. No special opcode, just a JUMP.
Pattern: Guard-and-compute in Yul functions:
function safeDiv(a, b) -> result {
if iszero(b) {
result := 0
leave // don't divide by zero
}
result := div(a, b)
}
This is cleaner than the alternative without leave:
function safeDiv(a, b) -> result {
switch iszero(b)
case 1 { result := 0 }
default { result := div(a, b) }
}
Both work, but `leave` scales better when you have multiple guard conditions: each can `leave` independently without nesting.
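To illustrate how several guards stack up without nesting, here is a sketch with three independent `leave` exits. `LeaveGuards`, `mulDivOrZero`, and `mulDivGuarded` are hypothetical names, and returning 0 on failure is an arbitrary choice made for the example (a real contract would more likely revert).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract LeaveGuards {
    // Returns (a * b) / c, or 0 when any guard fails.
    function mulDivOrZero(uint256 a, uint256 b, uint256 c) external pure returns (uint256 out) {
        assembly {
            function mulDivGuarded(x, y, z) -> result {
                // result starts at 0, so each bare `leave` returns 0
                if iszero(z) { leave }                // guard 1: division by zero
                if iszero(x) { leave }                // guard 2: product is 0 anyway
                let p := mul(x, y)
                if iszero(eq(div(p, x), y)) { leave } // guard 3: mul overflowed
                result := div(p, z)
            }
            out := mulDivGuarded(a, b, c)
        }
    }
}
```

Written with `switch`, the same logic would need three levels of nesting; with `leave` each guard is one flat line.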
Deep Dive: From Yul to JUMP/JUMPI - Bytecode Comparison
In Module 1 you learned that JUMP costs 8 gas, JUMPI costs 10 gas, and JUMPDEST costs 1 gas. Now you can see exactly how your Yul code maps to these opcodes.
if compiles to JUMPI (skip pattern):
Yul: Bytecode:
if condition { [push condition value]
body ISZERO ; negate: skip body if false
} PUSH2 end_label
JUMPI ; jump past body if condition was 0
[body opcodes]
JUMPDEST ; end_label -- execution continues here
The compiler inverts the condition with ISZERO so JUMPI skips the body when the original condition is false. This is the "skip pattern," the most common JUMPI usage.
switch (2 cases + default) compiles to chained JUMPI:
Yul: Bytecode:
switch selector [push selector]
case 0xAAAAAAAA { case1_body } DUP1
case 0xBBBBBBBB { case2_body } PUSH4 0xAAAAAAAA
default { default_body } EQ
PUSH2 case1_label
JUMPI ; jump if match
DUP1
PUSH4 0xBBBBBBBB
EQ
PUSH2 case2_label
JUMPI ; jump if match
POP ; clean up selector
[default body]
PUSH2 end
JUMP
JUMPDEST ; case1_label
POP ; clean up selector
[case1 body]
PUSH2 end
JUMP
JUMPDEST ; case2_label
POP ; clean up selector
[case2 body]
JUMPDEST ; end
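Notice: each case costs at least EQ(3) + JUMPI(10) = 13 gas to check, or 22 gas once you count the DUP1, PUSH4, and PUSH2 (3 gas each) that set up the comparison. A `switch` with 10 cases means up to 130 gas in EQ and JUMPI alone just searching for the right case (linear scan). This is why Solidity's compiler switches to binary search for larger contracts.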
for loop compiles to JUMP + JUMPI:
Yul: Bytecode:
for { let i := 0 } PUSH1 0x00 ; [init] i = 0
lt(i, 10) JUMPDEST ; loop_start
{ i := add(i, 1) } DUP1
{ PUSH1 0x0A ; 10
body LT
} ISZERO
PUSH2 loop_end
JUMPI ; exit if i >= 10
[body opcodes]
PUSH1 0x01
ADD ; [post] i = i + 1
PUSH2 loop_start
JUMP ; back to condition
JUMPDEST ; loop_end
Each iteration: JUMPDEST(1) + condition(~6) + ISZERO(3) + PUSH2(3) + JUMPI(10) + body + post(~6) + PUSH2(3) + JUMP(8) = ~40 gas overhead plus whatever the body costs.
Quick Try:
Compile a simple contract and inspect the bytecode:
# Create a minimal contract
cat > /tmp/Switch.sol << 'EOF'
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;
contract Switch {
fallback() external payable {
assembly {
switch calldataload(0)
case 1 { mstore(0, 10) return(0, 32) }
case 2 { mstore(0, 20) return(0, 32) }
default { revert(0, 0) }
}
}
}
EOF
# Inspect the Yul IR
forge inspect Switch ir-optimized
# Or disassemble the bytecode
cast disassemble $(forge inspect Switch bytecode)
Look for the JUMPI instructions in the output. The `switch` itself should account for 2 of them (one per case); any others come from code the compiler wraps around the fallback.
Connection back to Module 1: In Module 1 you learned JUMP costs 8 gas, JUMPI costs 10, JUMPDEST costs 1. Now you can see exactly how many JUMPs your Yul code generates, and why a `switch` with 10 cases creates 10 JUMPI instructions (linear scan), costing up to 130 gas in EQ and JUMPI just to find the matching case.
Yul Functions (Internal)
Yul functions are how you organize assembly code. Without them, complex assembly becomes an unreadable wall of opcodes. They reduce stack pressure (each function scope has its own variable space), enable code reuse, and make assembly readable enough to audit.
This section covers defining functions, understanding when the optimizer inlines them, and managing the 16-slot stack depth limit.
Concept: Defining and Calling Yul Functions
Why this matters: In production assembly (Solady, Uniswap V4), you'll see dozens of Yul functions per contract. They're the primary unit of code organization in assembly, the equivalent of internal functions in Solidity.
Syntax:
// Single return value
function name(param1, param2) -> result {
result := add(param1, param2)
}
// Multiple return values
function divmod(a, b) -> quotient, remainder {
quotient := div(a, b)
remainder := mod(a, b)
}
// No return value (side effects only)
function requireNonZero(value) {
if iszero(value) { revert(0, 0) }
}
Key rules:
- Functions can only be called within the same `assembly` block where they're defined. They don't exist outside assembly.
- Variables declared inside a function are scoped to that function. This is the key benefit for stack management: each function gets a clean variable scope.
- Functions can call other Yul functions defined in the same assembly block.
- Return values should be assigned explicitly. If you declare `-> result` but don't assign it, `result` defaults to `0`.
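As a short sketch of calling a multi-return Yul function, the example below captures both results with a single multi-assignment. `DivModDemo` and `divmodYul` are hypothetical names; the Yul function is named differently from the external one to keep the two visibly apart.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract DivModDemo {
    function divmod(uint256 a, uint256 b) external pure returns (uint256 q, uint256 r) {
        assembly {
            function divmodYul(x, y) -> quotient, remainder {
                quotient := div(x, y)   // div by zero yields 0 in the EVM
                remainder := mod(x, y)  // mod by zero also yields 0
            }
            // Both return values are assigned in one statement
            q, r := divmodYul(a, b)
        }
    }
}
```

Note that `divmodYul` only sees its own parameters and return variables; `a` and `b` have to be passed in as arguments.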
Quick Try:
Define min and max as Yul functions:
function minMax(uint256 a, uint256 b) external pure returns (uint256, uint256) {
assembly {
function min(x, y) -> result {
result := y
if lt(x, y) { result := x }
}
function max(x, y) -> result {
result := x
if lt(x, y) { result := y }
}
mstore(0x00, min(a, b))
mstore(0x20, max(a, b))
return(0x00, 0x40)
}
}
Deploy and test with (100, 200). You should get (100, 200). Test with (300, 50) and you should get (50, 300).
Intermediate Example: Building a Utility Library in Yul
Before you write full contracts in assembly, you need a toolkit. Here are the helper functions you'll reuse across nearly every assembly block:
assembly {
// -- Guards --
// Revert with no data (cheapest revert)
function require(condition) {
if iszero(condition) { revert(0, 0) }
}
// Revert with a 4-byte error selector
function requireWithSelector(condition, sel) {
if iszero(condition) {
mstore(0x00, shl(224, sel))
revert(0x00, 0x04)
}
}
// -- Math --
// Overflow-checked addition
function safeAdd(a, b) -> result {
result := add(a, b)
if lt(result, a) { revert(0, 0) } // overflow
}
// Min / Max
function min(a, b) -> result {
result := b
if lt(a, b) { result := a }
}
function max(a, b) -> result {
result := a
if lt(a, b) { result := b }
}
// -- Storage helpers --
// Compute mapping slot: keccak256(key . baseSlot)
// Reuses the formula from Module 3
function getMappingSlot(key, baseSlot) -> slot {
mstore(0x00, key)
mstore(0x20, baseSlot)
slot := keccak256(0x00, 0x40)
}
// Compute nested mapping slot: mapping[key1][key2]
function getNestedMappingSlot(key1, key2, baseSlot) -> slot {
mstore(0x00, key1)
mstore(0x20, baseSlot)
let intermediate := keccak256(0x00, 0x40)
mstore(0x00, key2)
mstore(0x20, intermediate)
slot := keccak256(0x00, 0x40)
}
}
Note how `getMappingSlot` reuses the Module 3 mapping formula as a callable function. This is the pattern in production assembly: define your slot computation functions once at the top of the assembly block, then call them throughout.
Solady uses this exact pattern. Open any Solady contract and you'll see a library of internal Yul functions at the top of the assembly block. The naming conventions are consistent: `_get`, `_set`, `_require`, etc.
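To check a slot helper like `getMappingSlot` against the compiler's own layout, a small sketch can write through Solidity and read back through Yul. `MappingRead`, `set`, and `readYul` are hypothetical names.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract MappingRead {
    mapping(address => uint256) public balances;

    // Writes through ordinary Solidity
    function set(address who, uint256 amount) external { balances[who] = amount; }

    // Reads the same entry through a hand-computed slot
    function readYul(address who) external view returns (uint256 bal) {
        assembly {
            function getMappingSlot(key, baseSlot) -> slot {
                mstore(0x00, key)
                mstore(0x20, baseSlot)
                slot := keccak256(0x00, 0x40)
            }
            bal := sload(getMappingSlot(who, balances.slot))
        }
    }
}
```

If `readYul(who)` ever disagrees with the auto-generated `balances(who)` getter, the slot formula (or the key/slot order in memory) is wrong.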
Concept: Inlining Behavior - When Functions Become JUMPs
Why this matters: Yul functions can either be inlined (copied into the call site) or compiled as JUMP targets (called via JUMP/JUMPDEST). The optimizer decides which approach to use, and the choice affects both gas cost and bytecode size.
Inlining: The compiler copies the function's body directly into every call site. No JUMP, no JUMPDEST, no call overhead. The function "disappears" from the bytecode.
// This will likely be inlined (tiny body)
function isZero(x) -> result {
result := iszero(x)
}
// After inlining, "isZero(val)" just becomes "iszero(val)" at the call site
JUMP target: The compiler emits the function body once, and each call site JUMPs to it and JUMPs back. This saves bytecode size but costs ~20 gas per call (JUMP to function + JUMPDEST + JUMP back + JUMPDEST).
// This is more likely to be a JUMP target (larger body, multiple call sites)
function getMappingSlot(key, baseSlot) -> slot {
mstore(0x00, key)
mstore(0x20, baseSlot)
slot := keccak256(0x00, 0x40)
}
The optimizer's heuristic:
- Small functions (1-2 opcodes): almost always inlined
- Large functions called from one site: inlined (no size penalty)
- Large functions called from multiple sites: JUMP target (saves bytecode)
- The decision is automatic; you cannot force inlining in Yul
How to check: Run `forge inspect Contract ir-optimized` and look for your function names. Inlined functions disappear entirely, and their body appears at each call site. Functions that were not inlined remain as separate `function` definitions in the optimized IR.
Trade-off:
| Approach | Gas per call | Bytecode size | Best when |
|---|---|---|---|
| Inlined | 0 overhead | Larger (duplicated) | Small functions, few call sites |
| JUMP target | ~20 gas | Smaller (shared) | Large functions, many call sites |
For production code: Let the optimizer decide. Only manually inline (by not using a function at all) if gas profiling shows a hot path where the 20-gas JUMP overhead matters. In most DeFi contracts, storage operations dominate gas costs, making the JUMP overhead negligible.
Concept: Stack Depth and Yul Functions
Why this matters: "Stack too deep" is one of the most common errors in Solidity, and understanding why it happens (it's a constraint of the EVM instruction set, not a language bug) is essential for working in assembly. Yul functions are the primary tool for managing stack depth.
The EVM's DUP and SWAP opcodes can only reach 16 items deep on the stack. DUP1 copies the top item, DUP16 copies the 16th item from the top. There is no DUP17. If the compiler needs to access a variable that's buried deeper than 16 slots, it can't, hence "stack too deep."
Each Yul function creates a new scope. Only the function's parameters, local variables, and return values occupy its stack frame. This means you can have 50 variables across your entire assembly block, but as long as no single function uses more than ~14 simultaneously, you'll never hit the limit.
Deep Dive: Stack Layout During a Yul Function Call
When a Yul function is called (not inlined), the stack looks like this:
Before call:      [...existing stack items...]
Push return + args: [...existing...][return_pc][arg1][arg2]
                  (return_pc is pushed by the caller with an ordinary PUSH; JUMP itself pushes nothing)
JUMP to func:     [...existing...][return_pc][arg1][arg2]
Inside function:  [...existing...][return_pc][arg1][arg2][local1][local2][result]
                  |<------- DUP/SWAP can only reach the top 16 slots ------->|
The reachable window is always the top 16 slots. Everything below is invisible to DUP/SWAP. This means:
Parameters + locals + return values must fit in ~12-14 stack slots (leaving room for temporary values during expression evaluation).
If you exceed this:
Solution 1: Decompose into smaller functions. Each function gets its own scope. A function that takes 4 params and uses 4 locals is fine (8 slots). Calling another function from inside passes values as arguments, keeping each scope small.
// BAD: Too many variables in one function
function doEverything(a, b, c, d, e, f, g, h) -> result {
let x := add(a, b)
let y := mul(c, d)
let z := sub(e, f)
let w := div(g, h)
// ... stack too deep when using x, y, z, w together with a-h
}
// GOOD: Decompose
function computeFirst(a, b, c, d) -> partial1 {
partial1 := add(mul(a, b), mul(c, d))
}
function computeSecond(e, f, g, h) -> partial2 {
partial2 := add(sub(e, f), div(g, h))
}
function combine(a, b, c, d, e, f, g, h) -> result {
result := add(computeFirst(a, b, c, d), computeSecond(e, f, g, h))
}
Solution 2: Spill to memory. Use scratch space (0x00-0x3f) or allocated memory for intermediate values. Each spill costs 3 gas (MSTORE) + 3 gas (MLOAD) = 6 gas, but frees a stack slot.
// Spill intermediate to memory scratch space
mstore(0x00, expensiveComputation) // save to scratch
// ... do other work with freed stack slot ...
let saved := mload(0x00) // restore when needed
Solution 3: Restructure. Sometimes the code can be rewritten to reduce the number of simultaneously live variables. Compute and consume values immediately rather than holding everything until the end.
What `via_ir` does: The Solidity compiler's `via_ir` codegen pipeline automatically moves variables to memory when stack depth is exceeded. That's why enabling `via_ir` "fixes" stack-too-deep errors in Solidity. But it adds gas overhead for the memory spills. Hand-written assembly gives you control over which values live in memory vs stack, which is important for gas-critical paths.
Common Mistakes
- Too many local variables in one function. If you declare 10 `let` variables plus have 4 parameters, that's 14 slots before any temporary values. You'll hit the limit. Split into helper functions.
- Passing too many parameters. A function with 8+ parameters is a design smell. Group related values or compute them inside the function from fewer inputs.
- Forgetting that return values also consume stack slots. `function f(a, b, c) -> x, y, z` uses 6 slots (3 params + 3 returns) before any locals. Add 3 locals and you're at 9, which is getting close.
- Not accounting for expression temporaries. `add(mul(a, b), mul(c, d))` needs stack space for the intermediate `mul` results. The compiler handles this, but deeply nested expressions push the limit.
Job Market Context
"How do you handle 'stack too deep' in assembly?"
- Good: "Break the code into smaller Yul functions to reduce variables per scope"
- Great: "The stack limit is 16 reachable slots (DUP16/SWAP16 max). Each Yul function gets a clean scope, and only its parameters, locals, and return values count. So the fix is decomposition: extract logic into Yul functions with focused parameter lists. For truly complex operations, spill intermediate values to memory (0x00-0x3f scratch space or allocated memory). The `via_ir` compiler does this automatically, but hand-written assembly gives you control over which values live in memory vs stack, which matters for gas-critical paths"
Red flag: Not knowing why "stack too deep" happens (it's not a language bug, it's a constraint of the EVM instruction set: the DUP/SWAP opcodes only reach 16 deep)
Pro tip: Counting stack depth by hand is a real skill for auditors. Practice by tracing through Solady's complex functions: pick a function, list the variables, count the max simultaneous live count
Function Selector Dispatch
The dispatch table is the entry point of every Solidity contract. When you call `transfer()`, the EVM doesn't know what "functions" are; it sees raw bytes. The dispatcher examines the first 4 bytes of calldata and routes execution to the right code. Every Solidity contract has this logic auto-generated. Now you'll build one by hand.
This is where Modules 2, 3, and 4 converge: you need calldata decoding (Module 2), storage operations (Module 3), and control flow (this module) all working together.
Concept: The Dispatch Problem
Why this matters: Understanding dispatch is understanding how the EVM "finds" your function. This knowledge is essential for proxy patterns, gas optimization (ordering functions by call frequency), and building contracts in raw assembly.
Every external call to a contract follows this sequence:
1. Extract selector: read the first 4 bytes of calldata
2. Find matching function: compare the selector against known values
3. Decode arguments: read parameters from calldata positions 4+
4. Execute: run the function logic
5. Encode return: write the result to memory and RETURN
Steps 1 and 2 are the dispatch table. In Solidity, the compiler generates this automatically. In assembly, you write it yourself.
Recap from Module 2: The selector is extracted with:
let selector := shr(224, calldataload(0))
`calldataload(0)` reads 32 bytes starting at offset 0. `shr(224, ...)` shifts right by 224 bits (256 - 32 = 224), leaving just the first 4 bytes in the low 32 bits of the stack value. What Solidity generates automatically, you'll now write by hand.
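One way to convince yourself that the extraction is right is to compare it with what Solidity itself reports. In this sketch, `SelectorProbe` and `probe` are hypothetical names; the value is shifted back left because `bytes4` is stored left-aligned in its 32-byte word.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

contract SelectorProbe {
    function probe() external pure returns (bytes4 fromYul, bytes4 fromSolidity) {
        assembly {
            // Selector ends up in the low 32 bits of the word
            let selector := shr(224, calldataload(0))
            // Move it back to the high 4 bytes to form a valid bytes4
            fromYul := shl(224, selector)
        }
        fromSolidity = msg.sig; // what the compiler reports for this call
    }
}
```

Both return values should equal the selector of `probe()` itself, since that is the function being called.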
💡 Concept: if-Chain Dispatch
Why this matters: This is the simplest dispatch pattern: straightforward to write and easy to understand. It's what the Solidity compiler generates for small contracts.
assembly {
let selector := shr(224, calldataload(0))
if eq(selector, 0x18160ddd) { // totalSupply()
mstore(0x00, sload(0)) // slot 0 = totalSupply
return(0x00, 0x20)
}
if eq(selector, 0x70a08231) { // balanceOf(address)
let account := calldataload(4)
// compute mapping slot
mstore(0x00, account)
mstore(0x20, 1) // slot 1 = balances mapping base
let bal := sload(keccak256(0x00, 0x40))
mstore(0x00, bal)
return(0x00, 0x20)
}
if eq(selector, 0xa9059cbb) { // transfer(address,uint256)
// decode, validate, update storage...
mstore(0x00, 1) // return true
return(0x00, 0x20)
}
revert(0, 0) // unknown selector
}
Gas characteristics:
- Linear scan: the first function is cheapest to reach (1 comparison), the last is most expensive (N comparisons).
- Each comparison costs: EQ(3) + JUMPI(10) = 13 gas. (The DUP1, PUSH4, and PUSH2 that set up each check add 3 gas apiece; the figures in this module count only EQ + JUMPI.)
- For 3 functions: worst case = 39 gas. For 10 functions: worst case = 130 gas.
- Optimization: Put the most frequently called function first. For an ERC-20, transfer and balanceOf are called far more often than name or symbol.
When optimal: Few functions (4 or fewer). Above that, the linear cost starts to matter, and a binary search or jump table becomes the better choice. (A switch does not help here: it compiles to the same linear scan.)
💻 Quick Try:
Write a 3-function dispatcher and test it:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;
contract SimpleDispatch {
fallback() external payable {
assembly {
let sel := shr(224, calldataload(0))
if eq(sel, 0x18160ddd) { // totalSupply()
mstore(0x00, 1000)
return(0x00, 0x20)
}
if eq(sel, 0x70a08231) { // balanceOf(address)
mstore(0x00, 42)
return(0x00, 0x20)
}
if eq(sel, 0x06fdde03) { // name()
// Return "Test" as string
mstore(0x00, 0x20) // offset
mstore(0x20, 4) // length
mstore(0x40, "Test") // data
return(0x00, 0x60)
}
revert(0, 0)
}
}
}
Deploy, then test with cast:
cast call <address> "totalSupply()" --rpc-url <rpc>
cast call <address> "balanceOf(address)" 0x1234...
💡 Concept: switch-Based Dispatch
Why this matters: switch is the preferred pattern for hand-written dispatchers. It compiles to essentially the same bytecode as an if-chain but makes the dispatch table structure explicit and readable.
assembly {
switch shr(224, calldataload(0))
case 0x18160ddd { // totalSupply()
mstore(0x00, sload(0))
return(0x00, 0x20)
}
case 0x70a08231 { // balanceOf(address)
let account := calldataload(4)
mstore(0x00, account)
mstore(0x20, 1)
mstore(0x00, sload(keccak256(0x00, 0x40)))
return(0x00, 0x20)
}
case 0xa9059cbb { // transfer(address,uint256)
// ... implementation
mstore(0x00, 1)
return(0x00, 0x20)
}
default {
revert(0, 0) // unknown selector
}
}
Same gas as the if-chain at the bytecode level (both compile to linear JUMPI chains). But switch has advantages:
- Cleaner syntax: the dispatch table is visually obvious.
- The default branch handles unknown selectors and doubles as the fallback function.
- Easier to maintain: adding a new function is adding a new case, not threading another if into the chain.
This is what you'll see in most hand-written assembly contracts and what you'll write in the exercises.
Deep Dive: How Solidity Actually Dispatches
For small contracts with 4 or fewer external functions, Solidity generates a simple linear if-chain, similar to what you just wrote. But for larger contracts, it switches to something smarter.
Binary search dispatch:
For contracts with more than ~4 external functions, the Solidity compiler sorts selectors numerically and generates a binary search tree. Instead of checking selectors one by one (O(n)), it compares against the middle value and branches left or right (O(log n)).
Here's how it works for a contract with 8 external functions. Assume the selectors, sorted numerically, are:
0x06fdde03 (name)
0x095ea7b3 (approve)
0x18160ddd (totalSupply)
0x23b872dd (transferFrom)
0x70a08231 (balanceOf)
0x95d89b41 (symbol)
0xa9059cbb (transfer)
0xdd62ed3e (allowance)
The compiler generates a binary search tree:
sel < 0x70a08231?
├─ yes: sel < 0x18160ddd?
│  ├─ yes: sel < 0x095ea7b3?
│  │  ├─ yes: eq 0x06fdde03 (name)
│  │  └─ no:  eq 0x095ea7b3 (approve)
│  └─ no:  eq 0x18160ddd? (totalSupply)
│     └─ else: eq 0x23b872dd (transferFrom)
└─ no:  sel < 0xa9059cbb?
   ├─ yes: eq 0x70a08231? (balanceOf)
   │  └─ else: eq 0x95d89b41 (symbol)
   └─ no:  sel < 0xdd62ed3e?
      ├─ yes: eq 0xa9059cbb (transfer)
      └─ no:  eq 0xdd62ed3e (allowance)
Gas impact:
- Linear dispatch with 8 functions: worst case = 8 x 13 = 104 gas
- Binary search with 8 functions: worst case = 3 range checks (log2 8), about 39 gas, plus one final EQ at the leaf
- For 32 functions: linear = 416 gas; binary = 5 range checks (log2 32), about 65 gas, plus the final EQ
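The halving idea can also be written by hand in Yul. The sketch below is an illustration under stated assumptions, not the compiler's actual output: it uses a single range check on the pivot 0x70a08231 followed by short linear scans, and each branch returns a small ID (1 to 8) in place of real logic. The contract name BinarySplitDispatch and the helper ret are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

/// Hypothetical demo contract: each branch returns an ID instead of real logic.
contract BinarySplitDispatch {
    fallback() external {
        assembly {
            // Helper: write one word to memory and return it (ends execution).
            function ret(v) {
                mstore(0x00, v)
                return(0x00, 0x20)
            }

            let sel := shr(224, calldataload(0))

            // One range check halves the search space. Yul has no else,
            // so the two halves are expressed as switch branches.
            switch lt(sel, 0x70a08231)
            case 1 {
                // Lower half: selectors below the pivot
                if eq(sel, 0x06fdde03) { ret(1) } // name()
                if eq(sel, 0x095ea7b3) { ret(2) } // approve(address,uint256)
                if eq(sel, 0x18160ddd) { ret(3) } // totalSupply()
                if eq(sel, 0x23b872dd) { ret(4) } // transferFrom(address,address,uint256)
            }
            default {
                // Upper half: the pivot and everything above it
                if eq(sel, 0x70a08231) { ret(5) } // balanceOf(address)
                if eq(sel, 0x95d89b41) { ret(6) } // symbol()
                if eq(sel, 0xa9059cbb) { ret(7) } // transfer(address,uint256)
                if eq(sel, 0xdd62ed3e) { ret(8) } // allowance(address,address)
            }

            revert(0, 0) // no branch matched: unknown selector
        }
    }
}
```

With this layout the worst case is one LT check plus four EQ checks, compared with eight EQ checks for a flat chain. Adding further pivots inside each half would bring it closer to the full tree shown above.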
Why function ordering matters for gas:
The binary search uses numerically sorted selectors, so you can't control the tree structure directly in Solidity. But in assembly, you can:
- Order your if-chain or switch by call frequency (hot functions first)
- Use a jump table for O(1) dispatch (advanced, covered in Module 6)
How to inspect dispatch logic:
# View the Yul IR (shows switch/if structure)
forge inspect MyContract ir-optimized
# Disassemble to raw opcodes
cast disassemble $(forge inspect MyContract bytecode)
Look for clusters of DUP1 PUSH4 EQ PUSH2 JUMPI: each cluster is one selector comparison.
Advanced: Beyond binary search:
Some ultra-optimized frameworks use different strategies:
- Huff / Solady: Can use jump tables for O(1) dispatch (a single computed jump regardless of function count). This requires computing the jump destination from the selector, which is covered in Module 6.
- Diamond Pattern (EIP-2535): Puts selectors in different "facets" (contracts), so each facet has a small dispatch table. The main contract looks up which facet handles a selector, then DELEGATECALLs to it.
💼 Job Market Context
"How does the Solidity compiler handle function dispatch?"
- Good: "It checks the selector against each function and routes to the right one"
- Great: "For 4 or fewer functions, it's a linear if-chain of JUMPI instructions, each costing 13 gas (EQ + JUMPI). For more functions, it uses binary search: selectors are sorted numerically, and the dispatcher does O(log n) comparisons. A contract with 32 functions needs ~5 comparisons (65 gas) to find any function. This is why some protocols put frequently-called functions in a separate facet (Diamond pattern): to keep the dispatch table small on hot paths. In hand-written assembly, you can go further: arrange selectors by call frequency or use jump tables for O(1) dispatch"
🚩 Red flag: Thinking dispatch is free or constant-cost
Pro tip: Know that function selector values affect gas cost. A selector such as 0x00000001 sorts first, so a dispatcher built on sorted selectors always takes the left branch to reach it. Some MEV-optimized contracts pick selectors strategically by mining vanity selectors: brute-forcing function names until the hash starts with the desired bytes. (CREATE2 mining, by contrast, produces vanity addresses, not selectors.) Tools like cast sig compute selectors from signatures
💡 Concept: Fallback and Receive in Assembly
Why this matters: Every Solidity contract has implicit dispatch for two special cases: receiving ETH with no calldata (receive), and handling calls with unknown selectors (fallback). In assembly, you write these explicitly.
Receive: Triggered when calldatasize() == 0, that is, a plain ETH transfer with no function call.
Fallback: The catch-all after selector matching fails: the default branch of your switch, or the final revert after all if checks.
Complete dispatch skeleton:
assembly {
// -- Step 1: Check for receive (no calldata = plain ETH transfer) --
if iszero(calldatasize()) {
// Receive logic: accept ETH, maybe emit event, then stop
// log0(0, 0) -- or log with Transfer topic
stop()
}
// -- Step 2: Extract selector --
let selector := shr(224, calldataload(0))
// -- Step 3: Dispatch --
switch selector
case 0x18160ddd {
// totalSupply()
mstore(0x00, sload(0))
return(0x00, 0x20)
}
case 0x70a08231 {
// balanceOf(address)
let account := calldataload(4)
mstore(0x00, account)
mstore(0x20, 1)
mstore(0x00, sload(keccak256(0x00, 0x40)))
return(0x00, 0x20)
}
case 0xa9059cbb {
// transfer(address,uint256)
// ... full implementation
mstore(0x00, 1)
return(0x00, 0x20)
}
default {
// -- Step 4: Fallback --
// Unknown selector: revert (no fallback logic)
revert(0, 0)
}
}
Design decisions for the default branch:
- No fallback: revert(0, 0). The safest choice; it prevents accidental calls.
- Accept any call: stop(). Dangerous, but used in some proxy patterns.
- Forward to another contract: DELEGATECALL in the default branch. This is how proxies, including the Diamond Pattern, route calls.
Intermediate Example: Complete Dispatch with Receive + Fallback
Here's a full, compilable contract that accepts ETH, dispatches three functions, and reverts on unknown selectors:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;
contract YulContract {
// Storage layout:
// Slot 0: owner (address)
// Slot 1: balances mapping base
// Slot 2: totalDeposited (uint256)
constructor() {
assembly {
sstore(0, caller()) // set owner
}
}
fallback() external payable {
assembly {
// -- Receive: plain ETH transfer --
if iszero(calldatasize()) {
// Accept ETH, increment totalDeposited
let current := sload(2)
sstore(2, add(current, callvalue()))
stop()
}
// -- Helper functions --
function require(condition) {
if iszero(condition) { revert(0, 0) }
}
function getMappingSlot(key, base) -> slot {
mstore(0x00, key)
mstore(0x20, base)
slot := keccak256(0x00, 0x40)
}
// -- Dispatch --
let selector := shr(224, calldataload(0))
switch selector
case 0x8da5cb5b {
// owner() -> address
mstore(0x00, sload(0))
return(0x00, 0x20)
}
case 0x70a08231 {
// balanceOf(address) -> uint256
let account := calldataload(4)
mstore(0x00, sload(getMappingSlot(account, 1)))
return(0x00, 0x20)
}
case 0xd0e30db0 {
// deposit() -- payable
let depositor := caller()
let amount := callvalue()
require(amount) // must send ETH
// Update balance
let slot := getMappingSlot(depositor, 1)
sstore(slot, add(sload(slot), amount))
// Update total
sstore(2, add(sload(2), amount))
// Return success (empty return)
return(0, 0)
}
default {
// Unknown selector: revert
revert(0, 0)
}
}
}
}
This contract demonstrates the full pattern: receive handling, Yul helper functions, storage operations using Module 3 patterns, and switch-based dispatch. Every piece you've learned in Modules 1-4 is at work here.
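One way to exercise YulContract is a short Foundry test. The following is a sketch under assumptions: the import path for YulContract, the test contract name, and the alice address are all hypothetical, and it relies on forge-std being installed. Because the contract only has a fallback, every call goes through a low-level call with manually encoded calldata.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

import {Test} from "forge-std/Test.sol";
// Assumed location of the contract above; adjust to your project layout.
import {YulContract} from "../src/YulContract.sol";

contract YulContractTest is Test {
    YulContract internal c;
    address internal alice = address(0xA11CE); // hypothetical test account

    function setUp() public {
        c = new YulContract();
        vm.deal(alice, 10 ether);
    }

    function test_depositCreditsBalance() public {
        // deposit() with 1 ether, sent from alice
        vm.prank(alice);
        (bool ok, ) = address(c).call{value: 1 ether}(abi.encodeWithSignature("deposit()"));
        assertTrue(ok);

        // balanceOf(alice) should now report the deposit
        (bool ok2, bytes memory ret) =
            address(c).call(abi.encodeWithSignature("balanceOf(address)", alice));
        assertTrue(ok2);
        assertEq(abi.decode(ret, (uint256)), 1 ether);
    }

    function test_plainTransferHitsReceivePath() public {
        // Empty calldata takes the receive branch, which bumps slot 2
        vm.prank(alice);
        (bool ok, ) = address(c).call{value: 2 ether}("");
        assertTrue(ok);
        assertEq(uint256(vm.load(address(c), bytes32(uint256(2)))), 2 ether);
    }

    function test_unknownSelectorReverts() public {
        (bool ok, ) = address(c).call(abi.encodeWithSignature("doesNotExist()"));
        assertFalse(ok);
    }
}
```

The second test reads slot 2 directly with vm.load because the contract exposes no getter for totalDeposited.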
DeFi Pattern Connection: Dispatch in Production
Where dispatch patterns appear in real protocols:
1. EIP-1167 Minimal Proxy: the entire contract IS a dispatcher:
The minimal proxy is ~45 bytes of raw bytecode. No Solidity, no Yul, just opcodes. It copies all calldata, DELEGATECALLs to a hardcoded implementation address, and returns or reverts the result.
363d3d373d3d3d363d73<20-byte-impl-addr>5af43d82803e903d91602b57fd5bf3
Annotated bytecode walkthrough:
Opcode(s)   Stack after (top at left)       Purpose
---------   -------------------------       -------
36          [cds]                           CALLDATASIZE: push calldata length
3d          [0, cds]                        RETURNDATASIZE: push 0 (cheaper than PUSH1 0)
3d          [0, 0, cds]                     push 0 again
37          []                              CALLDATACOPY(0, 0, cds): copy all calldata to memory[0]
3d          [0]                             push 0 (spare zero, reused after the call as the memory offset)
3d          [0, 0]                          push 0 (retSize: return data is handled manually)
3d          [0, 0, 0]                       push 0 (retOffset)
36          [cds, 0, 0, 0]                  CALLDATASIZE (argsSize)
3d          [0, cds, 0, 0, 0]               push 0 (argsOffset: calldata starts at memory[0])
73<addr>    [impl, 0, cds, 0, 0, 0]         PUSH20 implementation address
5a          [gas, impl, 0, cds, 0, 0, 0]    GAS: forward all remaining gas
f4          [success, 0]                    DELEGATECALL(gas, impl, 0, cds, 0, 0); it takes no value argument
3d          [rds, success, 0]               RETURNDATASIZE: how much data came back
82          [0, rds, success, 0]            DUP3: copy the spare zero (source offset)
80          [0, 0, rds, success, 0]         DUP1: copy it again (destination offset)
3e          [success, 0]                    RETURNDATACOPY(0, 0, rds): copy return data to memory[0]
90          [0, success]                    SWAP1
3d          [rds, 0, success]               RETURNDATASIZE
91          [success, 0, rds]               SWAP2: bring the success flag to the top
602b        [0x2b, success, 0, rds]         PUSH1 0x2b (offset of the success JUMPDEST)
57          [0, rds]                        JUMPI: jump to 0x2b if success != 0
fd          []                              REVERT(0, rds): failure, revert with the return data
5b          [0, rds]                        JUMPDEST: success landing
f3          []                              RETURN(0, rds): success, return the data
This ~45-byte contract does what OpenZeppelin's Proxy.sol does in Solidity: pure forwarding via DELEGATECALL, no selector routing needed. It is the standard building block for clone factories; OpenZeppelin's Clones library, for example, implements EIP-1167.
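For comparison, the same forwarding logic can be sketched in inline Yul. This is a readable equivalent under stated assumptions, not the EIP-1167 bytecode itself: the contract name ForwardingProxy and the IMPLEMENTATION immutable are hypothetical, and the compiled result will be larger than 45 bytes.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

/// Hypothetical contract: mirrors the minimal proxy's steps in Yul.
contract ForwardingProxy {
    address private immutable IMPLEMENTATION;

    constructor(address implementation_) {
        IMPLEMENTATION = implementation_;
    }

    fallback() external payable {
        // Inline assembly cannot read immutables directly, so copy to a local first
        address impl = IMPLEMENTATION;
        assembly {
            // Copy all calldata to memory[0] (36 3d 3d 37 in the bytecode)
            calldatacopy(0, 0, calldatasize())

            // Forward everything; return data is copied manually afterwards
            let success := delegatecall(gas(), impl, 0, calldatasize(), 0, 0)

            // Copy whatever came back to memory[0]
            returndatacopy(0, 0, returndatasize())

            // Failure: bubble up the revert data. Success: return the data.
            if iszero(success) { revert(0, returndatasize()) }
            return(0, returndatasize())
        }
    }
}
```

Writing from memory offset 0 ignores Solidity's free memory pointer, which is acceptable here only because the assembly block never returns control to Solidity code.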
2. Diamond Pattern (EIP-2535): multi-facet dispatch:
Instead of one big contract, the Diamond splits functions across multiple "facets" (implementation contracts). The dispatch works differently:
// Simplified Diamond dispatch (conceptual)
let selector := shr(224, calldataload(0))
// Look up which facet handles this selector
mstore(0x00, selector)
mstore(0x20, facetMappingSlot)
let facet := sload(keccak256(0x00, 0x40)) // facet address from storage
if iszero(facet) { revert(0, 0) } // no facet registered
// DELEGATECALL to the facet
// (full delegatecall pattern covered in Module 5)
Each facet has its own small dispatch table. The main diamond contract just routes to the right facet. This keeps per-facet dispatch tables small (fast) while allowing unlimited total functions. Reference: Part 1 Module 6 (Proxy Patterns).
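3. Solady-Style Assembly Organization:
Solady's ERC20 derives balance slots from a seed constant instead of a Solidity mapping. The sketch below is illustrative rather than Solady's actual code: it applies that slot scheme inside a hand-written dispatcher organized around internal Yul functions for reusable logic:
// Illustrative sketch using the slot scheme from Solady's ERC20
assembly {
    // Utility functions defined first
    function _revert(offset, size) { revert(offset, size) }
    function _return(offset, size) { return(offset, size) }
    // Storage slot functions (consistent naming)
    function _balanceSlot(account) -> slot {
        mstore(0x0c, _BALANCE_SLOT_SEED) // seed word covers 0x0c..0x2b; the seed sits in its low-order bytes
        mstore(0x00, account)            // address overwrites 0x0c..0x1f, leaving the seed intact
        slot := keccak256(0x0c, 0x20)
    }
    // Dispatch uses these building blocks
    switch shr(224, calldataload(0))
    case 0xa9059cbb { /* transfer: uses _balanceSlot */ }
    // ...
}
The order of the two mstore calls matters: writing the account first and the seed second would let the seed's word overwrite part of the address. Explore the full patterns at github.com/Vectorized/solady, particularly src/tokens/ERC20.sol.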
💼 Job Market Context
"Walk me through how a minimal proxy works at the bytecode level"
- Good: "It copies calldata, DELEGATECALLs to the implementation, and returns or reverts the result"
- Great: "The EIP-1167 proxy is ~45 bytes of raw bytecode with no Solidity. It uses CALLDATASIZE to get input length, CALLDATACOPY to move all calldata to memory at offset 0, then DELEGATECALL to the hardcoded implementation address forwarding all gas. After the call, RETURNDATACOPY moves the response to memory. It checks the success flag with JUMPI: REVERT if false (forwards the error), RETURN if true (forwards the response). Every byte is optimized: RETURNDATASIZE is used instead of PUSH1 0 because it produces zero on the stack for 2 gas and 1 byte, versus 3 gas and 2 bytes for PUSH1 0. The implementation address is embedded directly in the bytecode as a PUSH20 literal"
🚩 Red flag: Not knowing that minimal proxies exist or how they save deployment gas (deploying a 45-byte clone vs a full contract)
Pro tip: Be able to decode the 45 bytes from memory; it's a common interview exercise for L2/infrastructure roles. Practice by reading the EIP-1167 spec and hand-annotating the bytecode
Error Handling Patterns in Yul
This topic was covered in depth in Module 2 (Return Values & Errors). Here we apply those patterns specifically in the dispatch context, where error handling is most critical.
Recap: Reverting with a selector:
// Custom error: Unauthorized() selector = 0x82b42900
mstore(0x00, shl(224, 0x82b42900)) // shift selector to high bytes
revert(0x00, 0x04) // revert with 4-byte selector
Revert with parameters:
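// Custom error: InsufficientBalance(uint256 available, uint256 required)
// selector = 0x2e1a7d4d (placeholder for illustration, not the real hash;
// compute the actual value with: cast sig "InsufficientBalance(uint256,uint256)")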
mstore(0x00, shl(224, 0x2e1a7d4d)) // selector in first 4 bytes
mstore(0x04, availableBalance) // first param at offset 4
mstore(0x24, requiredAmount) // second param at offset 36
revert(0x00, 0x44) // 4 + 32 + 32 = 68 bytes
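To check a hand-built payload against what Solidity produces, the sketch below reverts with the same custom error both ways. It avoids hardcoding any selector by reading it from the error definition. The contract name ErrorEncoding and both function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.28;

/// Hypothetical contract: the same custom error, encoded two ways.
contract ErrorEncoding {
    error InsufficientBalance(uint256 available, uint256 required);

    /// Reference encoding produced by the compiler.
    function viaSolidity(uint256 available, uint256 required) external pure {
        revert InsufficientBalance(available, required);
    }

    /// Manual encoding: selector, then two 32-byte words (68 bytes total).
    function viaAssembly(uint256 available, uint256 required) external pure {
        bytes4 sel = InsufficientBalance.selector;
        assembly {
            // A bytes4 variable is already left-aligned in its 32-byte word,
            // so no shl(224, ...) is needed here (unlike a raw numeric literal).
            mstore(0x00, sel)
            mstore(0x04, available) // first param at offset 4
            mstore(0x24, required)  // second param at offset 36
            revert(0x00, 0x44)      // 4 + 32 + 32 = 68 bytes
        }
    }
}
```

Both functions should revert with identical data, so a test can call each one with a low-level call and compare the returned bytes.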
Pattern: Define require-like functions at the top of your assembly block:
assembly {
// -- Error selectors --
// Unauthorized()
function _revertUnauthorized() {
mstore(0x00, shl(224, 0x82b42900))
revert(0x00, 0x04)
}
// InsufficientBalance(uint256, uint256)
function _revertInsufficientBalance(available, required) {
mstore(0x00, shl(224, 0x2e1a7d4d))
mstore(0x04, available)
mstore(0x24, required)
revert(0x00, 0x44)
}
// -- Usage in dispatch --
switch shr(224, calldataload(0))
case 0xa9059cbb {
// transfer(address,uint256)
let to := calldataload(4)
let amount := calldataload(36)
let bal := sload(/* sender balance slot */)
if lt(bal, amount) {
_revertInsufficientBalance(bal, amount)
}
// ... rest of transfer
}
// ...
}
⚠️ Common Mistakes
- Forgetting to shift the selector left by 224 bits. Storing raw 0x82b42900 at memory offset 0 puts it in the low bytes of the 32-byte word. mstore writes a full 32-byte word, so mstore(0x00, 0x82b42900) stores 0x0000...0082b42900. You need shl(224, 0x82b42900) to put the selector in the high 4 bytes: 0x82b42900000000...00. Alternatively, pre-compute the shifted value as a constant.
- Using revert(0, 0) everywhere. This gives no error information, which makes debugging far harder. Always encode a selector for debuggability. Etherscan, Tenderly, and other tools decode custom errors automatically.
- Not bubbling up revert data from sub-calls. When your contract calls another contract and it reverts, you should forward the revert data so the caller sees the original error. This is covered in detail in Module 5 (External Calls).
How to Study
How to Study Dispatch-Heavy Contracts
1. Start with cast disassemble or forge inspect to see the dispatch table. Count the JUMPI instructions in the opening section; each one is a selector comparison.
2. Count the selectors. More than ~4? The compiler probably used binary search. Fewer? Linear if-chain. In hand-written assembly (Huff, Yul), it's always linear unless the author implemented something custom.
3. Trace one function call end-to-end: Extract selector from calldata -> match in dispatch table -> decode arguments from calldata -> execute (storage reads/writes) -> encode return value -> RETURN. This is the complete lifecycle.
4. Compare hand-written vs Solidity-generated dispatch. Compile a simple ERC-20 in Solidity and inspect its bytecode. Then look at Solady's ERC-20 or a Huff ERC-20. Note the differences: hand-written code often has fewer safety checks and more optimized selector ordering.
5. Good contracts to study:
- Solady ERC20: an ERC-20 whose function bodies are written in inline assembly
- Huff ERC20: ERC-20 in raw opcodes
- OpenZeppelin Proxy.sol: assembly dispatch for proxy forwarding
- EIP-1167 reference: the minimal proxy bytecode
🎯 Build Exercise: YulDispatcher
Workspace:
- Implementation: workspace/src/part4/module4/exercise1-yul-dispatcher/YulDispatcher.sol
- Tests: workspace/test/part4/module4/exercise1-yul-dispatcher/YulDispatcher.t.sol
Build a mini ERC-20 entirely in Yul. The contract has a single fallback() function containing your dispatch logic. Storage layout, error selectors, and function selectors are provided as constants; you write all the assembly.
What's provided:
- Storage slot constants (TOTAL_SUPPLY_SLOT, BALANCES_SLOT, OWNER_SLOT)
- Error selectors (Unauthorized(), InsufficientBalance(uint256,uint256), ZeroAddress())
- Function selectors for the 5 functions you'll implement
- The constructor (sets owner and mints initial supply)
5 TODOs:
1. Selector dispatch: Extract the selector from calldata and implement a switch statement routing to 5 function selectors. Revert with empty data on unknown selectors.
2. totalSupply(): Load total supply from storage slot 0, ABI-encode it, and return. The simplest function: one sload, one mstore, one return.
3. balanceOf(address): Decode the address argument from calldata, compute the mapping slot using the Module 3 formula (keccak256(key . baseSlot)), load the balance, and return.
4. transfer(address,uint256): Decode both arguments, validate the sender has sufficient balance (revert with InsufficientBalance if not), validate the recipient is not the zero address, update both balances in storage, and return true (ABI-encoded as uint256(1)).
5. mint(address,uint256): Check that the caller is the owner (revert with Unauthorized if not), validate the recipient is not the zero address, increment the recipient's balance and the total supply.
🎯 Goal: Combine calldata decoding (Module 2), storage operations (Module 3), and selector dispatch (this module) into a working contract. All 5 function calls should work identically to a standard Solidity ERC-20.
Run:
FOUNDRY_PROFILE=part4 forge test --match-path "test/part4/module4/exercise1-yul-dispatcher/*"
🎯 Build Exercise: LoopAndFunctions
Workspace:
- Implementation: workspace/src/part4/module4/exercise2-loop-and-functions/LoopAndFunctions.sol
- Tests: workspace/test/part4/module4/exercise2-loop-and-functions/LoopAndFunctions.t.sol
Practice Yul functions and loop patterns. Each function has a Solidity signature with an assembly { } body; you write the internals. This exercise focuses on control flow and iteration, not dispatch.
What's provided:
- Function signatures with parameter names
- Return types for each function
- Hints in comments pointing to relevant module sections
5 TODOs:
1. requireWithError(bool condition, bytes4 selector): If condition is false, revert with the given 4-byte error selector. This is your reusable guard function.
2. min(uint256,uint256) + max(uint256,uint256): Implement both using Yul functions. The Solidity wrappers call the Yul functions internally. Use the if lt(a, b) pattern.
3. sumArray(uint256[] calldata): Loop through a calldata array and return the sum. You'll need to decode the array offset, read the length, and iterate through elements using calldataload with computed offsets.
4. findMax(uint256[] calldata): Loop through a calldata array and return the maximum element. Combine the loop pattern from TODO 3 with the max Yul function from TODO 2.
5. batchTransfer(address[] calldata recipients, uint256[] calldata amounts): Loop through two parallel calldata arrays, performing storage writes for each pair. Validate that both arrays have the same length. This combines loops, storage (from Module 3), and error handling.
🎯 Goal: Practice Yul function definition, gas-efficient loops, and calldata array decoding in a controlled environment. Each TODO builds on the previous one.
Run:
FOUNDRY_PROFILE=part4 forge test --match-path "test/part4/module4/exercise2-loop-and-functions/*"
Summary: Control Flow & Functions
Control Flow:
- if condition { }: guard clauses; any nonzero value is true; use iszero() for negation; no else
- switch val case X { } default { }: multi-branch; no fall-through; the "else" replacement: switch cond case 0 { else } default { if }
- for { init } cond { post } { body }: explicit C-like loop; no ++, use add(i, 1); cache lengths; use lt (no le opcode)
- leave: early exit from Yul functions (not top-level assembly); compiles to JUMP
- All control flow compiles to JUMP/JUMPI/JUMPDEST sequences; there are no special opcodes
Yul Functions:
- function name(a, b) -> result { }: scoped variables, reduce stack pressure
- Multiple returns: function f(a) -> x, y { }
- Small functions are inlined by the optimizer; larger ones become JUMP targets (~20 gas call overhead)
- Stack depth limit of 16 (DUP16/SWAP16 max): decompose into focused functions to stay under
Function Dispatch:
- Extract selector: shr(224, calldataload(0))
- if-chain or switch-based dispatch for hand-written contracts (both linear scan, same gas)
- Solidity uses binary search for >4 functions (O(log n) vs O(n))
- Fallback: default branch of switch; Receive: check calldatasize() == 0 before dispatch
- Minimal proxy (EIP-1167): ~45 bytes, pure DELEGATECALL forwarding, no selector routing
Key numbers:
- JUMP: 8 gas | JUMPI: 10 gas | JUMPDEST: 1 gas
- Selector comparison: EQ(3) + JUMPI(10) = 13 gas per check
- Loop overhead: ~31 gas per iteration (excluding body)
- Stack depth limit: 16 reachable slots (DUP16/SWAP16 max)
- Inlined function call: 0 gas overhead | JUMP-based call: ~20 gas overhead
Next: Module 5 (External Calls): call, staticcall, delegatecall in assembly, returndata handling, and error propagation across contracts.
Resources
Essential References
- Yul Specification: Official Yul language reference (control flow, functions, scoping rules)
- evm.codes: Interactive opcode reference with gas costs for JUMP, JUMPI, JUMPDEST
- EVM Playground: Step through bytecode execution to see JUMP/JUMPI in action
EIPs Referenced
- EIP-1167: Minimal Proxy Contract. Clone factory standard (the 45-byte dispatcher)
- EIP-2535: Diamond Standard. Multi-facet proxy with selector-to-facet dispatch
Production Code
- Solady: Gas-optimized Solidity/assembly library; study src/tokens/ERC20.sol for dispatch patterns
- OpenZeppelin Proxy.sol: Proxy dispatch implemented in Solidity inline assembly
- Huff ERC-20: Full ERC-20 in raw opcodes (no Yul, no Solidity)
Tools
- forge inspect Contract ir-optimized: View the Yul IR output to see how Solidity compiles dispatch logic
- cast disassemble: Decode deployed bytecode to human-readable opcodes
- cast sig "transfer(address,uint256)": Compute the 4-byte function selector from a signature
- cast 4byte 0xa9059cbb: Reverse-lookup a selector to its function signature