The System Design Interview [Part 3] — Back Of The Envelope Calculations

Storm Anning

4 min read · Jan 21, 2022

Propose and justify back-of-the-envelope estimates based on the defined requirements and design.

Tips

Data-Driven Decisions — Make sure your estimations go on to influence your design decisions.
Guesstimates/Buffers — In the interest of time, be pragmatic with your precision; interviewers are typically happy with ballpark numbers. Build in buffers and consider issues or changes that could affect your estimates.
Number Crunching — Have useful numbers to hand and be comfortable converting between units. I’ve included some useful resources at the end of this note.
Resource Sizes — Understand typical modern resource sizes e.g. for AWS instances (RAM, CPU, Bandwidth).

Common Topics

Behaviour

How does the system behave? How will users behave?
How often is each operation performed? At what ratios? At what rate?
What are the downstream effects? Fan-out requests?

Traffic

Throughput — QPS for reads and writes? per feature? per API? per resource? per geographical location?
Peak traffic distributions/loads — consider this over specific windows or regions
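The throughput questions above usually reduce to one piece of arithmetic: requests per day divided by seconds per day. Here is a minimal sketch, assuming hypothetical inputs (10 million daily active users, 20 reads and 2 writes per user per day, and a peak of twice the average); none of these figures describe a real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average queries per second if traffic were spread evenly over a day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak QPS as a simple multiple of the average (the factor is an assumption)."""
    return avg_qps * peak_factor


# Hypothetical inputs, for illustration only.
dau = 10_000_000
read_qps = average_qps(dau, 20)
write_qps = average_qps(dau, 2)

print(f"reads:  ~{read_qps:,.0f} QPS average, ~{peak_qps(read_qps):,.0f} QPS peak")
print(f"writes: ~{write_qps:,.0f} QPS average, ~{peak_qps(write_qps):,.0f} QPS peak")
print(f"read:write ratio is about {read_qps / write_qps:.0f}:1")
```

In an interview it is common to round 86,400 up to 100,000 seconds per day so the division can be done mentally.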

Storage

Entity Storage — # of objects we need to store * size of each object
Auxiliary Data-Structure storage costs (including backing up in-memory structures)
Storage size at rest on disk
—Consider: Retention, archiving, back-ups
Buffer room for growth, per day, per year, per decade
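As a rough sketch, the storage items above can be combined into a single estimate. The inputs (2 million new objects per day at 500 KB each, five years of retention, three replicas, a 20% growth buffer) are hypothetical, as are the function names.

```python
def storage_bytes(objects_per_day: int, bytes_per_object: int, retention_days: int,
                  replication_factor: int = 3, growth_buffer: float = 0.2) -> float:
    """Entity storage = number of objects * size of each object, then scaled
    by replication and a buffer for growth (both assumed values)."""
    raw = objects_per_day * bytes_per_object * retention_days
    return raw * replication_factor * (1 + growth_buffer)


def humanize(num_bytes: float) -> str:
    """Format a byte count using 1,024-based units."""
    for unit in ["B", "KB", "MB", "GB", "TB", "PB"]:
        if num_bytes < 1024 or unit == "PB":
            return f"{num_bytes:,.1f} {unit}"
        num_bytes /= 1024


# Hypothetical inputs, for illustration only.
per_day = storage_bytes(2_000_000, 500 * 1024, retention_days=1,
                        replication_factor=1, growth_buffer=0.0)
five_years = storage_bytes(2_000_000, 500 * 1024, retention_days=5 * 365)

print("raw growth per day:", humanize(per_day))
print("five years, replicated and buffered:", humanize(five_years))
```

This ignores indexes and other auxiliary structures, which would be added on top as a separate line item.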

Bandwidth

Ingress/egress — data coming in and out of the system per second
Distributions/peak loads
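Bandwidth follows from throughput multiplied by payload size. The sketch below assumes hypothetical figures (230 writes per second and 2,300 reads per second, each carrying a 500 KB payload) and converts the result to bits per second so it can be compared with a network link's rated speed.

```python
def bandwidth_bytes_per_sec(qps: float, payload_bytes: int) -> float:
    """Bytes moved per second for a given request rate and payload size."""
    return qps * payload_bytes


def to_mbps(bytes_per_sec: float) -> float:
    """Convert bytes per second to megabits per second (network links use
    decimal units: 1 Mbps = 1,000,000 bits per second)."""
    return bytes_per_sec * 8 / 1_000_000


# Hypothetical inputs, for illustration only.
payload = 500 * 1024  # 500 KB per request
ingress = bandwidth_bytes_per_sec(230, payload)
egress = bandwidth_bytes_per_sec(2_300, payload)

print(f"ingress: ~{to_mbps(ingress):,.0f} Mbps")
print(f"egress:  ~{to_mbps(egress):,.0f} Mbps")
```

Note the factor of 8: storage is quoted in bytes while link speeds are quoted in bits, which is an easy place to be off by almost an order of magnitude.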

Memory

What kind of data should be cached?
What ratio of hot data should be cached (e.g. the 80–20 rule, where roughly 20% of the data serves 80% of the requests)?
How much memory (RAM) do we need? How many machines?
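A minimal sketch of the cache-sizing arithmetic, assuming the 80–20 rule (cache the hottest 20% of a day's responses) and hypothetical inputs: 200 million reads per day, 10 KB per response, and machines with 256 GiB of RAM of which about 75% is usable for cached data.

```python
import math


def cache_bytes(daily_read_requests: int, bytes_per_response: int,
                hot_fraction: float = 0.2) -> float:
    """Memory needed to hold the hot fraction of one day's responses.
    Treats every request as unique, so it is an upper bound."""
    return daily_read_requests * bytes_per_response * hot_fraction


def machines_needed(total_bytes: float, ram_per_machine_bytes: int,
                    usable_fraction: float = 0.75) -> int:
    """Number of cache machines, leaving headroom for the OS and overhead."""
    return math.ceil(total_bytes / (ram_per_machine_bytes * usable_fraction))


# Hypothetical inputs, for illustration only.
GIB = 1024 ** 3
needed = cache_bytes(200_000_000, 10 * 1024)
machines = machines_needed(needed, 256 * GIB)

print(f"cache size: ~{needed / GIB:,.0f} GiB across {machines} machine(s)")
```

Because repeated requests for the same object are counted separately here, the real figure is usually lower; the estimate is still useful as a ceiling.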

CPU

How many cores? Threads? GHz needed?
How many ops per second?
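One way to turn operations per second into a core count is to multiply the request rate by the CPU time each request consumes, then divide by a target utilisation. This is a sketch under assumed inputs (5,000 requests per second, 2 ms of CPU time each, cores kept at 50% utilisation); real workloads also spend time waiting on I/O, which this ignores.

```python
import math


def cores_needed(qps: float, cpu_ms_per_request: float,
                 target_utilisation: float = 0.5) -> int:
    """Cores required so that CPU-bound work stays under the target utilisation."""
    busy_cores = qps * cpu_ms_per_request / 1000  # core-seconds of work per second
    return math.ceil(busy_cores / target_utilisation)


# Hypothetical inputs, for illustration only.
cores = cores_needed(qps=5_000, cpu_ms_per_request=2)
print(f"~{cores} cores at 50% target utilisation")
```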

Other Resources

Connections
—Max # open per second, size of pool, timeout constraints
Disks/SSDs
—Write/read speeds
—Replication, RAID, backups, etc.
Resource Locking
GPU
I/O
Network
—Packet sizes, client upload/download speeds/bandwidth?
Electrical power

Latencies

Acceptable latency of this system for each operation?
Acceptable latency for each node/edge in this system?
Acceptable latency for writes? Consistency delays?
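A simple way to check per-node latencies against an overall target is to add up the hops on a request path and compare the total with a budget. The hop names, per-hop figures, and the 100 ms budget below are all hypothetical.

```python
def path_latency_ms(hops: dict[str, int]) -> int:
    """Total latency of a request that visits each hop sequentially."""
    return sum(hops.values())


# Hypothetical read path, for illustration only (milliseconds per hop).
read_path = {
    "load balancer": 1,
    "app server": 20,
    "cache lookup": 2,
    "database read": 30,
}
budget_ms = 100

total = path_latency_ms(read_path)
slowest = max(read_path, key=read_path.get)

print(f"total: {total} ms of a {budget_ms} ms budget")
print(f"largest contributor: {slowest}")
```

Hops that run in parallel would contribute their maximum rather than their sum, so this sequential model is the pessimistic case.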

Verification Questions

Does your analysis consider ‘worst case’ scenarios?
Unavailability? Peak loads? Seasonal events?
Are the quantitative assumptions you’ve made reasonable and verified?
Does this analysis hold over the next X years?
When will we outgrow resources? Storage? Data types? CPU?
Look at the design, what resources are stretched here?
How can we mitigate risks?
What optimizations can we introduce to be more efficient?
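For the question of when resources will be outgrown, a compound-growth estimate is often enough. The sketch below assumes hypothetical figures (40 TB used out of 100 TB provisioned, growing 50% per year) and a constant growth rate.

```python
import math


def years_until_full(current_usage: float, capacity: float,
                     annual_growth: float) -> float:
    """Years until usage reaches capacity under constant compound growth."""
    if current_usage >= capacity:
        return 0.0
    return math.log(capacity / current_usage) / math.log(1 + annual_growth)


# Hypothetical inputs, for illustration only.
years = years_until_full(current_usage=40, capacity=100, annual_growth=0.5)
print(f"capacity is reached in roughly {years:.1f} years")
```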

Number Crunching

Below are some useful resources you can use when performing estimations.

Handling Bytes

Powers of Two

Byte Conversions

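Each unit is 1,024 times the one below it (bytes → KB → MB → GB → TB). To express a value in the next smaller unit, multiply by 1,024; to express it in the next larger unit, divide by 1,024. To convert kilobytes to terabytes, divide your value by 1,024 three times (1,024³); starting from bytes takes four divisions (1,024⁴).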
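The conversion rule can be sketched as a small helper, assuming binary (1,024-based) units throughout; the `UNITS` list and `convert` function are illustrative names, not from any library.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between binary units; each step up or down is a factor of 1,024."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1024 ** steps


# Powers of two behind each unit: 2^10 bytes = 1 KB, 2^20 bytes = 1 MB, and so on.
for i, unit in enumerate(UNITS[1:], start=1):
    print(f"1 {unit} = 2^{10 * i} bytes = {2 ** (10 * i):,} bytes")

print(convert(1, "TB", "KB"))              # terabytes to kilobytes: multiply three times
print(convert(5_000_000_000, "KB", "TB"))  # kilobytes to terabytes: divide three times
```

For mental arithmetic, treating 1,024 as 1,000 is usually close enough; the error is about 2.4% per step.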

Latency Numbers Every Programmer Should Know

Use these numbers to validate the performance of your design:

Time Conversions

1 ns (nanosecond) = 10^-9 seconds
1 μs (microsecond) = 10^-6 seconds = 1,000 ns
1 ms (millisecond) = 10^-3 seconds = 1,000 μs = 1,000,000 ns
1 s (second) = 10^0 seconds = 1,000 ms

Based on the metrics above you can:

Read sequentially from HDD at 30 MB/s
Read sequentially from 1 Gbps Ethernet at 100 MB/s
Read sequentially from SSD at 1 GB/s
Read sequentially from main memory at 4 GB/s
Make 6–7 world-wide round trips per second
Make 2,000 round trips per second within a datacenter
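Each figure in the list above is the reciprocal of a per-operation latency. The sketch below uses assumed latencies: commonly quoted approximations, chosen here to be consistent with the list, so treat them as orders of magnitude and not as measurements.

```python
NS_PER_SECOND = 1_000_000_000

# Assumed time to read 1 MB sequentially, in nanoseconds.
read_1mb_ns = {
    "HDD": 30_000_000,              # ~30 ms
    "1 Gbps Ethernet": 10_000_000,  # ~10 ms
    "SSD": 1_000_000,               # ~1 ms
    "main memory": 250_000,         # ~250 μs
}

# Assumed network round-trip times, in nanoseconds.
round_trip_ns = {
    "world-wide": 150_000_000,     # ~150 ms
    "within a datacenter": 500_000,  # ~500 μs
}


def mb_per_second(ns_per_mb: int) -> float:
    """Sequential throughput implied by the time taken to read 1 MB."""
    return NS_PER_SECOND / ns_per_mb


def round_trips_per_second(ns_per_round_trip: int) -> float:
    """Sequential round trips that fit into one second."""
    return NS_PER_SECOND / ns_per_round_trip


for medium, ns in read_1mb_ns.items():
    print(f"{medium}: ~{mb_per_second(ns):,.0f} MB/s")
for scope, ns in round_trip_ns.items():
    print(f"{scope}: ~{round_trips_per_second(ns):,.1f} round trips per second")
```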

Typical Modern Resource Sizes

General Purpose Amazon EC2 Instances

m6g.medium, 4 GiB of Memory, 1 vCPU, EBS only, 64-bit Arm platform
m6g.16xlarge, 256 GiB of Memory, 64 vCPUs, EBS only, 64-bit Arm platform
Custom built AWS Graviton2 Processor with 64-bit Arm Neoverse cores
—Where 1 hertz = 1 cycle per second
—2.5 GHz (gigahertz) is 2.5 billion cycles per second
Support for Enhanced Networking with Up to 25 Gbps of Network bandwidth

Compute Optimised Amazon EC2 Instances

c5a.24xlarge, 192 GiB of Memory, 96 vCPUs, EBS only, 64-bit platform

Memory Optimised Amazon EC2 Instances

r6g.16xlarge, 512 GiB of Memory, 64 vCPUs, EBS only, 64-bit Arm platform
x1.32xlarge: 1,952 GiB of memory, 128 vCPUs, 2 x 1,920 GB of SSD-based instance storage, 64-bit platform, 20 Gigabit Ethernet

Storage Optimised Amazon EC2 Instances

i3en.large: 16 GiB of memory, 2 vCPUs, 1 x 1.25TB NVMe SSD, 64-bit platform
i3en.24xlarge: 768 GiB of memory, 96 vCPUs, 8 x 7.5TB NVMe SSD, 64-bit platform
Connections — about 1000 concurrent connections