The System Design Interview [Part 3] — Back Of The Envelope Calculations

Storm Anning
4 min read · Jan 21, 2022

Propose and justify back of the envelope estimations based on the defined requirements and design.

Tips

  • Data-Driven Decisions: Make sure your estimations go on to influence your design decisions.
  • Guesstimates/Buffers: In the interest of time, be pragmatic about precision; interviewers are typically happy with ballpark numbers. Build in buffers and consider issues or changes that could affect your estimates.
  • Number Crunching: Have useful numbers to hand and be comfortable converting between units. I’ve included some useful resources at the end of this note.
  • Resource Sizes: Understand typical modern resource sizes, e.g. for AWS instances (RAM, CPU, bandwidth).

Common Topics

Behaviour

  • How does the system behave? How will users behave?
  • How often is each operation performed? At what ratios? At what rate?
  • What are the downstream effects? Fan-out requests?

Traffic

  • Throughput — QPS for reads and writes? per feature? per API? per resource? per geographical location?
  • Peak traffic distributions/loads — consider this over specific windows or regions
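
As a rough illustration of the throughput questions above, here is a minimal sketch that turns daily usage into average and peak QPS. The user count, requests per user, read:write ratio and peak factor are all hypothetical inputs, not figures from any real system.

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average queries per second, assuming traffic is spread evenly over the day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def split_read_write(total_qps: float, reads_per_write: float) -> Tuple[float, float]:
    """Split total QPS into (read_qps, write_qps) for a given read:write ratio."""
    write_qps = total_qps / (reads_per_write + 1)
    return total_qps - write_qps, write_qps


def peak_qps(avg_qps: float, peak_factor: float) -> float:
    """Scale the average by an assumed peak-to-average factor."""
    return avg_qps * peak_factor


# Hypothetical inputs: 10M daily users, 20 requests each, 100 reads per write, 3x peak.
avg = average_qps(10_000_000, 20)
reads, writes = split_read_write(avg, reads_per_write=100)
print(f"average: {avg:,.0f} QPS (reads {reads:,.0f}, writes {writes:,.0f})")
print(f"peak:    {peak_qps(avg, 3):,.0f} QPS")
```

In an interview, rounding 86,400 up to 100,000 seconds per day is usually accurate enough and makes the mental arithmetic much easier.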

Storage

  • Entity Storage — # of objects we need to store * size of each object
  • Auxiliary Data-Structure storage costs (including backing up in-memory structures)
  • Storage size at rest on disk
    —Consider: Retention, archiving, back-ups
  • Buffer room for growth, per day, per year, per decade
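
The storage points above boil down to one multiplication. The sketch below is one way to write it out, assuming a fixed object size, a flat retention window, a replication factor and a growth buffer; every default and input here is a made-up example value.

```python
def storage_bytes(
    objects_per_day: int,
    bytes_per_object: int,
    retention_days: int,
    replication_factor: int = 3,   # assumed number of copies kept (replicas, backups)
    growth_buffer: float = 0.2,    # assumed 20% headroom for growth
) -> float:
    """Estimated bytes at rest: object count * object size, scaled by copies and buffer."""
    raw = objects_per_day * bytes_per_object * retention_days
    return raw * replication_factor * (1 + growth_buffer)


# Hypothetical inputs: 1M new objects per day, 1 KB each, kept for one year.
total = storage_bytes(1_000_000, 1024, 365)
print(f"{total / 1024**4:.2f} TB")
```

Auxiliary structures such as indexes are not modelled here; a simple way to account for them is to raise `growth_buffer` or add a separate overhead factor.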

Bandwidth

  • Ingress/egress: data flowing into and out of the system per second
  • Distributions/peak loads
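
Bandwidth follows directly from throughput and payload size. A minimal sketch, assuming hypothetical request rates and average payload sizes (none of these numbers come from a real workload):

```python
def bandwidth_bytes_per_sec(qps: float, avg_payload_bytes: float) -> float:
    """Bytes per second moved by a stream of requests with a given average payload."""
    return qps * avg_payload_bytes


def to_megabits_per_sec(bytes_per_sec: float) -> float:
    """Convert bytes/s to megabits/s (network links are usually quoted in bits)."""
    return bytes_per_sec * 8 / 1_000_000


# Hypothetical workload: 25 uploads/s of 2 MB each, 2,500 downloads/s of 200 KB each.
ingress = bandwidth_bytes_per_sec(25, 2_000_000)
egress = bandwidth_bytes_per_sec(2_500, 200_000)
print(f"ingress: {to_megabits_per_sec(ingress):,.0f} Mbps")
print(f"egress:  {to_megabits_per_sec(egress):,.0f} Mbps")
```

Comparing the egress figure against the network bandwidth of the instance type you have in mind quickly shows whether a single machine could serve the load.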

Memory

  • What kind of data should be cached?
  • What ratio of hot data should be cached (80–20)?
  • How much memory (RAM) do we need? How many machines?
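
One common reading of the 80–20 rule is that roughly 20% of the data serves 80% of the reads, so you cache about 20% of the daily read volume. The sketch below follows that reading; the request counts, response size and usable-RAM fraction are assumptions for illustration only.

```python
import math


def cache_bytes(daily_read_requests: int, bytes_per_response: int,
                hot_fraction: float = 0.2) -> float:
    """Memory needed to cache the assumed 'hot' share of a day's read responses."""
    return daily_read_requests * bytes_per_response * hot_fraction


def machines_needed(total_bytes: float, ram_per_machine_gib: float,
                    usable_fraction: float = 0.75) -> int:
    """Machines required if only part of each machine's RAM is usable for the cache."""
    usable_bytes = ram_per_machine_gib * 2**30 * usable_fraction
    return math.ceil(total_bytes / usable_bytes)


# Hypothetical inputs: 2B reads per day, 10 KB per response, 256 GiB machines.
needed = cache_bytes(2_000_000_000, 10_000)
print(f"cache size: {needed / 1e12:.1f} TB")
print(f"machines:   {machines_needed(needed, ram_per_machine_gib=256)}")
```

This treats every read as a distinct response, so it overestimates when many requests hit the same object; deduplicating by object count instead of request count gives a tighter bound.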

CPU

  • How many cores? Threads? GHz needed?
  • How many ops per second?
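
A very rough way to size CPU is to multiply operations per second by an assumed cycle cost per operation and divide by the cycles a core can deliver. The sketch below does exactly that; the cycle cost and target utilisation are guesses you would state out loud in an interview, and it ignores I/O wait and the difference between vCPUs and physical cores.

```python
import math


def cores_needed(ops_per_sec: float, cycles_per_op: float,
                 clock_hz: float, target_utilisation: float = 0.6) -> int:
    """Cores required to sustain the load while staying under a target utilisation."""
    cycles_required = ops_per_sec * cycles_per_op
    cycles_per_core = clock_hz * target_utilisation
    return math.ceil(cycles_required / cycles_per_core)


# Hypothetical inputs: 50,000 ops/s at ~1M cycles each on 2.5 GHz cores.
print(cores_needed(50_000, 1_000_000, clock_hz=2.5e9))
```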

Other Resources

  • Connections
    —Max # open per second, size of pool, timeout constraints
  • Disks/SSDs
    —Write/read speeds
    —Replication, RAID, backups, etc.
  • Resource Locking
  • GPU
  • I/O
  • Network
    —Packet sizes, client upload/download speeds/bandwidth?
  • Electrical power

Latencies

  • Acceptable latency of this system for each operation?
  • Acceptable latency for each node/edge in this system?
  • Acceptable latency for writes? Consistency delays?

Verification Questions

  • Does your analysis consider ‘worst case’ scenarios?
  • Unavailability? Peak loads? Seasonal events?
  • Are the quantitative assumptions you’ve made reasonable and verified?
  • Does this analysis hold over the next X years?
  • When will we outgrow resources? Storage? Data types? CPU?
  • Look at the design, what resources are stretched here?
  • How can we mitigate risks?
  • What optimizations can we introduce to be more efficient?

Number Crunching

Below are some useful resources you can use when performing estimations.

Handling Bytes

Powers of Two

Byte Conversions

Each step up in unit (bytes → KB → MB → GB → TB) is a factor of 1,024. To convert kilobytes to terabytes, divide your value by 1,024 three times (1,024³); to convert bytes to terabytes, divide four times (1,024⁴).
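
If you want to sanity-check a conversion, a small helper that counts the number of unit steps is enough. This is a minimal sketch assuming binary units, where every step between neighbouring units is a factor of 1,024; the `convert` function and `UNITS` list are names made up for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between byte units by dividing by 1,024 once per step up the list."""
    steps = UNITS.index(to_unit) - UNITS.index(from_unit)
    # A negative step count means converting to a smaller unit, i.e. multiplying.
    return value / 1024 ** steps


print(convert(5 * 1024**3, "KB", "TB"))  # three steps up: KB -> MB -> GB -> TB
print(convert(1, "GB", "KB"))            # two steps down: GB -> MB -> KB
```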

Latency Numbers Every Programmer Should Know

Use these numbers to validate the performance of your design:

Time Conversions

  • 1 ns (nanoseconds) = 10^-9 seconds
  • 1 μs (microseconds) = 10^-6 seconds = 1,000 ns
  • 1 ms (milliseconds) = 10^-3 seconds = 1,000 μs = 1,000,000 ns
  • 1 s (seconds) = 10^0 seconds = 1,000 ms

Based on the metrics above you can:

  • Read sequentially from HDD at 30 MB/s
  • Read sequentially from 1 Gbps Ethernet at 100 MB/s
  • Read sequentially from SSD at 1 GB/s
  • Read sequentially from main memory at 4 GB/s
  • Make 6–7 world-wide round trips per second
  • Make 2,000 round trips per second within a datacenter
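
These rates are just the inverse of the corresponding latencies. The sketch below shows the arithmetic, assuming the commonly circulated latency figures (for example 250 μs to read 1 MB sequentially from memory and 150 ms for a California–Netherlands–California round trip); treat the inputs as order-of-magnitude values rather than measurements.

```python
MICROSECONDS_PER_SECOND = 1_000_000

# Assumed time to read 1 MB sequentially, in microseconds.
READ_1MB_US = {
    "HDD": 30_000,
    "1 Gbps Ethernet": 10_000,
    "SSD": 1_000,
    "main memory": 250,
}

# Assumed round-trip times, in microseconds.
ROUND_TRIP_US = {
    "world-wide": 150_000,
    "within a datacenter": 500,
}


def per_second(latency_us: float) -> float:
    """How many operations of the given latency fit into one second, back to back."""
    return MICROSECONDS_PER_SECOND / latency_us


for medium, latency in READ_1MB_US.items():
    print(f"{medium}: ~{per_second(latency):,.0f} MB/s")
for route, latency in ROUND_TRIP_US.items():
    print(f"{route}: ~{per_second(latency):,.1f} round trips/s")
```

The HDD figure comes out at about 33 MB/s, which is where the rounded 30 MB/s comes from.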

Typical Modern Resource Sizes

General Purpose Amazon EC2 Instances

  • m6g.medium, 4 GiB of Memory, 1 vCPU, EBS only, 64-bit Arm platform
  • m6g.16xlarge, 256 GiB of Memory, 64 vCPUs, EBS only, 64-bit Arm platform
  • Custom built AWS Graviton2 Processor with 64-bit Arm Neoverse cores
    —Where 1 hertz = 1 cycle per second
    —2.5 GHz (gigahertz) is 2.5 billion cycles per second
  • Support for Enhanced Networking with Up to 25 Gbps of Network bandwidth

Compute Optimised Amazon EC2 Instances

  • c5a.24xlarge, 192 GiB of Memory, 96 vCPUs, EBS only, 64-bit platform

Memory Optimised Amazon EC2 Instances

  • r6g.16xlarge, 512 GiB of Memory, 64 vCPUs, EBS only, 64-bit Arm platform
  • x1.32xlarge: 1,952 GiB of memory, 128 vCPUs, 2 x 1,920 GB of SSD-based instance storage, 64-bit platform, 20 Gigabit Ethernet

Storage Optimised Amazon EC2 Instances

  • i3en.large: 16 GiB of memory, 2 vCPUs, 1 x 1.25TB NVMe SSD, 64-bit platform
  • i3en.24xlarge: 768 GiB of memory, 96 vCPUs, 8 x 7.5TB NVMe SSD, 64-bit platform
  • Connections — about 1000 concurrent connections
