The Plumbing Beneath Every Backend System #
For most developers, networking is like plumbing: an abstraction that works silently until it fails catastrophically. For backend engineers, networking is the bedrock we build on. A shallow understanding of protocol trade-offs or distributed-system failure modes guarantees a future of obscure bugs, brittle architectures and, worst of all, painful debugging sessions.
That is why backend engineers need at least a foundational grasp of the network stack: the inherent trade-offs that protocols enforce, and how those trade-offs surface as critical engineering challenges in the real world. Let’s dive in.
The Foundation: VPC, Subnets and Cloud Architecture #
At the bottom of the stack is the Internet Protocol (IP), the fundamental addressing scheme. For cloud-native engineering, mastering its segmentation is the blueprint for a secure and resilient system. This begins with the Virtual Private Cloud (VPC).

The VPC: Your Private Slice of the Cloud #
Think of a VPC as your own logically isolated, private data center within the cloud. When you create a VPC, you claim a large, private IP address range using CIDR (Classless Inter-Domain Routing) notation. A common choice is 10.0.0.0/16, which gives you the entire block from 10.0.0.0 to 10.0.255.255—a total of 65,536 private IP addresses to work with.
Subnets: Carving Up Your VPC for Purpose #
A subnet is a logical slice of your VPC’s IP address range. You don’t just place resources randomly into a giant /16 block; you carve it into smaller, purpose-built subnets. This is where security and architecture truly begin.
The Mechanics: CIDR Masks and Usable IPs
Using CIDR notation, you define the size of your subnet. If your VPC is 10.0.0.0/16, you might create subnets like:
- `10.0.1.0/24`
- `10.0.2.0/24`
A /24 CIDR block means the first 24 bits of the 32-bit IP address are fixed as the network portion, leaving the remaining 8 bits for host addresses. This is equivalent to a subnet mask of 255.255.255.0.
Mathematically, 8 host bits give you 2⁸ = 256 addresses. However, you can never use all of them. Cloud providers reserve a few addresses in every subnet for networking functions. In AWS, the first four and the last IP address are always reserved:
- `10.0.1.0`: Network Address. Identifies the subnet itself.
- `10.0.1.1`: VPC Router. The internal default gateway for the subnet.
- `10.0.1.2`: DNS Resolver. The AWS-provided DNS server.
- `10.0.1.3`: Reserved for future use.
- `10.0.1.255`: Network Broadcast Address. Not used in VPCs but reserved.
The practical takeaway: A /24 subnet provides 251 usable IP addresses for your resources (EC2 instances, containers, etc.).
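This arithmetic is easy to sanity-check with Python’s standard `ipaddress` module; a quick sketch, with the AWS reservation count (five addresses) hard-coded as an assumption:

```python
import ipaddress

# The VPC block: a /16 gives 2^16 = 65,536 addresses.
vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # 65536

# A /24 subnet carved out of it: 2^8 = 256 raw addresses.
subnet = ipaddress.ip_network("10.0.1.0/24")
print(subnet.num_addresses)  # 256
print(subnet.netmask)        # 255.255.255.0

# AWS reserves the first four addresses and the last one in each subnet.
AWS_RESERVED = 5
usable = subnet.num_addresses - AWS_RESERVED
print(usable)  # 251
```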
Public vs. Private Subnets: A Matter of Routing #
The distinction between a ‘public’ and ‘private’ subnet has nothing to do with its IP address range (both use private IPs). It boils down to one simple question: how does it reach the internet? This is controlled by the subnet’s associated Route Table.
- Public Subnet: A subnet is ‘public’ if its route table has a route to an Internet Gateway (IGW). An IGW is a horizontally scaled, redundant VPC component that allows two-way communication between your VPC and the public internet. An instance in a public subnet must have a public IP address or an Elastic IP to be reachable from the internet.
- Private Subnet: A subnet is ‘private’ if it does not have a direct route to an IGW. Instances here cannot be reached from the public internet, nor can they reach it directly.
How can a service in a private subnet call an external API? The NAT Gateway. A Network Address Translation (NAT) Gateway is a managed AWS service that you place in a public subnet. You then add a route to the private subnet’s route table that directs all internet-bound traffic (`0.0.0.0/0`) to the NAT Gateway. We will discuss NAT further in the following sections.
The Architectural Blueprint #
This public/private subnet design is the foundation of secure cloud architecture. Consider a standard three-tier web application:
- Presentation Tier (Public Subnet): Place your public-facing Application Load Balancer (ALB) here. Its job is to accept traffic from the internet via the IGW and forward it to your application tier.
- App Tier (Private Subnet): Place your EC2 instances or ECS containers running your application logic here. They are not directly addressable from the internet, protecting them from attack. They receive traffic only from the ALB. If they need to call an external API (e.g., Stripe), their traffic is routed through the NAT Gateway.
- Data Tier (Private Subnet): Place your RDS database instance in an even more isolated private subnet. This subnet’s route table has no route to a NAT Gateway because a database should never initiate an outbound connection. Its Network Access Control Lists (NACLs) and Security Groups are configured to only allow traffic from the application subnet’s IP range on the database port (e.g., 5432 for PostgreSQL).
This layered approach turns the network into a security control. Even if a web server gets compromised, it can’t directly exfiltrate data from your database.
TCP vs. UDP: The Grand Trade-off #
At the transport layer, TCP and UDP represent two opposing design philosophies. Your choice defines your application’s relationship with reliability and performance.
TCP: Reliability First #
TCP is a connection-oriented protocol built for guaranteed, ordered, error-checked delivery. Its guarantees are not free; they are paid for in latency via mechanisms like the Three-Way Handshake (SYN→SYN-ACK→ACK), sequence numbers for ordering and sophisticated congestion control algorithms.
TCP should be used when data integrity and order are non-negotiable.
- Web Traffic (HTTP/S): Every byte of a webpage or API response must arrive correctly and in order.
- Database Connections: Corrupted or out-of-order query data is unacceptable.
- File Transfers (FTP, SCP): A single corrupted bit can render a file useless.
- Remote Shell (SSH): Commands must be executed exactly as sent.
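TCP’s guarantees can be seen in miniature with Python’s standard `socket` module on the loopback interface; a minimal echo sketch, not production code:

```python
import socket
import threading

def echo_server(server: socket.socket) -> None:
    # accept() completes the three-way handshake initiated by connect()
    conn, _addr = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)  # echo the bytes back, in order

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))   # SYN -> SYN-ACK -> ACK
client.sendall(b"SELECT 1;")
echoed = client.recv(1024)            # delivered intact and in order
print(echoed)  # b'SELECT 1;'
client.close()
server.close()
```

Every byte arrives exactly as sent, which is precisely what a database connection or SSH session needs.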
UDP: Speed First #
UDP is a connectionless, fire-and-forget protocol optimised for speed and minimal overhead. It wraps your data in a tiny 8-byte header and hands it off, offering zero guarantees of delivery, order, or integrity.
UDP should be used when timeliness is more important than perfect reliability.
- Online Gaming: A player’s position needs to be updated now. A late packet is worse than a lost one, which can be interpolated.
- Voice over IP (VoIP): A momentary audio drop is preferable to a long, delayed buffer.
- Live Video Streaming: Similar to VoIP, maintaining the live feed is prioritised over retransmitting a dropped frame from seconds ago.
- Modern Protocols like QUIC: The foundation of HTTP/3, QUIC uses UDP and implements its own superior reliability and congestion control mechanisms at the application layer.
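For contrast, a minimal UDP sketch in Python: no handshake, no connection, each datagram stands alone (on loopback, so delivery happens to succeed here):

```python
import socket

# A UDP "receiver" is just a bound socket: no listen(), no accept(), no handshake.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Fire-and-forget: sendto() returns as soon as the datagram
# (8-byte header + payload) is handed to the OS. No delivery guarantee.
sender.sendto(b"player_pos:x=12,y=7", addr)

payload, _src = receiver.recvfrom(1024)
print(payload)  # b'player_pos:x=12,y=7' on loopback; over the internet, maybe not
sender.close()
receiver.close()
```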
NAT: How Private Servers Borrow a Public Identity #
Network Address Translation (NAT) is what allows many machines in a private network to access the wider internet through a single public IP. Without it, every EC2 instance or laptop in your house would need its own globally unique IP — and IPv4 simply doesn’t have enough of those.
How it works:
- Your private instance (say `10.0.2.5`) sends a packet to `142.250.191.100` (Google).
- The NAT device rewrites the source IP `10.0.2.5` to its own public IP `203.0.113.42` and keeps a mapping in a table: `203.0.113.42:40001 → 10.0.2.5:54321`.
- When Google replies, NAT looks up the mapping and sends the packet back to the correct private machine.
DNS: The Internet’s Fragile, Distributed Phonebook #
Remembering 172.217.13.142 is painful. Humans want www.google.com. The Domain Name System (DNS) is the distributed database that makes that translation, e.g. api.example.com to 192.0.2.123. A first-time lookup is a slow, multi-step process involving recursive and iterative queries.
The Resolution Journey (in a nutshell):
- Stub Resolver (your laptop/app): Asks “what is `api.example.com`?”
- Recursive Resolver (your ISP or `8.8.8.8`): Checks its cache. If empty, it walks the hierarchy:
  - Root Servers (`.`) → “ask the `.com` servers”
  - TLD Servers (`.com`) → “ask `example.com`’s nameservers”
  - Authoritative Servers (`ns1.example.com`) → “the answer is `192.0.2.123`”
- The resolver caches the answer and returns it to you.
Record types to note:
- A: Maps a hostname to an IPv4 address.
- AAAA: Maps a hostname to an IPv6 address.
- CNAME: An alias, pointing one name to another (e.g., `www.example.com` to `example.com`).
- MX: Specifies the mail servers for a domain.
- TXT: Holds arbitrary text, commonly used for domain verification (e.g., SPF records).
Use an Anycast DNS provider (Cloudflare, AWS Route 53). They announce your DNS records from dozens of global data centers, routing users to the nearest one. This cuts latency and provides massive resilience against DDoS attacks.
Securing the Channel: TLS and the Handshake #
Plain HTTP traffic in 2025 isn’t just frowned upon; at this point it’s malpractice. HTTPS provides three guarantees by layering HTTP over Transport Layer Security (TLS):
- Encryption: Nobody can eavesdrop.
- Authentication: You know who you’re talking to.
- Integrity: No one tampers with your data in flight.
It achieves this with a clever two-phase dance: use expensive asymmetric cryptography (public/private keys) once, to agree on secrets, then switch to cheap, fast symmetric keys for all actual data.
There are three main TLS versions, though only two are relevant today:
- TLS 1.0 / 1.1: old and deprecated.
- TLS 1.2: still common today, but requires two network round trips (2-RTT) to establish a session.
- TLS 1.3, the modern standard: faster (only 1-RTT to set up, with 0-RTT possible on resumption) and safer (simpler, removes weak ciphers, forward secrecy by default).
RTT (round-trip time) here means how many back-and-forth network exchanges are needed before the secure channel is ready. Fewer RTTs = faster page loads and API calls.
```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client->>Server: ClientHello + KeyShare (supported versions, ciphers, ECDHE key)
    Server-->>Client: ServerHello + KeyShare (chosen cipher, ECDHE key)
    Server-->>Client: [Encrypted] Certificate, CertificateVerify, Finished
    Client->>Client: 🔎 Validate Certificate (CA chain, hostname, validity, revocation)
    Client->>Server: Finished + First Encrypted Request (e.g. GET /)
    Server-->>Client: Encrypted HTTP Response
    Note over Client,Server: 🔐 Secure channel established (AEAD, forward secrecy)
```
What’s happening here:
- ClientHello: the client proposes TLS parameters (supported versions, cipher suites, extensions) and sends its ephemeral ECDHE public key \( g^a \). The client’s private key, represented as exponent \( a \), never leaves the client.
- ServerHello: the server chooses a TLS version and cipher suite and similarly sends its own ephemeral ECDHE public key \( g^b \). With the keys exchanged, both sides can compute the shared secret:
- Client computes \( (g^b)^a = g^{ab} \)
- Server computes \( (g^a)^b = g^{ab} \)
Both arrive at the same Diffie–Hellman shared secret without ever exposing their private keys.
- Server messages (encrypted under handshake traffic keys): from the shared secret \( g^{ab} \), handshake traffic keys are derived. Using those, the server sends:
- Certificate: a TLS certificate (often called an “SSL certificate”) containing the server’s long-term public key, signed by a trusted CA.
- CertificateVerify: a digital signature proving the server controls the private key corresponding to the certificate’s public key.
- Finished: a MAC that proves the server’s view of the handshake so far.
- Client validation: the client decrypts the handshake messages, then validates:
- The server’s certificate chain against its local trust store of CA public keys.
- That the hostname matches, the certificate is within its validity period and it has not been revoked.
- The `CertificateVerify` signature (server controls its certificate’s private key).
- The server’s `Finished` MAC.
- Client Finished: after successful validation, the client sends its own `Finished` message, encrypted with the handshake traffic keys. This proves the client derived the same shared secret and commits to the transcript. This message can include the first encrypted HTTP request (e.g., `GET /`).
- Application data: both sides derive application traffic keys from the shared secret \( g^{ab} \) and the handshake transcript. From this point onward, application data is encrypted with symmetric AEAD ciphers such as AES-GCM or ChaCha20-Poly1305.
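The key-exchange math above can be sketched in a few lines. This toy uses a finite-field group with a Mersenne prime for readability; TLS 1.3 actually uses standardized elliptic-curve groups (ECDHE):

```python
import secrets

# Toy finite-field Diffie-Hellman, for illustration only.
p = 2**127 - 1   # a Mersenne prime: fine for a demo, NOT for production
g = 3

a = secrets.randbelow(p - 2) + 1   # client's ephemeral private exponent
b = secrets.randbelow(p - 2) + 1   # server's ephemeral private exponent

A = pow(g, a, p)   # sent in the ClientHello KeyShare (g^a)
B = pow(g, b, p)   # sent in the ServerHello KeyShare (g^b)

client_secret = pow(B, a, p)   # (g^b)^a = g^(ab)
server_secret = pow(A, b, p)   # (g^a)^b = g^(ab)
print(client_secret == server_secret)  # True: same secret, exponents never sent
```

An eavesdropper sees only \( g^a \) and \( g^b \); recovering \( g^{ab} \) from those is the discrete-logarithm problem, which is what keeps the channel private.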
TLS 1.2 vs TLS 1.3: Key Differences #
- Round trips: TLS 1.2 takes two RTTs to establish; TLS 1.3 needs only one (and can use 0-RTT on resumption).
- Cipher suites: TLS 1.2 supported many, including weak/broken options (CBC, RC4). TLS 1.3 mandates modern AEAD ciphers only.
- Forward secrecy: optional in TLS 1.2, mandatory in TLS 1.3 (always ephemeral keys).
- Handshake encryption: in TLS 1.2, most of the handshake was plaintext; TLS 1.3 encrypts nearly everything after `ServerHello`.
Mutual TLS (mTLS) #
For service-to-service traffic inside a zero-trust environment, you don’t just want the client to verify the server; the server should also verify the client. That’s Mutual TLS:
- The server requests a certificate from the client.
- The client responds with its own certificate and a `CertificateVerify`.
- Now both parties have cryptographic proof of each other’s identity.
mTLS replaces brittle API keys or tokens with strong, short-lived, cryptographically enforced identities. It’s a cornerstone of secure microservice communication.
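With Python’s standard `ssl` module, the difference between plain TLS and mTLS on the server side is essentially one setting. A sketch with placeholder file paths; the function name and paths are illustrative:

```python
import ssl

def make_mtls_server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """Build a server-side context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3     # modern handshake only
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # server identity
    ctx.load_verify_locations(cafile=ca_file)        # CA that signs client certs
    ctx.verify_mode = ssl.CERT_REQUIRED              # this is what makes it *mutual*
    return ctx
```

With `verify_mode = ssl.CERT_REQUIRED`, the handshake itself fails for any client that cannot present a certificate signed by the trusted CA; no application-level auth code ever runs for unauthenticated peers.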
API Paradigms: Choosing Your Interaction Model #
The choice between REST, RPC and GraphQL has profound architectural implications.
REST: The Lingua Franca #
REST is an architectural style leveraging standard HTTP to operate on resources. It is the default for public-facing services.
- When to Use: Public APIs where broad compatibility, standardization and cacheability are paramount. Think of the public APIs of Stripe, Twilio, or GitHub (v3).
- Weakness (The Under-fetching Problem): To display a user’s profile and their blog post titles, a client must make multiple requests, creating a request waterfall.
  - First, get the user’s details:

    ```http
    GET /api/users/123 HTTP/1.1
    Host: api.example.com
    ```

    Response:

    ```json
    { "id": "123", "name": "Alice Smith", "postIds": ["p1", "p2", "p3"] }
    ```

  - Then, get the post details (another round-trip):

    ```http
    GET /api/users/123/posts HTTP/1.1
    Host: api.example.com
    ```

    Response:

    ```json
    [
      { "id": "p1", "title": "My First Post", "content": "..." },
      { "id": "p2", "title": "Another Thought", "content": "..." }
    ]
    ```
gRPC: High-Performance RPC for Microservices #
The RPC paradigm focuses on executing remote functions. gRPC (Google RPC) is its modern standard-bearer, purpose-built for high-performance internal communication.
- When to Use: The high-performance internal communication backbone for companies like Netflix and Uber. Ideal for low-latency, contract-driven communication between microservices.
- How it Looks: You define a contract in a `.proto` file, which generates both client and server code.

  Proto Definition:

  ```proto
  syntax = "proto3";

  service UserService {
    rpc GetUserWithPosts(GetUserRequest) returns (UserResponse);
  }

  message GetUserRequest {
    string user_id = 1;
  }

  message PostSummary {
    string post_id = 1;
    string title = 2;
  }

  message UserResponse {
    string user_id = 1;
    string name = 2;
    repeated PostSummary posts = 3;
  }
  ```

  Conceptual Client Call (strongly typed):

  ```python
  # This code is generated from the .proto file
  request = GetUserRequest(user_id="123")

  # The network call looks like a local function call
  response = user_stub.GetUserWithPosts(request)
  ```
GraphQL: Client-Driven Data Fetching #
GraphQL is a query language that solves REST’s fetching problem by letting the client specify the exact shape of the data it needs in a single request.
- When to Use: Powering complex frontends like Facebook’s mobile app feed or providing flexible data access via public APIs like GitHub’s v4 API. Perfect when you need to aggregate data from multiple backend services for a UI.
- How it Works: The client sends a single `POST` request to `/graphql` with a query.

  Client Query:

  ```json
  {
    "query": "query GetUserAndPostTitles($userId: ID!) { user(id: $userId) { name posts { title } } }",
    "variables": { "userId": "123" }
  }
  ```

  Server Response (perfectly matching the query):

  ```json
  {
    "data": {
      "user": {
        "name": "Alice Smith",
        "posts": [
          { "title": "My First Post" },
          { "title": "Another Thought" }
        ]
      }
    }
  }
  ```

  With one request, the client gets exactly what it needed: no more, no less. This shifts complexity from the client to the backend but provides unparalleled flexibility.