CMU Computer Systems: Network Programming (Part I)

A Client-Server Transaction

  • Most network applications are based on the client-server model:
    • A server process and one or more client processes
    • Server manages some resource
    • Server provides service by manipulating resource for clients
    • Server activated by request from client (vending machine analogy)
  • Computer Networks
    • A network is a hierarchical system of boxes and wires organized by geographical proximity
      • SAN (System Area Network) spans cluster or machine room
        • Switched Ethernet, Quadrics QSW, …
      • LAN (Local Area Network) spans a building or campus
        • Ethernet is most prominent example
      • WAN (Wide Area Network) spans country or world
        • Typically high-speed point-to-point phone lines
    • An internetwork (internet) is an interconnected set of networks
      • The Global IP Internet (uppercase "I") is the most famous example of an internet (lowercase "i")

Lowest Level: Ethernet Segment

  • Ethernet segment consists of a collection of hosts connected by wires (twisted pairs) to a hub
  • Spans room or floor in a building
  • Operation
    • Each Ethernet adapter has a unique 48-bit address (MAC address)
    • Hosts send bits to any other host in chunks called frames
    • Hub slavishly copies each bit from each port to every other port

Next Level: Bridged Ethernet Segment

  • Spans building or campus
  • Bridges cleverly learn which hosts are reachable from ports and then selectively copy frames from port to port
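
The learning behavior described above can be sketched in C as a small forwarding table. This is only an illustration under stated assumptions: the names `struct bridge`, `bridge_forward`, `FLOOD`, and `DROP`, the fixed table size, and the lack of entry aging are all hypothetical and not taken from any real bridge implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64      /* hypothetical capacity */
#define FLOOD (-1)         /* destination unknown: copy to every other port */
#define DROP  (-2)         /* destination is on the arrival port: do not copy */

struct entry  { uint8_t mac[6]; int port; int used; };
struct bridge { struct entry table[TABLE_SIZE]; };

/* Find the table entry for a MAC address, or NULL if it was never seen. */
static struct entry *lookup(struct bridge *b, const uint8_t mac[6])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (b->table[i].used && memcmp(b->table[i].mac, mac, 6) == 0)
            return &b->table[i];
    return NULL;
}

/* Learn that src is reachable via in_port, then decide where the frame goes.
 * Returns an output port number, FLOOD, or DROP. */
int bridge_forward(struct bridge *b, const uint8_t src[6],
                   const uint8_t dst[6], int in_port)
{
    assert(b != NULL && in_port >= 0);

    /* Learning step: record (or refresh) the port for the source address. */
    struct entry *e = lookup(b, src);
    for (int i = 0; e == NULL && i < TABLE_SIZE; i++) {
        if (!b->table[i].used) {
            e = &b->table[i];
            e->used = 1;
            memcpy(e->mac, src, 6);
        }
    }
    if (e != NULL)
        e->port = in_port;

    /* Forwarding step: copy selectively once the destination is known. */
    struct entry *d = lookup(b, dst);
    if (d == NULL)
        return FLOOD;
    return (d->port == in_port) ? DROP : d->port;
}
```

A zero-initialized `struct bridge` starts with an empty table, so the first frame to any host is flooded; once that host sends a frame of its own, later frames to it are copied only to its port.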

Next Level: Internets

  • Multiple incompatible LANs can be physically connected by specialized computers called routers
  • The connected networks are called an internet

Logical Structure of an internet

  • Ad hoc interconnection of networks
    • No particular topology
    • Vastly different router & link capacities
  • Send packets from source to destination by hopping from network to network
    • Router forms bridge from one network to another
    • Different packets may take different routes

Internet Protocol

  • What it is
    • A protocol is a set of rules that governs how hosts and routers cooperate when they transfer data from network to network
    • Smooths out the differences between the different networks
  • What it does
    • Provides a naming scheme
      • An internet protocol defines a uniform format for host addresses
      • Each host (and router) is assigned at least one of these internet addresses that uniquely identifies it
    • Provides a delivery mechanism
      • An internet protocol defines a standard transfer unit (packet)
      • Packet consists of header and payload
        • Header: contains info such as packet size, source and destination addresses
        • Payload: contains data bits sent from source host

Global IP Internet (upper case)

  • Most famous example of an internet
  • Based on the TCP/IP protocol family
    • IP (Internet Protocol)
      • Provides a basic naming scheme and unreliable delivery of packets (datagrams) from host to host
    • UDP (User Datagram Protocol)
      • Uses IP to provide unreliable datagram delivery from process to process
    • TCP (Transmission Control Protocol)
      • Uses IP to provide reliable byte streams from process to process over connections
  • Accessed via a mix of Unix file I/O and functions from the sockets interface

IP Addresses

  • 32-bit IP addresses are stored in an IP address struct (struct in_addr)
    • IP addresses are always stored in memory in network byte order (big-endian byte order)
    • True in general for any integer transferred in a packet header from one machine to another

Domain Name System (DNS)

  • The Internet maintains a mapping between IP addresses and domain names in a huge worldwide distributed database called DNS
  • Conceptually, programmers can view the DNS database as a collection of millions of host entries
    • Each host entry defines the mapping between a set of domain names and IP addresses
    • In a mathematical sense, a host entry is an equivalence class of domain names and IP addresses

Properties of DNS Mappings

  • Exploring mappings
    • Can explore properties of DNS mappings using nslookup
    • Each host has a locally defined domain name localhost which always maps to the loopback address 127.0.0.1
    • Use hostname to determine real domain name of local host
  • Possible mappings
    • One-to-one mapping between domain name and IP address
    • Multiple domain names mapped to the same IP address
    • Multiple domain names mapped to multiple IP addresses
    • Some valid domain names don’t map to any IP address
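
The same mappings can be explored from a program. The sketch below is an assumption-laden illustration, not a replacement for nslookup: the function name `lookup_ipv4` is hypothetical, and it only considers IPv4 addresses. It relies on the standard `getaddrinfo`, `getnameinfo`, and `freeaddrinfo` calls.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getaddrinfo under strict compiler modes */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Count the IPv4 addresses that host maps to and copy the first one,
 * as dotted-decimal text, into first. Returns -1 if the lookup fails. */
int lookup_ipv4(const char *host, char *first, size_t len)
{
    assert(host != NULL && first != NULL);
    struct addrinfo hints, *list, *p;
    int count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;         /* IPv4 only, an assumption of this sketch */
    hints.ai_socktype = SOCK_STREAM;   /* avoid one duplicate entry per socket type */
    if (getaddrinfo(host, NULL, &hints, &list) != 0)
        return -1;                     /* e.g. a name with no address mapping */

    for (p = list; p != NULL; p = p->ai_next) {
        if (count == 0 &&
            getnameinfo(p->ai_addr, p->ai_addrlen, first, (socklen_t)len,
                        NULL, 0, NI_NUMERICHOST) != 0) {
            freeaddrinfo(list);
            return -1;
        }
        count++;
    }
    freeaddrinfo(list);
    return count;
}
```

A return value of 1 corresponds to the one-to-one case, a larger value to a name that maps to several addresses, and -1 to a name that does not resolve. Calling it with "localhost" would be expected to yield 127.0.0.1 on a conventionally configured host.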

Internet Connections

  • Clients and servers communicate by sending streams of bytes over connections. Each connection is
    • Point-to-point: connects a pair of processes
    • Full-duplex: data can flow in both directions at the same time
    • Reliable: stream of bytes sent by the source is eventually received by the destination in the same order it was sent
  • A socket is an endpoint of a connection
    • Socket address is an IPaddress:port pair
  • A port is a 16-bit integer that identifies a process
    • Ephemeral port: Assigned automatically by client kernel when client makes a connection request
    • Well-known port: Associated with some service provided by a server (e.g., port 80 is associated with Web servers)

Sockets

  • What is a socket
    • To the kernel, a socket is an endpoint of communication
    • To an application, a socket is a file descriptor that lets the application read/write from/to the network
  • Clients and servers communicate with each other by reading from and writing to socket descriptors
  • The main distinction between regular file I/O and socket I/O is how the application "opens" the socket descriptor
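
To illustrate that a socket looks like any other file descriptor once it is open, here is a minimal C sketch that moves bytes with plain `read` and `write`. It uses `socketpair` to get two already-connected descriptors inside one process, which is an assumption made to keep the example short; the function name `roundtrip_over_socketpair` is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write len bytes into one end of a connected socket pair and read them
 * back from the other end. Returns the number of bytes read, or -1. */
ssize_t roundtrip_over_socketpair(const void *msg, size_t len,
                                  void *out, size_t cap)
{
    assert(msg != NULL && out != NULL);
    int fds[2];
    ssize_t n = -1;

    /* The "open" step is socket-specific ... */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    /* ... but the I/O itself is ordinary Unix I/O on descriptors. */
    if (write(fds[0], msg, len) == (ssize_t)len)
        n = read(fds[1], out, cap);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

This sketch is intended for short messages; a robust program would loop on `read` and `write` to handle short counts.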

Socket Address Structures

  • Generic socket address (struct sockaddr)
    • For address arguments to connect, bind, and accept
    • Necessary only because C did not have generic (void *) pointers when the sockets interface was designed
    • For casting convenience, we adopt the Stevens convention
      • typedef struct sockaddr SA;
  • Internet-specific socket address (struct sockaddr_in)
    • Must cast (struct sockaddr_in *) to (struct sockaddr *) for functions that take socket address arguments
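
A minimal C sketch of filling in an Internet-specific address and passing it through the generic type. The helper names `make_ipv4_sockaddr` and `family_of` are hypothetical; the `SA` typedef follows the convention mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

typedef struct sockaddr SA;   /* casting convenience */

/* Fill out with the given dotted-decimal IP and port.
 * Returns 1 on success, 0 if ip is not a valid IPv4 address. */
int make_ipv4_sockaddr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);   /* port is stored in network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1;
}

/* A function that takes the generic type can still inspect the family
 * field, which every socket address struct begins with. */
int family_of(const SA *sa)
{
    assert(sa != NULL);
    return sa->sa_family;
}
```

A caller would write `family_of((SA *)&addr)` for a `struct sockaddr_in addr`, which is the same cast that connect, bind, and accept require.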

Sockets Interface

  • Set of system-level functions used in conjunction with Unix I/O to build network applications
  • socket
    • Clients and servers use the socket function to create a socket descriptor
  • bind
    • A server uses bind to ask the kernel to associate the server's socket address (addr) with a socket descriptor (sockfd)
    • The process can read bytes that arrive on the connection whose endpoint is addr by reading from descriptor sockfd
    • Similarly, writes to sockfd are transferred along connection whose endpoint is addr
  • listen
    • By default, kernel assumes that descriptor from socket function is an active socket that will be on the client end of a connection
    • A server calls the listen function to tell the kernel that a descriptor will be used by a server rather than a client
    • Converts sockfd from an active socket to a listening socket that can accept connection requests from clients
    • backlog is a hint about the number of outstanding connection requests that the kernel should queue up before starting to refuse requests
  • accept
    • Servers wait for connection requests from clients by calling accept
    • Waits for connection request to arrive on the connection bound to listenfd, then fills in client's socket address in addr and size of the socket address in addrlen
    • Returns a connected descriptor that can be used to communicate with the client via Unix I/O routines
  • connect
    • A client establishes a connection with a server by calling connect
    • Attempts to establish a connection with server at socket address addr
  • getaddrinfo
    • The modern way to convert string representations of hostnames, host addresses, ports, and service names to socket address structures
    • Given host and service, getaddrinfo returns result, a pointer to a linked list of addrinfo structs. Each struct points to a corresponding socket address struct and contains arguments for the sockets interface functions
    • Advantages
      • Reentrant (can be safely used by threaded programs)
      • Allows us to write portable protocol-independent code
    • Disadvantages
      • Somewhat complex
      • Fortunately, a small number of usage patterns suffice in most cases
  • getnameinfo
    • The inverse of getaddrinfo, converting a socket address to the corresponding host and service
      • Replaces obsolete gethostbyaddr and getservbyport functions
      • Reentrant and protocol independent
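
The functions above are typically combined into two helper routines, one per side of the connection. The following is a minimal C sketch of that usage pattern, assuming TCP and minimal error reporting. The names `open_clientfd_sketch`, `open_listenfd_sketch`, and `describe_addr` are hypothetical, and the code is not presented as the course's own implementation.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getaddrinfo under strict compiler modes */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Client side: getaddrinfo -> socket -> connect.
 * Returns a connected descriptor, or -1 on failure. */
int open_clientfd_sketch(const char *host, const char *port)
{
    assert(host != NULL && port != NULL);
    struct addrinfo hints, *list, *p;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;            /* connections only */
    if (getaddrinfo(host, port, &hints, &list) != 0)
        return -1;

    for (p = list; p != NULL; p = p->ai_next) { /* walk the linked list */
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;                           /* try the next entry */
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                              /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);
    return fd;
}

/* Server side: getaddrinfo -> socket -> bind -> listen.
 * Returns a listening descriptor, or -1 on failure. */
int open_listenfd_sketch(const char *port, int backlog)
{
    assert(port != NULL);
    struct addrinfo hints, *list, *p;
    int fd = -1, on = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;                /* wildcard address for bind */
    if (getaddrinfo(NULL, port, &hints, &list) != 0)
        return -1;

    for (p = list; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        /* Best effort: allow quick restarts on the same port. */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        if (bind(fd, p->ai_addr, p->ai_addrlen) == 0 &&
            listen(fd, backlog) == 0)
            break;                              /* now a listening socket */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);
    return fd;
}

/* getnameinfo in numeric mode: socket address -> host and service text. */
int describe_addr(const struct sockaddr *sa, socklen_t salen,
                  char *host, socklen_t hostlen, char *serv, socklen_t servlen)
{
    return getnameinfo(sa, salen, host, hostlen, serv, servlen,
                       NI_NUMERICHOST | NI_NUMERICSERV);
}
```

A server would call `open_listenfd_sketch` once and then loop on `accept(listenfd, ...)`, talking to each client through the connected descriptor that accept returns. A client would call `open_clientfd_sketch` and then read and write the returned descriptor. Because neither helper names an address family, the same code can work over IPv4 or IPv6.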