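Translator's Preface
Building web applications in Python is convenient, and there are many mature frameworks to choose from, such as Flask and Django. This series instead builds everything from scratch, which makes it a good way to learn the HTTP protocol and many of the underlying principles, and that helps a great deal in understanding web development in depth. The series currently has four parts; this is a translation of the first.
This is the first in a series of posts in which I'm going to build a web application (and its web server) from scratch in Python. The only dependency for this series is the Python standard library, and I'm going to ignore the WSGI standard.
Without further ado, let's get to it!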
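Web server
To begin with, we're going to write the HTTP server that will run our web application. But first, we need to spend a little time looking at how the HTTP protocol works.
How HTTP works
Simply put, HTTP clients connect to HTTP servers over the network and send them requests containing string data. The server parses each request and sends a response back to the client. The whole protocol, including the request and response formats, is described in detail in RFC 2616; this post covers it informally so you don't have to read the entire document.
Request format
A request is represented as a series of lines separated by \r\n, the first of which is called the "request line". The request line is made up of an HTTP method, followed by a space, followed by the path of the file being requested, followed by another space, followed by the HTTP protocol version the client speaks and, finally, a carriage return (\r) and a line feed (\n) character: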
GET /some-path HTTP/1.1\r\n
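After the request line come zero or more header lines. Each header line is made up of a header name, followed by a colon, followed by an optional value, followed by \r\n: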
Host: example.com\r\n
Accept: text/html\r\n
An empty line marks the end of the headers:
\r\n
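Finally, the request may contain a "body": an arbitrary payload that is sent to the server along with the request.
Putting it all together, here's a simple GET request: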
GET / HTTP/1.1\r\n
Host: example.com\r\n
Accept: text/html\r\n
\r\n
And here's a POST request with a body:
POST / HTTP/1.1\r\n
Host: example.com\r\n
Accept: application/json\r\n
Content-type: application/json\r\n
Content-length: 2\r\n
\r\n
{}
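To make the format concrete, here is a minimal Python sketch that assembles request bytes from their parts. The helper name build_request and its parameters are made up for illustration; they are not part of the server built in this post.

```python
import typing


def build_request(
    method: str,
    path: str,
    headers: typing.Mapping[str, str],
    body: bytes = b"",
) -> bytes:
    """Assemble raw HTTP/1.1 request bytes (hypothetical helper)."""
    # Request line: method, space, path, space, protocol version.
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header.
    lines.extend(f"{name}: {value}" for name, value in headers.items())
    # Every line ends in CRLF, and an empty line closes the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


get_request = build_request("GET", "/", {"Host": "example.com", "Accept": "text/html"})
post_request = build_request(
    "POST",
    "/",
    {
        "Host": "example.com",
        "Accept": "application/json",
        "Content-type": "application/json",
        "Content-length": "2",
    },
    b"{}",
)
print(get_request)
print(post_request)
```

The two calls reproduce the GET and POST examples shown above, byte for byte.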
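Response format
Responses, like requests, are made up of a series of lines separated by \r\n. The first line of a response is called the "status line" and contains the HTTP protocol version, followed by a space, followed by the response status code, followed by another space, then the reason phrase for that status code and, finally, \r\n: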
HTTP/1.1 200 OK\r\n
After the status line come the response headers, then an empty line, and then an optional response body:
HTTP/1.1 200 OK\r\n
Content-type: text/html\r\n
Content-length: 15\r\n
\r\n
<h1>Hello!</h1>
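The Content-length header has to match the number of bytes in the body, so it is safer to compute it than to count by hand. The following is a small sketch of that idea; build_response is a hypothetical helper, not something the server in this post defines.

```python
def build_response(
    status_code: int,
    reason: str,
    body: bytes,
    content_type: str = "text/html",
) -> bytes:
    """Assemble raw HTTP/1.1 response bytes (hypothetical helper)."""
    head = (
        # Status line: protocol version, status code, reason phrase.
        f"HTTP/1.1 {status_code} {reason}\r\n"
        f"Content-type: {content_type}\r\n"
        # The length is derived from the body instead of being hard-coded.
        f"Content-length: {len(body)}\r\n"
        # The empty line separates the headers from the body.
        "\r\n"
    )
    return head.encode("ascii") + body


response = build_response(200, "OK", b"<h1>Hello!</h1>")
print(response)
```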
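A simple server
Based on what we know about the protocol so far, let's write a server that sends the same response regardless of the incoming request.
We need to create a socket, bind it to an address and then start listening for connections: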
import socket

HOST = "127.0.0.1"
PORT = 9000

# By default, socket.socket creates TCP sockets.
with socket.socket() as server_sock:
    # This tells the kernel to reuse sockets that are in `TIME_WAIT` state.
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # This tells the socket what address to bind to.
    server_sock.bind((HOST, PORT))
    # 0 is the number of pending connections the socket may have before
    # new connections are refused. Since this server is going to process
    # one connection at a time, we want to refuse any additional connections.
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
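If you run this code now, it prints that it is listening on 127.0.0.1:9000 and then exits immediately. To process incoming connections, we need to call the accept method on our socket. Doing so blocks the process until a client connects to our server.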
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    client_sock, client_addr = server_sock.accept()
    print(f"New connection from {client_addr}.")
Once we have a socket connected to the client, we can start communicating with it. Using the sendall method, let's send the client a response:
RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15

<h1>Hello!</h1>""".replace(b"\n", b"\r\n")
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    client_sock, client_addr = server_sock.accept()
    print(f"New connection from {client_addr}.")
    with client_sock:
        client_sock.sendall(RESPONSE)
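If you run the code now and visit http://127.0.0.1:9000 in your browser, you should see the string "Hello!". Unfortunately, the server exits as soon as it has sent that response, so refreshing the page fails. Let's fix that: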
RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15

<h1>Hello!</h1>""".replace(b"\n", b"\r\n")
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
        with client_sock:
            client_sock.sendall(RESPONSE)
At this point we have a web server that can serve a simple HTML web page, all in about 25 lines of code. That's not too bad!
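To poke at a server like this without a browser, a few lines of client code are enough. The sketch below is an assumption-laden test harness rather than part of the article's server: fetch and serve_once are hypothetical names, the throwaway server binds to port 0 so the operating system picks a free port, and it reads the request before replying so that the connection closes cleanly.

```python
import socket
import threading

RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 15\r\n"
    b"\r\n"
    b"<h1>Hello!</h1>"
)


def serve_once(server_sock: socket.socket) -> None:
    """Accept a single connection, read the request, send RESPONSE."""
    client_sock, _ = server_sock.accept()
    with client_sock:
        client_sock.recv(16_384)  # consume the (small) request first
        client_sock.sendall(RESPONSE)


def fetch(host: str, port: int) -> bytes:
    """Send a bare GET request and return everything the server sends back."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
        chunks = []
        while True:
            data = sock.recv(16_384)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
        return b"".join(chunks)


with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    raw_response = fetch("127.0.0.1", port)
    worker.join()

status_line = raw_response.split(b"\r\n", 1)[0]
print(status_line)
```

Pointing fetch at 127.0.0.1 and port 9000 while the server from this section is running should work the same way.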
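A file server
Let's extend the HTTP server so that it can serve files off of disk.
Request abstraction
Before we can do that, we need to be able to read and parse the request data coming from the client. Since we know that request data is represented as a series of lines, each separated by \r\n, let's write a generator function that reads data from a socket and yields each individual line: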
import typing

def iter_lines(sock: socket.socket, bufsize: int = 16_384) -> typing.Generator[bytes, None, bytes]:
    """Given a socket, read all the individual CRLF-separated lines
    and yield each one until an empty one is found. Returns the
    remainder after the empty line.
    """
    buff = b""
    while True:
        data = sock.recv(bufsize)
        if not data:
            return b""
        buff += data
        while True:
            try:
                i = buff.index(b"\r\n")
                line, buff = buff[:i], buff[i + 2:]
                if not line:
                    return buff
                yield line
            # bytes.index raises ValueError (not IndexError) when the
            # separator is missing, which means we need to read more data.
            except ValueError:
                break
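The code above may look a bit daunting, but all it does is read as much data as it can from the socket, accumulate it in a buffer, and keep splitting the buffered data into individual lines, yielding one line at a time. Once it finds an empty line, it returns whatever data is left over in the buffer.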
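One way to see this generator at work without a browser is to feed it bytes through socket.socketpair(). The sketch below repeats the generator (catching ValueError, which is what bytes.index raises when the separator is absent) so that it runs on its own; read_head is a hypothetical helper that collects the yielded lines and captures the generator's return value.

```python
import socket
import typing


def iter_lines(sock: socket.socket, bufsize: int = 16_384) -> typing.Generator[bytes, None, bytes]:
    """Yield CRLF-separated lines until an empty line; return the leftover bytes."""
    buff = b""
    while True:
        data = sock.recv(bufsize)
        if not data:
            return b""
        buff += data
        while True:
            try:
                i = buff.index(b"\r\n")
            except ValueError:
                break  # no complete line buffered yet, read more data
            line, buff = buff[:i], buff[i + 2:]
            if not line:
                return buff
            yield line


def read_head(sock: socket.socket) -> typing.Tuple[typing.List[bytes], bytes]:
    """Collect every yielded line plus the generator's return value."""
    lines = []
    gen = iter_lines(sock)
    while True:
        try:
            lines.append(next(gen))
        except StopIteration as stop:
            # A generator's return value travels on StopIteration.value.
            return lines, stop.value


client, server = socket.socketpair()
with client, server:
    client.sendall(b"POST / HTTP/1.1\r\nContent-length: 2\r\n\r\n{}")
    lines, remainder = read_head(server)

print(lines)
print(remainder)
```

The remainder is where a request body would start, which is why the generator returns it instead of discarding it.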
Using iter_lines, we can start printing out the requests we get from our clients:
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"New connection from {client_addr}.")
        with client_sock:
            for request_line in iter_lines(client_sock):
                print(request_line)
            client_sock.sendall(RESPONSE)
If you run the server now and visit http://127.0.0.1:9000 in your browser, you should see something like this in your console:
New connection from ('127.0.0.1', 62086).
b'GET / HTTP/1.1'
b'Host: localhost:9000'
b'Connection: keep-alive'
b'Cache-Control: max-age=0'
b'Upgrade-Insecure-Requests: 1'
b'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
b'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
b'Accept-Encoding: gzip, deflate, br'
b'Accept-Language: en-US,en;q=0.9,ro;q=0.8'
Pretty neat! Let's abstract over that data by defining a Request class:
import typing

class Request(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Mapping[str, str]
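For now, the Request class only knows about the method, the path and the headers. Later on we'll add support for query string parameters and for reading request bodies.
To encapsulate the logic needed to build up a request, we'll add a class method called from_socket to Request: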
class Request(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Mapping[str, str]

    @classmethod
    def from_socket(cls, sock: socket.socket) -> "Request":
        """Read and parse the request from a socket object.

        Raises:
          ValueError: When the request cannot be parsed.
        """
        lines = iter_lines(sock)
        try:
            request_line = next(lines).decode("ascii")
        except StopIteration:
            raise ValueError("Request line missing.")
        try:
            method, path, _ = request_line.split(" ")
        except ValueError:
            raise ValueError(f"Malformed request line {request_line!r}.")
        headers = {}
        for line in lines:
            try:
                name, _, value = line.decode("ascii").partition(":")
                headers[name.lower()] = value.lstrip()
            except ValueError:
                raise ValueError(f"Malformed header line {line!r}.")
        return cls(method=method.upper(), path=path, headers=headers)
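It uses the iter_lines function we defined earlier to read the request line, from which it gets the method and the path. It then reads each header line and parses it. Finally, it builds a Request object and returns it. If we plug that into our server loop, it looks like this: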
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            request = Request.from_socket(client_sock)
            print(request)
            client_sock.sendall(RESPONSE)
If you connect to the server now, you should see output like this:
Request(method='GET', path='/', headers={'host': 'localhost:9000', 'user-agent': 'curl/7.54.0', 'accept': '*/*'})
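The same parsing rules can be tried out on plain bytes, with no socket involved. The sketch below is a simplified stand-in rather than the article's from_socket: ParsedRequest and parse_head are hypothetical names, and unlike the method above it rejects header lines that have no colon.

```python
import typing


class ParsedRequest(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Dict[str, str]


def parse_head(raw: bytes) -> ParsedRequest:
    """Parse the request line and headers out of raw request bytes."""
    # Everything before the first empty line is the head of the request.
    head, _, _body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    try:
        method, path, _version = request_line.split(" ")
    except ValueError:
        raise ValueError(f"Malformed request line {request_line!r}.") from None
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"Malformed header line {line!r}.")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.lower()] = value.strip()
    return ParsedRequest(method.upper(), path, headers)


request = parse_head(b"GET / HTTP/1.1\r\nHost: localhost:9000\r\nAccept: */*\r\n\r\n")
print(request)

try:
    parse_head(b"hello\r\n\r\n")
except ValueError as e:
    error_message = str(e)
    print(error_message)
```

Partitioning on the first colon only is what keeps a value such as localhost:9000 intact.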
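Because from_socket can raise an exception under certain circumstances, the server may crash if you send it an invalid request. To simulate this, you can connect to the server with telnet and send it some bogus data: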
> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
Connection closed by foreign host.
Sure enough, the server crashed:
Received connection from ('127.0.0.1', 62404)...
Traceback (most recent call last):
  File "server.py", line 53, in from_socket
    method, path, _ = request_line.split(" ")
ValueError: not enough values to unpack (expected 3, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "server.py", line 82, in <module>
    request = Request.from_socket(client_sock)
  File "server.py", line 55, in from_socket
    raise ValueError(f"Malformed request line {request_line!r}.")
ValueError: Malformed request line 'hello'.
To handle these kinds of issues more gracefully, let's wrap the call to from_socket in a try-except block and send the client a "400 Bad Request" response whenever we get a malformed request:
BAD_REQUEST_RESPONSE = b"""\
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11

Bad Request""".replace(b"\n", b"\r\n")
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                print(request)
                client_sock.sendall(RESPONSE)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)
If we try to crash the server again, our client gets a response back and the server keeps running:
~> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11

Bad RequestConnection closed by foreign host.
We're now ready to start implementing the file serving part. First, let's define a default "404 Not Found" response:
NOT_FOUND_RESPONSE = b"""\
HTTP/1.1 404 Not Found
Content-type: text/plain
Content-length: 9

Not Found""".replace(b"\n", b"\r\n")
#...
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                print(request)
                client_sock.sendall(NOT_FOUND_RESPONSE)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)
Additionally, let's add a "405 Method Not Allowed" response. We're only going to handle GET requests:
METHOD_NOT_ALLOWED_RESPONSE = b"""\
HTTP/1.1 405 Method Not Allowed
Content-type: text/plain
Content-length: 18

Method Not Allowed""".replace(b"\n", b"\r\n")
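Hand-counted Content-length values are easy to get wrong: "Method Not Allowed" is 18 bytes long, and a client that trusts a smaller number would cut the body short. As a sanity check for canned responses like these, a small function can compare the declared length against the actual body. This is only a sketch, and declared_length_matches is a hypothetical name.

```python
def declared_length_matches(raw: bytes) -> bool:
    """Return True if the Content-length header equals the body size."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return False  # no empty line, so the headers never end
    # Skip the status line, then look for the Content-length header.
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip().decode("ascii")) == len(body)
    return False  # header missing altogether


good = (
    b"HTTP/1.1 405 Method Not Allowed\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 18\r\n"
    b"\r\n"
    b"Method Not Allowed"
)
# Same response, but with a length that is one byte too small.
bad = good.replace(b"Content-length: 18", b"Content-length: 17")

print(declared_length_matches(good))
print(declared_length_matches(bad))
```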
Let's define a SERVER_ROOT constant, which represents the directory the server should serve files from, and a serve_file function:
import mimetypes
import os
import socket
import typing

SERVER_ROOT = os.path.abspath("www")

FILE_RESPONSE_TEMPLATE = """\
HTTP/1.1 200 OK
Content-type: {content_type}
Content-length: {content_length}

""".replace("\n", "\r\n")

def serve_file(sock: socket.socket, path: str) -> None:
    """Given a socket and the relative path to a file (relative to
    SERVER_ROOT), send that file to the socket if it exists. If the
    file doesn't exist, send a "404 Not Found" response.
    """
    if path == "/":
        path = "/index.html"
    abspath = os.path.normpath(os.path.join(SERVER_ROOT, path.lstrip("/")))
    # Compare against SERVER_ROOT plus a path separator so that a sibling
    # directory such as "www-private" doesn't pass the prefix check.
    if not abspath.startswith(SERVER_ROOT + os.sep):
        sock.sendall(NOT_FOUND_RESPONSE)
        return
    try:
        with open(abspath, "rb") as f:
            stat = os.fstat(f.fileno())
            content_type, encoding = mimetypes.guess_type(abspath)
            if content_type is None:
                content_type = "application/octet-stream"
            if encoding is not None:
                content_type += f"; charset={encoding}"
            response_headers = FILE_RESPONSE_TEMPLATE.format(
                content_type=content_type,
                content_length=stat.st_size,
            ).encode("ascii")
            sock.sendall(response_headers)
            sock.sendfile(f)
    except FileNotFoundError:
        sock.sendall(NOT_FOUND_RESPONSE)
        return
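serve_file takes the client socket and a path to a file. It then tries to resolve that path to a real file inside SERVER_ROOT, responding with "not found" if the path resolves to somewhere outside of SERVER_ROOT. Next, it tries to open the file and figure out its MIME type and size (using os.fstat), builds the response headers and writes the file to the socket with socket.sendfile, which uses the sendfile system call where it is available. If the file can't be found on disk, it sends a "not found" response.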
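The two decisions made before the file is opened, namely where the path points and what content type to announce, can be tried out in isolation. The sketch below is a variant rather than the function above: resolve_path and guess_content_type are hypothetical helpers, and the containment check uses os.path.commonpath instead of a string-prefix test. Note that the second value returned by mimetypes.guess_type names a compression program such as gzip, not a character set, so this sketch leaves it out of the Content-type value.

```python
import mimetypes
import os
import typing

SERVER_ROOT = os.path.abspath("www")


def resolve_path(request_path: str, root: str = SERVER_ROOT) -> typing.Optional[str]:
    """Map a request path to a file path under root, or None if it escapes."""
    if request_path == "/":
        request_path = "/index.html"
    abspath = os.path.normpath(os.path.join(root, request_path.lstrip("/")))
    # The longest common sub-path must be root itself; otherwise the
    # request climbed out of the directory (for example via "..").
    if os.path.commonpath([root, abspath]) != root:
        return None
    return abspath


def guess_content_type(path: str) -> str:
    """Guess a Content-type value, falling back to a generic binary type."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"


print(resolve_path("/"))
print(resolve_path("/../secret.txt"))
print(resolve_path("/../www-private/key.pem"))
print(guess_content_type("index.html"))
print(mimetypes.guess_type("archive.tar.gz"))
```

The www-private case is the interesting one: its absolute path begins with the same characters as SERVER_ROOT, so a bare prefix comparison would let it through.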
If we add serve_file into the mix, our server loop now looks like this:
with socket.socket() as server_sock:
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(0)
    print(f"Listening on {HOST}:{PORT}...")
    while True:
        client_sock, client_addr = server_sock.accept()
        print(f"Received connection from {client_addr}...")
        with client_sock:
            try:
                request = Request.from_socket(client_sock)
                if request.method != "GET":
                    client_sock.sendall(METHOD_NOT_ALLOWED_RESPONSE)
                    continue
                serve_file(client_sock, request.path)
            except Exception as e:
                print(f"Failed to parse request: {e}")
                client_sock.sendall(BAD_REQUEST_RESPONSE)
If you add a file called www/index.html next to your server.py file and visit http://localhost:9000, you should see the contents of that file.
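Winding down
That's it for Part 1. In Part 2 we'll extract Server and Response abstractions and look at how to handle multiple concurrent requests. If you'd like to check out the full source code, you can find it here.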
Original: Web Application From Scratch, Part I
- Author: Bogdan Popa
- Translator: noONE