CMPUT 404

Web Applications and Architecture

HTTP Lab



Description

Your task is to build an HTTP/1.1 compliant webserver and a basic HTTP client. This lab is designed to give you a hands-on, ground-up understanding of the basics of HTTP by implementing both the client and server sides.

The webserver will serve static content from the www directory in the same directory where you start the webserver. The client will be used to make HTTP requests to the server or any HTTP/1.1 server.

Getting Started

  1. Clone the GitHub classroom repository from eClass.
  2. Follow the instructions below to implement the client and server.

Collaboration

User Stories

Task 0: Get an Environment with a Recent Python3 Version

Make sure you have a working development environment with these instructions!

Task 1: Implement the HTTP Client

Client Requirements

To make your client communicate with the server, follow these steps:

  1. Import the libraries needed (socket).
  2. Create a context manager.
  3. Connect to the server and send the request.
  4. Receive and process the response.

Import the python socket module

import socket

Connect to a remote socket at specified address

Use the socket module to create a new socket object. Its constructor takes several optional arguments; the two used here are the address family and the socket type (socket.AF_INET and socket.SOCK_STREAM).

  1. socket.AF_INET : the address family, here IPv4. Host names passed to connect() are resolved to IPv4 addresses; IPv6 uses socket.AF_INET6 instead.
  2. socket.SOCK_STREAM : the socket type (a TCP socket).

Connect to the remote socket with the socket.connect(address) method. For AF_INET, the address is a tuple that holds the host name or IP address and the port number of the server.

# HOST and PORT are placeholders for the server's host name and port number
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.connect((HOST, PORT))
    print(f"connected to server at host {HOST} and port {PORT}")

Correctly parse URL

  1. Make sure it starts with http:// or stop with an error.
    • We do not support HTTPS in this lab.
  2. Separate the IP address, port, and path.
    • If the port is not specified, use the default port 80.
    • If the IP address is an IPv6 address, remove the square brackets around it.
  3. If it is a GET request, the params need to be made into query parameters.
    • If it is a POST request, the params should become the POST body instead, and they should not be added to the query parameters.
  4. The path and query params must be correctly percent-encoded.
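
The parsing steps above could be sketched as follows. This is only an illustration, assuming the lab's restrictions allow the standard library's urllib.parse (the restrictions are not listed in this handout). The helper name split_url and its return value are hypothetical.

```python
from urllib.parse import urlsplit, quote, urlencode

def split_url(url, method="GET", params=None):
    """Hypothetical helper: return (host, port, target, body) for an http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only http:// URLs are supported")
    host = parts.hostname      # urlsplit drops the [] around IPv6 literals
    port = parts.port or 80    # fall back to the default HTTP port
    # Percent-encode the path; "%" is kept so existing %XX escapes are not doubled.
    path = quote(parts.path or "/", safe="/%")
    query = parts.query
    body = b""
    if params:
        encoded = urlencode(params)   # percent-encodes keys and values
        if method == "GET":
            # GET: params are appended to the query string
            query = f"{query}&{encoded}" if query else encoded
        elif method == "POST":
            # POST: params become the body and stay out of the query string
            body = encoded.encode("utf-8")
    target = f"{path}?{query}" if query else path
    return host, port, target, body
```

For example, split_url("http://[::1]:8080/a b", "POST", {"k": "v"}) returns ("::1", 8080, "/a%20b", b"k=v"). Note that urlencode uses form encoding, so a space inside a parameter becomes "+".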

Prepare your request

Format the request so that it contains, in order: the request line (HTTP method, path, and protocol version), the Host header, the Content-Type and Content-Length headers when there is a body, a blank line, and finally the data.
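
As a rough illustration of that layout, the sketch below assembles a request as bytes. The helper name build_request, its parameters, the default Content-Type, and the Connection: close header are assumptions made for this example, not requirements stated in the handout.

```python
def build_request(method, host, port, target, body=b"",
                  content_type="application/x-www-form-urlencoded"):
    """Hypothetical helper: assemble a raw HTTP/1.1 request as bytes."""
    # An IPv6 literal needs its square brackets back inside the Host header.
    host_header = f"[{host}]" if ":" in host else host
    if port != 80:
        host_header += f":{port}"
    lines = [
        f"{method} {target} HTTP/1.1",   # request line
        f"Host: {host_header}",
        "Connection: close",             # assumption: ask the server to close when done
    ]
    if body:
        lines.append(f"Content-Type: {content_type}")
        lines.append(f"Content-Length: {len(body)}")
    # Lines end with CRLF, and a blank line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("utf-8") + body
```

The result can be passed directly to sock.sendall, since it is already a bytes object.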

Pass your request bytes to the socket.sendall function

Call sendall, which accepts raw bytes, to send the request to the server. Remember that data is transmitted over the network as bytes, not strings. To convert the request string to bytes, encode it using UTF-8.

sock.sendall(request.encode("utf-8"))

Process and receive response

The read_response function reads the entire HTTP response from the server after a request has been sent over the socket. It wraps the socket in a file-like object (sock_file) using the makefile('rb') method, so the response can be read as raw bytes. makefile() provides the familiar file methods (read, readline, and so on), which makes network communication easier to handle. Calling read() with no arguments blocks until the server closes the connection, so the request should include a Connection: close header; otherwise an HTTP/1.1 server may keep the connection open and the call will not return. The function stores the data in a byte string called response and returns it. This byte string contains the full response: the status line, the headers, and any accompanying data, such as HTML content or binary files.

# Receive the response
def read_response(self):
    response = b""
    with self.socket.makefile('rb') as sock_file:
        response = sock_file.read()
    return response
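
Once the raw bytes are in hand, they still need to be processed. The sketch below shows one possible way to separate the status code, headers, and body. The helper name split_response is hypothetical, and the sketch assumes the body is not sent with chunked transfer encoding.

```python
def split_response(raw):
    """Hypothetical helper: split raw response bytes into (status, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[key.strip().lower()] = value.strip()
    return status, headers, body
```

The body is left as bytes so that binary files survive intact; decode it only when the Content-Type indicates text.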

Task 2: Implement the HTTP Server

Server Requirements

Getting your server to talk to a client involves the following steps.

Import the libraries needed (socketserver and pathlib)

The socketserver module simplifies creating network servers in Python. The pathlib module is useful for managing filesystem paths.

import socketserver
import pathlib

Create a custom server class to inherit from the TCPServer class

Create a custom server class that inherits from socketserver.TCPServer. This is the first step in setting up a server. The socketserver module provides four basic server classes: TCPServer, UDPServer, UnixStreamServer, and UnixDatagramServer. Setting allow_reuse_address to True lets the server rebind to its port right after a restart.

class LabHttpTcpServer(socketserver.TCPServer):
   allow_reuse_address = True

Create the custom HttpHandler class

Create the custom handler class by inheriting from socketserver.StreamRequestHandler to handle incoming requests. We use StreamRequestHandler, a subclass of socketserver.BaseRequestHandler, because it provides the file-like attributes rfile and wfile for reading from and writing to the connection.

class LabHttpTCPHandler(socketserver.StreamRequestHandler):
   ...  # the handle method goes here (see the next step)

Implement the handle method

This involves several substeps.

Receive and decode request

The rfile attribute is used to receive the request sent by the client. Call its readline() method, which reads from the socket (calling recv() as many times as needed) until a newline character is reached, then strip the line ending and decode the bytes.

       # Receive and decode the request
       request_line = self.rfile.readline().strip().decode('utf-8')

Extract the method and path from the request line

The first line of a request always contains the HTTP method, the path, and the HTTP version, separated by single spaces. The split method is called with a maxsplit of 2, so the line is split at most twice, producing at most three parts. For example, GET /index.html HTTP/1.1 would be split into:

GET
/index.html
HTTP/1.1

The underscore is a convention for a value that will be ignored, in our case the HTTP version.

       # Extract the method and path from the request
       method, path, _ = request_line.split(' ', 2)

Extract the headers and store in a dictionary

We created a custom method to handle this. Each header line is split at the first colon only (a maxsplit of 1), so a value that itself contains colons, such as Host: localhost:8080, stays intact. The loop stops at the blank line that ends the headers.

def parse_headers(self):
    headers = {}
    while True:
        line = self.rfile.readline().strip().decode('utf-8')
        if not line:
            # A blank line marks the end of the headers
            break
        key, value = line.split(":", 1)
        headers[key.strip()] = value.strip()
    return headers

Decide the action to take based on the HTTP method type

All methods other than GET are answered with a 405 Method Not Allowed error.
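
One possible shape for this decision is sketched below. The names build_response, _response, and WWW_ROOT are hypothetical. Mapping a directory to its index.html and answering 404 for anything that resolves outside www are assumptions of this sketch, not rules stated in this handout. Percent-decoding of the path and the Content-Type header are left out for brevity.

```python
import pathlib

WWW_ROOT = pathlib.Path("www")  # assumed: the www directory where the server starts

def _response(code, reason, body, extra=None):
    """Hypothetical helper: format a status line, headers, and body as bytes."""
    headers = {"Content-Length": str(len(body)), "Connection": "close"}
    headers.update(extra or {})
    head = f"HTTP/1.1 {code} {reason}\r\n"
    head += "".join(f"{k}: {v}\r\n" for k, v in headers.items())
    return head.encode("utf-8") + b"\r\n" + body

def build_response(method, path, root=WWW_ROOT):
    """Hypothetical helper: return the full response bytes for one request."""
    if method != "GET":
        # A 405 response names the allowed methods in the Allow header.
        return _response(405, "Method Not Allowed", b"", {"Allow": "GET"})
    root = root.resolve()
    # Drop any query string and the leading slash before joining with the root.
    target = (root / path.split("?", 1)[0].lstrip("/")).resolve()
    if target.is_dir():
        target = target / "index.html"  # assumption: directories map to index.html
    # Assumption: paths that escape the www directory are treated as not found.
    if root not in target.parents or not target.is_file():
        return _response(404, "Not Found", b"")
    return _response(200, "OK", target.read_bytes())
```

Inside the handle method, the result could be sent with self.wfile.write(build_response(method, path)). A complete server would also set a Content-Type header that matches the file being served.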

Start the server

Create the server by passing the (HOST, PORT) address tuple and the handler class to the constructor, then start it by calling the serve_forever method.

def main():
    with LabHttpTcpServer((HOST, PORT), LabHttpTCPHandler) as server:
        print("server is starting")
        print("running")
        server.serve_forever()

Task 3: Testing

Scenario 1 : Ensure that your custom client can send requests to your custom server and receive the expected response

You have implemented both a custom HTTP client and server. The server serves files from a directory and handles different types of HTTP requests (e.g., GET, POST). Use your custom client to send a GET request to your custom server to retrieve a specific file (e.g., index.html). Your client should receive the correct HTML content from the server, indicating that the file was successfully served. The response should include the correct status code (200 OK) and appropriate headers.

Scenario 2: Verify that your custom client can interact with standard servers, correctly sending requests and processing responses.

Your custom client needs to be tested against a well-known, standard server like Google's web server. Use your custom client to send a GET request to google.com. Most servers (such as Google's) will only return a redirect to force your client to use encryption (HTTPS). If you test against such a server, the client should receive a response containing the redirect.

Here are some standard HTTP test servers that support unencrypted HTTP/1.1. Make sure the servers correctly understand the GET and POST requests your client is making. Note these are all IPv4 only, because IST. These servers should return 200 OK. They also respond with a description of the way the server interpreted the request, so you can check that the server is interpreting it correctly.

Scenario 3: Ensure that standard clients (web browsers, Postman, curl) can communicate with your custom server and receive the correct responses.

You want to verify that your custom server can handle requests from standard clients, such as Postman. Use Postman to send a GET request to your custom server to retrieve 'index.html'. Then, use a web browser and curl to perform the same test. Each client should receive the correct content, with appropriate status codes and headers.

Hints

Restrictions

Violation of the restrictions will result in a mark of zero.

Submission Instructions

Make sure you push to GitHub classroom BEFORE the deadline! You will not be able to push after that!

Submit a link to your repo in the form https://github.com/uofa-cmput404/w24-h0x-labsignment-http-yourgithubname on eClass. Do not submit a link to a branch, a file, or the clone URL.

If you do not submit a link to your repo on eClass on time using the correct format above, you will get a zero.

Tips