Pythonic Purity - Techniques for Writing Clean Code

What is Python?

Python is a hugely popular, dynamically typed, interpreted, high-level language. It owes much of that popularity to its clean syntax and readability. Python is a versatile tool for a wide range of applications, from data science and big data analytics to backend development. Its simplicity and conciseness make it a top choice for beginners, and it is also a very common choice for coding interviews. While it may seem straightforward at first glance, there's depth to it. As there's often more than one way to do something in Python, it's important to understand the options and tradeoffs. I hope to share some of my knowledge with you, and I hope you learn at least one thing from this post.

PEP8
Naming Conventions
Map/Filter vs. List Comprehension
Error Handling in Python
Using Logging over Print
Consistent Return Types
Benefits of Typing in Python
Enums in Python
Abstract Classes
Using Data Classes
Handling Reference Types
Miscellaneous Tips

PEP8

PEP8 is the official style guide for Python. It keeps code consistently formatted and easy to read. Adhering to its conventions improves maintainability and collaboration, especially in large codebases. Ideally, code from every contributor to a project should look identical in style. I highly recommend reading the full PEP8 guide and using a linter or editor extension to flag any deviations. This isn't specific to Python: in any language, a consistent style across the codebase matters, and most large-scale projects follow and enforce a style guide.

Naming Conventions

In Python, naming conventions play a pivotal role in ensuring clarity and readability. While PEP8 provides guidelines like using snake_case for variables and functions or CamelCase for classes, the essence of a good name goes beyond just the format. It's about the semantics. Descriptive names, even if they're longer, are preferred over short, ambiguous ones. For instance, calculate_total_price is more intuitive than a mere ctp, even if it's lengthier. The verbosity of long names doesn't matter much: modern IDEs and most text editors come equipped with IntelliSense and auto-completion, so typing long names is no longer a chore. Instead, such names serve as self-documenting snippets, making the codebase more understandable and maintainable for both the author and collaborators.

tpib = 250.50  # Bad
total_price = 250.50  # Better (but ambiguous)
total_price_in_basket = 250.50  # Good

Map/Filter vs. List Comprehension

Both map/filter and list comprehensions are tools to transform or filter lists. While both are valid, list comprehensions are often more Pythonic and readable: with map and filter you often need lambda functions, whereas a comprehension states the intent directly. For simple tasks, both are preferred over a for loop that appends to a new list. If the logic is complex, use a named function instead of a lambda with either approach, or fall back to a for loop if that's more readable, as readability (along with consistency) is the most important thing. Performance-wise, map can be slightly faster when you pass it an existing function, while a comprehension usually wins when map would need a lambda; either way, the difference is negligible. If you ever need to map or filter through a truly massive dataset, you should probably use a library like numpy or pandas instead of Python's built-in functions, as these libraries are optimized for such use cases and use cool techniques like vectorization.

numbers = [1, 2, 3, 4]
squared_with_map = list(map(lambda x: x*x, numbers))
squared_with_comprehension = [x*x for x in numbers]

even_with_filter = list(filter(lambda x: x % 2 == 0, numbers))
even_with_comprehension = [x for x in numbers if x % 2 == 0]
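
When the transformation is more than a one-line expression, a named function tends to read better than a lambda. Here is a small sketch of that idea; apply_discount and the prices are made up purely for illustration.

```python
def apply_discount(price: float) -> float:
    """Hypothetical rule: 10% off any price above 100."""
    if price > 100:
        return round(price * 0.9, 2)
    return price

prices = [50.0, 120.0, 200.0]

# The same named function works with map and with a comprehension.
discounted_with_map = list(map(apply_discount, prices))
discounted_with_comprehension = [apply_discount(p) for p in prices]

# A plain loop is still fine when it reads better.
discounted_with_loop = []
for p in prices:
    discounted_with_loop.append(apply_discount(p))
```

All three variables end up holding the same list, so the choice comes down to readability and consistency with the rest of the codebase.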

Error Handling in Python

Proper error handling ensures your program can cope with unexpected situations during its execution. The try/except/finally structure in Python offers an elegant mechanism for this. I strongly advise against using a blanket try/except without specifying exceptions, as it can mask unforeseen issues and lead to unpredictable behavior. Instead, catch specific exceptions and handle each accordingly. If you intend to catch multiple exceptions, you can group them in a tuple. The else block lets you execute code when no exceptions are raised, and finally ensures the enclosed code runs irrespective of whether an exception occurs.

You can also signal your own exceptions using the raise keyword. This becomes particularly handy when you wish to offer more context or propagate a custom exception. For instance, if a caller provides an invalid argument, rather than returning a vague None or a flag value like -1, it's more informative to raise a ValueError. This approach clearly communicates the nature of the issue, especially compared to a generic Exception which might be too broad. If callers neglect to handle such flag values or defaults, it can sow seeds for hidden bugs. Therefore, it's often more prudent to raise an exception and leave it to the caller to handle it properly.

def extract_integer_from_file(filename, line_number):
    try:
        with open(filename, 'r') as file:
            lines = file.readlines()
            if line_number > len(lines) or line_number < 1:
                # No handler below catches this, so the caller receives the IndexError as-is.
                raise IndexError(f"Requested line number {line_number} exceeds file length or is less than 1.")
            value = int(lines[line_number - 1].strip())
            return value
    except FileNotFoundError:
        # Re-raise the same specific type with a clearer message.
        raise FileNotFoundError(f"The file {filename} was not found.") from None
    except ValueError:
        raise ValueError(f"Line {line_number} in {filename} does not contain a valid integer.") from None
    finally:
        # Runs whether the function returns or raises.
        print(f"Processed {filename}.")
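
The else block and the tuple form of except can be sketched like this. It's a minimal example: parse_ratio and the executed_blocks list are hypothetical names that exist only to show which blocks run.

```python
executed_blocks = []  # records which blocks ran, for illustration only

def parse_ratio(numerator: str, denominator: str) -> float:
    try:
        result = int(numerator) / int(denominator)
    except (ValueError, ZeroDivisionError) as error:
        # One handler for two specific exception types, grouped in a tuple.
        executed_blocks.append("except")
        raise ValueError(f"Cannot compute {numerator}/{denominator}") from error
    else:
        # Runs only when the try block raised nothing.
        executed_blocks.append("else")
        return result
    finally:
        # Runs in every case, even while returning or re-raising.
        executed_blocks.append("finally")

parse_ratio("6", "3")
# executed_blocks is now ["else", "finally"]
```

This sketch uses "from error" so the original exception stays attached as the cause, whereas "from None" hides it. Which one fits depends on how much detail you want callers to see.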

Using Logging over Print

The ubiquitous print function is a critical tool when you're sprinkling it on every line, trying to figure out what went wrong in your tiny console window. For production applications or systems, though, relying solely on print is limiting. This is where Python's logging module comes to the rescue: it offers a comprehensive mechanism to capture, display, and store messages, and it makes it easy to change the output destination, whether that's an observability service like CloudWatch or just a local file.

With logging:

Set Levels: Control the granularity of messages. For instance, you might only want to see error messages and not debug messages.

Direct Outputs: Route logs to various destinations like files, sockets, or even third-party services.

Format Messages: Customize log message format to include information such as timestamps, log level, or even module details.

Here's a simple example of how I like to use logging in my Python programs:

import logging

# Basic configuration: Level and format
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')

# Different log levels
logging.debug("This is a debug message.")
logging.info("This is an informational message.")
logging.warning("This is a warning.")
logging.error("This is an error message.")
logging.critical("This is a critical error message.")

# Output:
# The debug message isn't displayed because the level we set above is INFO.
# 2023-10-04 16:34:23,321 - INFO - This is an informational message.
# 2023-10-04 16:34:23,321 - WARNING - This is a warning.
# 2023-10-04 16:34:23,321 - ERROR - This is an error message.
# 2023-10-04 16:34:23,321 - CRITICAL - This is a critical error message.
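
To show how the output destination can be changed without touching the logging calls, here is a rough sketch with a named logger and its own handler. The logger name "orders" is hypothetical, and an in-memory stream stands in for a file or remote service.

```python
import io
import logging

# A named logger, instead of the root logger configured by logging.basicConfig.
logger = logging.getLogger("orders")  # "orders" is a hypothetical name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep this example's output away from the root logger

# An in-memory stream stands in for a file or remote destination here.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)  # this destination only receives WARNING and above
handler.setFormatter(logging.Formatter("%(levelname)s - %(name)s - %(message)s"))
logger.addHandler(handler)

logger.info("Order received.")       # filtered out by the handler level
logger.warning("Inventory is low.")  # written to the stream

captured = stream.getvalue()
```

Swapping the StreamHandler for a logging.FileHandler would send the same messages to a file, and the logger.info and logger.warning calls would stay exactly as they are.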

Consistent Return Types

Functions should consistently return the same type. This predictability minimizes errors. Even if Python allows multiple return types, it's best to avoid this practice. For instance, a function designed to return a list should return an empty list instead of another type. This consistency is crucial when collaborating on a codebase. In dynamically typed languages like Python, inconsistent returns can cause issues. In contrast, statically typed languages like Java enforce specific return types. If a return type might be None, ensure the caller knows to handle it. If there's an error, prefer raising an exception over returning None.

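def get_user_data(user_id):
    # This function should always return a dictionary.
    # user_data_store is assumed to be a dict-like store defined elsewhere.
    if not user_id:
        raise ValueError("Invalid user_id provided.")

    user_data = user_data_store.get(user_id)

    if not user_data:
        # A specific exception type is easier for callers to catch than a bare Exception.
        raise LookupError(f"No data found for user_id: {user_id}")

    return user_data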
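
The empty-list case mentioned above can be sketched as follows. The orders_by_user store and both function names are invented for this illustration.

```python
# Hypothetical in-memory store used only for this illustration.
orders_by_user = {"alice": ["book", "pen"]}

def get_orders_inconsistent(user):
    # Returns a list for known users but None for unknown ones.
    if user in orders_by_user:
        return orders_by_user[user]
    return None

def get_orders(user):
    # Always returns a list, so callers can iterate without a None check.
    return list(orders_by_user.get(user, []))

order_count = len(get_orders("bob"))  # works: an unknown user yields an empty list
```

Calling len() on the result of get_orders_inconsistent("bob") would raise a TypeError instead, which is exactly the kind of surprise a consistent return type avoids.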

Benefits of Typing in Python

Python 3.5 introduced type hints, which can make your code more readable and help with debugging. While Python remains dynamically typed, type hints document the expected types. They are quite common in large Python codebases, and many modern packages ship with them. Modern IDEs and text editors can also leverage these hints to provide better IntelliSense and auto-completion. Type hints are optional and can be adopted gradually: you aren't forced to refactor every file at once and can add them to your codebase bit by bit.

# Primitive types don't need to be imported
from typing import List, Dict, Union

# Type hints with primitive types
def add_numbers(a: int, b: int) -> int:
    return a + b

def get_full_name(first_name: str, last_name: str) -> str:
    return f"{first_name} {last_name}"

# Type hints with list and dict types (on Python 3.9+, the built-in list and dict work here too)
def total_prices(prices: List[float]) -> float:
    return sum(prices)

def get_student_grade(grades: Dict[str, Union[int, float]], student_name: str) -> Union[int, float, None]:
    return grades.get(student_name)
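
One thing worth seeing in code: hints are not enforced at runtime. The sketch below illustrates that, along with handling a possibly-None return value. find_grade is a hypothetical helper, and mypy is mentioned only as one example of a static checker.

```python
from typing import Optional

def add_numbers(a: int, b: int) -> int:
    return a + b

# Python does not check hints at runtime, so this call still succeeds.
# A static checker such as mypy would flag it before the code runs.
joined = add_numbers("1", "2")  # joined is the string "12", not 3

def find_grade(grades: dict, student_name: str) -> Optional[float]:
    # Optional[float] is shorthand for Union[float, None].
    return grades.get(student_name)

grade = find_grade({"Ada": 91.5}, "Grace")
if grade is None:
    message = "No grade recorded."
else:
    message = f"Grade: {grade}"
```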

More Advanced Topics - OOP in Python

Enums in Python

Enums are a way to represent a group of finite and related constants. They make your code more descriptive and readable compared to using plain constants.

from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

def describe_color(c: Color):
    if c == Color.RED:
        return "It's passionate!"
    elif c == Color.GREEN:
        return "It's calming."
    elif c == Color.BLUE:
        return "It's cool."
    else:
        # In a statically typed language, you wouldn't need this as the compiler would throw an error if you tried to pass an invalid color
        raise ValueError("Invalid color provided.")
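
A few everyday Enum operations, sketched with the same three colors. The variable names are only for illustration.

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

by_value = Color(2)       # look up a member by its value
by_name = Color["BLUE"]   # look up a member by its name
all_names = [color.name for color in Color]  # members iterate in definition order

try:
    Color(4)
except ValueError:
    # Unknown values are rejected at lookup time.
    invalid_value_rejected = True
```

Note that Color.RED is not equal to the plain integer 1. A member only compares equal to itself, which is part of what makes enums safer than loose constants.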

Abstract Classes

Abstract classes define a set of methods that must be implemented by their subclasses. This can achieve behavior similar to Java's interfaces and abstract classes. They serve as a blueprint for other classes, enforcing specific contracts while possibly providing some basic implementation.

Why Use Abstract Classes?

Enforcing Method Implementation: Abstract classes allow you to define methods that must be implemented by any child classes. This ensures that specific "contracts" are met by subclasses.

Providing Shared Implementations: While enforcing method implementation, abstract classes can also provide shared method implementations that subclasses can use without redefining.

Using Abstract Classes: Python's abc module provides the tools to create abstract classes.

from abc import ABC, abstractmethod
class Shape(ABC):

    @abstractmethod
    def area(self) -> float:
        pass

    @abstractmethod
    def perimeter(self) -> float:
        pass

    # Concrete method
    def describe(self):
        return "This is a shape."

Abstract Class as Interface and Abstract Class: In Java, you have both interfaces (pure method declarations) and abstract classes (mixed method declarations and implementations). Python's abstract classes cover both roles, much like C++ classes that mix pure and non-pure virtual functions.

class Drawable(ABC):

    @abstractmethod
    def draw(self):
        pass

class Square(Shape, Drawable):

    def __init__(self, side_length):
        self.side_length = side_length

    def area(self):
        return self.side_length ** 2

    def perimeter(self):
        return 4 * self.side_length

    def draw(self):
        print(f"Drawing a square with side length {self.side_length}")
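
To see the contract being enforced, here is a compact sketch. Circle is a hypothetical subclass that deliberately forgets to implement area, and Shape is trimmed down to a single abstract method to keep the example short.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        ...

    def describe(self) -> str:
        # Concrete method shared by every subclass.
        return f"A shape with area {self.area()}"

class Circle(Shape):
    # Hypothetical subclass that forgets to implement area().
    pass

class Square(Shape):
    def __init__(self, side_length: float):
        self.side_length = side_length

    def area(self) -> float:
        return self.side_length ** 2

try:
    Circle()  # fails: area() is still abstract
except TypeError as error:
    instantiation_error = str(error)

description = Square(3).describe()
```

The check happens when the class is instantiated, not when it is defined, so an incomplete subclass only fails once something tries to create it.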

Using Data Classes

Python's dataclasses module provides a decorator and functions for creating classes that mainly store data attributes. It greatly reduces boilerplate, makes your code more readable, and integrates well with type hints. You can customize the generated behavior through the decorator's parameters; a particularly important one is frozen, which makes instances immutable. You can also attach methods to data classes, including ones decorated with @property and @staticmethod, just as you would in a normal class.

These classes can automatically provide special methods like __init__ and __repr__.

from dataclasses import dataclass
class TraditionalTransaction:
    id: int
    user: str
    amount: float
    currency: str

    def __init__(self, id: int, user: str, amount: float, currency: str):
        self.id = id
        self.user = user
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Transaction({self.id}, {self.user}, {self.amount}, {self.currency})"

@dataclass(frozen=True) # Optional frozen makes the data class immutable
class Transaction:
    id: int
    user: str
    amount: float
    currency: str

    # This decorator lets you call a function like it's a field (getter method)
    @property
    def amount_with_currency(self) -> str:
        return f"{self.amount}-{self.currency}"

    # A matching @amount_with_currency.setter would not work here:
    # frozen=True makes every attribute assignment raise FrozenInstanceError.

    # A static method receives neither the instance nor the class
    @staticmethod
    def is_american(currency: str) -> bool:
        return currency == "USD"
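
Here is a short sketch of what the decorator generates for you, using a stripped-down Transaction with only the four fields. The sample values are invented.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Transaction:
    id: int
    user: str
    amount: float
    currency: str

first = Transaction(1, "alice", 250.5, "USD")   # generated __init__
second = Transaction(1, "alice", 250.5, "USD")

text = repr(first)       # generated __repr__
same = first == second   # generated __eq__ compares field by field

try:
    first.amount = 10.0  # frozen instances reject assignment
except FrozenInstanceError:
    assignment_blocked = True
```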

Handling Reference Types

Python variables hold references. When you return data structures like lists or dicts from a method, you might accidentally expose the internals of a class. To prevent unintended modifications, return copies or use immutable structures. Freezing a data class offers a similar protection: it generates __setattr__ and __delattr__ methods that raise an exception if you try to modify the object.

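from typing import FrozenSet, List

# Both functions are methods of a class that keeps its items in self._groceries and self._data.
def get_groceries(self) -> List[str]:
    return self._groceries.copy()  # shallow copy: callers can't mutate the internal list

def get_basket(self) -> FrozenSet[str]:
    return frozenset(self._data)  # immutable snapshot of the items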
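
The difference between returning the internal list and returning a copy can be sketched like this. Basket and get_groceries_unsafe are hypothetical names used only for the comparison.

```python
from typing import FrozenSet, List

class Basket:
    # Hypothetical class used only for this illustration.
    def __init__(self, groceries: List[str]):
        self._groceries = list(groceries)

    def get_groceries_unsafe(self) -> List[str]:
        return self._groceries             # exposes the internal list

    def get_groceries(self) -> List[str]:
        return self._groceries.copy()      # shallow copy

    def get_basket(self) -> FrozenSet[str]:
        return frozenset(self._groceries)  # immutable

basket = Basket(["milk", "eggs"])

basket.get_groceries().append("candy")         # only the copy changes
after_copy = basket.get_groceries()

basket.get_groceries_unsafe().append("candy")  # the internal state changes
after_unsafe = basket.get_groceries()
```

Keep in mind that copy() is shallow. If the list held mutable objects, callers could still modify those objects through the copy.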

Miscellaneous Tips

Enumerate: When you need both the index and value from a list, use enumerate.

for index, value in enumerate(my_list):
    print(index, value)
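
For comparison, here is a sketch of the index-based loop that enumerate replaces, plus the start parameter. The list contents are made up.

```python
my_list = ["apple", "banana", "cherry"]

# The index-based loop that enumerate replaces.
pairs_with_range = []
for i in range(len(my_list)):
    pairs_with_range.append((i, my_list[i]))

pairs_with_enumerate = list(enumerate(my_list))

# start=1 is handy for human-facing numbering.
numbered = [f"{position}. {value}" for position, value in enumerate(my_list, start=1)]
```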

State Modifications: When you need to modify the state of an object, prefer a method on that object over a standalone function that takes the object as an argument.
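
A minimal sketch of that tip, with a hypothetical Account class and a made-up deposit_into function shown for contrast:

```python
class Account:
    # Hypothetical class used only for this illustration.
    def __init__(self, balance: float = 0.0):
        self.balance = balance

    def deposit(self, amount: float) -> None:
        # The object validates and changes its own state.
        if amount <= 0:
            raise ValueError("Deposit amount must be positive.")
        self.balance += amount

# Discouraged: an outside function reaching into the object.
def deposit_into(account: Account, amount: float) -> None:
    account.balance += amount

account = Account()
account.deposit(50.0)
```

With the method, the validation lives next to the state it protects. The standalone function silently skips it.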

Guard Clauses: Use them for early returns to reduce nested conditions. This makes the code more readable and reduces indentation levels. It's very much an opinionated topic, and some people prefer nested if statements, so go with what your project's style guide says. Guard clauses work best when you have an if/else rather than an if/elif/else, as in the latter case you can't just flip the first condition and return early.

# I don't like this
def process_order(order):
    if order.is_valid():
        deduct_inventory(order.items)
        total_price = calculate_total(order.items)
        charge_customer(order.customer, total_price)
        send_email(order.customer.email, "Order Confirmation", "Your order has been processed!")
        order.status = "Processed"
        return "Order processed successfully!"
    else:
        return "Invalid order!"

# I prefer this
def process_order(order):
    if not order.is_valid():
        return "Invalid order!"

    deduct_inventory(order.items)
    total_price = calculate_total(order.items)
    charge_customer(order.customer, total_price)
    send_email(order.customer.email, "Order Confirmation", "Your order has been processed!")
    order.status = "Processed"
    return "Order processed successfully!"