I never used to worry too much about readability of my code.
Obviously, I cared that I could read it during development. But I never gave much thought to whether I would still be able to read it in six months' time, or whether a colleague with no prior knowledge of the code base could easily understand it.
“Of course, I know what type of data that variable holds and what type of transformation that function is responsible for – it’s obvious, right!?”
Reality check: You will forget.
In contrast to software engineers, we as data scientists tend to work quite independently on code. Yes, we work in teams, but it is unusual to be working on the same code file as someone else, and normally you will have individual responsibility and ownership over a particular part of the project.
It will usually just be one person responsible for maintaining a code section, rather than the collective responsibility from multiple team members. This tends to result in project code bases with inconsistent code formatting and less focus on readability and maintainability by others.
Since the beginning of my career I have mostly been working in consulting roles for ‘greenfield’ projects. These projects generally involved writing new code that only I was responsible for. It was only after a couple of years that I worked on my first project that involved an inherited legacy code base.
This was when I was finally struck by the importance of writing readable code. Not just for yourself, but for developers on-boarded onto the project in the future. It also made me reflect on my own bad coding habits on previous projects and what I wished previous developers had added to their code to save me hours of scratching my head.
Luckily, in Python there are several simple cosmetic changes that make your code significantly more readable and manageable going forwards.
In this post I give my top 5 actionable tips you can apply while coding to improve the readability of your Python code, without needing to refactor.
Let’s get into it!
1. Use auto-formatters to standardise code
The lowest-effort way to improve code readability is to use automatic code formatting tools to standardise your code.
There are a number of tools in Python to help you with code formatting. Some of the most popular include Black, isort, autopep8 and YAPF.
I personally use black for code formatting and isort for bringing order to my library import statements.
Defining and adhering to a consistent coding style/format throughout a project greatly increases the readability of your code for other developers.
Reading code with poor formatting is like reading an article with bad grammar. You can still understand the text but the meaning can be ambiguous or confusing.
Consistent formatting that adheres to PEP8 standards helps orientate developers when reading code. They know what to expect from the code layout, and can instead focus on the important bit – understanding the code logic.
Standardised and automated code formatting also helps version control.
Imagine the scenario where you create a new file which does not follow PEP8 conventions. Your coworker then checks out the file from version control and makes a change to one line of code. Upon saving the file, their IDE auto-formats the whole file to adhere to PEP8 (e.g. a maximum line length of 79 characters, the standard number of blank lines between functions, no extraneous whitespace around function arguments etc.). They then check the file back into the repository. There is now a problem: there will be a very large difference (‘diff’) between the old and updated code, even though only a very small change was made to the underlying code logic.
Standardised, automated code formatting reduces the risk of large diffs in version control that are irrelevant to the functionality of the code being changed.
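To make the version-control point concrete, here is a small sketch using Python's standard difflib module. The file contents are hypothetical: the same one-line bug fix is diffed once on its own, and once combined with an autoformat-on-save of the whole file.

```python
import difflib

# Hypothetical file as originally committed: not PEP8-formatted, and
# perimeter() contains a bug.
original = [
    "def area(w,h):\n",
    "    return w*h\n",
    "def perimeter( w, h ):\n",
    "    return w+h\n",
]

# Coworker fixes the bug and touches nothing else.
logic_fix = original.copy()
logic_fix[3] = "    return 2*(w+h)\n"

# Same bug fix, but the editor also auto-formatted the file on save.
logic_fix_and_reformat = [
    "def area(w, h):\n",
    "    return w * h\n",
    "\n",
    "\n",
    "def perimeter(w, h):\n",
    "    return 2 * (w + h)\n",
]


def changed_lines(old: list[str], new: list[str]) -> list[str]:
    """Return the added/removed lines of a unified diff, ignoring file headers."""
    diff = difflib.unified_diff(old, new, lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


print(len(changed_lines(original, logic_fix)))               # 2
print(len(changed_lines(original, logic_fix_and_reformat)))  # 10
```

The logic change is identical in both cases, but in the second diff it is buried among formatting changes. Formatting the whole project once, in a commit of its own, is one way to keep later diffs focused on logic.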
How to use code formatters
You can run auto code formatters against your code from the command line. Alternatively, most IDEs (such as VSCode) have settings and plugins that will auto format your code upon saving your code.
For example, on the command line, you can apply the Black code style to your code and organise your imports with isort using the following:
# install the black and isort libraries
pip install black isort
# use black command line tool to format code
black SOURCE_FILE_OR_DIRECTORY
# organise import statements with isort
isort SOURCE_FILE_OR_DIRECTORY
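Below is an extract from my VSCode settings.json file, which automatically applies black formatting to my Python files when I save them in VSCode. It also organises my imports in alphabetical order (similar to the functionality of isort) to make import statements more readable when there are many imported libraries or functions. Note that more recent versions of the VSCode Python extension have moved formatter and linter support into separate per-tool extensions (for example, a dedicated Black Formatter extension), so the exact setting names may differ depending on your version.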
// vscode settings.json
{
    "editor.formatOnSaveMode": "file",
    "editor.formatOnSave": true,
    "python.formatting.provider": "black",
    "editor.codeActionsOnSave": { "source.organizeImports": true }
}
Note
While it is recommended to follow PEP8 coding conventions, there are no universally accepted Python code styling rules. You will notice variations between different projects as a result of the developing team’s own preferences.
It is vitally important that all members of your team working on the project are adhering to the same code styling rules. Therefore, it is best to be explicit about which style guide you are following.
You can describe the coding style your project uses in a README.md, but better still, you can explicitly define the style using configuration files in your project directory. Each of the tools mentioned above can automatically read from a configuration file to ensure all developers on the project are following the same rules.
2. Use code checking tools (linters) to catch bugs and follow best practices
Code checking tools, commonly referred to as linters, help highlight syntactical and stylistic problems in your Python code.
Linters help you to keep your code PEP8 compliant, catch some errors before running your code and warn against bad practices. For example:
- spotting unused variables or variables used before assignment
- identifying unused imported libraries
- ensuring all functions/classes have docstrings
- warning against bare except clauses
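The hypothetical module below contains each kind of problem from the list above, annotated with the warning codes that flake8 and pylint report for them in recent versions (check your tool's documentation if a code differs), followed by a cleaned-up version of the same function.

```python
import os  # flake8 F401 / pylint W0611 (unused-import): imported but never used
from typing import Optional


def load_number(text):  # pylint C0116 (missing-function-docstring)
    cleaned = text.strip()  # flake8 F841 / pylint W0612 (unused-variable)
    try:
        return int(text)
    except:  # flake8 E722 / pylint W0702 (bare-except): also hides unrelated bugs
        return None


def parse_number(text: str) -> Optional[int]:
    """Convert text to an int, or return None if it is not a valid integer."""
    try:
        return int(text.strip())
    except ValueError:  # catch only the error we expect from int()
        return None


print(load_number("7"), parse_number(" 42 "), parse_number("abc"))  # 7 42 None
```

Both functions run, which is exactly why these issues slip through: the linter is the only thing pointing them out before they cause trouble.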
In some cases, the errors highlighted by the linter are not really a problem. For example, pylint suggests each Python module should have a docstring to describe its purpose. You might decide for your project that this is not necessary. You can control the behaviour of the linters using configuration files or inline comments, which can be set to ignore certain errors.
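A minimal sketch of the inline-comment approach, assuming flake8 and pylint are the linters in use: a `# pylint: disable=...` pragma turns a check off for the scope it appears in, while `# noqa: CODE` silences one flake8 code on one line. The import and URL are hypothetical.

```python
# pylint: disable=missing-module-docstring
# The pragma above makes pylint skip the module-docstring check for this file.

# Hypothetical re-export: the import is intentional, so silence flake8's
# "imported but unused" warning on this one line only.
from json import dumps  # noqa: F401

# A long URL that cannot sensibly be wrapped; skip the line-length check here.
DOCS_URL = "https://example.com/a/deliberately/long/path/that/goes/past/the/seventy-nine/character/limit"  # noqa: E501
```

Listing the specific code (rather than a bare `# noqa`) keeps every other check active on that line.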
Once again, there are many linting tools for Python. The two most common are flake8 and pylint.
I tend to use flake8 for my pre-commit hooks (see tip 4 below). I also use pylint during development to help catch issues and identify possible improvements. However, compared to flake8, it is much stricter about seemingly benign issues, which can make it impractical to use as the gatekeeper for committing new code to your repo.
Pylint also gives your code a rating out of 10 which is quite cool. Although it can hurt your ego when your fully working code initially gets rated a 3 out of 10 😞.
How to use linters to check your code
As above, linters can be run from the command line or built into your IDE.
# install flake8
pip install flake8
# run flake8 checks
flake8 SOURCE_FILE_OR_DIRECTORY
You can enable linting in VSCode using the settings.json file. For example, using pylint:
// vscode settings.json
{
    "python.linting.pylintEnabled": true,
    "python.linting.enabled": true
}
Top Tip 💡
Extend code checking tools to your Jupyter notebooks using the nbQA library.
If you use Jupyter notebooks, code checking tools are even more important than with plain .py files. By nature, work in Jupyter notebooks is experimental and can involve running cells in different orders when you are testing or debugging code. As a result, it is very easy to accidentally leave unused variables, or to use variables before they are defined. It is also hard to catch these errors manually before committing the notebook to your git repo.
nbQA runs the Python linting tools against your notebooks to help protect against broken notebooks being committed to your repository.
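In practice you call nbQA with the name of the tool to run, e.g. `nbqa flake8 my_notebook.ipynb`. The sketch below is only a conceptual illustration of the underlying idea, not nbQA's actual implementation: a notebook is JSON, so its code cells can be joined into one script and checked top to bottom. The tiny notebook is hypothetical and contains the classic out-of-order bug.

```python
import json

# Hypothetical notebook: the first code cell uses `total`, which is only
# defined in a later cell. Running cells out of order hides the bug.
NOTEBOOK = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {"cell_type": "code", "source": ["print(total)\n"]},
        {"cell_type": "code", "source": ["total = 1 + 2\n"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}


def notebook_to_script(raw: str) -> str:
    """Concatenate the code cells of a notebook (JSON text) into one script."""
    nb = json.loads(raw)
    code_cells = [
        "".join(cell["source"]) for cell in nb["cells"] if cell["cell_type"] == "code"
    ]
    return "\n".join(code_cells)


script = notebook_to_script(json.dumps(NOTEBOOK))

# Running the cells in file order exposes the use-before-definition bug.
try:
    exec(script, {})
    ran_cleanly = True
except NameError as err:
    ran_cleanly = False
    print("Running top to bottom fails:", err)
```

Once the cells are laid out as an ordinary script, standard linters can inspect them just like a .py file.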
3. Static type hinting
Using type hinting in my code was a game changer for me.
I cannot emphasise enough how much the readability of your code improves when you start using type hinting. If you take anything away from this post, it should be to start using type hinting.
What is type hinting?
Python is a dynamically typed language. This means that you don’t need to explicitly declare the variable data type in your code. Data types are automatically assigned to variables based on the specific data structure passed to them when the code is run.
The lower syntactical overhead of dynamically typed languages can be more forgiving and appealing for beginners, however, it can lead to bad habits and less readable code particularly in larger projects.
Type hints (introduced in Python 3.5, with variable annotations added in 3.6) are optional annotations that signal to the reader of your code what data structure/type is expected for a particular variable or function argument, e.g. str, list, dict, int etc.
The Python interpreter does not enforce type hints. Your code will still run if you do not have type hints, or even if you have specified them incorrectly. The purpose of type hinting is purely for the benefit of the developer and their tools.
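A quick way to see that hints are not enforced: the hypothetical function below promises an int, yet happily accepts a string. The hints are simply stored on the function in `__annotations__`, where tools (and readers) can find them.

```python
def double(value: int) -> int:
    """Return value multiplied by two."""
    return value * 2


print(double(4))               # 8, as intended
print(double("ab"))            # abab: the hint says int, but Python runs it anyway
print(double.__annotations__)  # {'value': <class 'int'>, 'return': <class 'int'>}
```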
Example
Take the following function as an example where type hints can be used to improve the developer experience.
def get_largest(items):
return max(items)
The function above is intended to return the maximum value in a list of numbers. Simple, right?
Now imagine you are a new developer to the project and see this code. How should you use this function?
When the function was originally written, I’m sure the author knew exactly what that function did and what it should be used for – it is very simple Python code. However, as a new developer there is ambiguity which could lead to the function being used incorrectly or getting unexpected results.
The biggest question in this example is: what data structure should I pass into the function argument called items?
The built-in max function works with many different data structures, including dictionaries, lists, strings, lists of strings, sets and tuples. The function above would not throw an error if any of these were passed to it. However, only a list of integers (or perhaps floats) would give the result intended by the original developer.
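To illustrate how silently this can go wrong, here is the original untyped function called with a few of those inputs. None of the calls raises an error, but only the first returns what the author intended.

```python
def get_largest(items):
    return max(items)


print(get_largest([3, 10, 7]))        # 10
print(get_largest(["3", "10", "7"]))  # 7  (strings compare character by character)
print(get_largest("hello"))           # o  (largest character in the string)
print(get_largest({"a": 1, "z": 0}))  # z  (iterating a dict yields its keys)
```

The second call is the most dangerous: numbers read in from a CSV file as strings produce a plausible-looking but wrong answer.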
Type hinting (and improved variable naming) can dramatically improve the readability and reduce the chance of errors further down the line:
# improved with type hints (python 3.9+ syntax)
def get_largest(input_list: list[int]) -> int:
"""Return largest number in a list"""
return max(input_list)
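On Python 3.5 to 3.8, the built-in list cannot be subscripted at runtime, so the same hint is usually written with typing.List. The second function is a hypothetical looser variant: a Sequence[float] hint also accepts tuples, and type checkers treat int as compatible with float, so lists of ints still pass.

```python
from typing import List, Sequence


# Pre-3.9 spelling of the same hint: typing.List instead of the built-in list.
def get_largest(input_list: List[int]) -> int:
    """Return largest number in a list"""
    return max(input_list)


# Hypothetical looser variant: any sequence (list, tuple, ...) of numbers.
def get_largest_number(numbers: Sequence[float]) -> float:
    """Return largest number in a sequence"""
    return max(numbers)
```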
The snippet above uses the : syntax to specify that the data structure expected by the function is a list of integers, and the -> syntax indicates that the function will return an integer.
You can read more about the Python type hint syntax in the documentation.
While there is still nothing preventing future developers accidentally passing a string or list of strings to this function, the type hints greatly aid the developer and signal how the function was intended to be used.
Type hinting tools
You can use tools like mypy and Pylance (in VSCode) to check for suspected typing errors.
If you specify type hints, these tools will check for instances of that variable/function in your code and ensure the correct data structure is being used.
For example, in my VSCode editor I specified that a function expects a list of integers and returns an integer, and VSCode identified two issues. First, I was returning a string from the function rather than an integer. Second, when I tried to use the function elsewhere, a red underline warned me that I was passing a list of strings to the function instead of a list of integers.
These warnings can be very useful during development to help catch potential bugs quickly.
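A hypothetical file reproducing that scenario is sketched below. At runtime it executes without complaint; a static checker such as mypy or Pylance should flag both marked lines (the exact wording of the messages varies between tools and versions).

```python
def get_largest(input_list: list[int]) -> int:
    """Return largest number in a list"""
    return str(max(input_list))  # flagged: returns str, but the hint promises int


result = get_largest(["apple", "pear"])  # flagged: list[str] passed, list[int] expected
print(result)  # pear
```

Because nothing fails at runtime, without the checker this bug would only surface later, wherever result is used as a number.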
The relevant settings in my VSCode settings.json file are:
{
    "python.linting.pylintEnabled": true,
    "python.linting.enabled": true,
    "python.analysis.typeCheckingMode": "strict"
}
Alternatively, you can use mypy as a command-line utility to run static type analysis on your code:
# install mypy
pip install mypy
# run mypy
mypy SOURCE_FILE_OR_DIRECTORY
4. Automate everything with pre-commit hooks
As programmers, we love automating things. Code formatting and checking should be no exception.
You can automate code styling and checking using pre-commit hooks. The pre-commit framework allows you to specify which checks you want to run against your code before committing changes to your git repository. These checks run automatically when you commit. If any check fails, the commit is aborted until the problems are fixed.
This helps protect against bad code being added to your project history and ensures all code in the repo complies with your code styling guidelines.
Getting started with pre-commit hooks
Install pre-commit
pip install pre-commit
Add a configuration file (.pre-commit-config.yaml) to your project directory
# .pre-commit-config.yaml
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
    -   id: trailing-whitespace
-   repo: https://github.com/psf/black
    rev: 19.3b0
    hooks:
    -   id: black
Install hooks
pre-commit install
After following these instructions, the next time you try to commit files to the repo, the pre-commit hooks will run and ensure your code is compliant with the black formatter. In the case above, pre-commit will also check for and remove any trailing whitespace in the files.
Check out my article on pre-commit hooks for a more detailed tutorial and my full template .pre-commit-config.yaml file that I use for my Data Science projects.
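To demystify what a hook does, here is a simplified sketch of the idea behind a trailing-whitespace fixer. It is an illustration, not the pre-commit-hooks implementation. pre-commit runs each hook with the staged file names as arguments and treats a non-zero exit code, or modified files, as a failure, so you can review the fix and stage it again. A script like this could be registered as a `repo: local` hook; the function name is hypothetical.

```python
from pathlib import Path


def fix_trailing_whitespace(paths: list[str]) -> int:
    """Strip trailing whitespace from each file in place.

    Returns 1 if any file was modified (so pre-commit reports a failure),
    otherwise 0. Simplified: assumes ordinary text files.
    """
    changed = False
    for path in map(Path, paths):
        original = path.read_text()
        # rstrip() also removes the newline, so put it back where it existed.
        fixed = "".join(
            line.rstrip() + ("\n" if line.endswith("\n") else "")
            for line in original.splitlines(keepends=True)
        )
        if fixed != original:
            path.write_text(fixed)
            print(f"Fixing {path}")
            changed = True
    return 1 if changed else 0


# As a standalone hook script, this would be called with the file names
# that pre-commit passes on the command line:
#     sys.exit(fix_trailing_whitespace(sys.argv[1:]))
```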
5. Write useful documentation
We all want our code to live on beyond our input on the project. But it doesn’t matter how good your software is, if the documentation is not good, people will not use it.
It is very easy to skimp on documentation during development, and it is commonly the last thing that gets our attention at the end of a project.
While there is no point writing detailed documentation for a feature still under development – bad or incorrect documentation is worse than no documentation – there are some simple strategies that will help you keep on top of things without too much overhead. With better documentation you increase the chances of other people using your code and will make your life easier in the future.
In Python, documentation generally comes in three forms: inline comments, docstrings and external documentation (e.g. a README.md).
Inline Comments
In general, you should try to follow the inline comments guidelines set by PEP8.
Writing inline comments in code is a surprisingly controversial topic. But I like to take the advice of Martin Fowler in his book, Refactoring:
“Comments should explain ‘why’ the code is doing something rather than ‘what’ it is doing” – Martin Fowler, Refactoring
What the code is doing should be self-explanatory. However, why the developer chose to use that ‘magic’ number, unusual method or workaround may not be obvious. This is where inline comments make a lot of sense.
I like to add inline comments where the intention of my code is not immediately obvious. A good example of this is when you are using hex codes for colours: you are never going to remember the colour just by looking at the hex code.
Below is a snippet from my post on Visualising Asset Price Correlations where I used inline comments for this purpose.
def assign_colour(correlation: float) -> str:
"""Assign hex code colour string based on correlation value"""
if correlation <= 0.0:
return "#ffa09b" # red
else:
return "#9eccb7" # green
Doc strings
Doc strings are used to describe the operation of a function or class. They are defined using the triple-quote syntax (""") on the line below the function declaration.
def add(a, b):
"""Add two numbers together"""
return a + b
Adding simple one line doc strings to your code functions, even if you think the function is trivial, goes a long way to improving readability and reducing ambiguity for future developers.
It is important to note that doc strings and inline/block comments are not interchangeable.
Unlike inline/block comments, doc strings are accessible at runtime through the __doc__ dunder attribute or the built-in help() function. This means the doc string can be interpreted by your IDE and displayed as hints while typing new code.
I notice programmers coming from other languages do not make proper use of Python’s doc string features. For example, using block comments above the Python function instead of using the doc string syntax:
# function to add two numbers together
def add(a, b):
return a + b
Not only is this not ‘Pythonic’ (see tip 1 – code formatting), but by writing comments like this you surrender your access to doc strings at runtime. As a result, you will not be able to view the function description when writing in your IDE or be able to use auto documentation tools like sphinx (see below).
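The difference is easy to check at runtime. In this sketch, add follows the doc string approach and the hypothetical add_commented uses a block comment instead; only the first carries its description with it.

```python
import inspect


def add(a, b):
    """Add two numbers together"""
    return a + b


# function to add two numbers together
def add_commented(a, b):
    return a + b


print(add.__doc__)            # Add two numbers together
print(add_commented.__doc__)  # None: the comment is discarded at runtime
print(inspect.getdoc(add))    # Add two numbers together (help(add) shows it too)
```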
Depending on the complexity of the function, a simple one-line doc string can be appropriate. Don’t add unnecessary detail if the function name and arguments provide enough descriptive detail in themselves.
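For less obvious functions, a multi-line doc string that spells out arguments, return value and errors pays off. The example below is a hypothetical function documented in the Google docstring style, one common convention that Sphinx's napoleon extension can parse; NumPy style is another popular choice.

```python
def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Compute the weighted arithmetic mean of ``values``.

    Args:
        values: Observations to average.
        weights: Non-negative weight for each observation; must be the same
            length as ``values`` and must not sum to zero.

    Returns:
        The weighted mean, i.e. sum(v * w) / sum(w).

    Raises:
        ValueError: If the inputs differ in length or the weights sum to zero.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must be the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```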
Top Tip 💡
You can use tools like the Python Docstring Generator in VSCode to automatically generate doc string templates.
External Documentation
The most detailed documentation should be reserved for the project README.md.
The README should give general information to maintainers and users of the project about how to set up the environment, the purpose of the project, the project directory structure etc.
You can make use of README templates available on GitHub.
Additionally, if you have been documenting your code with doc strings, you can automatically generate documentation in a beautiful user interface using tools like Sphinx.
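Sphinx's configuration file, conf.py, is itself plain Python. A minimal sketch, assuming a hypothetical project name: the autodoc extension pulls documentation from your doc strings, and napoleon lets it understand Google- or NumPy-style sections. Pages then reference your modules with directives such as `.. automodule::`, and autodoc needs those modules to be importable when the docs are built.

```python
# docs/conf.py for a hypothetical project: Sphinx reads these module-level names.
project = "my-project"  # placeholder name

extensions = [
    "sphinx.ext.autodoc",   # generate documentation from doc strings
    "sphinx.ext.napoleon",  # parse Google/NumPy-style doc string sections
]
```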
👨‍🎓 Bonus: Read other people’s code
This piece of advice does not relate directly to improving your own Python code. However, one of the best ways to learn how to write readable and maintainable code is to read other people’s code.
You can get hold of other people’s code by looking at popular open source repositories on GitHub, or if you work in a team, by reviewing other people’s code.
By reading other people’s code, you will be able to spot things that frustrate you when trying to understand their code base. This could be poorly named variables, unclear data structures required for a function, insufficient documentation (e.g. no docstrings for functions, no README), or inconsistent, non-PEP8-compliant code that is harder to read.
With this outside perspective on someone else’s code, you can then review your own code as another programmer with no prior knowledge of the code base would, and apply what you have learned.
For example, it was only when I was on a project that inherited a legacy code base that I realised how useful type hints can be for helping other developers (and yourself!) to understand the data structures being passed around your code.
Conclusion
You can dramatically improve the readability and maintainability of your projects without needing to refactor any code logic, simply by utilising tools developed by Python’s open source community.
In my experience, you should assume that you will forget the purpose of every function or variable at some point in the future. Therefore, it is always better to be as explicit as possible when writing code the first time round (e.g. writing descriptive doc strings, adding explicit type hints etc.). Applying these strategies to your own Python projects will greatly improve the developer experience for collaborators or, at the very least, make life easier for your future self.
My top five tips covered in this post include:
- use auto-formatters such as black and isort
- use code checkers such as flake8 and pylint
- use type hints to remove ambiguity in your function arguments
- automate checking with pre-commit
- write good documentation using Python doc strings
Applying these principles to your development will dramatically improve the maintainability of your code and make your life so much easier in the future!
Resources
- My template .pre-commit-config.yaml file I use as a base for most projects
- RealPython - Type hinting tutorial
- Arjan Codes YouTube Channel - software design thinking and clean code
- Writing Good Python Documentation