The ‘Walrus’ Operator
The headline feature of Python’s version 3.8 release was the addition of the ‘assignment expression operator’ (:=), colloquially known as the ‘Walrus’ operator.
According to the Python docs:
"[The Walrus operator] assigns values to variables as part of a larger expression"
So… what does that mean? Basically, it allows you to assign a value to a variable and test it in a conditional at the same time, potentially condensing a couple of lines of code into one.
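As a quick illustration of the idea, here is a minimal sketch that has nothing to do with regular expressions yet. The `readings` list and the threshold of 3 are made up for the example.

```python
# Hypothetical data, purely for illustration
readings = [12, 7, 31, 5]

# Without the Walrus operator: assign first, then check
count = len(readings)
if count > 3:
    message_two_step = f"Too many readings ({count})"

# With the Walrus operator: assign and check in one expression
if (n := len(readings)) > 3:
    message_walrus = f"Too many readings ({n})"

print(message_walrus)
```

Note the parentheses around the assignment: `:=` binds less tightly than `>`, so without them `n` would be assigned the result of the comparison rather than the length.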
When I first saw the Walrus operator I thought, that’s kinda cool. Not sure I will use it that often though.
But recently I have found myself using it more and more as an elegant (in my opinion) solution for regular expression matching.
Walrus Operator Use Case: Regular Expression Matching
Let’s start with an example of regular expression matching without using the Walrus Operator. We will use a regular expression to extract the date (YYYYMMDD) and city name from an example file name.
If you are new to regular expressions, check out this website that walks through the syntax for regular expressions: https://regexone.com
Setup
# Example: extract date and city from filename
import re
FILENAME = "20220614_london.csv"
# regular expression pattern to capture the date and city
# raw string so \d is not treated as a string escape; \. matches a literal dot
regex_pat = r"(\d*)_(.*)\.csv"
Version 1: Regex only
# extract date and city from filename
matches = re.match(regex_pat, FILENAME)
# show extracted data
matches.groups()
('20220614', 'london')
This works fine but I would argue there are a couple issues.
Firstly, a problem arises if the supplied string (file name) does not match the expected pattern. In this scenario, the re.match function will return None because no match is found. When we then call matches.groups() to display the extracted information, we will get an AttributeError:
AttributeError: 'NoneType' object has no attribute 'groups'
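To see the failure for yourself, here is a small sketch. The name `BAD_FILENAME` and its value are hypothetical, chosen only because they do not fit the pattern; the pattern is the one from the setup, written as a raw string with the dot escaped.

```python
import re

# Pattern from the setup: digits, an underscore, anything, then ".csv"
regex_pat = r"(\d*)_(.*)\.csv"

# Hypothetical file name that does not follow the expected pattern
BAD_FILENAME = "london-20220614.txt"

matches = re.match(regex_pat, BAD_FILENAME)
print(matches)  # re.match returns None when there is no match

try:
    matches.groups()
except AttributeError as err:
    # Calling .groups() on None raises the error shown above
    error_message = str(err)
    print(error_message)
```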
The way to protect against this error is to add a conditional check on the matches variable to confirm it has a value before doing anything else with it:
Version 2: Regex with conditional
# improved example
matches = re.match(regex_pat, FILENAME)
# check matches is not None before calling matches.groups()
if matches:
    print(matches.groups())
Further reading
I used this approach in another post about extracting information from Google Cloud Storage URIs (listed under Further Reading at the end of this post).
However, this leads to the second issue (although admittedly much less of a problem). We have had to add an extra line of code for the conditional. This seems unnecessarily verbose.
This is where the Walrus operator comes in handy. We can extract the information using the regular expression and check it is not None in a single line of code.
Final version: Walrus Operator
# assignment and conditional in one go using := syntax
if matches := re.match(regex_pat, FILENAME):
    print(matches.groups())
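The same pattern pays off when looping over many file names, some of which may not match. The sketch below assumes a made-up `filenames` list; only the first entry comes from the example above.

```python
import re

regex_pat = r"(\d*)_(.*)\.csv"

# Hypothetical file names; the last two do not follow the expected pattern
filenames = ["20220614_london.csv", "20220615_paris.csv", "notes.txt", "README"]

parsed = []
for name in filenames:
    # Assign the match object and test it in the same expression
    if matches := re.match(regex_pat, name):
        date, city = matches.groups()
        parsed.append((date, city))
    else:
        print(f"Skipping {name!r}: does not match the pattern")

print(parsed)
```

Non-matching names are skipped without ever calling .groups() on None.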
Conclusion
You can use the Walrus Operator to write more concise code. It is particularly handy for regular expression matching, or any time you assign a variable using a function that could return None.
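As one example beyond regular expressions, dict.get also returns None when a key is missing. The lookup table and the `describe` function below are invented for this sketch.

```python
# Hypothetical lookup table, invented for this example
city_to_country = {"london": "UK", "paris": "France"}


def describe(city):
    # dict.get returns None when the key is missing
    if country := city_to_country.get(city):
        return f"{city} is in {country}"
    return f"{city} is not in the lookup table"


print(describe("london"))
print(describe("oslo"))
```

One caveat: a plain `if value := ...` check also fails for other falsy values such as 0 or an empty string. If those are valid results, compare explicitly with `if (value := func()) is not None:`.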
Side Note ⭐️
While researching this article, I learned that you can explicitly name the capture groups in regular expressions using the ?P<...> syntax. We can then access their values by name rather than by index value. Using the example above:
import re
# named groups: (?P<name>...) lets us look up each capture by name
regex_pat = r"(?P<date>\d*)_(?P<city>.*)\.csv"
if matches := re.match(regex_pat, FILENAME):
    print(matches.group('date'))
    print(matches.group('city'))
Accessing the values by name rather than index value makes your regular expression code even more readable, particularly if you have a long regex pattern with multiple capturing groups.
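If you want all the named groups at once, the match object's groupdict() method returns them as a dictionary. A minimal sketch, reusing the example file name (the variable name `info` is just for illustration):

```python
import re

FILENAME = "20220614_london.csv"
regex_pat = r"(?P<date>\d*)_(?P<city>.*)\.csv"

if matches := re.match(regex_pat, FILENAME):
    # groupdict() maps each group name to the text it captured
    info = matches.groupdict()
    print(info)
```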
Happy coding!
Further Reading
- How to extract bucket and file name from a Google Cloud Storage URI with Python
- What I Learned Optimising Someone Else’s Code
- Data Science Setup on MacOS (Homebrew, pyenv, VSCode, Docker)
- Five Tips to Elevate the Readability of your Python Code
- Automate your MacBook Development Environment Setup with Brewfile
- SQL-like Window Functions in Pandas
- Gitmoji: Add Emojis to Your Git Commit Messages!
- Do Programmers Need to be able to Type Fast?
- How to Manage Multiple Git Accounts on the Same Machine