How to Save a PySpark DataFrame to a Single Output File

Save a Spark DataFrame to a single output CSV file

August 27 2023  ·  6 min

How to search and replace across multiple files using Vim

Vim commands to search and replace in files across your entire project

How to use allure-pytest and allure-pytest-bdd in the same project

Dealing with pytest plugins with conflicting command line arguments

How to mock sending SMTP emails using PyTest

Use pytest-mock to mock sending SMTP emails during your unit tests

February 28 2023  ·  13 min

How to save the output of PySpark DataFrame 'show' to a variable

There is no obvious way to save the nicely formatted DataFrame show() string to a variable, but here is how you can do it.

Advent of Code 2022 Solutions

Python solutions for Advent of Code 2022

Top tips for using PyTest

Top tips for using PyTest

November 19 2022  ·  11 min

Unit testing PySpark code using Pytest

How to unit test PySpark code using Pytest

October 29 2022  ·  12 min

How to set up Logging for Python Projects

Improve your data science projects with logging instead of using print statements

Pytest: How to use fixtures as arguments in parametrize

Using fixtures in parametrized pytest tests

How to Always Enable Autoreloading of Modules in IPython

Update your IPython configuration to automatically enable auto-reloading of modules

Google Search Console API with Python

How to access your Google Search Console data using the API with Python

July 24 2022  ·  15 min

Python Walrus Operator: Regular Expression Matching Use Case

Using the Python Walrus operator for regular expression matching

June 15 2022  ·  4 min

What I Learned Optimising Someone Else's Code

And how it improved the code I write going forwards

🎂 One year of blogging: Growing the blog to 10k monthly visitors

What I have learned from starting my own developer blog

May 28 2022  ·  12 min

Export Your Spotify Playlist to a CSV File Using Python

You can automate and extract information from Spotify using the web API

Matplotlib: Make Impactful Charts by Adding Subtitles with plt.suptitle

Deliver a clear message to your stakeholders by adding descriptive headlines to your matplotlib charts using plt.suptitle

Deploying Dremio on Google Cloud (GKE)

Dremio is a powerful engine for querying data directly in the data lake without having to ingest it into a data warehouse

Reproducible ML: Maybe you shouldn't be using Sklearn's train_test_split

Reproducibility is critical for successful ML projects. Sklearn’s train_test_split might not be as robust as you think

Why is Machine Learning Deployment so Difficult in Large Companies?

Building the model is just the start. In large enterprises, governance, data quality and silos dramatically increase complexity

How to extract bucket and file name from a Google Cloud Storage URI with Python

You can use Python string manipulation to extract information from GCS URIs. Two methods I use are Python’s ‘split’ method and regular expression lookups.

February 24 2022  ·  7 min

The Best Way to Learn Vim

Learning Vim can be daunting. Being strategic with your learning process can improve your chances of persevering and succeeding

How to Open a Chrome Tab in a New Window with Alfred and AppleScript

Using Alfred workflows to open your current browser tab in a new window

How to set up an amazing terminal for data science with oh-my-zsh plugins

How to configure your terminal for maximum data science productivity (oh-my-zsh, iTerm2, tmux, Starship)

Data Science Setup on MacOS (Homebrew, pyenv, VSCode, Docker)

Setting up a new MacBook Pro for Data Science

The Pragmatic Programmer (David Thomas and Andrew Hunt)

Writing good software is about more than just technical ability

On Writing Well (William Zinsser)

Writing effectively is a difficult but important skill for anyone in the workforce

Automate your MacBook Development Environment Setup with Brewfile

Automate and reproduce your MacBook setup using Homebrew and Brewfile

SQL-like Window Functions in Pandas

How to write SQL-like window functions in Pandas using groupby and transform

Five Tips to Elevate the Readability of your Python Code

Top tips for improving your Python code without needing to refactor. Implement these simple strategies and your future self (and colleagues!) will thank you!

Do programmers need to be able to type fast? Yes. But probably not for the reasons you are thinking

Typing fast is a superpower

Event Driven Data Validation with Google Cloud Functions and Great Expectations

Build trust with end users by automating data testing in the data lake using Cloud Functions and Great Expectations

December 6 2021  ·  28 min

Voilà! Deploy your Jupyter Notebook Based Python Dashboard on Heroku (Part 3)

Share your work with the world! Deploying a Voilà web app for free using Heroku

November 5 2021  ·  8 min

Voilà! Optimising Python Dashboard Performance (Part 2)

Improve the loading performance of a Voilà Python dashboard application

October 27 2021  ·  10 min

Voilà! Interactive Python Dashboards Straight from your Jupyter Notebook (Part 1)

Bring your Jupyter notebooks to life in a Python dashboard web application

October 25 2021  ·  6 min

Refactoring (Martin Fowler)

This book made me really think about the purpose of refactoring. It provides practical guidance for effective refactoring and advice for convincing stakeholders to allow you the time to refactor your code

How to Install Miniconda from the Command Line (Linux/MacOS)

Automate your development environment setup by installing Miniconda from the command line

How to Manage Multiple Git Accounts on the Same Machine

Learn how to manage multiple git accounts on the same machine using SSH keys, SSH config and git config

August 30 2021  ·  10 min

Algorithms to Live By (Brian Christian)

A practical guide to the algorithms we use in our day to day life without even realising it

Google Cloud Professional Cloud Architect Exam Notes

My notes after passing the Google Cloud Professional Cloud Architect Exam - July 2021

July 29 2021  ·  10 min

Sapiens: A Brief History of Humankind (Yuval Noah Harari)

This book highlights the unique attributes which make humans special among the animal kingdom and the main events which have shaped our modern day consumer society

10 Resources for Learning Data Science on Google Cloud

Ten great resources for Google Cloud data engineering and data science practitioners.

July 11 2021  ·  9 min

Which Python String Formatting Method Should You Be Using in Your Data Science Project?

Which string formatting method should you be using in your data science workflow? This post summarises the different methods for string substitution and when you should be using each.

Improve Code Quality with Git Hooks and Pre-commit

Pre-commit hooks tutorial to improve code quality and reduce bugs. Example .pre-commit-config.yaml file included!

How to Prepare for and Pass the Confluent Certified Developer for Apache Kafka Exam

Notes and resources for passing the Confluent Certified Developer for Apache Kafka exam

June 26 2021  ·  8 min

Gitmoji Quick Start Tutorial - An Emoji Guide for Git Commit Messages!

Use emojis to liven up your git repositories

June 23 2021  ·  6 min

Matplotlib: Plotting Subplots in a Loop

How to plot Matplotlib subplots in a loop using NumPy’s ravel method or Matplotlib’s plt.subplot method

Visualising Asset Price Correlations

Learn how to use NetworkX and Plotly to visualise relationships between different asset classes