How to Save PySpark Dataframe to a Single Output File
Save a Spark DataFrame to a single output CSV file
Vim commands to search and replace in files across your entire project
Dealing with pytest plugins with conflicting command line arguments
Use pytest-mock to mock sending SMTP emails during your unit tests
There is no obvious way to save the nicely formatted DataFrame show() string to a variable. But here is how you can do it
Python solutions for Advent of Code 2022
Top tips for using pytest
How to unit test PySpark code using Pytest
Improve your data science projects with logging instead of using print statements
Using fixtures in parametrized pytest tests
Update your IPython configuration to automatically enable auto-reloading of modules
How to access your Google Search Console data using the API with Python
Using the Python Walrus operator for regular expression matching
And how it improved the code I write going forwards
What I have learned from starting my own developer blog
You can automate and extract information from Spotify using the web API
Deliver a clear message to your stakeholders by adding descriptive headlines to your matplotlib charts using plt.suptitle
Dremio is a powerful engine for querying data directly in the data lake without having to ingest it into a data warehouse
Reproducibility is critical for successful ML projects. Sklearn’s train_test_split might not be as robust as you think
Building the model is just the start. In large enterprises, governance, data quality and silos dramatically increase complexity
You can use Python string manipulation to extract information from GCS URIs. Two methods I use are Python’s ‘split’ method and regular expression lookups.
Learning Vim can be daunting. Being strategic with your learning process can improve your chances of persevering and succeeding
Using Alfred workflows to open your current browser tab in a new window
How to configure your terminal for data science for maximum productivity (oh-my-zsh, iTerm2, tmux, Starship)
Setting up a new MacBook Pro for Data Science
Writing good software is about more than just technical ability
Writing effectively is a difficult but important skill for anyone in the workforce
Automate and reproduce your MacBook setup using Homebrew and Brewfile
How to write SQL-like window functions in Pandas using groupby and transform
Top tips for improving your Python code without needing to refactor. Implement these simple strategies and your future self (and colleagues!) will thank you!
Typing fast is a super power
Build trust with end users by automating data testing in the data lake using Cloud Functions and Great Expectations
Share your work with the world! Deploying a Voilà web app for free using Heroku
Improve loading performance of Voilà Python dashboard application
Bring your Jupyter notebooks to life in a Python dashboard web application
This book made me really think about the purpose of refactoring. It provides practical guidance for effective refactoring and advice for convincing stakeholders to allow you the time to refactor your code
Automate your development environment setup by installing Miniconda from the command line
Learn how to manage multiple git accounts on the same machine using SSH keys, SSH config and git config
A practical guide to the algorithms we use in our day to day life without even realising it
My notes after passing the Google Cloud Professional Cloud Architect Exam - July 2021
This book highlights the unique attributes which make humans special among the animal kingdom and the main events which have shaped our modern day consumer society
Ten great resources for Google Cloud data engineering and data science practitioners.
Which string formatting method should you be using in your data science workflow? This post summarises the different methods for string substitution and when you should be using each.
Pre-commit hooks tutorial to improve code quality and reduce bugs. Example .pre-commit-config.yaml file included!
Notes and resources for passing the Confluent Certified Developer for Apache Kafka exam
Use emojis to liven up your git repositories
How to plot Matplotlib subplots in a loop using NumPy’s ravel method or Matplotlib’s plt.subplot method
Learn how to use NetworkX and Plotly to visualise relationships between different asset classes