My top piece of advice for aspiring data scientists is to learn about cloud computing.
Businesses of all sizes are moving their data workloads to the cloud.
If you are ever going to build an ML model and put it into production at a company, I can almost guarantee it will involve accessing or processing data in the cloud.
I have noticed that most job adverts for data science positions now list cloud skills as a core part of the role. I highly recommend you invest the time to upskill.
It doesn’t matter which cloud provider you choose: AWS, GCP (Google Cloud Platform) or Azure.
Pick one and learn the basics of data storage, processing, analysis and ML workflows using the platform.
In my case, I landed on GCP.
One of the main reasons I love using GCP is the wealth of easily digestible documentation and high-quality learning resources created by Google’s Developer Advocate team.
In this post, I will share my top 10 ‘go-to’ sources of information and inspiration when developing on Google Cloud. I have grouped these resources into blogs, GitHub repositories, documentation pages, YouTube channels and content creators.
Almost every day I seem to find a new valuable source of information to help me understand a topic or implement a solution. I’m sure I must have missed some great resources — please comment if you think I have missed any!
Google Cloud runs a very active blog on its main website, covering topics that range from high-level trends to detailed technical content.
The Developers & Practitioners blog is a subtopic within the main Google Cloud blog and focuses on detailed tutorials and demonstrations of GCP tooling.
The high-quality posts are written by members of Google’s exceptional Developer Advocate team.
I find this blog particularly useful for learning about best practices.
There are also many excellent cheat sheets for refreshing your memory on different GCP tools. The ‘Google Cloud Products in 4 Words or Less’ post is great for getting up to speed with GCP’s product offerings.
In addition to the blog on its own website, Google Cloud regularly updates its ‘community’ blog on Medium. This consists of curated posts from Google Cloud employees as well as other practitioners in the field.
Similar to the Developers & Practitioners blog, this resource contains many in-depth tutorials on GCP use cases and best practices.
Checking this blog for new updates has become a staple of my morning routine.
3. GoogleCloudPlatform GitHub repo (⭐️ Top Resource)
Google Cloud maintains a very active GitHub account with hundreds of repositories containing tutorials, workshop materials and GCP tools.
This is a great resource for finding code for solutions that have already been implemented on GCP and for observing coding best practices from Google engineers.
Many of these repositories are referenced in the GCP documentation; however, I recommend browsing the account to see whether any repositories relate to the tools and languages you work with.
My particular highlights include:
- Professional Services: tutorials and code for common problems tackled by Google Cloud’s Professional Services team
- ML on GCP: guides for various machine learning frameworks on Google Cloud
- MLOps on GCP: demonstrations of design and code patterns for a variety of ML engineering topics
- MLOps with Vertex AI: an end-to-end MLOps process using the Vertex AI platform
- ML Design Patterns: source code and examples for O’Reilly’s Machine Learning Design Patterns book
- Data Science on GCP: source code and examples for O’Reilly’s Data Science on Google Cloud Platform book
- Training data analyst: labs and demos for Google Cloud’s training courses
Google provides a number of client libraries to interact with its services using your language of choice.
Throughout the Google Cloud documentation, you will find various code snippets for each tool demonstrating how to use the API for your chosen client library.
These code snippets are actually available in the Google API source code hosted on GitHub, normally in a folder called ‘samples’. I find this really handy, as you can access all the code samples in one place rather than navigating through various documentation pages trying to find that useful snippet you vaguely remember.
For example, here are all the code snippets for using BigQuery’s Python API.
You can also access the tests Google uses to check these code samples. They are a useful reference when writing tests for your own functions built on the snippets.
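To make the testing point concrete, here is a minimal sketch of a small function built on the query pattern used in the BigQuery Python samples (client.query(...).result()), together with a unit test for it. The function name count_rows, the table ID and the query are hypothetical. Instead of calling BigQuery, the test swaps in a unittest.mock.MagicMock for the client so it runs without credentials or network access; this is one assumed way to test your own code, not necessarily how Google's sample tests work.

```python
from unittest import mock


def count_rows(client, table_id):
    """Return the number of rows in a BigQuery table.

    Hypothetical helper: `client` is expected to behave like
    google.cloud.bigquery.Client, following the query-then-read-results
    pattern shown in the published samples.
    """
    query = f"SELECT COUNT(*) AS n FROM `{table_id}`"
    # QueryJob.result() waits for the job and returns an iterable of rows.
    rows = client.query(query).result()
    # BigQuery rows support lookup by column name.
    return next(iter(rows))["n"]


def test_count_rows_uses_query_and_reads_result():
    # Stand-in for bigquery.Client: no credentials or network needed.
    fake_client = mock.MagicMock()
    fake_client.query.return_value.result.return_value = [{"n": 42}]

    assert count_rows(fake_client, "my-project.my_dataset.my_table") == 42
    # Check that the expected SQL was sent to the client exactly once.
    fake_client.query.assert_called_once_with(
        "SELECT COUNT(*) AS n FROM `my-project.my_dataset.my_table`"
    )


if __name__ == "__main__":
    test_count_rows_uses_query_and_reads_result()
    print("ok")
```

Mocking only checks that your code calls the client the way you expect; it cannot catch a bad query, so an occasional integration test against a real test dataset is still worthwhile.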
5. Product Documentation Pages – Key Concepts
As you would expect, each product on Google Cloud has extensive documentation with explanations and step-by-step example tutorials.
In particular, each product documentation page has a section called ‘Concepts’.
The concepts pages explain the most important aspects of each product. They normally call out the key features and best practices. This is extremely helpful for getting up to speed quickly, particularly if you have used a similar tool before on another cloud provider.
For the most part, Google’s documentation is an excellent resource for understanding the key concepts for each GCP tool with useful examples.
For a long time, because the documentation pages were so comprehensive, I never thought to look for other resources to help me design solutions on Google Cloud.
That was until I stumbled across the Google Cloud Architecture Center.
In most cases, the data problem you are tackling has previously been solved (at least in part) by someone else. There is no need to reinvent the wheel.
The Google Cloud Architecture Center contains reference architectures, diagrams, design patterns, guidance, and best practices for building or migrating your workloads on Google Cloud.
Whenever I design a new solution on Google Cloud, I check the Architecture Center for reference architectures that solve a similar problem and for ideas on how to adapt them to my specific use case.
For example, there is a ready-made solution for processing streaming time-series data, including anomaly detection, with Pub/Sub and Dataflow.
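To give a flavour of the per-window logic such a streaming pipeline might apply, here is a pure-Python sketch of a rolling z-score anomaly detector. It is not the Architecture Center solution and uses neither Pub/Sub nor Dataflow; the function name detect_anomalies, the window size and the threshold are hypothetical choices for illustration. In a real deployment, logic like this would sit inside a step of the streaming pipeline.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Yield (index, value) for points far from the recent rolling mean.

    Hypothetical helper: each new point is compared with the mean and
    population standard deviation of the previous `window` points.
    """
    history = deque(maxlen=window)  # keeps only the most recent `window` points
    for i, x in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat windows (sigma == 0) to avoid dividing by zero.
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                yield i, x
        history.append(x)


readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8]
print(list(detect_anomalies(readings)))  # [(6, 25.0)]
```

One design choice worth noting: the spike stays in the window after it is flagged, which inflates the standard deviation for the next few readings and makes nearby anomalies harder to spot. A production pipeline might exclude flagged points from the history or use a more robust statistic such as the median.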
Many of these reference architectures also link to the relevant tutorials in the product documentation.
Google Cloud Tech is Google’s main YouTube channel for developers and has over 700k subscribers. The channel features professionally produced videos from the Google Developer Advocate team introducing GCP tools and best practices as well as technical tutorial walkthroughs.
This channel differs from the other Google Cloud YouTube channels (Google Developers, Google Cloud) in that it focuses on in-depth technical material for practitioners. The other channels offer more of a ‘C-suite level’ view of trends and concepts, which is less useful for the day-to-day implementation of solutions on GCP.
If I ever need a walk-through of a tool, this is normally my first port of call to get familiar with the main concepts and best practices.
Adventures in the Cloud is a much smaller channel (only 2k subscribers) run by Yufeng Guo, who is also a Google Cloud Developer Advocate. Despite its size, it is a gold mine of content.
The channel mostly features long-form tutorials on machine learning and MLOps. Yufeng Guo runs through them in real time, which doesn’t always go to plan, but this is a really valuable way to learn, as you can see how he debugs common issues as he comes across them.
I really enjoyed his video on Productionizing Machine Learning with AI Platform Pipelines.
Unfortunately, it looks like he has not uploaded any new content for a while; however, there is plenty of older content to keep you busy.
Finally, I want to highlight a few of my favourite personalities to follow who create great content in the Google Cloud space. I’m sure I have missed a lot of other excellent content creators, but here are three that keep coming up again and again with really useful material.
I have linked to their Medium blogs, but I recommend following them on LinkedIn as well for regular updates and insights.
Lak is a Director of Data Analytics and AI Solutions at Google Cloud. He provides insightful thinking and clear tutorials mostly around BigQuery and machine learning.
Priyanka is a Developer Relations Engineer at Google and is very active on the Google Cloud Tech YouTube channel as well as LinkedIn.
Priyanka creates content on topics across the whole of Google Cloud with many entertaining and informative videos.
She recently produced a great series explaining 13 reference architectures for Google Cloud.
In this post, I have shared my top 10 technical resources for developing with Google Cloud.
I have been really impressed with the quality of Google’s resources for developers, which make Google Cloud by far my favourite cloud provider to work with.
I only came across many of these resources in the last few months, which makes me think there must be many more that I have overlooked. I would love to hear about your favourite resources; I’m sure I have missed a few!