Google Cloud: The Data-Driven Org’s Biggest Ally Today
The complexities of machine learning (ML) and data analysis keep data scientists and engineers working around the clock. Meanwhile, companies continue to search for increasingly sophisticated new use cases. Fortunately, the tools and environments available to make all this work easier continue to improve and expand in lockstep, making the data-driven organization much more achievable today.
Many data-driven organizations work with powerful but highly complex hybrid environments. These depend on many indispensable parts: automation, function integration, and process streamlining. Data scientists and engineers need solutions that integrate these parts and give them more control, with tooling that minimizes operational work and simplifies management. With these solutions in place, they can shift their focus to improving the quality of their ML models and finding new data applications.
With its finger on the pulse of the data-driven economy for years, Google knows what data-centric organizations need. The company has bundled a variety of these solutions into different platforms. This article takes an in-depth look at three of these environments: the unified MLOps platform Vertex AI, the application deployment platform Anthos, and the much-discussed concept for the modern data platform, Data Mesh.
Vertex AI: Controlling the Complete ML Lifecycle
Vertex AI, the successor to Google’s AI Platform, consolidates all machine learning features in the cloud. “Vertex AI is for anyone who works with data, but it’s typically used by data engineers, data scientists, ML developers, and ML engineers,” explains Wouter Roosenburg, a customer engineer at Google.
Vertex AI brings together many functions that can be used in many different scenarios. Its tools can accelerate model rollout, simplify management, and support MLOps practices. Most importantly, deployed models can be monitored to verify that they keep behaving as intended. “Unlike DevOps, the MLOps model can still malfunction even if the code is correct. That’s simply because the data distribution can change,” says Roosenburg. “That’s why we want to detect deviations and permanently solve the training-induced skew.” This mismatch between the data a model was trained on and the data it sees in production is commonly called training-serving skew.
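To make the idea of detecting a changed data distribution concrete, here is a minimal sketch in plain Python. It is not how Vertex AI implements its monitoring; it assumes a single numeric feature and uses the Population Stability Index (PSI), a common drift statistic. The function name, the bin count, and the 0.2 alert threshold (a widely used rule of thumb) are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    training: Sequence[float], serving: Sequence[float], bins: int = 10
) -> float:
    """Compare the distribution of one feature at training time and at serving time."""
    lo, hi = min(training), max(training)
    if hi == lo:
        raise ValueError("training sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_fractions(training)
    actual = bin_fractions(serving)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, not a Vertex AI default


def has_drifted(training: Sequence[float], serving: Sequence[float]) -> bool:
    """Flag the feature when the serving distribution has moved away from training."""
    return population_stability_index(training, serving) > DRIFT_THRESHOLD
```

Called with a training sample and a recent window of serving values for the same feature, `has_drifted` returns `True` once the two distributions diverge beyond the chosen threshold, even though no code has changed.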
Vertex AI is suitable for four classes of data: images, tables, text, and video. With Vertex AI, you create a dataset, train the model, roll it out to an endpoint, and generate predictions. According to Roosenburg, Vertex AI particularly excels when you have many models and data sources. The platform covers the entire ML spectrum: data readiness, feature engineering, training and tuning, model serving, further analysis, edge, model monitoring, and model management.
Some companies, like banks and insurance providers, must be able to explain the results of their models to a regulator, so predictions need to be accompanied by detailed explanations. For this reason, the end-to-end cycle and the sequence of steps are becoming increasingly important.
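The four steps (create a dataset, train, roll out to an endpoint, predict) could look roughly like this with the Vertex AI SDK for Python (`google-cloud-aiplatform`). This is a sketch under assumptions, not a reference implementation: the function name, display names, machine type, and training budget are placeholders, and it assumes tabular data in a CSV file on Cloud Storage and an AutoML classification job. Running it requires a Google Cloud project with credentials and incurs cost, so the function is only defined here, not called.

```python
from typing import Any, Dict, List


def train_and_predict(
    project: str,
    location: str,
    gcs_csv_uri: str,
    target_column: str,
    instances: List[Dict[str, Any]],
) -> List[Any]:
    """Dataset -> AutoML training -> endpoint -> online prediction (illustrative only)."""
    # Imported lazily so the sketch can be read and loaded without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)

    # 1. Create a managed tabular dataset from a CSV file in Cloud Storage.
    dataset = aiplatform.TabularDataset.create(
        display_name="example-dataset",  # placeholder name
        gcs_source=[gcs_csv_uri],
    )

    # 2. Train a model with AutoML (assumed here: a classification problem).
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="example-training-job",  # placeholder name
        optimization_prediction_type="classification",
    )
    model = job.run(
        dataset=dataset,
        target_column=target_column,
        budget_milli_node_hours=1000,  # assumed budget: one node hour
    )

    # 3. Roll the model out to an endpoint.
    endpoint = model.deploy(machine_type="n1-standard-4")  # assumed machine type

    # 4. Generate online predictions.
    return endpoint.predict(instances=instances).predictions
```

The `instances` argument must match the columns of the training data; its exact shape depends on the model, so it is left to the caller here.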
Training models and generating predictions is a central part of the platform. Vertex AI can provide pre-trained APIs for organizations that do not have their own training data. If your organization does have its own data, AutoML can create a model automatically based on the organization’s needs, and that is the route Roosenburg advises. “If, on the other hand, the performance or precision of the model is more important from the start, then I recommend a custom model,” he says.
But a rolled-out model is just the beginning. The backbone is the pipeline, and Vertex AI offers managed pipelines. Data scientists can write their own code in a Jupyter notebook, while others are presented with a graphical representation. “In this way, you really see what’s going on,” says Roosenburg.
App Modernization With Anthos
Effective data deployment requires an efficient application environment. The hybrid cloud allows data to be processed in the most suitable location. But to do this optimally, the applications themselves must be as compact and portable as possible. Container technologies such as Docker, together with orchestrators such as Kubernetes, have contributed significantly to this by packaging an application together with an abstraction of its environment. Containers are much lighter than a virtual machine with a full operating system and require only a fraction of the memory. They also start faster. Kubernetes, in particular, is extremely rich in built-in functionality, making it well-suited for environments where application data must be constantly moved. Rushil Sharma, a hybrid cloud customer engineer at Google, explains, “What Linux did for virtualization, Kubernetes is doing for containers.”
But containers also bring challenges, especially when managing multiple hosts. Manually setting up a Kubernetes environment requires many complex actions. With Anthos, you can develop modern hybrid and multi-cloud applications without being tied to a specific infrastructure. The platform allows you to launch clusters with one click and see all clusters and workloads in one overview. Google also keeps the clusters running automatically. Anthos is based on Google Kubernetes Engine (GKE), which has been available since 2015. It is reliable, efficient, and extremely powerful, partly because it runs directly on Google Cloud.
“We have two modes of operation: the standard mode and the GKE Autopilot,” explains Sharma. “The GKE Autopilot manages the entire infrastructure of the cluster, including control, node pools, and the nodes themselves. The best mode depends on the use case, but you can choose the mode for each cluster. You don’t need a lot of knowledge about Kubernetes for clusters on autopilot. You can just click, and it starts up. By using it, you learn to manage Kubernetes,” he says.
In addition to working across all forms of on-premises infrastructure and cloud, Anthos also works with attached clusters, which integrate existing Kubernetes environments. Other features include Cloud Run for Anthos (serverless environment management), Anthos Config Management (for GitOps automation), and Anthos Service Mesh (routing and load balancing management). For data-driven organizations, Anthos also offers Hybrid AI, a training environment for AI and ML models that uses pre-trained AI models, along with MLOps lifecycle management.
Data Mesh, the Modern Data Platform
Data mesh is a new way of thinking about how to use data to create business value.
Navin Goel, a customer engineer at Google, explains why: “Different user groups work in different domains. On one side, you have the business units with their own teams and knowledge that generate data as a byproduct. On the other side, you have data science teams that are responsible for the data pipelines. And in the middle, you have the domain-agnostic teams. They have to manage the data warehouse and the data platform. There are good reasons for this split, but the problem is that knowledge is lost because the responsibility for the data is not with the teams that really understand it. If the data science teams want to set up a new use case, they need to know what data they can use. But the management team does not know the answer because it does not own the data. They are being fed data sources by users and getting requests from data scientists. And that’s where a bottleneck comes in.”
According to Goel, data mesh solves this problem by decentralizing ownership of the data and pipelines. The people, processes, and technology are spread across all teams. “Instead of those vertical domains, you get cross-functional data teams with a data engineer, software engineer and the data owner,” he explains. “It’s about breaking down that monolithic structure into domain-specific services, but with data instead of services.”
With the data mesh, every team preserves its knowledge through its data engineer, who is responsible for the data pipelines. The software engineer’s role remains unchanged. Data is no longer a byproduct but a primary product, so a new role is added: the data product owner. “The data product owner must also sell it as a product, both to internal and external customers,” Goel explains. “The data remains available to the team, but the central teams continue to control who has access.”
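The split Goel describes, with domain teams owning data products while a central team controls access, can be illustrated with a small model in plain Python. This is a conceptual sketch only: the class names `DataProduct` and `AccessRegistry` and the example team names are hypothetical and do not correspond to any Google Cloud API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class DataProduct:
    """A dataset published as a product by the domain team that understands it."""
    name: str
    domain: str  # owning domain team
    owner: str   # data product owner, responsible for "selling" the product


@dataclass
class AccessRegistry:
    """Central governance: records which consumers may read which product."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, product: DataProduct, consumer: str) -> None:
        self.grants.setdefault(product.name, set()).add(consumer)

    def can_read(self, product: DataProduct, consumer: str) -> bool:
        # The owning domain always keeps access to its own data.
        if consumer == product.domain:
            return True
        return consumer in self.grants.get(product.name, set())


# Usage: the sales domain publishes a product; central governance grants access.
orders = DataProduct(name="orders", domain="sales", owner="alice")
registry = AccessRegistry()
registry.grant(orders, "data-science")
```

Ownership metadata lives with the product, while the grant list lives with the central registry, which mirrors the division of responsibility in the quote.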
This, of course, leads to a heavy burden of required knowledge within the organization. Scaling without hiring numerous additional data engineers is therefore critical. This is where technology comes into play. Goel explains: “Using serverless and separating storage makes infrastructure management much more efficient, so data engineers are really engaged in core activities. You want a managed service for hosting the data product. If you have two users today but a hundred users in the future, you do not want to be constantly managing the infrastructure to scale it up and down. The same is true for pipelines. Separating storage and computing power ensures that different teams can access the same data but perform their own actions on it.”
Google Cloud Integrates Services and Takes Over Management
By freeing data scientists and data engineers from routine tasks such as management and maintenance, the entire organization can take major steps in its data deployment. The possibilities are also becoming increasingly apparent now that new platforms and concepts are appearing on the market that combine existing solutions and greatly simplify the way of working. Google is ahead of the pack on this point. Through Google Cloud, organizations get solutions and concepts that would otherwise be very complex and expensive to implement.