
Google Professional Machine Learning Engineer

The Google PMLE exam tests your ability to design and operate production machine learning systems — not just train models. It covers the full ML lifecycle: framing the problem, preparing data, training at scale, deploying models, monitoring for drift, and keeping the pipeline reliable. If you have only trained models in notebooks, this exam will expose the gaps.

14 min
4 sections · 6 exam key points

ML Problem Framing and Data Strategy

Before building a model, frame the problem correctly: What is the prediction target? What labels do you have? What are the business metrics, and how do they relate to ML metrics (accuracy, AUC, RMSE)? Is this classification, regression, ranking, or generation? Can the problem be solved with rules or simple statistics instead of ML? Data strategy: structured data (Cloud SQL, BigQuery, Spanner), unstructured data (Cloud Storage for images/audio/video/text). Feature engineering in BigQuery ML or Vertex AI Feature Store (serves features consistently between training and serving — avoids training-serving skew). Data validation: TFX's (TensorFlow Extended) ExampleValidator component, built on TensorFlow Data Validation (TFDV), detects schema anomalies and distribution skew.
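The core of a training-serving skew check can be sketched in plain Python. The helper below is illustrative (the name `detect_skew` and the z-score threshold are assumptions, not TFDV's actual method): it compares the mean of a serving sample against the training distribution and flags a significant shift.

```python
import math
from statistics import mean, stdev

def detect_skew(train_values, serving_values, z_threshold=3.0):
    """Flag a feature whose serving distribution has shifted away from
    the training distribution (illustrative check, not TFDV's algorithm)."""
    mu, sigma = mean(train_values), stdev(train_values)
    # z-score of the serving sample mean under the training distribution
    z = abs(mean(serving_values) - mu) / (sigma / math.sqrt(len(serving_values)))
    return z > z_threshold

train   = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 10.4]
same    = [10.1, 10.0, 9.9, 10.3]   # drawn from the same distribution
shifted = [14.9, 15.2, 15.1, 14.8]  # pipeline applied a different transformation

print(detect_skew(train, same))     # no skew flagged
print(detect_skew(train, shifted))  # skew flagged
```

A real pipeline would run richer checks per feature (type, range, missingness, full-distribution distance), which is exactly what TFDV automates from an inferred schema.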

Vertex AI: Training and Model Registry

Vertex AI is GCP's unified ML platform. Custom Training: run training code in a managed container on CPUs, GPUs, or TPUs. Training pipelines in Vertex AI Pipelines (Kubeflow Pipelines SDK or TFX) orchestrate multi-step workflows with automatic caching and artefact tracking. Hyperparameter tuning: Vertex AI Vizier (Bayesian optimisation) explores the hyperparameter space more efficiently than grid or random search. Vertex AI Experiments tracks runs, parameters, and metrics for comparison. Model Registry: versioned model artefacts with aliases (production, staging, challenger) — separates model management from deployment. Distributed training: data parallelism (same model on multiple workers, each sees a different batch), model parallelism (split model layers across devices for models too large for one device). MirroredStrategy (single node, multiple GPUs) versus MultiWorkerMirroredStrategy (multiple nodes).
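The synchronous data-parallel pattern behind MirroredStrategy can be sketched without TensorFlow: each "worker" computes the gradient of the same model on its own batch shard, an all-reduce averages the gradients, and every replica applies the same update. This is a toy linear model with assumed helper names, not the tf.distribute implementation.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def all_reduce_mean(grads):
    """Average the per-worker gradients (the role of the all-reduce step)."""
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr=0.01):
    grads = [grad_mse(w, shard) for shard in shards]  # each worker sees a different batch
    return w - lr * all_reduce_mean(grads)            # one synchronized weight update

# Data generated from y = 3x; two workers each receive half of the batch.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges to 3.0
```

Model parallelism differs in that the layers themselves, not the batch, are split across devices.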

Model Deployment and Serving

Vertex AI Endpoints: deploy model versions with traffic splits for A/B testing and canary rollouts. Dedicated endpoints (always-on) versus Serverless prediction (autoscaling to zero). Online prediction: low-latency, single-record requests. Batch prediction: high-throughput, asynchronous, for scoring large datasets. Model optimisation for serving: quantisation (FP32 to INT8 reduces model size and improves latency with some accuracy trade-off), distillation (train a smaller student model to mimic a larger teacher), TensorRT or ONNX Runtime for GPU inference optimisation. Feature latency: pre-compute slow features offline, serve fast features online from Memorystore. Explainability: Vertex Explainable AI provides feature attributions using SHAP (Shapley values) or Integrated Gradients. Required for regulated industries and for debugging unexpected model behaviour.
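The quantisation trade-off is easy to see with a toy symmetric INT8 quantiser (a heavy simplification of what TensorRT or TFLite actually do): weights map to integers in [-127, 127] via a scale factor, and dequantising recovers them only up to rounding error.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantisation (illustrative sketch)."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]  # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.12, -0.98, 0.45, 0.003, -0.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(max_err <= scale / 2)  # error is bounded by half a quantisation step
```

The error is small but nonzero per weight; accumulated across millions of weights it can shift predictions, which is why the exam expects you to benchmark a quantised model before promoting it.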

MLOps: Pipelines, Monitoring, and Governance

MLOps maturity levels: Level 0 (manual, notebook-driven), Level 1 (automated training pipeline, triggered by schedule or data drift), Level 2 (full CI/CD for ML — code changes trigger pipeline, evaluation gates before promotion). Model monitoring: Vertex AI Model Monitoring detects training-serving skew (difference between training data distribution and live prediction input distribution) and prediction drift (change in model output distribution over time). Alerts trigger retraining pipelines. Data freshness: stale feature data degrades model performance before accuracy metrics detect it. Governance: Dataplex for data cataloguing and lineage, BigQuery Authorized Views for column-level access control on training data, Vertex AI Model Cards for model documentation. Privacy-preserving ML: differential privacy (add calibrated noise during training), federated learning (train on device, aggregate model updates centrally — no raw data leaves the device).
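Prediction drift detection can be approximated with the Population Stability Index (PSI), a common drift score in monitoring systems: bucket the baseline and live distributions and sum the weighted log-ratios. The bucketing scheme and the conventional 0.2 alert threshold below are illustrative choices, not Vertex AI Model Monitoring's exact implementation.

```python
import math

def psi(expected, actual, buckets=4):
    """Population Stability Index between a baseline and a live sample."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / buckets for i in range(1, buckets)]

    def fractions(values):
        counts = [0] * buckets
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bucket index
        # small floor avoids log(0) for empty buckets
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # training-time outputs
stable   = [0.15, 0.35, 0.55, 0.75]                  # similar live outputs
drifted  = [0.7, 0.75, 0.8, 0.8]                     # outputs piled into one bucket

print(psi(baseline, stable) < 0.2)    # below the usual alert threshold
print(psi(baseline, drifted) >= 0.2)  # drift: a retraining alert would fire
```

The same score computed on input features instead of outputs detects skew against the training distribution rather than drift over time.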

Key exam facts — Professional Machine Learning Engineer

  • Feature Store eliminates training-serving skew by serving the same feature transformations online and offline
  • Batch prediction is for scoring large datasets offline; online prediction is for real-time, low-latency single predictions
  • Vertex AI Model Monitoring detects skew (training vs. serving distribution) and drift (serving distribution changes over time)
  • MLOps Level 2 means every code change to a pipeline triggers automated retraining and evaluation before deployment
  • Vertex AI Vizier uses Bayesian optimisation — it converges on good hyperparameters faster than random or grid search
  • Integrated Gradients and SHAP provide feature attributions — required for explainability in regulated industries

Common exam traps

High model accuracy guarantees the model solves the business problem correctly

High accuracy does not mean the model is correct for the business problem — check that ML metrics align with business outcomes

Training-serving skew and model drift are the same phenomenon

Training-serving skew is not the same as model drift — skew is a pipeline bug (different transformations); drift is a real-world data change

Quantisation always preserves full model accuracy

Quantisation reduces model size and latency but can reduce accuracy — always benchmark before deploying a quantised model

Practice this topic

Test yourself on Google PMLE

JT Exams routes you to questions in your exact weak areas — automatically, after every session.
