This section provides guides and references for using the Google Pub/Sub connector.
Supported Authentication Types:
  • GCP Credentials — Google Cloud service account authentication using a service account key file or Application Default Credentials.
Configure and schedule Google Pub/Sub metadata workflows from the OpenMetadata UI:

How to Run the Connector Externally

To run the Ingestion via the UI you’ll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment. If, instead, you want to manage your workflows externally on your preferred orchestrator, you can check the following docs to run the Ingestion Framework anywhere.

External Schedulers

Get more information about running the Ingestion Framework Externally

Requirements

The Google Cloud service account used for ingestion needs the following IAM permissions:

Metadata Ingestion

Permission | Purpose
pubsub.topics.list | List topics in the project
pubsub.subscriptions.list | List subscriptions (for dead letter detection and subscription metadata)
pubsub.subscriptions.get | Read individual subscription details

Schema Registry (when schemaRegistryEnabled is true)

Permission | Purpose
pubsub.schemas.list | List schemas in the Schema Registry
pubsub.schemas.get | Read schema definitions (Avro, Protocol Buffer)

The built-in GCP role roles/pubsub.viewer grants all of the above permissions and is the recommended role for OpenMetadata ingestion.
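
The permission lists above can be turned into a small pre-flight check. The following is a minimal sketch, not part of OpenMetadata: the function names are hypothetical, and it assumes you have already collected the set of permissions granted to the service account by some other means.

```python
from typing import Iterable, List

# Permissions copied from the two tables above.
METADATA_PERMISSIONS = (
    "pubsub.topics.list",
    "pubsub.subscriptions.list",
    "pubsub.subscriptions.get",
)
SCHEMA_REGISTRY_PERMISSIONS = (
    "pubsub.schemas.list",
    "pubsub.schemas.get",
)


def required_permissions(schema_registry_enabled: bool) -> List[str]:
    """Return the permissions needed for the chosen configuration."""
    required = list(METADATA_PERMISSIONS)
    if schema_registry_enabled:
        # Schema Registry permissions only matter when schemaRegistryEnabled is true.
        required.extend(SCHEMA_REGISTRY_PERMISSIONS)
    return required


def missing_permissions(granted: Iterable[str], schema_registry_enabled: bool) -> List[str]:
    """Return required permissions that are absent from `granted`, in table order."""
    granted_set = set(granted)
    return [p for p in required_permissions(schema_registry_enabled) if p not in granted_set]
```

For example, `missing_permissions(["pubsub.topics.list", "pubsub.subscriptions.list"], schema_registry_enabled=False)` reports that `pubsub.subscriptions.get` is still needed.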

Python Requirements

We support Python versions 3.9 through 3.11.
To run the Google Pub/Sub ingestion, install:
pip3 install "openmetadata-ingestion[pubsub]"
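
Before installing, it can help to confirm that the interpreter falls inside the supported range. This is a small illustrative sketch using only the standard library; the helper name `is_supported` is hypothetical and the bounds are taken from the range stated above.

```python
import sys
from typing import Tuple

# Supported range stated above: Python 3.9 through 3.11 (inclusive).
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True when (major, minor) lies within the supported range."""
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    label = "supported" if is_supported(current) else "outside the supported range"
    print(f"Python {current[0]}.{current[1]} is {label}")
```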

Metadata Ingestion

All connectors are defined as JSON Schemas, and the Google Pub/Sub connection schema describes the structure needed to create a connection. To create and run a Metadata Ingestion workflow, we write a YAML configuration that connects to the source, processes the entities if needed, and reaches the OpenMetadata server. The workflow itself is also modeled around a JSON Schema.

1. Define the YAML Config
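
As a rough illustration of the shape such a file might take, the sketch below renders a workflow YAML from a template. It follows the general source / sink / workflowConfig layout of OpenMetadata workflows, but the specific values are assumptions to verify against the connector's JSON Schema: the source type `pubsub`, the connection type `PubSub`, the `MessagingMetadata` source config, and every placeholder value. Only `schemaRegistryEnabled` is named in this page. The GCP credential fields are deliberately left out rather than guessed, so the result is not a complete configuration.

```python
from pathlib import Path

# Assumed layout: field names other than schemaRegistryEnabled are NOT confirmed
# by this page and should be checked against the connector's JSON Schema.
CONFIG_TEMPLATE = """\
source:
  type: pubsub
  serviceName: {service_name}
  serviceConnection:
    config:
      type: PubSub
      schemaRegistryEnabled: {schema_registry_enabled}
      # GCP credential fields are intentionally omitted here;
      # take them from the connector's JSON Schema.
  sourceConfig:
    config:
      type: MessagingMetadata
sink:
  type: metadata-rest
  config: {{}}
workflowConfig:
  openMetadataServerConfig:
    hostPort: {host_port}
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{jwt_token}"
"""


def render_config(service_name: str, host_port: str, jwt_token: str,
                  schema_registry_enabled: bool) -> str:
    """Fill the template with caller-supplied values and return YAML text."""
    return CONFIG_TEMPLATE.format(
        service_name=service_name,
        host_port=host_port,
        jwt_token=jwt_token,
        # YAML booleans are lowercase.
        schema_registry_enabled="true" if schema_registry_enabled else "false",
    )


def write_config(path: str, text: str) -> None:
    """Save the rendered YAML so it can be passed to the CLI."""
    Path(path).write_text(text, encoding="utf-8")
```

Keeping the template as plain text avoids a YAML library dependency; the trade-off is that values are inserted verbatim, so they must not contain characters that need YAML quoting.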

2. Run with the CLI

First, save the YAML file. Then, with all requirements installed, run:
metadata ingest -c <path-to-yaml>
Note that this recipe is the same for every connector. By updating the YAML configuration, you can extract metadata from different sources.
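
If you prefer to trigger the CLI from a script, for example from your own scheduler, the command shown above can be wrapped with the standard library. This is a minimal sketch: the function names are hypothetical, and it assumes the `metadata` executable is on the PATH of the process that calls it.

```python
import subprocess
from typing import List


def build_ingest_command(config_path: str) -> List[str]:
    """Build the argument list for `metadata ingest -c <path-to-yaml>`."""
    return ["metadata", "ingest", "-c", config_path]


def run_ingestion(config_path: str) -> int:
    """Run the ingestion CLI and return its exit code (0 means success)."""
    # check=False lets the caller decide how to react to a non-zero exit code.
    completed = subprocess.run(build_ingest_command(config_path), check=False)
    return completed.returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the configuration path contains spaces.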