Feast
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Descriptions | ✅ | Enabled by default | 
| Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion.remove_stale_metadata | 
| Schema Metadata | ✅ | Enabled by default | 
| Table-Level Lineage | ✅ | Enabled by default | 
This plugin extracts:
- Entities as MLPrimaryKey
- Fields as MLFeature
- Feature views and on-demand feature views as MLFeatureTable
- Batch and stream source details as Dataset
- Column types associated with each entity and feature
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: feast
  config:
    # Coordinates
    path: "/path/to/repository/"
    # Options
    environment: "PROD"
sink:
  # sink configs
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description | 
|---|---|
| path ✅ string | Path to Feast repository | 
| enable_owner_extraction boolean | If this is disabled, then we NEVER try to map owners. If this is enabled, then owner_mappings is REQUIRED to extract ownership. Default: False | 
| enable_tag_extraction boolean | If this is disabled, then we NEVER try to extract tags. Default: False | 
| environment string | Environment to use when constructing URNs. Default: PROD | 
| fs_yaml_file string | Path to the `feature_store.yaml` file used to configure the feature store | 
| owner_mappings array | Mapping of owner names to owner types | 
| owner_mappings.map map(str,string) | |
| stateful_ingestion StatefulIngestionConfig | Stateful Ingestion Config | 
| stateful_ingestion.enabled boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False. Default: False | 
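To illustrate the dotted notation used in the table, here is a minimal Python sketch that expands dotted field names into the nested structure a recipe's `config` block would contain. The helper name `expand_dotted` and the sample values are hypothetical and are not part of DataHub; the sketch only shows how a name like `stateful_ingestion.enabled` corresponds to an `enabled` key nested under `stateful_ingestion`.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dotted keys such as 'stateful_ingestion.enabled' into nested dicts."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate mappings on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Field names come from the table above; the values are illustrative only.
flat_config = {
    "path": "/path/to/repository/",
    "environment": "PROD",
    "stateful_ingestion.enabled": True,
}
nested_config = expand_dotted(flat_config)
print(nested_config)
```

In the YAML recipe, the nested result corresponds to writing `enabled: true` indented under a `stateful_ingestion:` key inside `config`.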
The JSONSchema for this configuration is inlined below.
{
  "title": "FeastRepositorySourceConfig",
  "description": "Base configuration class for stateful ingestion for source configs to inherit from.",
  "type": "object",
  "properties": {
    "stateful_ingestion": {
      "title": "Stateful Ingestion",
      "description": "Stateful Ingestion Config",
      "allOf": [
        {
          "$ref": "#/definitions/StatefulIngestionConfig"
        }
      ]
    },
    "path": {
      "title": "Path",
      "description": "Path to Feast repository",
      "type": "string"
    },
    "fs_yaml_file": {
      "title": "Fs Yaml File",
      "description": "Path to the `feature_store.yaml` file used to configure the feature store",
      "type": "string"
    },
    "environment": {
      "title": "Environment",
      "description": "Environment to use when constructing URNs",
      "default": "PROD",
      "type": "string"
    },
    "owner_mappings": {
      "title": "Owner Mappings",
      "description": "Mapping of owner names to owner types",
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": {
          "type": "string"
        }
      }
    },
    "enable_owner_extraction": {
      "title": "Enable Owner Extraction",
      "description": "If this is disabled, then we NEVER try to map owners. If this is enabled, then owner_mappings is REQUIRED to extract ownership.",
      "default": false,
      "type": "boolean"
    },
    "enable_tag_extraction": {
      "title": "Enable Tag Extraction",
      "description": "If this is disabled, then we NEVER try to extract tags.",
      "default": false,
      "type": "boolean"
    }
  },
  "required": [
    "path"
  ],
  "definitions": {
    "DynamicTypedStateProviderConfig": {
      "title": "DynamicTypedStateProviderConfig",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "description": "The type of the state provider to use. For DataHub use `datahub`",
          "type": "string"
        },
        "config": {
          "title": "Config",
          "description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
          "default": {},
          "type": "object"
        }
      },
      "required": [
        "type"
      ],
      "additionalProperties": false
    },
    "StatefulIngestionConfig": {
      "title": "StatefulIngestionConfig",
      "description": "Basic Stateful Ingestion Specific Configuration for any source.",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "default": false,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
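As a rough illustration of what the schema above requires, the following Python sketch checks a config dictionary against a hand-copied subset of it: the `required` list and the top-level property types. It also checks the rule stated in the `enable_owner_extraction` description, that `owner_mappings` is needed when owner extraction is enabled. The function `check_config` and the `SCHEMA_SUBSET` constant are hypothetical names for this example, not DataHub APIs, and the sketch is not a full JSON Schema validator (it ignores `stateful_ingestion` and nested definitions).

```python
from typing import Any, Dict, List

# Hand-copied subset of the schema above: required keys and top-level types.
SCHEMA_SUBSET = {
    "required": ["path"],
    "properties": {
        "path": "string",
        "fs_yaml_file": "string",
        "environment": "string",
        "owner_mappings": "array",
        "enable_owner_extraction": "boolean",
        "enable_tag_extraction": "boolean",
    },
}

# JSON Schema type names mapped to the Python types used to check them.
PYTHON_TYPES = {"string": str, "boolean": bool, "array": list}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors: List[str] = []
    for key in SCHEMA_SUBSET["required"]:
        if key not in config:
            errors.append(f"missing required field: {key}")
    for key, value in config.items():
        expected = SCHEMA_SUBSET["properties"].get(key)
        if expected is None:
            continue  # not covered by this sketch (e.g. stateful_ingestion)
        if not isinstance(value, PYTHON_TYPES[expected]):
            errors.append(f"{key}: expected {expected}")
    # Rule taken from the enable_owner_extraction description, not from the schema types.
    if config.get("enable_owner_extraction") and not config.get("owner_mappings"):
        errors.append("enable_owner_extraction is set but owner_mappings is empty")
    return errors


print(check_config({"path": "/path/to/repository/", "environment": "PROD"}))
print(check_config({"environment": "PROD"}))
```

A check like this can catch a missing `path` before running ingestion, but the ingestion framework's own validation remains the authority on what is accepted.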
Code Coordinates
- Class Name: datahub.ingestion.source.feast.FeastRepositorySource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Feast, feel free to ping us on our Slack.
