Salesforce
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Data Profiling | ✅ | Only table-level profiling is supported, via the profiling.enabled config field | 
| Detect Deleted Entities | ❌ | Not supported yet | 
| Domains | ✅ | Supported via the domain config field | 
| Extract Tags | ✅ | Enabled by default | 
| Platform Instance | ✅ | Can be mapped to a Salesforce organization | 
| Schema Metadata | ✅ | Enabled by default | 
Prerequisites
To ingest metadata from Salesforce, you need one of the following:
- Salesforce username, password, and security token
- Salesforce username, consumer key, and private key for JSON Web Token (JWT) access
- Salesforce instance URL and access token/session ID (suitable only for one-off ingestion, as the access token typically expires after 2 hours of inactivity)
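Each credential option corresponds to a value of the auth config field (USERNAME_PASSWORD, JSON_WEB_TOKEN, or DIRECT_ACCESS_TOKEN). As a minimal sketch, assuming the field names from the configuration table, a recipe's config block could be pre-checked for the credentials its auth type needs. The helper `missing_credentials` and its field mapping are hypothetical and follow the list above, not the connector's internal validation.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from each auth value (as listed in the config schema)
# to the config fields the prerequisites list says it needs.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "USERNAME_PASSWORD": ("username", "password", "security_token"),
    "JSON_WEB_TOKEN": ("username", "consumer_key", "private_key"),
    "DIRECT_ACCESS_TOKEN": ("instance_url", "access_token"),
}


def missing_credentials(config: dict) -> List[str]:
    """Return the credential fields that are absent or empty for the config's auth type."""
    auth = config.get("auth", "USERNAME_PASSWORD")  # schema default
    if auth not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth type: {auth!r}")
    return [name for name in REQUIRED_FIELDS[auth] if not config.get(name)]


if __name__ == "__main__":
    jwt_config = {
        "auth": "JSON_WEB_TOKEN",
        "username": "user@company",
        "consumer_key": "my_consumer_key",
    }
    print(missing_credentials(jwt_config))  # ['private_key']
```

Keeping the mapping in one dictionary makes it easy to compare against the prerequisites list; a direct-access-token config, for example, would only be flagged for a missing instance_url or access_token.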
The account used to access Salesforce requires the following permissions for this integration to work:
- View Setup and Configuration
- View All Data
Integration Details
This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance. The Python library simple-salesforce is used to authenticate and to call the Salesforce REST API to retrieve these details from the Salesforce instance.
REST API Resources used in this integration
- Versions
- Tooling API Query on objects EntityDefinition, EntityParticle, CustomObject, CustomField
- Record Count
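In the connector these calls go through simple-salesforce. To make the three resources concrete, the sketch below only builds their request paths relative to the instance URL, using the standard Salesforce REST path layout. The API version and the SOQL text are illustrative assumptions; the exact queries the connector issues are not listed here.

```python
from typing import List
from urllib.parse import urlencode

API_VERSION = "59.0"  # example value, as in the api_version option


def versions_path() -> str:
    # Versions resource: lists the REST API versions available on the instance.
    return "/services/data/"


def tooling_query_path(soql: str, version: str = API_VERSION) -> str:
    # Tooling API query resource; the SOQL statement is URL-encoded into `q`.
    return f"/services/data/v{version}/tooling/query/?{urlencode({'q': soql})}"


def record_count_path(objects: List[str], version: str = API_VERSION) -> str:
    # Record Count resource: approximate record counts for the listed objects.
    query = urlencode({"sObjects": ",".join(objects)})
    return f"/services/data/v{version}/limits/recordCount?{query}"


if __name__ == "__main__":
    # Illustrative query only; not necessarily what the connector sends.
    print(tooling_query_path("SELECT QualifiedApiName, Label FROM EntityDefinition"))
    print(record_count_path(["Account", "Lead"]))
```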
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
| Source Concept | DataHub Concept | Notes | 
|---|---|---|
| Salesforce | Data Platform | |
| Standard Object | Dataset | subtype "Standard Object" | 
| Custom Object | Dataset | subtype "Custom Object" | 
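As a sketch of this mapping, assuming that custom objects are recognized by the __c suffix Salesforce appends to custom object API names, and that DataHub's usual dataset URN layout applies (the platform instance, when set, is prefixed to the object name with a dot), an object name could be turned into a subtype and a dataset URN as follows. The connector may derive the subtype from Tooling API metadata instead; both helpers and the object name Invoice__c are hypothetical.

```python
from typing import Optional


def object_subtype(api_name: str) -> str:
    # Assumption: custom objects carry the "__c" API-name suffix.
    return "Custom Object" if api_name.endswith("__c") else "Standard Object"


def dataset_urn(api_name: str, platform_instance: Optional[str] = None, env: str = "PROD") -> str:
    # Assumed DataHub convention: "<platform_instance>.<name>" when an instance is set.
    name = f"{platform_instance}.{api_name}" if platform_instance else api_name
    return f"urn:li:dataset:(urn:li:dataPlatform:salesforce,{name},{env})"


if __name__ == "__main__":
    # "Invoice__c" is a hypothetical custom object.
    for obj in ["Account", "Invoice__c"]:
        print(object_subtype(obj), dataset_urn(obj, "mydomain-dev-ed"))
```

Using the organization name as platform_instance keeps URNs from different Salesforce organizations apart even when they share standard object names such as Account.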
Caveats
- This connector has only been tested with Salesforce Developer Edition.
- This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the Salesforce Record Count REST API.
- This integration does not support ingesting Salesforce External Objects.
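Because row counts come from the Record Count resource, they reflect whatever Salesforce reports there rather than an exact count at ingestion time. A minimal sketch of reading such a response, assuming the response shape {"sObjects": [{"count": ..., "name": ...}]}; the sample body uses illustrative numbers, not real data, and `approximate_row_counts` is a hypothetical helper.

```python
import json
from typing import Dict

# Illustrative response body; the numbers are made up.
SAMPLE_BODY = '{"sObjects": [{"count": 3, "name": "Account"}, {"count": 10, "name": "Lead"}]}'


def approximate_row_counts(body: str) -> Dict[str, int]:
    """Map each object name in a Record Count response to its reported count."""
    payload = json.loads(body)
    # Objects absent from the response simply do not appear in the result.
    return {item["name"]: item["count"] for item in payload.get("sObjects", [])}


if __name__ == "__main__":
    print(approximate_row_counts(SAMPLE_BODY))  # {'Account': 3, 'Lead': 10}
```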
CLI-based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"
    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
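To see which objects this recipe ingests and which of them land in the sales domain, here is a sketch of allow/deny regex evaluation. It assumes DataHub's allow/deny semantics: a name is rejected if any deny pattern matches, otherwise kept if any allow pattern matches, with patterns matched from the start of the name and case-insensitive by default (ignoreCase: true). The first-match rule for domain keys is also an assumption, and both helper functions are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional


def allowed(name: str, allow: Iterable[str] = (".*",), deny: Iterable[str] = (), ignore_case: bool = True) -> bool:
    # Assumption: deny is checked first, and patterns match from the start
    # of the name (re.match), so "Lead$" matches "Lead" but not "LeadHistory".
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(p, name, flags) for p in deny):
        return False
    return any(re.match(p, name, flags) for p in allow)


def assign_domain(name: str, domains: Dict[str, dict]) -> Optional[str]:
    # Assumption: the first domain key whose patterns admit the object wins.
    for key, pattern in domains.items():
        if allowed(name, pattern.get("allow", [".*"]), pattern.get("deny", []), pattern.get("ignoreCase", True)):
            return key
    return None


# Patterns copied from the starter recipe.
OBJECT_ALLOW = ["Account$", "Opportunity$", "Lead$"]
DOMAINS = {"sales": {"allow": ["Opportunity$", "Lead$"]}}

if __name__ == "__main__":
    for obj in ["Account", "Opportunity", "OpportunityLineItem", "Lead"]:
        if allowed(obj, OBJECT_ALLOW):
            print(obj, "->", assign_domain(obj, DOMAINS))
        else:
            print(obj, "-> filtered out by object_pattern")
```

The trailing $ in each recipe pattern is what keeps related objects such as OpportunityLineItem out: without it, "Opportunity" would also match any longer name that starts with it.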
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
| Field | Description | 
|---|---|
| access_token string | Access token for the instance URL | 
| api_version string | If specified, overrides the default API version used by the Salesforce package. Example value: '59.0' | 
| auth Enum | One of USERNAME_PASSWORD, DIRECT_ACCESS_TOKEN, or JSON_WEB_TOKEN. Default: USERNAME_PASSWORD | 
| consumer_key string | Consumer key for Salesforce JSON web token access | 
| ingest_tags boolean | Ingest tags from source. This will override tags entered from the UI. Default: False | 
| instance_url string | Salesforce instance URL, e.g. https://MyDomainName.my.salesforce.com | 
| is_sandbox boolean | Connect to a sandbox instance of your Salesforce organization. Default: False | 
| password string | Password for Salesforce user | 
| platform string | Default: salesforce | 
| platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details. | 
| private_key string | Private key as a string for Salesforce JSON web token access | 
| security_token string | Security token for Salesforce username | 
| use_referenced_entities_as_upstreams boolean | (Experimental) If enabled, referenced entities will be treated as upstream entities. Default: False | 
| username string | Salesforce username | 
| env string | The environment that all assets produced by this connector belong to. Default: PROD | 
| domain map(str,AllowDenyPattern) | Map from a domain key (any string, such as "sales") to the allow/deny regex patterns that select objects for that domain. Multiple domain keys can be specified. | 
| domain.key.allow array | List of regex patterns to include in ingestion. Default: ['.*'] | 
| domain.key.allow.string string | |
| domain.key.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| domain.key.deny array | List of regex patterns to exclude from ingestion. Default: [] | 
| domain.key.deny.string string | |
| object_pattern AllowDenyPattern | Regex patterns for Salesforce objects to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} | 
| object_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| object_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] | 
| object_pattern.allow.string string | |
| object_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] | 
| object_pattern.deny.string string | |
| profile_pattern AllowDenyPattern | Regex patterns for objects to profile during ingestion. Only objects allowed by object_pattern are considered. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} | 
| profile_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| profile_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] | 
| profile_pattern.allow.string string | |
| profile_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] | 
| profile_pattern.deny.string string | |
| profiling SalesforceProfilingConfig | Default: {'enabled': False, 'operation_config': {'lower_fre... | 
| profiling.enabled boolean | Whether profiling should be done. Only table-level profiling is supported at this stage. Default: False | 
| profiling.operation_config OperationConfig | Experimental feature. Specifies operation configs. | 
| profiling.operation_config.lower_freq_profile_enabled boolean | Whether to run profiling at a lower frequency. This does not schedule anything; it only adds checks that determine when not to run profiling. Default: False | 
| profiling.operation_config.profile_date_of_month integer | Number between 1 and 31 (inclusive) for the date of the month. If not specified, defaults to None and this field has no effect. | 
| profiling.operation_config.profile_day_of_week integer | Number between 0 and 6 (inclusive) for the day of the week, where 0 is Monday and 6 is Sunday. If not specified, defaults to None and this field has no effect. | 
| stateful_ingestion StatefulIngestionConfig | Stateful Ingestion Config | 
| stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. | 
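The operation_config fields do not schedule anything; they only decide whether a given run may profile. A minimal sketch of that check, assuming the semantics in the descriptions above: the day-of-week and date-of-month checks apply only when lower_freq_profile_enabled is true, and 0 means Monday, as in Python's date.weekday(). The function name `profiling_allowed_on` is hypothetical.

```python
import datetime
from typing import Optional


def profiling_allowed_on(
    today: datetime.date,
    lower_freq_profile_enabled: bool = False,
    profile_day_of_week: Optional[int] = None,
    profile_date_of_month: Optional[int] = None,
) -> bool:
    """Return True if a run on `today` may profile under the given operation_config."""
    if not lower_freq_profile_enabled:
        return True  # no extra checks unless lower-frequency profiling is on
    # date.weekday() uses the same convention as the config: 0 = Monday, 6 = Sunday.
    if profile_day_of_week is not None and today.weekday() != profile_day_of_week:
        return False
    if profile_date_of_month is not None and today.day != profile_date_of_month:
        return False
    return True


if __name__ == "__main__":
    monday = datetime.date(2024, 1, 1)  # 2024-01-01 was a Monday
    print(profiling_allowed_on(monday, True, profile_day_of_week=0))  # True
    print(profiling_allowed_on(monday, True, profile_day_of_week=6))  # False
```

Because the check only skips runs, a daily schedule combined with profile_day_of_week: 0 would profile once a week, on Mondays.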
The JSONSchema for this configuration is inlined below.
{
  "title": "SalesforceConfig",
  "description": "Base configuration class for stateful ingestion for source configs to inherit from.",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "platform_instance": {
      "title": "Platform Instance",
      "description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://docs.datahub.com/docs/platform-instances/ for more details.",
      "type": "string"
    },
    "stateful_ingestion": {
      "title": "Stateful Ingestion",
      "description": "Stateful Ingestion Config",
      "allOf": [
        {
          "$ref": "#/definitions/StatefulIngestionConfig"
        }
      ]
    },
    "platform": {
      "title": "Platform",
      "default": "salesforce",
      "type": "string"
    },
    "auth": {
      "default": "USERNAME_PASSWORD",
      "allOf": [
        {
          "$ref": "#/definitions/SalesforceAuthType"
        }
      ]
    },
    "username": {
      "title": "Username",
      "description": "Salesforce username",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Password for Salesforce user",
      "type": "string"
    },
    "consumer_key": {
      "title": "Consumer Key",
      "description": "Consumer key for Salesforce JSON web token access",
      "type": "string"
    },
    "private_key": {
      "title": "Private Key",
      "description": "Private key as a string for Salesforce JSON web token access",
      "type": "string"
    },
    "security_token": {
      "title": "Security Token",
      "description": "Security token for Salesforce username",
      "type": "string"
    },
    "instance_url": {
      "title": "Instance Url",
      "description": "Salesforce instance url. e.g. https://MyDomainName.my.salesforce.com",
      "type": "string"
    },
    "is_sandbox": {
      "title": "Is Sandbox",
      "description": "Connect to Sandbox instance of your Salesforce",
      "default": false,
      "type": "boolean"
    },
    "access_token": {
      "title": "Access Token",
      "description": "Access token for instance url",
      "type": "string"
    },
    "ingest_tags": {
      "title": "Ingest Tags",
      "description": "Ingest Tags from source. This will override Tags entered from UI",
      "default": false,
      "type": "boolean"
    },
    "object_pattern": {
      "title": "Object Pattern",
      "description": "Regex patterns for Salesforce objects to filter in ingestion.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "domain": {
      "title": "Domain",
      "description": "Regex patterns for tables/schemas to describe domain_key domain key (domain_key can be any string like \"sales\".) There can be multiple domain keys specified.",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/AllowDenyPattern"
      }
    },
    "api_version": {
      "title": "Api Version",
      "description": "If specified, overrides default version used by the Salesforce package. Example value: '59.0'",
      "type": "string"
    },
    "profiling": {
      "title": "Profiling",
      "default": {
        "enabled": false,
        "operation_config": {
          "lower_freq_profile_enabled": false,
          "profile_day_of_week": null,
          "profile_date_of_month": null
        }
      },
      "allOf": [
        {
          "$ref": "#/definitions/SalesforceProfilingConfig"
        }
      ]
    },
    "profile_pattern": {
      "title": "Profile Pattern",
      "description": "Regex patterns for profiles to filter in ingestion, allowed by the `object_pattern`.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "use_referenced_entities_as_upstreams": {
      "title": "Use Referenced Entities As Upstreams",
      "description": "(Experimental) If enabled, referenced entities will be treated as upstream entities.",
      "default": false,
      "type": "boolean"
    }
  },
  "additionalProperties": false,
  "definitions": {
    "DynamicTypedStateProviderConfig": {
      "title": "DynamicTypedStateProviderConfig",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "description": "The type of the state provider to use. For DataHub use `datahub`",
          "type": "string"
        },
        "config": {
          "title": "Config",
          "description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
          "default": {},
          "type": "object"
        }
      },
      "required": [
        "type"
      ],
      "additionalProperties": false
    },
    "StatefulIngestionConfig": {
      "title": "StatefulIngestionConfig",
      "description": "Basic Stateful Ingestion Specific Configuration for any source.",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "default": false,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    },
    "SalesforceAuthType": {
      "title": "SalesforceAuthType",
      "description": "An enumeration.",
      "enum": [
        "USERNAME_PASSWORD",
        "DIRECT_ACCESS_TOKEN",
        "JSON_WEB_TOKEN"
      ]
    },
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    },
    "OperationConfig": {
      "title": "OperationConfig",
      "type": "object",
      "properties": {
        "lower_freq_profile_enabled": {
          "title": "Lower Freq Profile Enabled",
          "description": "Whether to do profiling at lower freq or not. This does not do any scheduling just adds additional checks to when not to run profiling.",
          "default": false,
          "type": "boolean"
        },
        "profile_day_of_week": {
          "title": "Profile Day Of Week",
          "description": "Number between 0 to 6 for day of week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take effect.",
          "type": "integer"
        },
        "profile_date_of_month": {
          "title": "Profile Date Of Month",
          "description": "Number between 1 to 31 for date of month (both inclusive). If not specified, defaults to Nothing and this field does not take effect.",
          "type": "integer"
        }
      },
      "additionalProperties": false
    },
    "SalesforceProfilingConfig": {
      "title": "SalesforceProfilingConfig",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether profiling should be done. Supports only table-level profiling at this stage",
          "default": false,
          "type": "boolean"
        },
        "operation_config": {
          "title": "Operation Config",
          "description": "Experimental feature. To specify operation configs.",
          "allOf": [
            {
              "$ref": "#/definitions/OperationConfig"
            }
          ]
        }
      },
      "additionalProperties": false
    }
  }
}
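Because "additionalProperties" is false at the top level, a misspelled field name fails validation rather than being silently ignored. As a stdlib-only sketch (not the connector's own validation), a recipe's config block could be pre-checked against the top-level property names and the SalesforceAuthType values listed in this schema; `config_errors` is a hypothetical helper.

```python
from typing import List

# Top-level property names copied from the "properties" section of the schema.
ALLOWED_KEYS = {
    "env", "platform_instance", "stateful_ingestion", "platform", "auth",
    "username", "password", "consumer_key", "private_key", "security_token",
    "instance_url", "is_sandbox", "access_token", "ingest_tags",
    "object_pattern", "domain", "api_version", "profiling",
    "profile_pattern", "use_referenced_entities_as_upstreams",
}
# Values copied from the SalesforceAuthType enum.
AUTH_TYPES = {"USERNAME_PASSWORD", "DIRECT_ACCESS_TOKEN", "JSON_WEB_TOKEN"}


def config_errors(config: dict) -> List[str]:
    """Report unknown top-level fields and an invalid auth value."""
    errors = [f"unknown field: {key}" for key in sorted(config) if key not in ALLOWED_KEYS]
    auth = config.get("auth", "USERNAME_PASSWORD")  # schema default
    if auth not in AUTH_TYPES:
        errors.append(f"invalid auth: {auth}")
    return errors


if __name__ == "__main__":
    # "objects_pattern" is a deliberate typo for object_pattern.
    print(config_errors({"username": "user@company", "objects_pattern": {}}))
```

For full validation, including the nested definitions, the third-party jsonschema package's jsonschema.validate(instance=config, schema=schema) could be run against this schema directly.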
Code Coordinates
- Class Name: datahub.ingestion.source.salesforce.SalesforceSource
Questions
If you've got any questions on configuring ingestion for Salesforce, feel free to ping us on our Slack.