Add Provider authentication for DataSources (#5899)
* Initial draft of datasource auth, from prompt:
> Modify the DataSourcesService to accept a `Provider` argument via the datasource Options field, and pass the provider to the data source constructor in BuildFromProtobuf.
>
> The feature should be controlled by two components: a flag called "authenticated_datasources" and a field on the DataSource protocol buffer message called `provider_auth` in the RestDataSource definition.
* Adjust rest datasource to use provider methods when available
* Add tests for datasources; improve coverage to 90%
* Address comments / lint errors

  Adds documentation.
* Store additional datasource metadata in database
docs/docs/understand/data_sources.md: 42 additions & 13 deletions
…While providers in Minder typically create or manage entities (e.g., repositories):

- They do **not** create entities. Data sources only enhance an entity already known to Minder.
- They can reference external services—for instance, pulling in vulnerability data from OSV or ClearlyDefined or a malware scanning service.
- They have arguments that help shape the queries or requests the data source makes against external systems (e.g., specifying the package name, ecosystem, or version).
- They can leverage the authentication from the current Provider to fetch additional authenticated data after the initial ingestion.

---

### Why Would You Use a *data source*?

You would create a data source in Minder whenever you need additional information about an entity that was not included in the initial ingest. Common scenarios include:

- **Followup queries**: In some cases, it may be necessary to fetch additional information to evaluate the state of the entity based on data from the initial ingestion. (For example, checking whether a workflow action has been passing after determining the relevant action.)
- **Enriching dependencies**: If a provider ingests a list of dependencies from a repository, a data source can query a vulnerability database (like OSV or ClearlyDefined) to see if any are known to be risky *from a security or licensing point of view*.
- **Performing security checks**: A data source might call out to a malware scanner or an external REST service to verify the integrity of binaries or tarballs.
- **Fetching attestation data**: If you need statements of provenance or supply-chain attestations from a separate system, a data source can gather this data for your entity.
- **Aggregating metadata from multiple sources**: For instance, combining ClearlyDefined’s scoring data with an internal database that tracks maintainers, deprecation status, or license data.

Essentially, data sources let Minder orchestrate external queries that feed into policy evaluations (e.g., Rego constraints) to create richer compliance, security, or operational checks.
When you invoke a data source in a Rego policy, you typically provide a set of arguments. These arguments tell the data source *what* to fetch or *how* to fetch it.

For example, consider the two YAML snippets below:

…

- **version / type / name**: Defines this resource as a data source called `ghapi`.
- **context**: Typically holds the project context. Here it’s `{}`, meaning it’s globally available (or within your chosen project scope).
- **rest**: Declares REST-based operations. If `providerAuth` is set to `true`, the provider's authentication mechanism will be used if the method's endpoint matches the provider's URL. Under `def`, we define three endpoints:
  - `license` → Fetches repository license info from GitHub
  - `private_vuln_reporting` → Fetches whether the repository has private vulnerability reporting enabled
  - `graphql` → Performs a GraphQL query

Each method defined in the rest endpoints has the following fields:

- **endpoint**: An [RFC 6570](https://tools.ietf.org/html/rfc6570) template URI with the supplied arguments (see [Using a data source in a Rule](#using-a-data-source-in-a-rule)).
- **method**: The HTTP method to invoke. Defaults to `GET`.
- **headers**: A key-value map of static headers to add to the request.
- **bodyobj**: Specifies the request body as a static JSON object.
- **bodystr**: Specifies the request body as a static string.
- **body_from_field**: Specifies that the request body should be produced from the specified argument. Objects will be converted to their JSON representation, while strings will be used as the exact request body.
- **parse**: Indicates the response format (`json`). If unset, the result will be the body as a string.
- **input_schema**: Uses JSON Schema to define the parameters needed by this data source in Rego. If you specify `input_schema` incorrectly, you will receive an error at runtime, helping ensure that the data you pass in matches what the data source expects.

  *(Note: You can define additional properties as needed, but only fields explicitly handled by the data source code will be recognized.)*

- **expected_status**: Defines the expected response code. The default expected code is 200. If an unexpected response code is received, an error will be raised.
- **fallback**: If the request fails after 4 attempts and a fallback is defined, the specified **http_status** and **body** will be returned.
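Putting the fields above together, a data source definition could look something like the sketch below. This is an illustrative assembly of the described fields, not a verbatim snippet from the Minder docs: the `type` value, the placement of `providerAuth`, and the endpoint URL are assumptions and should be checked against real examples.

```yaml
# Hypothetical sketch assembled from the field descriptions above;
# the exact Minder schema may differ.
version: v1
type: data-source
name: ghapi
context: {}                  # empty: globally available (or project-scoped)
rest:
  providerAuth: true         # reuse the Provider's credentials when the
                             # endpoint matches the provider's URL
  def:
    license:
      # RFC 6570 template URI; {owner} and {repo} come from the arguments
      endpoint: https://api.github.com/repos/{owner}/{repo}/license
      method: GET            # default
      parse: json            # parse the response body as JSON
      expected_status: 200   # default; other codes raise an error
      input_schema:          # JSON Schema for the arguments passed from Rego
        type: object
        properties:
          owner:
            type: string
          repo:
            type: string
```

With a definition like this, a Rego rule would supply `owner` and `repo` as arguments, and the data source would substitute them into the endpoint template before issuing the request.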