Connecting APIs - Bulk data
In this tutorial we’ll see how to connect to an API source that has some bulk data we want to bring into PromptQL.
This is what we’ll do:
- We will set up a connector that has a DuckDB source
- We will set up a job to load data from our API source
We’re loading data into DuckDB for this example, but you could load data into any database that has a supported connector (e.g. PostgreSQL, MongoDB, ClickHouse). We’ll write the loading script in TypeScript, but how you choose to load data is completely up to you.
Loading data from an external API
Add the hasura/duckduckapi connector
Update the DDN CLI to the version used in this tutorial:
ddn update-cli --version v2.12.0-alpha.2
Once you’ve set up your project, add this connector:
ddn connector init github -i
> Hub connector hasura/duckduckapi (select hasura/duckduckapi from the list)
> Connector port XXXX (a random port will automatically be suggested)
Go to the connector directory and install dependencies:
cd app/connector/github
npm install
Initialize a table and sample data
Open the file app/connector/github/index.ts and define your DuckDB schema there:
// ...
const connectorConfig: duckduckapi = {
  dbSchema: `
    -- Create repositories table with commonly needed fields
    -- (this drops and recreates the table every time the connector starts)
    DROP TABLE IF EXISTS repositories;
    CREATE TABLE repositories (
      id INTEGER PRIMARY KEY,
      name VARCHAR NOT NULL,
      description TEXT
    );
    -- Sample data
    INSERT INTO repositories (id, name, description)
    VALUES (1, 'my-project', 'A sample repository');
  `,
  functionsFilePath: path.resolve(__dirname, './functions.ts'),
};
// ...
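When a loading script writes API data into this table, it helps to have a row type that mirrors the schema and a way to insert many rows per statement. The sketch below is a hypothetical helper, not part of the duckduckapi SDK: `Repository`, `sqlLiteral`, and `buildInsert` are illustrative names. It escapes strings by doubling single quotes (the standard SQL rule); if your DuckDB client supports prepared statements with bound parameters, prefer those over building SQL strings.

```typescript
// Hypothetical row type mirroring the `repositories` table defined above.
interface Repository {
  id: number;
  name: string;
  description: string | null; // description is nullable in the schema
}

// Renders a JS value as a SQL literal. Strings are single-quoted with any
// embedded single quotes doubled; null becomes NULL.
function sqlLiteral(value: string | number | null): string {
  if (value === null) return "NULL";
  if (typeof value === "number") {
    if (!Number.isFinite(value)) throw new Error(`Not a finite number: ${value}`);
    return String(value);
  }
  return `'${value.replace(/'/g, "''")}'`;
}

// Builds a single multi-row INSERT for a batch of rows, so a bulk load
// issues one statement per batch instead of one per row.
function buildInsert(rows: Repository[]): string {
  if (rows.length === 0) throw new Error("buildInsert needs at least one row");
  const values = rows
    .map((r) => `(${sqlLiteral(r.id)}, ${sqlLiteral(r.name)}, ${sqlLiteral(r.description)})`)
    .join(",\n  ");
  return `INSERT INTO repositories (id, name, description) VALUES\n  ${values};`;
}

// Example batch, e.g. rows mapped from an API response.
const sql = buildInsert([
  { id: 2, name: "it's-a-repo", description: null },
  { id: 3, name: "another", description: "Loaded from an API" },
]);
console.log(sql);
```

The statement it prints escapes the apostrophe in `it's-a-repo` as `it''s-a-repo` and writes the missing description as `NULL`.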
Add to project metadata
Once we create new entities in our sources, we need to get them into our project’s metadata. This allows the AI assistant to access that data via PromptQL.
# Grab the model definitions
ddn connector introspect github
# Check out what models are available to track. You'll see some sample ones which you can ignore for now.
ddn model list github
> ...
# Add the repositories model
ddn model add github repositories
# Build the supergraph metadata for your local environment
ddn supergraph build local
Then restart the Docker containers:
ddn run docker-start
Try PromptQL
Open up the console:
ddn console --local
Then head over to the PromptQL playground and check that you can access your repositories:
> What repositories do I have?
>>> Certainly! I'll fetch the information about the repositories you have. Let me query the database to get that information for you.
>>> Query Plan:
>>> 1. Query the app.Repositories table to retrieve the id, name, and description of all repositories.
>>> 2. Order the results by the repository name for easier readability.
>>> 3. If no repositories are found, print a message indicating this.
>>> 4. If repositories are found, store them in an artifact for display and print the number of repositories found.
>>> I've retrieved the information about your repositories. Here's what I found:
>>> You have one repository in your account. This repository is named "my-project" and is described as "A sample repository".
>>> Is there any specific information about this repository that you'd like to know more about? Or would you like to perform any actions related to this repository?
Set up a job to continuously load data
To load data continuously, kick off an async task from the DuckDuckAPI connector process. For illustration, the job below simply inserts a generated row every second.
Open app/connector/github/index.ts and add the following code so the job starts right after the connector does:
// import statements...
// schema initialization...
// Background job: every 1000 ms, insert a generated row into `repositories`.
async function insertData() {
  const db = await getDB();
  setInterval(async () => {
    try {
      const timestamp = new Date().toISOString();
      // The new id is one more than the current maximum (or 1 if the table is empty).
      await db.all(`
        INSERT INTO repositories (id, name, description)
        VALUES (
          (SELECT COALESCE(MAX(id), 0) + 1 FROM repositories),
          'project-${timestamp}',
          'Automatically inserted at ${timestamp}'
        )
      `);
      console.log(`Inserted new repository at ${timestamp}`);
    } catch (err) {
      // Log the error and keep the interval running; the next tick tries again.
      console.error('Error inserting data:', err);
    }
  }, 1000);
}
(async () => {
  const connector = await makeConnector(connectorConfig);
  start(connector);
  // Kick off the insert job once the connector is running
  insertData();
})();
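The job above uses `setInterval`, which does not wait for the previous async callback to finish, so once each run does real work (such as calling an API) runs can overlap. A minimal alternative, sketched under the assumption that you control the job loop yourself, is to wait a fixed delay after each run completes. `runPeriodically` is a hypothetical helper, not part of the duckduckapi SDK.

```typescript
// Runs `task` repeatedly, sleeping `intervalMs` after each run finishes,
// so a slow run never overlaps with the next one. Stops when `signal` aborts
// (checked before each run and after each run; an in-progress sleep completes).
async function runPeriodically(
  task: () => Promise<void>,
  intervalMs: number,
  signal: AbortSignal,
): Promise<void> {
  while (!signal.aborted) {
    try {
      await task();
    } catch (err) {
      // Log and keep going; one failed run should not stop the job.
      console.error("Job run failed:", err);
    }
    if (signal.aborted) break;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo: a task that aborts the job on its third run.
const controller = new AbortController();
let runs = 0;
const demo = runPeriodically(
  async () => {
    runs += 1;
    if (runs === 3) controller.abort(); // stop after the third run
  },
  10,
  controller.signal,
).then(() => runs);
demo.then((n) => console.log(`Job ran ${n} times`));
```

In the connector, you could call `runPeriodically(...)` in place of `insertData()`, passing a task that performs one insert (or one API sync batch) and a 1000 ms interval.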
A real-world example
The steps above help you get started: they show how to set up DuckDB, how to get a connection to it, and how to start inserting data that comes from another source.
In a production-ready connector, you’ll also need to:
- Connect to another API securely
- Incrementally pull in updates after the initial sync is done
- Handle API rate limits
- Persist data incrementally
- Recover from failures and process restarts
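Rate limits and transient failures are usually handled by retrying with backoff. The sketch below assumes a fetch wrapper (not shown) that throws a hypothetical `RetryableError` when the API answers with a rate-limit or temporary error status, optionally carrying a delay taken from a `Retry-After` header. `RetryableError` and `withRetry` are illustrative names, not part of any SDK.

```typescript
// Hypothetical error a fetch wrapper could throw for rate-limit or 5xx
// responses; retryAfterMs is the server's suggested wait, if it sent one.
class RetryableError extends Error {
  constructor(message: string, readonly retryAfterMs?: number) {
    super(message);
    this.name = "RetryableError";
    // Keeps `instanceof` working when compiling to older JS targets.
    Object.setPrototypeOf(this, RetryableError.prototype);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `fn`, retrying RetryableErrors with exponential backoff
// (baseDelayMs, 2x, 4x, ...) for up to `maxRetries` extra attempts.
// Any other error is rethrown immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!(err instanceof RetryableError) || attempt >= maxRetries) throw err;
      // Prefer the server's hint; otherwise back off exponentially.
      const delay = err.retryAfterMs ?? baseDelayMs * 2 ** attempt;
      console.warn(`Attempt ${attempt + 1} failed (${err.message}); retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
}

// Demo: a fake API call that is rate-limited twice, then succeeds.
let calls = 0;
const demo = withRetry(
  async () => {
    calls += 1;
    if (calls < 3) throw new RetryableError("429 Too Many Requests", 5);
    return ["repo-a", "repo-b"];
  },
  5,
  10,
);
demo.then((repos) => console.log(`Got ${repos.length} repos after ${calls} calls`));
```

Wrapping each page request of the sync in `withRetry` lets a long bulk load ride out temporary rate limiting instead of failing the whole job.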
Check out the code in the PromptQL GitHub example and start reading through app/connector/github/index.ts & … to see how to put together a real-world bulk-data-from-API connector!