DocsGuidesConnect APIs - Bulk data

Connecting APIs - Bulk data

In this tutorial we’ll see how to connect to an API source that has some bulk data we want to bring into PromptQL.

This is what we’ll do:

  • We will set up a connector that has a DuckDB source
  • We will set up a job to load data from our API source

We’re loading data into DuckDB for this example, but you could load data into any database that has a supported connector (eg: PostgreSQL, MongoDB, ClickHouse). We’re going to use TypeScript to write a loading script to load data - but how you choose to load data is completely up to you.

Loading data from an external API

Add the hasura/duckduckapi connector

Once you’ve setup your project, add this connector:

ddn connector init github -i
> Hub connector hasura/duckduckapi (select hasura/duckduckapi from the list)
> Connector port XXXX (a random port will automatically be suggested)

Go to the connector directory and install dependencies

cd app/connector/github
npm install

Initialize a table and sample data

Open the file app/connector/github/index.ts and define your DuckDB schema there:

// ...
 
const connectorConfig: duckduckapi = {
  dbSchema: `
 
    -- Create repositories table with commonly needed fields
    DROP TABLE IF EXISTS repositories;
    CREATE TABLE repositories (
        id INTEGER PRIMARY KEY,
        name VARCHAR NOT NULL,
        description TEXT,
    );
 
    -- Sample data
    INSERT INTO repositories (id, name, description)
    VALUES (1, 'my-project', 'A sample repository');
 
  `,
  functionsFilePath: path.resolve(__dirname, './functions.ts'),
};
 
// ...

Add to project metadata

Once we create new entities in our sources, we need to get them into our project’s metadata. This allows the AI assistant to access that data via PromptQL.

# Grab the model definitions
ddn connector introspect github
 
# Check out what models are available to track. You'll see some sample ones which you can ignore for now.
ddn model list github
> ...
 
# Add the repositories model
ddn model add github repositories
 
# Build the metadata and restart your local environment
ddn supergraph build local

Restart the docker container:

ddn run docker-start

Try PromptQL

Head over to PromptQL playground and check if you’re able to interact with your repositories.

Open up the console:

ddn console --local

Head over to the PromptQL playground and try accessing your repositories:

> What repositories do I have?

>>> Certainly! I'll fetch the information about the repositories you have. Let me query the database to get that information for you.

>>> Query Plan:
>>> 1. Query the app.Repositories table to retrieve the id, name, and description of all repositories.
>>> 2. Order the results by the repository name for easier readability.
>>> 3. If no repositories are found, print a message indicating this.
>>> 4. If repositories are found, store them in an artifact for display and print the number of repositories found.

>>> I've retrieved the information about your repositories. Here's what I found:

>>> You have one repository in your account. This repository is named "my-project" and is described as "A sample repository".

>>> Is there any specific information about this repository that you'd like to know more about? Or would you like to perform any actions related to this repository?

Setup a job to continuously load data

Adding a job to load data can be done by kicking off an async task from our DuckDuckAPI connector.

Head over to app/connector/github/index.ts and add the following code right after the connector starts:

// import statements...
// schema initialization...
 
async function insertData() {
  const db = await getDB();
 
  setInterval(async () => {
    try {
      const timestamp = new Date().toISOString();
      await db.all(`
        INSERT INTO repositories (id, name, description) 
        VALUES (
          (SELECT COALESCE(MAX(id), 0) + 1 FROM repositories),
          'project-${timestamp}',
          'Automatically inserted at ${timestamp}'
        )
      `);
      console.log(`Inserted new repository at ${timestamp}`);
    } catch (err) {
      console.error('Error inserting data:', err);
    }
  }, 1000);
}
 
(async () => {
  const connector = await makeConnector(connectorConfig);
  start(connector);
 
  // Kick off an insert data job
  insertData();
})();

A real world example

The steps above help you get started by understanding how to setup DuckDB, how to get a connection to it and how to start inserting data into it that comes from another source.

In a production ready example, you’ll need to:

  1. Connect to another API securely
  2. Incrementally pull in updates after the initial sync is done
  3. Handle API rate limits
  4. Persist data incrementally
  5. Recover from failures and process restarts

Check out the code at PromptQL Github example and starting reading through the code at app/connector/github/index.ts & to see how to put together a real world Bulk Data from API connector!