Data Delivery Setup Guide

Peach delivers your loan and portfolio data as Parquet files to Google Cloud Storage (GCS), from which you ingest into your preferred data platform. This guide walks you through configuring data delivery and connecting it to your environment.

Data delivery is provisioned as part of your Peach implementation. Your Customer Success contact will confirm your delivery configuration and access details.

What you get:

Data delivered as Parquet files (compressed with GZIP) to a Peach-managed GCS bucket: gs://peach-data-outbox/{client_name}/
Configurable delivery frequency: Daily (6 AM local), Twice Daily (6 AM and 6 PM local), or Hourly
Native compatibility with Snowflake, Databricks, BigQuery, AWS Athena, and most modern ETL tools

Data delivery pricing and frequency options are defined in your order form. Contact your Customer Success representative for details.

Setup Steps

Step	Action	Owner	Details
1	Confirm data delivery in your order form	Client + Peach CS	Select your delivery frequency (Daily, Twice Daily, or Hourly) and file format (Parquet).
2	Create service account and share with Peach	Client	Create a service account in your cloud environment (GCP, Azure, or AWS). Send the service account email address to your Peach contact. No keys or secrets are exchanged.
3	Peach grants access to GCS bucket	Peach	Peach adds your service account to the GCS bucket with read permissions. After this step, your data is accessible.
4	Configure ingestion pipeline	Client	Set up your ETL pipelines to pull data from the GCS bucket into your data warehouse. See platform-specific instructions below.
5	Validate data	Client	Confirm data completeness and accuracy in your environment before going live.

Which Instructions Apply to You?

The setup steps above are the same for all clients. The platform-specific instructions below depend on your data infrastructure. Navigate to the section that matches your data destination:

Databricks Delta
Azure Storage
Amazon S3 via Fivetran
Amazon S3 Push Delivery
BigQuery / Snowflake
DuckDB (Local Testing)

If your platform is not listed, contact your Peach technical representative.

Alternative Delivery Methods

In addition to the GCS-based delivery described in this guide, Peach also offers Snowflake-native data sharing:

Snowflake Reader Accounts: Access your data via a Peach-provided Snowflake Reader Account using Snowflake Secure Data Sharing. See Access Peach Data w/ Snowflake Reader Accounts for setup instructions.
Snowflake Direct Share: Peach shares your data directly to your existing Snowflake account via Snowflake Secure Data Sharing. See Access Peach Data w/ Snowflake Direct Share for setup instructions.

Databricks Delta

This section covers ingesting Parquet files from GCS into Databricks as Delta tables.

Prerequisites

GCP Project with Owner or Storage Admin permissions
Databricks workspace with cluster access
Databricks CLI installed
gcloud CLI installed

Install Required Tools

Install gcloud CLI: Follow Google's installation guide for your OS.
Authenticate gcloud:

gcloud auth login
gcloud auth application-default login

Install Databricks CLI: Follow Databricks CLI installation guide.
Authenticate Databricks CLI:

databricks configure --token
# Host: https://<YOUR-DATABRICKS-WORKSPACE-URL>
# Token: <YOUR-PAT>

Create GCP Service Account

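# Resolve the active GCP project (or set PROJECT_ID explicitly)
PROJECT_ID=$(gcloud config get-value project)

# Create the service account
gcloud iam service-accounts create databricks-sa \
  --display-name="Databricks GCS Import SA"

# Create and download a key for the service account.
# The key stays in your environment and is never sent to Peach.
gcloud iam service-accounts keys create ./gcs-key.json \
  --iam-account=databricks-sa@$PROJECT_ID.iam.gserviceaccount.com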

Hand-off Point: Send the service account email (databricks-sa@$PROJECT_ID.iam.gserviceaccount.com) to your Peach contact. Peach will grant read access to your GCS folder.

Store Credentials in Databricks Secret Manager

# Create a secret scope
databricks secrets create-scope gcs-creds

# Upload the service account key
databricks secrets put-secret gcs-creds service-account.json \
  --string-value "$(cat ./gcs-key.json)"

Important: All secrets must live in Databricks Secret Manager. Do not embed credentials in notebooks.

Create Python Ingestion Script

Save this as gcs_to_delta.py:

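from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

# A Python file run as a job task may not get the notebook globals,
# so create the Spark session and dbutils handle explicitly
# (harmless if they already exist).
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

# Load the JSON key from Secret Manager
key_json = dbutils.secrets.get("gcs-creds", "service-account.json")

# Write it to DBFS so Spark can pick it up.
# Note: dbfs:/tmp is shared across the workspace; restrict access or remove the file after the run.
dbutils.fs.put("dbfs:/tmp/gcs-key.json", key_json, True)

# Read from GCS (replace {client_name} with your folder name)
df = spark.read.parquet(
    "gs://peach-data-outbox/{client_name}/transactions"
)

# Preview the data (display() is only available in notebooks)
df.show(5)

# Write as a managed Delta table
df.write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("default.transactions")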
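
The script above loads a single table. If several tables are delivered, one way to avoid repeating the path is a small helper that maps each table name to its GCS source path and Delta target. This is a minimal sketch, not part of Peach's tooling: the function name, the `acme` client name, and the `loans` table are hypothetical placeholders, and the bucket and `default` schema are simply copied from this guide.

```python
BUCKET = "peach-data-outbox"  # bucket name taken from this guide


def table_locations(client_name, tables, schema="default"):
    """Map each table name to (GCS source path, Delta target table)."""
    client = client_name.strip("/")
    if not client:
        raise ValueError("client_name must not be empty")
    return {
        table: (f"gs://{BUCKET}/{client}/{table}", f"{schema}.{table}")
        for table in tables
    }


if __name__ == "__main__":
    # "acme" and "loans" are hypothetical; use your own folder and table names.
    locations = table_locations("acme", ["transactions", "loans"])
    for table, (source, target) in locations.items():
        print(f"{source} -> {target}")
```

Inside the ingestion script, each pair could then be passed to `spark.read.parquet(source)` and `.saveAsTable(target)` in a loop.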

Deploy Databricks Job

Upload the script:

databricks fs mkdirs dbfs:/FileStore/scripts
databricks fs cp ./gcs_to_delta.py dbfs:/FileStore/scripts/gcs_to_delta.py

Create job_spec.json:

{
  "name": "GCS_to_Delta_Import",
  "tasks": [
    {
      "task_key": "run_ingestion_script",
      "new_cluster": {
        "spark_version": "17.0.x-scala2.13",
        "node_type_id": "e2-standard-16",
        "num_workers": 1,
        "spark_conf": {
          "spark.hadoop.fs.gs.auth.service.account.enable": "true",
          "spark.hadoop.fs.gs.auth.service.account.json.keyfile": "/dbfs/tmp/gcs-key.json"
        }
      },
      "spark_python_task": {
        "python_file": "dbfs:/FileStore/scripts/gcs_to_delta.py"
      }
    }
  ],
  "max_concurrent_runs": 1,
  "format": "MULTI_TASK",
  "timeout_seconds": 3600
}
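
Because the runtime version and node type in job_spec.json depend on your workspace, it can help to generate the file rather than edit it by hand. The following is a minimal sketch using only the Python standard library; the function name and its defaults are illustrative and simply mirror the spec above.

```python
import json


def build_job_spec(script_path, keyfile_path,
                   spark_version="17.0.x-scala2.13",
                   node_type_id="e2-standard-16",
                   num_workers=1):
    """Return a dict with the same shape as the job_spec.json shown above."""
    return {
        "name": "GCS_to_Delta_Import",
        "tasks": [{
            "task_key": "run_ingestion_script",
            "new_cluster": {
                "spark_version": spark_version,
                "node_type_id": node_type_id,
                "num_workers": num_workers,
                "spark_conf": {
                    "spark.hadoop.fs.gs.auth.service.account.enable": "true",
                    "spark.hadoop.fs.gs.auth.service.account.json.keyfile": keyfile_path,
                },
            },
            "spark_python_task": {"python_file": script_path},
        }],
        "max_concurrent_runs": 1,
        "format": "MULTI_TASK",
        "timeout_seconds": 3600,
    }


if __name__ == "__main__":
    spec = build_job_spec(
        "dbfs:/FileStore/scripts/gcs_to_delta.py",
        "/dbfs/tmp/gcs-key.json",
    )
    # Redirect this output to job_spec.json if you want to use it as-is.
    print(json.dumps(spec, indent=2))
```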

Create and run the job:

databricks jobs create --json @job_spec.json
databricks jobs list
databricks jobs run-now <JOB_ID>

Verify

Run databricks jobs list-runs --job-id <JOB_ID> to check status
In your Databricks workspace, navigate to Data > Tables
Verify that default.transactions exists and contains data

Azure Storage

This section covers transferring data from GCS to Azure Blob Storage using keyless authentication.

Prerequisites

Azure subscription with Owner or Contributor access
Azure Portal access

Create Azure Storage Account

In the Azure Portal, search for Storage accounts
Click + Create
Configure:
- Resource group: Choose existing or create new
- Storage account name: Choose a globally unique name (e.g., peachdatainbox)
- Region: Select a region close to your workloads
Click Review + Create → Create

Create a Container

Navigate into your new Storage Account
Under Data storage, select Containers
Click + Container
Configure:
- Name: e.g., peach-data
- Public access level: Private (no anonymous access)
Click Create

Register an Application in Microsoft Entra ID

In Azure Portal, search for Microsoft Entra ID → App registrations
Click + New registration
Enter:
- Name: gcs-to-azure-ingest
- Leave Redirect URI blank
Click Register
Save the Application (client) ID and Directory (tenant) ID

Grant Storage Permissions to the App

Go to your Storage Account
Select Access control (IAM)
Click + Add → Add role assignment
Configure:
- Role: Storage Blob Data Contributor
- Scope: Apply at the container level (preferred)
- Assign access to: User, group, or service principal
- Select: Your app registration (gcs-to-azure-ingest)
Save

Configure Federated Identity Credential

Return to your App registration
Go to Certificates & secrets → Federated credentials
Click + Add credential
Configure:
- Federated credential scenario: Other issuer
- Issuer: https://accounts.google.com
- Type: Explicit subject identifier
- Value: 117789079584147916823 (Peach's Google Service Account identifier)
- Name: peach-data
- Audience: api://AzureADTokenExchange
Save

Hand-off Point

Send the following to your Peach contact:

Item	Example
Storage account name	peachdatainbox
Container name	peach-data
Tenant ID	a1b2c3d4-e5f6-7890-abcd-ef1234567890
Client ID	12345678-abcd-ef90-1234-567890abcdef

Peach will configure a job to export your data into your Azure Storage Container.
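
Before sending the hand-off values, a quick format check can catch copy-paste mistakes. The sketch below is an optional local helper, not something Peach requires: the function name is hypothetical, and the patterns encode Azure's published naming rules for storage accounts (3 to 24 lowercase letters and digits) and containers (3 to 63 lowercase letters, digits, and single hyphens). It only checks that values look well-formed, not that the resources exist.

```python
import re
import uuid

STORAGE_ACCOUNT_RE = re.compile(r"^[a-z0-9]{3,24}$")
CONTAINER_RE = re.compile(r"^(?=.{3,63}$)[a-z0-9]+(?:-[a-z0-9]+)*$")


def _is_uuid(value):
    """True if the value parses as a UUID (tenant and client IDs are UUIDs)."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def check_handoff(storage_account, container, tenant_id, client_id):
    """Return a list of problems; an empty list means the values look well-formed."""
    problems = []
    if not STORAGE_ACCOUNT_RE.match(storage_account):
        problems.append("storage account name must be 3-24 lowercase letters or digits")
    if not CONTAINER_RE.match(container):
        problems.append("container name must be 3-63 lowercase letters, digits, or single hyphens")
    if not _is_uuid(tenant_id):
        problems.append("tenant ID is not a UUID")
    if not _is_uuid(client_id):
        problems.append("client ID is not a UUID")
    return problems


if __name__ == "__main__":
    # Example values copied from the hand-off table above.
    print(check_handoff(
        "peachdatainbox",
        "peach-data",
        "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "12345678-abcd-ef90-1234-567890abcdef",
    ))
```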

Amazon S3 via Fivetran

This section covers using Fivetran to pull Parquet files from GCS into your Amazon S3 bucket.

Prerequisites

Fivetran account
Amazon S3 bucket owned by your organization
IAM roles and bucket policies configured

Set Up S3 Destination in Fivetran

Log in to Fivetran
Set up an Amazon S3 Data Lake destination following Fivetran's S3 destination guide
Configure IAM roles, encryption, and bucket policies as required

Create GCS Files Connector in Fivetran

Navigate to Connectors > Add Connector > Files > Google Cloud Storage
Configure:
- Bucket Name: (Provided by Peach)
- Folder Path: (Agreed upon with Peach)
- File Type: parquet
- Compression: gzip
After entering the bucket name, Fivetran displays a unique service account email: fivetran-connector-XXXXXX@fivetran-gcs-prod.iam.gserviceaccount.com

Hand-off Point

Send the Fivetran service account email to your Peach contact.

Note: Do not download or exchange any keys — this is a keyless setup.

Peach will grant the Fivetran service account roles/storage.objectViewer (read-only) access to your GCS folder.

Complete Connector Setup

After Peach confirms access is granted, return to Fivetran
Click Test Connection to verify connectivity
Set sync frequency to match your delivery cadence (e.g., every 15 minutes for hourly files)
Save and run the connector

Verify

Monitor the connector's status dashboard in Fivetran
Confirm rows processed and file ingestion success
Validate data appears in your S3 bucket as expected

Amazon S3 Push Delivery

This section covers configuring your AWS environment so that Peach can push data files directly into your Amazon S3 bucket. Unlike the Fivetran option above, this is a direct delivery — Peach writes files to your S3 bucket using OIDC federation. No static credentials are exchanged.

Peach assumes an IAM role in your AWS account using Web Identity Federation. Peach obtains a short-lived token and exchanges it for temporary AWS credentials via sts:AssumeRoleWithWebIdentity.

What Peach Provides

Before you begin, Peach will share the following:

Item	Description
GCP Service Account Email	`s3-file-delivery-sa@<project>.iam.gserviceaccount.com`
GCP Service Account Unique ID	A numeric identifier used in the IAM trust policy (not a secret)

File Organization

Files are delivered as compressed Parquet under a prefix you choose. The path structure mirrors the source layout:

s3://<your-bucket>/<prefix>/table_name/year=YYYY/month=MM/day=DD/hour=HH/batch=ID/file.parquet

For example, if you configure the prefix vendor/peach/, a delivery might look like:

s3://your-data-lake/vendor/peach/transactions/year=2026/month=04/day=03/hour=12/batch=202604031200/0000.parquet

Peach only writes objects under the prefix you designate. No reads, deletes, or writes outside that prefix are required.
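
If downstream jobs need the partition values, the object key can be split with a regular expression. This is a minimal sketch that assumes keys follow the layout shown above exactly; the function name and the returned field names are illustrative.

```python
import re

KEY_RE = re.compile(
    r"^(?P<prefix>(?:.*/)?)(?P<table>[^/]+)"
    r"/year=(?P<year>\d{4})/month=(?P<month>\d{2})/day=(?P<day>\d{2})"
    r"/hour=(?P<hour>\d{2})/batch=(?P<batch>[^/]+)/(?P<file>[^/]+\.parquet)$"
)


def parse_delivery_key(key):
    """Split an object key (without the s3://bucket/ part) into its components."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"key does not follow the delivery layout: {key}")
    parts = match.groupdict()
    # Convert the date and hour partitions to integers for easier filtering.
    for field in ("year", "month", "day", "hour"):
        parts[field] = int(parts[field])
    return parts


if __name__ == "__main__":
    # Key taken from the example delivery path above.
    example = ("vendor/peach/transactions/year=2026/month=04/day=03/"
               "hour=12/batch=202604031200/0000.parquet")
    print(parse_delivery_key(example))
```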

Step 1: Create an S3 Bucket (or Use an Existing One)

If you do not already have a destination bucket:

In the AWS Console, navigate to S3 > Create bucket.
Choose a bucket name (e.g., your-company-data-lake) and region.
Under Object Ownership, select Bucket owner enforced (recommended — this disables ACLs and ensures all objects are owned by your account).
Leave other settings as default or adjust to your requirements.
Click Create bucket.

Step 2: Create an OIDC Identity Provider (One-Time per AWS Account)

Already have Google as a provider? In IAM > Identity providers, look for accounts.google.com. If it exists, verify that both sts.amazonaws.com and the service account unique ID (provided by Peach) are listed in its audiences. If so, skip to Step 3. If either audience is missing, click on the provider and select Add audience.

If you need to create the provider:

Via AWS Console:

Navigate to IAM > Identity providers > Add provider.
Select OpenID Connect.
Enter:
- Provider URL: https://accounts.google.com
- Audience: sts.amazonaws.com
Click Add provider.
Click on the newly created provider and select Add audience. Add the service account unique ID provided by Peach.

Via AWS CLI:

aws iam create-open-id-connect-provider \
  --url https://accounts.google.com \
  --client-id-list sts.amazonaws.com <SA_UNIQUE_ID> \
  --thumbprint-list 0000000000000000000000000000000000000000

Note: AWS no longer validates thumbprints for well-known OIDC providers like Google — it uses its own trusted CA library instead. The CLI requires the --thumbprint-list parameter syntactically, but the value is not used for validation. The console handles this automatically.

Step 3: Create an IAM Role

Via AWS Console:

Navigate to IAM > Roles > Create role.
Select Web identity as the trusted entity type.
Choose the identity provider accounts.google.com and audience sts.amazonaws.com.
Name the role (e.g., PeachDataDelivery).
Complete the wizard.

Important: The console wizard generates a trust policy with an aud condition but no sub condition. After the role is created, you must replace the entire trust policy with the JSON below to restrict access to Peach's specific service account.

Go to the role > Trust relationships > Edit trust policy. Replace the entire policy with the following, substituting <YOUR_ACCOUNT_ID> with your AWS account ID and <SA_UNIQUE_ID> with the numeric ID Peach provides:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<YOUR_ACCOUNT_ID>:oidc-provider/accounts.google.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:sub": "<SA_UNIQUE_ID>"
        }
      }
    }
  ]
}

Via AWS CLI:

Save the trust policy JSON above to a file called trust-policy.json (replacing the placeholders), then run:

aws iam create-role \
  --role-name PeachDataDelivery \
  --assume-role-policy-document file://trust-policy.json

Important: The sub condition ensures that only Peach's specific service account can assume this role. Do not remove it. Audience validation is handled automatically by AWS via the OIDC provider's registered client ID list.
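
To avoid placeholder typos, the trust policy can also be generated. The sketch below is one possible approach using only the standard library; the function name is hypothetical, the account ID is the example value used elsewhere in this guide, and the service account ID shown is a made-up placeholder, not Peach's real identifier.

```python
import json


def build_trust_policy(account_id, sa_unique_id):
    """Return the trust policy from Step 3 with both placeholders filled in."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    if not sa_unique_id.isdigit():
        raise ValueError("the service account unique ID is numeric")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/accounts.google.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {"accounts.google.com:sub": sa_unique_id}
            },
        }],
    }


if __name__ == "__main__":
    # "100000000000000000000" is a placeholder; use the ID Peach provides.
    policy = build_trust_policy("123456789012", "100000000000000000000")
    # Redirect this output to trust-policy.json for use with the CLI.
    print(json.dumps(policy, indent=2))
```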

Step 4: Attach an S3 Write Policy

Create and attach an inline or managed policy to the role that grants write access only to your chosen prefix:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>/vendor/peach/*"
    }
  ]
}

Replace <YOUR_BUCKET> with your bucket name and vendor/peach/* with the prefix you want Peach to write under.

Via AWS CLI:

Save the policy JSON above to s3-policy.json, then run:

aws iam put-role-policy \
  --role-name PeachDataDelivery \
  --policy-name PeachS3Write \
  --policy-document file://s3-policy.json

If you use KMS encryption, also add the following statement to the policy:

{
  "Effect": "Allow",
  "Action": [
    "kms:GenerateDataKey",
    "kms:Decrypt"
  ],
  "Resource": "arn:aws:kms:<REGION>:<YOUR_ACCOUNT_ID>:key/<KMS_KEY_ID>"
}

Note: kms:Decrypt is not strictly required for the write operation, but is included so you can read the delivered files with the same role and for compatibility with future multipart uploads.
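
The write policy, with or without the KMS statement, can be assembled the same way. This is an illustrative sketch: the function name is hypothetical, and the bucket, prefix, and key ARN in the example are placeholders. It refuses an empty prefix so the policy never covers the whole bucket.

```python
import json


def build_write_policy(bucket, prefix, kms_key_arn=None):
    """Return an S3 write policy scoped to one prefix, plus optional KMS access."""
    prefix = prefix.strip("/")
    if not prefix:
        raise ValueError("scope the policy to a non-empty prefix")
    statements = [{
        "Effect": "Allow",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
    }]
    if kms_key_arn:
        # Same KMS actions as the statement shown above.
        statements.append({
            "Effect": "Allow",
            "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder bucket and prefix; redirect the output to s3-policy.json.
    print(json.dumps(build_write_policy("your-company-data-lake", "vendor/peach/"), indent=2))
```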

Step 5: Verify Your Setup (Optional)

Before handing off to Peach, you can confirm the role is configured correctly:

aws iam get-role --role-name PeachDataDelivery --query 'Role.AssumeRolePolicyDocument'

Verify the output matches the trust policy JSON from Step 3, with your account ID and Peach's service account unique ID filled in.
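
The comparison can be automated if you prefer not to read the JSON by eye. A minimal sketch, assuming you have saved the command's output as JSON and loaded it into a dict; the function name is hypothetical, and the sample document uses placeholder IDs.

```python
import json


def check_trust_policy(document, account_id, sa_unique_id):
    """Return a list of problems found in a role's trust policy document."""
    expected = f"arn:aws:iam::{account_id}:oidc-provider/accounts.google.com"
    statements = document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]

    problems = []
    found = False
    for stmt in statements:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") != "Allow" or "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        found = True
        principal = stmt.get("Principal")
        federated = principal.get("Federated") if isinstance(principal, dict) else None
        if federated != expected:
            problems.append("Federated principal does not match the Google OIDC provider ARN")
        sub = stmt.get("Condition", {}).get("StringEquals", {}).get("accounts.google.com:sub")
        if sub != sa_unique_id:
            problems.append("sub condition is missing or does not match the service account unique ID")
    if not found:
        problems.append("no Allow statement for sts:AssumeRoleWithWebIdentity")
    return problems


if __name__ == "__main__":
    # Placeholder IDs; in practice load the JSON printed by `aws iam get-role`.
    sample = json.loads("""
    {"Version": "2012-10-17", "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/accounts.google.com"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {"StringEquals": {"accounts.google.com:sub": "100000000000000000000"}}
    }]}
    """)
    print(check_trust_policy(sample, "123456789012", "100000000000000000000"))
```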

Hand-off Point

Send the following to your Peach contact:

Item	Example
S3 bucket name	`your-company-data-lake`
S3 prefix	`vendor/peach/`
IAM Role ARN	`arn:aws:iam::123456789012:role/PeachDataDelivery`
AWS region	`us-east-1`
KMS key ARN (optional)	Required only if you use server-side encryption with a customer-managed key

Peach will configure the delivery pipeline and begin sending files to your bucket.

Recommendations

Prefix scoping: Always scope the IAM policy to the narrowest prefix possible. Peach does not need access to any objects outside the delivery prefix.
Versioning: Consider enabling S3 versioning on your bucket for an additional layer of protection against accidental overwrites.
Encryption: If you require server-side encryption with a customer-managed KMS key (SSE-KMS), share the KMS key ARN with Peach during the hand-off. Otherwise, S3 default encryption (SSE-S3) will apply.

Troubleshooting

Symptom	Likely Cause	Resolution
Files are not arriving	The IAM trust policy `sub` condition does not match Peach's service account ID	Verify the numeric ID in the trust policy matches the value Peach provided
AccessDenied on S3 writes	The IAM policy Resource does not cover the write path	Ensure the policy includes the full prefix (e.g., `arn:aws:s3:::bucket/vendor/peach/*`)
KMS.AccessDeniedException	The delivery role does not have `kms:GenerateDataKey` on the KMS key	Add KMS permissions to the role's policy. If your KMS key has a restrictive key policy, also add the role ARN to the key policy.

If you encounter issues not covered above, reach out to your Peach contact and we will help resolve them.

BigQuery / Snowflake

This section covers ingesting Parquet files from GCS directly into BigQuery or Snowflake.

Note: If you prefer Snowflake-native data sharing instead of GCS-based ingestion, Peach also offers Snowflake Reader Accounts and Snowflake Direct Share as alternative delivery methods that do not require GCS ingestion.

Prerequisites

GCP Project with Owner or Storage Admin permissions
Google Cloud Service Account
For BigQuery: roles/bigquery.dataEditor on your dataset
For Snowflake: Account with CREATE INTEGRATION privileges

Create and Share Service Account

# Create the service account (IDs may contain only lowercase letters, digits, and hyphens)
gcloud iam service-accounts create peach-data \
  --display-name="Peach Data Ingestion"

# Get the service account email
SERVICE_ACCOUNT=peach-data@YOUR_PROJECT.iam.gserviceaccount.com

Hand-off Point: Send the service account email to your Peach contact. Peach will grant access to your GCS folder (gs://peach-data-outbox/{client_name}).

Grant BigQuery Permissions (if using BigQuery)

gcloud projects add-iam-policy-binding YOUR_PROJECT \
  --member="serviceAccount:$SERVICE_ACCOUNT" \
  --role="roles/bigquery.dataEditor"

BigQuery Ingestion

Option 1: External Table via SQL

CREATE OR REPLACE EXTERNAL TABLE
`YOUR_PROJECT.YOUR_DATASET.TABLE_NAME`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://peach-data-outbox/{client_name}/TABLE_NAME/*']
);
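
When many tables are delivered, the same statement can be rendered once per table. The sketch below is illustrative: the function name is hypothetical, the project, dataset, client, and table values in the example are placeholders, and the identifier check is deliberately conservative (it rejects some names BigQuery itself would accept).

```python
import re

IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def external_table_ddl(project, dataset, table, client_name):
    """Render the CREATE EXTERNAL TABLE statement shown above for one table."""
    for label, value in (("dataset", dataset), ("table", table)):
        if not IDENTIFIER_RE.match(value):
            raise ValueError(f"unexpected characters in {label}: {value!r}")
    return (
        "CREATE OR REPLACE EXTERNAL TABLE\n"
        f"`{project}.{dataset}.{table}`\n"
        "OPTIONS (\n"
        "  format = 'PARQUET',\n"
        f"  uris = ['gs://peach-data-outbox/{client_name}/{table}/*']\n"
        ");"
    )


if __name__ == "__main__":
    # "my-project", "peach", "acme", and "loans" are hypothetical placeholders.
    for name in ["transactions", "loans"]:
        print(external_table_ddl("my-project", "peach", name, "acme"))
        print()
```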

Option 2: Dataform-Based Ingestion

Generate table list:

export TABLES=$(gsutil ls gs://peach-data-outbox/{client_name}/ \
  | sed 's#gs://peach-data-outbox/{client_name}/##; s#/$##' \
  | paste -sd"," -)
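
For readers who would rather do this step in Python, the sketch below extracts the same table names from text in the format printed by `gsutil ls`. The function name and the `acme` client name are hypothetical. Unlike the sed pipeline, it keeps only top-level folder entries (lines ending in `/`), so stray files at the folder root are ignored.

```python
def table_names(listing, client_name, bucket="peach-data-outbox"):
    """Extract top-level folder names from `gsutil ls` output for one client."""
    root = f"gs://{bucket}/{client_name}/"
    names = []
    for line in listing.splitlines():
        line = line.strip()
        if not (line.startswith(root) and line.endswith("/")):
            continue
        name = line[len(root):].rstrip("/")
        if name and "/" not in name:  # skip the root itself and nested paths
            names.append(name)
    return names


if __name__ == "__main__":
    # Hand-written sample listing; "acme" and "loans" are placeholders.
    sample = (
        "gs://peach-data-outbox/acme/loans/\n"
        "gs://peach-data-outbox/acme/transactions/\n"
    )
    print(",".join(table_names(sample, "acme")))
```

The comma-joined result has the same shape as the TABLES variable exported above.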

Configure dataform.json:

{
  "defaultSchema": "YOUR_DATASET",
  "defaultCredentials": {
    "project_id": "YOUR_PROJECT",
    "keyFile": "/path/to/your-service-account.json"
  }
}

Create definitions/incremental_tables.js:

const tables = process.env.TABLES.split(',');

tables.forEach(name => {
  publish(name, {
    type: "external",
    bigquery: {
      external: {
        sourceFormat: "PARQUET",
        compression: "GZIP",
        sourceUris: [
          `gs://peach-data-outbox/{client_name}/${name}/*`
        ]
      }
    }
  });
});

Run Dataform:

dataform compile
dataform run

Snowflake Ingestion

Create Storage Integration:

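CREATE STORAGE INTEGRATION gcs_peach_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://peach-data-outbox/{client_name}/');

-- Snowflake generates its own GCP service account for the integration.
-- Read it from the STORAGE_GCP_SERVICE_ACCOUNT property and send that
-- email to your Peach contact so it can be granted read access.
DESC STORAGE INTEGRATION gcs_peach_int;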

Create Stage:

CREATE OR REPLACE STAGE peach_data_outbox
  URL = 'gcs://peach-data-outbox/{client_name}/'
  STORAGE_INTEGRATION = gcs_peach_int
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = AUTO);

Create External Table:

CREATE OR REPLACE EXTERNAL TABLE YOUR_DB.YOUR_SCHEMA.TABLE_NAME
  WITH LOCATION = @peach_data_outbox/TABLE_NAME/
  AUTO_REFRESH = TRUE
  ENABLE_SCHEMA_EVOLUTION = TRUE
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = AUTO);

DuckDB (Local Testing)

This section covers querying and ingesting Parquet files locally using DuckDB, an open-source, lightweight, in-process SQL database management system. DuckDB runs analytical queries on large datasets efficiently without a separate server process, which makes it well suited to quick data exploration, validation, or teams without existing cloud data warehouse infrastructure.

Note: DuckDB is best suited for local testing and data exploration, not production data pipelines. To access your Parquet files in GCS, you will need a GCP account and a service account. Share the service account email with Peach so we can grant read access to your GCS bucket. No keys or secrets are exchanged.

Key features of DuckDB:

In-process: Runs directly within your application, no separate server needed
Fast analytics: Optimized for analytical workloads with columnar storage
SQL compliant: Supports standard SQL with advanced analytical functions
Zero dependencies: Single binary installation, no external dependencies
Native Parquet support: First-class support for reading and writing Parquet files

Installation

Option 1: Command Line Interface (CLI)

The simplest way to get started with DuckDB is through its CLI tool.

macOS

# Using Homebrew
brew install duckdb

Linux

# Download the latest release
wget https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
chmod +x duckdb
sudo mv duckdb /usr/local/bin/

Windows

# Using winget
winget install DuckDB.cli

# Or download directly from:
# https://github.com/duckdb/duckdb/releases/latest

Verify Installation

duckdb --version

Option 2: Python Integration

If you're working with Python, install the DuckDB Python package:

pip install duckdb

Ingesting Parquet Files into DuckDB

Method 1: Using DuckDB CLI

This method is perfect for quick data exploration and one-off queries.

Step 1: Launch DuckDB

Open your terminal and start DuckDB:

# Start with an in-memory database (temporary)
duckdb

# Or create/open a persistent database file
duckdb mydata.db

You should see the DuckDB prompt at this point.

Step 2: Query Parquet Files Directly

DuckDB can query Parquet files without importing them first:

-- Use glob patterns to query multiple files
SELECT * FROM 'data/*.parquet';

-- Query multiple files with different patterns
SELECT * FROM 'data/year=2024/month=*/day=*/*.parquet';

Step 3: Create a View (Optional)

For easier querying, create a view that points to your Parquet file(s):

-- Create a view
CREATE VIEW my_data AS
SELECT * FROM 'path/to/data.parquet';

-- Now query the view
SELECT * FROM my_data WHERE category = 'A';

Step 4: Import Parquet Data into a Table

To permanently store the data in your DuckDB database:

-- Method A: Create table from Parquet file directly
CREATE TABLE my_table AS
SELECT * FROM 'data.parquet';

-- Method B (alternative to A): Create table with explicit schema first
CREATE OR REPLACE TABLE my_table (
    id INTEGER,
    name VARCHAR,
    value DOUBLE,
    timestamp TIMESTAMP
);

-- Then insert data from Parquet
INSERT INTO my_table
SELECT * FROM 'data.parquet';

-- Method C: Import multiple Parquet files at once
CREATE TABLE combined_data AS
SELECT * FROM 'data/*.parquet';

Step 5: Verify the Import

-- Check row count
SELECT COUNT(*) FROM my_table;

-- Inspect schema
DESCRIBE my_table;

-- Preview data
SELECT * FROM my_table LIMIT 5;

-- Get basic statistics
SELECT
    COUNT(*) as total_rows,
    COUNT(DISTINCT id) as unique_ids,
    MIN(timestamp) as earliest_date,
    MAX(timestamp) as latest_date
FROM my_table;

Method 2: Using Python with DuckDB

This method is ideal for data pipelines and programmatic workflows.

Step 1: Setup

import duckdb

# Create an in-memory database connection
conn = duckdb.connect()

# Or connect to a persistent database
conn = duckdb.connect('mydata.db')

Step 2: Query Parquet Files Directly

# Simple query
result = conn.execute("""
    SELECT * FROM 'data.parquet' LIMIT 10
""").fetchall()

# Or use fetchdf() to get a pandas DataFrame
df = conn.execute("""
    SELECT * FROM 'data.parquet'
    WHERE category = 'A'
""").fetchdf()

print(df.head())

Step 3: Create a Table from Parquet

# The three methods are alternatives; CREATE OR REPLACE lets any of them
# be re-run without a "table already exists" error.

# Method A: Direct table creation
conn.execute("""
    CREATE OR REPLACE TABLE my_table AS
    SELECT * FROM 'data.parquet'
""")

# Method B: Using read_parquet function
conn.execute("""
    CREATE OR REPLACE TABLE my_table AS
    SELECT * FROM read_parquet('data/*.parquet')
""")

# Method C: From pandas DataFrame (if you load Parquet via pandas)
import pandas as pd
df = pd.read_parquet('data.parquet')
conn.execute("CREATE OR REPLACE TABLE my_table AS SELECT * FROM df")

Step 4: Work with the Data

# Query the table
result = conn.execute("""
    SELECT category, COUNT(*) as count, AVG(value) as avg_value
    FROM my_table
    GROUP BY category
""").fetchdf()

print(result)

# Update data
conn.execute("""
    UPDATE my_table
    SET value = value * 1.1
    WHERE category = 'A'
""")

# Create indexes for faster queries
conn.execute("""
    CREATE INDEX idx_category ON my_table(category)
""")

Step 5: Close Connection

conn.close()

Troubleshooting

Issue: Memory Errors with Large Files

-- Increase memory limit
SET memory_limit='16GB';

-- Or process the file in smaller chunks by filtering on a key range
SELECT * FROM 'large.parquet' WHERE id BETWEEN 1 AND 1000;

Issue: Schema Mismatch

-- Check actual schema
DESCRIBE SELECT * FROM 'data.parquet';

-- Use explicit SELECT to handle mismatches
CREATE TABLE my_table AS
SELECT
    column1,
    column2,
    CAST(column3 AS INTEGER) as column3
FROM 'data.parquet';

Best Practices

Use Persistence: For work you need to keep between sessions, use a file-based database (duckdb mydata.db) rather than an in-memory one.
Leverage Direct Queries: For read-only operations, query Parquet files directly without importing.
Index Frequently Queried Columns: Create indexes on columns used in WHERE clauses and JOINs.
Compress Output: Always use compression when exporting to Parquet (GZIP for a better compression ratio, SNAPPY for faster reads and writes).
Set Appropriate Memory Limits: Configure memory_limit based on your system resources.
Use Explicit Schemas: When creating tables, specify schemas explicitly for better type control.
Batch Operations: For large imports, use single CREATE TABLE AS statements rather than multiple INSERTs.

Additional Resources

Official Documentation
Parquet Guide
Python API
GitHub

Frequently Asked Questions

How often are files delivered?

Delivery frequency depends on your selected plan: Daily (once at 6 AM local), Twice Daily (6 AM and 6 PM local), or Hourly.

What format and compression are used?

Parquet files compressed with GZIP, including embedded schema metadata.

What permissions does my service account need?

Your service account requires storage.objectViewer on the GCS folder. Additional permissions depend on your destination platform (e.g., bigquery.dataEditor for BigQuery).

How long is data retained in GCS?

Default retention is 365 days of files. Retention can be extended on request.

What if I have a sandbox environment?

If you have replica access in sandbox, it can be configured in parallel with production.

What if my platform isn't listed here?

Contact your Peach technical representative. The documentation will vary based on your specific data ingestion setup.

Next Steps

Your Customer Success contact will:

Confirm your data delivery configuration as part of your order form
Coordinate technical setup steps with your team
Provide your Peach-managed GCS bucket path once provisioned

Questions

Contact support@peachfinance.com
