Data Providers

OpenAnalyst connects to a broad ecosystem of data sources including relational databases, cloud data warehouses, SaaS platforms, cloud storage, APIs, and local file uploads.

Overview

A data provider (also called a connector or integration) is a configured link between OpenAnalyst and an external data source. Once connected, OpenAnalyst indexes the schema of your data source, making it available for natural-language queries, dashboards, and AI agent workflows.

All connection credentials are encrypted at rest using AES-256 and in transit over TLS 1.3. OpenAnalyst never stores a copy of your raw data — queries run against your source in real time or against a configurable cache layer.

Note: To add a new data source, navigate to Connections > Add Connection in your workspace. The connector catalog is searchable and organized by category.

Relational Databases

OpenAnalyst provides native connectors for the major relational databases, along with MongoDB and cloud data warehouses such as Snowflake and BigQuery. These connectors use read-only query execution and support parameterized queries, connection pooling, and SSL/TLS authentication.

| Database   | Min. Version | Authentication                     | SSL Support            |
|------------|--------------|------------------------------------|------------------------|
| PostgreSQL | 11.0         | Username/password, SSL client cert | Yes (required on prod) |
| MySQL      | 5.7          | Username/password                  | Yes                    |
| MongoDB    | 4.4          | Username/password, X.509, AWS IAM  | Yes                    |
| Snowflake  | All versions | Username/password, key-pair, SSO   | Always-on              |
| BigQuery   | All versions | Service account JSON, OAuth        | Always-on              |

PostgreSQL

The PostgreSQL connector supports standard connection strings and advanced options including read replicas, connection pooling via PgBouncer, and schema-level access control. You can restrict OpenAnalyst to specific schemas to limit the data it can access.

# Example connection string format
postgresql://readonly_user:password@db.example.com:5432/analytics_db?sslmode=require

-- Granting read-only access (run in your PostgreSQL instance)
CREATE ROLE openanalyst_reader WITH LOGIN PASSWORD 'strong_password';
GRANT CONNECT ON DATABASE analytics_db TO openanalyst_reader;
GRANT USAGE ON SCHEMA public TO openanalyst_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO openanalyst_reader;

MySQL

The MySQL connector is compatible with MySQL 5.7 and above, as well as MariaDB 10.3 and above. It supports both the classic mysql protocol and the newer X Protocol. For cloud-hosted MySQL instances (Amazon RDS, Google Cloud SQL, PlanetScale), standard connection parameters apply.

# Grant read-only MySQL access
CREATE USER 'openanalyst'@'%' IDENTIFIED BY 'strong_password';
GRANT SELECT ON analytics_db.* TO 'openanalyst'@'%';
FLUSH PRIVILEGES;

MongoDB

The MongoDB connector connects to replica sets and sharded clusters. OpenAnalyst automatically samples collection schemas to build its understanding of your document structure. For collections with highly variable schemas, you can provide a manual schema hint in the connection settings.

# Example MongoDB connection URI
mongodb+srv://openanalyst:password@cluster0.abc123.mongodb.net/analytics?authSource=admin
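
As with the relational connectors, it is good practice to connect with a least-privilege account. A minimal sketch in mongosh, using MongoDB's built-in read role (the user, password, and database names are placeholders):

```javascript
// Run in mongosh as a user administrator.
// Creates an account limited to read-only access on one database.
use analytics
db.createUser({
  user: "openanalyst",
  pwd: "strong_password",
  roles: [{ role: "read", db: "analytics" }]
})
```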

Snowflake

The Snowflake connector authenticates via username/password, RSA key-pair, or Okta-based SSO. You can specify a warehouse, database, schema, and role. OpenAnalyst creates its own session and respects Snowflake's query timeout and cost controls.
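
If you want to scope OpenAnalyst to a dedicated read-only role, a setup along these lines works (object names are examples; run with a role that can manage grants, such as SECURITYADMIN or SYSADMIN):

```sql
-- Create a read-only role and user for OpenAnalyst (names are placeholders)
CREATE ROLE openanalyst_reader;
GRANT USAGE ON WAREHOUSE analytics_wh TO ROLE openanalyst_reader;
GRANT USAGE ON DATABASE analytics_db TO ROLE openanalyst_reader;
GRANT USAGE ON ALL SCHEMAS IN DATABASE analytics_db TO ROLE openanalyst_reader;
GRANT SELECT ON ALL TABLES IN DATABASE analytics_db TO ROLE openanalyst_reader;
CREATE USER openanalyst PASSWORD = 'strong_password' DEFAULT_ROLE = openanalyst_reader;
GRANT ROLE openanalyst_reader TO USER openanalyst;
```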

BigQuery

The BigQuery connector uses Google service account credentials. Create a service account with the BigQuery Data Viewer and BigQuery Job User roles, then download the JSON key file and paste its contents into the connector form.
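
One way to provision such a service account is with the gcloud CLI (the account name and project ID below are placeholders):

```shell
# Create a dedicated service account (names and project ID are examples)
gcloud iam service-accounts create openanalyst-reader \
  --project=my-project --display-name="OpenAnalyst reader"

# Grant the two roles mentioned above at the project level
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:openanalyst-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:openanalyst-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Download the JSON key to paste into the connector form
gcloud iam service-accounts keys create key.json \
  --iam-account=openanalyst-reader@my-project.iam.gserviceaccount.com
```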

Cloud Data Warehouses and Storage

In addition to relational databases, OpenAnalyst can query data stored in cloud object storage and modern lakehouse formats.

  • AWS S3: Connect via IAM access keys or IAM role assumption. OpenAnalyst can query Parquet, CSV, and JSON files directly, or integrate with AWS Athena for SQL-based querying over S3.
  • Google Cloud Storage: Authenticate with a service account. Supports Parquet, CSV, Avro, and ORC file formats.
  • Azure Blob Storage: Connect with a connection string or SAS token. Support for Delta Lake tables is available on Pro plans and above.
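
For the S3 connector specifically, a least-privilege IAM policy attached to the access keys or assumed role can look like this (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OpenAnalystReadOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-analytics-bucket",
        "arn:aws:s3:::my-analytics-bucket/*"
      ]
    }
  ]
}
```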

SaaS and Productivity Platforms

OpenAnalyst integrates directly with popular business tools to bring operational data into your analytics workflows without exporting to a database first.

  • Google Sheets: Authenticate via Google OAuth. OpenAnalyst can read any sheet in your Google Drive that you have granted access to. Data refreshes on query or on a configurable schedule.
  • Airtable: Connect with your Airtable API key or OAuth token. All bases and tables in your workspace are discoverable. Supports linked records and formula field expansion.
  • Notion: Use the Notion integration token to read database pages. OpenAnalyst maps Notion database properties to a tabular schema for querying.
  • Salesforce: Connect via OAuth 2.0. Supports standard and custom objects. OpenAnalyst uses the SOQL query interface internally.
  • HubSpot: Connect via private app tokens. Access CRM data including contacts, deals, companies, and custom pipelines.

APIs and Custom Sources

For data sources that do not have a native connector, OpenAnalyst provides a generic API connector that can fetch and parse JSON or CSV responses from any HTTP endpoint.

  • REST APIs: Configure the base URL, authentication method (API key header, Bearer token, Basic auth, OAuth 2.0), and pagination strategy (cursor, offset, link header).
  • GraphQL: Provide the GraphQL endpoint and your query. OpenAnalyst maps the response structure to a tabular format.
  • Webhooks: Use OpenAnalyst as a webhook receiver. Incoming payloads are stored in a managed buffer table that you can query like any other source.
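
The pagination strategies above all reduce to "keep requesting pages until the source says there is no more". A minimal sketch of the cursor strategy in Python — the fetch_page callable stands in for whatever HTTP client and endpoint you use, and the response shape is an assumption for illustration, not a specific API's contract:

```python
from typing import Callable, Optional

def fetch_all(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Follow cursor-based pagination until no cursor is returned.

    fetch_page(cursor) is assumed to return a dict shaped like
    {"items": [...], "next_cursor": "abc" | None}.
    """
    rows, cursor = [], None
    while True:
        page = fetch_page(cursor)
        rows.extend(page["items"])
        cursor = page.get("next_cursor")
        if not cursor:  # no cursor means the last page was reached
            return rows

# Stub standing in for an HTTP call, returning two pages of data.
def demo_page(cursor):
    if cursor is None:
        return {"items": [1, 2], "next_cursor": "p2"}
    return {"items": [3], "next_cursor": None}

print(fetch_all(demo_page))  # -> [1, 2, 3]
```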

File Uploads

For one-off analysis or importing historical data, OpenAnalyst accepts direct file uploads:

  • CSV: Up to 500 MB per file. Delimiter and encoding are auto-detected. Custom delimiters can be specified manually.
  • Excel (.xlsx, .xls): All sheets within the workbook are imported as separate tables. Formula values are captured at upload time.
  • JSON: Both flat and nested JSON structures are supported. Nested objects are flattened using dot notation (e.g., address.city).
  • Parquet: Compressed columnar format. Ideal for large datasets where CSV would be impractical.
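
The dot-notation flattening described for JSON uploads can be illustrated in a few lines of Python; this mirrors the documented behavior rather than OpenAnalyst's internal code:

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level using dot-notation keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested objects
        else:
            flat[path] = value
    return flat

record = {"name": "Ada", "address": {"city": "London", "zip": "EC1A"}}
print(flatten(record))
# -> {'name': 'Ada', 'address.city': 'London', 'address.zip': 'EC1A'}
```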

Tip: For regularly updated file-based data, consider using the Google Sheets or Airtable connectors instead of repeated file uploads. They automatically reflect the latest data without manual re-upload.