Data Block
The data {} block is a top-level block alongside shared, server, and browser. It provides a declarative home for data source definitions, reusable transform pipelines, validation rules, and refresh policies. Instead of scattering data logic across server functions, the data {} block centralizes your data layer in one place.
Why a Data Block?
Without it, data sources, cleaning pipelines, and validation rules end up spread across server functions. The data {} block gives you:
- Source registry -- all data sources declared in one place
- Named pipelines -- reusable transform chains referenced by name
- Validation rules -- per-type constraints declared alongside the data
- Refresh policies -- how often sources reload
- Self-documenting -- new team members read data {} to understand the data flow
Sources
A source declares a named data source. Sources are loaded lazily on first access and cached by default:
data {
  source customers = read("customers.csv")
  source orders = read("orders.csv")
  source exchange_rates = read("https://api.exchangerate.host/latest")
}
Type Annotations
Add a type annotation to enable compile-time column validation:
data {
  source customers: Table<Customer> = read("customers.csv")
  source orders: Table<Order> = read("orders.csv")
}
How Sources Compile
Sources compile to lazy-initialized cached getters. The data is loaded once on first access, then cached:
// This source declaration:
data {
  source customers = read("customers.csv")
}
// Compiles roughly to:
// let __data_customers_cache = null;
// function __data_customers() {
//   if (!__data_customers_cache) {
//     __data_customers_cache = read("customers.csv");
//   }
//   return __data_customers_cache;
// }
Pipelines
A pipeline declares a named, reusable transform chain. Pipelines can reference sources and other pipelines:
data {
  source raw_customers = read("customers.csv")
  pipeline clean = raw_customers
    |> drop_nil(.email)
    |> fill_nil(.spend, 0.0)
    |> derive(
      .name = .name |> trim(),
      .email = .email |> lower()
    )
    |> where(.spend > 0)
  pipeline summary = clean
    |> group_by(.country)
    |> agg(
      count: count(),
      total_spend: sum(.spend),
      avg_spend: mean(.spend)
    )
    |> sort_by(.total_spend, desc: true)
}
Pipelines compile to async functions that execute the transform chain when called.
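As a rough illustration of that statement, the JavaScript below sketches what a compiled clean pipeline might look like. The compiler's real output for pipelines is not shown in this document, so the names __pipeline_clean and runCleanSteps, the sample rows, and the mapping of each step onto array methods are all assumptions made for illustration.

```javascript
// Assumption: sample rows stand in for read("customers.csv").
const SAMPLE_ROWS = [
  { name: "  Ada ", email: "ADA@Example.com", spend: 120.0 },
  { name: "Bob", email: null, spend: 40.0 },
  { name: "Cy", email: "cy@example.com", spend: null },
];

// Cached source getter, following the pattern in "How Sources Compile".
let __data_raw_customers_cache = null;
function __data_raw_customers() {
  if (!__data_raw_customers_cache) {
    __data_raw_customers_cache = SAMPLE_ROWS;
  }
  return __data_raw_customers_cache;
}

// Hypothetical helper: the transform chain written as plain array operations.
function runCleanSteps(rows) {
  return rows
    .filter((r) => r.email != null) // drop_nil(.email)
    .map((r) => ({ ...r, spend: r.spend ?? 0.0 })) // fill_nil(.spend, 0.0)
    .map((r) => ({ ...r, name: r.name.trim(), email: r.email.toLowerCase() })) // derive(...)
    .filter((r) => r.spend > 0); // where(.spend > 0)
}

// Hypothetical compiled form of `pipeline clean`: the chain runs on each call.
async function __pipeline_clean() {
  const rows = await __data_raw_customers();
  return runCleanSteps(rows);
}
```

In this sketch the pipeline itself holds no cache; it re-reads the source getter on every call, so it would see fresh rows once the source cache has been cleared. Whether the real compiler also caches pipeline results is not stated here.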
Validation Rules
The validate keyword declares per-type validation rules using column expressions:
data {
  validate Customer {
    .email |> contains("@"),
    .name |> len() > 0,
    .spend >= 0
  }
  validate Order {
    .quantity > 0,
    .amount > 0
  }
}
Each rule is a predicate on a column. The validate block compiles to a validator function that returns { valid: true/false, errors: [...] }:
// Compiled validator can be called as:
result = __validate_Customer(row)
// result.valid → true or false
// result.errors → ["Validation rule 1 failed", ...]
Refresh Policies
For long-running servers, refresh policies control how often source data is reloaded. Two modes are available:
Interval Refresh
Reload a source on a timer:
data {
  source exchange_rates = read("https://api.exchangerate.host/latest")
  refresh exchange_rates every 1.hour
  source customers = read("customers.csv")
  refresh customers every 15.minutes
}
Supported time units: seconds, minutes, hours (and their singular forms second, minute, hour).
Interval refresh compiles to a setInterval that clears the source cache, so the next access triggers a fresh load.
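The following JavaScript is a minimal sketch of that behavior, not the compiler's actual output. It assumes the cached-getter pattern from "How Sources Compile"; loadExchangeRates, loadCount, and __clear_exchange_rates_cache are hypothetical names, and the loader returns a stub object instead of fetching anything.

```javascript
// Hypothetical loader standing in for read(...); counts how often it runs.
let loadCount = 0;
function loadExchangeRates() {
  loadCount += 1;
  return { base: "EUR", loadNumber: loadCount };
}

// Cached getter in the same shape as the compiled source shown earlier.
let __data_exchange_rates_cache = null;
function __data_exchange_rates() {
  if (!__data_exchange_rates_cache) {
    __data_exchange_rates_cache = loadExchangeRates();
  }
  return __data_exchange_rates_cache;
}

// The timer only clears the cache; nothing reloads until the next access.
function __clear_exchange_rates_cache() {
  __data_exchange_rates_cache = null;
}

// `refresh exchange_rates every 1.hour` expressed in milliseconds.
const ONE_HOUR_MS = 60 * 60 * 1000;
const refreshTimer = setInterval(__clear_exchange_rates_cache, ONE_HOUR_MS);

// In Node.js, unref() keeps this timer from holding the process open.
// Browsers return a numeric id with no unref method, hence the guard.
if (typeof refreshTimer.unref === "function") {
  refreshTimer.unref();
}
```

One consequence of clearing rather than eagerly reloading: a source that nobody reads between ticks costs nothing to refresh, but the first reader after a tick pays the full load time.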
On-Demand Refresh
Reload only when explicitly triggered:
data {
  source orders = read("orders.csv")
  refresh orders on_demand
}
This generates a refresh_orders() function that clears the cache, letting the next access reload the data.
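A small JavaScript sketch of how such a function could behave, assuming the same cached-getter pattern as in "How Sources Compile". Only the name refresh_orders comes from the text above; loadOrders, orderLoads, and the version field are hypothetical and exist only to make reloads visible.

```javascript
// Hypothetical loader standing in for read("orders.csv").
let orderLoads = 0;
function loadOrders() {
  orderLoads += 1;
  return [{ id: 1, version: orderLoads }];
}

// Cached getter for the orders source.
let __data_orders_cache = null;
function __data_orders() {
  if (!__data_orders_cache) {
    __data_orders_cache = loadOrders();
  }
  return __data_orders_cache;
}

// On-demand refresh: drop the cached value, load nothing yet.
function refresh_orders() {
  __data_orders_cache = null;
}

const staleOrders = __data_orders(); // first access triggers load #1
refresh_orders();                    // cache cleared, no load happens here
const freshOrders = __data_orders(); // next access triggers load #2

console.log(staleOrders[0].version, freshOrders[0].version); // prints: 1 2
```

Under this sketch, a caller that kept a reference to the old rows (staleOrders) continues to see the old snapshot; only accesses made after refresh_orders() observe reloaded data.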
Interaction with Other Blocks
Sources and pipelines declared in data {} are available in server {} and browser {} blocks by name:
data {
  source users = read("users.csv")
  pipeline active_users = users |> where(.active)
}
server {
  fn get_active_users() {
    active_users // references the pipeline directly
  }
  fn get_user(id: Int) {
    users |> find(fn(u) u.id == id)
  }
  route GET "/api/users" => get_active_users
}
browser {
  state users = []
  effect {
    users = server.get_active_users()
  }
}
Complete Example
A full data block showing all features together:
shared {
  type Customer {
    id: Int
    name: String
    email: String
    spend: Float
    country: String
  }
}
data {
  source customers: Table<Customer> = read("customers.csv")
  source orders = read("orders.csv")
  pipeline clean = customers
    |> drop_nil(.email)
    |> fill_nil(.spend, 0.0)
    |> derive(.name = .name |> trim(), .email = .email |> lower())
    |> where(.spend > 0)
  pipeline summary = clean
    |> group_by(.country)
    |> agg(
      count: count(),
      total_spend: sum(.spend),
      avg_spend: mean(.spend)
    )
    |> sort_by(.total_spend, desc: true)
  validate Customer {
    .email |> contains("@"),
    .name |> len() > 0,
    .spend >= 0
  }
  refresh customers every 10.minutes
  refresh orders on_demand
}
server {
  fn get_customers() { clean }
  fn get_summary() { summary }
  route GET "/api/customers" => get_customers
  route GET "/api/summary" => get_summary
}
Practical Tips
Put all data definitions in data {}. Keep server functions focused on serving and routing. Data loading, cleaning, and transformation belong in the data block.
Name your pipelines descriptively. Pipeline names like clean_customers and top_products serve as documentation. Other developers can read the data {} block to understand the full data flow.
Use on_demand for expensive sources. If a source is expensive to reload (large file, slow API), use refresh ... on_demand and trigger refreshes explicitly instead of on a timer.
Layer pipelines. Pipelines can reference other pipelines, so build incrementally: raw data → cleaned → filtered → aggregated. Each step is reusable on its own.