Window Functions

Window functions let you compute values across related rows without collapsing your data. If you have used table_group_by with table_agg, you know that aggregation reduces many rows into one row per group. Window functions do the opposite: they add new columns while keeping every original row intact.

This is powerful when you need rankings, running totals, comparisons to previous rows, or moving averages -- all while preserving the full detail of your dataset.

Prerequisites

Complete Joins & Combining Data or be comfortable with tables and pipes
Sample data files from the series intro

`table_window` at a Glance

Every window operation uses the same function:

tova

tableWindow(table, options, window_functions)

Argument	Description
`table`	The input table
`options`	`{partition: "col", order: "col", desc: true/false}` -- partition groups rows (like GROUP BY but without collapsing), order sorts within each partition, desc reverses the sort
`window_functions`	An object mapping new column names to `win_*` function calls

Both partition and order are optional. Without partition, the entire table is treated as one partition. Without order, rows keep their original order.

Ranking Functions

Ranking is the most common use of window functions. Given a sorted partition, each row gets a position number.

Row Number Within a Partition

Rank employees by salary within each department, highest first:

tova

async fn main() {
  employees = await read("data/employees.csv")

  ranked = tableWindow(employees,
    {partition: "department", order: "salary", desc: true},
    {rank_in_dept: winRowNumber()})
    |> tableSelect("name", "department", "salary", "rank_in_dept")

  print(ranked)
}

winRowNumber() assigns 1, 2, 3, ... sequentially. Because we set desc: true, the highest salary in each department gets rank 1. The partition: "department" option restarts the numbering for each department.

Dense Rank and Percent Rank

When two employees share the same score, winDenseRank() gives them the same rank and does not skip numbers. winPercentRank() returns a value between 0.0 and 1.0 representing relative position.

tova

async fn main() {
  employees = await read("data/employees.csv")

  perf_ranked = tableWindow(employees,
    {order: "performance_score", desc: true},
    {perf_rank: winDenseRank(), percentile: winPercentRank()})
    |> tableSelect("name", "department", "performance_score",
                    "perf_rank", "percentile")
    |> tableLimit(10)

  print(perf_ranked)
}

Notice there is no partition here -- the entire company is ranked together. The top performer gets perf_rank 1 and percentile 0.0 (the highest). The lowest performer gets percentile 1.0.

Dense Rank vs Rank

winRank() leaves gaps after ties (1, 2, 2, 4), while winDenseRank() does not (1, 2, 2, 3). Use dense rank when you want consecutive bucket numbers.

Ntile: Dividing into Buckets

winNtile(n) splits the ordered rows into n roughly equal groups. This is how you compute quartiles, deciles, or percentile buckets:

tova

async fn main() {
  employees = await read("data/employees.csv")

  quartiles = tableWindow(employees,
    {order: "salary", desc: true},
    {salary_quartile: winNtile(4)})
    |> tableDerive({quartile_label: fn(r) {
      match r.salary_quartile {
        1 => "Top 25%"
        2 => "Upper Mid"
        3 => "Lower Mid"
        4 => "Bottom 25%"
        _ => "Unknown"
      }
    }})

  print(quartiles |> tableSelect("name", "salary",
    "salary_quartile", "quartile_label"))
}

The highest-paid employees land in quartile 1 (top 25%), and we use table_derive with a match expression to attach human-readable labels.

Running Aggregates

Running (cumulative) aggregates grow as you move through the rows. They are essential for tracking totals over time, counting events, or computing running averages.

tova

async fn main() {
  sales = await read("data/sales.csv")

  running = tableWindow(sales |> tableSortBy("date"),
    {order: "date"},
    {
      cumulative_revenue: winRunningSum("amount"),
      deal_number: winRunningCount(),
      running_avg_deal: winRunningAvg("amount")
    })

  print(running |> tableSelect("date", "product", "amount",
    "cumulative_revenue", "deal_number", "running_avg_deal"))
}

Function	What it computes
`winRunningSum("amount")`	Sum of `amount` from the first row up to the current row
`winRunningCount()`	How many rows have been seen so far (1, 2, 3, ...)
`winRunningAvg("amount")`	Average of `amount` from the first row up to the current row
`winRunningMin("amount")`	Smallest `amount` seen so far
`winRunningMax("amount")`	Largest `amount` seen so far

Note that we sort the sales table by date first with tableSortBy("date") and also set order: "date" in the window options. The sort ensures the input rows are in chronological order, and the window order determines row sequencing within partitions.

Lag and Lead

win_lag and win_lead let you look at values from neighboring rows. This is invaluable for period-over-period comparisons.

tova

async fn main() {
  sales = await read("data/sales.csv")

  comparison = tableWindow(sales |> tableSortBy("date"),
    {order: "date"},
    {
      prev_amount: winLag("amount", 1, 0),
      next_amount: winLead("amount", 1, 0)
    })
    |> tableDerive({change_from_prev: fn(r) r.amount - r.prev_amount})

  print(comparison |> tableSelect("date", "product", "amount",
    "prev_amount", "next_amount", "change_from_prev"))
}

winLag("amount", 1, 0) says: look 1 row back in the partition, and if there is no previous row (the first row), return 0 as the default. win_lead does the same but looks forward.

After computing lag and lead, we use table_derive to calculate the change from the previous deal. A positive change_from_prev means the current deal was larger; negative means it was smaller.

Lag/Lead Arguments

The full signature is winLag(column, offset, default). The offset defaults to 1 and the default value defaults to nil. Always provide a default (like 0) if you plan to do arithmetic on the result -- otherwise the first/last rows will contain nil.

Moving Averages

A moving average smooths out noise by averaging over a sliding window of the last N rows. This is commonly used for trend lines in time-series data.

tova

async fn main() {
  sales = await read("data/sales.csv")

  smoothed = tableWindow(sales |> tableSortBy("date"),
    {order: "date"},
    {
      ma_3: winMovingAvg("amount", 3),
      running_min: winRunningMin("amount"),
      running_max: winRunningMax("amount")
    })

  print(smoothed |> tableSelect("date", "amount",
    "ma_3", "running_min", "running_max"))
}

winMovingAvg("amount", 3) averages the current row and the two rows before it. For the first row, the window contains only that row, so the moving average equals the value itself. For the second row, it averages two values, and from the third row onward it averages three.

Combining the moving average with running min and max gives you a complete picture: the trend line, the floor, and the ceiling.

First and Last Value

win_first_value and win_last_value retrieve the value from the first or last row within each partition. This is useful for showing boundaries alongside every row.

tova

async fn main() {
  employees = await read("data/employees.csv")

  hire_bounds = tableWindow(employees,
    {partition: "department", order: "hire_date"},
    {
      first_hired: winFirstValue("name"),
      last_hired: winLastValue("name")
    })

  print(hire_bounds |> tableSelect("name", "department",
    "hire_date", "first_hired", "last_hired"))
}

Every Engineering row will show the name of the earliest and most recent hire in Engineering. Every Marketing row will show the same for Marketing. The values are constant within a partition but differ across partitions.

Putting It All Together

Here is a complete example that combines several window techniques in a single pipeline:

tova

async fn main() {
  employees = await read("data/employees.csv")
  sales = await read("data/sales.csv")

  // Rank employees within their department
  ranked = tableWindow(employees,
    {partition: "department", order: "salary", desc: true},
    {rank_in_dept: winRowNumber()})
    |> tableSelect("name", "department", "salary", "rank_in_dept")

  // Dense rank + percent rank across the company
  perf_ranked = tableWindow(employees,
    {order: "performance_score", desc: true},
    {perf_rank: winDenseRank(), percentile: winPercentRank()})
    |> tableSelect("name", "department", "performance_score",
                    "perf_rank", "percentile")
    |> tableLimit(10)

  // Salary quartiles with labels
  quartiles = tableWindow(employees,
    {order: "salary", desc: true},
    {salary_quartile: winNtile(4)})
    |> tableDerive({quartile_label: fn(r) {
      match r.salary_quartile {
        1 => "Top 25%"
        2 => "Upper Mid"
        3 => "Lower Mid"
        4 => "Bottom 25%"
        _ => "Unknown"
      }
    }})

  // Running totals on sales
  running = tableWindow(sales |> tableSortBy("date"),
    {order: "date"},
    {
      cumulative_revenue: winRunningSum("amount"),
      deal_number: winRunningCount(),
      running_avg_deal: winRunningAvg("amount")
    })

  // Lag/lead comparison
  comparison = tableWindow(sales |> tableSortBy("date"),
    {order: "date"},
    {
      prev_amount: winLag("amount", 1, 0),
      next_amount: winLead("amount", 1, 0)
    })
    |> tableDerive({change_from_prev: fn(r) r.amount - r.prev_amount})

  // Moving average with running bounds
  smoothed = tableWindow(sales |> tableSortBy("date"),
    {order: "date"},
    {
      ma_3: winMovingAvg("amount", 3),
      running_min: winRunningMin("amount"),
      running_max: winRunningMax("amount")
    })

  // First and last hire per department
  hire_bounds = tableWindow(employees,
    {partition: "department", order: "hire_date"},
    {
      first_hired: winFirstValue("name"),
      last_hired: winLastValue("name")
    })

  print(ranked)
  print(perf_ranked)
  print(running)
}

Window Function Reference

All 16 window functions available in Tova:

Ranking

Function	Description
`winRowNumber()`	Sequential number in partition (1, 2, 3, ...)
`winRank()`	Rank with gaps for ties (1, 2, 2, 4)
`winDenseRank()`	Rank without gaps (1, 2, 2, 3)
`winPercentRank()`	Relative rank as a fraction (0.0 to 1.0)
`winNtile(n)`	Divide into n equal-sized buckets

Offset

Function	Description
`winLag(col, offset?, default?)`	Value from a previous row (default offset: 1)
`winLead(col, offset?, default?)`	Value from a following row (default offset: 1)
`winFirstValue(col)`	First value in the partition
`winLastValue(col)`	Last value in the partition

Running Aggregates

Function	Description
`winRunningSum(col)`	Cumulative sum from first row to current
`winRunningCount()`	Cumulative count (1, 2, 3, ...)
`winRunningAvg(col)`	Cumulative average from first row to current
`winRunningMin(col)`	Smallest value seen so far
`winRunningMax(col)`	Largest value seen so far

Sliding Window

Function	Description
`winMovingAvg(col, n)`	Average over the last n rows

`table_window` Options

Option	Type	Description
`partition`	String	Column name to partition by (optional -- omit to treat the whole table as one partition)
`order`	String	Column name to sort by within each partition (optional)
`desc`	Bool	If true, sort in descending order (default: false)