Skip to content

Chapter 12: Capstone — Text Analyzer

This is it. Everything you've learned comes together in one real project: a text analyzer that processes text, computes statistics, and outputs results in multiple formats.

This chapter isn't about learning new concepts — it's about applying everything from Chapters 1-11 in a cohesive, real-world program.

What We're Building

A text analysis tool that:

  • Counts words, unique words, characters, and lines
  • Finds the longest word and most common words
  • Estimates reading time
  • Outputs results in plain text, table, or JSON format
  • Uses clean types, pattern matching, pipes, and error handling

Architecture

Before writing code, let's plan the modules:

text-analyzer/
  types.tova        ← Data types (Ch 6)
  text.tova         ← Text processing (Ch 3, 4, 8)
  analysis.tova     ← Analysis logic (Ch 2, 7)
  formatter.tova    ← Output formatting (Ch 5)
  main.tova         ← Entry point

Step 1: Define the Types

Every good project starts with types. What data are we working with?

tova
// types.tova
type AnalysisResult {
  word_count: Int
  unique_words: Int
  char_count: Int
  line_count: Int
  avg_word_length: Float
  longest_word: String
  most_common: [{ word: String, count: Int }]
  reading_time_minutes: Float
}

type OutputFormat {
  Plain
  Tabular
  JsonFmt
}

The AnalysisResult record type holds everything we compute. OutputFormat is an ADT with three variants — one for each output style.

Concepts used: Record types, ADTs (Chapter 6)

Step 2: Text Processing

Clean, split, and count words:

tova
// text.tova
fn clean_word(word) {
  word
    |> lower()
    |> trim()
    |> replace(",", "")
    |> replace(".", "")
    |> replace("!", "")
    |> replace("?", "")
    |> replace(";", "")
    |> replace(":", "")
    |> replace("\"", "")
    |> replace("'", "")
}

fn extract_words(text) {
  split(text, " ")
    |> map(fn(w) clean_word(w))
    |> filter(fn(w) len(w) > 0)
}

fn count_frequencies(words) {
  var freq = {}
  for w in words {
    if freq[w] == undefined {
      freq[w] = 0
    }
    freq[w] += 1
  }
  freq
}

fn top_words(freq, n) {
  entries(freq)
    |> sorted(fn(e) 0 - e[1])
    |> take(n)
    |> map(fn(e) { word: e[0], count: e[1] })
}

Notice how clean_word uses a pipe chain, extract_words composes map and filter, and top_words is a pure data transformation pipeline.

Concepts used: Pipes (Chapter 8), Collections (Chapter 3), Strings (Chapter 4), Lambdas (Chapter 2)

Step 3: Analysis Logic

Compute all the statistics:

tova
// analysis.tova
fn analyze(text) {
  lines = split(text, "\\n")
  all_words = extract_words(text)
  freq = count_frequencies(all_words)

  word_count = len(all_words)
  unique_count = len(keys(freq))

  longest = if word_count > 0 {
    all_words |> sorted(fn(w) 0 - len(w)) |> take(1)
  } else {
    [""]
  }

  avg_len = if word_count > 0 {
    total_chars = all_words |> map(fn(w) len(w)) |> sum()
    toFloat(total_chars) / toFloat(word_count)
  } else {
    0.0
  }

  AnalysisResult(
    word_count: word_count,
    unique_words: unique_count,
    char_count: len(text),
    line_count: len(lines),
    avg_word_length: avg_len,
    longest_word: longest[0],
    most_common: top_words(freq, 10),
    reading_time_minutes: toFloat(word_count) / 200.0
  )
}

Everything is an expression. The if blocks compute values, pipes transform data, and the result is constructed from named fields.

Concepts used: Expressions (Chapter 1), Functions (Chapter 2), Pipes (Chapter 8)

Step 4: Output Formatting

Use pattern matching to dispatch to the right formatter:

tova
// formatter.tova
fn format_result(result, fmt) {
  match fmt {
    Plain => format_plain(result)
    Tabular => format_table(result)
    JsonFmt => format_json(result)
  }
}

fn format_plain(r) {
  lines = [
    "Text Analysis Results",
    repeat("=", 35),
    "Words:           {r.word_count}",
    "Unique words:    {r.unique_words}",
    "Characters:      {r.char_count}",
    "Lines:           {r.line_count}",
    "Avg word length: {toInt(r.avg_word_length * 10) / 10}",
    "Longest word:    {r.longest_word}",
    "Reading time:    {toInt(r.reading_time_minutes * 10) / 10} min",
    "",
    "Top 10 Words:",
    repeat("-", 25)
  ]

  for entry in r.most_common {
    bar = repeat("#", entry.count)
    lines.push("  {padEnd(entry.word, 15)} {bar} ({entry.count})")
  }

  join(lines, "\n")
}

fn format_table(r) {
  header = "| {padEnd(\"Metric\", 20)} | {padEnd(\"Value\", 15)} |"
  sep = "| {repeat(\"-\", 20)} | {repeat(\"-\", 15)} |"

  rows = [
    header, sep,
    "| {padEnd(\"Words\", 20)} | {padEnd(toString(r.word_count), 15)} |",
    "| {padEnd(\"Unique Words\", 20)} | {padEnd(toString(r.unique_words), 15)} |",
    "| {padEnd(\"Characters\", 20)} | {padEnd(toString(r.char_count), 15)} |",
    "| {padEnd(\"Lines\", 20)} | {padEnd(toString(r.line_count), 15)} |",
    "| {padEnd(\"Avg Word Length\", 20)} | {padEnd(toString(toInt(r.avg_word_length * 10) / 10), 15)} |",
    "| {padEnd(\"Longest Word\", 20)} | {padEnd(r.longest_word, 15)} |",
    "| {padEnd(\"Reading Time\", 20)} | {padEnd(\"{toInt(r.reading_time_minutes * 10) / 10} min\", 15)} |"
  ]

  join(rows, "\n")
}

fn format_json(r) {
  common_items = r.most_common
    |> map(fn(e) "    { \"word\": \"{e.word}\", \"count\": {e.count} }")
    |> join(",\n")

  lines = [
    "{",
    "  \"word_count\": {r.word_count},",
    "  \"unique_words\": {r.unique_words},",
    "  \"char_count\": {r.char_count},",
    "  \"line_count\": {r.line_count},",
    "  \"avg_word_length\": {toInt(r.avg_word_length * 100) / 100},",
    "  \"longest_word\": \"{r.longest_word}\",",
    "  \"reading_time_minutes\": {toInt(r.reading_time_minutes * 10) / 10},",
    "  \"most_common\": [",
    common_items,
    "  ]",
    "}"
  ]

  join(lines, "\n")
}

The top-level format_result dispatches on the ADT variant — clean, exhaustive, impossible to miss a case. Each formatter builds strings using interpolation and join.

Concepts used: Pattern matching (Chapter 5), ADTs (Chapter 6), Strings (Chapter 4), Pipes (Chapter 8)

Step 5: Putting It Together

tova
// main.tova
sample_text = "Tova is a modern programming language designed for clarity and performance. Tova compiles to JavaScript but generates optimized code that beats Go on many benchmarks. The language features pattern matching, algebraic data types, and a powerful pipe operator."

result = analyze(sample_text)

// Output in all formats
print(format_result(result, Plain))
print("")
print(format_result(result, Tabular))
Try "Text Analyzer" in Playground

What This Project Demonstrates

ConceptWhere It's Used
Immutable values (Ch 1)Every intermediate value is immutable
Expressions (Ch 1)if blocks compute avg_word_length
Functions (Ch 2)Small, focused, composable functions
Lambdas (Ch 2)Used in every map, filter, sortBy call
Collections (Ch 3)Arrays, objects, iteration, aggregation
Strings (Ch 4)Interpolation, join, split, replace, pad_end
Pattern matching (Ch 5)format_result dispatches on OutputFormat variant
Types (Ch 6)AnalysisResult record, OutputFormat ADT
Error handling (Ch 7)Guards for empty text edge cases
Pipes (Ch 8)extract_words, top_words, format_json
Modules (Ch 9)Logical separation into types/text/analysis/formatter

Extensions

Now that you have the core project, here are ways to extend it:

Add Error Handling with Result

tova
fn analyze_safe(text) {
  if len(trim(text)) == 0 {
    Err("Cannot analyze empty text")
  } else {
    Ok(analyze(text))
  }
}

Add More Statistics

  • Sentence count (split on .!?)
  • Paragraph count (split on double newlines)
  • Vocabulary richness (unique words / total words)
  • Flesch-Kincaid readability score
  • Syllable counting

Add File I/O

tova
async fn analyze_file(path) {
  content = await read_file(path)
  match content {
    Ok(text) => Ok(analyze(text))
    Err(msg) => Err("Could not read {path}: {msg}")
  }
}

Make It a CLI Tool

tova
cli {
  name: "textstat"
  version: "1.0.0"
  description: "Analyze text files"

  fn analyze(file: String, --format: String = "plain", --top: Int = 10) {
    // Read file, analyze, format output
  }
}

Final Exercises

Exercise 12.1: Add a Markdown variant to OutputFormat that outputs the analysis as a Markdown document with headers, a table for metrics, and a code block for the word frequency chart.

Exercise 12.2: Add a compare(text_a, text_b) function that analyzes both texts and produces a comparison report showing which text is longer, more varied, more complex, etc.

Exercise 12.3: Add a readability_score(text) function that estimates reading difficulty. Use average sentence length and average word length as factors.

What's Next?

Congratulations — you've mastered the Tova language. You can:

  • Write clean, expression-based code
  • Build composable functions and closures
  • Process collections with pipes and transformations
  • Model domains with types and pattern matching
  • Handle errors explicitly with Result and Option
  • Organize code into modules
  • Work with async operations
  • Optimize with @fast and @wasm

From here, explore:

The best way to solidify your skills is to build something real. Pick a project that excites you and start writing Tova.


← Previous: Performance Secrets | Next: Functional Programming →

Released under the MIT License.