Document Summarization

Build end-to-end document summarization pipelines with Jetty.

Overview

Learn how to:

Process documents via file upload
Extract text from PDFs and other formats
Generate summaries using LLMs
Handle long documents with chunking
Validate and transform outputs

Based on production workflows: gemini_chart2csv, sacred-translate-v1, my-new-task, merge
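
Chunking for long documents can be done on the client side before text is submitted. The following is a minimal sketch, assuming plain text has already been extracted and that each chunk is summarized in its own run. `chunk_text`, `max_chars`, and `overlap` are hypothetical names for illustration, not part of Jetty.

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of at most max_chars characters.

    Hypothetical helper, not a Jetty activity. Prefers to cut at a paragraph
    break when one exists in the second half of the current window.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for a paragraph break in the second half of the window.
            cut = text.rfind("\n\n", start + max_chars // 2, end)
            if cut != -1:
                end = cut
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so context carries into the next chunk,
        # but always advance by at least one character.
        start = max(end - overlap, start + 1)
    return chunks
```

Each chunk could then be summarized separately and the partial summaries merged in a final step. The default sizes here are placeholders; a suitable value depends on the model's context window.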

Basic Summarization

Workflow: Simple Document Summary

{
  "init_params": {
    "instruction": "Provide a concise summary of this document in 3-5 bullet points."
  },
  "step_configs": {
    "summarize": {
      "model": "gemini-2.0-flash",
      "activity": "gemini_file_reader",
      "asset_path": "init_params.file_paths[0]",
      "prompt_path": "init_params.instruction"
    }
  },
  "steps": ["summarize"]
}

Running with File Upload

curl -X POST "https://flows-api.jetty.io/api/v1/run-sync/my-collection/doc-summary" \
  -H "Authorization: Bearer $JETTY_API_TOKEN" \
  -F "bakery_host=https://dock.jetty.io" \
  -F 'init_params={"instruction": "Summarize this document"}' \
  -F "files=@/path/to/document.pdf"
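
When the call is scripted, hand-quoting the `init_params` JSON inside a shell string is easy to get wrong. The sketch below builds the same curl invocation as an argument list in Python and serializes `init_params` with `json.dumps`. `build_run_sync_command` is a hypothetical helper. It swaps `-F` for curl's `--form-string` on the JSON field so the value is sent literally. The URL, form fields, and header are copied from the example above.

```python
import json

# Base URL taken from the curl example above.
API_BASE = "https://flows-api.jetty.io/api/v1"


def build_run_sync_command(collection, task, file_path, init_params, token,
                           bakery_host="https://dock.jetty.io"):
    """Return a curl argument list for a synchronous run with one uploaded file.

    Hypothetical helper: it only assembles arguments and sends nothing.
    """
    return [
        "curl", "-X", "POST",
        f"{API_BASE}/run-sync/{collection}/{task}",
        "-H", f"Authorization: Bearer {token}",
        "-F", f"bakery_host={bakery_host}",
        # --form-string sends the value literally, so characters such as
        # ';' or a leading '@' inside the JSON are not interpreted by curl.
        "--form-string", f"init_params={json.dumps(init_params)}",
        "-F", f"files=@{file_path}",
    ]


# Example use (not executed here):
#   import os, subprocess
#   cmd = build_run_sync_command("my-collection", "doc-summary",
#                                "/path/to/document.pdf",
#                                {"instruction": "Summarize this document"},
#                                token=os.environ["JETTY_API_TOKEN"])
#   subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` bypasses the shell, so no extra quoting of the JSON is needed.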

Chart and Table Extraction

Extract structured data from images and charts.

Workflow: Chart to JSON

Based on production workflow: gemini_chart2csv

{
  "init_params": {},
  "step_configs": {
    "extract": {
      "model": "gemini-2.0-flash",
      "activity": "gemini_file_reader",
      "asset_path": "init_params.file_paths[0]",
      "prompt": "Convert this chart to JSON output. Return only valid JSON in the format: { \"data\": [...] }. NO BACK TICKS!"
    },
    "notify": {
      "activity": "webhook_notify",
      "webhook_url": "https://your-webhook-endpoint.com/receive"
    }
  },
  "steps": ["extract", "notify"]
}
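
Models sometimes wrap JSON in a code fence even when told not to, so it can help to parse the result defensively on the receiving side. This is a minimal sketch. It assumes the extracted text arrives as a single string (for example from the step's `outputs.text`, by analogy with the other workflows on this page). `parse_chart_json` is a hypothetical helper name.

```python
import json
import re

# Matches a whole string wrapped in a Markdown code fence, with or without
# a language tag such as "json".
_FENCE = re.compile(r"^```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def parse_chart_json(raw: str) -> list:
    """Parse model output expected to be a JSON object with a "data" list.

    Raises ValueError if the text is not valid JSON or lacks the "data" list
    (json.JSONDecodeError is a subclass of ValueError).
    """
    text = raw.strip()
    match = _FENCE.match(text)
    if match:
        # Drop the surrounding fence and keep only its contents.
        text = match.group(1)
    parsed = json.loads(text)
    if not isinstance(parsed, dict) or not isinstance(parsed.get("data"), list):
        raise ValueError('expected a JSON object with a "data" list')
    return parsed["data"]
```

A webhook receiver could call this on the incoming text and reject or retry the run when it raises.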

Translation Workflows

Translate documents while preserving structure.

Workflow: Document Translation with Validation

Based on production workflow: sacred-translate-v1

{
  "init_params": {
    "target_language": "french"
  },
  "step_configs": {
    "translate": {
      "model": "gemini-2.5-pro",
      "activity": "gemini_file_reader",
      "asset_path": "init_params.file_paths[0]",
      "prompt": "Translate this document to the target language. Preserve all formatting."
    },
    "save_translation": {
      "activity": "save_text_file",
      "file_text_path": "translate.outputs.text"
    },
    "read_original": {
      "activity": "read_text_file",
      "text_path": "init_params.file_paths[0]"
    },
    "combine_for_validation": {
      "activity": "text_concatenate",
      "text_paths": [
        "translate.outputs.text",
        "read_original.outputs.text"
      ]
    },
    "validate": {
      "model": "gemini-2.5-pro",
      "activity": "gemini_text_reader",
      "text_path": "combine_for_validation.outputs.json",
      "prompt": "Compare the translation with the original. List any missing or incorrectly translated sections."
    },
    "save_report": {
      "activity": "save_text_file",
      "file_text_path": "validate.outputs.text"
    }
  },
  "steps": ["translate", "save_translation", "read_original", "combine_for_validation", "validate", "save_report"]
}
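
With six steps, it is easy to reference a step's output before that step has run. The sketch below checks a workflow definition offline. It assumes that any key ending in `_path` or `_paths` holds dotted references whose first segment is either `init_params` or the name of an earlier step. That convention is inferred from the examples on this page, not taken from a Jetty specification, and `check_workflow` is a hypothetical helper.

```python
def check_workflow(workflow: dict) -> list[str]:
    """Return a list of problems found in a workflow definition.

    Assumption: keys ending in "_path"/"_paths" hold references of the form
    "<init_params or step name>.<rest>".
    """
    problems = []
    configs = workflow.get("step_configs", {})
    steps = workflow.get("steps", [])
    seen = set()  # steps that have already run at this point in the order

    for step in steps:
        config = configs.get(step)
        if config is None:
            problems.append(f"step '{step}' has no entry in step_configs")
            seen.add(step)
            continue
        for key, value in config.items():
            if not key.endswith(("_path", "_paths")):
                continue
            refs = value if isinstance(value, list) else [value]
            for ref in refs:
                root = ref.split(".", 1)[0]
                if root != "init_params" and root not in seen:
                    problems.append(
                        f"step '{step}': '{key}' references '{ref}' "
                        f"before '{root}' has run"
                    )
        seen.add(step)

    for name in configs:
        if name not in steps:
            problems.append(f"step_configs entry '{name}' is never listed in steps")
    return problems
```

An empty list means every reference resolves to `init_params` or to a step that appears earlier in `steps`.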

Multi-Step Processing

Chain multiple operations on documents.

Workflow: PDF to Summary to Doubled Text

Based on production workflow: my-new-task

{
  "init_params": {},
  "step_configs": {
    "read_pdf": {
      "model": "gemini-2.0-flash",
      "activity": "gemini_file_reader",
      "asset_path": "init_params.file_paths[0]",
      "prompt": "Summarize this PDF into 2 paragraphs or less"
    },
    "process": {
      "activity": "text_doubler",
      "text_path": "read_pdf.outputs.text"
    }
  },
  "steps": ["read_pdf", "process"]
}

Text Concatenation

Combine multiple text sources into one.

Workflow: Merge Multiple Inputs

Based on production workflow: merge

{
  "init_params": {
    "header": "Document Analysis Report",
    "footer": "Generated by Jetty"
  },
  "step_configs": {
    "process_doc": {
      "model": "gemini-2.0-flash",
      "activity": "gemini_file_reader",
      "asset_path": "init_params.file_paths[0]",
      "prompt": "Extract key findings from this document"
    },
    "combine": {
      "activity": "text_concatenate",
      "text_paths": [
        "init_params.header",
        "process_doc.outputs.text",
        "init_params.footer"
      ]
    }
  },
  "steps": ["process_doc", "combine"]
}

text_concatenate Reference

{
  "combine": {
    "activity": "text_concatenate",
    "text_paths": [
      "step1.outputs.text",
      "init_params.static_text",
      "step2.outputs.content"
    ]
  }
}

Output:

{
  "outputs": {
    "text": "combined text from all paths",
    "json": "combined text from all paths"
  }
}
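
To illustrate how the dotted paths line up with the output shape above, the sketch below resolves each path against a plain dictionary of `init_params` and step results, then joins the values. It is a local illustration only. `resolve_path` and `concatenate` are hypothetical helpers, and the newline separator is an assumption, since this page does not state how Jetty joins the pieces.

```python
import re

# A path segment is either a name ("outputs") or a list index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(context: dict, path: str):
    """Follow a path such as "init_params.file_paths[0]" through nested data."""
    value = context
    for name, index in _TOKEN.findall(path):
        value = value[int(index)] if index else value[name]
    return value


def concatenate(context: dict, text_paths: list[str], separator: str = "\n") -> dict:
    """Mimic the documented output shape of text_concatenate.

    The separator is an assumption made for this sketch.
    """
    text = separator.join(str(resolve_path(context, p)) for p in text_paths)
    return {"outputs": {"text": text, "json": text}}
```

A missing key or index raises `KeyError` or `IndexError`, which makes a mistyped path visible immediately.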

Gemini Model Options

| Model | Best For | Speed |
| --- | --- | --- |
| `gemini-2.0-flash` | General documents, fast processing | Fast |
| `gemini-2.5-pro` | Complex documents, accuracy-critical | Medium |
| `gemini-1.5-flash` | Large context windows | Fast |
