Debugging a flow

When you run a flow, you can monitor execution at both the flow level and the node level to identify and resolve issues.

Monitoring flow execution

The flow canvas provides a visual representation of the pipeline and its execution status.

Use the top bar to review:

  • Runtime - Indicates which execution engine is running your flow: Python or Spark.
  • Elapsed time - Displays how long the flow has been running.
  • Nodes - Summarizes how many nodes are:
    • Completed
    • Failed
    • Completed with warnings
  • Documents - Tracks processing of individual documents, listing how many are:
    • Read
    • Skipped
    • Failed

Inspecting node details

Each node in the canvas displays a status icon that indicates its current state. Use these indicators to quickly identify where issues occur. Click a node in the flow to open its details panel.

Log details

Use the Log details tab to review execution logs for the node. The logs include:

  • OrchestratorType: PYTHON or SPARK.
  • Step ID: A unique identifier for the node (Node ID).
  • A starting execution message that includes the operator name.
  • A completion message that includes the time taken (for example, "time= 3.87 seconds").
  • Schema: Lists all column names and data types.
  • Operator Metadata: A dictionary containing node metadata.
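
If you copy log text out of the Log details tab, the completion time can be pulled out programmatically. The following is a minimal sketch, not part of the product: it assumes only that the completion message contains a fragment shaped like "time= 3.87 seconds", as in the example above. The function name and the rest of the sample log line are hypothetical.

```python
import re
from typing import Optional

# Matches the completion fragment shown above, e.g. "time= 3.87 seconds".
# The layout of the rest of the log line is an assumption.
_TIME_TAKEN = re.compile(r"time=\s*([0-9]+(?:\.[0-9]+)?)\s*seconds")


def extract_time_taken(log_text: str) -> Optional[float]:
    """Return the last reported duration in seconds, or None if none is found."""
    matches = _TIME_TAKEN.findall(log_text)
    return float(matches[-1]) if matches else None


# Hypothetical log excerpt; only the "time= 3.87 seconds" fragment comes from the text.
sample_log = "Completed execution of operator example_operator, time= 3.87 seconds"
print(extract_time_taken(sample_log))
```

Comparing this value across nodes is one way to spot the slowest step in a flow.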

Node summary

Use the Node summary tab to review node-level metrics and results, including:

  • Node status:
    • Completed - Finished successfully
    • Failed - Encountered an error
    • Skipped - Was not executed
    • Running - Currently processing
    • Completed With Warnings - Finished with warnings
    • Completed With Errors - Finished, but some documents failed
    • Pending - Waiting to start
  • Documents in scope - Total documents available for this operator to process
  • Completed docs count - Number of documents successfully processed by this operator
  • Processed docs - Total documents processed (successful and failed)
  • Failed docs - Number of documents that failed in this operator
  • Skipped docs - Number of documents skipped
  • Page type stats - Number of pages processed, by format
  • Total pages converted - Total number of pages processed
  • Total conversion time in seconds
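
The metrics above can be cross-checked when you triage several nodes. The sketch below is illustrative only: the NodeSummary class, its field names, and the review_node function are hypothetical, and the values are assumed to be copied by hand from the Node summary tab. It relies only on the statuses listed above and on the description of Processed docs as successful plus failed documents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSummary:
    # Hypothetical field names mirroring the metrics in the Node summary tab.
    name: str
    status: str
    documents_in_scope: int
    completed_docs: int
    processed_docs: int
    failed_docs: int
    skipped_docs: int


# Statuses from the list above that usually deserve a closer look.
NEEDS_ATTENTION = {"Failed", "Completed With Warnings", "Completed With Errors"}


def review_node(node: NodeSummary) -> List[str]:
    """Return human-readable findings for one node; an empty list means nothing stood out."""
    findings = []
    if node.status in NEEDS_ATTENTION:
        findings.append(f"status is {node.status}")
    if node.failed_docs > 0:
        findings.append(f"{node.failed_docs} document(s) failed")
    # Processed docs is described as successful plus failed documents.
    if node.processed_docs != node.completed_docs + node.failed_docs:
        findings.append("processed docs does not equal completed + failed")
    return findings


# Hypothetical values for illustration.
node = NodeSummary("example_operator", "Completed With Errors", 10, 7, 9, 2, 1)
for finding in review_node(node):
    print(f"{node.name}: {finding}")
```

A node that returns no findings is usually safe to skip while you look for the source of a failure.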

To inspect intermediate output, click View table in the Node output section.

The table preview feature might impact performance. To disable it:

  1. In the flow canvas, click the Flow properties icon on the toolbar.
  2. Clear the Enable node output preview for the flow option.
  3. Click Save.