
Sync Jobs & Data Status

Path: /dashboard/jobs · Permission: Sync Jobs module

[Image: Sync Jobs Monitoring]

Sync Jobs track every background task that updates your knowledge base — file uploads, URL crawls, and FAQ indexing. Use this page to monitor progress, identify failures, and retry problematic jobs.


How sync jobs are created

You never need to create sync jobs manually. They are generated automatically whenever you:

  • Upload a file in Knowledge Base → Files
  • Add a URL in Knowledge Base → URLs
  • Add or update an FAQ in Knowledge Base → FAQs
  • Click Refresh Knowledge Base to trigger a full re-index
  • Re-crawl a URL to update content from a website that has changed

Each action creates one or more jobs that appear on this page immediately.


Job list columns

| Column | What it shows |
|---|---|
| Type | The kind of job: file upload, URL crawl, FAQ sync, or full re-index |
| Source | The file name or URL being processed; click to see the full path |
| Status | Current state: Pending, Processing, Completed, or Error (color-coded) |
| Duration | How long the job took from start to finish |
| Created | When the job was first queued |
| Updated | Last time the status changed |

Job statuses explained

| Status | Color | What it means | What to do |
|---|---|---|---|
| Pending | Gray | In queue, waiting for a worker to pick it up | Nothing; it will start automatically |
| Processing | Blue/Amber | Actively extracting text, chunking content, and creating vector embeddings | Wait; progress is happening |
| Completed | Green | Content is fully indexed and searchable by the bot | Nothing; the bot can now use this content |
| Error | Red | Something went wrong during processing | Read the error message, fix the cause, and retry |
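
If you keep your own notes or tooling around job results, the status table can be captured in a small lookup. The sketch below is illustrative only: the `JobStatus` enum and the helper names are hypothetical and are not part of any documented product API. Only the four status labels and their recommended actions come from the table above.

```python
from enum import Enum


class JobStatus(Enum):
    """The four statuses shown on the Sync Jobs page (hypothetical type)."""
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    ERROR = "Error"


# Recommended action per status, taken from the status table.
NEXT_STEP = {
    JobStatus.PENDING: "Nothing; it will start automatically",
    JobStatus.PROCESSING: "Wait; progress is happening",
    JobStatus.COMPLETED: "Nothing; the bot can now use this content",
    JobStatus.ERROR: "Read the error message, fix the cause, and retry",
}


def needs_attention(status: JobStatus) -> bool:
    """Only jobs in the Error state require action from you."""
    return status is JobStatus.ERROR


def summarize(statuses: list[JobStatus]) -> dict[str, int]:
    """Count jobs per status label, e.g. for a quick check after a bulk upload."""
    counts = {s.value: 0 for s in JobStatus}
    for status in statuses:
        counts[status.value] += 1
    return counts
```

For example, `summarize` applied to two Completed jobs and one Error job reports one job that still needs a fix and a retry.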

What happens during processing

When a sync job runs, the system performs these steps:

  1. Extraction — Text is extracted from the source (PDF parsing, HTML stripping, FAQ formatting).
  2. Chunking — Long content is split into smaller, overlapping segments for better search accuracy.
  3. Embedding — Each chunk is converted into a numerical vector using the embedding model.
  4. Indexing — Vectors are stored in the search engine so the bot can retrieve relevant chunks when users ask questions.

This entire process typically takes 10–60 seconds for a single file, and several minutes for large sitemaps or bulk uploads.
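
To make the chunking step concrete, here is a minimal sketch of splitting text into overlapping segments. It is not the product's actual implementation: the function name, the character-based sizes, and the default values of `chunk_size` and `overlap` are all assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into segments of up to chunk_size characters.

    Consecutive segments share `overlap` characters, so a sentence that
    straddles a boundary still appears whole in at least one segment.
    Sizes are in characters here purely for simplicity (an assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a segment reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Each segment repeats the last 2 characters of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap is what the Chunking step refers to as "overlapping segments": it trades a slightly larger index for fewer answers cut off at a segment boundary.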


Common error types and fixes

| Error | Likely cause | Fix |
|---|---|---|
| Unsupported format | File type not recognized | Convert to a supported format (PDF, DOCX, TXT, CSV, JSON, Markdown, or PPTX) and re-upload |
| Password protected | PDF or DOCX is encrypted | Remove the password protection and re-upload |
| Image-only PDF | Scanned document without selectable text | Use OCR software to convert to a text-based PDF first |
| URL blocked / 403 | Website blocks automated crawlers | Try a different URL, use a sitemap, or upload the page content as a file |
| URL timeout | Page took too long to respond | The site may be slow. Try again later, or upload the content manually |
| Empty content | URL returned no readable text | The page may be JavaScript-heavy (SPA). Upload the content as a file instead |
| Connection error | The URL is unreachable | Check that the URL is correct and publicly accessible |

To retry a failed job, click the retry button on the job row. The system re-queues it with the same source.


Supported file formats

| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Must contain selectable text (not scanned images) |
| Word | .docx | Microsoft Word documents |
| Plain text | .txt | Simple text files |
| CSV | .csv | Comma-separated data; great for product lists and pricing |
| JSON | .json | Structured data; useful for API docs |
| Markdown | .md | Technical documentation and README files |
| PowerPoint | .pptx | Text content from presentation slides |
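
Before a bulk upload, it can save time to check file extensions locally so that "Unsupported format" errors never reach the queue. The following is a rough sketch: the function names are hypothetical, and the extension set is simply copied from the table above.

```python
from pathlib import Path

# Extensions listed in the supported formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".csv", ".json", ".md", ".pptx"}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Separate files into (ready to upload, needs conversion first)."""
    ready, rejected = [], []
    for name in filenames:
        (ready if is_supported(name) else rejected).append(name)
    return ready, rejected


ready, rejected = split_by_support(["pricing.csv", "Guide.PDF", "scan.png"])
print(ready)     # ['pricing.csv', 'Guide.PDF']
print(rejected)  # ['scan.png']
```

An extension check only catches the wrong file type. A `.pdf` that is password protected or contains only scanned images will pass this check and can still end in an Error status.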

Performance tips

  • Large PDFs and sitemaps take longer — A 100-page PDF or a sitemap with 500+ URLs may take several minutes. This is normal.
  • Stagger large uploads — Uploading 50 files at once creates a long queue. Upload in batches of 5–10 for faster feedback.
  • Remove outdated content — Old documents increase index size and can cause the bot to retrieve irrelevant information. Delete files you no longer need.
  • Monitor after bulk changes — After uploading many files or refreshing the knowledge base, check Sync Jobs to confirm everything completed successfully before testing the bot.
  • Use sitemaps for websites — Instead of adding 50 individual URLs, provide the site's sitemap.xml and let the system discover all pages automatically.
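
Two of these tips lend themselves to a short script: grouping files into small batches, and counting the pages in a sitemap before you submit it. This is a sketch under stated assumptions. The helper names are made up for illustration, the batch size of 10 is just the upper end of the 5–10 range suggested above, and the parser assumes a standard `sitemap.xml` that uses the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def batches(items: list[str], size: int = 10) -> list[list[str]]:
    """Group items into consecutive batches of at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


files = [f"doc{i}.pdf" for i in range(23)]
print([len(group) for group in batches(files)])  # [10, 10, 3]

sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/a</loc></url>"
    "<url><loc>https://example.com/b</loc></url>"
    "</urlset>"
)
print(len(sitemap_urls(sample)))  # 2
```

Counting the URLs first gives a rough sense of how long the crawl will take: a few pages finish quickly, while a sitemap with 500+ URLs may take several minutes.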

FAQ

Q: A job is stuck in "Processing" for a long time.

  • Large files or sitemaps with many pages can legitimately take several minutes. If a job has been in Processing for more than 15 minutes, retry it. If it keeps failing, contact your account manager.
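
The 15-minute rule of thumb can be written down as a simple check. This is only an illustration: the function and its parameters are hypothetical, and it assumes you read the job's status and its Updated time (the last status change) from the job list yourself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold from the answer above: 15 minutes in Processing.
STUCK_AFTER = timedelta(minutes=15)


def is_stuck(status: str, updated_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a job has sat in Processing longer than the threshold.

    `updated_at` is the job's Updated timestamp and must be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return status == "Processing" and (now - updated_at) > STUCK_AFTER
```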

Q: A URL shows "Error" repeatedly.

  • The website likely blocks automated crawlers, requires authentication, or returns empty content. Try a different page URL, provide a sitemap URL instead, or upload the page content as a file.

Q: Do I need to manually trigger sync jobs?

  • No. Sync jobs are created automatically when you upload files, add URLs, or add or update FAQs. The Refresh Knowledge Base button on the Knowledge Base page triggers a full re-index of all sources.

Q: What's the difference between a sync job and a re-index?

  • A sync job processes a single new or updated source (one file, one URL, one FAQ). A re-index (triggered by Refresh Knowledge Base) rebuilds the entire vector index from all current sources. Re-indexing is useful after deleting content or when you want to ensure everything is fresh.

Q: Are sync jobs per-bot?

  • Yes. Each bot's sync jobs are independent. The jobs shown are for the currently selected bot only.