New Docker-Based Unstructured API Setup Causes 422 Errors in Flowise Document Loader #4062

Open
xor83 opened this issue Feb 22, 2025 · 0 comments


I'm encountering a persistent issue with Flowise’s document loader when using our new Unstructured API server (running in Docker on Ubuntu 24). Although the Unstructured API endpoint works perfectly via Postman at https://unstructured.example.com/general/v0/general, Flowise fails to load files and returns 422 errors.

Environment Details:

Operating System: Ubuntu 24
Node.js Version: v20.18.2
(Note: In earlier tests, we ran Flowise directly via npm and everything worked; the issue appears with both npm and Docker-based setups for the Unstructured API.)
Flowise Setup:
    Running via Docker (using the flowiseai/flowise image) with a Docker Compose file
    Exposed on port 3010
Unstructured API Setup:
    Running via Docker using the image downloads.unstructured.io/unstructured-io/unstructured-api:latest
    Exposed on port 8000
Other Services:
    MySQL (v8.0) and phpMyAdmin are running alongside Flowise in the same Docker Compose setup

Configuration Details:

In our Docker Compose file, Flowise and the Unstructured API share the same network. Thus, Flowise’s document loader should call the API at:

http://unstructured-api:8000/general/v0/general

The S3 loader settings in Flowise (stored in its database) include parameters such as:

bucketName: "klaspadai"
keyName: "1732025804631-sample.pdf"
chunkingStrategy: "by_title"
encoding: "utf-8"
hiResModelName: "detectron2_onnx"
skipInferTableTypes: ["pdf", "jpg", "png"]
Critical: The field unstructuredAPIUrl is not set consistently. It is sometimes empty, and at other times set to one of:
    http://unstructured-api:8000
    http://unstructured-api:8000/general/v0/general

We have attempted updating this field explicitly to:

http://unstructured-api:8000/general/v0/general
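
Because the field has held three different values (empty, base URL, full endpoint), one way to rule out URL ambiguity while testing is to normalize the value before using it. The helper below is only a sketch: the function name is hypothetical, it is not part of Flowise, and it assumes the intended endpoint path is /general/v0/general.

```shell
# Hypothetical helper (not part of Flowise): turn any of the observed
# unstructuredAPIUrl values into the full endpoint URL.
normalize_unstructured_url() {
  # Drop a single trailing slash, if present.
  nu_url="${1%/}"
  nu_endpoint="/general/v0/general"

  if [ -z "$nu_url" ]; then
    echo "error: unstructuredAPIUrl is empty" >&2
    return 1
  fi

  case "$nu_url" in
    # Already points at the endpoint: keep as-is.
    *"$nu_endpoint") printf '%s\n' "$nu_url" ;;
    # Base URL only: append the endpoint path.
    *) printf '%s%s\n' "$nu_url" "$nu_endpoint" ;;
  esac
}

# Example:
#   normalize_unstructured_url "http://unstructured-api:8000"
```

Feeding the same normalized URL to every manual test keeps a failing request from being blamed on the URL when the cause is elsewhere.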

The volume mapping for Flowise is set to map /home/developer/.flowise on the host to /root/.flowise in the container.
(Note: We updated our environment variables to use the container path /root/.flowise.)

Error Messages:

Flowise Log:

Failed to preview chunks: Error: documentStoreServices.previewChunksMiddleware - Error: documentStoreServices.previewChunks - Error: documentStoreServices.splitIntoChunks - Failed to load file /tmp/s3fileloader-bbdGLE/1732025804631-sample.pdf using unstructured loader.

Flowise Startup Error (from npm log):

Error: ENOENT: no such file or directory, mkdir '/home/developer/.flowise/logs'
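
An ENOENT from mkdir usually means a parent directory of the target path does not exist in the environment where the process runs (for example, a host path such as /home/developer/.flowise being used inside the container, where the data lives under /root/.flowise). A minimal sketch for creating the directory up front, assuming the data directory is passed as an argument; the function name is hypothetical:

```shell
# Hypothetical helper: make sure <data_dir>/logs exists before Flowise starts.
# mkdir -p creates missing parent directories and succeeds if the path
# already exists.
ensure_flowise_log_dir() {
  mkdir -p "$1/logs"
}

# Example (path as seen by the process that runs Flowise):
#   ensure_flowise_log_dir /root/.flowise
```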

Unstructured API Log:

405 Method Not Allowed (for a GET request to /general/v0/general)
422 Unprocessable Entity on a POST request to /general/v0/general
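
For context: the 405 on GET is expected for an endpoint that only accepts POST, so it does not indicate a fault by itself. A 422 generally means the request reached the endpoint but was rejected during validation, for instance because a required form field was missing. The sketch below sends a multipart POST with a `files` field, which is the form the Unstructured API's own curl examples use; the function name and the sample file path are assumptions for illustration.

```shell
# Hypothetical probe: POST a file as multipart/form-data and print only the
# HTTP status code returned by the Unstructured API.
probe_unstructured() {
  pu_base="${1%/}"   # base URL, e.g. http://unstructured-api:8000
  pu_file="$2"       # path to a local sample document

  if [ ! -f "$pu_file" ]; then
    echo "error: file not found: $pu_file" >&2
    return 2
  fi

  # -s: no progress meter, -o /dev/null: discard the body,
  # -w: print the status code, -F: send a multipart form field (implies POST).
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'accept: application/json' \
    -F "files=@${pu_file}" \
    "${pu_base}/general/v0/general"
}

# Example:
#   probe_unstructured http://unstructured-api:8000 ./sample.pdf
```

If this probe succeeds from inside the Flowise container while the loader still gets a 422, the difference is more likely in the request Flowise builds than in the network path.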

Troubleshooting Steps Taken:

Using Postman:
    Verified that https://unstructured.example.com/general/v0/general works as expected.
Configuration Changes:
    Updated Flowise’s loader configuration to set unstructuredAPIUrl explicitly to:
        http://unstructured-api:8000/general/v0/general
Volume & Permissions:
    Confirmed that the host folder /home/developer/.flowise exists and is properly mapped to /root/.flowise inside the Flowise container.
    Adjusted environment variables to use container paths (e.g., /root/.flowise/logs) to avoid ENOENT errors.
Testing with Different Setups:
    Running Flowise via npm (outside Docker) works fine.
    The issue appears with Docker-based setups (both with npm and the Docker image of the Unstructured API).

Connectivity Checks:

From inside the Flowise container, running:

    docker exec -it flowise sh
    curl http://unstructured-api:8000/general/v0/general

    confirms that the Unstructured API is reachable.
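
Note that a plain curl like this issues a GET, so it demonstrates reachability only; it does not exercise the POST path that returns 422. A small sketch for reading the status code of such a check, with hypothetical function names and a deliberately coarse classification:

```shell
# Print the HTTP status code of a GET request; curl reports 000 when no
# HTTP response was received at all.
get_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: map a status code to a rough diagnosis.
classify_status() {
  case "$1" in
    000) echo "unreachable" ;;
    2??) echo "ok" ;;
    405) echo "reachable, wrong HTTP method" ;;
    422) echo "reachable, request body rejected" ;;
    *)   echo "other" ;;
  esac
}

# Example (run inside the Flowise container):
#   classify_status "$(get_status http://unstructured-api:8000/general/v0/general)"
```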
Error Consistency:
    Despite setting the unstructuredAPIUrl in Flowise’s configuration, the S3 loader still fails to load the file from the temporary directory (e.g., /tmp/s3fileloader-bbdGLE/1732025804631-sample.pdf).

Questions:

Is there any known incompatibility between Flowise’s S3 loader and the new Docker-based Unstructured API setup?
Could the 422 errors be caused by a mismatch in the payload or headers when Flowise calls the new API?
Are there any additional configuration parameters or updates required on Flowise’s end to work with this new server setup?
Is it expected that the Unstructured API URL should be set to the full path (http://unstructured-api:8000/general/v0/general), or is there another preferred endpoint?

Any insights or suggestions to help resolve this issue would be greatly appreciated.
