An easily extensible API for prototyping with ML models that analyze and generate content from images.
Note: This project is under active development. Expect breaking changes. Use at your own risk.
Captylize is a simple API designed to make prototyping with Hugging Face models and other image analysis models easy. It provides a straightforward interface for:
- Analyzing images (using classification models, e.g. for age, emotion, and NSFW content)
- Generating captions for images (for tagging datasets or building prompts)
- Detecting objects and faces in images (coming soon™)
- Image captioning using VIT-GPT2 and Florence-2 models
- Age estimation
- Emotion detection
- NSFW content detection
- Support for both image URL and file upload inputs
- Easy-to-use REST API endpoints
For information about the models used, see the MODELS.md file.
- Implement basic image captioning with VIT-GPT2 model
- Add support for advanced captioning with Florence-2 model
- Implement age estimation endpoint
- Implement emotion detection endpoint
- Implement NSFW content detection endpoint
- Support both image URL and file upload inputs
- Set up FastAPI framework with proper routing and dependency injection
- Implement model manager for easy model loading and unloading (see the illustrative sketch after this list)
- Create basic error handling and input validation
- Add /detection endpoint for object & face detection
- Remove confusion about the 'default models': centralize their configuration so it propagates to the /docs page
- Add a /models endpoint that returns all models, loaded models, (tasks, general config?)
- Add a /health endpoint to gain an overview of the service and resource usage
- Add support for batched inputs
- Add a proper Prediction response object (metrics, batched predictions, what else?)
- Figure out an easier way to add new models
- Add Docker support
- Don't forget to have fun!
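The roadmap above mentions wiring a model manager into FastAPI's dependency injection so models can be loaded lazily and unloaded on demand. The snippet below is a minimal illustrative sketch of that pattern, not the actual Captylize implementation; the names (ModelManager, get_model_manager, the "vit-gpt2" key) and the query parameter are assumptions.

```python
# Illustrative sketch only: a model manager behind FastAPI dependency injection.
# Names and parameters here are hypothetical, not Captylize's actual code.
from functools import lru_cache

from fastapi import Depends, FastAPI
from transformers import pipeline

app = FastAPI()


class ModelManager:
    """Lazily loads models by name and caches them for reuse."""

    def __init__(self):
        self._models = {}

    def get(self, name: str):
        # Load on first use; later calls reuse the cached pipeline.
        if name not in self._models:
            if name == "vit-gpt2":
                self._models[name] = pipeline(
                    "image-to-text", model="nlpconnect/vit-gpt2-image-captioning"
                )
            else:
                raise ValueError(f"Unknown model: {name}")
        return self._models[name]

    def unload(self, name: str):
        # Drop the reference so the model can be freed.
        self._models.pop(name, None)


@lru_cache
def get_model_manager() -> ModelManager:
    # One shared manager instance for the whole app.
    return ModelManager()


@app.post("/api/v1/generations/captions/vit")
def caption_image(image_url: str, manager: ModelManager = Depends(get_model_manager)):
    captioner = manager.get("vit-gpt2")
    result = captioner(image_url)
    return {"caption": result[0]["generated_text"]}
```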
It is recommended to install the project dependencies in a virtual environment.
First, create and activate a virtual environment with Python 3.11 or later.
Then install PyTorch 2 or later (the project is developed against PyTorch 2.4.1); follow the instructions on the PyTorch website to install the CPU or GPU build.
Then install the project dependencies:
With pip:
pip install -r requirements.txt
With Poetry:
poetry install
To run the API locally (in development mode), execute the run_dev.sh script, or run the command:
uvicorn captylize.main:app --reload
Basic usage examples (see the example request after this list):
- Image Captioning: POST /api/v1/generations/captions/vit and POST /api/v1/generations/captions/florence-2
- Age Estimation: POST /api/v1/analyses/ages
- Emotion Detection: POST /api/v1/analyses/emotions
- NSFW Detection: POST /api/v1/analyses/nsfw
- Object Detection (coming soon™): POST /api/v1/detections/objects
- Face Detection (coming soon™): POST /api/v1/detections/faces
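The exact request schema is best checked on the /docs page; as a rough illustration, a call with Python's requests library might look like the sketch below. The field names image_url and image are assumptions, not confirmed parameter names.

```python
# Illustrative request sketch; verify parameter names against the /docs page.
import requests

BASE_URL = "http://localhost:8000"

# Option 1: analyze an image by URL (assumed field name: image_url).
resp = requests.post(
    f"{BASE_URL}/api/v1/analyses/emotions",
    params={"image_url": "https://example.com/face.jpg"},
)
print(resp.json())

# Option 2: upload an image file (assumed multipart field name: image).
with open("face.jpg", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/api/v1/analyses/emotions",
        files={"image": ("face.jpg", f, "image/jpeg")},
    )
print(resp.json())
```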
For detailed API documentation, run the server and visit /docs or /redoc.
This project and the code within it are licensed under the MIT License. Models referenced in this project are licensed under their respective licenses.