A demo of a zero-shot image classification and detection service using OpenAI's CLIP (ViT-based) model served through a Hugging Face inference endpoint, with Pinecone vector database integration for efficient similarity search.
The advantage of zero-shot image classification is that the model does not need retraining to handle new images or labels.
Note: This is for demonstration purposes only. The code can be found here.
Creates vector embeddings for images and their associated text labels and stores them in Pinecone (an illustrative sketch of this step follows the response format below).
POST /image/embed
Content-Type: multipart/form-data
image: <image_file>
text: <text_description>
{
"status": "success",
"id": "<generated_id>"
}
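Under the hood, the embed step amounts to running the image through CLIP and upserting the resulting vector into Pinecone together with its text label. The sketch below is illustrative only: it assumes the openai/clip-vit-base-patch32 checkpoint, the pinecone Python client, and a hypothetical index name; see handler.py for the actual implementation.

# Illustrative sketch of the embed step; see handler.py for the actual code.
# Assumes the openai/clip-vit-base-patch32 checkpoint, the pinecone client,
# and a hypothetical index named "images".
import uuid

from PIL import Image
from pinecone import Pinecone
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
index = Pinecone(api_key="YOUR_API_KEY").Index("images")

def embed_image(path: str, text: str) -> str:
    """Embed an image with CLIP and upsert the vector to Pinecone with its label."""
    image = Image.open(path)
    inputs = processor(images=image, return_tensors="pt")
    vector = model.get_image_features(**inputs)[0].detach().tolist()  # 512-d for ViT-B/32
    vector_id = str(uuid.uuid4())
    index.upsert(vectors=[{"id": vector_id, "values": vector, "metadata": {"label": text}}])
    return vector_id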
Performs zero-shot object detection on images by comparing them against stored embeddings (an illustrative sketch of this step follows the response format below).
POST /image/detect
Content-Type: multipart/form-data
image: <image_file>
{
"score": <similarity_score>,
"id": "<vector_id>",
"label": "<matched_label>"
}
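Detection can then be a nearest-neighbour query against the stored vectors. The sketch below continues the embed sketch above (reusing its model, processor, and index objects) and is again illustrative rather than the service's actual code.

# Illustrative sketch of the detect step, reusing model, processor, and index
# from the embed sketch above.
def detect_image(path: str, top_k: int = 1) -> dict:
    """Embed a query image with CLIP and return its closest stored match."""
    image = Image.open(path)
    inputs = processor(images=image, return_tensors="pt")
    vector = model.get_image_features(**inputs)[0].detach().tolist()
    result = index.query(vector=vector, top_k=top_k, include_metadata=True)
    match = result.matches[0]
    return {"score": match.score, "id": match.id, "label": match.metadata["label"]}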
curl -X POST https://nesasia.io/image/embed \
-H "Content-Type: multipart/form-data" \
-F "image=@/path/to/your/image.jpg" \
-F "text=description of the image"
Example Response:
{
"status": "success",
"id": "generated-id"
}
curl -X POST https://nesasia.io/image/detect \
-H "Content-Type: multipart/form-data" \
-F "image=@/path/to/your/image.jpg"
Example Response:
{
"found": true,
"score": 0.85,
"label": "matched label"
}
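The same two calls can be made from Python with the requests library; the URL and field names below match the curl examples above.

# Python equivalent of the curl examples above.
import requests

BASE_URL = "https://nesasia.io"

# Embed an image together with a text label.
with open("/path/to/your/image.jpg", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/image/embed",
        files={"image": f},
        data={"text": "description of the image"},
    )
print(resp.json())  # {"status": "success", "id": "generated-id"}

# Ask the service for the closest stored match to a query image.
with open("/path/to/your/image.jpg", "rb") as f:
    resp = requests.post(f"{BASE_URL}/image/detect", files={"image": f})
print(resp.json())  # e.g. {"found": true, "score": 0.85, "label": "matched label"}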
For more information about zero-shot image classification using CLIP: Zero-Shot Image Classification with CLIP
See handler.py for the Hugging Face custom handler.
For details on creating a custom handler for the inference endpoint: Hugging Face create custom handler.
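handler.py is not reproduced here, but a Hugging Face Inference Endpoints custom handler generally follows the shape below: an EndpointHandler class whose __init__ loads the model from the repository path and whose __call__ receives the deserialized request. The CLIP logic and payload layout shown are assumptions for illustration, not the repository's actual handler.

# General shape of a custom handler for a Hugging Face inference endpoint.
# The CLIP logic and payload layout are illustrative; handler.py in the
# repository is the authoritative version.
import io
from typing import Any, Dict

from PIL import Image
from transformers import CLIPModel, CLIPProcessor


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the local directory of the model repository on the endpoint.
        model_id = path or "openai/clip-vit-base-patch32"
        self.model = CLIPModel.from_pretrained(model_id)
        self.processor = CLIPProcessor.from_pretrained(model_id)

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # The endpoint passes the request body as `data`; here we assume the
        # raw image bytes arrive under the "inputs" key.
        image = Image.open(io.BytesIO(data["inputs"]))
        features = self.model.get_image_features(
            **self.processor(images=image, return_tensors="pt")
        )
        return {"embedding": features[0].detach().tolist()}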
Please direct any questions to [email protected]