Zero-Shot Object Detection

A demo of a zero-shot image classification and detection service that uses OpenAI's CLIP (ViT) model through a Hugging Face Inference Endpoint, with Pinecone vector database integration for efficient similarity search.

A key advantage of zero-shot image classification is that the model does not need to be retrained to handle new images or labels.

Note: This is for demonstration purposes only. The code can be found here.

API Endpoints

1. Image Embedding (/image/embed)

Creates vector embeddings for images and their associated text labels, storing them in Pinecone.

Request Format:

POST /image/embed
Content-Type: multipart/form-data
image: <image_file>
text: <text_description>

Response:

{
    "status": "success",
    "id": "<generated_id>"
}
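
The snippet below is a minimal sketch of what the server-side embed step might look like, assuming the openai/clip-vit-base-patch32 checkpoint from transformers and the current pinecone client; the index name, API key, and helper function are placeholders, and the actual handler may differ.

import uuid
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from pinecone import Pinecone

# Assumed checkpoint and index name -- not taken from this repo.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
index = Pinecone(api_key="YOUR_API_KEY").Index("image-embeddings")

def embed_image(image_path: str, text: str) -> str:
    """Embed an image with CLIP and upsert it into Pinecone with its text label."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    vector = model.get_image_features(**inputs)[0].tolist()  # 512-dim for ViT-B/32

    vector_id = str(uuid.uuid4())
    index.upsert(vectors=[{"id": vector_id, "values": vector, "metadata": {"label": text}}])
    return vector_id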

2. Image Detection (/image/detect)

Performs zero-shot object detection on images by comparing them against stored embeddings.

Request Format:

POST /image/detect
Content-Type: multipart/form-data
image: <image_file>

Response:

{
    "score": <similarity_score>,
    "id": "<vector_id>",
    "label": "<matched_label>"
}
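
As a rough sketch of the detection step (reusing the same assumed model, processor, and index objects as the embed sketch above), the uploaded image is embedded with CLIP and matched against the stored vectors with a top-1 similarity query:

def detect(image_path: str) -> dict:
    """Embed the query image and return the closest stored vector from Pinecone."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    vector = model.get_image_features(**inputs)[0].tolist()

    result = index.query(vector=vector, top_k=1, include_metadata=True)
    if not result.matches:
        return {"found": False}
    match = result.matches[0]
    return {
        "found": True,
        "score": match.score,
        "id": match.id,
        "label": match.metadata["label"],
    }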

Curl Examples

1. Image Embed Endpoint

curl -X POST https://nesasia.io/image/embed \
-H "Content-Type: multipart/form-data" \
-F "image=@/path/to/your/image.jpg" \
-F "text=description of the image"

Example Response:

{
    "status": "success",
    "id": "generated-id"
}

2. Image Detection Endpoint

curl -X POST https://nesasia.io/image/detect \
-H "Content-Type: multipart/form-data" \
-F "image=@/path/to/your/image.jpg"

Example Response:

{
    "found": true,
    "score": 0.85,
    "label": "matched label"
}
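
The same calls can be made from Python with the requests library; the host and file paths below are the same placeholders used in the curl examples:

import requests

BASE_URL = "https://nesasia.io"

# Embed an image together with a text label.
with open("/path/to/your/image.jpg", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/image/embed",
        files={"image": f},
        data={"text": "description of the image"},
    )
print(resp.json())  # {"status": "success", "id": "..."}

# Run detection against the stored embeddings.
with open("/path/to/your/image.jpg", "rb") as f:
    resp = requests.post(f"{BASE_URL}/image/detect", files={"image": f})
print(resp.json())  # e.g. {"found": true, "score": 0.85, "label": "..."}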

Further Reading

For more information about zero-shot image classification using CLIP: Zero-Shot Image Classification with CLIP

See handler.py for the Hugging Face custom handler.

For details on creating a custom handler for the Inference Endpoint, see HuggingFace create custom handler.
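
For reference, a custom handler is a handler.py exposing an EndpointHandler class with __init__ and __call__ methods. The skeleton below is only a hedged illustration of that interface, not the handler.py used in this repo; the payload format in data["inputs"] depends on how the request is serialized.

from typing import Any, Dict, List
from transformers import CLIPModel, CLIPProcessor

class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the model repository checked out on the endpoint.
        self.model = CLIPModel.from_pretrained(path)
        self.processor = CLIPProcessor.from_pretrained(path)

    def __call__(self, data: Dict[str, Any]) -> List[float]:
        # `data["inputs"]` carries the request payload, assumed here to be a PIL image.
        image = data["inputs"]
        inputs = self.processor(images=image, return_tensors="pt")
        return self.model.get_image_features(**inputs)[0].tolist()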

Questions

Please direct any questions to [email protected]