VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 Vision-Language Model. Includes a Gradio-based interface for querying and analyzing video footage.
Terminal score: 0–1000 raw, weighted across 4 dimensions. Public score: 0–10 normalized (shown in the 30-day stars chart above).