My take on Agentic Object Detection
Code: https://github.com/maylad31/agentic-object-detection

Here are the steps:
Segmenting everything with SAM: we detect everything first and worry about filtering later.
Filtering with CLIP
Adding reasoning with a model like GPT-4o
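
The three steps above can be sketched as one small pipeline. This is only a rough outline, not the code from the repo: `Region`, `detect`, and the three callables (`segment`, `score_region`, `reason`) are hypothetical names, and the default threshold of 0.25 is an arbitrary placeholder. In practice `segment` would wrap SAM, `score_region` would wrap CLIP, and `reason` would call an LLM such as GPT-4o.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Region:
    """A candidate region proposed by the segmentation stage (hypothetical type)."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    score: float = 0.0              # prompt similarity, filled in by the filter stage


def detect(
    image: Any,
    prompt: str,
    segment: Callable[[Any], List[Region]],
    score_region: Callable[[Any, Region, str], float],
    reason: Callable[[Any, List[Region], str], List[Region]],
    threshold: float = 0.25,  # assumed cutoff, tune per model and dataset
) -> List[Region]:
    # Stage 1: propose every region, with no filtering yet.
    regions = segment(image)

    # Stage 2: keep only regions whose similarity to the prompt clears the threshold.
    kept = []
    for region in regions:
        region.score = score_region(image, region, prompt)
        if region.score >= threshold:
            kept.append(region)

    # Stage 3: let the reasoning model make the final call on the survivors.
    return reason(image, kept, prompt)
```

Passing the stages in as callables keeps the outline independent of any particular SAM, CLIP, or LLM client, so each one can be swapped or stubbed out while experimenting.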
Here is what I did with SAM and CLIP. The next step is to put a good LLM on top to add reasoning.
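
To make the SAM plus CLIP filtering idea concrete, here is a minimal sketch of the scoring step on its own. It assumes each SAM crop and the text prompt have already been turned into embedding vectors by CLIP; the function names, the toy vectors, and the threshold are illustrative assumptions, not taken from the repo.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(
    crop_embeddings: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
    threshold: float = 0.25,  # assumed cutoff, not a recommended value
) -> List[Tuple[int, float]]:
    """Return (crop index, similarity) pairs at or above the threshold, best first."""
    scored = [
        (i, cosine_similarity(emb, text_embedding))
        for i, emb in enumerate(crop_embeddings)
    ]
    kept = [(i, sim) for i, sim in scored if sim >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy 2-D vectors standing in for real CLIP embeddings.
crops = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompt = [1.0, 0.0]
matches = filter_by_similarity(crops, prompt, threshold=0.5)
```

With these toy vectors, the second crop is orthogonal to the prompt and gets dropped, while the first and third survive. The crops that remain are the ones worth handing to the LLM for the reasoning step.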