Have been reading this whole time, Frangipani. I appreciate the decent research from those who take the time to weed out the trash.
I put together Neurocam this weekend:
Neurocam is a cutting-edge surveillance system for the Raspberry Pi 4 powered by the Akida neuromorphic processor with ChatGPT 4 frame analysis (stdp/neurocam on github.com).
The app uses visual wake words for low-power frame analysis. When it detects a person, it switches to YOLO for object detection and tracking.
When it detects a person, it also uses ChatGPT 4 to analyse the frame in a security context, to determine what is going on and to describe any behaviours or physical characteristics of the persons detected.
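For anyone curious how that flow hangs together, here is a rough Python sketch of the gating idea. It is not the actual Neurocam code: every name in it (make_pipeline, person_present, detect_objects, describe_scene) is made up for illustration, and the three stages are stubs standing in for the visual wake word model, YOLO and the ChatGPT call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Frame = bytes  # stand-in for a camera frame; a real app would use an image array


@dataclass
class Result:
    objects: List[str]  # labels from the object detector stage
    report: str         # text from the language model stage


def make_pipeline(
    person_present: Callable[[Frame], bool],      # cheap, runs on every frame
    detect_objects: Callable[[Frame], List[str]], # heavier, runs only when gated in
    describe_scene: Callable[[Frame], str],       # heaviest, runs only when gated in
) -> Callable[[Frame], Optional[Result]]:
    """Build a frame handler that only runs the heavy stages when a person is seen."""

    def process(frame: Frame) -> Optional[Result]:
        if not person_present(frame):
            return None  # stay in the low-power path
        return Result(objects=detect_objects(frame), report=describe_scene(frame))

    return process


# Stub stages (hypothetical) that count how often the expensive steps run.
calls = {"detect": 0, "describe": 0}


def fake_wake(frame: Frame) -> bool:
    return b"person" in frame


def fake_detect(frame: Frame) -> List[str]:
    calls["detect"] += 1
    return ["person"]


def fake_describe(frame: Frame) -> str:
    calls["describe"] += 1
    return "one person at the door"


process = make_pipeline(fake_wake, fake_detect, fake_describe)
frames = [b"empty", b"empty", b"person at door", b"empty"]
results = [process(f) for f in frames]
```

With the stubs above, only the third frame gets past the wake word check, so the two expensive stages each run once across the four frames.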
This should give you a good understanding of what will be possible once language models are running on chip, e.g. in a smart camera sensor.
For fun, the app also controls a set of RGB LEDs, flashing them when security is set to high.
Hopefully, if a few of you are sitting on devices, you should be able to get this up and running in no time.