There's a whole world of tools to launch local LLMs out there, and these are some of the best.
llama.cpp ' that can run AI models locally now supports image input. You can input images and text at the same time to have the machine answer questions such as 'What is in this image?' server : ...