Beyond the Cloud: On-premise Orchestration for Open-source LLMs
Serving Ollama models on-premise from any machine in your company's datacenter, or even from the PC you use for playing video games, and orchestrating inference as easily as using Redis, the official ollama-python/ollama-js packages, and oshepherd, from your web application or your Jupyter notebook.
- Timeslot: Sunday 6th April 2025, 10:00-11:00, Room B
- Tags: AI
Introducing different options for running open-source LLMs, the motivations and reasons for doing it in-house instead of using closed-source third-party APIs, and the scenarios in which it makes sense, and presenting two examples: how to execute LLM inference from a web application, and from a Jupyter Notebook. In both cases leveraging any GPU-enabled machine you have access to, through the usage of oshepherd (https://github.com/mnemonica-ai/oshepherd), Ollama, and a Redis server.
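As a teaser, here is a minimal sketch of what inference through an oshepherd-orchestrated Ollama cluster can look like from Python. It assumes an oshepherd API server is already running and reachable; since oshepherd exposes an Ollama-compatible API, the official ollama-python client can point at it directly. The host, port, and model name are placeholders for your own setup.

```python
# Minimal sketch, assuming an oshepherd API server is reachable at this
# host (placeholder). Because oshepherd mimics the Ollama API, the
# official ollama-python client works against it unchanged.
from ollama import Client

# Point the client at the oshepherd API server instead of a local Ollama.
client = Client(host="http://localhost:5001")

# The request is queued through Redis and executed by whichever
# registered GPU-enabled worker running Ollama picks it up.
response = client.generate(model="mistral", prompt="Why is the sky blue?")
print(response["response"])
```

The same pattern applies from a web application backend or a notebook cell; only the host you point the client at changes.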
Computer Engineer with 10+ years in software development, working on e-commerce, biotech, fintech, and AI. Currently Product Labs Dev at Halborn, and previously Staff Engineer at Distro (YC 2024). My most recent notable project was a capstone report for the “Machine Learning Engineer Nanodegree” program at Udacity, “Ensemble of Generative Adversarial Networks as a Data Augmentation Technique for Alzheimer research”, which I presented at PyCon Bolivia 2022 and PyCon Italia 2023.
