Beyond the Cloud: On-premise Orchestration for Open-source LLMs

Experience Level: intermediate
Language: english

This talk introduces different options for running open-source LLMs, the motivations for doing it in-house instead of relying on closed-source third-party APIs, and the scenarios in which that makes sense. It then walks through two examples: executing LLM inference from a web application, and from a Jupyter Notebook. In both cases, inference runs on any GPU-enabled machine you have access to, using oshepherd (https://github.com/mnemonica-ai/oshepherd), Ollama, and a Redis server.
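As a taste of the client side, here is a minimal sketch, assuming an Ollama-compatible endpoint (such as the one an oshepherd setup exposes) is already running and reachable; the host below is a hypothetical placeholder, and the exact orchestrator endpoints may differ, so check the oshepherd README.

```python
# Minimal sketch: sending a prompt to a remote Ollama-compatible server.
# Assumes the ollama Python package is installed (pip install ollama) and
# that an endpoint is reachable at the host below (hypothetical address;
# adjust to your own setup and port).
import ollama

client = ollama.Client(host="http://my-gpu-box:11434")  # hypothetical host

response = client.generate(
    model="mistral",  # any model already pulled on the worker machine
    prompt="Summarize the benefits of on-premise LLM inference.",
)
print(response["response"])
```

The same snippet works from a web application's backend or from a Jupyter Notebook cell, which is the point of the two examples: the client code stays identical while the GPU machine doing the work can live anywhere you control.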


Raul Pino

Computer Engineer with 10+ years in software development, working across e-commerce, biotech, fintech, and AI. Currently Product Labs Dev at Halborn, and previously Staff Engineer at Distro (YC 2024). A recent highlight was my capstone report for the "Machine Learning Engineer Nanodegree" program at Udacity, "Ensemble of Generative Adversarial Networks as a Data Augmentation Technique for Alzheimer research", which I presented at PyCon Bolivia 2022 and PyCon Italia 2023.

raul-pino