Talks

Many companies are interested in running open large language models such as Gemma and DeepSeek because doing so gives them full control over deployment options, the timing of model upgrades, and the private data that goes into the model. Ollama is an open source LLM inference server. In this 15-minute demo, I'll show you how to run Ollama cost-efficiently on serverless GPUs that scale up and down rapidly, including down to zero when there are no incoming requests.
Wietse Venema
Google
Wietse Venema is an engineer at Google Cloud. He wrote the O’Reilly book on Cloud Run.