Scale-to-zero LLM inference: Cost-efficient open model deployment on serverless GPUs Similarity score = 0.68 More