In recent years, large language models (LLMs) have made significant progress in generating human-like text, translating ...
A Rust, Python and gRPC server for text generation inference. Used in production at Hugging Face to power Hugging Chat, the Inference API and Inference Endpoints.
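The sketch below shows how a client might query such a server over HTTP. It is a minimal example, not an official client. It assumes a text-generation-inference server is already running and reachable at a base URL you supply, and that the server accepts a JSON POST on its `/generate` route containing `inputs` and `parameters.max_new_tokens` and returns a `generated_text` field. The helper names `build_generate_request` and `generate` are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, prompt: str, max_new_tokens: int = 20
) -> urllib.request.Request:
    """Build a POST request for a text-generation server's /generate route.

    Hypothetical helper: assumes the server expects a JSON body of the form
    {"inputs": ..., "parameters": {"max_new_tokens": ...}}.
    """
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate",  # avoid a double slash in the URL
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    base_url: str, prompt: str, max_new_tokens: int = 20, timeout: float = 60.0
) -> str:
    """Send the request and return the generated text.

    Requires a running server; assumes the JSON response has a
    "generated_text" key.
    """
    req = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["generated_text"]
```

For example, with a server listening locally (the address is only an illustration), `generate("http://127.0.0.1:8080", "What is deep learning?")` would return the text the server produced. Keeping request construction separate from sending makes it possible to check the payload without a live server.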