Performance of LLM backends and models on Curnagl

TODO

  • Introduction (Cristian)
  • Backends and models tested (Margot)
  • Hardware description (Margot)
  • Inference latency results (Margot and Cristian): create one table per model, replace node names with GPU card names, and improve the column titles.

Introduction

 

Backends and models tested

 

Hardware description

 

Inference latency results