
Metrics to understand for LLM production

Throughput

  • defined as the number of queries processed per second
  • Maximise throughput to make the best use of GPU resources

Latency

  • defined as the time taken per generated token
  • Minimise latency to suit the user experience

Cost

  • the cost of processing each token
  • Minimise; a sketch computing all three metrics follows below
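
As a quick illustration, the minimal sketch below derives all three metrics from one benchmark run. The function and every number in it are hypothetical; substitute your own measurements and GPU price.

```python
# Serving metrics from a single benchmark run (all numbers hypothetical).
def serving_metrics(num_queries: int, tokens_generated: int,
                    wall_clock_s: float, gpu_cost_per_hour: float):
    throughput = num_queries / wall_clock_s             # queries / second
    latency = wall_clock_s / tokens_generated           # seconds / token
    run_cost = gpu_cost_per_hour / 3600 * wall_clock_s  # dollars for the run
    cost_per_token = run_cost / tokens_generated        # dollars / token
    return throughput, latency, cost_per_token

# e.g. 1,000 queries producing 200,000 tokens in 5 minutes on a $2/hour GPU
qps, s_per_tok, usd_per_tok = serving_metrics(1_000, 200_000, 300.0, 2.0)
print(f"{qps:.2f} qps, {s_per_tok * 1e3:.2f} ms/token, ${usd_per_tok:.2e}/token")
```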

What affects the LLM metrics

  • Time spent on computation during inference
  • Time spent loading the model weights into GPU memory
  • A break-even point exists in the batch size between the cost of loading the model and the cost of computing the tokens
  • Below this break-even, latency is dominated by loading the model weights (memory-bound)
  • Above this break-even, latency is dominated by computing the tokens (compute-bound)
  • Choosing the batch size is therefore an important decision; a rough estimate of the break-even point is sketched after this list
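
As a rough sketch of where that break-even sits: assuming each decoding step reads the weights once and spends about 2 FLOPs per parameter per token (a common simplification, not from the source), the break-even batch size is roughly the GPU's compute-to-bandwidth ratio. The hardware numbers below are published A100 80GB specs; swap in your own GPU's figures.

```python
# Memory-vs-compute break-even batch size, assuming ~2 FLOPs per parameter
# per token and fp16 weights (2 bytes per parameter). A100 80GB specs below.
PEAK_FLOPS = 312e12      # fp16 tensor-core peak, FLOPs/s
MEM_BANDWIDTH = 2.0e12   # HBM bandwidth, bytes/s
BYTES_PER_PARAM = 2      # fp16 weights

def decode_step_latency(batch_size: int, n_params: float) -> float:
    """Latency of one decoding step: max of weight-loading and compute time."""
    load_time = n_params * BYTES_PER_PARAM / MEM_BANDWIDTH  # stream weights once
    compute_time = batch_size * 2 * n_params / PEAK_FLOPS   # 2*P FLOPs per token
    return max(load_time, compute_time)

# Break-even: the batch size where compute time catches up with loading time.
break_even = PEAK_FLOPS * BYTES_PER_PARAM / (2 * MEM_BANDWIDTH)
print(f"break-even batch size ~= {break_even:.0f}")  # ~156 on an A100

for b in (1, 64, 256):  # below break-even, latency barely changes with batch
    print(b, f"{decode_step_latency(b, 7e9) * 1e3:.1f} ms")  # 7B-parameter model
```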

GPU memory utilisation

  • Model weights
  • Activation space to run calculations in
  • KV cache (cached attention keys and values for previous tokens); a memory estimate is sketched after this list
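
As a back-of-envelope illustration, the sketch below estimates the weights and KV-cache components. The KV-cache formula is the standard per-layer, per-head accounting; the model dimensions used here are hypothetical, loosely 7B-class.

```python
# GPU memory estimate, assuming fp16 storage (2 bytes per value).
BYTES = 2  # fp16

def weights_gb(n_params: float) -> float:
    return n_params * BYTES / 1e9

def kv_cache_gb(n_layers, n_kv_heads, head_dim, seq_len, batch_size) -> float:
    # 2x for keys and values, stored per layer, per head, per cached token
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * BYTES / 1e9

print(f"weights:  {weights_gb(7e9):.1f} GB")                    # ~14 GB
print(f"KV cache: {kv_cache_gb(32, 32, 128, 4096, 8):.1f} GB")  # ~17 GB
# Activation workspace comes on top of these and varies with the kernels used.
```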

References

  1. LLMs in Production
  2. MLOps.community