End-to-End Latency Decomposition in AI Web Applications: Rethinking Infrastructure in LLM-Based Systems

Authors

DOI:

https://doi.org/10.7251/

Keywords:

AI web applications; latency decomposition; large language models (LLM); serverless computing; virtual private server (VPS); end-to-end latency; performance evaluation; cloud computing; AI systems; benchmarking

Abstract

The increasing integration of artificial intelligence into web applications, particularly through large language models (LLMs), has fundamentally reshaped the performance characteristics of modern systems. Unlike traditional architectures, where latency is primarily determined by backend infrastructure, AI-driven applications operate as multi-stage pipelines involving orchestration logic, network communication, and external model inference.

This paper introduces an end-to-end latency decomposition framework for analyzing performance in AI-powered web applications. A controlled experimental study is conducted using two production-equivalent implementations deployed in serverless and virtual private server (VPS) environments. The methodology distinguishes full-stack execution, which includes LLM inference, from infrastructure-only execution, which excludes it. This enables precise isolation of latency contributions across the infrastructure, application, and model layers.
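The layer split can be illustrated with a short sketch. It assumes each request is instrumented with five timestamps. The `RequestTrace` fields, their boundaries, and the helper names are hypothetical and are not taken from the paper. In this sketch, the model interval also includes the network round trip to the model provider. In an infrastructure-only run, the model call is stubbed out, so its interval collapses to zero.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request timestamps, in milliseconds from a monotonic clock."""
    received: float       # request reaches the deployment (serverless gateway or VPS)
    handler_start: float  # application handler begins executing
    llm_sent: float       # outbound request to the model API is issued
    llm_done: float       # model response has been fully received
    responded: float      # response has been written back to the client


def decompose(t: RequestTrace) -> dict[str, float]:
    """Split one request's end-to-end latency into three non-overlapping layers."""
    layers = {
        # Routing, cold start, and queuing before user code runs.
        "infrastructure": t.handler_start - t.received,
        # Orchestration work before and after the model call.
        "application": (t.llm_sent - t.handler_start) + (t.responded - t.llm_done),
        # Model inference, including the network round trip to the provider (assumption).
        "model": t.llm_done - t.llm_sent,
    }
    # The three intervals partition [received, responded], so they sum to the total.
    layers["total"] = t.responded - t.received
    return layers


def layer_shares(traces: list[RequestTrace]) -> dict[str, float]:
    """Fraction of aggregate end-to-end time attributable to each layer."""
    sums = {"infrastructure": 0.0, "application": 0.0, "model": 0.0, "total": 0.0}
    for t in traces:
        for layer, ms in decompose(t).items():
            sums[layer] += ms
    grand_total = sums.pop("total")
    return {layer: ms / grand_total for layer, ms in sums.items()}


if __name__ == "__main__":
    # Illustrative timestamps only; these are not measurements from the study.
    full_stack = RequestTrace(received=0, handler_start=20, llm_sent=50, llm_done=900, responded=1000)
    # Infrastructure-only run: the model call is stubbed, so llm_sent == llm_done.
    infra_only = RequestTrace(received=0, handler_start=20, llm_sent=50, llm_done=50, responded=60)
    print(layer_shares([full_stack]))
    print(layer_shares([infra_only]))
```

The shares are computed from summed layer times rather than from per-layer medians. Medians of the individual layers do not generally add up to the median total.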

The results indicate that in full-stack scenarios, model-related latency dominates system performance, accounting for approximately 85% of total response time and thereby diminishing the relative impact of infrastructure differences. In contrast, infrastructure-only scenarios reveal significant performance variations between deployment environments.
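A simple Amdahl-style bound explains why infrastructure differences become hard to see in full-stack runs. It assumes that model latency does not depend on the deployment environment. Let the model share be ρ ≈ 0.85, as reported above. Then any infrastructure or application improvement Δ can reduce total response time by at most the remaining non-model share:

```latex
\begin{aligned}
T_{\text{total}} &= T_{\text{infra}} + T_{\text{app}} + T_{\text{model}},
\qquad \rho = \frac{T_{\text{model}}}{T_{\text{total}}} \approx 0.85, \\
\frac{\Delta}{T_{\text{total}}} &\le \frac{T_{\text{infra}} + T_{\text{app}}}{T_{\text{total}}} = 1 - \rho \approx 0.15 .
\end{aligned}
```

Even if infrastructure latency were eliminated entirely, end-to-end response time would fall by no more than about 15% under this assumption.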

These findings challenge infrastructure-centric optimization approaches and demonstrate the need for system-level performance evaluation in LLM-based applications. The proposed framework provides a practical methodology for identifying performance bottlenecks and offers actionable insights for optimizing AI-driven web systems.

Published

2026-06-29

Issue

Section

Original Research Papers