Ways to access client's LLM, Telemetry Data - Coggle Diagram
Ways to access client's LLM
Application-level SDK
OpenTelemetry Integration technique:
Code-Based, using the OpenTelemetry API/SDK in the client application
Workflow: prompt.send → inference.call → response.receive
Captured Telemetry Signals:
Traces
llm.prompt.send
llm.prompt.content
llm.model.name
llm.temperature
llm.response.receive
llm.response.length (characters/tokens)
llm.status (success/error)
llm.inference.call (moment when the app actually talks to the LLM API)
http.url (the address called)
network.latency.ms
http.method (GET/POST)
Events
Conversation_Start
PII_Detected
Injection_Suspected
Metrics
llm.requests.total (Counter)
llm.request.duration.ms (how long each request takes; flags abnormally slow or fast requests)
llm.tokens.inbound / outbound: How many tokens were sent vs generated
llm.errors.total (timeouts, API errors, validation failures)
host.cpu.utilization
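A minimal sketch of the code-based approach above, assuming the `opentelemetry-api` package is available (it falls back to uninstrumented calls if not); `call_llm`, the model name, and the attribute values are placeholders, not part of the source:

```python
import time

# Placeholder for the real LLM API call (an assumption for this sketch).
def call_llm(prompt: str) -> str:
    return "stubbed response"

try:
    # opentelemetry-api ships no-op implementations until an SDK is wired in,
    # so these calls are safe even before exporters are configured.
    from opentelemetry import trace, metrics
    tracer = trace.get_tracer("llm-client")
    meter = metrics.get_meter("llm-client")
    requests_total = meter.create_counter("llm.requests.total")
    duration_ms = meter.create_histogram("llm.request.duration.ms", unit="ms")
except ImportError:
    tracer = None  # run without telemetry if the package is absent

def send_prompt(prompt: str, model: str = "my-llm") -> dict:
    """Send a prompt and record the trace/metric signals listed above."""
    start = time.monotonic()
    if tracer:
        with tracer.start_as_current_span("llm.prompt.send") as span:
            span.set_attribute("llm.model.name", model)
            span.set_attribute("llm.prompt.content", prompt)
            text = call_llm(prompt)
            span.set_attribute("llm.response.length", len(text))
            span.set_attribute("llm.status", "success")
    else:
        text = call_llm(prompt)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    if tracer:
        requests_total.add(1, {"llm.model.name": model})
        duration_ms.record(elapsed_ms, {"llm.model.name": model})
    return {"text": text, "latency_ms": elapsed_ms, "status": "success"}
```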
OpenTelemetry Components Involved:
API Gateway / Proxy Instrumentation
OpenTelemetry Integration technique:
Zero-Code via Proxy Plugins
Workflow: client → proxy.receive → proxy.forward.to.LLM → proxy.send.response → client
Captured Telemetry Signals:
Traces
proxy.request.receive
http.method (GET/POST)
http.url (the endpoint requested)
client.ip (where the request came from)
proxy.upstream.call.llm
upstream.service (LLM provider like openai)
network.latency.ms (time taken to reach and return)
proxy.response.send
http.status_code
200 (success)
401 (unauthorized)
500 (server error)
response.size.bytes
Events
trace ID
Auth_Failure: if proxy blocks a request due to auth failure
Metrics
proxy.requests.total
proxy.request.duration.ms
proxy.upstream.duration.ms
proxy.http.status.count (Counter per status code): helps to detect reliability/security issues
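As a rough illustration of how a per-status-code counter surfaces reliability and security issues, the sketch below tallies proxy responses and derives a 401 failure ratio; the helper names are illustrative assumptions:

```python
from collections import Counter

def status_counts(status_codes):
    """Tally responses per HTTP status code (proxy.http.status.count)."""
    return Counter(status_codes)

def auth_failure_ratio(counts):
    """Fraction of requests rejected with 401; a sudden spike can indicate
    credential stuffing or a misconfigured client."""
    total = sum(counts.values())
    return counts.get(401, 0) / total if total else 0.0
```

For example, `auth_failure_ratio(status_counts([200, 200, 401, 500]))` returns 0.25, i.e. one in four requests failed authentication.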
OpenTelemetry Components Involved:
Inference Server Instrumentation
OpenTelemetry Integration technique:
Zero-Code via Built-In Telemetry Module
Workflow: Client sends request → request queuing → Model Inference / Token Generation → Response Assembly → Response Sent Back
Captured Telemetry Signals:
Traces
server.token.generate
tokens.generated
model.version
batch.size
server.queue.wait
server.response.build
response.size (number of tokens/characters)
Events
Server_Start: Useful for uptime monitoring and detecting restarts
Queue_Overflow: Helps detect performance bottleneck
Metrics
tokens.generated.total
token.generation.duration.ms
queue.length
host.gpu.utilization
host.cpu.utilization
server.batch.size
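The `queue.length` gauge and `Queue_Overflow` event above can be modeled with a bounded request queue; the class name, capacity, and event list are illustrative assumptions for this sketch:

```python
from collections import deque

class InferenceQueue:
    """Bounded request queue reporting queue.length and emitting a
    Queue_Overflow event when capacity is exceeded."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._queue = deque()
        self.events = []  # stand-in for emitted telemetry events

    def enqueue(self, request_id):
        if len(self._queue) >= self.capacity:
            # Rejection here is the performance bottleneck the event detects.
            self.events.append(("Queue_Overflow", {"queue.length": len(self._queue)}))
            return False
        self._queue.append(request_id)
        return True

    @property
    def length(self):
        """Current queue.length gauge value."""
        return len(self._queue)
```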
OpenTelemetry Components Involved:
Telemetry Data
Performance
llm.session.start
user_id
start_timestamp
session_id
platform_type
llm.request.enqueue
queue_depth
queue_position
llm.request.dispatch
dispatch_timestamp
dispatch_latency_ms
trace.end_to_end_latency
start_timestamp
end_timestamp
total_latency_ms
llm.trace.tokens_per_second
inference_id
tokens_generated
generation_duration_ms
tokens_per_second
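`tokens_per_second` is derived from the `tokens_generated` and `generation_duration_ms` attributes above; a minimal computation:

```python
def tokens_per_second(tokens_generated, generation_duration_ms):
    """Throughput for one inference: tokens generated per second of wall time."""
    if generation_duration_ms <= 0:
        return 0.0  # guard against a zero-length or clock-skewed duration
    return tokens_generated / (generation_duration_ms / 1000.0)

# e.g. 128 tokens generated in 1600 ms -> 80.0 tokens/second
```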
llm.inference.start
compute_resource (CPU|GPU)
memory_allocated_bytes
start_timestamp
model_version
llm.inference.execution
execution_duration_ms
compute_utilization_percentage
memory_peak_usage_bytes
Security
TRACES
llm.session.initiate
user_id
trace_id: Unique identifier for each request
llm_user_device_type
telemetry.ingest_latency_ms (int)
request.platform_type: "web" | "mobile" | "api"
llm.model.name: "my-llm"
device.os: "Windows" | "iOS" | "Android" | "Linux"
llm.request.analyze
llm.request.id
request.content_length
request_count_per_hour
rate_limit_status: "ok" | "throttled" | "blocked"
llm.request.inter_arrival_ms (int)
llm.user.authenticate
auth.attempt_timestamp
auth.token_status (enum: valid/expired/revoked)
geolocation
llm.prompt.send
llm.prompt.content
llm.model.name
llm.inference.call
http.url (the address called)
http.method (GET/POST)
network.latency.ms
llm.response.receive
llm.response.length (characters/tokens)
llm.status (success/error)
http.status_code
200 (success)
401 (unauthorized)
500 (server error)
METRICS
auth.attempts
auth_attempt_count
Labels
user_id
auth.token_status (valid|expired|revoked)
requests.platform_type
auth_failure_count
EVENTS
auth_failure
missing_credentials
invalid_token
expired_token
auth_success
auth_latency_ms
request.analyze
request_count
Labels
user_id
device.type
geolocation
llm.model.name
request_rate_per_min (gauge)
Labels: user_id, ip (rolling window)
EVENTS
request_rate_limit_violation
rate_limit_status
ok
throttled
blocked
rate_limit_request_count
request_token_count
response_token_count
request_inter_arrival_ms = timestamp_now_ms - timestamp_previous_request_ms
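The inter-arrival computation above can be tracked per user to spot machine-speed request bursts; the 100 ms threshold and helper names are illustrative assumptions:

```python
_last_request_ms = {}  # per-user timestamp of the previous request

def inter_arrival_ms(user_id, timestamp_now_ms):
    """request_inter_arrival_ms = timestamp_now_ms - timestamp_previous_request_ms.
    Returns None for a user's first observed request."""
    previous = _last_request_ms.get(user_id)
    _last_request_ms[user_id] = timestamp_now_ms
    return None if previous is None else timestamp_now_ms - previous

def looks_automated(gap_ms, threshold_ms=100.0):
    """Sub-threshold gaps suggest scripted traffic rather than a human user."""
    return gap_ms is not None and gap_ms < threshold_ms
```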
prompt_injection.check
prompt_injection_score
pii.check
pii_detected_count
pii_detected_type
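A rough sketch of how `pii_detected_count` and `pii_detected_type` might be produced; the regexes cover only email and US-style phone patterns and are illustrative, not a production-grade detector:

```python
import re

# Illustrative patterns only; real PII detection warrants a dedicated library.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def pii_check(text):
    """Return pii_detected_count and pii_detected_type for one prompt."""
    found = {name: len(p.findall(text)) for name, p in PII_PATTERNS.items()}
    detected = {name: n for name, n in found.items() if n}
    return {
        "pii_detected_count": sum(detected.values()),
        "pii_detected_type": sorted(detected),
    }
```

A positive count would typically also trigger the `PII_Detected` event listed under the application-level SDK signals.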