Ways to access client's LLM, Telemetry Data - Coggle Diagram
Ways to access client's LLM
Application-level SDK
OpenTelemetry Integration technique:
Code-Based, using the OpenTelemetry API/SDK in the client application
Workflow: prompt.send → inference.call → response.receive
Captured Telemetry Signals:
Traces
llm.prompt.send
llm.prompt.content
llm.model.name
llm.temperature
llm.response.receive
llm.response.length (characters/tokens)
llm.status (success/error)
llm.inference.call (moment when the app actually talks to the LLM API)
http.url (the address called)
network.latency.ms
http.method (GET/POST)
Events
Conversation_Start
PII_Detected
Injection_Suspected
Metrics
llm.requests.total (Counter)
llm.request.duration.ms (how long each request takes; flags abnormally slow or fast requests)
llm.tokens.inbound / outbound: How many tokens were sent vs generated
llm.errors.total (timeouts, API errors, validation failures)
host.cpu.utilization
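A minimal sketch of the code-based approach above, assuming the `opentelemetry-api` package is available (it falls back to uninstrumented calls if not); `call_llm`, the model name, and the attribute values are placeholders, not part of the source:

```python
import time

# Placeholder for the real LLM API call (an assumption for this sketch).
def call_llm(prompt: str) -> str:
    return "stubbed response"

try:
    # opentelemetry-api ships no-op implementations until an SDK is wired in,
    # so these calls are safe even before exporters are configured.
    from opentelemetry import trace, metrics
    tracer = trace.get_tracer("llm-client")
    meter = metrics.get_meter("llm-client")
    requests_total = meter.create_counter("llm.requests.total")
    duration_ms = meter.create_histogram("llm.request.duration.ms", unit="ms")
except ImportError:
    tracer = None  # run without telemetry if the package is absent

def send_prompt(prompt: str, model: str = "my-llm") -> dict:
    """Send a prompt and record the trace/metric signals listed above."""
    start = time.monotonic()
    if tracer:
        with tracer.start_as_current_span("llm.prompt.send") as span:
            span.set_attribute("llm.model.name", model)
            span.set_attribute("llm.prompt.content", prompt)
            text = call_llm(prompt)
            span.set_attribute("llm.response.length", len(text))
            span.set_attribute("llm.status", "success")
    else:
        text = call_llm(prompt)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    if tracer:
        requests_total.add(1, {"llm.model.name": model})
        duration_ms.record(elapsed_ms, {"llm.model.name": model})
    return {"text": text, "latency_ms": elapsed_ms, "status": "success"}
```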
OpenTelemetry Components Involved:
API Gateway / Proxy Instrumentation
OpenTelemetry Integration technique:
Zero-Code via Proxy Plugins
Workflow: client → proxy.receive → proxy.forward.to.LLM → proxy.send.response → client
Captured Telemetry Signals:
Traces
proxy.request.receive
http.method (GET/POST)
http.url (the endpoint requested)
client.ip (where the request came from)
proxy.upstream.call.llm
upstream.service (LLM provider like openai)
network.latency.ms (time taken to reach and return)
proxy.response.send
http.status_code
200 (success)
401 (unauthorized)
500 (server error)
response.size.bytes
Events
trace ID
Auth_Failure: if proxy blocks a request due to auth failure
Metrics
proxy.requests.total
proxy.request.duration.ms
proxy.upstream.duration.ms
proxy.http.status.count (Counter per status code): helps to detect reliability/security issues
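As a rough illustration of how a per-status-code counter surfaces reliability and security issues, the sketch below tallies proxy responses and derives a 401 failure ratio; the helper names are illustrative assumptions:

```python
from collections import Counter

def status_counts(status_codes):
    """Tally responses per HTTP status code (proxy.http.status.count)."""
    return Counter(status_codes)

def auth_failure_ratio(counts):
    """Fraction of requests rejected with 401; a sudden spike can indicate
    credential stuffing or a misconfigured client."""
    total = sum(counts.values())
    return counts.get(401, 0) / total if total else 0.0
```

For example, `auth_failure_ratio(status_counts([200, 200, 401, 500]))` returns 0.25, i.e. one in four requests failed authentication.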
OpenTelemetry Components Involved:
Inference Server Instrumentation
OpenTelemetry Integration technique:
Zero-Code via Built-In Telemetry Module
Workflow: Client sends request → request queuing → Model Inference / Token Generation → Response Assembly → Response Sent Back
Captured Telemetry Signals:
Traces
server.token.generate
tokens.generated
model.version
batch.size
server.queue.wait
server.response.build
response.size (number of tokens/characters)
Events
Server_Start: Useful for uptime monitoring and detecting restarts
Queue_Overflow: Helps detect performance bottleneck
Metrics
tokens.generated.total
token.generation.duration.ms
queue.length
host.gpu.utilization
host.cpu.utilization
server.batch.size
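The `queue.length` gauge and `Queue_Overflow` event above can be modeled with a bounded request queue; the class name, capacity, and event list are illustrative assumptions for this sketch:

```python
from collections import deque

class InferenceQueue:
    """Bounded request queue reporting queue.length and emitting a
    Queue_Overflow event when capacity is exceeded."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._queue = deque()
        self.events = []  # stand-in for emitted telemetry events

    def enqueue(self, request_id):
        if len(self._queue) >= self.capacity:
            # Rejection here is the performance bottleneck the event detects.
            self.events.append(("Queue_Overflow", {"queue.length": len(self._queue)}))
            return False
        self._queue.append(request_id)
        return True

    @property
    def length(self):
        """Current queue.length gauge value."""
        return len(self._queue)
```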
OpenTelemetry Components Involved:
Telemetry Data
Performance
llm.session.start
user_id
start_timestamp
session_id
platform_type
llm.request.enqueue
queue_depth
queue_position
llm.request.dispatch
dispatch_timestamp
dispatch_latency_ms
trace.end_to_end_latency
start_timestamp
end_timestamp
total_latency_ms
llm.trace.tokens_per_second
inference_id
tokens_generated
generation_duration_ms
tokens_per_second
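`tokens_per_second` is derived from the `tokens_generated` and `generation_duration_ms` attributes above; a minimal computation:

```python
def tokens_per_second(tokens_generated, generation_duration_ms):
    """Throughput for one inference: tokens generated per second of wall time."""
    if generation_duration_ms <= 0:
        return 0.0  # guard against a zero-length or clock-skewed duration
    return tokens_generated / (generation_duration_ms / 1000.0)

# e.g. 128 tokens generated in 1600 ms -> 80.0 tokens/second
```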
llm.inference.start
compute_resource (CPU|GPU)
memory_allocated_bytes
start_timestamp
model_version
llm.inference.execution
execution_duration_ms
compute_utilization_percentage
memory_peak_usage_bytes
Security
TRACES
llm.session.initiate
user_id
trace_id: Unique identifier for each request
llm_user_device_type
telemetry.ingest_latency_ms (int)
request.platform_type: "web" | "mobile" | "api"
llm.model.name: "my-llm"
device.os: "Windows" | "iOS" | "Android" | "Linux"
llm.request.analyze
llm.request.id
request.content_length
request_count_per_hour
rate_limit_status: "ok" | "throttled" | "blocked"
llm.request.inter_arrival_ms (int)
llm.user.authenticate
auth.attempt_timestamp
auth.token_status (enum: valid/expired/revoked)
geolocation
llm.prompt.send
llm.prompt.content
llm.model.name
llm.inference.call
http.url (the address called)
http.method (GET/POST)
network.latency.ms
llm.response.receive
llm.response.length (characters/tokens)
llm.status (success/error)
http.status_code
200 (success)
401 (unauthorized)
500 (server error)
METRICS
auth.attempts
auth_attempt_count
Labels
user_id
auth.token_status (valid|expired|revoked)
requests.platform_type
auth_failure_count
EVENTS
auth_failure
missing_credentials
invalid_token
expired_token
auth_success
auth_latency_ms
request.analyze
request_count
Labels
user_id
device.type
geolocation
llm.model.name
request_rate_per_min (gauge)
Labels: user_id, ip (rolling window)
EVENTS
request_rate_limit_violation
rate_limit_status
ok
throttled
blocked
rate_limit_request_count
request_token_count
response_token_count
request_inter_arrival_ms = timestamp_now_ms - timestamp_previous_request_ms
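The inter-arrival computation above can be tracked per user to spot machine-speed request bursts; the 100 ms threshold and helper names are illustrative assumptions:

```python
_last_request_ms = {}  # per-user timestamp of the previous request

def inter_arrival_ms(user_id, timestamp_now_ms):
    """request_inter_arrival_ms = timestamp_now_ms - timestamp_previous_request_ms.
    Returns None for a user's first observed request."""
    previous = _last_request_ms.get(user_id)
    _last_request_ms[user_id] = timestamp_now_ms
    return None if previous is None else timestamp_now_ms - previous

def looks_automated(gap_ms, threshold_ms=100.0):
    """Sub-threshold gaps suggest scripted traffic rather than a human user."""
    return gap_ms is not None and gap_ms < threshold_ms
```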
prompt_injection.check
prompt_injection_score
pii.check
pii_detected_count
pii_detected_type
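A rough sketch of how `pii_detected_count` and `pii_detected_type` might be produced; the regexes cover only email and US-style phone patterns and are illustrative, not a production-grade detector:

```python
import re

# Illustrative patterns only; real PII detection warrants a dedicated library.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def pii_check(text):
    """Return pii_detected_count and pii_detected_type for one prompt."""
    found = {name: len(p.findall(text)) for name, p in PII_PATTERNS.items()}
    detected = {name: n for name, n in found.items() if n}
    return {
        "pii_detected_count": sum(detected.values()),
        "pii_detected_type": sorted(detected),
    }
```

A positive count would typically also trigger the `PII_Detected` event listed under the application-level SDK signals.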