Metrics Reference
LogChef exposes comprehensive Prometheus metrics to monitor instance usage, performance, and health. These metrics provide visibility into query patterns, user activity, authentication events, and system resource utilization.
Metrics Endpoint
All metrics are available at the /metrics endpoint in Prometheus format:
GET http://localhost:8125/metrics
Configure your Prometheus server to scrape this endpoint for monitoring and alerting.
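A minimal scrape configuration might look like the sketch below. The target comes from the example URL above; the job name and scrape interval are assumptions, not LogChef defaults.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: logchef                  # hypothetical job name
    metrics_path: /metrics             # endpoint documented above
    scrape_interval: 15s               # assumed interval; tune for your setup
    static_configs:
      - targets: ["localhost:8125"]    # host:port taken from the example URL
```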
Metric Categories
LogChef emits metrics across several categories:
- HTTP Metrics: Request/response patterns and performance
- Query Metrics: Database query execution and performance
- Authentication Metrics: Login attempts and session management
- Authorization Metrics: Access control failures and patterns
- ClickHouse Metrics: Database connection and operation health
- Team & Source Metrics: Multi-tenant usage patterns
- AI Metrics: AI-powered feature usage and performance
- Histogram Metrics: Histogram generation for log data visualization
HTTP Metrics
Monitor web interface performance and usage patterns.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_http_requests_total | Total HTTP requests | Counter | method, endpoint, status, user_email, user_role |
logchef_http_request_duration_seconds | HTTP request latency distribution | Histogram | method, endpoint |
logchef_http_response_size_bytes | HTTP response size distribution | Histogram | method, endpoint |
logchef_http_errors_total | HTTP errors by type | Counter | method, endpoint, error_type, status, user_email, user_role |
logchef_http_active_requests | Currently active HTTP requests | Gauge | - |
Key Use Cases:
- Identify slow endpoints with logchef_http_request_duration_seconds
- Monitor error rates by endpoint and user
- Track API usage patterns across teams
- Set up alerts for high error rates or latency spikes
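As one way to alert on latency spikes, the rule below fires when the 95th-percentile latency of an endpoint stays high. The group name, alert name, and 1-second threshold are illustrative assumptions.

```yaml
groups:
  - name: logchef-http                 # hypothetical group name
    rules:
      - alert: SlowHTTPEndpoint        # hypothetical alert name
        # p95 latency per endpoint over 5 minutes; the 1s threshold is an assumption
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, endpoint) (rate(logchef_http_request_duration_seconds_bucket[5m]))
          ) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow HTTP endpoint"
          description: "p95 latency for {{ $labels.endpoint }} is {{ $value | humanizeDuration }}"
```

Aggregating by le and endpoint merges the per-method series, so each endpoint produces a single alert.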
Query Metrics
Track database query execution and performance patterns.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_query_total | Total queries executed | Counter | source_id, source_name, database, table, query_type, result, user_email, user_role |
logchef_query_duration_seconds | Query execution time distribution | Histogram | source_name, database, table |
logchef_query_rows_returned | Distribution of rows returned | Histogram | source_name, database, table |
logchef_query_timeouts_total | Query timeouts by source | Counter | source_id, source_name, database, table, query_type |
logchef_query_errors_total | Query errors by type | Counter | source_id, source_name, database, table, error_type |
Query Types: select, show, describe, explain, insert, create, drop, alter, other
Error Types: timeout, connection, syntax, permission, not_found, other
Key Use Cases:
- Monitor query performance per source/table
- Identify users running expensive queries
- Track query patterns and popular sources
- Set up alerts for high query failure rates
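For dashboards that track per-source and per-table performance, recording rules can precompute the expensive aggregations. This is a sketch; the group and rule names are hypothetical.

```yaml
groups:
  - name: logchef-query-recording      # hypothetical group name
    rules:
      # p95 query duration per source and table (rule name is hypothetical)
      - record: logchef:query_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, source_name, table) (rate(logchef_query_duration_seconds_bucket[5m]))
          )
      # Timeouts per second for each source
      - record: logchef:query_timeouts:rate5m
        expr: sum by (source_name) (rate(logchef_query_timeouts_total[5m]))
```

Note that the duration histogram carries only source_name, database, and table, so per-user breakdowns have to come from logchef_query_total, which includes user_email.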
Authentication Metrics
Monitor login attempts and session management.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_auth_attempts_total | Authentication attempts | Counter | method, result, user_email, user_role |
logchef_session_operations_total | Session operations (create, validate, destroy) | Counter | operation, result, user_email, user_role |
logchef_api_token_operations_total | API token operations | Counter | operation, result, user_email, user_role, token_name |
logchef_active_sessions | Currently active sessions | Gauge | - |
logchef_active_users | Currently active users | Gauge | - |
Auth Methods: session, token
Operations: create, validate, destroy, renew
Key Use Cases:
- Monitor authentication failure rates
- Track session lifecycle and active users
- Identify suspicious login patterns
- Alert on failed authentication spikes
Authorization Metrics
Track access control failures and permission violations.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_authorization_failures_total | Authorization failures | Counter | endpoint, reason, user_email, user_role |
Failure Reasons: insufficient_role, team_access_denied, source_access_denied, token_expired
Key Use Cases:
- Monitor access control violations
- Identify users attempting unauthorized access
- Track permission boundary effectiveness
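To surface users who repeatedly hit permission boundaries, an alert along these lines could work. The alert name, 10-minute window, and threshold of 20 failures are assumptions to adjust for your traffic.

```yaml
groups:
  - name: logchef-authorization        # hypothetical group name
    rules:
      - alert: AuthorizationFailureSpike   # hypothetical alert name
        # Failures per user and reason over 10 minutes; threshold is an assumption
        expr: |
          sum by (user_email, reason) (
            increase(logchef_authorization_failures_total[10m])
          ) > 20
        labels:
          severity: warning
        annotations:
          summary: "Repeated authorization failures"
          description: "{{ $labels.user_email }} had {{ $value | humanize }} failures ({{ $labels.reason }}) in 10 minutes"
```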
ClickHouse Metrics
Monitor database connection health and operation performance.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_clickhouse_connection_status | ClickHouse connection health (0=down, 1=up) | Gauge | source_id, source_name, database, table, host |
logchef_clickhouse_connection_validation_total | Connection validation attempts | Counter | source_id, source_name, database, table, host, result |
logchef_clickhouse_reconnections_total | Connection reconnection attempts | Counter | source_id, source_name, database, table, host, result |
Key Use Cases:
- Monitor ClickHouse connection health per source
- Track connection stability and reconnection patterns
- Set up alerts for database connectivity issues
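A connection that keeps dropping and recovering may never stay at 0 long enough to trip a status alert. The sketch below watches reconnection attempts instead; the alert name, window, and threshold are assumptions.

```yaml
groups:
  - name: logchef-clickhouse           # hypothetical group name
    rules:
      - alert: ClickHouseConnectionFlapping   # hypothetical alert name
        # More than 3 reconnection attempts in 15 minutes (assumed threshold)
        expr: |
          sum by (source_name, host) (
            increase(logchef_clickhouse_reconnections_total[15m])
          ) > 3
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse connection unstable"
          description: "{{ $labels.source_name }} ({{ $labels.host }}) reconnected repeatedly in the last 15 minutes"
```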
Team & Source Metrics
Track multi-tenant usage and resource access patterns.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_team_operations_total | Team management operations | Counter | team_id, team_name, operation, result, user_email, user_role |
logchef_source_operations_total | Source management operations | Counter | source_id, source_name, database, table, operation, result, user_email, user_role |
logchef_collection_operations_total | Saved query collection operations | Counter | source_name, team_name, operation, result, collection_name, user_email, user_role |
Operations: create, update, delete, list, get
Key Use Cases:
- Track team and source usage patterns
- Monitor saved query collection adoption
- Identify most active teams and sources
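Ranking teams and sources by activity usually means dropping the per-user labels first. The recording rules below are one possible shape for that; the group and rule names are hypothetical.

```yaml
groups:
  - name: logchef-tenancy-recording    # hypothetical group name
    rules:
      # Team management operations per second, by team and operation
      - record: logchef:team_operations:rate1h
        expr: sum by (team_name, operation) (rate(logchef_team_operations_total[1h]))
      # Saved query collection activity per team and source
      - record: logchef:collection_operations:rate1h
        expr: sum by (team_name, source_name) (rate(logchef_collection_operations_total[1h]))
```

A dashboard panel can then apply topk(5, ...) to either recorded series to list the most active teams.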
AI Metrics
Monitor AI-powered feature usage and performance.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_ai_operations_total | AI operations (SQL generation, suggestions) | Counter | source_name, database, table, operation, result, user_email, user_role |
logchef_ai_duration_seconds | AI operation latency distribution | Histogram | source_name, operation |
AI Operations: sql_generation, field_suggestion, query_explanation
Key Use Cases:
- Track AI feature adoption and success rates
- Monitor AI operation performance and latency
- Identify popular AI features by source/user
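Success rate and latency per AI operation can be derived as shown below. This sketch assumes the result label uses the value "success", as the query metrics do; the rule names are hypothetical.

```yaml
groups:
  - name: logchef-ai-recording         # hypothetical group name
    rules:
      # Share of successful AI operations; result="success" is an assumed label value
      - record: logchef:ai_operations:success_ratio_15m
        expr: |
          sum by (operation) (rate(logchef_ai_operations_total{result="success"}[15m]))
          /
          sum by (operation) (rate(logchef_ai_operations_total[15m]))
      # p95 latency per AI operation
      - record: logchef:ai_duration_seconds:p95_15m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, operation) (rate(logchef_ai_duration_seconds_bucket[15m]))
          )
```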
Histogram Metrics
Monitor histogram generation for log data visualization.
Metric | Description | Type | Labels |
---|---|---|---|
logchef_histogram_total | Histogram generation operations | Counter | source_id, source_name, database, table, result, user_email, user_role |
logchef_histogram_duration_seconds | Histogram generation latency | Histogram | source_name, database, table |
Key Use Cases:
- Monitor histogram generation performance
- Track visualization feature usage patterns
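Slow histogram generation tends to show up as a sluggish log explorer, so it may be worth a low-severity alert. The alert name and 5-second threshold here are assumptions.

```yaml
groups:
  - name: logchef-histogram            # hypothetical group name
    rules:
      - alert: SlowHistogramGeneration # hypothetical alert name
        # p95 generation time per source; the 5s threshold is an assumption
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, source_name) (rate(logchef_histogram_duration_seconds_bucket[5m]))
          ) > 5
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "Slow histogram generation"
          description: "p95 histogram generation for {{ $labels.source_name }} is {{ $value | humanizeDuration }}"
```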
Example Queries
Monitor Query Performance by Source
# Average query duration per source
sum by (source_name) (rate(logchef_query_duration_seconds_sum[5m])) / sum by (source_name) (rate(logchef_query_duration_seconds_count[5m]))
# Query success rate by source
sum by (source_name) (rate(logchef_query_total{result="success"}[5m])) / sum by (source_name) (rate(logchef_query_total[5m]))
Track User Activity
# Most active users by query volume
topk(10, sum by (user_email) (rate(logchef_query_total[1h])))
# Authentication failure rate by user
sum by (user_email) (rate(logchef_auth_attempts_total{result="failure"}[5m]))
Monitor HTTP Performance
# Slowest endpoints (95th percentile)
histogram_quantile(0.95, sum by (le, endpoint) (rate(logchef_http_request_duration_seconds_bucket[5m])))
# Error rate by endpoint
sum by (endpoint) (rate(logchef_http_errors_total[5m])) / sum by (endpoint) (rate(logchef_http_requests_total[5m]))
ClickHouse Health Monitoring
# Sources with connection issues
logchef_clickhouse_connection_status == 0
# Connection failure rate
rate(logchef_clickhouse_connection_validation_total{result="failure"}[5m])
Alerting Examples
High Query Failure Rate
- alert: HighQueryFailureRate
  expr: sum by (source_name) (rate(logchef_query_total{result="failure"}[5m])) / sum by (source_name) (rate(logchef_query_total[5m])) > 0.1
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High query failure rate detected"
    description: "Query failure rate is {{ $value | humanizePercentage }} for source {{ $labels.source_name }}"
Authentication Failures
- alert: HighAuthFailureRate
  expr: rate(logchef_auth_attempts_total{result="failure"}[5m]) > 5
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "High authentication failure rate"
    description: "{{ $value }} failed authentication attempts per second"
ClickHouse Connection Down
- alert: ClickHouseConnectionDown
  expr: logchef_clickhouse_connection_status == 0
  for: 30s
  labels:
    severity: critical
  annotations:
    summary: "ClickHouse connection down"
    description: "Connection to {{ $labels.source_name }} ({{ $labels.host }}) is down"
Dashboard Recommendations
For comprehensive monitoring, create dashboards tracking:
- Overview Dashboard: Key metrics, active users, query volume, error rates
- Performance Dashboard: Query latency, HTTP response times, histogram generation
- Security Dashboard: Authentication patterns, authorization failures, user activity
- Source Health Dashboard: ClickHouse connection status, per-source performance
- User Activity Dashboard: Top users, team usage patterns, AI feature adoption
The metrics provide deep visibility into LogChef usage patterns, enabling data-driven optimization and proactive issue resolution.