Shipping Logs with Vector (OTEL Schema)
This tutorial shows how to use Vector to collect logs and ship them to ClickHouse using the OpenTelemetry (OTEL) schema, making them ready for querying with Logchef.
Why OpenTelemetry Schema?
The OpenTelemetry schema provides a standardized format for logs across different sources, making them easier to query, analyze, and correlate. Using this schema with Logchef offers several advantages:
- Consistent field names across different log sources
- Built-in support for common query patterns
- Better integration with other observability tools
- Field-level indexing for faster queries
Prerequisites
- ClickHouse server running and accessible
- Basic understanding of your log sources
- Vector installed (or using Docker)
Step 1: Create the ClickHouse Table
First, we need to create a table in ClickHouse with the appropriate OTEL schema. This table will store all our logs:
```sql
CREATE TABLE IF NOT EXISTS default.logs
(
    timestamp DateTime64(3) CODEC(DoubleDelta, LZ4),
    severity_text LowCardinality(String) CODEC(ZSTD(1)),
    severity_number Int32 CODEC(ZSTD(1)),
    service_name LowCardinality(String) CODEC(ZSTD(1)),
    namespace LowCardinality(String) CODEC(ZSTD(1)),
    body String CODEC(ZSTD(1)),
    log_attributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX idx_severity_text severity_text TYPE set(100) GRANULARITY 4,
    INDEX idx_log_attributes_keys mapKeys(log_attributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_log_attributes_values mapValues(log_attributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_body body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (namespace, service_name, timestamp)
TTL toDateTime(timestamp) + INTERVAL 30 DAY
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;
```
Key features of this schema:
- timestamp: When the log was created (with millisecond precision)
- severity_text: Human-readable severity (INFO, WARN, ERROR, etc.)
- severity_number: Numeric severity level following OTEL standard
- service_name: The service that generated the log
- namespace: Logical grouping for services (e.g., “app”, “system”, “network”)
- body: The actual log message
- log_attributes: Map for storing additional fields specific to each log type
The CODEC options optimize compression, while the various INDEX definitions improve query performance.
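You can check whether a query actually uses these skip indexes by asking ClickHouse for its plan. A minimal sketch, assuming the table above exists and holds some data (the plan output format varies between ClickHouse versions):

```sql
-- hasToken pairs with the tokenbf_v1 index on body, so this
-- full-text style search can skip granules without the token
EXPLAIN indexes = 1
SELECT count()
FROM default.logs
WHERE hasToken(body, 'refused')
  AND timestamp >= now() - INTERVAL 1 HOUR;
```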
Step 2: Configure Vector for Collecting Logs
Now let’s set up Vector to collect and transform logs. Below is a configuration for collecting syslog data, which you can adapt for your needs:
```toml
[api]
enabled = true

# Source: Generate demo syslog events for testing
[sources.generate_syslog]
type = "demo_logs"
format = "syslog"
interval = 0.5

# Transform: Map syslog format to our OTEL schema
[transforms.remap_syslog]
inputs = ["generate_syslog"]
type = "remap"
source = '''
# Parse the syslog message into a structured format
structured = parse_syslog!(.message)

# Map to OpenTelemetry fields
.timestamp = format_timestamp!(structured.timestamp, format: "%Y-%m-%d %H:%M:%S.%f")
.body = structured.message
.service_name = structured.appname
.namespace = "syslog"

# Map severity levels to standardized values
.severity_text = if includes(["emerg", "err", "crit", "alert"], structured.severity) {
  "ERROR"
} else if structured.severity == "warning" {
  "WARN"
} else if structured.severity == "debug" {
  "DEBUG"
} else if includes(["info", "notice"], structured.severity) {
  "INFO"
} else {
  structured.severity
}

# Convert to OTEL severity numbers
# https://opentelemetry.io/docs/specs/otel/logs/data-model/#severity-fields
.severity_number = if .severity_text == "ERROR" {
  17 # ERROR
} else if .severity_text == "WARN" {
  13 # WARN
} else if .severity_text == "DEBUG" {
  5 # DEBUG
} else {
  9 # INFO
}

# Store additional syslog fields in log_attributes
.log_attributes = {
  "syslog.procid": structured.procid,
  "syslog.facility": structured.facility,
  "syslog.version": structured.version,
  "syslog.hostname": structured.hostname
}

# Clean up temporary fields
del(.message)
del(.source_type)
'''

# Sink: Send logs to ClickHouse
[sinks.clickhouse]
type = "clickhouse"
inputs = ["remap_syslog"]
endpoint = "http://clickhouse:8123"
database = "default"
table = "logs"
compression = "gzip"
healthcheck.enabled = false
skip_unknown_fields = true
```
This configuration does the following:
- Sets up a demo log source (replace with your actual log source)
- Transforms the logs to follow our OTEL schema
- Maps severity levels from syslog to standardized values
- Adds source-specific fields into the log_attributes map
- Sends the transformed logs to our ClickHouse table (see the verification query below)
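Once Vector is running with this configuration, you can verify the pipeline end to end directly from clickhouse-client. A minimal check, assuming the demo source has been emitting events for a few seconds:

```sql
-- Newest five events; all OTEL fields should be populated
SELECT timestamp, severity_text, service_name, body
FROM default.logs
WHERE namespace = 'syslog'
ORDER BY timestamp DESC
LIMIT 5;
```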
Step 3: Adapting for Real Log Sources
To collect real logs instead of demo data, replace the source section with one of these common configurations:
For File-based Logs
```toml
[sources.file_logs]
type = "file"
include = ["/var/log/**/*.log"]
read_from = "beginning"

[transforms.remap_file_logs]
inputs = ["file_logs"]
type = "remap"
source = '''
# Set basic OTEL fields
.timestamp = now()
.service_name = "file_service"
.namespace = "files"
.body = .message

# Extract severity if possible (example pattern); parse_regex is
# fallible, so capture the error instead of aborting on non-matching lines
parsed, err = parse_regex(.message, r'^(?P<time>\S+)?\s+(?P<level>INFO|DEBUG|WARN|ERROR)')
.severity_text = if err == null { parsed.level } else { "INFO" }
.severity_number = if .severity_text == "ERROR" {
  17
} else if .severity_text == "WARN" {
  13
} else if .severity_text == "DEBUG" {
  5
} else {
  9
}

# Add file source information (get_env_var is fallible, so provide a fallback)
.log_attributes = {
  "file.path": .file,
  "host.name": get_env_var("HOSTNAME") ?? "unknown"
}
'''
```
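With this transform in place, the originating file path lands in log_attributes, which makes per-file breakdowns straightforward. A hedged example, using the file.path key set in the transform above:

```sql
-- Log volume per source file over the last day
SELECT log_attributes['file.path'] AS path, count() AS lines
FROM default.logs
WHERE namespace = 'files'
  AND timestamp >= now() - INTERVAL 1 DAY
GROUP BY path
ORDER BY lines DESC;
```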
For Docker Container Logs
```toml
[sources.docker_logs]
type = "docker_logs"
include_containers = ["*"]

[transforms.remap_docker_logs]
inputs = ["docker_logs"]
type = "remap"
source = '''
# Set basic OTEL fields
.timestamp = now()
.service_name = .container_name
.namespace = "containers"
.body = .message

# Try to extract severity (coerce the message to a string so match() is infallible)
msg = string!(.message)
.severity_text = "INFO" # Default value
if match(msg, r"(?i)error|exception|fail|critical") {
  .severity_text = "ERROR"
} else if match(msg, r"(?i)warn|warning") {
  .severity_text = "WARN"
} else if match(msg, r"(?i)debug") {
  .severity_text = "DEBUG"
}

# Map to severity number
.severity_number = if .severity_text == "ERROR" {
  17
} else if .severity_text == "WARN" {
  13
} else if .severity_text == "DEBUG" {
  5
} else {
  9
}

# Add container metadata (the docker_logs source exposes the image as .image)
.log_attributes = {
  "container.name": .container_name,
  "container.image": .image,
  "container.id": .container_id
}
'''
```
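Because severity is inferred and container metadata is attached, you can spot noisy containers directly in ClickHouse. A sketch, assuming logs from this transform have been flowing for a while:

```sql
-- Error rate per container over the last hour
SELECT
    log_attributes['container.name'] AS container,
    countIf(severity_text = 'ERROR') AS errors,
    count() AS total
FROM default.logs
WHERE namespace = 'containers'
  AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY container
ORDER BY errors DESC;
```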
Step 4: Configuring Logchef to Query These Logs
Once your logs are flowing into ClickHouse, you’ll need to configure Logchef to query them:
- Log in to Logchef
- Go to Sources > Add Source
- Enter the ClickHouse connection details:
  - Host: Your ClickHouse host
  - Port: 9000 (the default ClickHouse native TCP port)
  - Database: default
  - Table: logs
- Create a team and assign this source to the team
Now you can start querying your logs with patterns like:
namespace="syslog" severity_text="ERROR"
to find error logsservice_name="my-service" body:"connection refused"
to search for specific textlog_attributes.container.name="api-server"
to find logs from a specific container
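If you ever want to run comparable queries directly against ClickHouse (for example in clickhouse-client), rough SQL equivalents look like the sketch below; the exact SQL Logchef generates may differ:

```sql
-- namespace="syslog" severity_text="ERROR"
SELECT * FROM default.logs
WHERE namespace = 'syslog' AND severity_text = 'ERROR'
ORDER BY timestamp DESC LIMIT 100;

-- service_name="my-service" body:"connection refused"
SELECT * FROM default.logs
WHERE service_name = 'my-service' AND body LIKE '%connection refused%'
ORDER BY timestamp DESC LIMIT 100;

-- log_attributes.container.name="api-server"
SELECT * FROM default.logs
WHERE log_attributes['container.name'] = 'api-server'
ORDER BY timestamp DESC LIMIT 100;
```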
Advanced: Adding Custom Dimensions
To add additional context to your logs, expand the log_attributes map:
```toml
[transforms.enrich_logs]
inputs = ["remap_syslog"] # or your transform
type = "remap"
source = '''
# Add environment information (get_env_var is fallible, so fall back with ??)
.log_attributes.environment = get_env_var("ENVIRONMENT") ?? "development"

# Add deployment information
.log_attributes.version = get_env_var("APP_VERSION") ?? "unknown"
.log_attributes.deployment = get_env_var("DEPLOYMENT_ID") ?? "unknown"

# Add custom business dimensions
.log_attributes.tenant_id = get_env_var("TENANT_ID") ?? "default"
.log_attributes.region = get_env_var("REGION") ?? "us-east-1"
'''

# Remember to point the ClickHouse sink at this transform:
# inputs = ["enrich_logs"]
```
This additional context makes it easier to filter logs in Logchef by business dimensions.
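For example, once the enrichment transform is deployed, a tenant-scoped error query becomes a simple attribute filter. A sketch using a hypothetical tenant value ('acme'):

```sql
-- Recent errors for one tenant in one region
SELECT timestamp, service_name, body
FROM default.logs
WHERE log_attributes['tenant_id'] = 'acme'
  AND log_attributes['region'] = 'us-east-1'
  AND severity_text = 'ERROR'
  AND timestamp >= now() - INTERVAL 1 DAY
ORDER BY timestamp DESC;
```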
Conclusion
By following this tutorial, you’ve set up a complete logging pipeline that:
- Collects logs from various sources
- Transforms them into a standardized OTEL schema
- Stores them efficiently in ClickHouse
- Makes them accessible for querying in Logchef
This approach gives you a scalable, standardized logging solution that can handle large volumes of logs while keeping query performance fast.
Remember that Logchef doesn’t handle log ingestion directly - it’s a purpose-built UI for querying logs stored in ClickHouse. This separation of concerns allows you to use specialized tools like Vector for collection while leveraging Logchef’s powerful query capabilities.