Optional Ingestion error handling

Open teochenglim opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.

Quickwit reports ingestion errors only in its own server log when incoming data has the wrong type, for example when user_id was an integer and is now a string. Elasticsearch, by contrast, returns the error to the client and rejects the document.

This makes sense, but people may conclude that the fault is on Quickwit's side, and the client loses the opportunity to ingest the data correctly through a replay unless the original message is kept somewhere.

In fact, given Quickwit's efficiency, the rejected logs could be written to another object storage location and then re-imported into another index, or analysed later for error patterns. The idea is that once data has been sent to Quickwit it never disappears, which is what makes the system reliable.
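To illustrate the "keep the original message somewhere" idea, here is a minimal client-side sketch, not a Quickwit feature. It appends each rejected document, with a reason, to an NDJSON file and reads the records back for later analysis or re-import. The helper names `append_dead_letter` and `read_dead_letters` are hypothetical, and a local file stands in for object storage.

```python
import json
from pathlib import Path


def append_dead_letter(path: Path, doc: dict, reason: str) -> None:
    """Append a rejected document and its rejection reason as one NDJSON line."""
    record = {"reason": reason, "doc": doc}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_dead_letters(path: Path) -> list[dict]:
    """Load every stored record back, e.g. to analyse patterns or replay them."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each record keeps the untouched original document, a later replay can correct the field and send it to the same index or a different one.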

Describe the solution you'd like

Possible solution:

  1. reject back to client
  2. write to somewhere else

Error handling needs to be consistent, since there is no log aggregator (fluent-*) in front of Quickwit and no ES-style dynamic type mapping.
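Until one of these options exists on the server side, a client can approximate option 1 by checking field types before sending. The sketch below is only an illustration: `EXPECTED_TYPES` is a hypothetical schema chosen to match the example in this issue, and `find_type_errors` is not part of any Quickwit client.

```python
# Hypothetical expected schema; adjust to the index's actual doc mapping.
EXPECTED_TYPES = {"user_id": int, "level": str, "message": str}


def find_type_errors(doc: dict, expected: dict = EXPECTED_TYPES) -> list[str]:
    """Return one human-readable error per field whose type does not match."""
    errors = []
    for field, expected_type in expected.items():
        if field not in doc:
            continue  # missing fields are not treated as type errors here
        value = doc[field]
        # bool is a subclass of int in Python, so reject it unless bool is expected.
        is_stray_bool = isinstance(value, bool) and expected_type is not bool
        if is_stray_bool or not isinstance(value, expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors
```

An empty list means the document can be sent; a non-empty list is what a "reject back to client" response could carry.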

Describe alternatives you've considered

Yes: Quickwit can only be used for logs with a fixed schema, and if a log is gone, it is gone. Users may lose confidence in streaming data in.

Additional context

Step 1: ingest 1000 logs using this Python script (similar to ES, Quickwit will "guess" and cast the type).
Step 2: ingest a document with the wrong type (string) using curl, to sabotage the log ingestion.
Step 3: check the container log.

Step 1: "user_id" is an "int"

import random
import time
import requests
import sys
from datetime import datetime

# Define possible log levels and messages
LOG_LEVELS = ['INFO', 'ERROR', 'DEBUG', 'WARNING', 'CRITICAL']
LOG_MESSAGES = [
    "User logged in",
    "File not found",
    "Connection established",
    "Unexpected error occurred",
    "Resource loaded successfully",
    "Disk space is low",
    "System rebooted",
    "Timeout while connecting",
]

# Define additional fixed fields
FIXED_FIELDS = {
    "operation": ["create", "delete", "update", "read"],
    "region": ["us-east-1", "us-west-2", "eu-central-1", "ap-southeast-1"],
    "service": ["auth", "database", "storage", "compute"],
    "status_code": ["200", "404", "500", "403", "302"],
    "api_version": ["v1", "v2", "v3"],
    "method": ["GET", "POST", "PUT", "DELETE"],
    "platform": ["web", "mobile", "desktop"],
    "customer_type": ["free", "premium", "enterprise"],
    "os_version": ["windows", "macos", "linux", "android", "ios"]
}

def generate_log():
    """Generates a single randomized log entry with additional fields."""
    log_level = random.choice(LOG_LEVELS)
    message = random.choice(LOG_MESSAGES)
    timestamp = datetime.now().isoformat()
    
    # Generate a random user ID between 1 and 10,000
    user_id = random.randint(1, 10000)
    
    # Generate random values for the additional fields
    additional_fields = {field: random.choice(values) for field, values in FIXED_FIELDS.items()}
    
    log_entry = {
        "timestamp": timestamp,
        "level": log_level,
        "message": message,
        "user_id": user_id
    }
    
    # Add the fixed fields
    log_entry.update(additional_fields)
    return log_entry

def send_logs_to_quickwit(log_entry, quickwit_endpoint):
    """Sends the log entry to Quickwit using HTTP POST."""
    try:
        response = requests.post(quickwit_endpoint, json=log_entry)
        if response.status_code == 200:
            print(f"Log sent successfully: {log_entry}")
        else:
            print(f"Failed to send log: {response.status_code}, {response.text}")
    except Exception as e:
        print(f"Error sending log: {e}")

def main():
    # Command line argument for number of logs
    if len(sys.argv) != 2:
        print("Usage: python log_generator.py <num_of_logs>")
        sys.exit(1)
    
    num_of_logs = int(sys.argv[1])
    quickwit_endpoint = "http://localhost:7280/api/v1/fluentbit-logs/ingest"

    for _ in range(num_of_logs):
        log_entry = generate_log()
        send_logs_to_quickwit(log_entry, quickwit_endpoint)
        # time.sleep(0.01)

if __name__ == "__main__":
    main()

Step 2: "user_id" is now a "string"

curl -X POST http://localhost:7280/api/v1/fluentbit-logs/ingest -H "Content-Type: application/json" -d '{
  "timestamp": "2024-10-08T12:54:19.113236",
  "level": "WARNING",
  "message": "Connection established",
  "user_id": "5432abc",
  "operation": "delete",
  "region": "ap-southeast-1",
  "service": "auth",
  "status_code": "403",
  "api_version": "v3",
  "method": "GET",
  "platform": "mobile",
  "customer_type": "premium",
  "os_version": "android"
}'

Step 3: the error shows in the container log

quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: EOF while parsing an object at line 1 column 1 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "timestamp", expected a map at line 1 column 13 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "level", expected a map at line 1 column 9 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "message", expected a map at line 1 column 11 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "user_id", expected a map at line 1 column 11 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "operation", expected a map at line 1 column 13 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "region", expected a map at line 1 column 10 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "service", expected a map at line 1 column 11 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "status_code", expected a map at line 1 column 15 index_id="fluentbit-logs" source_id="_ingest-api-source"
quickwit-1        | 2024-10-08T05:15:19.395Z  WARN quickwit_indexing::actors::doc_processor: JSON parse error: invalid type: string "api_version", expected a map at line 1 column 15 index_id="fluentbit-logs" source_id="_ingest-api-source"
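
Each warning above names a single key at "line 1", and the reported columns match the indented lines of the Step 2 body. That would be consistent with the ingest endpoint reading the request body one line at a time (the Quickwit ingest API takes newline-delimited JSON), so the pretty-printed body may have been split into fragments before any type check on user_id ran. This is an inference from the log, not something confirmed in this issue. A minimal sketch for sending the same document as one line, where `to_ndjson` is a hypothetical helper:

```python
import json


def to_ndjson(docs: list[dict]) -> str:
    """Serialize documents as newline-delimited JSON: one compact object per line."""
    return "".join(json.dumps(doc, separators=(",", ":")) + "\n" for doc in docs)


# The Step 2 document, with "user_id" deliberately sent as a string.
doc = {
    "timestamp": "2024-10-08T12:54:19.113236",
    "level": "WARNING",
    "message": "Connection established",
    "user_id": "5432abc",
    "operation": "delete",
    "region": "ap-southeast-1",
    "service": "auth",
    "status_code": "403",
    "api_version": "v3",
    "method": "GET",
    "platform": "mobile",
    "customer_type": "premium",
    "os_version": "android",
}

body = to_ndjson([doc])
# To send it: requests.post(quickwit_endpoint, data=body.encode("utf-8"))
```

Sent this way, any warning that remains in the container log should concern the user_id type itself rather than fragments of the JSON object.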

teochenglim, Oct 08 '24 05:10