CLIP Model Document Classification Category Overlap Issue

Open rushabhT3 opened this issue 1 year ago • 0 comments

The current CLIP-based classifier does not reliably distinguish between similar document categories, particularly receipts and invoices. For documents such as detailed sales reports, the confidence scores overlap significantly between categories that should be clearly separated.

Current Categories

CLIP_CATEGORIES = [
    "receipt",
    "invoice (table with bought items and their price)",
    "cheque",
    "logo",
    "document",
    "blank document",
    "form",
    "contract",
    "letter",
    "chart",
    "graph"
]
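
For context, zero-shot CLIP classification usually converts image-text similarity logits into per-category confidence scores with a softmax. The sketch below assumes scores are derived that way here (the scoring code is not shown in this issue); `softmax_scores` is a hypothetical helper and the logit values are invented purely for illustration. It shows why two prompts that match an image almost equally well end up with nearly equal confidence.

```python
import math
from typing import Dict


def softmax_scores(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw similarity logits into scores that sum to 1."""
    # Subtract the maximum logit first for numerical stability.
    peak = max(logits.values())
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Invented logits (not measured): "receipt" and "invoice" are almost tied.
example_logits = {"receipt": 24.0, "invoice": 23.9, "cheque": 18.0}
example_scores = softmax_scores(example_logits)
for label, score in example_scores.items():
    print(f"{label}: {score:.3f}")
```

A difference of only 0.1 between two logits leaves the two categories within a few percentage points of each other, which matches the kind of overlap reported here.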

Problem

  1. Category Overlap:

    • Receipt vs Invoice: The model struggles to differentiate between these when processing sales documents
  2. Example Case: When processing a restaurant sales report containing:

    • Itemized sales data
    • Payment summaries
    • Tax calculations
    • Business information

    The model produces ambiguous confidence scores between "receipt" and "invoice" categories.
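
One way to make "ambiguous confidence scores" measurable is to look at the gap between the top two categories. The following is a minimal sketch, not part of the current implementation: `top_two_margin`, `is_ambiguous`, and the `min_margin` threshold of 0.10 are hypothetical, and the example scores are made up for illustration rather than taken from a real run.

```python
from typing import Dict, Tuple


def top_two_margin(scores: Dict[str, float]) -> Tuple[str, str, float]:
    """Return the best label, the runner-up, and the gap between their scores."""
    if len(scores) < 2:
        raise ValueError("need at least two categories to compute a margin")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (second, second_score) = ranked[0], ranked[1]
    return best, second, best_score - second_score


def is_ambiguous(scores: Dict[str, float], min_margin: float = 0.10) -> bool:
    """Flag a result when the top two categories are closer than min_margin."""
    _, _, margin = top_two_margin(scores)
    return margin < min_margin


# Made-up scores resembling the receipt/invoice overlap described above.
example = {"receipt": 0.41, "invoice": 0.38, "document": 0.21}
best, second, margin = top_two_margin(example)
print(f"{best} vs {second}: margin {margin:.2f}, ambiguous={is_ambiguous(example)}")
```

Flagged documents could be routed to manual review, and the share of flagged documents gives a simple number to compare before and after any category changes.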

Impact

  • Unreliable classification results
  • High uncertainty in document type determination
  • Reduced accuracy in automated document processing
  • Manual intervention often needed for correct categorization

Proposed Solutions

  1. Refine Category Definitions:

    • Add more specific categories like "sales_report", "financial_statement"
    • Create subcategories for business-specific documents
    • Include composite categories for hybrid documents
  2. Training Improvements:

    • Enhance training data with more diverse document examples
    • Include more restaurant-specific financial documents
    • Add clear distinguishing features between receipts and invoices
  3. Category Refinement (proposed list):

REFINED_CATEGORIES = [
    "simple_receipt",  # Basic transaction receipts
    "detailed_sales_report",  # Comprehensive business sales data
    "commercial_invoice",  # Formal billing documents
    "financial_statement",  # Detailed financial reports
    "business_document",  # General business documentation
    "form",
    "contract",
    "letter",
    "chart",
    "graph"
]
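
Because CLIP compares images against text prompts, the inline comments in the list above never reach the model. A possible approach, sketched here under the assumption that the category strings are passed to CLIP as-is, is to fold each description into the prompt itself, following the parenthesized style already used for the current invoice category. `CATEGORY_DESCRIPTIONS` and `build_prompts` are hypothetical names; the descriptions are adapted from the comments above.

```python
from typing import Dict

# Label -> short description (adapted from the comments in REFINED_CATEGORIES).
# An empty description means the label is used as the prompt unchanged.
CATEGORY_DESCRIPTIONS: Dict[str, str] = {
    "simple_receipt": "basic transaction receipt",
    "detailed_sales_report": "comprehensive business sales data",
    "commercial_invoice": "formal billing document",
    "financial_statement": "detailed financial report",
    "business_document": "general business documentation",
    "form": "",
    "contract": "",
    "letter": "",
    "chart": "",
    "graph": "",
}


def build_prompts(descriptions: Dict[str, str]) -> Dict[str, str]:
    """Map each label to a natural-language prompt string for the text encoder."""
    prompts = {}
    for label, description in descriptions.items():
        name = label.replace("_", " ")  # underscores are unusual in natural text
        prompts[label] = f"{name} ({description})" if description else name
    return prompts


for label, prompt in build_prompts(CATEGORY_DESCRIPTIONS).items():
    print(f"{label!r} -> {prompt!r}")
```

Keeping the label-to-prompt mapping explicit lets downstream code keep using the short labels while the prompt wording is tuned separately.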

Additional Context

Test document: a restaurant daily sales report containing detailed financial breakdowns, for which the confidence scores were split between the "receipt" and "invoice" categories.

Labels

  • enhancement
  • machine-learning
  • document-classification
  • CLIP-model
  • accuracy

rushabhT3 · Jan 17 '25 11:01