Scoop icon indicating copy to clipboard operation
Scoop copied to clipboard

Search with fast fail to cut 40% of proccessing

Open refack opened this issue 1 year ago • 8 comments

Description

Motivation and Context

Speed

How Has This Been Tested?

  1. Step debugger
  2. CLI inteface
  3. tests

Checklist:

  • [X] I have read the Contributing Guide.
  • [X] I have ensured that I am targeting the develop branch.
  • [X] I have updated the documentation accordingly.
  • [ ] I have updated the tests accordingly.
  • [x] I have added an entry in the CHANGELOG.

Summary by CodeRabbit

  • Refactor

    • Improved search performance with a fast-fail check to skip irrelevant manifests, reducing latency.
    • Consolidated and simplified search logic by removing legacy fallback paths and unifying the search flow.
    • Made version and binary extraction consistent across searches, improving result accuracy.
  • Documentation

    • Added an entry to the changelog describing the refactor.

refack avatar Feb 10 '25 23:02 refack

Since this takes 60% time from current master IMHO we can drop the [System.Text.Json.JsonDocument] shenanigans

refack avatar Feb 10 '25 23:02 refack

anecdotal pref (3 buckets - main, extras, versions):

PS C:\bin\tools\scoop\apps\scoop\current> timeit {scoop search proton}
Results from local buckets...
00:00:02.1110227

Name               Version Source Binaries
----               ------- ------ --------
proton-mail-export 1.0.4   extras
protonmail-bridge  3.17.0  extras

PS C:\bin\tools\scoop\apps\scoop\current> git pull
From https://github.com/refack/scoop
 * branch              search-fast-fail -> FETCH_HEAD
Updating ac6f1baa..c60c682c
Fast-forward
 libexec/scoop-search.ps1 | 75 +++---------------------------------------------
 1 file changed, 4 insertions(+), 71 deletions(-)
PS C:\bin\tools\scoop\apps\scoop\current> timeit {scoop search proton}
Results from local buckets...
00:00:01.3423720

Name               Version Source Binaries
----               ------- ------ --------
proton-mail-export 1.0.4   extras
protonmail-bridge  3.17.0  extras

PS C:\bin\tools\scoop\apps\scoop\current>

refack avatar Feb 10 '25 23:02 refack

Hay @coderabbitai can you take a look?

refack avatar Nov 15 '25 23:11 refack

[!TIP] For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch PR details and diff
gh pr view 6284 --repo ScoopInstaller/Scoop --json title,body,files,additions,deletions

# Get the diff for the PR
gh pr diff 6284 --repo ScoopInstaller/Scoop

Length of output: 5521


@refack, I've reviewed the PR and this is an excellent optimization! 🎯

Key Findings

Strengths:

  1. Smart fast-fail approach: The pre-filter if ($content -notmatch $query) { return } is brilliant - it avoids expensive JSON parsing for files that don't contain the search term at all. This is where your 36% performance gain comes from.

  2. Significant simplification: Removing the System.Text.Json.JsonDocument complexity and consolidating to a single code path makes the code much more maintainable.

  3. Measurable performance improvement: 2.11s → 1.34s is substantial for a search operation.

Suggestions:

  1. Rename the function: Since search_bucket_legacy is now the only implementation (not a fallback), consider renaming it to just search_bucket:
function search_bucket($bucket, $query) {
    $apps = Get-ChildItem (Find-BucketDirectory $bucket) -Filter '*.json' -Recurse

    $apps | ForEach-Object {
        $content = [System.IO.File]::ReadAllText($_.FullName)
        # Fast-fail: skip JSON parsing if search term not found in raw content
        if ($content -notmatch $query) { return }
        $manifest = ConvertFrom-Json $content -ErrorAction Continue
        # ... rest of logic
    }
}
  1. Add a comment: Consider adding a brief comment explaining the fast-fail optimization for future maintainers (as shown above).

The optimization strategy is sound - the regex pre-filter will catch all real matches (avoiding false negatives) while any false positives are harmlessly filtered by the subsequent matching logic. Great work on this performance improvement!

Would you like me to help with any specific aspect of this PR?


🧠 Learnings used
Learnt from: z-Fng
Repo: ScoopInstaller/Scoop PR: 6473
File: libexec/scoop-download.ps1:135-135
Timestamp: 2025-08-31T01:39:35.684Z
Learning: In the Scoop codebase, `Invoke-ScoopDownload` in lib/download.ps1 calls `abort` when hash verification fails, which terminates script execution completely rather than trying remaining mirrors. The scoop-download.ps1 script should maintain consistency with this behavior by using `continue app_loop` to skip to the next app rather than trying remaining mirrors for the current app.

coderabbitai[bot] avatar Nov 15 '25 23:11 coderabbitai[bot]

Walkthrough

Refactored scoop-search to remove the PowerShell‑5 legacy fallback and a JSON-specific bin matcher; search now reads manifest text, fast-fails if the query text is absent, then parses JSON once with Version from the parsed manifest and uses the unified bin_match flow. No other functional changes.

Changes

Cohort / File(s) Summary
Changelog Update
CHANGELOG.md
Added Code Refactoring entry describing the fast-fail search mechanism and simplification of search logic.
Search Logic Refactor
libexec/scoop-search.ps1
Removed bin_match_json() and search_bucket_legacy(). Consolidated into a single search_bucket() flow that reads manifest as text, performs an early text-based query pre-check (fast-fail), parses once via ConvertFrom-Json, derives Version from manifest.Version, and populates Binaries via the unified bin_match. Eliminated conditional/System.Text.Json detection and fallback paths.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant scoop_search as scoop-search.ps1
    participant Manifest as manifest.json

    User->>scoop_search: submit search query
    scoop_search->>Manifest: Read manifest content (text)
    scoop_search->>scoop_search: Fast-fail pre-check for query text
    alt query found in text
        scoop_search->>scoop_search: ConvertFrom-Json (parse once)
        scoop_search->>scoop_search: Extract Version from manifest.Version
        scoop_search->>scoop_search: Run unified bin_match -> populate Binaries
        scoop_search->>User: Return matching results
    else not found
        scoop_search->>User: No results
    end

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to review closely:
    • correctness and coverage of the unified search_bucket() versus the removed legacy handling
    • usage and expectations of the referenced bin_match with manifest.Version
    • potential false negatives introduced by the early text-based fast-fail pre-check
    • JSON parsing edge cases now handled solely by ConvertFrom-Json

Poem

🐇 I hopped through manifests, quick and spry,
Skipped old paths beneath the sky.
One text peek, one tidy parse,
Binaries found with fewer scars.
Hooray — the search is swift and wry! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introducing a fast-fail mechanism to improve search performance by cutting ~40% of processing.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 15 '25 23:11 coderabbitai[bot]

Checking for worse cases:

query that should have many hits:

base

> Measure-Command -Expression {s search po}
Results from local buckets...

Milliseconds      : 760
Ticks             : 7604599

Patched

PS E:\scoop\apps\scoop\current> Measure-Command -Expression {s search po}
Results from local buckets...

Milliseconds      : 785
Ticks             : 7858639

more specific (with the current tip of develop)

base

PS E:\scoop\apps\scoop\current> Measure-Command -Expression {s search poo}
Results from local buckets...

Milliseconds      : 616
Ticks             : 6160455

patched

PS E:\scoop\apps\scoop\current> Measure-Command -Expression {s search poo}
Results from local buckets...

Milliseconds      : 359
Ticks             : 3598948

refack avatar Nov 15 '25 23:11 refack

Great idea! But let's only add fast fail to the two search functions and leave the rest as they are—they're just there for backward compatibility in Windows PowerShell and fast-parse in PowerShell.

niheaven avatar Nov 16 '25 13:11 niheaven

I always feel better when my PR delete more lines then they change/add, but that's me. My assumption was that since the code now only JSON-parses a small subset of the manifests, the [System.Text.Json.JsonDocument]::Parse vs. ConvertFrom-Json optimization was not worth keeping so much duplicated code.

I split the PR into two commits so you can pick and choose:

  1. Just the fast-fail
  2. Remove the pwsh 7 [System.Text.Json.JsonDocument]::Parse optimization, keeping just the Windows PowerShell compatible code-path

refack avatar Nov 16 '25 22:11 refack