Search with fast fail to cut 40% of proccessing
Description
Motivation and Context
Speed
How Has This Been Tested?
- Step debugger
- CLI inteface
- tests
Checklist:
- [X] I have read the Contributing Guide.
- [X] I have ensured that I am targeting the
developbranch. - [X] I have updated the documentation accordingly.
- [ ] I have updated the tests accordingly.
- [x] I have added an entry in the CHANGELOG.
Summary by CodeRabbit
-
Refactor
- Improved search performance with a fast-fail check to skip irrelevant manifests, reducing latency.
- Consolidated and simplified search logic by removing legacy fallback paths and unifying the search flow.
- Made version and binary extraction consistent across searches, improving result accuracy.
-
Documentation
- Added an entry to the changelog describing the refactor.
Since this takes 60% time from current master IMHO we can drop the [System.Text.Json.JsonDocument] shenanigans
anecdotal pref (3 buckets - main, extras, versions):
PS C:\bin\tools\scoop\apps\scoop\current> timeit {scoop search proton}
Results from local buckets...
00:00:02.1110227
Name Version Source Binaries
---- ------- ------ --------
proton-mail-export 1.0.4 extras
protonmail-bridge 3.17.0 extras
PS C:\bin\tools\scoop\apps\scoop\current> git pull
From https://github.com/refack/scoop
* branch search-fast-fail -> FETCH_HEAD
Updating ac6f1baa..c60c682c
Fast-forward
libexec/scoop-search.ps1 | 75 +++---------------------------------------------
1 file changed, 4 insertions(+), 71 deletions(-)
PS C:\bin\tools\scoop\apps\scoop\current> timeit {scoop search proton}
Results from local buckets...
00:00:01.3423720
Name Version Source Binaries
---- ------- ------ --------
proton-mail-export 1.0.4 extras
protonmail-bridge 3.17.0 extras
PS C:\bin\tools\scoop\apps\scoop\current>
Hay @coderabbitai can you take a look?
[!TIP] For best results, initiate chat on the files or code changes.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Fetch PR details and diff
gh pr view 6284 --repo ScoopInstaller/Scoop --json title,body,files,additions,deletions
# Get the diff for the PR
gh pr diff 6284 --repo ScoopInstaller/Scoop
Length of output: 5521
@refack, I've reviewed the PR and this is an excellent optimization! 🎯
Key Findings
Strengths:
-
Smart fast-fail approach: The pre-filter
if ($content -notmatch $query) { return }is brilliant - it avoids expensive JSON parsing for files that don't contain the search term at all. This is where your 36% performance gain comes from. -
Significant simplification: Removing the
System.Text.Json.JsonDocumentcomplexity and consolidating to a single code path makes the code much more maintainable. -
Measurable performance improvement: 2.11s → 1.34s is substantial for a search operation.
Suggestions:
-
Rename the function: Since
search_bucket_legacyis now the only implementation (not a fallback), consider renaming it to justsearch_bucket:
function search_bucket($bucket, $query) {
$apps = Get-ChildItem (Find-BucketDirectory $bucket) -Filter '*.json' -Recurse
$apps | ForEach-Object {
$content = [System.IO.File]::ReadAllText($_.FullName)
# Fast-fail: skip JSON parsing if search term not found in raw content
if ($content -notmatch $query) { return }
$manifest = ConvertFrom-Json $content -ErrorAction Continue
# ... rest of logic
}
}
- Add a comment: Consider adding a brief comment explaining the fast-fail optimization for future maintainers (as shown above).
The optimization strategy is sound - the regex pre-filter will catch all real matches (avoiding false negatives) while any false positives are harmlessly filtered by the subsequent matching logic. Great work on this performance improvement!
Would you like me to help with any specific aspect of this PR?
🧠 Learnings used
Learnt from: z-Fng
Repo: ScoopInstaller/Scoop PR: 6473
File: libexec/scoop-download.ps1:135-135
Timestamp: 2025-08-31T01:39:35.684Z
Learning: In the Scoop codebase, `Invoke-ScoopDownload` in lib/download.ps1 calls `abort` when hash verification fails, which terminates script execution completely rather than trying remaining mirrors. The scoop-download.ps1 script should maintain consistency with this behavior by using `continue app_loop` to skip to the next app rather than trying remaining mirrors for the current app.
Walkthrough
Refactored scoop-search to remove the PowerShell‑5 legacy fallback and a JSON-specific bin matcher; search now reads manifest text, fast-fails if the query text is absent, then parses JSON once with Version from the parsed manifest and uses the unified bin_match flow. No other functional changes.
Changes
| Cohort / File(s) | Summary |
|---|---|
Changelog Update CHANGELOG.md |
Added Code Refactoring entry describing the fast-fail search mechanism and simplification of search logic. |
Search Logic Refactor libexec/scoop-search.ps1 |
Removed bin_match_json() and search_bucket_legacy(). Consolidated into a single search_bucket() flow that reads manifest as text, performs an early text-based query pre-check (fast-fail), parses once via ConvertFrom-Json, derives Version from manifest.Version, and populates Binaries via the unified bin_match. Eliminated conditional/System.Text.Json detection and fallback paths. |
Sequence Diagram(s)
sequenceDiagram
participant User
participant scoop_search as scoop-search.ps1
participant Manifest as manifest.json
User->>scoop_search: submit search query
scoop_search->>Manifest: Read manifest content (text)
scoop_search->>scoop_search: Fast-fail pre-check for query text
alt query found in text
scoop_search->>scoop_search: ConvertFrom-Json (parse once)
scoop_search->>scoop_search: Extract Version from manifest.Version
scoop_search->>scoop_search: Run unified bin_match -> populate Binaries
scoop_search->>User: Return matching results
else not found
scoop_search->>User: No results
end
Estimated Code Review Effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
- Areas to review closely:
- correctness and coverage of the unified
search_bucket()versus the removed legacy handling - usage and expectations of the referenced
bin_matchwithmanifest.Version - potential false negatives introduced by the early text-based fast-fail pre-check
- JSON parsing edge cases now handled solely by
ConvertFrom-Json
- correctness and coverage of the unified
Poem
🐇 I hopped through manifests, quick and spry,
Skipped old paths beneath the sky.
One text peek, one tidy parse,
Binaries found with fewer scars.
Hooray — the search is swift and wry! ✨
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: introducing a fast-fail mechanism to improve search performance by cutting ~40% of processing. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
✨ Finishing touches
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Checking for worse cases:
query that should have many hits:
base
> Measure-Command -Expression {s search po}
Results from local buckets...
Milliseconds : 760
Ticks : 7604599
Patched
PS E:\scoop\apps\scoop\current> Measure-Command -Expression {s search po}
Results from local buckets...
Milliseconds : 785
Ticks : 7858639
more specific (with the current tip of develop)
base
PS E:\scoop\apps\scoop\current> Measure-Command -Expression {s search poo}
Results from local buckets...
Milliseconds : 616
Ticks : 6160455
patched
PS E:\scoop\apps\scoop\current> Measure-Command -Expression {s search poo}
Results from local buckets...
Milliseconds : 359
Ticks : 3598948
Great idea! But let's only add fast fail to the two search functions and leave the rest as they are—they're just there for backward compatibility in Windows PowerShell and fast-parse in PowerShell.
I always feel better when my PR delete more lines then they change/add, but that's me.
My assumption was that since the code now only JSON-parses a small subset of the manifests, the [System.Text.Json.JsonDocument]::Parse vs. ConvertFrom-Json optimization was not worth keeping so much duplicated code.
I split the PR into two commits so you can pick and choose:
- Just the fast-fail
- Remove the pwsh 7
[System.Text.Json.JsonDocument]::Parseoptimization, keeping just the Windows PowerShell compatible code-path