No Offset Based Pagination in Search
Description of the Issue
There doesn't seem to be a way to use offset-based pagination with the box search command: https://developer.box.com/guides/api-calls/pagination/offset-based/
Instead, the size of the search result is hard-limited to 100. This is smaller than the maximum page size in the REST API, which is 200 results: https://developer.box.com/reference/get-search/
The limitation is significant because search is the only method I have found to recurse over a folder structure and get all items matching a pattern in one step, for example: find me all the PDFs in my folders under a given start folder.
- It would be better if the search method applied the default and maximum limits, e.g. something like:
const RESULTS_LIMIT = 100;
const MAX_RESULTS_LIMIT = 200;
... and then later...
class SearchCommand extends BoxCommand {
  async run() {
    const { flags, args } = this.parse(SearchCommand);
    let options = {};
    // Clamp the requested limit to the maximum; fall back to the default
    if (flags.limit) {
      options.limit = (MAX_RESULTS_LIMIT < flags.limit ? MAX_RESULTS_LIMIT : flags.limit);
    } else {
      options.limit = RESULTS_LIMIT;
    }
    if (flags.offset) {
      options.offset = flags.offset;
    }
    // ... rest of run() omitted
  }
}
(deep apologies if the code is pants I'm no expert!!!)
- There is no method to pass an offset to get a specific set of results.
- There is no way to get total_count, the absolute result set size, as per the docs.
Maybe the best option here is to wrap all of this into the search options interface, so we could have:
- box search abc would fetch the first 100 records, as it does now
- box search:total_count --limit=200 would fetch back the total number of pages if the limit is 200 per page
- box search abc* --limit=200 --offset=2 would fetch 200 entries from the 3rd page of the collection
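As a rough illustration of the proposal, here is a minimal sketch of how parsed flags could be turned into search options. The helper name buildSearchOptions and both constant names are hypothetical and not part of the Box CLI; the values 100 and 200 are the default and maximum quoted in this issue. The offset is passed through unchanged, so whether it counts pages or records is left to the caller.

```javascript
// Hypothetical helper, not part of the Box CLI: builds search options from parsed flags.
const DEFAULT_LIMIT = 100; // current default result size, per this issue
const MAX_LIMIT = 200;     // maximum page size of the REST search endpoint, per this issue

function buildSearchOptions(flags) {
  const options = {};
  // Use the default when no limit is given; otherwise clamp to the maximum.
  options.limit = flags.limit === undefined
    ? DEFAULT_LIMIT
    : Math.min(flags.limit, MAX_LIMIT);
  // Only pass an offset when one was supplied (0 is a valid offset).
  if (flags.offset !== undefined) {
    options.offset = flags.offset;
  }
  return options;
}

console.log(buildSearchOptions({}));
console.log(buildSearchOptions({ limit: 500, offset: 200 }));
```

Clamping rather than rejecting an oversized limit keeps the command forgiving; throwing an error instead would be an equally reasonable choice.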
Versions Used
Box CLI: @box/cli/2.4.0 win32-x64 node-v12.6.0
Operating System: Windows 10
Steps to Reproduce
- Run a search on an area with more than 100 items in the folder hierarchy, as reported by the Box API.
- You can only ever get the top 100 results.
Error Message, Including Stack Trace
N/A
I ran into the same issue where I needed more than 100 results. I raised the max results value in the search.js file to 2500 or 5000. The search takes a little longer, but I get all the results I need.
Location on a PC
C:\Program Files\@boxcli\client\src\commands\users\search.js
'use strict';
const BoxCommand = require('../box-command');
const { flags } = require('@oclif/command');
const _ = require('lodash');
const BoxCLIError = require('../cli-error');
const RESULTS_LIMIT = 5000;
Thank you @ianhorn for the answer. @vroommm We will create a ticket on our backlog to allow a user to specify the RESULTS_LIMIT.
Thanks @ianhorn , I think I can work with that just enough to get me out of a hole. :-)
@sujaygarlanka Thank you.
Just so I understand, do I take it the code is essentially interpreted at run time?
@vroommm I'm not sure I'm following. You will need to drill down to the search.js in the @boxcli directory in your computer's applications and edit that file. As seen from above, it will alter the default limit of 100 to whatever limit you need.
If I understand your question correctly, once you have edited the search.js file, you can run your query without changing any of your options for limit because you've reconfigured the BoxCLI program.
Example
box search FILENAME -s
@ianhorn The file I was looking at is under ...\@boxcli\client\src\commands\search.js, not @boxcli\client\src\commands\users\search.js.
I've made the change you suggested and it WORKS!!! (but you knew that!)
box search 97* --file-extensions=pdf --json -y --save-to-file-path=allfiles.json --fields=name,parent,shared_link -v
box-cli:output Filtering output with fields: [ 'type', 'id', 'name', 'parent', 'shared_link' ] +1ms
box-cli:output Filtering output with fields: [ 'type', 'id', 'name', 'parent', 'shared_link' ] +2ms
box-cli:output Formatted 5000 output entries for display +18ms
box-cli:output Using json output format +2ms
box-cli:output Processed output as JSON +20ms
box-cli:output File already exists at d:\BoxSync\allfiles.json +4ms
box-cli:output Writing output to specified location on disk: d:\BoxSync\allfiles.json +1ms
Output written to d:\BoxSync\allfiles.json
box-cli:output Finished writing output +11ms
So although I can't get back the marbles I lost battling with this, I can at least keep the few I have remaining for a little longer...
Thanks...
FYI: The above results also answer my secondary question... Although I can read java, and have dabbled, I've never knowingly used Node. An understanding of the architecture, however superficial is always nice...
@vroommm I'm glad you found the correct search.js file. Sorry if I provided the wrong one. I wouldn't worry too much about the javascript (node). I don't really understand it either, but enough to break things.
Out of curiosity, once you changed RESULTS_LIMIT = XXX, did you still try using a limit in your query? I've been wanting to try that because my search returns 5000 results and I only really need 250.
@ianhorn Haven't tried anything fancy with limit stuff yet. I will if/when I get a moment. I'm resisting the urge for one more compile and moving on...
@ianhorn The one thing I was trying to do, with moderate success, was to batch the files by date range... I was using a bulk import input file like this...
ancestor_folder_ids,query,type,created-at-from,created-at-to
44871803643,97*,file,-721d,-720d
44871803643,97*,file,-722d,-721d
That kind of worked, but only if there were fewer than 100 files on the day in question, which is not something I could guarantee.
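For anyone trying the same date-batching workaround, a small script can generate the bulk input rows rather than typing them by hand. This is only a sketch: the function name dailyRangeRows and its parameters are made up for illustration, and it simply reproduces the column layout and the relative "-Nd" date format from the sample file above.

```javascript
// Hypothetical generator for the bulk input file shown above; all names are illustrative.
function dailyRangeRows(folderId, query, startDaysAgo, days) {
  const rows = ['ancestor_folder_ids,query,type,created-at-from,created-at-to'];
  for (let i = 0; i < days; i++) {
    const to = startDaysAgo + i; // newer edge of the one-day window, in days ago
    const from = to + 1;         // older edge, one day further back
    rows.push(`${folderId},${query},file,-${from}d,-${to}d`);
  }
  return rows;
}

// Reproduces the two data rows of the sample file.
console.log(dailyRangeRows('44871803643', '97*', 720, 2).join('\n'));
```

Each row still returns at most the configured result limit, so this only helps when no single day exceeds that limit.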
@ianhorn @sujaygarlanka I managed to get the limit to work as a simple command parameter, so you can use it like this to get, say, the first 250 results:
- "box search [QUERY TERM] --limit=250" --> fetches the first 250 results
- "box search [QUERY TERM]" --> fetches the default 100 results
- "box search [QUERY TERM] --limit=1" --> fetches the 1st record.
The last one can be really useful to combine with sort so you can use it to get the latest, oldest, first, last based on the sort direction and sort term...
As it turns out the code is pretty simple (even I managed it):
const RESULTS_LIMIT = 100;
//... omit a bit...
class SearchCommand extends BoxCommand {
  async run() {
    const { flags, args } = this.parse(SearchCommand);
    // Set the limit to the default unless we passed a value
    let options = {};
    if (flags.limit) {
      options.limit = flags.limit;
    } else {
      options.limit = RESULTS_LIMIT;
    }
    //... omit a bit more
    // Limit the search results to avoid slamming the API
    let limitedResults = [];
    for await (let result of { [Symbol.asyncIterator]: () => results }) {
      let numResults = limitedResults.push(result);
      // edit by vroommm to use the current options value
      if (numResults >= options.limit) {
        break;
      }
    }
    await this.output(limitedResults);
    //... gosh how much are we leaving out
SearchCommand.description = 'Search for files and folders in your Enterprise';
SearchCommand.examples = [
  'box search "Q3 OKR"',
  'box search --mdfilter "enterprise.employeeRecord.name=John Doe"',
  'box search *.pdf --limit=250'
];
SearchCommand._endpoint = 'get_search';
SearchCommand.flags = {
  ...BoxCommand.flags,
  limit: flags.integer({
    description: 'The max number of records to return in the result set DEFAULT: 100'
  }),
  //... phew we're done leaving stuff out
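The limiting loop in the snippet above can be tried outside the CLI. The sketch below is a self-contained stand-in, not the CLI's own code: fakeResults is a made-up async generator that plays the role of the SDK's results iterator, and takeFirst is a hypothetical helper that stops pulling items once the limit is reached.

```javascript
// Stand-in for the search results iterator; in the CLI this comes from the Box SDK.
async function* fakeResults(count) {
  for (let i = 0; i < count; i++) {
    yield { type: 'file', id: String(i) };
  }
}

// Collect at most `limit` items, then stop consuming the iterator.
async function takeFirst(iterable, limit) {
  const collected = [];
  if (limit <= 0) {
    return collected;
  }
  for await (const item of iterable) {
    collected.push(item);
    if (collected.length >= limit) {
      break; // leaving the loop also closes the generator
    }
  }
  return collected;
}

takeFirst(fakeResults(1000), 250).then((items) => {
  console.log(`collected ${items.length} items`);
});
```

Because the loop breaks as soon as the limit is hit, a lazily paged iterator would not be asked for further pages beyond the one that satisfies the limit.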
@sujaygarlanka @ianhorn Thirty seconds after I posted the above I fully answered my own questions with
//... stuff before
if (flags.offset) {
  options.offset = flags.offset;
}
//... really you're omitting some more
offset: flags.integer({
  description: 'The 0 based page to return, default=0 or omitted for first page, 1= second etc..'
}),
limit: flags.integer({
  description: 'The max number of records to return in the result set DEFAULT: 100'
}),
Which means you can get the nth value for any search by using...
"box search [QUERY TERM] --limit=1 --offset=0" --> fetches the 1st record.
"box search [QUERY TERM] --limit=1 --offset=[n-1]" --> fetches the nth record.
eg.
"box search [QUERY TERM] --limit=1 --offset=9" --> fetches the 10th record.
And of course you can use it to get pages of multiple records:
"box search [QUERY TERM] --offset=0" --> fetches the first 100 records (0-99).
"box search [QUERY TERM] --offset=900" --> fetches records 900-999.
"box search [QUERY TERM] --limit=1000 --offset=9000" --> fetches records 9,000-9,999 (if they exist).
Finally this approach should work for all the offset paginated box CLI methods... :-)
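To make the arithmetic behind these examples explicit, here is a tiny sketch that computes which zero-based records a given --offset and --limit cover. It assumes, as the examples in this comment do, that the offset is a zero-based record index and that the default limit is 100; recordRange is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: which zero-based records does an offset/limit pair cover?
function recordRange(offset = 0, limit = 100) {
  return { first: offset, last: offset + limit - 1 };
}

console.log(recordRange());        // { first: 0, last: 99 }
console.log(recordRange(900));     // { first: 900, last: 999 }
console.log(recordRange(9, 1));    // { first: 9, last: 9 }, i.e. the 10th record
```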
@sujaygarlanka @ianhorn
Minor note: when --offset is non-zero, it must be an integer multiple of --limit, so the following are allowed:
- --limit=1 --offset=99: the 100th record
- --limit=10 --offset=0: 10 records, 0-9
- --limit=10 --offset=90: 10 records, 90-99
- --limit=3 --offset=9: 3 records, 9-11
but these will cause errors:
- --limit=10 --offset=9
- --limit=3 --offset=8
Good job @vroommm. That's impressive. I really struggle with node javascript but have started to have some success.
@vroommm and @ianhorn Glad you guys were able to figure this out. Are there any specific feature requests for the search command? It seems like the two are the ability to update the results limit and the ability to pass in limit and offset parameters.
@vroommm Also, regarding the issues with the bulk command, it may be limited to 100 because you set RESULTS_LIMIT to 100.
@sujaygarlanka
Are there any specific feature requests for the search command? It seems like the two are the ability to update the results limit and the ability to pass in limit and offset parameters.
I'd agree with that. Only possible extra would be a form of validation, e.g.
if (offset!=0 && (offset % limit != 0)) { ...then its not a valid combo so throw an error}
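The validation suggested above could be sketched as a small standalone function. This is an illustration only: validatePaging is a hypothetical name, the default limit of 100 comes from this thread, and the rule it enforces (a non-zero offset must be a multiple of the limit) is the behaviour observed in the earlier comment rather than something checked against the API documentation.

```javascript
// Hypothetical validator for the --offset / --limit combination discussed in this thread.
function validatePaging(offset = 0, limit = 100) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`--limit must be a positive integer, got ${limit}`);
  }
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError(`--offset must be a non-negative integer, got ${offset}`);
  }
  // A zero offset always passes, since 0 % limit === 0.
  if (offset % limit !== 0) {
    throw new RangeError(`--offset=${offset} is not a multiple of --limit=${limit}`);
  }
  return { offset, limit };
}

console.log(validatePaging(90, 10)); // allowed
try {
  validatePaging(9, 10);             // rejected
} catch (err) {
  console.log(err.message);
}
```

Failing early in the CLI gives a clearer message than letting the request reach the API and fail there.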
The default RESULTS_LIMIT of 100 is a safe bet to leave as is because, as written:
- --limit=nnn will always override the default to get more results if you need them, and
- --offset=nnn will qualify where to start returning data if the set is really too big to return in one go
Happy Days :-)
Are there any specific feature requests for the search command? It seems like the two are the ability to update the results limit and the ability to pass in limit and offset parameters.
@sujaygarlanka If it were possible to have a search:total-count that would be useful...
You would be able to get the number of results to expect without actually pulling them over the network...
so if
- search:total-count xyx returns {total_count: 0} then don't bother, and if
- search:total-count xyx returns {total_count: 999999999} then you'd better think about your criteria
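Knowing the total count up front would also make it easy to plan the paging. A minimal sketch, assuming a total_count value is available from somewhere (the search:total-count command is only a request at this point) and using a hypothetical helper name pageOffsets:

```javascript
// Hypothetical helper: list the offsets needed to walk a result set page by page.
function pageOffsets(totalCount, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const offsets = [];
  // Every offset is a multiple of the limit, starting at 0.
  for (let offset = 0; offset < totalCount; offset += limit) {
    offsets.push(offset);
  }
  return offsets;
}

console.log(pageOffsets(450, 200)); // [ 0, 200, 400 ]
console.log(pageOffsets(0, 200));   // []
```

The length of the returned array is the number of pages, which is the figure the original proposal wanted to get back.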
SDK-1379