Extend Find-IshDocumentObj and Find-IshPublicationOutput cmdlets for server-side out-of-memory protection by time slicing
A shorter, crisper interactive experience is nice. Programming-wise, an explicit -IshSession is still preferred. Remember you can still use two sessions to compare or migrate content. Attempted as part of #45.
- [x] Requires #45 merge for `New-IshSession` adaptions, etc
- [x] `Get-IshEvent` protected by `-ModifiedSince` defaulting to last day
- [x] `Get-IshBackgroundTask` protected by `-ModifiedSince` defaulting to last day
- [ ] `Find-IshBaseline` used to return everything, low risk on bringing the server down
- [ ] `Find-IshEDT` used to return everything, low risk on bringing the server down
- [ ] `Find-IshOutputFormat` used to return everything, low risk on bringing the server down
- [ ] `Find-IshUserGroup` used to return everything, low risk on bringing the server down
- [ ] `Find-IshUserRole` used to return everything, low risk on bringing the server down
- [ ] `Find-IshUser` used to return everything, medium risk on bringing the server down
- [ ] `Find-IshDocumentObj` used to return everything, high risk on bringing the server down but would mean breaking behavior compatibility
- [ ] `Find-IshPublicationOutput` used to return everything, high risk on bringing the server down but would mean breaking behavior compatibility
- [ ] `Find-IshAnnotation` #78 will return everything, medium risk on bringing the server down but would mean breaking behavior compatibility
Thinking out loud... options are...
- Keep backward behavior compatibility even if having an implicit `IshSession` a single `Find-IshDocumentObj` could bring everything to its knees. Current 0.x behavior, no code change required.
- Keep backward behavior compatibility but time slice by adding optional `-ModifiedSince` (DeltaDateTimeStart, the year 2000 or so), `-ModifiedUntil` (DeltaDateTimeEnd, so Now+1day) and `-ModifiedStep` (DeltaTimeSpan, so per year?). In practice the API calls would use a `MODIFIED-ON` filter to return less from over the API function in one go, but if not pipelined in PowerShell the client-side memory could still explode. Preferably with Write-Progress like behavior. Preferred option if I have the time, cleans up the ISHInsights DeltaCrawl code base as well.
  - Note that only `Find-IshDocumentObj` and `Find-IshPublicationOutput` need this protection I feel. All others are optional for consistency but can be implemented already over `-MetadataFilter`
- Break compatibility. Do the above `-ModifiedSince` (DeltaDateTimeStart, defaulting to last day), `-ModifiedUntil` (DeltaDateTimeEnd, so Now+1day) and `-ModifiedStep` (DeltaTimeSpan, so more than one day).
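To make the time-slicing idea of the second option concrete, here is a minimal sketch in plain PowerShell. `Get-ModifiedWindow` is a hypothetical helper name, not an ISHRemote cmdlet; it only computes the date windows, and the defaults (year 2000, Now+1day, one year) are taken from the option text above.

```powershell
# Hypothetical helper, not part of ISHRemote: splits a date range into
# ascending windows, each of which would become one server-side Find call.
function Get-ModifiedWindow {
    param(
        [datetime]$ModifiedSince = [datetime]'2000-01-01',   # assumed database birth date
        [datetime]$ModifiedUntil = (Get-Date).AddDays(1),    # Now+1day
        [timespan]$ModifiedStep  = [timespan]::FromDays(365) # roughly per year
    )
    if ($ModifiedStep -le [timespan]::Zero) { throw 'ModifiedStep must be positive.' }
    $start = $ModifiedSince
    while ($start -lt $ModifiedUntil) {
        $end = $start + $ModifiedStep
        # Clip the last window so it never runs past ModifiedUntil
        if ($end -gt $ModifiedUntil) { $end = $ModifiedUntil }
        # Emit one window at a time so the caller can keep pipelining
        [pscustomobject]@{ Start = $start; End = $end }
        $start = $end
    }
}
```

Each emitted window would translate into a pair of `MODIFIED-ON` filters (after `Start`, before `End`) on one Find call. Because the windows are written to the pipeline one by one, results can flow downstream per window instead of piling up client-side.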
First of all, I think we kind of want to have backward compatibility behavior, but still want to protect the application and database server
So I would do something like...
- introduce 2 new optional parameters: -ModifiedSince and -ModifiedStep (possibly also -ModifiedBefore)
- if only a metadata filter is provided, then we have the current behavior
- So if they did not filter wisely that might still give an issue
- if no metadata filter and none of the new optional parameters is provided, I would throw an exception to protect the system
- I don't see how you can have a good default for the new optional parameters that will make sense for all customers
- if only -ModifiedSince is provided, I would either do a smart default value for -ModifiedStep (per month if -ModifiedSince is less than 2 years ago, per year if -ModifiedSince is more than 2 years ago) or throw an exception that you also need to specify -ModifiedStep
- if the metadata filter is provided and -ModifiedSince (and -ModifiedStep) are also provided, then I would throw if the MODIFIED-ON is present in the metadata filter
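The rules in this list can be sketched as a small decision function. This is only an illustration under assumptions: `Resolve-ModifiedSlicing` and its parameters are hypothetical names, the metadata filter is reduced to a list of field names, "per month" and "per year" are approximated as 30 and 365 days, and throwing when only -ModifiedStep is given is my own assumption since the list does not cover that case.

```powershell
# Hypothetical helper illustrating the proposed rules; not an ISHRemote cmdlet.
function Resolve-ModifiedSlicing {
    param(
        [string[]]$FilterFieldName = @(),  # field names used in -MetadataFilter
        [datetime]$ModifiedSince,
        [timespan]$ModifiedStep,
        [datetime]$Now = (Get-Date)
    )
    $hasFilter = $FilterFieldName.Count -gt 0
    $hasSince  = $PSBoundParameters.ContainsKey('ModifiedSince')
    $hasStep   = $PSBoundParameters.ContainsKey('ModifiedStep')

    if (-not $hasSince -and -not $hasStep) {
        # Nothing at all: refuse, to protect the application and database server
        if (-not $hasFilter) { throw 'Specify -MetadataFilter or -ModifiedSince.' }
        # Only a metadata filter: current behavior, no slicing
        return [pscustomobject]@{ Sliced = $false; ModifiedSince = $null; ModifiedStep = $null }
    }
    if ($hasFilter -and ($FilterFieldName -contains 'MODIFIED-ON')) {
        throw 'MODIFIED-ON in -MetadataFilter conflicts with -ModifiedSince/-ModifiedStep.'
    }
    # Assumption: a step without a start date is rejected
    if (-not $hasSince) { throw '-ModifiedStep requires -ModifiedSince.' }
    if (-not $hasStep) {
        # Smart default: per month for under 2 years of history, per year otherwise
        $ModifiedStep = if (($Now - $ModifiedSince).TotalDays -lt 730) {
            [timespan]::FromDays(30)
        } else {
            [timespan]::FromDays(365)
        }
    }
    [pscustomobject]@{ Sliced = $true; ModifiedSince = $ModifiedSince; ModifiedStep = $ModifiedStep }
}
```

For example, `Resolve-ModifiedSlicing -ModifiedSince (Get-Date).AddMonths(-6)` would pick the monthly step, while calling it without any arguments throws.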
Thanks, more food for thought... It looks like we are heading for option 2, so backwards compatible, only doing x times more API calls than before. So theoretically somewhat slower but much more predictable for larger setups. On bigger databases the Find cmdlet without any filter went wrong anyway as you attempt to pull the full database over.
- The `MODIFIED-ON` will be on language level, not logical, otherwise you might miss updates of blobs
- The `ModifiedSince` default value would be the year 2000 for now, birth date of any database
- The `-ModifiedStep` default value for PublicationOutput is `1 year` while for DocumentObj it should be smaller like `2 months`. Note that on very big databases, or actually databases where in those 2 months a big legacy import happened, it could still go wrong server-side or client-side. In those scenarios you can override the defaults provided.
Now a legacy conversion could be something better than below, the Find cmdlet could even show a progress bar
```powershell
Find-IshDocumentObj -MetadataFilter (Set-IshMetadataFilterField -Level Lng -Name MODIFIED-ON -FilterOperator GreaterThan -Value 01/09/2019) |
Set-IshMetadataField -Name FCOMMENTS -Level Lng -Value "Hilde was here" |
Set-IshDocumentObj
```
In all scenarios above the -ModifiedStep counts up, but you could also count down, so from very recent to the birth date of the database. This way you get recent results first, which often makes more sense.
Was looking for more standardized terminology and a way to make querying from Now to database birth date the default. So still pursuing backward compatible option 2.
- [ ] `-ModifiedBefore` (instead of `-ModifiedUntil`) would default to Now+1day (DeltaDateTimeEnd)
- [ ] `-ModifiedAfter` (instead of `-ModifiedSince`) would default to database birth date, so year 2000 (DeltaDateTimeStart). Theoretically the last server-side Find operations will return empty results quite quickly.
- [ ] `-ModifiedStep` default value for PublicationOutput is a `Timespan` of `1 year` while for DocumentObj it should be smaller like `3 months` (DeltaTimeSpan). Note that on very big databases, or actually databases where in those months a big legacy import happened, it could still go wrong server-side or client-side - in those scenario you can overwrite the defaults provided. The step would always be used to step back into history.
- [ ] The three above parameters are all optional, and all have defaults protecting the server-side system. No need to throw. In case `-MetadataFilter` is offered, then we suggest to simply merge, if that causes 3+ `MODIFIED-ON` filters, so be it - potentially push a Write-Warning out.
- [ ] Document the potential performance slowdown which can be bypassed by explicitly passing a massive `-ModifiedStep`, but would need that
- [ ] Write-Progress is a must; showing the exact count of server-side Find operations and a progress bar.
- [ ] As only implementation for `Find-IshDocumentObj` and `Find-IshPublicationOutput` is really required. The `MODIFIED-ON` will be on language level, not logical otherwise you might miss updates of blobs
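A rough sketch of how stepping back into history with a progress bar could look. Everything here is an assumption for illustration: `Invoke-ModifiedWindowDescending` is a hypothetical name, and the `-Find` script block is a stand-in for one server-side Find call that receives the window boundaries. The defaults follow the checklist above, with `3 months` approximated as 90 days.

```powershell
# Hypothetical sketch, not ISHRemote code: walks from ModifiedBefore back to
# ModifiedAfter and reports the exact count of Find operations via Write-Progress.
function Invoke-ModifiedWindowDescending {
    param(
        [datetime]$ModifiedAfter  = [datetime]'2000-01-01',   # assumed database birth date
        [datetime]$ModifiedBefore = (Get-Date).AddDays(1),    # Now+1day
        [timespan]$ModifiedStep   = [timespan]::FromDays(90), # roughly 3 months
        [scriptblock]$Find = { param($After, $Before) }       # stand-in for one Find call
    )
    if ($ModifiedStep -le [timespan]::Zero) { throw 'ModifiedStep must be positive.' }
    $activity = 'Find by MODIFIED-ON window'
    # Number of server-side Find operations needed to cover the range
    $total = [int][math]::Ceiling(($ModifiedBefore - $ModifiedAfter).TotalSeconds / $ModifiedStep.TotalSeconds)
    $done = 0
    $before = $ModifiedBefore
    while ($before -gt $ModifiedAfter) {
        $after = $before - $ModifiedStep
        # Clip the oldest window at the database birth date
        if ($after -lt $ModifiedAfter) { $after = $ModifiedAfter }
        $done++
        $percent = [int][math]::Min(100, 100 * $done / $total)
        Write-Progress -Activity $activity -Status "Find operation $done of $total" -PercentComplete $percent
        # Most recent window first; output streams straight to the pipeline
        & $Find $after $before
        $before = $after
    }
    Write-Progress -Activity $activity -Completed
}
```

Passing a massive `-ModifiedStep` collapses this into a single window, which is the bypass mentioned in the checklist.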
Considered but not required for closing this issue
- Align parameter set across all `Find-*` cmdlets, probably `Find-IshAnnotation` first using `MODIFIED-ON` on annotation level
- Customize to other date fields, requiring `-ModifiedFieldName` and `-ModifiedFieldLevel` (on multi-card object types, always None on single card types)
Investigating further, the idea is good, the performance and accuracy guarantees however not. ISHRemote tries to be version-agnostic where possible, for #49 there are two reasons to put this idea on hold:
- On older Content Manager versions only one date filter (so `MODIFIED-ON`) will be passed to the initial database query for an API `Find` operation. This means that potentially all objects are retrieved from the database server to the application server, before they get filtered again to be pushed to the client (so `ISHRemote`).
- On older Content Manager versions, on initial object creation (e.g. `Add-IshDocumentObj`), the `MODIFIED-ON` field is not filled in, only the `CREATED-ON` field as they are in essence the same. So a `null` on `MODIFIED-ON` simply complicates matters.
As a reminder, the main problem is how to iterate all data, even for large enterprise sets of data. Where this idea was to iterate over time, we are going back to iterating over the folder structure. Continuing with #92 and #91, together they allow to iterate the folder structure and in turn find content-objects/publicationoutputs based on filter criteria like language or recently changed.