
A short "how it works" paragraph?

Open testbird opened this issue 12 years ago • 11 comments

How it works is probably obvious to you. Could you provide a little more meta info about how it works in the Readme intro?

For example: where is the sort/filter computation done (server or client), and why (pros/cons)? Can there be server-side filtering from the global data set down to a client data set, and further client-side sorting/filtering without creating server load or retransmissions (as long as documents don't change)?

testbird avatar Jan 24 '14 10:01 testbird

Aspects in question:

Sorting in a publish function is said to have no effect at all on the client (it is useful only for filtering based on sorted documents). So should one only sort in publish functions when necessary? Or does it still result in some degree of presorting of the client collections? https://stackoverflow.com/questions/15153349/meteor-subscribe-doesnt-update-sort-order-of-collection

Re-subscribing for a different sorting/filtering causes full retransmissions, even when that may not be necessary.

publish (server-side filter):

  • usually just returns a filtered find cursor
  • optionally a "manual" function (observe, added, ...)? To propagate the filter result into a separate client collection (cache), without necessarily invalidating all prior results.

template helpers (usually client-side):

  • are always doing the final sorting and filtering (see the helper sketch after this list)
  • may be supported by multiple (presorted?) cached local collections
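
For illustration, a minimal sketch of such a final client-side helper (the Items collection, the itemList template and the Session keys are made-up names, not part of filter-collections):

// Client-side sketch: the helper repeats the final filter and sort against
// minimongo, independently of how the published cursor was sorted.
Template.itemList.helpers({
  items: function () {
    var sortField = Session.get('sortField') || 'createdAt';
    var search = Session.get('searchText');
    var selector = search ? {title: {$regex: search, $options: 'i'}} : {};
    var sort = {};
    sort[sortField] = 1;
    return Items.find(selector, {sort: sort});
  }
});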

testbird avatar Jan 25 '14 10:01 testbird

As the publish function is auto-generated, would you agree filter-collections is suitable only for public collections?

Edit: The beforePublish callback seems to provide a way to restrict the publication; could f-c do some check() validation on the client's query first?

testbird avatar Jan 25 '14 11:01 testbird

Concerning the re-subs, the key question may be: "How to switch sorting/filtering while the connection is down/offline?" (a re-render-without-re-sub option, which might also be used to reduce server load?)

testbird avatar Jan 25 '14 12:01 testbird

For example: where is the sort/filter computation done (server or client), and why (pros/cons)? Can there be server-side filtering from the global data set down to a client data set, and further client-side sorting/filtering without creating server load or retransmissions (as long as documents don't change)?

The FilterCollections instance will manage computations on the client side to keep the Collection's documents up to date whenever the user interacts with the UI (filtering, sorting, searching and paging).

It was done this way because if you start accumulating documents on the client, browser memory usage rises. For example, if you are viewing 100 items per page and click the Next button 10 times, you would have stored 1000 documents on the client side but you only need 100 to display. Perhaps I could add a small "cache" system to ensure that the next and previous page items are also loaded, to avoid server latency on every page change. Let me know your thoughts.

Sorting in a publish function is said to have no effect at all on the client (it is useful only for filtering based on sorted documents). So should one only sort in publish functions when necessary? Or does it still result in some degree of presorting of the client collections? https://stackoverflow.com/questions/15153349/meteor-subscribe-doesnt-update-sort-order-of-collection

As you correctly said, sorting in the publish function has absolutely no effect on the client, but it does on mongo and minimongo, and it is needed to build queries that return the correct range of documents based on skip and limit (and other criteria). Because of that, I've assumed that a new query will almost always be needed when performing any action.

Another point is that on the server side the sort and skip are used against MongoDB, so the publish function asks mongo for the documents in the skip/limit range, sorted by X. Once the publisher returns the documents (cursor) to its subscribers, they are sent to minimongo, where they are not necessarily stored in the cursor's sort order, so we need to sort again client-side.
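
For illustration, a rough sketch of this kind of publish function (the items.paged name and the Items collection are made up, not the package's actual generated code):

// Server-side sketch: mongo applies the sort before skip/limit, so the correct
// page of documents is selected, but the client still has to repeat the sort
// in its own find() because minimongo does not preserve arrival order.
Meteor.publish('items.paged', function (options) {
  options = options || {};
  return Items.find(options.query || {}, {
    sort: options.sort || {createdAt: -1},
    skip: options.skip || 0,
    limit: options.limit || 10
  });
});

On the client, the corresponding helper then repeats the same sort against minimongo, as in the helper sketch earlier in the thread.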

Re-subscribing for a different sorting/filtering causes full retransmissions, even when that may not be necessary.

This is a good topic, so I will explain it the best I can (please read carefully, perhaps I'm wrong). The idea is to execute a client computation any time the query changes, since I've assumed that when anything in the query has changed, we need a different filtered/sorted dataset. Cases:

When paging: if you change the page or the items per page, you are directly modifying the skip and limit query params. So any time the user updates these, different records are needed.

When filtering: same as with paging, when the user updates a filter criterion, a different subscription cursor is needed.

Finally, on every update I include a simple check to verify that the query has actually changed. If it has, it executes the computation; if not, it does nothing.
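
A rough sketch of that check, with made-up names (the real implementation may differ):

// Client-side sketch: only re-subscribe when the query object actually changed.
// 'items.paged' and this helper function are hypothetical.
var lastQuery = null;
var currentSub = null;

function updateSubscription(query) {
  if (EJSON.equals(query, lastQuery))
    return;                      // query did not actually change: do nothing
  lastQuery = EJSON.clone(query);
  if (currentSub)
    currentSub.stop();           // drop the previous document set
  currentSub = Meteor.subscribe('items.paged', query);
}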

Could you please let me know if I'm missing something here? I will be glad to improve it.

As the publish function is auto-generated, would you agree filter-collections is suitable only for public collections?

Yes, I agree on that, since I haven't done any testing with local collections (I hope I will shortly). Anyway, the main goal of this package is to work with large datasets, and I can't imagine a large dataset inside a local collection, right?

The beforePublish callback seems to provide a way to restrict the publication; could f-c do some check() validation on the client's query first?

Yes, this is a great idea and I've added it to the project todo.
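
A rough sketch of what such a check() guard could look like inside a generated publish function (the option fields and the 100-document cap are illustrative only):

// Server-side sketch: validate the client-supplied query before using it.
Meteor.publish('items.paged', function (options) {
  options = options || {};
  check(options, {
    query: Match.Optional(Object),
    sort: Match.Optional(Object),
    skip: Match.Optional(Match.Integer),
    limit: Match.Optional(Match.Integer)
  });
  // Never let a client request an unbounded result set.
  var limit = Math.min(options.limit || 10, 100);
  return Items.find(options.query || {}, {
    sort: options.sort,
    skip: options.skip || 0,
    limit: limit
  });
});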

Concerning the re-subs, the key question may be: "How to switch sorting/filtering while the connection is down/offline?" (a re-render-without-re-sub option, which might also be used to reduce server load?)

This would be useful if all documents were on the client. Based on this, I'm starting to think about adding a parameter to let the developer set how many documents should be sent to the client, and about doing checks in the publisher to see which documents should be added/removed. What do you think?

Thanks a lot for all this feedback and these questions. It will help make this package better.

Finally, let me know if these answers make sense, and I will add some documentation to readme.md based on them.

julpod avatar Jan 25 '14 20:01 julpod

Thank you for the answers, I think I have learned something, and compiled my design thoughts at #9 with this.

Writing "suitable only for public collections" I was thinking about restricting the published scope and validating that the client can not request data that should not be accessible, so this is now the topic of #7.

Concerning "how it works" I gather that a explaining sentence for the readme may be: Filter-collections currently subscribes only to one publication at a time, and does (costly) re-subs on every query change?

That's ok (the sort, filter and pagination features work quite nicely) and still allows for taking advantage of the local (minimongo) database cache.

testbird avatar Jan 26 '14 14:01 testbird

I think the ability to add f-c functionality to custom publish() functions (#10) could also be very helpful to distinguish scopes and pagination, and to explain how it works.

The client may then subscribe to various custom scope publications, and f-c may happily sort, filter and paginate according to the available scopes locally on the client.

But if the publication would return too much data, f-c could instead trigger subscribing a separate client collection to an autogenerated f-c publish() function (with a defined skip and limit), and render the data from there.

testbird avatar Jan 28 '14 13:01 testbird

Might it already be possible to let the sort, filter and paginate template helpers work on an existing (local) collection that is updated by a regular (manual) subscription to a scope small enough to be completely synced to the client?

testbird avatar Jan 28 '14 14:01 testbird

I have the feeling things are clearing up for me a little.

Filter-Collections currently always seems to couple "sorting and filtering collections" with "paginated subscriptions".

It would be great if its "sorting and filtering" template features could also be used on arbitrary (local) collections that are synced through custom pub/subs (to be used with the sort orders and limits with which the local collection (cache) has been filled, and when the collection is small enough to support this).

I think I didn't mention it explicitly, but you probably know that it is possible to publish() documents from the same collection multiple times, for example "recent posts" and "own posts". As well as subscribe to a publication multiple times, for example with different sort orders. All the resulting documents are then synced into the client collection.

And this example from the book is said to publish all docs of type video from the Resources collection to a Videos collection on the client:

Meteor.publish('videos', function() {
  var sub = this;
  // observeChanges delivers (id, fields) to 'added'; plain observe would
  // deliver the whole document instead.
  var handle = Resources.find({type: 'video'}).observeChanges({
    added: function(id, video) {
      sub.added('videos', id, video);
    }
    // for other events, see _publishCursor for hints.
  });
  // mark complete, clean up when stopped
  sub.ready();
  sub.onStop(function() { handle.stop(); });
});
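
The book excerpt shows only the server side; the client counterpart (my assumption, not from the book) would be roughly:

// Client-side: declare a local collection named after the name used in
// sub.added(), i.e. 'videos', and subscribe to the publication.
Videos = new Meteor.Collection('videos');
Meteor.subscribe('videos');
// Videos.find() now reactively returns the resources of type 'video'.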

Unfortunately, the book does not teach anything more about this.

Following this lead, it should still be possible to implement paginated subscriptions that sync into a separate collection on the client, and to set up a new subscription for a new (shifted) pagination range before tearing down the previous pagination-range subscription. This results in transmitting only the additional documents into the separate collection, and then dropping just the documents that are no longer part of the new subscription range. It avoids a full re-subscription and re-transmission of all the documents in the overlapping range of the current and the new paginated subscription.
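
A rough sketch of such an overlap-friendly page switch (the items.page publication and the page size are made up):

// Client-side sketch: bring in the new pagination range before tearing down
// the old one, so documents in the overlapping range are never removed and
// re-transmitted.
var PAGE_SIZE = 20;
var currentSub = null;

function goToPage(page) {
  var previousSub = currentSub;
  currentSub = Meteor.subscribe('items.page',
    {skip: page * PAGE_SIZE, limit: PAGE_SIZE},
    function () {                // onReady for the new range
      if (previousSub)
        previousSub.stop();      // now drop documents outside the new range
    });
}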

testbird avatar Jan 28 '14 21:01 testbird

Here is a preliminary (currently half fictional) description of an idea how filter-collection could work more universally:

The filter-collections smart package for Meteor enables efficient dynamic filtering, sorting, and pagination in client applications. It can work based on shared local collection data caches (shared subscriptions) as well as on individual server queries that request only the required data (specific subscriptions stored in separate local collections).

Template helper elements are provided (with global Handlebars.registerHelper() ?) to add filter controls and render the results.

A central FilterCollections instance on the client keeps track of the parameters of all filters. Filters are named, and all filter control and result elements in the templates refer to a specific filter. By default the name of the template is used as the filter name, so each template works with its own filter, but template elements may just as well refer to other filters.

The template rendering helpers for a filter execute the requested query against a local collection that the FilterCollections instance has chosen and reported ready (synced), according to the subscribed scopes and their target collections. Which local collection the query is executed against can change depending on the parameters and locally cached result data of the filters that are currently in use.

When filter control template events or direct method calls change the filter parameters, the FilterCollections instance automatically adjusts or adds a new client subscription accordingly, waits for the subscription to sync if the newly requested data was not already part of the prefetched range, returns the proper collection to render the results from, and stops a previous subscription if a cache limit has been reached. By default, the FilterCollections instance will start adding and adjusting subscriptions that sync documents to the regular, shared, local collection that has the same name as on the server. The template rendering helpers then query the chosen collection according to the filter parameters.

A shared cache (multiple subscriptions updating the same local collection) has the advantage that documents may already be synced, and every document only needs to be transmitted and updated once, regardless of how many subscriptions it is included in. But with a shared cache it is not possible to skip the first documents of a subscription when only some later documents (a pagination range) in a specific sort order are of interest for a filter. This is because other subscriptions may sync documents that come before those of interest for the filter, and the list rendered from the local cache according to the sorting filter would then start with documents that the filter meant to skip. Consequently, skip parameters cannot be used with a shared cache, and thus paginating to documents located further down in a sort order will increase the required size of a shared cache considerably if the collections are large.

Too large shared cache sizes can exceed the client's memory or processing capacity, and cause slowdowns when filtering in large scopes or paginating deeply into the results. To avoid this, FilterCollections can direct scope subscriptions into a separate local cache collection for each sort order, or even per individual filter. With these specific caches, it is possible to skip documents in subscriptions and still render the results from the cache reliably, so that the cache can be kept relatively small even when paginating and filtering from very large server collections. Of course, the smaller the cache size, the higher the probability that rendering requires a new request to the server, up to the point where every change in filter parameters requires a new request and database query on the server. One has to balance between client and server load, as well as network transfers.

Several options of the FilterCollection instance control the size of cache collections on the client.

  • a "per sort order cache" option for named filters (instead of starting with the shared cache)
  • a "individual cache (individual server search queries)" option for named filters
  • a "unmanaged/custom cache (rely on static or app managed subscriptions)" option, possibly by specifying a specific local collection name for a filter
  • critical max. limit for scopes that should be directed to the shared cache, for directing filters to per sort order caches and using skip subscriptions
  • critical shared cache size, for directing filters to per sort order caches and using skip subscriptions
  • critical per sort order cache size, for directing filters to individual caches and using fully filtering subscriptions (server queries) that include pagination (but still caching/prefetching last/neighbor pages).
  • max. individual cache size factor (determines max individual cache size) applied to pagination increment (page size)
  • number of neighboring pages to prefetch (include in a scope subscription)

Wording:

  • filter: client's filter, sort, and pagination (skip & limit) settings used to render the results from the local collection. A filter requires a collection with a satisfying scope subscription.
  • page size: the number of documents that will be actually rendered by the results helper (pagination)
  • neighborhood: the page requested by the filter control plus a number of prefetched neighboring pages
  • scope (subscriptions): filter, sort, skip, limit that defines the synced document set of a subscription. A scope can satisfy multiple filter settings.
  • publication: the set of documents that may be synced to clients, usually limited to a certain subset of documents and fields from the database

Switching pages within an already subscribed and synced range can happen immediately based on the local collection data.

As long as no critical cache size limit has been reached, FilterCollections can default to keeping everything in the synced cache (keeping previous subscriptions running). But a size-limited separate collection cache may be used, e.g. if documents are continuously updated, to reduce the load and traffic caused by syncing possibly uninteresting changes.

When caching limits have been reached, old subscriptions need to be stopped:

  • shifts: if new range overlaps old range: sync new subscription, then stop previous
  • jumps: if new page is not a neighbor, keep the last scope (for back jumps), but stop the oldest scope subscription in the list of recent jumps

Ways to define (limited) publications:

  • Custom limited publications that allow defined option arguments? For example: publish(find, sort, limit, skip)
  • Use a method like in the publish-with-relations package?
  • Publish functions that are automatically generated by Meteor.FilterCollection.publish(), with a field whitelist and a publication query defined to sanitize, extend and strip the client's query argument objects (validate, extend and strip the user-supplied objects for the find query, sort, fields, limit, and skip arguments); see the sketch below.
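
A rough sketch of that third option, with made-up names and a made-up scope/whitelist, just to illustrate the sanitizing idea:

// Server-side sketch of an auto-generated, sanitized publish function.
Meteor.publish('items.filtered', function (clientQuery, clientOptions) {
  clientQuery = clientQuery || {};
  clientOptions = clientOptions || {};
  check(clientQuery, Object);
  check(clientOptions, Object);
  var baseQuery = {ownerId: this.userId};           // server-enforced scope
  var fieldWhitelist = {title: 1, createdAt: 1};    // never publish other fields
  // Merge so the server-defined scope always wins over the client's query.
  var selector = _.extend({}, clientQuery, baseQuery);
  return Items.find(selector, {
    fields: fieldWhitelist,
    sort: clientOptions.sort,
    skip: clientOptions.skip || 0,
    limit: Math.min(clientOptions.limit || 10, 100) // strip excessive limits
  });
});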

testbird avatar Jan 30 '14 23:01 testbird

Edit: The following has been merged into the description in the previous comment.

~~I am wondering now if scopes actually do map 1:1 with subscriptions (did I use an unnecessary different word?), and only filter settings can map n:1 to subscriptions (as well as 1:1 if separated per filter (server processes all filter options), instead of per shared sort order(server only processes sort and skip options, filter only applied to client collection query)).~~

~~Concerning publish() functions, it could be an idea to provide some functions to merge the required publish options with the user supplied ones, to ensure the publications are properly limited to what it should publish. For example, validate merge and strip the user supplied objects for query, fields, limit.~~

testbird avatar Jan 31 '14 08:01 testbird

Looking at the package https://github.com/dburles/reactive-relations, it even seems to provide a better way to do publications than publish-with-relations. Maybe it is possible to just add find, sort, skip, etc. and whitelist arguments there, and rely on that package to do the publications.

testbird avatar Feb 07 '14 09:02 testbird