
Provide guidelines for application data storage configuration

Open JornWildt opened this issue 7 years ago • 25 comments

Given that my browser-based web-app has loaded the user's profile document and read the value of pim:storage, which yields the root of my POD https://elfisk.solid.community ... where should my app then store its data and what should it do to protect the data from other web-apps?

Example:

  • My web-app registers my pets and stores data about each pet in its own document in a dedicated my-pets container.

  • My web-app lives at https://my-solid-pets.com which thus also happens to be the value of the Origin header the browser is going to send to my POD server.

Where should the my-pets container be located? Should it be https://elfisk.solid.community/my-pets/ or https://elfisk.solid.community/inbox/my-pet/ or what is the right "best practice" recommended location?

Data discovery by the user's type registry is not going to work since this is a new web-app and nobody has ever had a chance to make a registration.

The location should be private by default (protected from other users) and restricted to requests from https://my-solid-pets.com only. What should my web-app do to ensure this?
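One way to express this restriction is a Web Access Control (WAC) document for the container, using `acl:agent` for the owner and `acl:origin` for the app. The sketch below only builds the Turtle text and does not talk to a server. The helper name `build_private_acl` and the WebID used in the usage note are illustrative assumptions, and how strictly a given pod server enforces `acl:origin` should be checked against that server.

```python
ACL_PREFIX = "@prefix acl: <http://www.w3.org/ns/auth/acl#>."


def build_private_acl(container_url: str, owner_webid: str, app_origin: str) -> str:
    """Return Turtle for an ACL that grants access to one agent via one origin.

    Hypothetical helper: it only produces text, it performs no HTTP request.
    """
    # Containers are identified by URLs ending in a slash.
    if not container_url.endswith("/"):
        container_url += "/"
    return "\n".join([
        ACL_PREFIX,
        "",
        "<#owner>",
        "    a acl:Authorization;",
        f"    acl:agent <{owner_webid}>;",        # only this WebID
        f"    acl:origin <{app_origin}>;",        # only requests from this web-app
        f"    acl:accessTo <{container_url}>;",   # the container itself
        f"    acl:default <{container_url}>;",    # inherited by resources inside it
        "    acl:mode acl:Read, acl:Write, acl:Control.",
        "",
    ])
```

For the example above this could be called as `build_private_acl("https://elfisk.solid.community/my-pets/", "https://elfisk.solid.community/profile/card#me", "https://my-solid-pets.com")`, where the WebID is a guess at what the profile document would contain. Because no public or other-agent authorization is listed, nobody else gets access. Note that an origin restriction this tight would also keep the owner's other apps out of the container.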

(copied from https://forum.solidproject.org/t/the-right-location-for-a-new-apps-data/662)

JornWildt avatar Nov 27 '18 12:11 JornWildt

@JornWildt very good question! We don't have a firm answer, though this question has been raised on and off for many years.

Short answer: it's up to the app.

Long answer: it would be nice to have a convention that lots of people follow.

One area that guides me for naming is the question "Where do other operating systems store their data?". So for example skype will store data in .skype in the home directory. Other operating systems will have a folder called "application_data".

Solid is slightly more fine grained in that it has private, public and shared data.

Right now I have used a folder /public/appdata/, though I would be interested to know where app developers store their app data.

melvincarvalho avatar Nov 27 '18 13:11 melvincarvalho

Currently my "pet" app (which is actually a logbook for RC model flights btw) stores it right under the root as /solid-rc (like solid-rc/models and solid-rc/locations etc.). That application data is probably not going to be public for me, but someone else might be more open and want to share the app data with their friends. So it seems wrong to include /public/ in the container path as being "public" or not is a user choice.

Another approach would be to let the user choose a preferred location ... but I assume most end-users wouldn't know what to choose and why.

If /public is out, I like the /appdata suggestion - and leave the root open for more hard core "operating system" containers like for instance /inbox.

JornWildt avatar Nov 27 '18 13:11 JornWildt

@JornWildt good points. The /appdata folder I quite like (as a default).

Perhaps it would be an idea to do a short survey of the default paths of the most popular operating systems, and do a comparative analysis.

Then decide on a default location.

melvincarvalho avatar Nov 27 '18 13:11 melvincarvalho

All this stuff about data management is not entirely unlike the situation that mobile systems like Android and iOS face. Those systems also have application defaults for where stuff like contacts, photos and calendars are stored.

The Solid spec designers could choose to follow those systems and declare best practices for those kinds of things. I suppose it is covered by the type registry specification - but someone has to store suitable defaults for calendar, photos and contacts types ... which I guess is something that should be done when a POD provider server creates a new POD.

And then the type registry needs a standard default type registration for apps/types that do not yet have a registration ... in which case the POD providers could make sure every new app out there was able to look up a suitable app data location for the currently logged in user ... and those defaults should of course be based on Solid "best practice" specs, as well as being configurable by the end-user.
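The lookup order described here (the user's own registration first, then a provider-seeded default, then a catch-all for types nobody has registered) could be sketched as follows. Everything in it is an assumption for illustration: the `PROVIDER_DEFAULTS` table, the `data/` catch-all, and the function name are not part of any Solid spec.

```python
from typing import Mapping
from urllib.parse import urljoin

# Hypothetical defaults a POD provider might seed when it creates a new POD:
# RDF class IRI -> container path relative to the storage root.
PROVIDER_DEFAULTS = {
    "http://schema.org/ImageObject": "images/",
    "http://schema.org/Event": "calendar/",
    "http://schema.org/Person": "contacts/",
}


def resolve_container(
    storage_root: str,
    rdf_class: str,
    user_registrations: Mapping[str, str],
    provider_defaults: Mapping[str, str] = PROVIDER_DEFAULTS,
    fallback: str = "data/",
) -> str:
    """Pick the container an app should use for instances of rdf_class."""
    # 1. A registration made by (or for) the user always wins.
    if rdf_class in user_registrations:
        return user_registrations[rdf_class]
    # 2. Otherwise use the provider default, or 3. the catch-all location.
    if not storage_root.endswith("/"):
        storage_root += "/"
    return urljoin(storage_root, provider_defaults.get(rdf_class, fallback))
```

With this shape a brand-new app always gets some answer for the logged-in user, and the end-user can still override it by adding a registration.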

JornWildt avatar Nov 27 '18 13:11 JornWildt

My answer would actually be the same as in #128: data footprints.

RubenVerborgh avatar Nov 27 '18 13:11 RubenVerborgh

In general, we need to aim much higher than filesystems like Windows and Android. They only offer file-level interoperability; we aim to offer data-level interoperability.

File-level interoperability works to the extent where the software can find a file. In case of interoperable applications (let's say, those that open a JPEG), locating that file is up to the user. In less interoperable applications (let's say, Skype), they just have a fixed location so no intervention is needed.

But when multiple apps need to read things like a user profile, we definitely need an automated way of looking these things up; just leaving things to the apps will create mini-silos on a single pod.

Another part of the answer is that the application logic itself should not have to care about this. Any data operation, read or write, should become a query (https://ruben.verborgh.org/blog/2017/12/20/paradigm-shifts-for-the-decentralized-web/#interfaces-become-queries-p-5), and it is up to the query engine (which the app logic interfaces with) to determine the concrete location based on machine-readable data footprint descriptions.

RubenVerborgh avatar Nov 27 '18 13:11 RubenVerborgh

I very much agree with the "just leaving things to the apps will create mini-silos on a single pod" position. There is a lot of education to be done on this subject ... such that "old school" devs like me don't end up screwing up the whole "data-level interoperability" promised by Solid.

Actually I do find it difficult to design my app "right", which is why I ask for these guidelines. And right now I have a working app that works very much like a silo, except that it at least recognizes the logged in user's storage reference :-)

There are a lot of technical specs about the Solid API - but not much guidance on how to model data for apps. It would be nice to see something like that, thanks :-)

JornWildt avatar Nov 27 '18 14:11 JornWildt

> All this stuff about data management is not entirely unlike the situation that mobile systems like Android and iOS face

Do you have details on where app data is stored in these systems?

melvincarvalho avatar Nov 27 '18 14:11 melvincarvalho

> Do you have details on where app data is stored in these systems?

It's as you guys said above:

> So for example skype will store data in .skype in the home directory. Other operating systems will have a folder called "application_data".

> Those systems also have application defaults for where stuff like contacts, photos and calendars are stored.

The exact details aren't that important, but rather the fact that they just pick a folder, and that such a mechanism is described out of band. We likely want explicit and in-band, since we likely do not want the centralized decision making that such OS'es have.

RubenVerborgh avatar Nov 27 '18 14:11 RubenVerborgh

> Any data operation, read or write, should become a query

Disagree with this. Queries are useful in some cases, but not all.

melvincarvalho avatar Nov 27 '18 14:11 melvincarvalho

> Do you have details on where app data is stored in these systems?

No.

JornWildt avatar Nov 27 '18 14:11 JornWildt

> Disagree with this.

Sure, this is my personal opinion, not that of the Solid community.

> Queries are useful in some cases, but not all.

I won't say useful or not; I'd rather argue that consistent usage of queries allows applications to be independent of file-level decisions, which as such can change without impacting anything. The extreme opposite (and there are solutions in between) is that an application assumes that all of its data will be in /apps/xyz/data.

I think queries, or other declarative constructs, are a good way of hiding such decisions. There are others; we still need to analyze the pros and cons (a bit of that is in the blog post I linked).

RubenVerborgh avatar Nov 27 '18 15:11 RubenVerborgh

Continuing from #128, as the title here seems more appropriate for this part:

> Continuing your example, a client executing a SPARQL query for all cats, would have to follow a link from your profile to pets, then on that page follow links to cats. A client writing your first cat to your pod would need to make the pets page and link it up, in addition to adding the cat.

Understood, I think. But where should the app create those resources for the first cat ever? If no other app has defined a location for resources of type petstore:pet then my app would need a guideline for defining that location. And in the end it would end up as apps/petstore/pets or something completely different unless someone sets a standard data structure for, well, "stuff" in the POD.

JornWildt avatar Nov 27 '18 17:11 JornWildt

> If no other app has defined a location for resources of type petstore:pet then my app would need a guideline for defining that location.

Indeed, that is captured in the footprint description.

RubenVerborgh avatar Nov 27 '18 17:11 RubenVerborgh

> Indeed, that is captured in the footprint description.

Okay. Do such footprints exist at the moment? I assume not, based on your responses (sorry if I am wrong) - so what should a Solid web-app developer do today? :-)

JornWildt avatar Nov 27 '18 17:11 JornWildt

Reconsidering this issue ... somehow the whole concept about /appdata/my-app seems like a code smell. The simple fact that it organizes data around a specific app instead of, well, data, starts the foundation of an app-specific silo!

Maybe it would be better to organize around data and think of apps as a secondary dimension (and maybe I'm just starting to understand and rephrase what Ruben has been saying all the time).

So, if the pet-app manages pets then Pets should be the foremost classifier - and be located in the root as /pets/ without mentioning whatever app is able to edit it ... and there could (should) be multiple apps that are able to do that (which speaks even more for removing the app name completely).

That is also in line with the idea of Contacts, Appointments and Photos being app-independent things that should neither be stored in Google's nor Microsoft's corner of my POD, should they some day start to offer Solid apps.

That leaves us with the issue of namespacing. What if different developers need to use the same datatype name for different things?

JornWildt avatar Nov 27 '18 18:11 JornWildt

> Reconsidering this issue ... somehow the whole concept about /appdata/my-app seems like a code smell. The simple fact that it organizes data around a specific app instead of, well, data, starts the foundation of an app-specific silo!

True, but there are some app-specific things as your apps get more sophisticated, and it can help to store them somewhere. We have the pod, we have the preferences; maybe a directory for an app might help too.

> So, if the pet-app manages pets then Pets should be the foremost classifier - and be located in the root as /pets/ without mentioning whatever app is able to edit it ... and there could (should) be multiple apps that are able to do that (which speaks even more for removing the app name completely).

I think this is in line with Tim's ideas of panes (which I have come to like more and more over time) so there's a couple of different styles in which to code apps. I've done both in my time.

> That is also in line with the idea of Contacts, Appointments and Photos being app-independent things that should neither be stored in Google's nor Microsoft's corner of my POD, should they some day start to offer Solid apps.

Type indexes can help with this.

> That leaves us with the issue of namespacing. What if different developers need to use the same datatype name for different things?

Namespacing is built into linked data, and therefore Solid, via ontologies. One issue is which ontology is the preferred one. But the idea is that anyone is free to make an ontology and allow it to proliferate, i.e. permissionless, bottom-up design. That can unfortunately get political, as different people prefer different namespaces.

melvincarvalho avatar Nov 27 '18 19:11 melvincarvalho

> Namespacing is built into linked data, and therefore Solid, via ontologies

Well, yes, but that does not apply to resource locations. If my:pet is different from your:pet then what kind of pets should be stored at /pets/? Maybe just both types, as long as we agree that the concept of a "pet" is the same for both.

And what about items with multiple types - a cat can be both a "pet" as well as an "animal in zoo". The "zoo" app could store cats under /animals whereas my pet-app would use /pets/ ...

And speaking of the type registry ... consider the pet example again: the pet-app manages a collection of pets. Does that mean the URL it registers in the type registry is going to be for the collection, like for instance a registration for the type pet-collection, or does it register for a single instance in the collection, for instance the type pet? And what are those type registrations supposed to link to - a single document with all pets in a single resource, or a container?
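For reference, the Solid type index vocabulary has a term for each of the two targets: `solid:instance` points at a single document that holds the instances, and `solid:instanceContainer` points at a container whose members are the instances. In both cases `solid:forClass` names the class of the individual item. The sketch below just renders the two variants as Turtle; the helper name and the `https://example.org/petstore#Pet` class IRI are made up for illustration.

```python
SOLID_PREFIX = "@prefix solid: <http://www.w3.org/ns/solid/terms#>."

# Hypothetical class IRI for the pet example.
PET_CLASS = "https://example.org/petstore#Pet"


def type_registration(fragment: str, rdf_class: str, target: str, is_container: bool) -> str:
    """Render one type registration as Turtle (text only, nothing is written to a pod)."""
    # A container target means "one resource per pet"; a document target
    # means "all pets live in this one resource".
    predicate = "solid:instanceContainer" if is_container else "solid:instance"
    return "\n".join([
        f"<#{fragment}>",
        "    a solid:TypeRegistration;",
        f"    solid:forClass <{rdf_class}>;",
        f"    {predicate} <{target}>.",
    ])
```

Calling it with `is_container=True` and `https://elfisk.solid.community/pets/` yields the container variant, while `is_container=False` and `https://elfisk.solid.community/pets/my-pets.ttl` yields the single-document variant. Which of the two an app should prefer is exactly the open guideline question.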

Too many choices here ...

JornWildt avatar Nov 27 '18 21:11 JornWildt

Sorry for bashing this, but I'm trying to figure out the best way to store app-data.

Here is a very concrete example: I'm working on a web-app for logging flights with RC aircraft. For this purpose, part of my app allows me to create/read/update/delete the locations I fly from. Those locations are stored as instances of schema:Place as well as my own solidrc:Location - and I can choose one of them when I make a flight registration in my logbook.

What should happen with respect to the type registry? There could be a registration for schema:Place, meaning the user wants his/her "places" to be stored there. But does that really mean all places? Would you store places you visited on vacations (from your "tourist" guide web-app), famous locations from the Battle of Britain (should that interest you) together with places you regularly visit while training your dogs or flying with your RC-models? Maybe :-)

That collection could be nice to have available for driving instructions in some sort of mapping app - but in your flight-app you only want to see a short list of your flying locations.

A solution could be to store all "places" in a central /places/ folder and then, in the flight-app, only display those places that also have the specialized type solidrc:Location.
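A minimal sketch of that filtering, assuming the app has already fetched the resources in /places/ and extracted the rdf:type values of each one. The `solidrc` namespace IRI below is a placeholder, since the real one is not given here, and the helper name is likewise invented.

```python
from typing import Iterable, List, Mapping

SCHEMA_PLACE = "http://schema.org/Place"
# Placeholder IRI: the actual solidrc namespace is not stated in this thread.
SOLIDRC_LOCATION = "https://example.org/solidrc#Location"


def with_type(resources: Mapping[str, Iterable[str]], rdf_class: str) -> List[str]:
    """Return the URLs (sorted) of the resources that carry the given rdf:type.

    resources maps a resource URL to the type IRIs found for it.
    """
    return sorted(url for url, types in resources.items() if rdf_class in set(types))


places = {
    "https://elfisk.solid.community/places/airfield.ttl": {SCHEMA_PLACE, SOLIDRC_LOCATION},
    "https://elfisk.solid.community/places/dover.ttl": {SCHEMA_PLACE},
}

# A mapping app would list every schema:Place; the flight app narrows the list.
all_places = with_type(places, SCHEMA_PLACE)
flying_locations = with_type(places, SOLIDRC_LOCATION)
```

Here `all_places` contains both URLs and `flying_locations` only the airfield, so the shared container stays useful to other apps while the flight app sees its short list.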

Again, lots of choices - it would be nice to have some kind of guidelines :-)

JornWildt avatar Nov 27 '18 22:11 JornWildt

Continuing on this line of thought:

Data belongs to the end-user, not the applications working on the data.

Take "photos" as another example - my app lets the end-user upload photos of their aircraft models to be shown as thumbnails in the logbook. Where should those photos be located?

Let's assume "photos" is generic enough to have a type registration - the end-user (or the POD service provider, or another app) has decided to make a registration for the type schema:ImageObject, pointing to /images. My RC-model app looks for that and stores the uploaded photos in /images.

But who says the end-user wants photos of aircraft models mixed with photos of the kids and previous family vacations? Probably not - she would most likely rather store them in a specific location based on whatever preferences she has.

So how is any app going to be able to choose a suitable data location without asking the end-user? That is, assuming the app wants to be a non-data-silo app - otherwise it could simply choose /apps/my-app/photos.

All this probably ends up with "Ask the user" - if the app has any data worth sharing, it should allow the end-user to choose a location (and suggest something like /images/aircraft-models for the example here).
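As a rough sketch of "ask the user, but suggest something": the app derives a suggestion from the registered container and only uses it when the user has not picked a location. Both function names and the `topic` parameter are invented for this example.

```python
from typing import Optional


def suggest_location(registered_container: str, topic: str) -> str:
    """Propose a sub-container of the registered one, e.g. /images/ + aircraft-models."""
    if not registered_container.endswith("/"):
        registered_container += "/"
    return f"{registered_container}{topic.strip('/')}/"


def choose_location(registered_container: str, topic: str,
                    user_choice: Optional[str] = None) -> str:
    """The user's explicit choice wins; the suggestion is only a pre-filled default."""
    if user_choice:
        return user_choice if user_choice.endswith("/") else user_choice + "/"
    return suggest_location(registered_container, topic)
```

In a real app the suggestion would be shown in a location picker rather than applied silently, so the user can still redirect the photos somewhere else.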

JornWildt avatar Nov 28 '18 06:11 JornWildt

Wow, didn't know GitHub would cross-reference links in that way!

JornWildt avatar Nov 28 '18 06:11 JornWildt

And here is another data storage guideline that would be nice to have: given that my pet-app stores a list of "my pets" ... should that list be stored as one big Turtle document in for instance /pets/my-pets.ttl or should you rather split it into multiple resources like for instance /pets/my-pets/pet1.ttl, /pets/my-pets/pet2.ttl and so on? And if you choose multiple resources, what should the naming convention be? If it's a counter, where is the counter then located? Should it be a GUID? Should it be a date? Should it be a "cool name" like /pets/my-pets/mc-allister.ttl or /pets/my-pets/rocky.ttl?
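One possible convention, sketched under assumptions: derive a readable slug from the pet's name, fall back to a random UUID when the name yields nothing usable, and append a number on collisions so no shared counter has to be stored anywhere. The helper names are hypothetical, and `existing` stands for the member URLs the app has already read from the container listing.

```python
import re
import unicodedata
import uuid
from typing import Set


def slug(name: str) -> str:
    """Lower-case ASCII slug, e.g. 'Mc Allister' -> 'mc-allister'."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def pet_resource_url(container: str, name: str, existing: Set[str]) -> str:
    """Pick an unused resource URL inside container (which must end with '/')."""
    base = slug(name) or uuid.uuid4().hex  # GUID-style fallback for unusable names
    candidate = f"{container}{base}.ttl"
    suffix = 2
    while candidate in existing:
        # Collision: try rocky-2.ttl, rocky-3.ttl, ... instead of keeping a counter.
        candidate = f"{container}{base}-{suffix}.ttl"
        suffix += 1
    return candidate
```

The trade-off is that slugs leak the pet's name into the URL and go stale if the pet is renamed, whereas GUIDs are stable but unreadable in a data browser.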

One giant Turtle document would be nice for querying with SPARQL (and even allow for both sorting and paging with LIMIT and ORDER BY), but it would probably be rather difficult to share individual pets with friends (since access control lists work at resource level - right?).

Again, lots of choices - it would be nice to have some sort of guidelines or discussion of data management.

JornWildt avatar Nov 28 '18 07:11 JornWildt

And yet another observation - it may not be such a good idea to store things directly under the root, like for instance /pets, for the simple reason that you cannot browse it from the data browser (it will show your "homepage" with "This is a public homepage of X, whose WebID is Y ..." instead of listing your root folders).

So it should probably be /data/pets instead, not /public/pets as I may not want my pet collection to be public, and not /appdata/pets as that indicates the data belongs to an app (as opposed to the end-user).

JornWildt avatar Nov 28 '18 10:11 JornWildt

That latter thing is a UI bug we should fix.

RubenVerborgh avatar Nov 28 '18 10:11 RubenVerborgh

> That latter thing is a UI bug we should fix.

Ah, okay, then it's back to /pets again :-)

JornWildt avatar Nov 28 '18 11:11 JornWildt