batfish Reconsidering how dataSelectors work

dataSelectors is intended to meet two requirements:

You want to use data from the static analysis of your pages within your pages. Typically this means you want to build a navigation or index or table of contents. For example, you want to make a sidebar that lists all of your "example" pages — so you depend on the static analysis of your pages to populate that.
You want to pull in data from elsewhere (maybe a large file, maybe an API) but don't want all that data ending up in your JS bundle, just some.

Looking at the way some similar libraries handle this:

In Next.js, your page component can have a getInitialProps static property. It should return serializable data. https://github.com/zeit/next.js#fetching-data-and-component-lifecycle
In Gatsby your page module can have a named query export that's a GraphQL query. https://www.gatsbyjs.org/docs/querying-with-graphql/

In contrast, in Batfish you define dataSelector functions within the config. One significant advantage of this approach is that the dataSelectors are defined in an environment that's clearly Node-only, so you're free to import and use fs or http or whatever else you need in order to put together your serializable data, without fearing you'll break or bloat the client-side bundle. One significant disadvantage is that the connection between any given dataSelector and any given page isn't as clear. And I really don't like our current solution, where you import the serialized data from a temporary file. (See https://github.com/mapbox/batfish/issues/217).

Quickly brainstorming some possible new approaches:

Use a static property on page components, like Next.js, but just have it specify which data to inject — while the selector functions themselves, with their logic, still live in the config, in Node. For example: MyPage.injectData = ['xxx'], which would correspond to dataSelectors: { xxx: () => {...} }, and MyPage would get an xxx prop.
Batfish exposes a higher-order component that you can use to wrap your page and name the data it gets. For example, injectData(['xxx'])(MyPage) (otherwise same as the idea above).

Jul 07 '18 01:07 davidtheclark

I agree we can improve the way data selectors work. As @davidtheclark mentioned above next.js uses a static property on a react class. This feature relies on getInitialProps being executable in both browser and server environment. However, in our case, the data has to be known at the time of compiling (build step), which is good for us as we can reduce the complexity of the problem.

@davidtheclark mentions above,

One significant advantage of this approach is that the dataSelectors are defined in an environment that's clearly Node-only, so you're free to import and use fs or HTTP or whatever ...

On this topic of node envs, I noticed that the next.js doesn't seem to be shy of housing the browser and node code under the same roof. The example mentioned in next.js/fetching-data seems to use environment dependent parameters

getInitialProps receives a context object with the following properties: pathname - path section of URL req - HTTP request object (server only) res - HTTP response object (server only) (*notice the server only params)

If we want to make the connection between dataSelector and given page clear, maybe we can do something similar i.e. give some helper tools as a parameter to the callback. There are definitely some tradeoffs with this approach:

The risk of users doings import fs from 'fs'. I guess these errors can be easily caught in the build step. If a user wants to use X npm library to help with dataSelection, they can list that library in the batfish.config.js and then it would be provided as a parameter to the callback.
Letting node code mix with browser code, leads to a lot of if (typeof window === 'undefined') style of coding (ref). If we are willing to accept this necessary evil and do it in a controlled fashion (see the 1st point), we can finally put an end to the env debate.

Coming back to the two possible approaches @davidtheclark mentioned, I am more in favor of the second approach. As I mentioned above next.js uses the static property approach because they need to execute the function, if we plan on keeping the node code separate from browser code there is no point in having a static property which is magically processed by some code and the component gets the relevant props. I think the best way of providing props out of thin air is to use HOC (similar to react-router or react-redux). If we wanna keep them in the same file, we can still use the HOC something like this.

class MyComp {
  static propTypes = {
    // we can assert the data, Yay!
    myData: PropTypes.required
  }
}
export default dataSelector((paths, fs) => {
   // do the dataSelection
})(MyComp)

Jul 19 '18 07:07 kepta

In order to avoid adding unused code to the browser bundle the higher-order component would still have to be magical. A Babel plugin would be needed in order to replace the function call with the actual serialized data that is needed, right? And in order to pass fs in as an argument to a function that is part of the browser bundle, would fs have to be included in the browser bundle?

Jul 19 '18 15:07 davidtheclark