Is it necessary to store file content in db.json for a large blog?
I have about 800 markdown files, which grows db.json to about 20 MB.
I don't think it is necessary to store the content within db.json.
Yeah @ahuigo, there is an ongoing discussion about what to do with large sites. Any ideas? Just keep everything in memory, or something else? The db.json is a cache, so Hexo doesn't re-parse anything it has already parsed.
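For context, this is roughly the kind of cache db.json provides: skip re-parsing a post when it hasn't changed since the last build. A minimal sketch of that idea only, not Hexo's actual implementation; `parse_markdown` is a hypothetical stand-in for the real front-matter/markdown parser, and writing the cache back to disk is omitted.

```python
import json
import os

def parse_markdown(md_path):
    # Hypothetical stand-in for Hexo's real front-matter/markdown parsing.
    with open(md_path, encoding='utf-8') as f:
        return {'content': f.read()}

def load_cache(path='db.json'):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def get_post(cache, md_path):
    mtime = os.path.getmtime(md_path)
    entry = cache.get(md_path)
    if entry and entry['mtime'] == mtime:
        return entry['data']          # cache hit: no re-parse
    data = parse_markdown(md_path)    # cache miss: parse and store
    cache[md_path] = {'mtime': mtime, 'data': data}
    return data
```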
Some ideas for decreasing Hexo's build time:
- The `db.json`:
  - Store only the markdown files' meta info (`path`, `title`, `date`, `updated`, `category`) and build info such as the last build time.
  - Don't cache the whole file content in `db.json`; read the content directly from the file system when we need it.
- We can find the modified files via `git`, `find`, or other tools (see the mtime-based sketch after the git example below): https://stackoverflow.com/questions/16085958/scripts-find-the-files-have-been-changed-in-last-24-hours
- Support incremental building. We only need to rebuild the modified files on each build; the build should not touch unmodified files.
For example:
```python
import json
from subprocess import getoutput

# Sketch of an incremental build driven by git, intended to replace a full
# "hexo g" run. parseBlog and the hexo_* helpers are pseudocode placeholders,
# not real Hexo APIs.

# db.json layout: {'build_meta': {'last_time': '2018-09-29...'}, 'files_meta': {...}}
with open('db.json') as f:
    dbinfo = json.load(f)

cmd = 'git diff-index --cached --name-status --diff-filter=ACMRD HEAD -- ./_posts'
output = getoutput(cmd).strip()
if output:
    # collect modified files and deleted files from the git status output
    modified_blogs = {}
    delete_blogs = []
    for line in output.split('\n'):
        status, path = line.split('\t')
        if status == 'D':
            delete_blogs.append(path)
            continue
        blog = parseBlog(path)  # pseudocode: parse one post's front matter
        modified_blogs[path] = blog['meta']

    # remove the generated HTML and index entries for deleted posts
    for path in delete_blogs:
        if path in dbinfo['files_meta']:
            file_meta = dbinfo['files_meta'][path]
            getoutput(f'rm public/{path}.html')
            hexo_delete_tags(file_meta)
            hexo_delete_category(file_meta)

    # add & update files (incremental building)
    for path, file_meta in modified_blogs.items():
        hexo_generate_html(path)
        hexo_add_update_tags(file_meta)
        hexo_add_update_category(file_meta)

    # save db.json, updating only the meta info of changed/removed posts
    hexo_update_db('db.json', modified_blogs, delete_blogs)
```
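For sites not tracked in git, the same modified-file detection can be done with file mtimes, as in the Stack Overflow link above. A minimal sketch, assuming `db.json` stores the last build time as a Unix timestamp under `build_meta.last_time` (the example above uses a date string instead):

```python
import json
import os

def find_modified_posts(posts_dir='_posts', db_path='db.json'):
    # Assumption: 'last_time' is a Unix timestamp comparable to os.path.getmtime.
    with open(db_path) as f:
        last_time = json.load(f)['build_meta']['last_time']
    modified = []
    for root, _dirs, files in os.walk(posts_dir):
        for name in files:
            if not name.endswith('.md'):
                continue
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_time:
                modified.append(path)
    return modified
```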
I've written a script to generate a static blog: https://github.com/ahuigo/a/blob/master/tool/pre-commit. It's only for my own use, not for Hexo.
See also https://github.com/hexojs/warehouse/issues/13
I'll close this issue, because the major performance overhead in Hexo is not reading or writing db.json but processing the cross-references, e.g. finding the posts with a given tag or the tags of a given post.
See https://github.com/hexojs/hexo/issues/2579#issuecomment-1328065978
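To illustrate why those cross-reference queries dominate: answering "which posts have tag X" by scanning every post is O(posts) per query, so generating all tag pages costs O(posts × tags); building an inverted index once makes each lookup O(1) plus output size. A sketch of the indexing idea only, not Hexo's actual code, assuming posts are dicts with a `tags` list:

```python
from collections import defaultdict

def build_tag_index(posts):
    """posts: iterable of dicts like {'path': ..., 'tags': [...]}."""
    index = defaultdict(list)
    for post in posts:
        for tag in post.get('tags', []):
            index[tag].append(post['path'])
    return index

posts = [
    {'path': 'a.md', 'tags': ['hexo', 'perf']},
    {'path': 'b.md', 'tags': ['perf']},
]
index = build_tag_index(posts)
print(index['perf'])  # ['a.md', 'b.md']
```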