
Is it necessary to store file content in db.json for a large blog?

Open ahuigo opened this issue 7 years ago • 3 comments

I have about 800 markdown files, and as a result db.json has grown to about 20 MB.

I don't think it is necessary to store the file content in db.json.

ahuigo avatar Sep 26 '18 14:09 ahuigo

Yeah @ahuigo, what to do with large sites is an ongoing discussion. Any ideas? Just keep it in memory, or something else? The db.json is a cache, so Hexo doesn't re-parse anything it has already parsed.
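
To make the caching idea concrete, here is a minimal Python sketch of a parse cache keyed by file path. It is not Hexo's actual implementation; `parse_with_cache`, the `signature` field, and the caller-supplied `parse` function are all hypothetical names used only for illustration.

```python
import os
from pathlib import Path


def parse_with_cache(path, cache, parse):
    """Return the parsed entry for `path`, calling `parse` only if the file changed.

    `cache` is a plain dict that could be persisted as JSON between builds.
    `parse` is a hypothetical function that turns file text into parsed data.
    """
    stat = os.stat(path)
    # Modification time plus size acts as a cheap "has this file changed?" check.
    signature = [stat.st_mtime_ns, stat.st_size]

    entry = cache.get(path)
    if entry is not None and entry['signature'] == signature:
        return entry['parsed']  # cache hit: skip re-parsing

    parsed = parse(Path(path).read_text(encoding='utf-8'))
    cache[path] = {'signature': signature, 'parsed': parsed}
    return parsed
```

The trade-off discussed in this thread is visible here: whatever `parse` returns is what ends up in the cache, so caching full rendered content makes the cache grow with the size of the site.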

tcrowe avatar Sep 29 '18 19:09 tcrowe

Some ideas for decreasing Hexo's build time:

  1. The db.json
    1. Store only each markdown file's meta info (path, title, date, updated, category) and build info such as the last build time.
    2. Don't cache the whole file content in db.json. Read the content directly from the file system when we need it.
  2. We can find the modified files via git, find, or other tools. https://stackoverflow.com/questions/16085958/scripts-find-the-files-have-been-changed-in-last-24-hours
  3. Support incremental builds. Each time the site is built, rebuild only the modified files. The build should not touch unmodified files.

For example:

# hexo g;
# Sketch only: parse, parseBlog and the hexo_* helpers are placeholders to be implemented.
from subprocess import getoutput

# db.json layout: {build_meta: {'last_time': '2018-09-29...'}, files_meta: {...}}
dbinfo = parse('db.json')
cmd = 'git diff-index --cached --name-status --diff-filter=ACMRD HEAD -- ./_posts'
output = getoutput(cmd).strip()
if output:
    # find out modified files and deleted files
    modified_blogs = {}
    delete_blogs = []
    for line in output.split('\n'):
        status, path = line.split('\t')
        if status == 'D':
            delete_blogs.append(path)
            continue

        blog = parseBlog(path)
        modified_blogs[path] = blog['meta']

    # delete files: remove the generated HTML and the tag/category references
    for path in delete_blogs:
        file_meta = dbinfo['files_meta'].get(path)
        if file_meta is None:
            continue  # never built, nothing to clean up
        html_path = f'public/{path}.html'
        getoutput(f'rm {html_path}')
        hexo_delete_tags(file_meta)
        hexo_delete_category(file_meta)

    # add & update files (incremental building)
    for path, file_meta in modified_blogs.items():
        hexo_generate_html(path)
        hexo_add_update_tags(file_meta)
        hexo_add_update_category(file_meta)

    # save db.json
    hexo_update_db('db.json', modified_blogs, delete_blogs)
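
The status-parsing step in the example above can be isolated into a small, testable function. The following is a minimal sketch, assuming tab-separated `--name-status` output from git; `split_changes` is a hypothetical helper name, and the rename branch only matters if rename detection (for example `-M`) is enabled, in which case git emits a line of the form `R100<TAB>old<TAB>new`.

```python
from __future__ import annotations


def split_changes(name_status_output: str) -> tuple[list[str], list[str]]:
    """Split `git diff-index --name-status` output into (changed, deleted) paths."""
    changed: list[str] = []
    deleted: list[str] = []
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split('\t')
        if status.startswith('D'):
            deleted.append(paths[0])
        elif status.startswith('R') and len(paths) == 2:
            # A rename removes the old path and creates the new one.
            deleted.append(paths[0])
            changed.append(paths[1])
        else:
            # A (added), C (copied), M (modified): rebuild the resulting path.
            changed.append(paths[-1])
    return changed, deleted


sample = "M\t_posts/a.md\nD\t_posts/b.md\nA\t_posts/e.md\n"
changed, deleted = split_changes(sample)
```

Keeping this step free of I/O means it can be checked against sample output without a git repository.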

ahuigo avatar Sep 30 '18 02:09 ahuigo

I've written a script to generate a static blog: https://github.com/ahuigo/a/blob/master/tool/pre-commit It's only for my own use, not for Hexo.

ahuigo avatar Oct 06 '18 13:10 ahuigo

See also https://github.com/hexojs/warehouse/issues/13

stevenjoezhang avatar Nov 26 '22 15:11 stevenjoezhang

I'll close this issue, because the major performance overhead of Hexo is not reading or writing db.json, but processing the cross-refs, e.g. finding posts with a tag or tags of a post.

See https://github.com/hexojs/hexo/issues/2579#issuecomment-1328065978
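
For readers unfamiliar with what "cross-refs" means here, the following Python sketch illustrates the kind of lookup involved: mapping each tag to the posts that carry it. It is an illustration only, not how Hexo or warehouse implements it; `build_tag_index` and the shape of the `posts` dict are assumptions.

```python
from collections import defaultdict


def build_tag_index(posts):
    """Map each tag to the list of post paths carrying it, in a single pass."""
    index = defaultdict(list)
    for path, meta in posts.items():
        for tag in meta.get('tags', []):
            index[tag].append(path)
    return dict(index)


# Hypothetical metadata, shaped like the files_meta idea discussed earlier.
posts = {
    '_posts/a.md': {'tags': ['hexo', 'perf']},
    '_posts/b.md': {'tags': ['hexo']},
    '_posts/c.md': {'tags': []},
}
tag_index = build_tag_index(posts)
```

Without such an index, answering "which posts have this tag?" means scanning every post for every tag, so the cost grows with both the number of posts and the number of tags regardless of how db.json is stored.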

stevenjoezhang avatar Nov 29 '22 08:11 stevenjoezhang