Support generating sitemap for SEO
WIP. It would be good to have an early review.
Any chance of getting this merged? I'd like to add a sitemap to my wiki, as I'm using it for my public-facing website. For the time being I can generate a sitemap by hand, but as the number of pages grows that is going to become tedious and annoying.
@lawrenceching To keep the sitemap compliant, we should add an option to set each page's priority in the sitemap. Valid values are 0.0 to 1.0. Another thing to add is a change frequency, which could default to monthly or weekly.
Adding the <priority> component to the sitemap and page properties editor will require a change to the database schema by adding a new column.
https://www.sitemaps.org/protocol.html
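For reference, here is a rough sketch of what an entry with those two fields looks like according to the protocol page linked above. The function names `sitemap_entry` and `build_sitemap` are made up for illustration and are not part of this PR. The sketch assumes the same change frequency and priority for every URL, and it does not entity-escape characters such as `&` in URLs, which the protocol requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print one <url> element.
# Arguments: URL, change frequency (default "weekly"), priority (default "0.5").
sitemap_entry() {
  local loc="$1" changefreq="${2:-weekly}" priority="${3:-0.5}"
  # The protocol only allows priorities from 0.0 to 1.0.
  if ! awk -v p="$priority" \
      'BEGIN { exit !(p ~ /^[0-9]+(\.[0-9]+)?$/ && p + 0 <= 1.0) }'; then
    echo "invalid priority: $priority" >&2
    return 1
  fi
  # NOTE: $loc is printed as-is; URLs containing & or < would need escaping.
  printf '  <url>\n    <loc>%s</loc>\n    <changefreq>%s</changefreq>\n    <priority>%s</priority>\n  </url>\n' \
    "$loc" "$changefreq" "$priority"
}

# Hypothetical helper: read one URL per line from stdin and wrap the
# entries in the <urlset> root element defined by the protocol.
# Optional arguments (change frequency, priority) apply to every URL.
build_sitemap() {
  local url
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  while IFS= read -r url; do
    if [ -n "$url" ]; then
      sitemap_entry "$url" "$@" || return 1
    fi
  done
  echo '</urlset>'
}
```

Something like `build_sitemap monthly 0.8 < urls.txt > sitemap.xml` would then produce the file, where `urls.txt` is an assumed list of page URLs, one per line.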
That said, Google Search Console is indexing my wiki without the sitemap being supplied to it. I guess providing the Google Analytics token in the Administration dashboard is good enough?
@BPowell76 I used a default priority and change frequency for every page, and users can change them in the config file.
@NGPixel I think my change is ready for review. Could you have a look, especially at the permission checking part?
I don't know if this is going to be merged, but I figured out how to generate a sitemap.txt file from the Linux terminal with the following, using a cron job to keep everything in sync:
# generate-sitemap.sh
#
# database name for me is "wiki", replace accordingly
# query database and output results to file
psql -U postgres -d wiki < /var/lib/postgresql/select-published-pages.sql > sitemap.txt
# count rows, then delete the last 2 rows and the first 2 rows
# the text output starts with a column name and an ASCII table separator line "-------"
# the last two rows for me were the number of rows returned and a blank line
ROWCOUNT=$(sed -n "$=" sitemap.txt)
SECLASTROW=$(($ROWCOUNT-1))
sed -i "${SECLASTROW},${ROWCOUNT}d" sitemap.txt
sed -i "1,2d" sitemap.txt
-- select-published-pages.sql
--
-- annoyingly, PostgreSQL folds unquoted identifiers to lowercase, so camel case
-- column names such as "isPublished" have to be wrapped in double quotes
SELECT CONCAT('https://[insert_your_url_here]',path) AS "URL"
FROM pages
WHERE "isPublished"='t'
ORDER BY path ASC;
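A possible simplification of the script above, in case it helps anyone: psql's `-A` (unaligned) and `-t` (tuples only) options print bare rows, so the header and footer never need to be trimmed with sed. This is only a sketch. The function name `generate_sitemap_txt` is made up, and the database name and user are carried over from the script above. `-v ON_ERROR_STOP=1` is added so that a failing query gives a non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical variant of generate-sitemap.sh.
# Arguments: path to the SQL file, path of the sitemap.txt to write.
generate_sitemap_txt() {
  local sql_file="$1" out_file="$2"
  local tmp
  # Write next to the target first, so a failed query never truncates
  # the sitemap that is currently being served.
  tmp="$(mktemp "${out_file}.XXXXXX")" || return 1
  # -A: unaligned output, -t: rows only (no column name, no row count)
  if psql -U postgres -d wiki -v ON_ERROR_STOP=1 -At -f "$sql_file" > "$tmp"; then
    chmod 644 "$tmp"      # mktemp creates the file readable by its owner only
    mv "$tmp" "$out_file"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example call (paths are assumptions, adjust to your setup):
#   generate_sitemap_txt /var/lib/postgresql/select-published-pages.sql sitemap.txt
# Example crontab entry running a script that makes this call every night at 03:00
# (hypothetical script path):
#   0 3 * * * /usr/local/bin/generate-sitemap.sh
```

With this variant the SQL file stays the same; only the sed post-processing goes away.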