Starting with the Indieweb


Over the past few weeks I have been working to get this personal website and blog "on the indieweb" and in large part I have succeeded. Here is an account of my experience and the challenges I encountered.

TL;DR

  • IndieMark is a great roadmap and priority list for your IndieWeb-site development
  • Using GravCMS I was able to quickly get my website to IndieMark Level 1
  • Multiple domains + 308 redirects + HSTS can mess with SEO on Bing/Yahoo/DuckDuckGo and Yandex.
  • The default GravCMS robots.txt does not work very well

Getting to IndieMark Level 1

I chose the IndieMark criteria as a roadmap and set of priorities for the project. Specifically, I wanted to get to at least Level 1 relatively quickly. I considered this to be the minimum feature set required before I could claim to be an #indieweb-er.

Identity: #ownyouridentity - Have a personal domain that you use as your primary identity on the web

This requirement was pretty straightforward. I had bought a number of domains (dfoley.ie, dsofeir.net, and davidfoley.org) a good while back and they were sitting idle. So I chose to use www.dfoley.ie and redirect all the other domains (*dsofeir.net, *davidfoley.org, dfoley.ie) to it. Setting up www.dfoley.ie as the canonical domain was not a problem in terms of TLS certificates, configuring Apache, etc. However, SEO did present some issues:

  1. I turned HSTS on for all domains and configured Apache to redirect any request on any domain to "https://www.dfoley.ie/*". Google correctly consolidated the domains, identifying www.dfoley.ie as the canonical domain. However, the Bing/Yahoo/DuckDuckGo and Yandex bots don't seem to recognise that the other domains are essentially aliases for www.dfoley.ie. They complain of crawl errors, absent or misconfigured robots.txt files, etc. This seems to negatively affect search ranking, as attention is divided across multiple domains. It is compounded by the use of HTTPS, as the bots are under the impression that I have approximately 12 "different" sites, whereas I am trying to indicate that there is one canonical domain.
  2. I used HTTP 308 redirects in all cases. None of the search engine crawlers liked this. So I switched to 302, and this seemed to resolve the confusion for Google; however, the other search engines had the same difficulty as described above.
  3. I set up a robots.txt file at https://www.dfoley.ie/robots.txt. Initially I used the default provided with the GravCMS documentation:
User-agent: *
Disallow: /backup/
Disallow: /bin/
Disallow: /cache/
Disallow: /grav/
Disallow: /logs/
Disallow: /system/
Disallow: /vendor/
Disallow: /user/
Allow: /user/pages/
Allow: /user/themes/
Allow: /user/images/

The search engines complained that the policy set by robots.txt was prohibiting them from crawling the site. Google said it would ignore the directive as it conflicted with the meta tags and it considered the homepage "important". The other search engines simply did not crawl the site.

From my reading of the file, I thought it should permit crawling, as there was no explicit Disallow: / or Disallow: *. In any case, I added the line Allow: / to explicitly permit crawling of everything other than what is disallowed.
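One way to sanity-check that reading is to feed the default rules to Python's standard-library urllib.robotparser. This is only a sketch under some assumptions: it is one parser among many, and as far as I know it applies rules in file order (first match wins), which is not necessarily how each search engine resolves Allow/Disallow conflicts. It therefore cannot explain why the other crawlers refused the site. The /system/config.yaml path is a made-up example, and only unambiguous paths are checked.

```python
from urllib.robotparser import RobotFileParser

# The default rules quoted above, copied verbatim.
DEFAULT_RULES = """\
User-agent: *
Disallow: /backup/
Disallow: /bin/
Disallow: /cache/
Disallow: /grav/
Disallow: /logs/
Disallow: /system/
Disallow: /vendor/
Disallow: /user/
Allow: /user/pages/
Allow: /user/themes/
Allow: /user/images/
"""


def build_parser(rules):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(DEFAULT_RULES)

# The homepage matches none of the listed prefixes, so this parser allows it.
home_allowed = parser.can_fetch("*", "https://www.dfoley.ie/")

# Hypothetical path under a disallowed directory.
system_allowed = parser.can_fetch("*", "https://www.dfoley.ie/system/config.yaml")

print("homepage allowed:", home_allowed)
print("/system/ allowed:", system_allowed)
```

With this parser the homepage comes out as crawlable even without the extra Allow: / line, which matches the interpretation above. The explicit line mainly removes any ambiguity for stricter crawlers.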

I have created a pull request on the GravCMS GitHub project to resolve this issue.
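Going back to the first SEO issue, the figure of roughly 12 "different" sites can be reproduced with a little arithmetic. The sketch below assumes that each domain is reachable both bare and with a www. prefix, over both HTTP and HTTPS. That is my reading of the wildcard redirects, not something a crawler report confirmed, and further subdomains would raise the count.

```python
from itertools import product

SCHEMES = ("http", "https")
DOMAINS = ("dfoley.ie", "dsofeir.net", "davidfoley.org")
PREFIXES = ("", "www.")  # assumption: only the bare and www. hosts exist
CANONICAL = "https://www.dfoley.ie"

# Every scheme/host combination a crawler could treat as a separate site.
origins = [
    f"{scheme}://{prefix}{domain}"
    for scheme, domain, prefix in product(SCHEMES, DOMAINS, PREFIXES)
]

# Everything except the canonical origin should redirect to it.
aliases = [origin for origin in origins if origin != CANONICAL]

print(len(origins), "origins,", len(aliases), "of them aliases")
for origin in aliases:
    print(origin, "->", CANONICAL)
```

Under those assumptions there are 2 x 3 x 2 = 12 origins, 11 of which are aliases that a crawler has to work out belong to the one canonical site.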

Authentication: #useyourownidentity - Set up web sign-in for login. Sign in to https://indieweb.org/ and create your user page, linking to your personal domain.

Following the directions on indieweb.org was easy enough here and it worked well in the end. Two small issues I came across:

  1. I did not have mod_deflate enabled in Apache, and when I entered my domain at indielogin.com to log in, the page gave an error saying it didn't accept the encoding and would only accept gzip/deflate. So I enabled mod_deflate and everything worked.

  2. Interestingly, GravCMS does not particularly like Apache mod_deflate. I set cache.gzip: true and cache.allow_webserver_gzip: true in user/config/system.yaml and everything seems to work.
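To confirm that the server really compresses its responses after changes like these, a small check along the following lines can help. It is a minimal sketch using only the Python standard library. The helper names are hypothetical, and I do not know exactly what indielogin.com tests for, so this only shows whether a response comes back with a gzip or deflate Content-Encoding.

```python
from urllib.request import Request, urlopen

ACCEPTED = {"gzip", "deflate"}


def uses_accepted_encoding(content_encoding):
    """Return True if a Content-Encoding header value names gzip or deflate."""
    if not content_encoding:
        return False
    codings = {coding.strip().lower() for coding in content_encoding.split(",")}
    return bool(codings & ACCEPTED)


def check_site(url):
    """Request url while advertising gzip/deflate and inspect the reply header."""
    request = Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with urlopen(request, timeout=10) as response:
        # urlopen does not decompress the body, so the header is left as sent.
        return uses_accepted_encoding(response.headers.get("Content-Encoding"))


if __name__ == "__main__":
    print(check_site("https://www.dfoley.ie/"))
```

The header check is kept separate from the network call so that it can be exercised on its own with sample header values.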

Posts: #ownyourdata - Post some kind of original content on your own site

The blog functionality provided by the Quark theme for GravCMS gave me everything I needed in this regard. Out of the box it provides permalinks and marks posts up with the h-entry microformat. The h-entry markup, however, does NOT include the author and does not properly include the 'p-name' property when the hero image is set.
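A quick way to spot gaps like these is to scan a rendered post for the microformat class names you expect. The following is a rough sketch using Python's built-in html.parser. It only looks for class names anywhere in the page and does not check nesting, so it is no substitute for a real microformats parser. The sample markup and the function name are invented for illustration and are not Quark's actual output.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_h_entry_properties(html, wanted=("h-entry", "p-name", "p-author")):
    """Return the wanted microformat class names that are absent from html."""
    collector = ClassCollector()
    collector.feed(html)
    return [name for name in wanted if name not in collector.classes]


# Hypothetical post markup with a title but no author.
sample = (
    '<article class="h-entry">'
    '<h1 class="p-name">Hello</h1>'
    '<div class="e-content">First post</div>'
    "</article>"
)

print(missing_h_entry_properties(sample))
```

For the sample above the only missing property is p-author, which mirrors the missing-author problem described here.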

Search: searchability is required for IndieMark level 1

This does not mean you need a search box in your site's UI; it just means that your site should be searchable using a search engine. In practice this means having a permissive robots policy and making sure your site is mostly standards-compliant HTML.

Further work

I am pleased with my progress and I am now working on getting the site to IndieMark Level 2. I will also be releasing a GravCMS theme based on Quark which will help users create an IndieWeb site quickly.

Credits

Thanks to Jukan Tateisi for the photo used in this post.
