Babel

SEO Friendly Multilingual Websites with MODx and Babel

In my previous article about setting up multilingual websites with MODx and Babel I described a solution based on different (sub)domains for each language. This domain-based approach is easy to implement but has some drawbacks from an SEO point of view: by using different domains for each language you automatically split your site into several separate sites, and search engines handle each of them separately. For example, they won't share the same page rank and backlinks. Using one domain and subfolders for each language may improve your site's overall ranking because all backlinks point to a single domain. In this article I'll describe a possible way to set up a multilingual website with MODx and Babel using one domain and a subfolder for each language.

This article doesn't focus on the SEO point of view. It's rather a technical tutorial on how to set up a multilingual website using subfolders. If you'd like to read more about the "(sub)domains vs. subfolders" topic, you may search the web; there are a lot of articles about it.

Technical Background

I'll describe the procedure for setting up a multilingual site using a fictional example site, http://www.example.com. The main website is reachable via http://www.example.com/ and is available in two languages:

  • German: http://www.example.com/de/
  • English: http://www.example.com/en/

For each language we are using one context: web for German, en for English. To determine the proper culture key we are using some rewrite rules in the .htaccess file. These rules check for the first subfolder of the requested URL and set the cultureKey request parameter which is used by MODx to initialize the lexicon.

Prerequisites

Before starting with this tutorial you should be sure that all requirements for a multilingual site are satisfied:

  • Friendly URLs are enabled: friendly_urls and use_alias_path are set to yes (1)
  • The Apache rewrite engine is activated and the rewrite base is set correctly:
    RewriteEngine On
    RewriteBase /
    
    If you're running your site in a non-root directory like /subfolder/mysite/xy you have to define your rewrite base like this:
    RewriteBase /subfolder/mysite/xy/
    
  • The base URL is set via the <base> tag in the HTML head of all your templates:
    <head>
    	...
    	<base href="[[++site_url]]" />
    	...
    </head>
    

Step by Step Instructions

You have to follow the five steps described in my previous article about setting up multilingual websites and one additional step:

  1. Create your contexts for each language: no difference from the domain-based approach.
  2. Configure language specific settings of all your contexts: site_url, cultureKey and base_url.
    Differences: Instead of using different domains for the site_url setting you have to use subfolders and additionally specify the base_url according to the context's cultureKey:
    web context: site_url: http://www.example.com/de/ base_url: /de/
    en context: site_url: http://www.example.com/en/ base_url: /en/
    Hint: You should also define settings like site_start (id of default landing page), error_page, etc. for each of your contexts.
  3. Grant the "Load Only" access policy for all your contexts to the anonymous group: no difference from the domain-based approach.
  4. Create a gateway plugin which listens to the "OnHandleRequest" event to load the correct context.
    Differences: Instead of using the requested domain to determine the context, the cultureKey request parameter is used which is set by some rewrite rules (see below). Additionally there is no need to set the cultureKey of the modx object anymore:
    <?php
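    if ($modx->context->get('key') != "mgr") {
    	/* grab the current language from the cultureKey request var;
    	 * it is missing when a request bypasses the rewrite rules */
    	$cultureKey = isset($_REQUEST['cultureKey']) ? $_REQUEST['cultureKey'] : '';
    	switch ($cultureKey) {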
    		case 'en':
    			/* switch the context */
    			$modx->switchContext('en');
    			break;
    		default:
    			/* Set the default context here */
    			$modx->switchContext('web');
    			break;
    	}
    	/* unset GET var to avoid
    	 * appending cultureKey=xy to URLs by other components */
    	unset($_GET['cultureKey']);
    }
    
  5. Install the Babel Extra via package management: no difference from the domain-based approach.
  6. Change existing rewrite rules for friendly URLs and add additional rules to your .htaccess file (see next section for detailed description):
    # The Friendly URLs part
    # detect language when requesting the root (/)
    RewriteCond %{HTTP:Accept-Language} !^de [NC]
    RewriteRule ^$ en/ [R=301,L]
    RewriteRule ^$ de/ [R=301,L]
    
    # redirect all requests to /en/favicon.ico and /de/favicon.ico
    # to /favicon.ico
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(en|de)/favicon\.ico$ favicon.ico [L,QSA]
    
    # redirect all requests to /en/assets* and /de/assets* to /assets*
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(en|de)/assets(.*)$ assets$2 [L,QSA]
    
    # redirect all other requests to /en/* and /de/*
    # to index.php and set the cultureKey parameter
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(en|de)?/?(.*)$ index.php?cultureKey=$1&q=$2 [L,QSA]
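
    If you add more languages later, the switch statement of the gateway plugin from step 4 grows with every context. One possible alternative is a small lookup table. The following is only a sketch under assumptions: contextForCultureKey is a hypothetical helper that is not part of MODx or Babel, and the code assumes PHP 7 or later.

```php
<?php
/* Hypothetical helper (not part of MODx or Babel): maps the value of the
 * cultureKey request parameter to the key of the context to switch to. */
function contextForCultureKey(string $cultureKey, array $contexts, string $default = 'web'): string
{
	/* unknown or empty culture keys fall back to the default context */
	return isset($contexts[$cultureKey]) ? $contexts[$cultureKey] : $default;
}

/* culture key => context key, matching the example site of this article */
$contexts = array('de' => 'web', 'en' => 'en');
```

    Inside the plugin the switch block could then be reduced to a single call such as $modx->switchContext(contextForCultureKey($cultureKey, $contexts)), where $cultureKey holds the request parameter.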
    

Adding Rewrite Rules

To make your multilingual site work properly you have to add some rewrite rules to your .htaccess file which handle requests to your (physically non-existing) language subfolders.

First you have to replace the default rewrite rule shipped with the MODx ht.access file with the following:

# The Friendly URLs part
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(en|de)?/?(.*)$ index.php?cultureKey=$1&q=$2 [L,QSA]

If you're using languages other than German and English you have to change the (en|de) part of the rewrite rule according to your needs. For example, for a website available in English, Spanish and French you would use the following rewrite rule:

RewriteRule ^(en|es|fr)?/?(.*)$ index.php?cultureKey=$1&q=$2 [L,QSA]
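
To see what this pattern actually captures, it can help to try it outside of Apache. The following sketch applies the same regular expression with PHP's preg_match to the path relative to the rewrite base. splitLanguagePath is a hypothetical helper for experimenting, not something MODx or Babel provides, and the code assumes PHP 7 or later.

```php
<?php
/* Hypothetical helper (not part of MODx or Babel): mimics what the pattern
 * ^(en|de)?/?(.*)$ captures as $1 (cultureKey) and $2 (q). */
function splitLanguagePath(string $path, array $cultureKeys = array('en', 'de')): array
{
	$alternatives = implode('|', array_map('preg_quote', $cultureKeys));
	/* group 2 always takes part in the match, so $m[1] is an empty string
	 * when no language prefix is present */
	preg_match('#^(' . $alternatives . ')?/?(.*)$#', $path, $m);
	return array('cultureKey' => $m[1], 'q' => $m[2]);
}
```

With the default keys, en/about/team.html is split into cultureKey en and q about/team.html, and a path without a language prefix such as about.html yields an empty cultureKey. One thing to be aware of: because the slash in the pattern is optional, an unprefixed path that merely starts with a culture key is split as well, so demo.html yields cultureKey de and q mo.html.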

Ok, now your pages should be accessible via the language subfolders and linking your pages with relative links should work, too.

But there is still a problem regarding relative links: linking assets like CSS, JavaScript, images etc. won't work properly. Normally all these files are located somewhere in the assets subfolder of your MODx root directory. When including an asset via a relative URL like assets/css/style.css the asset won't be found:

  1. The browser will try to request something like http://www.example.com/en/assets/css/style.css because the site's URL http://www.example.com/en/ (defined via the site_url context setting in step 2) is used to handle relative URLs.
  2. The rewrite rule from above will be applied and the request will be internally forwarded to http://www.example.com/index.php?cultureKey=en&q=assets/css/style.css
  3. MODx won't find any resource matching the alias assets/css/style.css and will return a 404 error code.

To solve this problem you have to add another rewrite rule before the rule from above which internally rewrites all requests to /[ck]/assets/* to /assets/*, where [ck] is a valid culture key:

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(en|de)/assets(.*)$ assets$2 [L,QSA]

Fine! Now you can use relative links for your pages and assets. Including images with TinyMCE should work, too.

You may want to add some additional rewrite rules for other files which are referenced via relative URLs. For example the favicon.ico in your root directory:

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(en|de)/favicon\.ico$ favicon.ico [L,QSA]

Additionally the server should automatically determine the language when the domain root (http://www.example.com/) is requested and perform a redirect to the suitable language version in the following way:

  • When the accepted language is not German (de) and the root has been requested (the relative request URI is empty), redirect to the English version (en/): see the condition and the first rewrite rule below (lines 2 and 3).
  • Otherwise, redirect to the German version (de/) when the root has been requested: see the second rewrite rule below (line 4).

# detect language when requesting the root (/)
RewriteCond %{HTTP:Accept-Language} !^de [NC]
RewriteRule ^$ en/ [R=301,L]
RewriteRule ^$ de/ [R=301,L]

Hint: This is very rudimentary. The condition only checks whether the value of the Accept-Language HTTP header begins with the language (culture) key. But this header contains much more than a single language key: it's a list of preferred (or even non-preferred) keys like this: Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3. The q value specifies the weight of each language from 0 to 1. Detecting the language with PHP in the gateway plugin is much better, but that is not the topic of this article and will be discussed in another post.
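
As a rough illustration of what taking the q values into account could look like (this is not the PHP-based detection announced above, just a minimal sketch), the following function picks the supported language with the highest weight. pickLanguage is a hypothetical helper that is not part of MODx or Babel. It only compares the primary language subtag, ignores wildcards, and assumes PHP 7 or later.

```php
<?php
/* Hypothetical helper (not part of MODx or Babel): returns the supported
 * language with the highest q value, or $default if none is acceptable. */
function pickLanguage(string $acceptLanguage, array $supported, string $default): string
{
	$best = $default;
	$bestQ = 0.0;
	foreach (explode(',', $acceptLanguage) as $entry) {
		$parts = array_map('trim', explode(';', $entry));
		/* "de-de" -> "de": only the primary language subtag is compared */
		$lang = strtolower(explode('-', $parts[0])[0]);
		/* an entry without a q parameter has the weight 1 */
		$q = 1.0;
		foreach (array_slice($parts, 1) as $param) {
			if (preg_match('/^q=([0-9.]+)$/i', $param, $m)) {
				$q = (float) $m[1];
			}
		}
		/* q=0 means "not acceptable", so it never beats the initial 0.0 */
		if ($q > $bestQ && in_array($lang, $supported, true)) {
			$best = $lang;
			$bestQ = $q;
		}
	}
	return $best;
}
```

For the header fr,de;q=0.9,en;q=0.1 and the supported languages en and de, this sketch selects de, whereas the rewrite condition sends the visitor to en/ because the header does not begin with de.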

OK, now all rules and conditions can be added to your .htaccess file. It's very important to place them in the right order because Apache goes through the rules from top to bottom:

# The Friendly URLs part
# detect language when requesting the root (/)
RewriteCond %{HTTP:Accept-Language} !^de [NC]
RewriteRule ^$ en/ [R=301,L]
RewriteRule ^$ de/ [R=301,L]

# redirect all requests to /en/favicon.ico and /de/favicon.ico
# to /favicon.ico
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(en|de)/favicon\.ico$ favicon.ico [L,QSA]

# redirect all requests to /en/assets* and /de/assets* to /assets*
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(en|de)/assets(.*)$ assets$2 [L,QSA]

# redirect all other requests to /en/* and /de/*
# to index.php and set the cultureKey parameter
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(en|de)?/?(.*)$ index.php?cultureKey=$1&q=$2 [L,QSA]
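
To check the effect of the rule order without touching a server, the rule set can be approximated in a few lines of PHP. This is a simplified simulation under stated assumptions, not a replacement for testing with Apache: applyRules is a hypothetical function, the file and directory conditions are ignored, and the Accept-Language condition is reduced to a boolean telling whether the header starts with de. It assumes PHP 7 or later.

```php
<?php
/* Hypothetical simulation of the rule set above. The "!-f" and "!-d"
 * conditions are ignored; $headerStartsWithDe stands in for the
 * Accept-Language condition. */
function applyRules(string $path, bool $headerStartsWithDe = false): string
{
	/* root request: external redirect to a language subfolder */
	if ($path === '') {
		return 'redirect:' . ($headerStartsWithDe ? 'de/' : 'en/');
	}
	/* favicon rule */
	if (preg_match('#^(en|de)/favicon\.ico$#', $path)) {
		return 'favicon.ico';
	}
	/* assets rule: has to come before the catch-all rule */
	if (preg_match('#^(en|de)/assets(.*)$#', $path, $m)) {
		return 'assets' . $m[2];
	}
	/* catch-all rule: hand everything else over to MODx */
	preg_match('#^(en|de)?/?(.*)$#', $path, $m);
	return 'index.php?cultureKey=' . $m[1] . '&q=' . $m[2];
}
```

In this simulation de/assets/css/style.css is mapped to assets/css/style.css. If the catch-all check were moved to the top, the same path would end up as index.php?cultureKey=de&q=assets/css/style.css, which is the 404 case described earlier.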

If you'd like to go deeper into defining rewrite rules you should read more about the Apache mod_rewrite module.

Is this approach optimal?

This solution works fine and editors can work as they did before without caring about relative links and subfolders. But I think this approach is rather a workaround than an optimal solution:

  • When linking assets relatively you link to non-existing "virtual" files.
  • By applying the rewrite rules for the assets the same file is served via several different URLs: http://www.example.com/assets/css/style.css, http://www.example.com/de/assets/css/style.css and http://www.example.com/en/assets/css/style.css return the same content.
  • Files which are used in all language versions are not cached for the whole site by your browser: The browser doesn't know that http://www.example.com/de/assets/css/style.css and http://www.example.com/en/assets/css/style.css are the same.
  • When working with the GoogleSiteMap Extra you won't be able to serve a sitemap.xml for your whole site without modifying the Extra manually. This is because your documents are distributed over several contexts and GoogleSiteMap is only capable of creating a sitemap for one context. XML sitemaps are very helpful to tell a search engine bot where to find all your pages. So you should use them and they should list all pages of your site!

The optimal approach: Babel 2.3!

In my opinion the "cleanest" way to manage a multilingual website would be using one single context for the whole site and placing your documents into resource containers for each language. At the moment Babel doesn't support this approach. But I have already developed a concept of how to change and extend Babel to be able to use the Extra for both approaches: domain based and subfolder based.

Therefore I'll introduce an additional level of abstraction into Babel's architecture which will make it possible to run Babel in domain-based or subfolder-based mode. The look and feel will remain the same and the new version should be compatible with older ones.

I'll need to change a lot of code and currently I'm working on some other (paid) projects, too.

I hope you like my plans and I appreciate your feedback!