I18n of URLs

Zach Leatherman asked what style of internationalization you'd prefer (explicit or implicit) and my response just doesn't fit in 280 characters.

Released: 8. Jul 2022
Tags:
Share this on: Twitter

Preface

Why are we here

Some days ago Zach Leatherman used the official 11ty twitter account to poll which style of URL internationalization (i18n) one would prefer and while I do have an opinion, it's impossible for me to reduce it down to 140 280 characters, so here we go having a 1500 words blogpost about it.

What is my background?

I've been building websites for some time (in fact, my first live website is now more than a dozen years ago which feels like an eternity on the web...). While most of my sites are only supporting one language, I had my share building stuff available in at least two languages like this page for an acquaintance of mine.

As you can see there I chose the explicit style in that project, but in others I chose the implicit style. This post is partly to explain my reasoning and partly to show you what you could look out for when making the decision and maybe I'll even highlight some dark patterns along the way.

I18n is hard

Internationalization is hard, just as hard is spelling it correctly. Therefore I will use the short form (i18n) from now on (it stands for "i" + 18 characters noone can be bothered to write + "n").

Internationalization is hard. Even more so when you want to get it right and in my opinion the correct™️ way is not always the same, depending on your project.

Ways to build URLs when doing i18n

There are four common ways to build URLs when doing i18n. These are:

  • explicit: .../de/some/path and .../en/some/path
  • explicit (translated): .../de/ein/pfad and .../en/some/path
  • implicit: .../some/path and .../en/some/path
  • implicit (translated): .../ein/pfad and .../en/some/path

Sidenote on building language selects

In addition to this you should offer a way for the user to easily switch languages.
Easy in this case is also key!

Try imagining that you're a dutch native-speaker who only speaks dutch and you're on a website with a language select.
Below I've implemented four language selectors and I want you to switch them to dutch.
The names of the lanugages are in german on purpose as to make it a little more interesting for english native speakers while this would make it easier for dutch speakers.

Now that you might have made it to the correct answer try imagining beeing an arabic speaker and try to switch back to that. You can't even read the letters or you are a german speaker and now the name of your language is completely different (the german word for "german" is "deutsch") and has a completely different position in an alphabetically ordered list.

Take another look at the selects and think about which one is probably easiest to use, especially when it's not just seven languages. So to make it easy to use IMO you should follow these rules:

  • Start each entry with the language as it's written in said language
  • Order the list alphabetically (by localCompare for languages not using the latin character set)
  • Translated language names are only secondary information

If you follow these rules, it will probably be easy for your users to efficiently switch between languages.

Sidenote to my comment block: While it might sound wrong, it's not at all meant to downgrade indians! This is mainly (as far as I heard from my indian colleagues) an outphasing issue coming from the time directly after the british raj where the people who spoke english were placed higher in society by the british. Even after they left, speaking english is still considered a status symbol and many indians try to seem as literate in english as possible for social reasons and this includes setting personal devices to english instead of one of the indian languages. Another push factor for this was also the missing support for the indian languages in such systems.

If you're not doing something like this, but instead set the users language based on the users location:

Shame on you!

Someone might be using a VPN or simply be on vacation. Location ≠ language! On the same note: don't use only flags as language selects. Did you ever try to use such an input when having color blind mode activated? Try deciding between ireland or france on a bad screen or france and italy with a green-blue problem and this isn't yet talking about indonesia vs. monaco or ramonia vs. chad or luxmbourg vs. netherlands.

Slightly better, but still bad is to listen for the browser settings. Modern browser send an accept-language header based on your OS and browser settings, but that might not be correct. This can be either because the system isn't set up correctly, or maybe because of social reasons like in India, where many set their devices to english while not beeing good at understanding the language.

My opinion on Explicit vs. Implicit URLs

Let's get to why you came here...

To put it in a nutshell: Both options are fine - at least often.

Questions for you and your stakeholder

The following questions are my go tos when deciding on wether or not to use the explicit or implicit URL scheme.

Do I have a target audience with a well known language?

If e.g. you are building a site for a french company that exclusively sells french books in france, your target audience is probably speaking french. You might add some basic i18n to your page, but it's definetly not your first prio. In that case it's probably good to use the implicit URL scheme to simplify URLs.

Do I have a main language?

Maybe your audience doesn't have a main language, but you do. This might be the case if for example you are writing your blog in german, but have some posts that you also translate to english. Your content is clearly german-first and that's your main language. In such a case it might also be a great idea to either use the implicit or implicit (translated) URL scheme.

Might my main language change / disappear in the future?

Disappearing here means that you no longer have a main language and not that you just throw everything in your main language out!

You might have a main language now, but maybe in the future your business wants to go international or you might even want to switch to english as your main language like Uberspace did in 2019. In that case having chosen an implicit style of URLs will be a downside! I would highly recommend going with one of the explicit schemes.

Do I have / want to have full language coverage of everything?

This is a question (speaking from my experience) stakeholders will change their mind on multiple times during one meeting. First they'll say yes, then they'll see the effort of translating and say no, then they'll might see some legal implications and say hard yes and finally they find some diverging paths (typically some days after the meeting) and revert back to no.

Shoutout to Wikipedia here, who managed to have one of the best i18n solutions I've seen to date. They fully implemented the explicit/translated scheme using subdomains.

This is not only a question about wether or not you want to support full translations for main content, but also wether or not you will have diverging content per language. An example for diverging content might be a online shop that sells certain items only in some countries. An example for full coverage (or at least aiming for that) would be wikipedia.

How is my tooling support for each URL scheme?

And finally you should know what tooling you expect to use. Some tools might make one style easier than another.
If you prefer a non-translated scheme, switching between languages is as easy as repalcing just a part of the URL, while with translated URLs you have to know the translation. Explicit URLs als make the problem of repalcement easier than implicit.

In my experience when building with eleventy (11ty) explicit style URLs are the easiest to work with.

So where's my nutshell?

In the end all ways to internationalize your URLs are probably fine and doable, but I would always tend to explicit URLs (bonus points for translated versions), since they are easier to adapt to more usecases in the future.

There's also the argument of humans not being able to parse URLs anyways. I see that argument, but I for one do not completely agree with that and secondly I think we shouldn't use that as an excuse to make them even harder.

And now go out there and build sites with great i18n.