Internationalisation as a Postgres Data Type

Rationale

I've seen various approaches to storing translated data within a database, all treating it as a problem of schema design. However, I believe it can be approached as a data-type problem - what I call "atomic i18n". This "library" is my attempt at a proof-of-concept / prototype implementation of that idea, designed to be run in a vanilla install of PostgreSQL.

For full background, see my somewhat long-winded blog posts: Part 1: Database i18n as a Data Type Problem and Part 2: Translated Text as a Data Type in PostgreSQL.

Download

You can download the latest version of the "library" in one convenient ZIP file: i18n.zip.

The files included (to be run in this order) are:

env-pgsql.sql implementation of environment variables in Pl/PgSQL (see env documentation)
i18n-setup.sql contains the basic initialisation or the schema, type, and table
i18n-functions.sql contains both the "private" and "public" functions for manipulating the type
i18n-operators.sql contains operators for easier use of the main functions
i18n-example.sql contains some example code demonstrating the different functions, using the language table itself

Implementation

For full details of the implementation, please look at the source code, but the basic elements are:

the translated_text type is implemented as a simple Postgres array
translations are stored in the array based on a generated "language ID", since the indexes must be integers
for practical use, translations are referenced by "language codes", such as 2-letter codes from ISO 639-1, or the IETF language tag (e.g. "en-GB", "en-US") widely used online
abstraction functions and operators map internally from IDs to codes, so that operations can be performed safely and without knowledge of (or reliance on) the underlying architecture

Functions

Creation

translated_text i18n.translation_pair(language_code text, translation text)

language_code ==> translation

Create the most basic type of translated_text value - a single language, and the translation for that language.

The ==> operator is deliberately similar to the => operator used by hstore, but cannot be identical, as it would create ambiguity since the source data-types are the same.

translated_text i18n.list_to_translation(list_of_pairs text[][])

Create a whole translated_text value from a multi-dimensional array; useful for specifying literals in queries, or generating values from application code. Each element of the array should be a 2-dimensional array containing a language code and the appropriate translation. (Since Postgres arrays are inherently multi-dimensional, this is not enforceable in the function signature).

e.g. Select list_to_translation( ARRAY[['en','Hello'],['fr','Bonjour']] );

Retrieval

text i18n.translate(translated_item translated_text, language_code text)

translated_item -> language_code

The most obvious operation on a translated_text value: request the translation for a particular language, using the appropriate fall-back language(s) if necessary.

The -> operator is used by the hstore associative array type with a similar meaning.

void i18n.set_lang(language_code text)

Sets an "environment variable" using env.set (see env documentation) to represent the default language to use for all queries in the current "session".

text i18n.translate(translated_item translated_text)

translated_item ->!

Translate the value into the language set with i18n.set_lang, saving you from mentionning the required language multiple times when writing a query.

The ->! operator means "just translate it!" ;)

text[][] i18n.translation_to_list(translated_item translated_text)

The reverse of i18n.list_to_translation - convert the translated_text item into a multi-dimensional array. Use this for retrieving all the current translations of a particular item in an implementation-independent way.

Note that if you retrieve data in this format for use in an application (e.g. using PHP), you will probably find that the Postgres library will give you a string representation, rather than a native array value, because the low-level Postgres libraries return most complex types via string representations.

Modification

Note that these are all functions, taking one value and returning a new one; they are not modifying anything in place. Thus to update a column in a table, you would use something like Update ... Set name=i18n.set_translation(name, 'en', 'English') ...

translated_text i18n.set_translation(translated_item translated_text, language_code text, translation text)

Add or replace the translation for a particular language within a value.

translated_text i18n.delete(translated_item translated_text, language_code text)

translated_item - language_code

Remove the translation for a particular language from a value. Requesting that language will still return a value if a fall-back language is available.

Also available as the - (subtraction) operator.

translated_text i18n.combine(translated_item_1 translated_text, translated_item_2 translated_text)

translated_item_1 || translated_item_2

Add all translations from translated_item_2 to translated_item_1, over-writing as necessary.

The || operator is the SQL-standard operator for concatenation, and is used for various "concatenate" or "combine" operations with different types in Postgres, such as adding an element to an array, or combining two arrays.

This is particularly useful when paired with the i18n.translation_pair function or ==> operator: e.g. ... name = name || ('fr'==>'Anglais') || ('de'==>'Englisch') ...

Licensing and Contact Info

Let me know what you think of it, and what you're doing with it… Mail me on pg-atom [[AAHTT]] rwec.co.uk