Internationalisation as a Postgres Data Type
Rationale
I've seen various approaches to storing translated data within a database, all treating it as a problem of schema design. However, I believe it can be approached as a data-type problem - what I call "atomic i18n". This "library" is my attempt at a proof-of-concept / prototype implementation of that idea, designed to be run in a vanilla install of PostgreSQL.
For full background, see my somewhat long-winded blog posts: Part 1: Database i18n as a Data Type Problem and Part 2: Translated Text as a Data Type in PostgreSQL.
Download
You can download the latest version of the "library" in one convenient ZIP file: i18n.zip.
The files included (to be run in this order) are:
- env-pgsql.sql implementation of environment variables in Pl/PgSQL (see env documentation)
- i18n-setup.sql contains the basic initialisation or the schema, type, and table
- i18n-functions.sql contains both the "private" and "public" functions for manipulating the type
- i18n-operators.sql contains operators for easier use of the main functions
- i18n-example.sql contains some example code demonstrating the different functions, using the language table itself
Implementation
For full details of the implementation, please look at the source code, but the basic elements are:
- the translated_text type is implemented as a simple Postgres array
- translations are stored in the array based on a generated "language ID", since the indexes must be integers
- for practical use, translations are referenced by "language codes", such as 2-letter codes from ISO 639-1, or the IETF language tag (e.g. "en-GB", "en-US") widely used online
- abstraction functions and operators map internally from IDs to codes, so that operations can be performed safely and without knowledge of (or reliance on) the underlying architecture
Functions
Creation
translated_text i18n.translation_pair(language_code text, translation text)
language_code ==> translation
-
Create the most basic type of translated_text value - a single language, and the translation for that language.
The
==>
operator is deliberately similar to the=>
operator used by hstore, but cannot be identical, as it would create ambiguity since the source data-types are the same. translated_text i18n.list_to_translation(list_of_pairs text[][])
-
Create a whole translated_text value from a multi-dimensional array; useful for specifying literals in queries, or generating values from application code. Each element of the array should be a 2-dimensional array containing a language code and the appropriate translation. (Since Postgres arrays are inherently multi-dimensional, this is not enforceable in the function signature).
e.g.
Select list_to_translation( ARRAY[['en','Hello'],['fr','Bonjour']] );
Retrieval
text i18n.translate(translated_item translated_text, language_code text)
translated_item -> language_code
-
The most obvious operation on a translated_text value: request the translation for a particular language, using the appropriate fall-back language(s) if necessary.
The
->
operator is used by the hstore associative array type with a similar meaning. void i18n.set_lang(language_code text)
-
Sets an "environment variable" using
env.set
(see env documentation) to represent the default language to use for all queries in the current "session". text i18n.translate(translated_item translated_text)
translated_item ->!
-
Translate the value into the language set with
i18n.set_lang
, saving you from mentionning the required language multiple times when writing a query.The
->!
operator means "just translate it!" ;) text[][] i18n.translation_to_list(translated_item translated_text)
-
The reverse of
i18n.list_to_translation
- convert the translated_text item into a multi-dimensional array. Use this for retrieving all the current translations of a particular item in an implementation-independent way.Note that if you retrieve data in this format for use in an application (e.g. using PHP), you will probably find that the Postgres library will give you a string representation, rather than a native array value, because the low-level Postgres libraries return most complex types via string representations.
Modification
Note that these are all functions, taking one value and returning a new one; they are not modifying anything in place.
Thus to update a column in a table, you would use something like Update ... Set name=i18n.set_translation(name, 'en', 'English') ...
translated_text i18n.set_translation(translated_item translated_text, language_code text, translation text)
-
Add or replace the translation for a particular language within a value.
translated_text i18n.delete(translated_item translated_text, language_code text)
translated_item - language_code
-
Remove the translation for a particular language from a value. Requesting that language will still return a value if a fall-back language is available.
Also available as the
-
(subtraction) operator. translated_text i18n.combine(translated_item_1 translated_text, translated_item_2 translated_text)
translated_item_1 || translated_item_2
-
Add all translations from translated_item_2 to translated_item_1, over-writing as necessary.
The
||
operator is the SQL-standard operator for concatenation, and is used for various "concatenate" or "combine" operations with different types in Postgres, such as adding an element to an array, or combining two arrays.This is particularly useful when paired with the
i18n.translation_pair
function or==>
operator: e.g.... name = name || ('fr'==>'Anglais') || ('de'==>'Englisch') ...
Licensing and Contact Info
© Copyright Rowan Collins, 2009, but hereby licensed for use by whomever, for whatever purpose, with no limitation, and accepting no liability, expressed or otherwise.
Let me know what you think of it, and what you're doing with it… Mail me on pg-atom [[AAHTT]] rwec.co.uk