{"id":281,"date":"2014-02-16T21:14:05","date_gmt":"2014-02-16T21:14:05","guid":{"rendered":"https:\/\/rwec.co.uk\/blog\/?p=281"},"modified":"2014-02-19T12:47:41","modified_gmt":"2014-02-19T12:47:41","slug":"securely-importing-and-exporting-csv-with-postgresql","status":"publish","type":"post","link":"https:\/\/rwec.co.uk\/blog\/2014\/02\/securely-importing-and-exporting-csv-with-postgresql\/","title":{"rendered":"Securely Importing and Exporting CSV with PostgreSQL"},"content":{"rendered":"<p>Many moons ago, I posted <a title=\"save (postgres) sql output to csv file - StackOverflow\" href=\"http:\/\/stackoverflow.com\/a\/1517692\/157957\">a surprisingly popular answer on StackOverflow<\/a> regarding how to write to CSV files using PostgreSQL. The answer, in a nutshell, is the <code>COPY<\/code> statement. But there&#8217;s a catch &#8211; PostgreSQL imposes strict security limitations on the use of this statement, which are best dealt with using another feature, the <code>SECURITY DEFINER<\/code> option to <code>CREATE FUNCTION<\/code>. Here, I will attempt to explain in more detail what these security restrictions are, and how to do what you want without simply disabling that security.<\/p>\n<p><!--more--><\/p>\n<h2>CSV made easy: the COPY statement<\/h2>\n<p>There is no standard mechanism in SQL for reading or writing CSV (or similar) files, but most DBMSes provide at least one approach. In the case of PostgreSQL, this comes in the form of the very flexible <code>COPY FROM<\/code> and <code>COPY TO<\/code>. <a href=\"http:\/\/www.postgresql.org\/docs\/current\/interactive\/sql-copy.html\">The relevant manual page<\/a> sums these up pretty neatly:<\/p>\n<blockquote><p><code>COPY<\/code> moves data between PostgreSQL tables and standard file-system files.<br \/>\n<code>COPY TO<\/code> copies the contents of a table <em>to<\/em> a file, while <code>COPY FROM<\/code> copies data <em>from<\/em> a file to a table (appending the data to whatever is in the table already).<br \/>\n<code>COPY TO<\/code> can also copy the results of a <code>SELECT<\/code> query.<\/p><\/blockquote>\n<p>There are, basically, two places a file could be which PostgreSQL is expected to read or write: on the <em>server<\/em> (i.e. where PostgreSQL itself is running) or on the <em>client<\/em> (i.e. where your SQL is originating, such as a webserver running a PHP script).<\/p>\n<p>If your file is on the <em>server<\/em>, then the &#8220;full&#8221; version of <code>COPY<\/code> is available to you: you can write something like <code>COPY (SELECT * FROM public.foo) TO '\/tmp\/foo.csv' WITH (FORMAT CSV);<\/code> or <code>COPY public.foo FROM '\/tmp\/foo.csv' WITH (FORMAT CSV);<\/code>.<\/p>\n<p>But there&#8217;s a snag: you can only use this form if connected to PostgreSQL as the &#8220;superuser&#8221; (usually called &#8220;root&#8221;). Why? Because PostgreSQL has no way of knowing which PostgreSQL &#8220;roles&#8221; should map to which users or permissions on the underlying OS; the process running the DBMS itself may well have access to read and write all sorts of files which you really wouldn&#8217;t want someone to get their hands on. In a worst case scenario, an attacker who got hold of an SQL account with <code>COPY<\/code> privileges might be able to create a back-door to take complete control of your server, and PostgreSQL would rather not be responsible for that!<\/p>\n<h2>Side-stepping the problem: accessing files on the client<\/h2>\n<p>If your file is on the <em>client<\/em>, then the PostgreSQL server can&#8217;t just go onto that remote machine and grab the file &#8211; like uploading a file to a website, you need to provide the data in some way. Since as far as it&#8217;s concerned, you&#8217;re not actually reading a file, just throwing it data, PostgreSQL can apply the same security policy it would if you just wrote a bunch of <code>INSERT<\/code> statements, and doesn&#8217;t need the superuser-only restriction.<\/p>\n<p>This is why there are a couple of special cases in the <code>COPY<\/code> command: <code>COPY FROM STDIN<\/code> and <code>COPY TO STDOUT<\/code>. The downside is that it&#8217;s up to you to push that data in (or catch it coming out); for instance, graphical tools like PgAdmin III or phpPgAdmin generally have an &#8220;Execute query to file&#8221; option, which internally sets up a <code>COPY TO STDOUT<\/code> command and saves the result to disk.<\/p>\n<p>The command-line <code>psql<\/code> interface, meanwhile, has <a href=\"http:\/\/www.postgresql.org\/docs\/current\/interactive\/app-psql.html#APP-PSQL-META-COMMANDS-COPY\">a special <code>\\copy<\/code> command<\/a>, which you can run as though it were an ordinary <code>COPY<\/code> statement, but with the file handled by the client, not the server. This could be built in to a shell script, and not needing to copy data to\/from the server might be an advantage, but building it into a more complex data import\/export system could be tricky.<\/p>\n<h2>How to be root: SECURITY DEFINER functions<\/h2>\n<p>Obviously, if it were really only possible to use <code>COPY<\/code> when connected as superuser, it wouldn&#8217;t be very useful &#8211; you certainly don&#8217;t want all your scripts to be regularly connecting as &#8220;root&#8221; and leaving the door open for attackers to compromise that account. Luckily, there is another feature we can use to our advantage, which is described thus in <a href=\"http:\/\/www.postgresql.org\/docs\/current\/interactive\/sql-createfunction.html\">the manual page for <code>CREATE FUNCTION<\/code><\/a>:<\/p>\n<blockquote><p><code>SECURITY INVOKER<\/code> indicates that the function is to be executed with the privileges of the user that calls it. That is the default.<br \/>\n<code>SECURITY DEFINER<\/code> specifies that the function is to be executed with the privileges of the user that created it.<\/p><\/blockquote>\n<p>This brief explanation belies the power of this option: you can create a function such that, whatever user <em>actually<\/em> runs it, PostgreSQL acts as though it were run as a different user &#8211; specifically, the &#8220;owner&#8221; of the function.<\/p>\n<p>Hopefully it is obvious how this applies to the topic in hand &#8211; define a <code>SECURITY DEFINER<\/code> function while logged in as &#8220;root&#8221;, and it will be able to use the full <code>COPY<\/code> command, even when your application is logged in as a properly locked-down user. At its simplest, it could look something like this:<\/p>\n<pre>CREATE FUNCTION export_foo()\r\n\tRETURNS VOID\r\n\tSECURITY DEFINER\r\n\tLANGUAGE SQL\r\n\tAS $BODY$\r\n\t\tCOPY (SELECT * FROM public.foo) TO '\/tmp\/foo.csv' WITH ( FORMAT CSV );\r\n\t$BODY$;<\/pre>\n<p>Obviously, this is a rather awkward function &#8211; it exists only to add the <code>SECURITY DEFINER<\/code> attribute to one line of some larger piece of application logic. You could, though, expand this with procedural code (using <a href=\"http:\/\/www.postgresql.org\/docs\/current\/interactive\/plpgsql-overview.html\">PL\/pgSQL<\/a>, or <a href=\"http:\/\/www.postgresql.org\/docs\/current\/interactive\/xplang.html\">one of the other pluggable languages<\/a>) to process the whole task &#8211; for instance, define a Temporary Table, fill it from an uploaded CSV, sanity-check the data, and insert it into a permanenent table.<\/p>\n<h2>The dangers of over-generalising<\/h2>\n<p>Rather than expand the function for a specific purpose, you might be tempted to simply generalise it into one which wraps up the basic options of <code>COPY<\/code>.<\/p>\n<p>Here&#8217;s a simple example of how <strong>not<\/strong> to write a general-purpose import function:<\/p>\n<pre>CREATE FUNCTION copy_from_ignoring_all_security(table_name text, file_path text)\r\n        RETURNS VOID\r\n        LANGUAGE plpgsql\r\n        SECURITY DEFINER\r\n        AS $BODY$\r\n\t\tBEGIN\r\n\t\t\t-- This is a really bad idea. Do not copy this function.\r\n\t\t\tEXECUTE '\r\n\t\t\t\tCOPY\r\n\t\t\t\t\t' || quote_ident(table_name) || '\r\n\t\t\t\tFROM\r\n\t\t\t\t\t' || quote_literal(file_path) || '\r\n\t\t\t\tWITH (\r\n\t\t\t\t\tFORMAT CSV, HEADER\r\n\t\t\t\t);\r\n\t\t\t';\r\n\t\t\t-- This is a really bad idea. Do not copy this function.\r\n\t\tEND;\r\n\t$BODY$;<\/pre>\n<p>This is a <strong>really bad idea<\/strong>! The security restriction on <code>COPY<\/code> is there for a good reason, so simply over-riding it as though you know better is clearly not the way to go. Indeed, you could generalise this further, and write a function which took a string of SQL, and executed it as a superuser &#8211; thus giving any user which could execute that function all the rights of a superuser. This is actually quite a common mistake; for instance, some Linux systems configure the <code>sudo<\/code> command so that users can run any command as root at will.<\/p>\n<p><a href=\"http:\/\/xkcd.com\/149\/\"><img loading=\"lazy\" decoding=\"async\" title=\"xkcd's neat parody of 'sudo' in its common role of 'Simon Says'\" alt=\"'Make me a sandwich' 'What? Make it yourself' 'Sudo Make me a sandwich' 'Okay'\" src=\"http:\/\/imgs.xkcd.com\/comics\/sandwich.png\" width=\"360\" height=\"299\" \/><\/a><br \/>\n<cite>Image courtesy of <a href=\"http:\/\/xkcd.com\/\">xkcd (Randall Munroe)<\/a>, used under a <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc\/2.5\/\" rel=\"license\">Creative Commons Attribution-NonCommercial 2.5 License<\/a>. <a href=\"http:\/\/store.xkcd.com\/collections\/apparel\/products\/sudo\">Also available as a T-shirt.<\/a><\/cite><\/p>\n<p>This is no more a correct configuration of <code>sudo<\/code> than our hypothetical PostgreSQL function is a correct use of <code>SECURITY DEFINER<\/code>, since it effectively makes users who can <code>sudo<\/code> into superusers. (It does afford some protection, since the user will generally be prompted for their password, like in Windows UAC; but on a multi-user system, it makes any weak user password into a complete system risk).<\/p>\n<h2>Doing it right: letting root say no<\/h2>\n<p>Rather than &#8220;running code as root&#8221;, perhaps a better way of phrasing what tools such as <code>sudo<\/code> and <code>SECURITY DEFINER<\/code> <em>should<\/em> be used for is &#8220;asking root to run code for you&#8221;. The crucial difference being that it is perfectly acceptable &#8211; indeed, essential &#8211; for root to &#8220;say no&#8221;, rather than blindly carrying out the requested action.<\/p>\n<p>In the case of <code>COPY<\/code>, it is up to your function to constrain what users can do &#8211; the tighter the restrictions the better &#8211; while allowing for all the scenarios your application needs. There are actually two things you need to define:<\/p>\n<ol>\n<li>Which <strong>files<\/strong> should the user be allowed to read\/write on disk? This might be a particular directory, for instance, and the filename might have to have a suitable prefix or extension.<\/li>\n<li>Which <strong>tables<\/strong> should the user be able to read\/write in the database? This would normally be defined by <code>GRANT<\/code>s in the database, but the function is now running as &#8220;root&#8221;, so tables which would normally be &#8220;out of bounds&#8221; will be fully accessible. You probably don&#8217;t want to let someone invoke your function and add rows on the end of your &#8220;users&#8221; table&#8230;<\/li>\n<\/ol>\n<p>You might want to just hard-code as much as possible, but remember to guard against awkward things like <code>\/..\/<\/code> and the dreaded null byte. If in doubt, a tight whitelist always trumps an incomplete blacklist. When you find something that doesn&#8217;t meet your requirements, you want to abort, and make sure your application knows you&#8217;ve aborted, so you will probably want to make use of <a href=\"http:\/\/www.postgresql.org\/docs\/current\/static\/plpgsql-errors-and-messages.html\"><code>RAISE EXCEPTION<\/code><\/a> or something similar.<\/p>\n<p>Finally, a couple of extra things to be careful of:<\/p>\n<ul>\n<li>You&#8217;ll be building dynamic SQL in your function, so need to make sure you&#8217;re guarded against SQL Injection. Make appropriate use of <a href=\"http:\/\/www.postgresql.org\/docs\/current\/interactive\/functions-string.html\">the <code>quote_ident<\/code> and <code>quote_literal<\/code> functions<\/a>, which will do appropriate double- and single-quoting for you.<\/li>\n<li>Think about which connecting users need to be able to import files <em>at all<\/em>, and <code>GRANT<\/code> only those users the <code>EXECUTE<\/code> permission on your function. That will probably take the form of a &#8220;group role&#8221; which can then be <code>GRANT<\/code>ed to actual &#8220;login roles&#8221;, but that&#8217;s a whole nother topic.<\/li>\n<\/ul>\n<h2>Putting it into practice: an annotated example<\/h2>\n<p>OK, this is the <strong>tl;dr<\/strong> bit of the post, I guess: I give you some code that you can take away and customise. <strong>I take absolutely no responsibility for the security of this function, or your modified version of it.<\/strong> I&#8217;ve explained the principles involved, and welcome feedback of anything I might have missed. <tt>:)<\/tt><\/p>\n<h3>Export from database to file<\/h3>\n<pre>CREATE FUNCTION export_temp_table_to_csv_file(table_name text, file_name text)\r\n        RETURNS VOID\r\n        LANGUAGE plpgsql\r\n        -- The magic ingredient: Anyone who can execute this can do so with superuser privileges,\r\n        --\tas long as the function was created while logged in as a superuser.\r\n        SECURITY DEFINER\r\n        AS $BODY$\r\n\t\tDECLARE\r\n\t\t\t-- These must be as restrictive as possible, for security reasons\r\n\r\n\t\t\t-- Hard-coded directory to which all CSV files will be exported\r\n\t\t\tfile_path text := '\/var\/my_application\/csv_output\/';\r\n\r\n\t\t\t-- File names must contain only alphanumerics, dashes and underscores,\r\n\t\t\t--\tand all must end in the extension .csv\r\n\t\t\tfile_name_regex text := E'^[a-zA-Z0-9_-]+\\\\.csv$';\r\n\r\n\t\t\t-- Only allow exports of tables whose names begin 'temp_export_',\r\n\t\t\t--\tindicating that they are Temporary Tables created for the purpose\r\n\t\t\ttable_name_regex text := '^temp_export_[a-z_]+$';\r\n\t\tBEGIN\r\n\t\t\t-- Sanity check input\r\n\t\t\tIF\r\n\t\t\t\ttable_name !~ table_name_regex\r\n\t\t\tTHEN\r\n\t\t\t\tRAISE EXCEPTION 'Invalid temp table name (% doesn''t match %)', table_name, table_name_regex;\r\n\t\t\tEND IF;\r\n\r\n\t\t\tIF\r\n\t\t\t\tfile_name !~ file_name_regex\r\n\t\t\tTHEN\r\n\t\t\t\tRAISE EXCEPTION 'Invalid data file name (% doesn''t match %)', file_name, file_name_regex;\r\n\t\t\tEND IF;\r\n\r\n\t\t\t-- OK? Go!\r\n\t\t\t-- Make sure there's zero chance of SQL injection here\r\n\t\t\tEXECUTE '\r\n\t\t\t\tCOPY\r\n\t\t\t\t\t' || quote_ident(table_name) || '\r\n\t\t\t\tTO\r\n\t\t\t\t\t' || quote_literal(file_path || file_name) || '\r\n\t\t\t\tWITH (\r\n\t\t\t\t\tFORMAT CSV, HEADER\r\n\t\t\t\t);\r\n\t\t\t';\r\n\t\tEND;\r\n\t$BODY$;\r\n\r\n\t-- Don't let just anyone do this privileged thing\r\n\tREVOKE ALL ON FUNCTION export_temp_table_to_csv_file( table_name text, file_name text )\r\n\t\tFROM PUBLIC;\r\n\tGRANT EXECUTE ON FUNCTION export_temp_table_to_csv_file( table_name text, file_name text )\r\n\t\tTO group_csv_exporters;<\/pre>\n<p>Now, your application will need to:<\/p>\n<ol>\n<li>open a normal, non-superuser, connection to the database<\/li>\n<li>create an appropriately named temp table, with the relevant column names and types<\/li>\n<li>insert whatever data it needs to export into the temp table<\/li>\n<li>run the function, e.g. <code>SELECT export_temp_table_to_csv_file('temp_export_foo', 'foo.csv');<\/code><\/li>\n<li>do whatever needs to be done with the file this creates on the DB server<\/li>\n<\/ol>\n<h3>Import from file to database<\/h3>\n<pre>CREATE FUNCTION import_csv_file_to_temp_table(table_name text, file_name text)\r\n        RETURNS VOID\r\n        LANGUAGE plpgsql\r\n        -- The magic ingredient: Anyone who can execute this can do so with superuser privileges,\r\n        --\tas long as the function was created while logged in as a superuser.\r\n        SECURITY DEFINER\r\n        AS $BODY$\r\n\t\tDECLARE\r\n\t\t\t-- These must be as restrictive as possible, for security reasons\r\n\r\n\t\t\t-- Hard-coded directory in which all CSV files to import will be placed\r\n\t\t\tfile_path text := '\/var\/my_application\/csv_input\/';\r\n\r\n\t\t\t-- File names must contain only alphanumerics, dashes and underscores,\r\n\t\t\t--\tand all must end in the extension .csv\r\n\t\t\tfile_name_regex text := E'^[a-zA-Z0-9_-]+\\\\.csv$';\r\n\r\n\t\t\t-- Only allow imports to tables whose names begin 'temp_csv_',\r\n\t\t\t--\tindicating that they are Temporary Tables created for the purpose\r\n\t\t\ttable_name_regex text := '^temp_import_[a-z_]+$';\r\n\t\tBEGIN\r\n\t\t\t-- Sanity check input\r\n\t\t\tIF\r\n\t\t\t\ttable_name !~ table_name_regex\r\n\t\t\tTHEN\r\n\t\t\t\tRAISE EXCEPTION 'Invalid temp table name (% doesn''t match %)', table_name, table_name_regex;\r\n\t\t\tEND IF;\r\n\r\n\t\t\tIF\r\n\t\t\t\tfile_name !~ file_name_regex\r\n\t\t\tTHEN\r\n\t\t\t\tRAISE EXCEPTION 'Invalid data file name (% doesn''t match %)', file_name, file_name_regex;\r\n\t\t\tEND IF;\r\n\r\n\t\t\t-- OK? Go!\r\n\t\t\t-- Make sure there's zero chance of SQL injection here\r\n\t\t\tEXECUTE '\r\n\t\t\t\tCOPY\r\n\t\t\t\t\t' || quote_ident(table_name) || '\r\n\t\t\t\tFROM\r\n\t\t\t\t\t' || quote_literal(file_path || file_name) || '\r\n\t\t\t\tWITH (\r\n\t\t\t\t\tFORMAT CSV, HEADER\r\n\t\t\t\t);\r\n\t\t\t';\r\n\t\tEND;\r\n\t$BODY$;\r\n\r\n\t-- Don't let just anyone do this privileged thing\r\n\tREVOKE ALL ON FUNCTION import_csv_file_to_temp_table( table_name text, file_name text )\r\n\t\tFROM PUBLIC;\r\n\tGRANT EXECUTE ON FUNCTION import_csv_file_to_temp_table( table_name text, file_name text )\r\n\t\tTO group_csv_importers;<\/pre>\n<p>Now, your application will need to:<\/p>\n<ol>\n<li>prepare a CSV file in the appropriate directory on the database server<\/li>\n<li>open a normal, non-superuser, connection<\/li>\n<li>create an appropriately named temp table, with the relevant column names and types<\/li>\n<li>run the function, e.g. <code>SELECT import_csv_file_to_temp_table('temp_import_foo', 'foo.csv');<\/code><\/li>\n<li>do whatever it needs to do with the data in the temp table<\/li>\n<\/ol>\n<h2>So. Any questions? <tt>;)<\/tt><\/h2>\n","protected":false},"excerpt":{"rendered":"<p>Many moons ago, I posted a surprisingly popular answer on StackOverflow regarding how to write to CSV files using PostgreSQL. The answer, in a nutshell, is the COPY statement. But there&#8217;s a catch &#8211; PostgreSQL imposes strict security limitations on the use of this statement, which are best dealt with using another feature, the SECURITY [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-281","post","type-post","status-publish","format-standard","hentry","category-uncategorized","post-preview"],"_links":{"self":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts\/281","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=281"}],"version-history":[{"count":8,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts\/281\/revisions"}],"predecessor-version":[{"id":289,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts\/281\/revisions\/289"}],"wp:attachment":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=281"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/categories?post=281"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/tags?post=281"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}