A user-defined function (UDF) is a named function defined at the database level that can be called in queries and other contexts. CockroachDB supports invoking UDFs in SELECT, FROM, and WHERE clauses of DML statements.
Both stored procedures and user-defined functions are types of routines. However, they differ in the following ways:
- Functions return a value, and procedures do not return a value.
- Procedures must be invoked using a
CALLstatement. Functions can be invoked in nearly any context, such asSELECT,FROM, andWHEREclauses,DEFAULTexpressions, and computed column expressions. - Functions have volatility settings, and procedures do not.
Overview
The basic components of a user-defined function are a name, list of arguments, return type, volatility, language, and function body.
- An argument has a mode and a type.
- CockroachDB supports the
IN(default),OUT, andINOUTargument modes. For an example, see Create a function that usesOUTandINOUTparameters. - The type can be a built-in type, user-defined
ENUMor composite type, or implicit record type. A type can have aDEFAULTvalue.
- CockroachDB supports the
- The return type can be a built-in SQL type, user-defined
ENUMor composite type,RECORD,TABLE, PL/pgSQLREFCURSORtype, implicit record type,TRIGGER, orVOID.- Preceding a type with
SETOFindicates that a set, or multiple rows, may be returned. For an example, see Create a function that returns a set of results. VOIDindicates that there is no return type andNULLwill always be returned.
- Preceding a type with
- The volatility indicates whether the function has side effects.
VOLATILEandNOT LEAKPROOFare the default.- Annotate a function with side effects with
VOLATILE. This also prevents the cost-based optimizer from pre-evaluating the function. - A
STABLEorIMMUTABLEfunction does not mutate data. You cannot create aSTABLEorIMMUTABLEfunction that executes a mutation (INSERT,UPSERT,UPDATE,DELETE) statement. LEAKPROOFindicates that a function has no side effects and that it communicates nothing that depends on its arguments besides the return value (i.e., it cannot throw an error that depends on the value of its arguments). You must precedeLEAKPROOFwithIMMUTABLE, and onlyIMMUTABLEcan be set toLEAKPROOF.NOT LEAKPROOFis allowed with any other volatility.- Non-
VOLATILEfunctions can be optimized through inlining. For more information, see Create an inlined UDF.
- Annotate a function with side effects with
LANGUAGEspecifies the language of the function body. CockroachDB supports the languagesSQLandPLpgSQL(PL/pgSQL).- The function body:
- Can reference arguments by name or by their ordinal in the function definition with the syntax
$1. - Can be enclosed in a single line with single quotes
''or multiple lines with$$. - Can reference tables.
- Can reference only the
SELECTstatement.
- Can reference arguments by name or by their ordinal in the function definition with the syntax
Examples
Create a UDF
The following is a UDF that returns the sum of two integers:
CREATE FUNCTION add(a INT, b INT) RETURNS INT IMMUTABLE LEAKPROOF LANGUAGE SQL AS 'SELECT a + b';
Where:
- name:
add - arguments:
aof typeINT,bof typeINT - return type:
INT - volatility:
IMMUTABLE LEAKPROOF - language:
SQL - function body:
'SELECT a + b'
Alternatively, you could define this function as:
CREATE FUNCTION add(a INT, b INT) RETURNS INT IMMUTABLE LEAKPROOF LANGUAGE SQL AS 'SELECT $1 + $2';
Or as:
CREATE FUNCTION add(a INT, b INT) RETURNS INT LANGUAGE SQL AS $$
SELECT a + b;
$$;
For more examples of UDF creation, see CREATE FUNCTION.
View a UDF definition
To view the definition for the add() function:
SHOW CREATE FUNCTION add;
If you do not specify a schema for the function add when you create it, the default schema is public:
function_name | create_statement
----------------+---------------------------------------------------
add | CREATE FUNCTION public.add(IN a INT8, IN b INT8)
| RETURNS INT8
| IMMUTABLE
| LEAKPROOF
| CALLED ON NULL INPUT
| LANGUAGE SQL
| AS $$
| SELECT a + b;
| $$
(1 row)
Invoke a UDF
You invoke a UDF like a built-in function.
To invoke the add() function:
SELECT add(3,5) as sum;
sum
-------
8
(1 row)
Create a UDF using PL/pgSQL
The following user-defined function returns the nth integer in the Fibonacci sequence.
It uses the PL/pgSQL LOOP syntax to iterate through a simple calculation, and RAISE EXCEPTION to return an error message if the specified n is negative.
CREATE FUNCTION fib(n int) RETURNS INT AS $$
DECLARE
tmp INT;
a INT := 0;
b INT := 1;
i INT := 2;
BEGIN
IF n < 0 THEN
RAISE EXCEPTION 'n must be non-negative';
END IF;
IF n = 0 THEN RETURN 0; END IF;
IF n = 1 THEN RETURN 1; END IF;
LOOP
IF i > n THEN EXIT; END IF;
tmp := a + b;
a := b;
b := tmp;
i := i + 1;
END LOOP;
RETURN b;
END
$$ LANGUAGE PLpgSQL;
SELECT fib(8);
fib
-------
21
Create an inlined UDF
When possible, the cost-based optimizer will improve a function's performance by inlining the UDF within the query plan. The UDF must have the following attributes:
- It is labeled as
IMMUTABLE,STABLE, orLEAKPROOF(i.e., non-VOLATILE). - It has a single statement.
- It is not a set-returning function.
- Its arguments are only variable or constant expressions.
- It is not a record-returning function.
The following example demonstrates how inlining improves a UDF's performance.
Create tables
aandb:CREATE TABLE a ( a INT ); CREATE TABLE b ( b INT PRIMARY KEY );Insert a value (
10) into 1000 rows inaand 1 row inb:INSERT INTO a SELECT 10 FROM generate_series(1, 1000); INSERT INTO b VALUES (10);Create a
VOLATILEfunctionfoo_v()and aSTABLEfunctionfoo_s():CREATE FUNCTION foo_v(x INT) RETURNS INT VOLATILE LANGUAGE SQL AS $$ SELECT b FROM b WHERE b = x $$; CREATE FUNCTION foo_s(x INT) RETURNS INT STABLE LANGUAGE SQL AS $$ SELECT b FROM b WHERE b = x $$;Each function returns a specified value from table
b.View the query plan when
foo_v()(theVOLATILEfunction) is used in a selection query to retrieve equal values from tablea:EXPLAIN ANALYZE SELECT foo_v(a) FROM a WHERE a = 10;info -------------------------------------------------------------------------------------------- planning time: 2ms execution time: 77ms distribution: local vectorized: true rows read from KV: 1,000 (39 KiB, 1 gRPC calls) cumulative time spent in KV: 330µs maximum memory usage: 80 KiB network usage: 0 B (0 messages) sql cpu time: 75ms estimated RUs consumed: 0 • render │ └── • filter │ nodes: n1 │ actual row count: 1,000 │ sql cpu time: 75ms │ estimated row count: 1,000 │ filter: a = 10 │ └── • scan nodes: n1 actual row count: 1,000 KV time: 330µs KV contention time: 0µs KV rows read: 1,000 KV bytes read: 39 KiB KV gRPC calls: 1 estimated max memory allocated: 60 KiB sql cpu time: 87µs estimated row count: 1,000 (100% of the table; stats collected 19 seconds ago) table: a@a_pkey spans: FULL SCAN (33 rows)The query takes
77msto execute because the function is invoked for each row scanned in tablea.View the query plan when using
foo_s()(theSTABLEfunction) instead:EXPLAIN ANALYZE SELECT foo_s(a) FROM a WHERE a = 10;info ------------------------------------------------------------------------------------------------ planning time: 5ms execution time: 4ms distribution: local vectorized: true rows read from KV: 1,001 (39 KiB, 2 gRPC calls) cumulative time spent in KV: 832µs maximum memory usage: 420 KiB network usage: 0 B (0 messages) sql cpu time: 3ms estimated RUs consumed: 0 • render │ └── • merge join (left outer) │ nodes: n1 │ actual row count: 1,000 │ estimated max memory allocated: 340 KiB │ estimated max sql temp disk usage: 0 B │ sql cpu time: 3ms │ estimated row count: 1,000 │ equality: (a) = (b) │ right cols are key │ ├── • filter │ │ nodes: n1 │ │ actual row count: 1,000 │ │ sql cpu time: 5µs │ │ estimated row count: 1,000 │ │ filter: a = 10 │ │ │ └── • scan │ nodes: n1 │ actual row count: 1,000 │ KV time: 722µs │ KV contention time: 0µs │ KV rows read: 1,000 │ KV bytes read: 39 KiB │ KV gRPC calls: 1 │ estimated max memory allocated: 60 KiB │ sql cpu time: 202µs │ estimated row count: 1,000 (100% of the table; stats collected 42 seconds ago) │ table: a@a_pkey │ spans: FULL SCAN │ └── • scan nodes: n1 actual row count: 1 KV time: 110µs KV contention time: 0µs KV rows read: 1 KV bytes read: 30 B KV gRPC calls: 1 estimated max memory allocated: 20 KiB sql cpu time: 11µs estimated row count: 1 (100% of the table; stats collected 42 seconds ago) table: b@b_pkey spans: FULL SCAN (57 rows)The query takes only
4msto execute because the function is inlined and transformed to a join with an equality comparison(a) = (b), which has much less overhead than invoking a function for each row scanned in tablea.
Video Demo
For a deep-dive demo on UDFs, watch the following video:
Known limitations
User-defined functions have the following limitations:
- A
RECORD-returning UDF cannot be created without aRETURNstatement in the root block, which would restrict the wildcard type to a concrete one. #122945 - User-defined functions are not currently supported in:
- User-defined functions cannot call themselves recursively. #93049
- The
setvalfunction cannot be resolved when used inside UDF bodies. #110860 Casting subqueries to user-defined types in UDFs is not supported. #108184
Routines cannot be invoked with named arguments, e.g.,
SELECT foo(a => 1, b => 2);orSELECT foo(b := 1, a := 2);. #122264Routines cannot be created if they reference temporary tables. #121375
Routines cannot be created with unnamed
INOUTparameters. For example,CREATE PROCEDURE p(INOUT INT) AS $$ BEGIN NULL; END; $$ LANGUAGE PLpgSQL;. #121251Routines cannot be created if they return fewer columns than declared. For example,
CREATE FUNCTION f(OUT sum INT, INOUT a INT, INOUT b INT) LANGUAGE SQL AS $$ SELECT (a + b, b); $$;. #121247Routines cannot be created with an
OUTparameter of typeRECORD. #123448DDL statements (e.g.,
CREATE TABLE,CREATE INDEX) are not allowed within UDFs or stored procedures. #110080Polymorphic types cannot be cast to other types (e.g.,
TEXT) within routine parameters. #123536Routine parameters and return types cannot be declared using the
ANYENUMpolymorphic type, which is able to match anyENUMtype. 123048
Also refer to the PL/pgSQL known limitations.