Search

OracleDollars

Analytical Functions in SQL


Analytical Functions:
Introduced in Oracle 8i and are used for various purposes like calculating a running total, computing a percentage within a group, compute a moving average and many more in SELECT statements. Analytical functions add extensions to SQL language and not only make it easier to code but also makes the query execute faster.

How Analytic Functions Work?
It works on a group of records and it actually computes the aggregate value on a group of rows and returns the result. The main difference between Aggregate functions and Analytical functions is that, analytical functions return multiple rows for each group.

Important:
  • Analytical functions can only be used in a SELECT clause and ORDER BY Clause.
  • Difference between Analytical Function and Aggregate Function is that Aggregate functions return one row from a group of data’s whereas Analytical functions return multiple rows in a group of data.

Syntax:

Analytic-Function(Argument,Argument,...)
OVER (
Query-Partition-Clause
Order-By-Clause
Windowing-Clause
)

Some of the Analytical functions are as below:

AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE.

Some examples of Analytical functions in SQL Queries

a) Calculating a Running total:

Query:

SELECT ename, deptno, sal, SUM (sal) OVER (ORDER BY sal) "Running Total"
FROM emp

Output:


Note: One thing to note here is that, whenever ‘SAL’ is same for the same dept, the running total displayed here is a sum of both the salaries.

b) Retrieving the first and Last values in a group:

Query:
SELECT ename, sal, deptno, first_value(ename) over(Partition by deptno) FIRST, Last_value(ename) over(Partition by deptno) Last FROM emp

Output:



The above query gives the first and last values in a group.


c) Calculating a running average:

Query:

SELECT deptno, sal, AVG (sal) OVER (ORDER BY sal DESC) "Avg Sal"
FROM emp

Output:




d) TOP ‘n’ Analysis:

Query:

SELECT * FROM (SELECT ename, sal, ROW_NUMBER () OVER (ORDER BY sal DESC) RANK FROM emp) WHERE RANK <= 3

Output:



e) Accessing rows around a current row:

Query:

SELECT deptno, ename, sal, LAG (sal, 1, 0) OVER (ORDER BY hiredate, ename) "LAG",
LEAD (sal, 1, 0) OVER (ORDER BY hiredate, ename) "LEAD"
FROM emp

Output:



This query gives the LAG and LEAD values of an organization. LAG means the value previous to the current value and LEAD means the value next to the current value.
The syntax can be explained as below:


LAG(Column_name, Offset, Default_value)

LEAD(Column_name, Offset, Default_value)

Where:
Column_name is the name of the column
Offset defines on how many rows need to be skipped
Default value places a user defined value when there are now values in the offset range.

Windowing Clauses:
The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. The default window is an anchored window that simply starts at the first row of a group and continues to the current row.

The various keywords used here are:
ROWS n PRECEDING, RANGE n PRECEDING

Note: RANGE option can only be used for number and Date datatypes.

Example:

1) RANGE window clause:

Query:

SELECT empno, sal, COUNT (*) OVER (ORDER BY sal ASC RANGE 100 PRECEDING) "counter"
FROM emp
ORDER BY empno

Output:



Explanation:
To understand the above query, we need to first look at the clause ‘RANGE 100 PRECEDING’, This creates a range from the current row for sal which is in the range of current value to (current value-100) and counts the salaries which falls within this range.
If we take the first row(1600) the count for this is 2. This is because there exists a value 1500, which is exactly at the start of the range and the number itself (1600), hence the count 2. Similarly, 1250 has a count 2 because of the existence of another value 1250 and 2975 has a count of only 1 because the value prior to this is 2850 which is more than 100 points below it and doesn’t fall in the range created. The other counts and the values follow the same criterion.


2) ROW window Clauses:

Query:

SELECT deptno "Deptno", ename "Ename", sal "Sal",
SUM (sal) OVER (ORDER BY ename ROWS 2 PRECEDING) "Sliding Total"
FROM emp

Output:



Explanation:
This is a row type windowing clause and the important thing to note here is the clause ‘ROWS 2 PRECEDING’. What this implies here is that, the total which is calculated here is based on the previous two rows of the current columns.
For instance, take the first row, the ‘Sliding total’ is 1100, same as the value of ‘Sal’ and the second row value for ‘Sliding total’ stands as the sum of the first and the second row. This is because, the second row does not have two rows preceding it, hence the second value is taken from the immediate column. If we take the case of the third row, the ‘Sliding total’ value is the sum of the third row, first and the second row, that is (2850+1100+1600=5550) and lets take the fourth row, the value of ‘Sliding total’ here is the sum of the immediate two rows preceding it and that is (2450+2850+1600=6900) and so on.

Hence, this concludes a small introduction of Analytical Functions in SQL. Feel free to post any comments or queries on the above topic.


No comments: