Search This Blog

2024-05-21

Java SQL Programming - Tuning SELECT Statement Performance, JDBC Performance Tuning

Product: Java Programming
Version: 0.1 - current
OS: Any operating system, Windows, Linux, UNIX, AIX, Solaris, HP-UX
Related Product: Any programming language that uses SELECT in the world
Java API, Java class, Java method: java.sql.Statement, java.sql.PreparedStatement
Java method: java.sql.PreparedStatement.setFetchSize, java.sql.Statement.setFetchSize

Overview

Any programming that can work with databases will offer an API to write SELECT statement.  In term of Java, they are following Java classes:

  • java.sql.Statement
  • java.sql.PreparedStatement
Certainly there are more way to run a SELECT statement without using above, but this post will focus in these 2, as they are indirectly called by any SELECT statement execution by any programming languages in the world.  This has been since the 1980 when a database programming was introduces, and many programmer today has no clue how this begin, especially why this is required.

This post is going to explain how to use Java programming language to tune a SELECT statement for a very specific use case:
  • SELECT statement is returning a lots of data, such as 1000 rows and above
The tuning will be to tune either of following Java JDBC class:
  • java.sql.Statement
  • java.sql.PreparedStatement
  • java.sql.ResultSet

Architecture

In order to explains the tuning of these 2 JDBC class, first I need to explain how SELECT statement work in low level.  I will use programmer's perspective to explain, and skip many low level database execution, and pick Oracle DB as an example.  This is mainly I worked more in Oracle, and Oracle KB and blogs are sharing more of its internal architecture than MS SQL Server, for example.

A SELECT statement execution goes through following process:

  1. Crafts a SELECT statement
    1. In Java programming, this often be stored as a String variable, or many SELECT statements in many String variables
  2. Creates a String variable to store above SELECT statement
  3. Create a new variable of JDBC class java.sql.PreparedStatement by using above String (or SELECT statement) to that it becomes an object that is parsed (Oracle's keyword), and in programmer's terminology, it means compiled by DB
    1. PreparedStatement or DB SQL parsing is always required to run any SQL, including SELECT, INSERT, DELETE, CREATE, DROP, TRUNCATE, function call, store procedure call.  Let's call these DDL (data definition) and DML (data manipulation)
    2. PreparedStatement will indirectly getting call by any JDBC if the program doesn't defined/specify/create this object when executing any SQL
    3. There is no way to bypass SQL parsing (Java PrepareStatement) when executing any DDL and DML
    4. The object in Java takes up very little memory (Java called heap memory)
      1. In Java documentation, you often read this will allocate some memory, but the document was written when max desktop RAM was 512MB
    5. The object in Oracle takes up very little memory
      1. In Oracle documentation, you often read this will allocate some memory, but the document was written when max desktop RAM was 2GB
    6. The variable will auto set its row fetch size of 10 rows, i.e. PreparedStatement.setFetchSize(10).  It will download 10 rows on every ResultSet..next() method. You can call PreparedStatement.getFetchSize() to display its row fetch size
    7. This PreparedStatement variable is ready to run by calling  PreparedStatement.executeQuery()
    8. The FetchSize doesn't take up memory in object PreparedStatement
  4. Creates a commonly used variable for JDBC class of java.sql.Statement that either
    1. Uses above PreparedStatement variable
    2. Runs above String variable contains a SELECT statement
    3. Runs a a SELECT statement, typically used to get current timestamp
    4. The variable will auto set its row fetch size of 10 rows, i.e. Statement.setFetchSize(10).  It will download 10 rows on every ResultSet..next() method. You can call Statement.getFetchSize() to display its row fetch size
    5. Oracle DB will indirectly performs SQL parsing inside the DB, which takes up a little memory in Oracle DB server
    6. The FetchSize doesn't take up memory in object Statement
  5. Run the SQL by calling either following JDBC method:
    1. PreparedStatement.executeQuery()
    2. Statement.executeQuery()
  6. Oracle DB will takes time to execution the SQL statement.  It will take longer time if Oracle DB has poor SQL execution plan, or high data volumes that DB takes time to load the data from hard disk, or joins them using CPU.  Java program doesn't do any processing, and no data is available for the Java program to download.  In other word, Java program will stop and wait until DB finished execute the SQL, and ready to return the data. executeQuery() will wait as long as the Oracle DB takes to run the SELECT statement.  If the SELECT takes 50 hr to execute, then executeQuery() will wait for 50 hr
  7. After Oracle completed executing the SELECT statement, fetch/download the rows into Java using JDBC class ResultSet
    1. Creates a new variable of JDBC class ResultSet
    2. Download 10 rows (record) of records using ResultSet.next() method
    3. If you set FetchSize to be higher using setFetchSize(), then ResultSet.next() will download more rows from the DB server
    4. ResultSet.next() method will pause longer the higher the FetchSize set to.  Imagine you are downloading files from Internet with bigger size.  The bigger the file, the longer it takes to download
    5. The more columns the SELECT statement indicates, the bigger the data volume and longer it takes to run next() method
    6. So total time to fetch 10 rows (or FetchSize if set to diff value) depends on (excludes time waiting for DB to run SELECT statement):
      1. Number of columns defined in SELECT statement
      2. Number of columns with data. If no data, then no impact
      3. Data size for each column
      4. Number of rows to fetch that defined by setFetchSize. Default is 10 rows
    7. FetchSize can be modify in real time JDBC method ResultSet.setFetchSize(new_value) anytime

SELECT Statement Tuning

After lengthy explanation of the SELECT statement architecture, let's get into SELECT statement tuning.

We will not discuss about tuning of following as some are business requirement, and some are SQL tuning beyond Java programming tuning:

  1. slow running SELECT statement that DB takes time to process
  2. poorly written SELECT statement
  3. SELECT statement is joining plus expecting to retrieve all data from very large tables, such as 10GB - 300TB

The performance tuning in Java program is for following Java JDBC class in their FetchSize:

  • java.sql.Statement
  • java.sql.PreparedStatement
  • java.sql.ResultSet
To further elaborate, performance tuning in Java program is increase number of rows to download from Oracle DB each time by calling following Java JDBC method:
  • java.sql.Statement.setFetchSize()
  • java.sql.PreparedStatement.setFetchSize()
  • java.sql.ResultSet.setFetchSize()

Why Tune Default Row Fetch Size 

  1. Default of 10 rows (FetchSize) is too small by today's standard.  A desktop computer often comes with 1Gbps network card, while servers are using 10Gbps network card.  Back to 30 years ago, network card was running at 10Mbps.  Many residential internet are between 50Mbps and 1Gbps.
  2. Data size are significantly bigger, such as storing image, video, audio.
  3. Max column size has increased. Oracle LONG column type is changed to CLOB (character) and BLOG (binary) which can hold 8TB - 128TB for 1 column.
  4. Business users have valid usage to display 1000 rows per screen, such as viewing reports, YouTube video, X tweets, so 10 rows is too little, and anonying
  5. Too frequent pause when calling next() after it finished process/display 10 rows due to network latency
  6. It is safe to set a proper FetchSize, such as 1000 rows when the SELECT returns 1 row.  It doesn't take longer to download 1 row, when FetchSize=1000

Tuning Consideration

Fully study following when considering tuning FetchSize:
  1. Does users/application really needs to retrieve so many columns in the SELECT statement?
  2. How big (in bytes) the column size in the SELECT statement?
  3. How big (in bytes) the total size of all columns in the SELECT statement?
  4. How many records it will return by the SELECT statement?
  5. Can user adjust the FetchSize from the application?

Scenario 1: Showing number, strings, and time

90% of regular applications design often returns these column types in their SELECT statement, and their max size in Oracle as non-Unicode (byte):
  • number - 40 bytes
  • varchar2 - 32 KB
  • datetime - 16 bytes
Let's assume SELECT statement is returning so many number of column types, with total of 10 columns:
  • number - 3
  • varchar2 - 5
  • datetime - 2
So total max bytes that the next() will download from the DB server is
Max row size in byte = 3 * 40 + 5 * 32KB + 2 * 16
= 120 bytes + 160 KB + 32 bytes
= 163992 bytes
= 160.1484375 KB

Assumes that business users is able to accept 1 sec pause to display the data (download 10 columns from the Oracle DB server), then following is the size of data download size for different network speed of 1 sec:
1 sec for 1Mbps = 1 sec * 1,000,000 bit/s = 122.070 KB
1 sec for 10Mbps = 1 sec * 10,000,000 bit/s = 1220.703 KB
1 sec for 1Gbps = 1 sec * 1,000,000,000 bit/s = 119.209 MB
1 sec for 2Gbps = 1 sec * 2,000,000,000 bit/s = 237.418 MB
1 sec for 10Gbps = 1 sec * 10,000,000,000 bit/s = 1192.093 MB

Uses above data download size for 10 columns (160.1484375 KB max per row size) and determine the acceptable FetchSize:
  • 1Mbps: FetchSize=1
  • 10Mbps: FetchSize=7
  • 1Gbps: FetchSize=762
  • 2Gbps: FetchSize=1524
  • 10Gbps: FetchSize=7622
If business users said even the varchar2 column type can hold 32KB of data, but 99% of the data is below 2 KB, then max column size in bytes (keeping other columns type as their max size):
Max row size in byte = 3 * 40 + 2 * 2KB + 2 * 16
= 120 bytes + 4 KB + 32 bytes
= 4248 bytes
= 4.148 KB

Then the acceptable FetchSize for different network download speeds are below:
  • 1Mbps: FetchSize=29
  • 10Mbps: FetchSize=294
  • 1Gbps: FetchSize=29425
  • 2Gbps: FetchSize=58851
  • 10Gbps: FetchSize=294256
Default FetchSize of 10 rows is too small for business servers which often uses 1Gbps between Oracle server and Java application.

Over Tune Warning

Be aware about maximum column size (in bytes) in the SELECT statement when adjusting FetchSize:
  • varchar2 - 32KB max
  • LONG - 2GB max
  • BLOB - 8 - 128TB max
  • CLOB - 8 - 128TB max
  • BFILE - 4GB max
Not all SELECT statement are identical, so don't simply reuse a very high FetchSize value that used by another SELECT statement.

Following are bad example of over tune FetchSize:
  1. Business users often enter 4000 or more characters for a Note column inside a table.  The application set to display 1000 rows (FetchSize=1000) which easily download 4MB of data into Java application/servlet.. The Java application is used by 2000 concurrent users.  Programmer designed above screen in a frequently used home screen, which has 2000 concurrent users (4MB * 2000 ~ 8GB)
    1. Business users should consider fetching the first 100 characters from the Note column instead of displaying entire Note.  If business users need to read specific Note, then click and display the specific Note
  2. Business users want to auto download monthly pdf when click on the "Download Monthly Check Deposit" button, similar to an invoice screen that the application has (FetchSize=50).  However, business users didn't consider that the pdf in this screen often contains images of checks that are taken with cellphone, and each are 2MB easily.  This screen is used by business customers who deposit max 100 of check daily, and the screen suppose to display 1 months of records.  This means the button will download 100 checks * 31 days = 3,100 rows of checks in pdf file.  When JDBC FetchSize was set to 100 rows in this check deposit screen, it will download 302GB of data for the entire month per business customer
    1. This functionality to download entire month's check should not be offered unless end-to-end network speed is 10Gbps (this still takes ~ 30 sec to download the data into Java application)
  3. Reporting users designed a simple monthly operation reports which are showing 5,000 columns from few tables with big data volume, and high transactions volume daily. When setFetchSize to a high value, the report always crash the Java application (run out of heap memory)
    1. setFetchSize should set to a lower value
    2. Java heap memory needs to increase
    3. Controls number of rows to display the output to the screen
    4. Encourage users to save to file
    5. Offers pagination in the screen, and set setFetchSize to match the number of rows display on the screen, or 1/5 (so next() will indirectly download 5 times from the DB server) of the total rows. Uses following Java code to determine number of rows (not all DB type supports last())
      1. ResultSet rs = stmt.executeQuery(sql);
      2. int rowCount = rs.last() ? rs.getRow() : 0;
      3. rs.beforeFirst();

No comments: