Product: Java Programming
Version: 0.1 - current
OS: Any operating system, Windows, Linux, UNIX, AIX, Solaris, HP-UX
Related Product: Any programming language that uses SELECT in the world
Java API, Java class, Java method: java.sql.Statement, java.sql.PreparedStatement
Java method: java.sql.PreparedStatement.setFetchSize, java.sql.Statement.setFetchSize
Overview
Any programming that can work with databases will offer an API to write SELECT statement. In term of Java, they are following Java classes:
- java.sql.Statement
- java.sql.PreparedStatement
- SELECT statement is returning a lots of data, such as 1000 rows and above
- java.sql.Statement
- java.sql.PreparedStatement
- java.sql.ResultSet
Architecture
In order to explains the tuning of these 2 JDBC class, first I need to explain how SELECT statement work in low level. I will use programmer's perspective to explain, and skip many low level database execution, and pick Oracle DB as an example. This is mainly I worked more in Oracle, and Oracle KB and blogs are sharing more of its internal architecture than MS SQL Server, for example.
A SELECT statement execution goes through following process:
- Crafts a SELECT statement
- In Java programming, this often be stored as a String variable, or many SELECT statements in many String variables
- Creates a String variable to store above SELECT statement
- Create a new variable of JDBC class java.sql.PreparedStatement by using above String (or SELECT statement) to that it becomes an object that is parsed (Oracle's keyword), and in programmer's terminology, it means compiled by DB
- PreparedStatement or DB SQL parsing is always required to run any SQL, including SELECT, INSERT, DELETE, CREATE, DROP, TRUNCATE, function call, store procedure call. Let's call these DDL (data definition) and DML (data manipulation)
- PreparedStatement will indirectly getting call by any JDBC if the program doesn't defined/specify/create this object when executing any SQL
- There is no way to bypass SQL parsing (Java PrepareStatement) when executing any DDL and DML
- The object in Java takes up very little memory (Java called heap memory)
- In Java documentation, you often read this will allocate some memory, but the document was written when max desktop RAM was 512MB
- The object in Oracle takes up very little memory
- In Oracle documentation, you often read this will allocate some memory, but the document was written when max desktop RAM was 2GB
- The variable will auto set its row fetch size of 10 rows, i.e. PreparedStatement.setFetchSize(10). It will download 10 rows on every ResultSet..next() method. You can call PreparedStatement.getFetchSize() to display its row fetch size
- This PreparedStatement variable is ready to run by calling PreparedStatement.executeQuery()
- The FetchSize doesn't take up memory in object PreparedStatement
- Creates a commonly used variable for JDBC class of java.sql.Statement that either
- Uses above PreparedStatement variable
- Runs above String variable contains a SELECT statement
- Runs a a SELECT statement, typically used to get current timestamp
- The variable will auto set its row fetch size of 10 rows, i.e. Statement.setFetchSize(10). It will download 10 rows on every ResultSet..next() method. You can call Statement.getFetchSize() to display its row fetch size
- Oracle DB will indirectly performs SQL parsing inside the DB, which takes up a little memory in Oracle DB server
- The FetchSize doesn't take up memory in object Statement
- Run the SQL by calling either following JDBC method:
- PreparedStatement.executeQuery()
- Statement.executeQuery()
- Oracle DB will takes time to execution the SQL statement. It will take longer time if Oracle DB has poor SQL execution plan, or high data volumes that DB takes time to load the data from hard disk, or joins them using CPU. Java program doesn't do any processing, and no data is available for the Java program to download. In other word, Java program will stop and wait until DB finished execute the SQL, and ready to return the data. executeQuery() will wait as long as the Oracle DB takes to run the SELECT statement. If the SELECT takes 50 hr to execute, then executeQuery() will wait for 50 hr
- After Oracle completed executing the SELECT statement, fetch/download the rows into Java using JDBC class ResultSet
- Creates a new variable of JDBC class ResultSet
- Download 10 rows (record) of records using ResultSet.next() method
- If you set FetchSize to be higher using setFetchSize(), then ResultSet.next() will download more rows from the DB server
- ResultSet.next() method will pause longer the higher the FetchSize set to. Imagine you are downloading files from Internet with bigger size. The bigger the file, the longer it takes to download
- The more columns the SELECT statement indicates, the bigger the data volume and longer it takes to run next() method
- So total time to fetch 10 rows (or FetchSize if set to diff value) depends on (excludes time waiting for DB to run SELECT statement):
- Number of columns defined in SELECT statement
- Number of columns with data. If no data, then no impact
- Data size for each column
- Number of rows to fetch that defined by setFetchSize. Default is 10 rows
- FetchSize can be modify in real time JDBC method ResultSet.setFetchSize(new_value) anytime
SELECT Statement Tuning
After lengthy explanation of the SELECT statement architecture, let's get into SELECT statement tuning.
We will not discuss about tuning of following as some are business requirement, and some are SQL tuning beyond Java programming tuning:
- slow running SELECT statement that DB takes time to process
- poorly written SELECT statement
- SELECT statement is joining plus expecting to retrieve all data from very large tables, such as 10GB - 300TB
The performance tuning in Java program is for following Java JDBC class in their FetchSize:
- java.sql.Statement
- java.sql.PreparedStatement
- java.sql.ResultSet
- java.sql.Statement.setFetchSize()
- java.sql.PreparedStatement.setFetchSize()
- java.sql.ResultSet.setFetchSize()
Why Tune Default Row Fetch Size
- Default of 10 rows (FetchSize) is too small by today's standard. A desktop computer often comes with 1Gbps network card, while servers are using 10Gbps network card. Back to 30 years ago, network card was running at 10Mbps. Many residential internet are between 50Mbps and 1Gbps.
- Data size are significantly bigger, such as storing image, video, audio.
- Max column size has increased. Oracle LONG column type is changed to CLOB (character) and BLOG (binary) which can hold 8TB - 128TB for 1 column.
- Business users have valid usage to display 1000 rows per screen, such as viewing reports, YouTube video, X tweets, so 10 rows is too little, and anonying
- Too frequent pause when calling next() after it finished process/display 10 rows due to network latency
- It is safe to set a proper FetchSize, such as 1000 rows when the SELECT returns 1 row. It doesn't take longer to download 1 row, when FetchSize=1000
Tuning Consideration
- Does users/application really needs to retrieve so many columns in the SELECT statement?
- How big (in bytes) the column size in the SELECT statement?
- How big (in bytes) the total size of all columns in the SELECT statement?
- How many records it will return by the SELECT statement?
- Can user adjust the FetchSize from the application?
Scenario 1: Showing number, strings, and time
- number - 40 bytes
- varchar2 - 32 KB
- datetime - 16 bytes
- number - 3
- varchar2 - 5
- datetime - 2
- 1Mbps: FetchSize=1
- 10Mbps: FetchSize=7
- 1Gbps: FetchSize=762
- 2Gbps: FetchSize=1524
- 10Gbps: FetchSize=7622
- 1Mbps: FetchSize=29
- 10Mbps: FetchSize=294
- 1Gbps: FetchSize=29425
- 2Gbps: FetchSize=58851
- 10Gbps: FetchSize=294256
Over Tune Warning
- varchar2 - 32KB max
- LONG - 2GB max
- BLOB - 8 - 128TB max
- CLOB - 8 - 128TB max
- BFILE - 4GB max
- Business users often enter 4000 or more characters for a Note column inside a table. The application set to display 1000 rows (FetchSize=1000) which easily download 4MB of data into Java application/servlet.. The Java application is used by 2000 concurrent users. Programmer designed above screen in a frequently used home screen, which has 2000 concurrent users (4MB * 2000 ~ 8GB)
- Business users should consider fetching the first 100 characters from the Note column instead of displaying entire Note. If business users need to read specific Note, then click and display the specific Note
- Business users want to auto download monthly pdf when click on the "Download Monthly Check Deposit" button, similar to an invoice screen that the application has (FetchSize=50). However, business users didn't consider that the pdf in this screen often contains images of checks that are taken with cellphone, and each are 2MB easily. This screen is used by business customers who deposit max 100 of check daily, and the screen suppose to display 1 months of records. This means the button will download 100 checks * 31 days = 3,100 rows of checks in pdf file. When JDBC FetchSize was set to 100 rows in this check deposit screen, it will download 302GB of data for the entire month per business customer
- This functionality to download entire month's check should not be offered unless end-to-end network speed is 10Gbps (this still takes ~ 30 sec to download the data into Java application)
- Reporting users designed a simple monthly operation reports which are showing 5,000 columns from few tables with big data volume, and high transactions volume daily. When setFetchSize to a high value, the report always crash the Java application (run out of heap memory)
- setFetchSize should set to a lower value
- Java heap memory needs to increase
- Controls number of rows to display the output to the screen
- Encourage users to save to file
- Offers pagination in the screen, and set setFetchSize to match the number of rows display on the screen, or 1/5 (so next() will indirectly download 5 times from the DB server) of the total rows. Uses following Java code to determine number of rows (not all DB type supports last())
- ResultSet rs = stmt.executeQuery(sql);
- int rowCount = rs.last() ? rs.getRow() : 0;
- rs.beforeFirst();
No comments:
Post a Comment