Search This Blog

2023-01-27

Oracle No more data to read from socket

Product: Oracle Database
Version: All

I often found from StackOverflow.com that people are posting following error from their application:

No more data to read from socket

The application could be Java, Springframework, etc, and they often blame the application problem, as the error is captures by the application, e.g. Java application/daemon.

Often people will suggest possible cause, or possible resolution.  Some people will even suggest not related resolution which confused the entire IT community.

The full Oracle error is below:

ORA-17410 No more data to read from socket

There are several possible causes, but many people don't put when each is applicable.  The issue is 100% not due to the application, but problem with the infrastructure (network) or Oracle database:

Cause #1: Intermittent Network issue

Behavior #1: Application will fail with this error intermittently

Behavior #2: Point of failure will be random, and won't be in the same table/index/SELECT SQL, as network failure is intermittent

Behavior #3: Program should failed when pull data into the application, such as SELECT, but should not fail with CREATE TABLE, CREATE INDEX, DELETE, DROP, ALTER

Behavior #4: Any application will be affected by intermittent network issue, instead of just 1 application

Behavior #5: Perform a few GB size file transfer between the application and remote DB server to simulate intermittent network issue. The duration must be longer than the SQL execution time. E.g. if it is running a SELECT that take 5 min (then failed), then the file transfer time must be more than 5 min, such as 10 min. So the test is not actually based on file size, but file x'fer time that last as long as the failed SQL

Problem Simulation: Simulate a 5 - 30 min long file transfer to confirm the issue is network related. The source and target server used to simulate file transfer must include the application server with the remote DB server

Workaround: Once confirmed, you can move the DB and application into the same machine to eliminate network issue.  The network admin can take longer time to provide a permanent workaround

Cause #2: Intermittent DB Failure

Behavior #1: Application could fail with this error randomly or consistently

Behavior #2: Oracle alert log will contain more detail error message

Behavior #3: Oracle alert log could throw ORA-00600 that DBA needs to raise ticket with Oracle Support due to DB patch/bug

Behavior #4: Even putting both application and Oracle database into the same machine, this error will continue to exists intermittently

Problem Simulation: Capture the failed SQL, and run it manually using SQL Developer or sqlplus.  You might need to create a shell script (PowerShell for Windows) to run it few 1000s time and last for 1 hr to simulate the error

Workaround #1: If it is ORA-00600 Oracle error, then try it with higher or lower Oracle database version as temporary workaround. There are potential Oracle bug that affect specific version and patch.

Workaround #2: There could be other OS issue that affected the Oracle database. Move Oracle database to another different machine (OS)

Workaround #3: Check RAM and storage to ensure no low storage, or extremely high RAM usage when the application hitting the error

Cause #3: Wrong JDBC driver

Behavior #1: Java application consistently failed and not intermittent

Problem Simulation: Create a simple Java program and run the same SQL by using the exact JDBC driver.  This should consistently failed

Workaround: Replace the Oracle JDBC driver with Oracle's

Cause #4: Special data or table structure that is not supported

Behavior #1: Application is creating table, or inserting data that is more advance, e.g. Spatial.  The program will consistently failed for the same SQL

Behavior #2: This is not intermittent.  If you think it is intermittent, then capture all the of failed SQL, and look for the pattern.  They must use some special data type that is not varchar2, number, date, long, clob, blob, function, partition

Problem Simulation: Capture the SQL, and create a simple Java program to run the same SQL. The Java program will consistently fail

Workaround #1: Rewrite the Java program on this SQL.  It could be wrong Java object that is not compatible

Workaround #2: Very old JDBC driver that doesn't support that new feature can be another cause.  Replace with newer Oracle JDBC driver

Cause #5: Firewall Timeout - idle connection clearing

Behavior #1: Some enterprise firewall can forcefully close idle connection, and leads to long running SQL's DB connection will dropped by firewall server after it hit its idle timeout value
Behavior #2: Find out the time between connection dropped and SQL is fired.  The time is constant, such as 30 sec

Workaround #1: Network admin to review the firewall rule, and exclude TCP port 1521 connection to & from Oracle database server

Workaround: Replace the Oracle JDBC driver with Oracle's

Cause #6: Router Timeout - idle connection clearing

Behavior #1: Some enterprise router can forcefully close idle connection, and leads to long running SQL's DB connection will dropped by firewall server after it hit its idle timeout value
Behavior #2: Find out the time between connection dropped and SQL is fired.  The time is constant, such as 30 sec

Workaround #1: Network admin to review the router rule, and exclude TCP port 1521 connection to & from Oracle database server

No comments: