2021-10-12

Oracle/Tomcat - Monitoring ulimit for nofile (max open files) to Resolve max file open

Product: Any product that opens files, such as Oracle DB or Tomcat
Version: Any version since using floppy disk

Any software that runs on Linux (or on another Unix-like OS such as macOS) depends on kernel tuning parameters, and a good vendor documents the values its software uses or was tested with.  In this post, I would like to discuss the parameter for the maximum number of concurrently open files, which is called the following:

  • Linux (RedHat, Fedora, Ubuntu, CentOS): nofile
  • MacOS: maxfiles (won't be discussed in this post)

This configuration is the shared responsibility of the OS administrator and the application administrator.  The application administrator could be a DBA, programmer, web admin, LDAP admin, SAP BODS admin, ETL developer, data migration consultant, Apache Parquet user, etc.  The application team needs to tell the OS admin the maximum number of files the program will access (read or write) concurrently per process, while the OS admin should monitor the usage and alert the application team when it gets close to the limit.

Linux has 3 places to configure the limit on open files:

  • Global limit for the entire OS/machine/VM - file /etc/sysctl.conf, parameter fs.file-max; verify the current value with cat /proc/sys/fs/file-max; change it with sysctl -w fs.file-max=[new value]; monitor current usage with cat /proc/sys/fs/file-nr
  • Per user, per process soft limit (the effective limit) - file /etc/security/limits.conf, parameter "soft nofile"; verify the current value with ulimit -Sn; verify current usage with ls /proc/<PID>/fd (fd is a directory, so cat does not work on it)
  • Per user, per process hard limit (the highest value a user can raise the soft limit to without asking the SA) - file /etc/security/limits.conf, parameter "hard nofile"; verify the current value with ulimit -Hn; current usage is not applicable, as the process is bound by the soft limit
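
As a quick check of the three places listed above, here is a minimal sketch that prints all three values for the current shell. The variable names are my own, and it assumes a Linux machine where /proc/sys/fs/file-nr is readable.

```shell
#!/usr/bin/env bash
# Sketch: print the global limit and the soft/hard nofile limits of this shell.

# /proc/sys/fs/file-nr holds three numbers: allocated file handles,
# allocated-but-unused handles, and the global maximum (fs.file-max).
read -r allocated unused global_max < /proc/sys/fs/file-nr

soft_limit=$(ulimit -Sn)   # per process soft limit (the effective one)
hard_limit=$(ulimit -Hn)   # per process hard limit (ceiling for the soft limit)

echo "global: ${allocated} handles allocated, maximum ${global_max}"
echo "soft nofile: ${soft_limit}"
echo "hard nofile: ${hard_limit}"
```

Run it under each OS account you care about, as the soft and hard values can differ per user.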
Here are some clarifications to prevent confusion:
  • The "ulimit" command reports the limits of the current shell session, which come from the OS user's configuration, so it can display different values when you log in as a different OS account
  • An OS user can raise the soft limit of nofile on their own (manually, or automatically in .bash_profile), but only up to the hard limit. Raising the hard limit (configured in /etc/security/limits.conf) must be done by the SA (whoever can sudo or modify files owned by root)
  • Network socket connections count towards nofile usage, in addition to files
  • Pipes count towards nofile usage, and a process normally has at least 3 descriptors open, i.e. standard input, standard output, standard error
  • Files opened by spawned threads count towards the nofile usage of the process. If the process spawns 2 threads and each reads/writes 100 files concurrently, then the total per process usage is 200 concurrently open files
  • The parameter limits concurrently open files (for read or write), and doesn't count files which are already closed
  • A DB (such as Oracle) always needs to keep all its dbf files open, plus additional log files, so the total number of dbf files is indirectly limited by the OS nofile soft limit (a DB bounce is required to activate a new value)
  • A vendor's nofile configuration is just for reference (unless its value is unlimited). The admin must adjust it according to the actual concurrent file access usage
  • Direct raw storage usage, such as raw devices used by Oracle RAC, is counted, so the more raw devices, the higher the concurrent nofile usage
  • Per process monitoring is most accurate when checking /proc/<PID>/fd.  ls -l /proc/<PID>/fd displays which files the process is accessing (concurrently), netstat -anp | grep <PID> displays the network ports it uses, and lsof -p <PID> shows pipes + files + network ports
  • An extremely low limit, or very high usage, can leave the application unable even to display the "reached max file open" error, which makes troubleshooting nearly impossible, as the OS doesn't capture historical values (when the process crashes/hangs/misbehaves)
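
To illustrate the soft versus hard limit behaviour described in the list above, here is a small sketch. The function name demo_soft_limit and the value 64 are only for illustration, and it assumes the hard limit of your session is at least 64. It runs in a subshell so the limits of your own shell stay untouched.

```shell
#!/usr/bin/env bash
# Sketch: an ordinary user can move the soft nofile limit freely,
# as long as it stays at or below the hard limit.
demo_soft_limit() {
  local hard
  hard=$(ulimit -Hn)

  ulimit -Sn 64            # lowering the soft limit needs no privilege
  echo "lowered soft limit: $(ulimit -Sn)"

  ulimit -Sn "$hard"       # raising it back up to the hard limit also works
  echo "restored soft limit: $(ulimit -Sn) (hard limit: ${hard})"
}

# Parentheses start a subshell, so the calling shell keeps its own limits.
( demo_soft_limit )
```

Trying to set the soft limit above the hard limit as a normal user fails, which is the point where the SA needs to change /etc/security/limits.conf.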
Example output of lsof for PID 25661, filtered to show only the entries that count towards nofile:
$ lsof -P -p 25661 | awk '$4 ~ /[[:digit:]]/ {print}'
COMMAND     PID USER   FD   TYPE  DEVICE  SIZE/OFF      NODE NAME
al_engine 25661  sap    0r  FIFO     0,9       0t0   2188223 pipe
al_engine 25661  sap    1w  FIFO     0,9       0t0   2188224 pipe
al_engine 25661  sap    2w  FIFO     0,9       0t0   2188225 pipe
al_engine 25661  sap    3r   REG   259,1    637448 156135934 /opt/sap/dataservices/bin/BEError_message_en.bin
al_engine 25661  sap    4r   REG   259,1      5492 156135932 /opt/sap/dataservices/bin/broker_message_en.bin
al_engine 25661  sap    5r   REG   259,1     91966 156135931 /opt/sap/dataservices/bin/BETrace_message_en.bin
al_engine 25661  sap    6u  IPv4 2188262       0t0       TCP localhost:37482->localhost:4012 (ESTABLISHED)

FD (File Descriptor, i.e. one nofile entry) 0, 1 and 2 are standard input/output/error.
There is 1 TCP socket open
lsof -P tells lsof not to translate port numbers into service names, in this case port 4012 and port 37482
File type FIFO means pipe file
File type REG means regular file
File type IPv4 means TCP/IP version 4
COMMAND is the program that reads/writes/talks to these files/TCP sockets/pipes/devices
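
If lsof is not installed, a similar breakdown can be taken from /proc alone. The sketch below is my own helper (the name fd_summary is hypothetical). It follows each symlink in /proc/<PID>/fd and groups the targets into pipes, sockets and everything else, defaulting to the current shell when no PID is given.

```shell
#!/usr/bin/env bash
# Sketch: summarise the open file descriptors of one process by type,
# using only /proc. Needs permission to read /proc/<PID>/fd.
fd_summary() {
  local pid=${1:-$$} pipes=0 sockets=0 others=0 target link
  for link in /proc/"$pid"/fd/*; do
    # The descriptor may be closed while we scan, so skip unreadable links.
    target=$(readlink "$link") || continue
    case $target in
      pipe:*)   pipes=$((pipes + 1)) ;;      # shown as FIFO by lsof
      socket:*) sockets=$((sockets + 1)) ;;  # TCP/UDP/unix sockets
      *)        others=$((others + 1)) ;;    # regular files, devices, etc.
    esac
  done
  echo "pid=${pid} pipes=${pipes} sockets=${sockets} files_and_other=${others}"
}

fd_summary "$$"
```

Usage: fd_summary 25661 would summarise the al_engine process from the example above, provided you run it as the same OS user or as root.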

Summary of useful commands and files

Each entry lists the command, its purpose, and whether it applies per process (Y/N):

  • ulimit -Sn - Shows the default soft limit (effective value) of nofile that a new process/command started from this shell is going to use. Per process: Y
  • ulimit -n - Same as "ulimit -Sn" when displaying the value. Per process: Y
  • grep file-max /etc/sysctl.conf - Checks the global max concurrent open files configured for the entire OS/VM. If it is not configured, the command won't return any value. Per process: N
  • sysctl fs.file-max - Same as checking /etc/sysctl.conf, but it always returns a value, even if one is not defined in /etc/sysctl.conf. Per process: N
  • sudo sysctl -w fs.file-max=<new max value> - Modifies the global max concurrent open files. Per process: N
  • ls -l /proc/<PID>/fd - Displays all files currently open by a specific process. Per process: Y
  • grep "open files" /proc/<PID>/limits - Displays the effective soft & hard limit of max open files for a specific process. This takes precedence over the ulimit command output: a running process keeps the limits it started with, so restart the process after changing ulimit. Per process: Y
  • lsof -P -p <pid> | awk '$4 ~ /[[:digit:]]/ {print}' - More verbose output than /proc/<PID>/fd. It displays the files open for read/write, TCP ports, block device names, and pipes open for read/write. Per process: Y
  • ls /proc/<PID>/fd | wc -l - Counts the nofile usage of a specific process. Per process: Y
  • lsof -p <pid> | awk '$4 ~ /[[:digit:]]/' | wc -l - Same as the wc -l count above, but based on lsof output. Per process: Y
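
Since the OS keeps no history of nofile usage, the commands above can be combined into a small logger. This is a minimal sketch with a hypothetical function name (nofile_usage) and an arbitrary default threshold of 80%. It compares the number of entries in /proc/<PID>/fd with the soft limit in /proc/<PID>/limits and returns non-zero when the threshold is reached.

```shell
#!/usr/bin/env bash
# Sketch: report nofile usage of one process against its own soft limit.
# Usage: nofile_usage <PID> [threshold percent, default 80]
nofile_usage() {
  local pid=${1:-$$} threshold=${2:-80} used soft pct
  # One entry in /proc/<PID>/fd per open descriptor.
  used=$(ls /proc/"$pid"/fd 2>/dev/null | wc -l)
  # Line looks like: "Max open files  <soft>  <hard>  files"
  soft=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)

  if [ -z "$soft" ] || [ "$soft" = "unlimited" ]; then
    echo "$(date '+%F %T') pid=${pid} nofile=${used}/unlimited"
    return 0
  fi

  pct=$((used * 100 / soft))
  echo "$(date '+%F %T') pid=${pid} nofile=${used}/${soft} (${pct}%)"
  # A non-zero return lets the caller raise an alert.
  [ "$pct" -lt "$threshold" ]
}

nofile_usage "$$" 80 || echo "ALERT: nofile usage reached the threshold"
```

Scheduling this (for example from cron) and appending the output to a log file gives the historical values that are otherwise lost when a process crashes or hangs.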
