Showing posts with label linux. Show all posts

2025-07-17

Linux: Dynamically/Realtime Changing ulimit for existing process

Product: RedHat Enterprise Linux RHEL
Version: All

Overview

Many UNIX/Linux software packages ask you to set per-process limits using the ulimit command.  This includes Oracle RDBMS, Apache httpd, Apache Tomcat, file servers, Cognos Analytics, call center servers, media servers, Java-based servers, etc.

Hardly any vendor of a major application explains the objective of setting ulimit, and eventually the daemon/process hangs or crashes after hitting the per-user OS limit.

For readers who have no idea what ulimit is, the short description is that it configures user-level OS limits for any process run and owned by the user.  This includes the number of active open network connections, the number of active open files, the maximum file size, the maximum process RAM, etc.  Windows has no equivalent setting, so anyone who never properly learned UNIX can easily overlook this configuration.

Often, many IT personnel don't follow the vendor's documentation when installing the software.  Some vendors' documents even provide a low fixed value instead of a formula to properly tune the ulimit.  Many developers do not understand the OS, including UNIX, so it is challenging for someone who doesn't understand the OS to provide OS tuning parameters and values.

Eventually, this leads to many daemons hanging or even crashing.  As an OS administrator or software support person, it is helpful to be able to resolve the hang, or better, to catch the daemon when it starts behaving poorly and prevent it from reaching that critical state.

This post introduces Linux commands that let you change the ulimit in real time without restarting the daemon or the OS.  Run them as root:
  • RedHat RHEL - prlimit
    • Documentation: https://man7.org/linux/man-pages/man1/prlimit.1.html
    • Documentation: https://linux.die.net/man/2/prlimit
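  • Ubuntu - chpst (part of runit; it sets limits for a program it launches rather than for an already-running process, and prlimit from util-linux is available on Ubuntu as well)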
    • Documentation: https://manpages.ubuntu.com/manpages/lunar/man8/chpst.8.html
For the rest of the post, I will refer only to the RHEL command prlimit, as most companies use RHEL to run their software.

Usage

The prlimit command can change the ulimit of a specific process in real time, without shutting it down or rebooting the OS.

You always need to specify the process ID, shown as PID in the output of the "ps" command.  So the syntax always includes "-p <PID>", such as "prlimit -p <PID of Oracle>".

The following ulimit settings can be configured in real time:
  • RAM related
    • max data size (RAM) - parameter -d
    • max resident set size RSS (RAM) - parameter -m
    • max stack size (RAM > stack) - parameter -s
  • Storage related
    • max file size (storage) - parameter -f
    • max number of open files - parameter -n
    • max number of file locks - parameter -x
  • Messaging related
    • max number of bytes in POSIX message queue - parameter -q
  • Process
    • max number of processes - parameter -u
Each OS resource has a different command to check its current utilization, so search the Internet to find out how much is in use before tuning these values up.  I might write a new blog post if this page gets sufficient hits, such as more than 50,000.

By adjusting the ulimit of the active process in real time, you can avoid unplanned downtime during business hours, and schedule a proper maintenance window to tune the OS.
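
As a minimal sketch of the usage described above, the two shell functions below read the current open-file limit of a process and raise it with prlimit.  The function names nofile_limits and raise_nofile, the PID 2914 and the values 4096/8192 are placeholders of mine, not part of prlimit itself.  Raising the hard limit or touching another user's process normally needs root.

```shell
# Print the soft and hard "Max open files" values from a limits file.
# The file is expected to have the same layout as /proc/<PID>/limits.
nofile_limits() {
    awk '/^Max open files/ {print $4, $5}' "$1"
}

# Raise the open-file limit of a running process without restarting it.
# Arguments: PID, new soft limit, new hard limit.
raise_nofile() {
    prlimit --pid "$1" --nofile="$2:$3"
}

# Usage (PID and values are placeholders):
#   nofile_limits /proc/2914/limits    # before
#   raise_nofile 2914 4096 8192
#   nofile_limits /proc/2914/limits    # after
```

The same pattern should work for the other parameters listed above by swapping --nofile for the matching long option (for example --nproc for -u).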

2024-06-28

Linux: How to get Parent Process (PPID)?

Product: Linux
Version: All
CPU: All

Overview

Many Linux forums and documentation are based on full Linux distributions, such as RedHat, Fedora, Ubuntu and SuSE, and leave out Linux installed on single-board computers (SBC) and embedded devices such as WiFi routers, smartphones, tablets, Raspberry Pi, Nano Pi, car head units, webcams, etc.  This leads to some standard UNIX designs and operations slowly being forgotten and replaced with newer commands.

During day-to-day troubleshooting, it is sometimes necessary to check the parent process ID (PPID) in addition to the process ID (PID).  For example, what spawned the process, whether the parent process has terminated, etc.

Procedure

Most Linux installed on SBCs is a reduced version of Linux, such as BusyBox (~600 KB), which is used in the DD-WRT and FreshTomato firmware for WiFi routers.

These smaller-footprint Linux systems are often bundled with a "ps" command that displays only limited information about a process.  For example, it does not display the parent PID (PPID).

You can use the following commands to get the PPID from the /proc virtual directory.

For example, assuming the PID is 2914:

root@rt-ac66u:/tmp/home/root# ls -l /proc/2914/status
-r--r--r--    1 root     root             0 Jun 28 09:20 /proc/2914/status
root@rt-ac66u:/tmp/home/root# cat /proc/2914/status
Name:   dropbear
State:  S (sleeping)
SleepAVG:       95%
Tgid:   2914
Pid:    2914
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 32
Groups:
VmPeak:     1156 kB
VmSize:     1156 kB
VmLck:         0 kB
VmHWM:       256 kB
VmRSS:       256 kB
VmData:      348 kB
VmStk:        84 kB
VmExe:       304 kB
VmLib:       384 kB
VmPTE:        12 kB
Threads:        1
SigQ:   0/2047
SigPnd: 00000000000000000000000000000000
ShdPnd: 00000000000000000000000000000000
SigBlk: 00000000000000000000000000000000
SigIgn: 00000000000000000000000000001000
SigCgt: 00000000000000000000000000024402
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff

To display the PPID alone, use the following command:
root@rt-ac66u:/tmp/home/root# grep PPid /proc/2914/status
PPid:   1
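
For scripting, the same lookup can be wrapped in a small function.  This is only a sketch: the name ppid_of_status is my own, and it assumes an awk applet is available on the device (BusyBox builds usually include one, but not always).

```shell
# Print only the parent PID number from a /proc-style status file.
# Argument: path to the status file, e.g. /proc/2914/status
ppid_of_status() {
    awk '/^PPid:/ {print $2}' "$1"
}

# Usage:
#   ppid_of_status /proc/2914/status
```

Calling it again on the PID it returns walks one step further up the process tree.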

The /proc virtual directory is a very basic part of Linux, so it is available on virtually every Linux system, even when the "ps" command has had functionality stripped out.

The above example was captured from FreshTomato firmware on a WiFi router.

2021-12-22

ssh: Setup Passwordless Login

Product: ssh daemon
Version: All

SELinux became standard about 10 years ago, but many ssh passwordless setup guides have still not been updated, which causes a lot of confusion.  This post shows the complete setup procedure.

Preparation

Set the SSH daemon log level to DEBUG1. This is critical for troubleshooting ssh login, and for sharing sufficient information in UNIX community forums when you need help:

1. Login as root (or sudo su)
2. Modify /etc/ssh/sshd_config file: vi /etc/ssh/sshd_config
3. Modify

From: LogLevel INFO
To: LogLevel DEBUG1

4. Other acceptable debug levels are DEBUG2 and DEBUG3
5. Restart ssh daemon: systemctl restart sshd
6. View the log while simulating ssh passwordless login: tail -f /var/log/secure

Setup

The following illustration uses the OS username "oracle", as this is a common example for Oracle database

1. Log in as the oracle user
2. Create the ".ssh" directory if it doesn't exist: mkdir ~/.ssh
3. Change the permissions so that other users/groups can't access it: chmod go= ~/.ssh
4. Create authorized_keys and paste the public key from the remote machine (such as one generated by PuTTY) into it: vi ~/.ssh/authorized_keys
5. Reset the SELinux context of the above file: restorecon -Fvv ~/.ssh/authorized_keys
6. Configure the ssh client to log in automatically as oracle using its local private key file, while another ssh session views /var/log/secure in real time to troubleshoot any problem
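
Steps 2 to 5 can be sketched as one shell function.  The name install_pubkey and its two arguments are hypothetical.  Tightening the permissions of authorized_keys itself is an extra step I have added on the assumption that sshd runs with its default StrictModes setting, and restorecon is only called where the tool exists.

```shell
# Install one public key for passwordless login.
# Arguments: home directory of the target user, the public key line.
install_pubkey() {
    home_dir=$1
    pubkey=$2
    mkdir -p "$home_dir/.ssh"
    # Remove all group/other access from the directory (step 3).
    chmod go= "$home_dir/.ssh"
    # Append the key and protect the file the same way (step 4, plus my addition).
    printf '%s\n' "$pubkey" >> "$home_dir/.ssh/authorized_keys"
    chmod go= "$home_dir/.ssh/authorized_keys"
    # Reset SELinux labels only on systems that have restorecon (step 5).
    if command -v restorecon >/dev/null 2>&1; then
        restorecon -RF "$home_dir/.ssh" >/dev/null 2>&1 || true
    fi
}

# Usage (the key text is a placeholder):
#   install_pubkey "$HOME" "ssh-ed25519 AAAA... user@host"
```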

2021-10-12

Oracle/Tomcat - Monitoring ulimit for nofile (max open files) to Resolve max file open

Product: Any product that opens files, such as Oracle DB, Tomcat
Version: Any version since using floppy disk

Good vendors of software that runs on Linux (or macOS) include the kernel tuning parameters that their software uses or was tested with.  In this post, I would like to discuss the maximum number of concurrently open files, a parameter that goes by the following names:

  • Linux (RedHat, Fedora, Ubuntu, CentOS): nofile
  • MacOS: maxfiles (won't be discussed in this post)

This configuration is the joint responsibility of the OS administrator and the application administrator.  The application administrator could be a DBA, programmer, web admin, LDAP admin, SAP BODS admin, ETL developer, data migration consultant, Apache Parquet user, etc.  The application team needs to tell the OS admin the maximum number of files the program will access (read or write) concurrently per process, while the OS admin should monitor the usage and alert the application team when it gets close to the limit.

Linux has 3 places to configure this limit:

  • Global limit for the entire OS/machine/VM - file /etc/sysctl.conf, parameter fs.file-max; verify the current value with cat /proc/sys/fs/file-max; change it with sysctl -w fs.file-max=[new value]; monitor current usage with cat /proc/sys/fs/file-nr
  • Per user, per process soft limit (the effective limit) - file /etc/security/limits.conf, parameter "soft nofile"; verify the current value with ulimit -Sn; verify current usage with ls /proc/<PID>/fd
  • Per user, per process hard limit (the highest value the user can raise the soft limit to without asking the SA) - file /etc/security/limits.conf, parameter "hard nofile"; verify the current value with ulimit -Hn; checking current usage is not applicable, as the process is bound by the soft limit
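
For the global limit, /proc/sys/fs/file-nr holds three numbers: allocated file handles, allocated but unused handles, and the system-wide maximum.  A minimal sketch that labels them follows; the function name file_nr_summary is my own.

```shell
# Label the three fields of a file-nr style file.
# Argument: path to the file, normally /proc/sys/fs/file-nr
file_nr_summary() {
    awk '{printf "allocated=%s unused=%s max=%s\n", $1, $2, $3}' "$1"
}

# Usage:
#   file_nr_summary /proc/sys/fs/file-nr
```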
Here are some clarifications to prevent confusion:
  • The "ulimit" command is per OS user, so it displays different values when logged in as different OS accounts
  • An OS user can only raise the soft limit of nofile manually (or automate it in .bash_profile), up to the hard limit; the hard limit (configured in /etc/security/limits.conf) must be changed by the SA (whoever can sudo or modify files owned by root)
  • Network socket connections count towards nofile usage, in addition to files
  • Pipes count towards nofile usage, and every process normally has at least 3 descriptors open, i.e. standard input, standard output, standard error
  • Files opened by spawned threads count towards the nofile usage of the process. If the process spawns 2 threads and each reads/writes 100 files concurrently, the total usage of the process is 200 concurrently open files
  • The limit applies to files that are open concurrently; files that have been closed do not count
  • A DB (such as Oracle) always needs to keep all its dbf files open for writing, plus additional log files, so the total number of dbf files is indirectly limited by the OS nofile soft limit (a DB bounce is required to activate a new value)
  • Vendors' nofile values are just for reference (unless the value is unlimited). Admins must adjust them according to their own concurrent file access usage
  • Direct raw storage, such as raw devices used by Oracle RAC, is counted, so the more raw devices, the higher the concurrent nofile usage
  • Per process monitoring is most accurate by checking /proc/<PID>/fd.  ls -l /proc/<PID>/fd displays which files the process is accessing (concurrently), netstat -anp | grep <PID> displays the network ports it uses, and lsof -p <PID> shows pipes + files + network ports
  • An extremely low limit, or high usage, can leave the application unable even to report a "reached max file open" error, which makes it nearly impossible to troubleshoot, as the OS doesn't capture historical values (from when the process crashed/hung/misbehaved)
Example lsof output for PID 25661, showing only the entries that count towards nofile:
$ lsof -P -p 25661 | awk '$4 ~ /[[:digit:]]/ {print}'
COMMAND     PID USER   FD   TYPE  DEVICE  SIZE/OFF      NODE NAME
al_engine 25661  sap    0r  FIFO     0,9       0t0   2188223 pipe
al_engine 25661  sap    1w  FIFO     0,9       0t0   2188224 pipe
al_engine 25661  sap    2w  FIFO     0,9       0t0   2188225 pipe
al_engine 25661  sap    3r   REG   259,1    637448 156135934 /opt/sap/dataservices/bin/BEError_message_en.bin
al_engine 25661  sap    4r   REG   259,1      5492 156135932 /opt/sap/dataservices/bin/broker_message_en.bin
al_engine 25661  sap    5r   REG   259,1     91966 156135931 /opt/sap/dataservices/bin/BETrace_message_en.bin
al_engine 25661  sap    6u  IPv4 2188262       0t0       TCP localhost:37482->localhost:4012 (ESTABLISHED)

FD (File Descriptor, i.e. one nofile entry) 0, 1 and 2 are standard input/output/error.
There is 1 TCP socket open
lsof -P stops lsof from translating port numbers to service names, in this case port 4012 and port 37482
File type FIFO means pipe
File type REG means regular file
File type IPv4 means TCP/IP version 4
COMMAND is the program that reads/writes/talks to these files/TCP sockets/pipes/devices
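
To see at a glance how the descriptors split between pipes, regular files and sockets, the lsof output can be tallied per TYPE column.  This is a rough sketch; the function name fd_types is hypothetical, and it assumes the default lsof column order shown above.

```shell
# Count numeric file descriptors per TYPE from lsof output read on stdin.
# Field 4 is the FD column (e.g. "0r", "6u"); field 5 is the TYPE column.
fd_types() {
    awk '$4 ~ /^[0-9]/ { n[$5]++ } END { for (t in n) print t, n[t] }' | sort
}

# Usage:
#   lsof -P -p 25661 | fd_types
```

For the example output above this reports 3 FIFO, 1 IPv4 and 3 REG entries, 7 descriptors in total.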

Summary of useful commands and files

Each entry lists the command, its purpose, and whether it applies per process (Y/N):
  • ulimit -Sn - Shows the default soft limit (effective value) of nofile that a new process/command will get (per process: Y)
  • ulimit -n - Same as "ulimit -Sn" (per process: Y)
  • grep file-max /etc/sysctl.conf - Checks the configured global max concurrent open files for the entire OS/VM. If not configured, it won't return any value (per process: N)
  • sysctl fs.file-max - Same as the /etc/sysctl.conf check, but it always returns a value, even if not defined in /etc/sysctl.conf (per process: N)
  • sudo sysctl -w fs.file-max=<new max value> - Modifies the global max concurrent open files (per process: N)
  • ls -l /proc/<PID>/fd - Displays all files currently open by a specific process (per process: Y)
  • grep "open files" /proc/<PID>/limits - Displays the effective max open files soft & hard limits of a specific process. This overrides the ulimit command output, and a changed ulimit only reaches the process after a restart (or by using prlimit on the running process, see the 2025-07-17 post) (per process: Y)
  • lsof -P -p <pid> | awk '$4 ~ /[[:digit:]]/ {print}' - More verbose output than /proc/<PID>/fd. It displays files open for read/write, TCP ports, block device names, and pipes open for read/write (per process: Y)
  • ls /proc/<PID>/fd | wc -l - Counts nofile usage of a specific process (per process: Y)
  • lsof -p <pid> | awk '$4 ~ /[[:digit:]]/' | wc -l - Same as the wc -l count above, but based on lsof output (per process: Y)
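
The per-process checks above can be combined into one line of output for monitoring.  This is a sketch only: the function name nofile_usage is my own, and reading another user's /proc/<PID>/fd normally needs root.

```shell
# Report how many descriptors a process has open next to its soft limit.
# Arguments: fd directory and limits file,
#            normally /proc/<PID>/fd and /proc/<PID>/limits
nofile_usage() {
    used=$(ls "$1" | wc -l)
    soft=$(awk '/^Max open files/ {print $4}' "$2")
    # $((used)) strips any padding that wc adds on some systems.
    echo "used=$((used)) soft=$soft"
}

# Usage:
#   nofile_usage /proc/25661/fd /proc/25661/limits
```

Running it from cron and logging the result is one way to keep the historical values that the OS itself does not capture.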

2009-04-17

Red Hat use RAM as /tmp

Put the following setting in /etc/fstab to mount 512 MB of RAM as /tmp:
none /tmp tmpfs size=512m,mode=1777 0 0
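
Once the entry is active (after a reboot or a mount of /tmp), a quick way to confirm that /tmp really lives in RAM is to look for a tmpfs line in /proc/mounts.  A small sketch, with the hypothetical function name is_tmpfs_mount:

```shell
# Succeed (exit 0) if the given mount point is listed as tmpfs.
# Arguments: mount point, mounts file (normally /proc/mounts)
is_tmpfs_mount() {
    awk -v m="$1" '$2 == m && $3 == "tmpfs" { found = 1 } END { exit !found }' "$2"
}

# Usage:
#   is_tmpfs_mount /tmp /proc/mounts && echo "/tmp is in RAM"
```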

Isn't that cool?

2008-11-13

vi Macro Sample 1 for Oracle

Product: Oracle database, UNIX vi editor

Tips for Oracle in a UNIX or Linux environment. This maps the F4 function key to perform the following, to help the DBA edit SQL scripts:
1. Insert a blank line before each ALTER TABLE statement

This is helpful when you get a SQL script from someone, or from the imp utility, and need to find each ALTER TABLE quickly while scrolling down the page

The following vi macro searches for ALTER TABLE and inserts a blank line before it.  Then it moves the cursor down 2 lines (^V means press Ctrl-V first so that the Esc key that follows is entered literally)
:map <F4> /ALTER TABLE^V<ESC>O^V<ESC>jj

2008-08-28

Compile Fedora 9 kernel 2.6.25.14-108

Fedora 9 does not install the kernel source by default. My cheap notebook, an HP OmniBook 4150, has a MagicMedia 256AV revision 20 sound chip, and so far nobody has successfully got it working in this notebook model. This cheap dual-purpose audio and VGA chipset is used in several cheap HP notebooks, as well as Dell and Sony models. There is some success among those who use the HP OmniBook 900, Dell and Sony, but all of them seem to have struggled a lot to get sound, the internal mic, or an external mic to work.

After 2 days of struggling with Google and Yahoo, plus old info posted by people using Fedora 6 and Debian, on the Fedora Forum, RedHat 5 Forum, etc., I have had no success getting the sound working, except that the chip is recognized as a valid PCI device (shown by the lspci command).

With my knowledge of Linux since 1995 (Slackware), Fedora 3-9, digital interfacing, low level interfacing, computer architecture, and an understanding of ALSA as the replacement for the OSS sound layer, I believe I should start from the kernel source.

Preparation:

1. Get all necessary tools

$ yum install make m4 unifdef rpm-build glibc-2.8-8 glibc-devel-2.8-8 kernel-headers-2.6.25.14-108 glibc-headers-2.8-8 gcc-4.3.0-8 glibc-common-2.8-8 ncurses-devel

The version numbers are optional. Without them, yum will automatically download the latest version

I want to use the GTK graphical menu to configure kernel options, so I install additional packages:
$ yum install gtk2-devel libglade2-devel

It will:
Auto install: atk-devel, autoconf, automake, cairo-devel, docbook-style-dsssl, docbook-style-xsl, docbook-utils, glib2-devel, gtk-doc, imake, libXcomposite-devel, libXfixes-devel, libXi-devel, pango-devel, perl-SGMLSpm, pixman-devel
Auto update: glib2, gtk2, pango

2. Download source code

$ yumdownloader --source kernel
This will download kernel-2.6.25.14-108.fc9.src.rpm, or whichever kernel my Fedora 9 is currently using. The file is downloaded into the current directory

3. Create parent directories

$ mkdir -p /usr/src/redhat

4. Install kernel source code into /usr/src

$ rpm -i kernel-2.6.25.14-108.fc9.src.rpm

5. Build the platform-specific files. For my notebook, the platform is i686, as shown by the command uname -m. It will uncompress the files into the /usr/src/redhat/BUILD directory

$ rpmbuild -bp --target=$(uname -m) /usr/src/redhat/SPECS/kernel.spec

6. Configure using graphical interface
cd /usr/src/redhat/BUILD/kernel-2.6.25/linux-2.6.25.i686
make gconfig
(For a text-based interface, use "make menuconfig" instead. It needs the ncurses-devel package, installed with yum install ncurses-devel)

7. In the configuration, I choose to embed the following 2 device drivers into the kernel
7.1. Cisco Aironet 340 wireless network card
7.2. NeoMagic MagicMedia 256AV sound card

8. Compile vmlinuz binary file
make bzImage

9. It will create the file /usr/src/redhat/BUILD/kernel-2.6.25/linux-2.6.25.i686/arch/x86/boot/bzImage

10. Copy the file to /boot and name it /boot/vmlinuz080830. I will use the grub boot menu to load this kernel for testing instead of replacing the old vmlinuz file

11. Continue by compiling the loadable modules. Because my PC is an Intel Core 2 Duo quad core, I compile these in parallel with the bzImage compilation. This takes longer than the bzImage compilation because it needs to compile many device drivers

make modules

12. Install modules into /lib/modules using the following command

make modules_install

13. To test this kernel, I reboot my PC. On the grub menu, I press the space bar to stop the boot loader. Then I edit the boot command and replace vmlinuz-2.6.25.14 with vmlinuz080830. Press "b" to boot using this option. If I like the kernel, I will replace the old one in the /boot directory; otherwise I will compile again with different options
=======================
Update ALSA Files

yum install alsa-oss alsa-tools alsa-lib alsa-firmware alsa-mixergui alsa-plugins alsa-utils

Testing: /usr/bin/speaker-test, /usr/bin/alsa-info

2008-05-07

Linksys WRT55ag v2 Hacking #1

Bought this wireless router used for $50. Found that it is highly unstable when using P2P and Skype for voice chat.

With P2P such as eMule, it hangs about every 6 hours on average. With Skype voice chat, it hangs every 30 min about 20% of the time.

Found that DD-WRT and OpenWRT are both developing replacement firmware, while Linksys does not provide any later version either.

Here is some information I gathered to hack this router and replace its firmware with OpenWRT:
  1. Steps to compile Linux kernel
  2. Serial cable configuration, which use JP1 socket inside WRT55ag board
  3. Processor is 200 MHz Atheros AR50001AP
  4. Firmware bootloader is VxWorks for ar531x rev 0x00005742
  5. Network interface name is /dev/ae0
  6. Network loopback name is /dev/lo0
  7. EEPROM size 4 MB, RAM 16 MB
  8. Serial cable is the only approach to control the device
  9. Serial protocol is 9600, N, 8-bit, 1 stop, no hardware handshake
  10. Only found 1 homepage by someone who installed it successfully: http://legacy.not404.com/OpenWRT/Atheros/Linksys/WRT55AGv2
I will need to custom-make a serial cable to connect to socket JP1 on the WRT55ag board. I also need to get a small screwdriver to open the casing.