gosko
February 1, 2022, 3:26am
1
Hi, I am using the Runtime API to add/remove/modify entries in map files as described in this blog post, and it works fine for small numbers of commands at a time. But if I send too many commands at once, I usually get an error after a few hundred of them have been processed:
```
2022/02/01 03:15:38 socat[29468] E write(5, 0x55f4b129ee30, 67): Broken pipe
```
This happens when using `echo | socat` from the command line, for example:
```
for n in {1..500}; do echo "add map /etc/haproxy/test.map /test$n -"; done | socat stdio /run/haproxy.stat
```
It also happens with the following Perl code:
```perl
foreach my $kr (sort keys %filemap) {
    push( @changes, "add map $mapfile $kr " . $filemap{$kr} );
}

# open new connection to haproxy socket
my $writesock = new IO::Socket::UNIX (
    Peer => '/run/haproxy.stat',
    Type => SOCK_STREAM,
);
if ( ! $writesock ) {
    syslog('warning', "unable to connect to haproxy socket");
    return;
}
print $writesock join("\n", @changes) . "\n";
close $writesock;
```
If I add a `sleep(3)` before the `close $writesock` in the Perl code, the issue seems to go away, but I wonder if there's something else I should be doing. Should I be limiting the number of commands given at once, or doing something else with the socket before closing it?
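For example, would shutting down the write side and draining HAProxy's responses before closing be the right approach? Something like this (an untested sketch; it assumes HAProxy keeps the connection open and answers each newline-separated command):

```perl
# Untested sketch: drain HAProxy's responses so the socket isn't
# closed while commands are still being processed.
print $writesock join("\n", @changes) . "\n";
shutdown($writesock, 1);              # done writing (SHUT_WR)
while ( my $line = <$writesock> ) {   # read until HAProxy closes
    chomp $line;
    # responses could be checked for errors here
}
close $writesock;
```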
For now: yes, a small sleep would probably be good.
Improvements have just been applied to the development tree and will trickle down to stable branches at some point:
GitHub issue, opened 17 Jan 2022, 10:19 PM UTC. Labels: type: bug, status: fixed, 2.2, 2.3, 2.4, 2.5.
### Detailed Description of the Problem
I built a system to use maps to store information about IPs. The map files are in the following format:
```
# IP timestamp/data_about_ip
10.0.0.1 1642454191/some_data
```
There's a process that runs periodically to delete old entries. I'm using the stats socket over a TCP/IP connection.
In order to be as efficient as possible, I'm sending 50+ `del map` commands at a time delimited by semicolons.
This appears to be similar to #419. `clear map` works fine on a 1M+ entry map, but sending, say, 500 `del map` commands at a time causes HAProxy to crash.
### Expected Behavior
Sending multiple `del map` commands at a time should complete successfully.
### Steps to Reproduce the Behavior
Generate a map with a 1M+ entries:
```python
#!/usr/bin/env python3
for a in range(0, 256):
    for b in range(0, 256):
        for c in range(0, 20):
            print('10.{}.{}.{} 1642283611/some_data'.format(a, b, c))
```
Run `gen_map.py > ipinfo.map`. This generates about 1.3M entries. Load it into haproxy.
Send 500 `del map` commands at the same time:
```python
#!/usr/bin/env python
def gen_commands():
    commands = []
    for a in range(0, 256):
        for b in range(0, 256):
            for c in range(0, 20):
                commands.append('del map /usr/local/etc/haproxy/maps/ipinfo.map 10.{}.{}.{}'.format(a, b, c))
                if len(commands) >= 500:
                    return commands

commands = gen_commands()
print('{}'.format(';'.join(commands)))
```
Run `./gen_commands.py | nc localhost 8101` (whatever the stats socket is).
### Do you have any idea what may have caused this?
_No response_
### Do you have an idea how to solve the issue?
_No response_
### What is your configuration?
```haproxy
global
    log stdout format raw local0
    maxconn 20000
    stats socket :8101
    stats timeout 1h

resolvers internal_dns
    accepted_payload_size 8192
    parse-resolv-conf

defaults
    balance roundrobin
    backlog 10000
    log global
    mode http
    option contstats
    option dontlog-normal
    option dontlognull
    option log-separate-errors
    option httplog
    timeout client 30s
    timeout client-fin 30s
    timeout connect 5s
    timeout http-keep-alive 5s
    timeout http-request 5s
    timeout queue 10s
    timeout server 30s
    timeout tarpit 5s
    timeout tunnel 30s
    errorfile 400 /usr/local/etc/haproxy/errors/400.http
    errorfile 403 /usr/local/etc/haproxy/errors/403.http
    errorfile 408 /usr/local/etc/haproxy/errors/408.http
    errorfile 500 /usr/local/etc/haproxy/errors/500.http
    errorfile 502 /usr/local/etc/haproxy/errors/502.http
    errorfile 503 /usr/local/etc/haproxy/errors/503.http
    errorfile 504 /usr/local/etc/haproxy/errors/504.http

frontend fe-test
    bind :8101
    maxconn 10000
    no option dontlog-normal
    option http-buffer-request
    option forwardfor
    tcp-request inspect-delay 10s
    capture request header Host len 1000
    capture request header X-Forwarded-For len 1000
    capture request header Referer len 1000
    capture request header User-Agent len 1000
    http-request set-var(req.ipinfo_map) str(/usr/local/etc/haproxy/maps/ipinfo.map)
    http-request set-var(req.ipinfo_time) src,map_ip(/usr/local/etc/haproxy/maps/ipinfo.map),field(1,/)
    http-request set-var(req.ipinfo_age) date(),sub(req.ipinfo_time)
    http-request set-var(req.crawler_id) src,map_ip(/usr/local/etc/haproxy/maps/ipinfo.map),field(2,/)
    http-request set-var(req.crawler_id) str(n/a) if !{ var(req.crawler_id) -m found }
    http-request return status 200 content-type text/plain string "OK\n"
```
### Output of `haproxy -vv`
```plain
# Tested with latest 2.5, 2.4, and 2.2. Using the official Docker builds and custom compiled on prod. All show same result.
HAProxy version 2.5.1-86b093a 2022/01/11 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2023.
Known bugs: http://www.haproxy.org/bugs/bugs-2.5.1.html
Running on: Linux 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_PROMEX=1
DEBUG =
Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=8).
Built with OpenSSL version : OpenSSL 1.1.1k 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Support for malloc_trim() is enabled.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : prometheus-exporter
Available filters :
[SPOE] spoe
[CACHE] cache
[FCGI] fcgi-app
[COMP] compression
[TRACE] trace
```
### Last Outputs and Backtraces
```plain
Thread 2 is about to kill the process.
Thread 1 : id=0x7ff56bf11f00 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/1 stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=1702457971 now=1702483769 diff=25798
curr_task=0
*>Thread 2 : id=0x7ff56bf06700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
1/2 stuck=1 prof=0 harmless=0 wantrdv=0
cpu_ns: poll=5100428614 now=8033195655 diff=2932767041
curr_task=0x7ff56402f7c0 (task) calls=230 last=0
fct=0x55e3bd1f5ba0(task_run_applet) ctx=0x7ff56402f6b0(<CLI>)
strm=0x7ff5640270b0,8 src=192.168.160.5 fe=GLOBAL be=GLOBAL dst=<CLI>
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=c48202 rqa=0 rpf=80048202 rpa=0 sif=EST,200008 sib=EST,204018
af=(nil),0 csf=0x7ff564027060,8200
ab=0x7ff56402f6b0,4 csb=(nil),0
cof=0x7ff558026800,1300:PASS(0x7ff564026fe0)/RAW((nil))/tcpv4(29)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
call trace(17):
| 0x7ff56c68e140 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14140
| 0x55e3bd172a43 [48 89 dd 4c 39 eb 74 2f]: pat_ref_delete+0x33/0x85
| 0x55e3bd1c862f [85 c0 0f 84 39 01 00 00]: main+0x182a8f
| 0x55e3bd165316 [85 c0 0f 85 c0 00 00 00]: main+0x11f776
| 0x55e3bd165a7b [48 8b 43 10 48 63 0c 24]: main+0x11fedb
| 0x55e3bd1f5c9e [f6 43 04 10 0f 84 e8 00]: task_run_applet+0xfe/0x5d8
Thread 3 : id=0x7ff56b21d700 act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/3 stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=11038540 now=17561414 diff=6522874
curr_task=0
Thread 4 : id=0x7ff56a9f4700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/4 stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=6773842 now=13310381 diff=6536539
curr_task=0
Thread 5 : id=0x7ff56a1f3700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/5 stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=4447867 now=10422335 diff=5974468
curr_task=0
Thread 6 : id=0x7ff5699f2700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/6 stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=10040810 now=16587818 diff=6547008
curr_task=0
Thread 7 : id=0x7ff5691f1700 act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/7 stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=10943463 now=17364065 diff=6420602
curr_task=0
Thread 8 : id=0x7ff5689c8700 act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/8 stuck=0 prof=0 harmless=1 wantrdv=0
cpu_ns: poll=11443704 now=18044034 diff=6600330
curr_task=0
Aborted
```

Also got a segfault on one of the production boxes earlier, which may be related. This has never happened before:

```plain
kernel: [7739885.813435] haproxy[23103]: segfault at 40 ip 000055a0b2520cd5 sp 00007f35f24d1380 error 4 in haproxy[55a0b2399000+1cc000]
kernel: [7739885.813449] Code: 00 00 00 48 8b 1b 48 8d 50 20 49 89 9f b8 00 00 00 48 39 d3 0f 84 c3 00 00 00 64 49 c7 46 10 00 00 00 00 41 8b b7 d8 00 00 00 <39> 73 40 75 d4 64 48 8b 3c 25 00 00 00 00 4c 8b 43 38 48 89 da 48
```
### Additional Information
_No response_
gosko
February 1, 2022, 4:22pm
3
Great, thanks.
Should I be rate-limiting my commands over the socket? The reporter of that issue said he is sending 50 del map commands per second but I haven’t seen that mentioned in the haproxy docs or blog. I don’t want these map updates to have a negative impact on the server’s performance or stability.
The current HAProxy behavior is a bug, which is why it's not documented.
You can rate-limit the commands for now as a workaround, until you can run the fixed code.
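For example, something along these lines in the Perl script above would throttle the writes (a rough sketch; the batch size and delay are arbitrary illustration values, not tuned recommendations):

```perl
use Time::HiRes qw(usleep);

# Rough sketch: send commands in small batches with a short pause
# between batches instead of writing them all at once.
my $batch_size = 50;                        # arbitrary value
while ( my @batch = splice(@changes, 0, $batch_size) ) {
    print $writesock join("\n", @batch) . "\n";
    usleep(100_000);                        # 100 ms, also arbitrary
}
close $writesock;
```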
gosko
February 1, 2022, 5:44pm
5
If I’m running the fixed code, will it be necessary to rate-limit the commands? These map file updates are not something we’re running in production yet so I would rather wait for the haproxy update than add a rate-limiting feature to my code if it won’t be needed in a month or two.
No, rate-limiting commands will not be necessary long term; if something doesn't work, it's a bug that will be fixed.
If you are running in the lab, you can try the just-released v2.6-dev1. It is a development snapshot, but it contains all the Runtime API fixes. The fixes will definitely be backported, but with the dev snapshot you can try them now.
gosko
February 1, 2022, 9:44pm
7
Excellent, thanks very much for the quick responses.
gosko
February 18, 2022, 5:28pm
8
Hi, based on the changelog, Willy's fixes have now made it into 2.4.13, but this issue still seems to be present there. Is that expected?
```
for n in {1..5000}; do echo "add map /etc/haproxy/test.map /test$n -"; done | socat stdio /run/haproxy.stat
2022/02/18 17:16:48 socat[17569] E write(5, 0x560d34be9e30, 528): Broken pipe
```
This is with ‘HAProxy version 2.4.13-095275f, released 2022/02/16’
Can you file a new bug on GitHub, please?