This is driving me insane. Matching an ACL on the path should be really simple, but for some reason HAProxy is not matching my path ACL.
Local testing input: curl -k https://localhost/robots.txt --header 'Host: www.foo.com' -A "Robot"
Current result: the request is denied:
403 Forbidden
Request forbidden by administrative rules.
Expected result: the robots.txt file is served, not the 403.
Here is my config with the PII removed:
global
    log 127.0.0.1 len 10000 local1
    maxconn 100000
    user haproxy
    group haproxy
    daemon
    #quiet
    pidfile /var/run/haproxy.pid
    #tune.maxrewrite 16384
    #tune.bufsize 65536
    #nbproc 1
    #nbthread 4
    stats socket /var/run/monitor-haproxy/haproxy.sock mode 0600 level admin
    master-worker

defaults
    log global
    mode http
    option httpchk GET /health/
    option httplog
    option dontlognull
    option dontlog-normal
    option log-separate-errors
    option http-ignore-probes
    retries 3
    option redispatch
    maxconn 30000
    timeout connect 5s
    timeout client 60s
    timeout server 60s
    timeout http-keep-alive 60s
    timeout http-request 5s
    timeout client-fin 6s
    option http-keep-alive
    option tcp-smart-accept
    option tcp-smart-connect
    option http-buffer-request
    option allbackups
    stats enable
    stats show-legends
    stats uri /lbstats
    stats refresh 10
    balance roundrobin
    hash-type consistent djb2 avalanche
    no option log-health-checks

resolvers localdns
    nameserver localdns 127.0.0.1:53

frontend stats
    bind *:7684
    stats enable
    stats refresh 10s
    stats realm Monitor
    stats uri /hc

frontend http-in
    bind *:80
    default_backend sitecode

frontend https-in
    bind *:443 ssl crt /etc/ssl/private/self_signed.pem
    monitor-uri /health.gne
    # If you are looking for robots.txt go to sitecode
    acl check-robots-txt path_reg -i -f /etc/haproxy/robots.list
    acl check-robots-txt path -m sub robots.txt
    acl is-a-bot req.fhdr(user-agent) -m sub -i bot
    http-request deny if is-a-bot
    use_backend sitecode if check-robots-txt
    ...

backend sitecode
    option httpchk
    http-check send meth GET uri /favicon.ico hdr "Host" "192.168.0.0"
    server-template live-www-lb 1-12 foobar.us-east-2.elb.amazonaws.com:443 resolvers localdns resolve-prefer ipv4 init-addr last,libc,none check ssl verify none no-sslv3 inter 10s fall 4
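For reference, robots.list is loaded with path_reg -i -f, so it holds one regex per line, matched case-insensitively against the path. The real patterns were scrubbed along with the rest of the PII, so this version is only illustrative:

# /etc/haproxy/robots.list - one regex per line, matched case-insensitively against the path
^/robots\.txt$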
We have an ACL that looks for "bot" in the User-Agent string, and unless the agent is on a list of "good" bots we want crawling the site, we block it from going further. So in my example the User-Agent "Robot" should match the is-a-bot ACL, but a request for robots.txt should still be let through to sitecode.
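To narrow things down, here are two more tests that should isolate each ACL. These are expected outcomes only, assuming nothing in the elided part of the frontend interferes, and /index.html is just a stand-in path:

# robots.txt without a bot User-Agent - expect sitecode to serve the file
curl -k https://localhost/robots.txt --header 'Host: www.foo.com'
# a non-robots path with a bot User-Agent - expect the 403 deny
curl -k https://localhost/index.html --header 'Host: www.foo.com' -A "Robot"

I'm also considering adding captures to the https-in frontend so the httplog line shows exactly which path and User-Agent each request carried (a sketch, not in the config above):

http-request capture path len 64
http-request capture req.fhdr(user-agent) len 128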
Any help would be appreciated. Thanks