HAProxy community

[HAProxy 2.2.2] Problem with L7 fetch methods `base` and `url`

I am setting up a reverse proxy that handles requests for target.com.mirror.abc.xyz:8888 and acts as a man in the middle between the client and target.com:8888. It also preserves the original scheme of each request: HAProxy serves https://target.com.mirror.abc.xyz:8888 from the backend https://target.com:8888, and http://target.com.mirror.abc.xyz:8888 from the backend http://target.com:8888.

To do so, I plan to detect and log the original scheme and pass it to NGINX in a custom request header, tell-ngx-ori-scheme, so that NGINX can build the upstream address with proxy_pass $http_tell_ngx_ori_scheme$real_host$request_uri; (NGINX exposes a request header as $http_<name>, lower-cased with dashes replaced by underscores).

I detect the original scheme with this rule:

    http-request set-var(txn.l7_fetch_proto) url,regsub(\"(^[^\/:]*:\/\/)\",\"\1\",i)
As far as I can tell it should work; here is the pattern in a regex tester: https://regex101.com/r/dN3UYR/1
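To see what that rule does with different request URIs, here is a rough Python emulation of the regsub converter. It assumes regsub behaves like sed's s/<regex>/<subst>/ (first match replaced, i = ignore case, g = replace all), which is how the HAProxy documentation describes it. The helper name regsub, the constant names and the sample inputs are hypothetical, and Python's re module only stands in for the regex library HAProxy is built with. SCHEME_ONLY_RE is a hypothetical variant that matches the whole URL so that only the captured group is kept.

```python
import re


def regsub(value: str, regex: str, subst: str, flags: str = "") -> str:
    """Hypothetical sed-style emulation of HAProxy's regsub converter."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1  # 0 means "replace all" in re.sub
    return re.sub(regex, subst, value, count=count, flags=re_flags)


# Pattern from the rule above, without the config-level \" quoting
SCHEME_RE = r"(^[^\/:]*:\/\/)"
# Hypothetical variant: match the whole URL so only the captured scheme survives
SCHEME_ONLY_RE = r"^([^\/:]*:\/\/).*$"

# Absolute-form URI (like the expected l7_fetch_url) vs origin-form URI
# (curl's request line "HEAD / HTTP/1.1")
for url in ("https://dl.google.com.mirror.abc.xyz/", "/"):
    print(repr(url),
          "| regsub as configured ->", repr(regsub(url, SCHEME_RE, r"\1", "i")),
          "| scheme-only variant ->", repr(regsub(url, SCHEME_ONLY_RE, r"\1", "i")))
```

Under these assumptions, replacing the match with \1 writes the matched scheme back in place, so the rest of the URL is kept unchanged; and an origin-form URI such as "/" contains no scheme for either pattern to match, so it passes through as "/".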

However, when I test this configuration with curl, the logged custom variable l7_fetch_proto does not contain the scheme at all.

Digging deeper, it appears that only requests sent by curl produce a url fetch with no scheme in it, while the log entries for browser requests all show proper url and base values.

Could someone explain how this can happen? Here is my full configuration:

global
    stats socket                    /var/run/haproxy.sock mode 0640 expose-fd listeners level admin
    stats timeout                   2m
    log stdout format rfc5424       local0 info

defaults
    mode                            http
    option                          http-use-htx
    log                             global

    timeout client                  30s
    timeout client-fin              5s
    timeout server                  30s
    timeout server-fin              5s
    timeout queue                   30s
    timeout connect                 5s
    timeout http-request            5s
    timeout http-keep-alive         2s
    timeout tunnel                  2m

resolvers mydns
    nameserver                      quad91
    nameserver                      quad92

frontend fe_main
    bind                            :80
    bind                            :443 ssl crt-list /etc/haproxy/crt-list.txt
    option                          logasap
    log-format                      "%{+Q}o %{-Q}ci - - [%trg] %r %ST %B \"\" \"\" %cp %ms %ft %b %s %TR %Tw %Tc %Tr %Ta %tsc %ac %fc %bc %sc %rc %sq %bq %CC %CS %hrl %hsl \"striped_dom:\" %[var(txn.striped_dom)] \"ip_striped_dom:\" %[var(txn.ip_striped_dom)] \"l7_fetch_base:\" %[var(txn.l7_fetch_base)] \"l7_fetch_url:\" %[var(txn.l7_fetch_url)] \"l7_fetch_proto:\" %[var(txn.l7_fetch_proto)]"

    http-request set-var(txn.striped_dom) req.hdr(Host),regsub(\"(^.+)\.mirror\.abc\.xyz(:\d+)?\",\"\1\2\",i)
    http-request do-resolve(txn.ip_striped_dom,mydns) var(txn.striped_dom)

    http-request set-var(txn.l7_fetch_proto) url,regsub(\"(^[^\/:]*:\/\/)\",\"\1\",i)
    http-request set-var(txn.l7_fetch_url) url
    http-request set-var(txn.l7_fetch_base) base

    # redirect scheme https code 301  if !{ ssl_fc }
    # http-response set-header        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"

    default_backend                 rp-mirror-backend

backend rp-mirror-backend
    http-request deny               if { var(txn.ip_striped_dom) -m ip ::1/128 fc00::/7 }
    http-request set-header         Host %[var(txn.striped_dom)]
    http-request set-header         tell-ngx-ori-scheme %[var(txn.l7_fetch_proto)]

    server nginx           send-proxy-v2-ssl-cn proxy-v2-options crc32c
    option                          forwardfor except

    ## http-request set-dst            var(txn.ip_striped_dom)
    ## http-request set-dst-port       int(80)
    ## server rp-mirror      
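The two fetches in question can be compared with a similar sketch. Per the HAProxy documentation, base is the first Host header concatenated with the path part of the request (from the first slash up to, but not including, the question mark), while url is the request URL as presented in the request. The helper base_fetch below is hypothetical and uses urllib.parse.urlsplit only to approximate HAProxy's own path extraction.

```python
from urllib.parse import urlsplit


def base_fetch(host: str, uri: str) -> str:
    """Approximate HAProxy's `base`: Host header + path part of the URI (hypothetical helper)."""
    # urlsplit handles both origin-form ("/path?q") and absolute-form
    # ("https://host/path?q") URIs; .path already excludes the query string.
    path = urlsplit(uri).path
    return host + path


host = "dl.google.com.mirror.abc.xyz"
# Absolute-form URI vs origin-form URI (curl's "HEAD / HTTP/1.1")
for uri in ("https://dl.google.com.mirror.abc.xyz/", "/"):
    print("url =", repr(uri), "| base =", repr(base_fetch(host, uri)))
```

With this approximation, base comes out as "dl.google.com.mirror.abc.xyz/" whether the request line carried an absolute-form URI or curl's origin-form "/", matching the l7_fetch_base value in the curl log, whereas url keeps whatever form the request used.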

For example, when a client requests http://dl.google.com.mirror.abc.xyz, the custom variables should be:
striped_dom: dl.google.com
ip_striped_dom: 2a00:1450:4009:808::200e
l7_fetch_base: dl.google.com.mirror.abc.xyz/
l7_fetch_url: https://dl.google.com.mirror.abc.xyz/
l7_fetch_proto: http://
tell-ngx-ori-scheme (request header): same value as l7_fetch_proto, i.e. the http or https scheme
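The striped_dom value above comes from the Host rewrite in fe_main. Reusing the same hypothetical sed-style emulation of regsub (first match replaced, i = ignore case), the sketch below runs that pattern against the example host and against the :8888 form from the opening paragraph. The helper and constant names are illustrative only.

```python
import re


def regsub(value: str, regex: str, subst: str, flags: str = "") -> str:
    """Hypothetical sed-style emulation of HAProxy's regsub converter."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1  # 0 means "replace all" in re.sub
    return re.sub(regex, subst, value, count=count, flags=re_flags)


# Pattern and substitution from the set-var(txn.striped_dom) rule in fe_main
HOST_RE = r"(^.+)\.mirror\.abc\.xyz(:\d+)?"

for host in ("dl.google.com.mirror.abc.xyz", "target.com.mirror.abc.xyz:8888"):
    # When the optional (:\d+)? group does not match, Python 3.5+ substitutes "" for \2
    print(host, "->", regsub(host, HOST_RE, r"\1\2", "i"))
```

Under these assumptions the two hosts map to dl.google.com and target.com:8888, in line with the striped_dom example and the target.com:8888 backend described at the top.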

Here is the log from a real request:
$ /usr/bin/curl -I http://dl.google.com.mirror.abc.xyz

00000000:fe_main.accept(0005)=0009 from [] ALPN=<none>
00000000:fe_main.clireq[0009:ffffffff]: HEAD / HTTP/1.1
00000000:fe_main.clihdr[0009:ffffffff]: host: dl.google.com.mirror.abc.xyz
00000000:fe_main.clihdr[0009:ffffffff]: user-agent: curl/7.68.0
00000000:fe_main.clihdr[0009:ffffffff]: accept: */*
<134>1 2020-08-07T14:18:12.593028+08:00 vps_unknown haproxy 9056 - - - - [07/Aug/2020:06:18:09 +0000] "HEAD / HTTP/1.1" 503 +222 "" "" 60232 579 "fe_main" "rp-mirror-backend" "nginx" 8 0 -1 -1 +3013 SC-- 1 1 0 0 3 0 0 "" "" "striped_dom:" "dl.google.com" "ip_striped_dom:" "2a00:1450:4009:808::200e" "l7_fetch_base:" "dl.google.com.mirror.abc.xyz/" "l7_fetch_url:" "/" "l7_fetch_proto:" "/"