After struggling for over 20 hours, I wanted to share the results of my investigation into very poor Internet upload performance.

Setup

  • Proxmox Server with a Single 10GbE NIC
  • OPNsense VM on Proxmox
  • OPNsense uses VirtIO NICs tied to the 10GbE Linux Bridge
  • Upstream gateway is an OpenWRT router with a 1GbE uplink
  • Zyxel XS1930 Switch connecting Proxmox Host and Gateway

Problem

Internet download speeds are fine (900Mbit/s), but upload speeds are not (5-15Mbit/s instead of 50Mbit/s).

Solution

Various OPNsense tunables (configured for 8 CPU cores)

  • hw.ibrs_disable = 1
  • net.isr.maxthreads = -1
  • net.isr.bindthreads = 1
  • net.isr.dispatch = deferred
  • net.inet.rss.enabled = 1
  • net.inet.rss.bits = 6
  • kern.ipc.maxsockbuf = 16777216
  • net.inet.tcp.recvbuf_max = 4194304
  • net.inet.tcp.recvspace = 262144
  • net.inet.tcp.sendbuf_inc = 16384
  • net.inet.tcp.sendbuf_max = 4194304
  • net.inet.tcp.sendspace = 262144
  • net.inet.tcp.soreceive_stream = 1
  • net.pf.source_nodes_hashsize = 1048576
  • net.inet.tcp.mssdflt = 1240
  • net.inet.tcp.abc_l_var = 52
  • net.inet.tcp.minmss = 536
  • kern.random.fortuna.minpoolsize = 128
  • net.isr.defaultqlimit = 2048
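
A quick way to confirm that the tunables above actually took effect is to compare each expected value with what `sysctl -n` reports on the OPNsense shell. This is only a sketch: `check_tunable` is a hypothetical helper, not part of OPNsense. As far as I know, some of these (net.isr.maxthreads, net.isr.bindthreads, net.inet.rss.*) are boot-time tunables, so they should only show the new value after a reboot.

```shell
# Hypothetical helper: compare an expected tunable value with the reported one.
check_tunable() {
    # $1 = tunable name, $2 = expected value, $3 = actual value
    if [ "$2" = "$3" ]; then
        echo "OK       $1 = $3"
    else
        echo "MISMATCH $1: expected $2, got $3"
    fi
}

# On the OPNsense shell the actual value would come from sysctl, e.g.:
#   check_tunable net.isr.dispatch deferred "$(sysctl -n net.isr.dispatch)"
#   check_tunable net.inet.rss.bits 6 "$(sysctl -n net.inet.rss.bits)"
```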

Enabling Multiqueue in Proxmox for the VirtIO NICs

(binary stepping: 1 queue for 2 cores, 2 queues for 4 cores, 3 queues for 8 cores, etc.; the total number of queues across all NICs mustn't be greater than the VM's CPU core count)
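
For anyone who prefers the CLI over the Proxmox GUI, the stepping described above can be sketched as follows. The VM ID, MAC address and bridge name are placeholders, and both helper functions are hypothetical. The script only prints the `qm set` command instead of running it, so the values can be checked first. Reuse the NIC's existing MAC address, otherwise Proxmox generates a new one.

```shell
# Queues per NIC following the stepping from the post:
# 1 queue for 2 cores, 2 for 4 cores, 3 for 8 cores (never less than 1).
queues_for_cores() {
    cores=$1
    queues=0
    while [ "$cores" -gt 1 ]; do
        cores=$((cores / 2))
        queues=$((queues + 1))
    done
    if [ "$queues" -lt 1 ]; then
        queues=1
    fi
    echo "$queues"
}

# Print (do not run) the qm command that sets the queue count on one VirtIO NIC.
# $1 = VM ID, $2 = net index, $3 = existing MAC, $4 = bridge, $5 = queues
multiqueue_cmd() {
    printf 'qm set %s --net%s virtio=%s,bridge=%s,queues=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values: VM 100, net0, example MAC, bridge vmbr0, 8 vCPUs.
multiqueue_cmd 100 0 BC:24:11:00:00:01 vmbr0 "$(queues_for_cores 8)"
```

Remember to sum the queues over all NICs of the VM and keep the total at or below the number of vCPUs.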

Enabling Flow Control on all involved Network devices

  • Proxmox hardware NIC: ethtool -A nic0 rx on tx on
  • OpenWRT lan interface:

    uci set network.lan.txpause='1'
    uci set network.lan.rxpause='1'
    uci commit
    reload_config

  • Zyxel Switch:

Port -> Port Setup - Checked all Ports
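
To double-check on the Proxmox host that pause frames are really enabled, the output of `ethtool -a` (lowercase, which only reads the pause parameters) can be piped through a small filter. A minimal sketch: `pause_is_on` is a hypothetical helper, and it assumes the usual `RX:` / `TX:` lines in the `ethtool -a` output.

```shell
# Hypothetical helper: succeeds only if the `ethtool -a` output on stdin
# reports both RX and TX pause as "on".
pause_is_on() {
    awk '$1 == "RX:" { rx = $2 }
         $1 == "TX:" { tx = $2 }
         END { exit !(rx == "on" && tx == "on") }'
}

# On the Proxmox host (nic0 is the interface name used above):
#   ethtool -a nic0 | pause_is_on && echo "flow control enabled on nic0"
```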

Enabling Port Buffering

Zyxel Switch:

Port -> Port Buffer - Checked the Port with the Gateway

Reason

The Main reason for this problem seems to be the down-stepping of 10Gbit traffic to 1Gbit devices. Without Flow control enabled on all involved devices, the sending rate can’t be adjusted. But without enabling Port Buffering, the Switch won’t allocate resources for adjusting the traffic flow rate for slower devices.

This Problem should only affect people who use devices with different link speeds on the same switch.
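
To get a feeling for the speed mismatch, here is a back-of-the-envelope sketch of how fast a switch buffer fills when a burst arrives at 10Gbit/s and can only leave at 1Gbit/s. The 1 MiB buffer size is purely an assumption for illustration (not the real figure for the XS1930), `buffer_fill_ms` is a hypothetical helper, and the calculation ignores that the sender would eventually back off.

```shell
# Milliseconds until a buffer of $1 bytes is full when traffic arrives at
# $2 bit/s and drains at $3 bit/s (assumes the arrival rate is higher).
buffer_fill_ms() {
    awk -v bytes="$1" -v in_bps="$2" -v out_bps="$3" \
        'BEGIN { printf "%.3f\n", bytes * 8 / (in_bps - out_bps) * 1000 }'
}

# Assumed 1 MiB buffer, 10Gbit/s in, 1Gbit/s out:
buffer_fill_ms 1048576 10000000000 1000000000   # prints 0.932
```

So with these assumed numbers a sustained burst overruns the buffer in under a millisecond, after which the switch has to either drop frames or send pause frames.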

  • litchralee@sh.itjust.works
    17 hours ago

    Given that your original problem was related to WAN upload performance, why did your investigation lead you to Ethernet flow-control? An ISP connection generally deals in packets at Layer 3 (“network”, eg IP) of the OSI model, whereas Ethernet is a Layer 2 (“data-link”) layer technology.

    If there is a bottleneck at your WAN modem, then that will cause congestion at layer 3, but Ethernet flow-control can only deal with congestion that exists at layer 2. What has likely happened is that you have configured your gateway so that congestion at layer 3 is mirrored onto your layer 2 LAN. And if flow-control is enabled, then that would result in back-pressure propagating back to your VMs. Your VMs will then slow down their layer 2 rate, which conveniently forces the layer 3 traffic to also slow down.

    This is an incredibly round-about and inefficient way to do traffic shaping. You should not configure a network so that L3 and L2 issues bleed into each other. A major consequence of using flow-control in this way is that it reduces the capacity of your LAN, even for traffic that isn’t going out to the WAN.

    The customary approach for keeping L2 and L3 separate is to perform traffic shaping solely at the threshold where your LAN meets the bottleneck. This would be OpenWRT, since after OpenWRT would be the WAN (50 Mbps upload). OpenWRT would be configured with some sort of QoS feature so that certain L3 packets are selectively dropped.

    You cannot do effective L3 traffic shaping without dropping packets. In fact, all competent L3 protocols expect dropped packets in order to slow down their data rate: SCTP and TCP have their own exponential congestion control mechanism, UDP simply accepts that some packets won’t make it through, and QUIC has its own mechanism as well. Simply put, all L3 protocols only understand one signal that tells them to slow down, and it is to drop a few packets. They will adjust accordingly, finding the stable equilibrium where traffic flows at the very cusp of congestion.

    The Main reason for this problem seems to be the down-stepping of 10Gbit traffic to 1Gbit devices

    This is a red herring, for the reasons I’ve outlined above. With 1+ Gbps connections on your LAN, your L2 network is an order of magnitude faster than your WAN upload. It cannot be the case that a fast LAN makes a slow WAN slower. This is not RF impedance where step-transitions cause reflections; we are dealing in packet-switched networks, where queuing theory controls.

    TL;DR: please try OpenWRT QoS instead

    • Smash@lemmy.self-hosted.siteOP
      2 hours ago

      iperf3 between OPNsense and OpenWRT reached 900Mbit/s, so not all kinds of traffic were bottlenecked. Throughput only plummeted to 15Mbit/s when there were a lot of small packets to be transmitted. When I ran a speedtest on OPNsense itself, the CPU would hit 100% and I would see 50000+ interrupts. Because OPNsense and consequently Proxmox were both connected to a 10Gbit/s switch port, OPNsense just flooded OpenWRT with as much traffic as possible, and without buffering on the switch OpenWRT was just drowning in packets. Instead of discarding all packets OpenWRT can’t handle, I think it’s a way more elegant solution to use Flow Control to throttle the transmission to what the router is capable of.

      I have a hard time believing that this would affect any other traffic (especially on my LAN), because the 10GbE NIC has 8 queues which should handle different flows, and TX/RX pause packets should only throttle the affected flows. As a matter of fact, I just tested and I can still consistently hit 770Mb/s transfers from my client to my NAS while running a speedtest on the OPNsense (TrueNAS is running on the same Proxmox host as OPNsense). And when disabling flow control altogether, I only hit about 750Mb/s transfer speed… So I will stick with my current configuration, as it results in reliable, 0 packet loss transmissions with maximum speeds.

    • Victor@lemmy.world
      16 hours ago

      This is mostly going over my head, but I read all of it and it is very well written. Objective and factual (it seems to me). No belittling, just genuinely trying to deliver helpful criticism and explanation.

      🤝🙇‍♂️

      Good on you for being a good fellow human.

  • non_burglar@lemmy.world
    16 hours ago

    Wow, you diagnosed buffer bloat and applied the fix to your LAN side? Sooo much work…

    The problem is unlikely to have been on the proxmox side. Multiqueue only allows virtio to multithread TCP connections via the host CPU using more than one virtual cpu, but this is essentially like aggregating a network link; it will increase bandwidth, but not throughput. Besides, the actual limit for the proxmox internal bridge and virtio NICs is “whatever the cpu can manage”, which is sometimes over 10Gb. It’s unlikely to be slowing down traffic coming from your vms.

    • Smash@lemmy.self-hosted.siteOP
      15 hours ago

      Yeah, two whole days of work 😅 The alternative would have been to install a dedicated 1GbE NIC in the servers again. The tunables and proxmox settings probably don’t do anything now but maybe in the future when I finally get FTTH. I read a lot about OPNsense performance optimization and these tweaks shouldn’t hurt anything, so might as well apply them for good measure.

      • non_burglar@lemmy.world
        11 hours ago

        Well, seems like it was time well spent in any case.

        If you have classic upstream buffer bloat, there are a couple of traffic shaping algorithms (cake and fq_codel) that work really well with the majority of competent routers, including opnsense/pfsense.

        Traffic shaping is definitely a can of worms, but fun to learn.

        • Smash@lemmy.self-hosted.siteOP
          2 hours ago

          Yeah… well spent… not that I could just have, you know, used the ISP router and spent my time on other things ^^

          I would rather not introduce any more complexity into the routing than needed. I just hope my current setup allows the networking gear to automatically adjust itself for best performance (which it now does).

  • Yggstyle@lemmy.world
    19 hours ago

    I find that outside of chasing diminishing returns - the biggest boost to upload speeds generally involves getting service that isn’t from spectrum or comcast.

    The speeds you are suggesting seem to imply you are on some cable carrier. Be aware that there is frequently fine print stating “up to” on your meager upload allotment. You may be optimising against unseen forces.

    It looks like you put in some effort on tuning which is awesome - and hopefully is helpful to some as a starting point… but from experience tuning is often very unique to each individual setup. That said: kudos on drawing that much out of your setup.

    • Smash@lemmy.self-hosted.siteOP
      19 hours ago

      Yep, still on cable… vodafone to be exact. I should get FTTH this year, but also then only 1000 down/500 up for almost double the price (and it’s Telekom, so cloudflare services are unusable).

      But the Problem I was facing was unique to the 10GbE Adapter. I used a 1GbE adapter before for years, without this issue.

      • Yggstyle@lemmy.world
        18 hours ago

        But the Problem I was facing was unique to the 10GbE Adapter. I used a 1GbE adapter before for years, without this issue.

        Every home labber will eventually have that coming of age trial: ritual combat with the monster they have constructed. Glad to see you got the best of yours haha.

        • Smash@lemmy.self-hosted.siteOP
          17 hours ago

          Yep, I moved away from having multiple 1GbE NICs to a single 10GbE NIC and creating the network virtually using VLANs and SDNs. This freed up some PCI-E slots and cables, but also spawned a new boss

    • frongt@lemmy.zip
      18 hours ago

      True, but if it’s only the proxmox host with the problem, it’s probably not the cable carrier. The traffic from your different devices looks mostly the same to them.

      • Smash@lemmy.self-hosted.siteOP
        2 hours ago

        And I ran a speedtest on the OpenWRT router itself to compare against. There, the upload speed also varies greatly between 35-55MBit/s, but that’s just classic cable fuckery (and still more than twice as much as OPNsense achieved in the beginning)