DEI Bit Related Speed Issues

Summary

In June 2021 Fortinet NZ approached us to enquire about an issue they had been investigating. A number of throughput issues had been raised by some of their customers deployed on various ISPs within New Zealand. The symptom was that the end user would only get a download rate of around 2Mbps. For the Fortinet NZ customers this was a multifaceted issue combining FortiOS 6.2.4 or higher, a Broadcom Wolfhound-based switch and a Chorus UFB service. Each of these factors is explored in detail below. The root of the issue was a single-bit field inside the VLAN header known as DEI.

What is DEI?
Drop Eligibility Indicator (DEI) is a 1-bit field within the VLAN header. This field was previously known as the Canonical Format Indicator (CFI). CFI was implemented in the original 802.1Q (VLAN) standard and was used for compatibility between Ethernet and Token Ring networks. CFI=0 indicated that the contained frame was Ethernet, while CFI=1 meant that the frame should not be bridged to an access port, in essence stating that it is a Token Ring frame. The field was redefined as DEI in the 2011 revision of the 802.1Q standard. A DEI=0 frame is ineligible to be dropped and a DEI=1 frame is eligible. In practice this field is only considered when congestion mechanisms need to discard frames due to interface capacity limitations. When these mechanisms are engaged, DEI=1 frames are dropped before DEI=0 frames.
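
To make the position of the bit concrete, here is a minimal Python sketch that splits the 16-bit Tag Control Information (TCI) value of an 802.1Q tag into its three fields. The names VlanTag and parse_tci are our own, chosen for illustration; the bit layout (3-bit PCP, 1-bit DEI, 12-bit VLAN ID) follows the 802.1Q tag format.

```python
from typing import NamedTuple


class VlanTag(NamedTuple):
    pcp: int  # Priority Code Point, 3 bits
    dei: int  # Drop Eligibility Indicator (formerly CFI), 1 bit
    vid: int  # VLAN identifier, 12 bits


def parse_tci(tci: int) -> VlanTag:
    """Split a 16-bit 802.1Q TCI value into PCP, DEI and VLAN ID."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must fit in 16 bits")
    return VlanTag(pcp=tci >> 13, dei=(tci >> 12) & 0x1, vid=tci & 0x0FFF)


# Example: VLAN 10, priority 0, DEI set (TCI = 0x100A)
print(parse_tci(0x100A))
```

A frame on VLAN 10 with DEI=0 would carry TCI 0x000A; the only difference in the DEI=1 case is bit 12.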

Issue Components

Broadcom Wolfhound BCM5334x Chipset

This is the core component causing the speed problem. The issue occurs when the CFI/DEI bit in the 802.1Q header is set to 1. Behaviour differs between switches, sometimes even within the same vendor and family of switches. It seems that this depends on the chipset used: Broadcom Wolfhound BCM5334x chipsets have been observed dropping the frame while Marvell and Realtek chipsets do not. According to the IEEE 802.1Q-2005 standard it is perfectly acceptable for switches to drop these frames when they egress an access/untagged port, as CFI/DEI=1 frames should not be bridged to one of these ports. In our testing there doesn’t appear to be an issue when the traffic egresses a trunk/tagged port.

Switches known to pass DEI=1 frames:
  • FortiSwitch 108D-POE
  • FortiSwitch 108E-POE
  • TP Link TL-SG105E
  • TP Link SG-108PE

Switches known to discard DEI=1 frames:
  • FortiSwitch 124D-POE
  • FortiSwitch 224E-POE
  • Ubiquiti/Unifi us-8-150w
  • Extreme x450G2-24p-10G4
  • Cisco WS-C2960X-24PS-L (Cisco documentation isn’t conclusive but we believe this uses a Broadcom chipset)
  • Cisco WS-C3560G-24TS (Cisco documentation isn’t conclusive but we believe this uses a Broadcom chipset)
 

Voyager have replicated this issue to a 100% drop rate with a Unifi us-8-150w switch. We set all traffic to DEI=1 and all frames were dropped while egressing a port configured as access. Additionally, we identified a (hacky) workaround to this issue by way of configuring the port to be a trunk and using the native vlan configuration to tag/untag the desired VLAN. Ability to do this may vary from vendor to vendor.
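
For anyone wanting to run a similar test, the sketch below builds the raw bytes of an 802.1Q-tagged Ethernet frame with the DEI bit set. It is illustrative only and is not the tooling we used: the function name build_tagged_frame, the MAC addresses and the payload are all hypothetical, and the sketch does not pad the frame to the Ethernet minimum or add an FCS. Transmitting the frame (for example via a raw socket or a packet generator) is left out.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def build_tagged_frame(dst: bytes, src: bytes, vid: int, dei: int,
                       pcp: int, ethertype: int, payload: bytes) -> bytes:
    """Return dst + src + 802.1Q tag + inner EtherType + payload."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if not (0 <= vid <= 0x0FFF and dei in (0, 1) and 0 <= pcp <= 7):
        raise ValueError("vid, dei or pcp out of range")
    tci = (pcp << 13) | (dei << 12) | vid
    tag_and_type = struct.pack("!HHH", TPID_8021Q, tci, ethertype)
    return dst + src + tag_and_type + payload


# Hypothetical test frame: broadcast destination, VLAN 10, PCP 0, DEI=1, IPv4 EtherType
frame = build_tagged_frame(
    dst=b"\xff" * 6,
    src=bytes.fromhex("020000000001"),  # locally administered example address
    vid=10, dei=1, pcp=0,
    ethertype=0x0800,
    payload=b"test",
)
print(frame[12:16].hex())  # TPID followed by TCI
```

Sending the same frame with dei=0 gives a control case, so any difference in delivery on an access port can be attributed to the DEI bit alone.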

I tried to reach out to Broadcom for comment on this issue, but they didn’t respond to my request. As such I’m not sure if this is intended design and the chipset is limited to the 802.1Q-2005 standard, or if this is a hardware/kernel software issue. It is also possible that this issue affects chipsets from other vendors that do not support the 802.1Q-2011 standard; in general, switches that do support that standard should not have this issue. The recommendation here at Voyager is to always test any hardware against your network design and feature requirements before deploying it to a production environment.

 

Fortinet Fortigate Version 6.2.4 or Higher

This is a contributing component and is not the root cause of the issue. In release 6.2.4 a change was made to how DEI marking is processed. This change was intended to fix an issue with transparent mode Fortigate deployments where DEI marking was not copied between the input/output paired interfaces. Unfortunately the implemented change extended beyond the scope of transparent mode and also affected separate layer 3 routed interfaces. DEI and other VLAN related header information should by default not extend beyond the scope of a broadcast domain. Fortinet are currently evaluating changes to this behaviour and it looks probable a fix will be released into the 6.4 and 7.0 major versions.

The reason this contributes to the issue in our known cases is that the Fortigate passes the DEI bit from the WAN to the LAN, which in turn triggers the above Broadcom chipset limitation.

Affected releases:
  • 6.2.4+ (Latest release at time of writing: 6.2.9)
  • 6.4.0+ (Latest release at time of writing: 6.4.6)
  • 7.0.0+ (Latest release at time of writing: 7.0.1)

Check the release notes of versions beyond the ones listed above to see if they include a fix for this. Alternatively reach out to your Fortinet NZ SE/Account Manager; we should also be able to inform you.
 

Chorus UFB Services

This is also a contributing component and is not the root cause of the issue. Chorus implemented a feature known as Egress Colour Marking (ECM). This feature only affects accelerate products (BS2a etc). It enables DEI=1 marking on some frames that transit the Chorus network and is implemented by their equipment. The Retail Service Provider (RSP) does not have control over DEI marking across the Chorus network. The only exception to this is for services where multiple VLANs are permitted to be configured, such as BS4 services.
 
The way this works in practice is as follows:
  • ECM ON: Frames will have their DEI bit set to 0 for throughput up to the CIR of the service and DEI=1 for the remainder of the capacity that is used. For example, on a BS2a 1000/500 Fibre Max service the first 2.5Mbps of throughput will be marked as DEI=0 and the remainder of traffic on the service will be DEI=1.
  • ECM OFF: All frames will have their DEI bit set to 0. As per the Chorus documentation on this feature this is to enable backwards compatibility with devices that only support 802.1Q-2005.

The aspect of this that I find intriguing is that the above is implemented separately from the CIR portion of the service. In the example of a BS2a 1000/500 Fibre Max service the following will apply:
 
      1. 2.5Mbps PCP5 tagged CIR traffic – DEI=0
      2. First 2.5Mbps PCP0 best effort EIR traffic – DEI=0
      3. Remainder of PCP0 traffic – DEI=1
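
The split for best effort (PCP0) traffic in the list above can be sketched as simple arithmetic. This is a simplified model under our own assumptions, not Chorus's implementation: the function name ecm_split is hypothetical, the 2.5Mbps threshold is taken from the BS2a 1000/500 example, and real marking is applied per frame by a rate meter rather than as an average.

```python
def ecm_split(offered_mbps: float, threshold_mbps: float = 2.5,
              ecm_on: bool = True) -> tuple:
    """Return (dei0_mbps, dei1_mbps) for best effort (PCP0) traffic."""
    if offered_mbps < 0 or threshold_mbps < 0:
        raise ValueError("rates must be non-negative")
    if not ecm_on:
        # ECM OFF: every frame is marked DEI=0
        return (offered_mbps, 0.0)
    dei0 = min(offered_mbps, threshold_mbps)
    return (dei0, offered_mbps - dei0)


# 900Mbps of best effort traffic on the example service
dei0, dei1 = ecm_split(900.0)
print(f"DEI=0: {dei0} Mbps, DEI=1: {dei1} Mbps")
```

In this model, a downstream device that discards every DEI=1 frame leaves at most the 2.5Mbps DEI=0 portion, which would be consistent with the roughly 2Mbps download rate reported by affected users.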


At Voyager for the most part we only utilise the EIR portion of UFB services. The only reason I can see for the implementation here by Chorus is that, for services that only use the EIR, they want to ensure some priority is given to traffic up to the equivalent of the CIR value on that service. ECM is also a feature that is either enabled or disabled for the entire RSP. It is enabled by default. Voyager, like many other ISPs, have never modified this default behaviour. Should we request to change this to ECM OFF it would not affect any active UFB services. We would have to make subsequent requests to redeploy all in-production services, something that could take some time and unfortunately also comes with a fee.

With ECM enabled, customers may also see this issue should they implement an affected switch on the WAN side between the ONT and their CPE. If that switch must process DEI=1 frames exiting an access port anywhere in their environment, they will encounter this issue. An example of this is implementing a switch to strip off the VLAN 10 tag from the UFB service because their CPE does not support VLAN configuration. A workaround we can currently assist you with is to change the Chorus service to being untagged. This means there is no requirement for a VLAN on the CPE.

Example Affected Designs

  • PC <-> Access port on affected Broadcom Switch <-> Fortigate Router on 6.2.4+ <-> Chorus UFB with ECM
  • PC <-> Fortigate Router on 6.2.4+ <-> Access port on affected Broadcom Switch <-> Chorus UFB with ECM
  • PC <-> Any CPE <-> Access port on affected Broadcom Switch <-> Chorus UFB with ECM
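
The designs above share one pattern: DEI=1 frames from the Chorus network reach an access port on an affected switch without anything in between clearing the bit. The sketch below models that reasoning. It is a simplification built on our own assumptions: the Hop class, its field names and first_drop_point are hypothetical, and a router other than an affected Fortigate is assumed not to carry DEI between its interfaces.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Hop:
    name: str
    preserves_dei: bool = True          # does DEI survive through this device?
    drops_dei1_on_access: bool = False  # affected switch with an access port


def first_drop_point(ecm_on: bool, hops_wan_to_pc: Sequence[Hop]) -> Optional[str]:
    """Walk from the Chorus side towards the PC; return the hop that drops DEI=1 frames."""
    dei1_present = ecm_on  # with ECM ON, some frames arrive marked DEI=1
    for hop in hops_wan_to_pc:
        if dei1_present and hop.drops_dei1_on_access:
            return hop.name
        if not hop.preserves_dei:
            dei1_present = False
    return None


switch = Hop("affected switch", drops_dei1_on_access=True)
fortigate = Hop("Fortigate 6.2.4+")                   # copies DEI from WAN to LAN
other_cpe = Hop("other CPE", preserves_dei=False)     # assumed to clear DEI

print(first_drop_point(True, [fortigate, switch]))   # first design
print(first_drop_point(True, [switch, other_cpe]))   # third design
print(first_drop_point(True, [other_cpe, switch]))   # switch behind a CPE that clears DEI
```

The same model shows why the workarounds help: removing the DEI=1 source, or putting a device that does not carry DEI ahead of the affected switch, leaves no drop point.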
 

Workarounds

  • Deploy your Fortigate with FortiOS version 6.2.3 or lower.
  • Request the Chorus service be changed to untagged.
  • Attempt to implement a trunk port with native vlan tag as mentioned in the Broadcom section.
 

How can we help?

Voyager are happy to assist our Wholesale and Retail customers with some engineering direction or implement temporary testing Chorus circuits should it assist you in overcoming this issue. Please reach out to your Account Manager to enquire how we can help you.

References

  • IEEE 802.1Q Documentation - https://standards.ieee.org/standard/802_1Q-2018.html
  • IEEE 802.1Q Information - https://en.wikipedia.org/wiki/IEEE_802.1Q
  • Broadcom Wolfhound Documentation - https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm5334x
  • Chorus ECM Documentation - https://sp.chorus.co.nz/download-file/2853 – Page 13