One of the things I love about my job is the chance to "re-learn" things that used to cause more issues. Since I work in a large number of different environments, this comes up more often than it would if I were in a single environment all the time. Sometimes it's the same situation happening again and again. Sometimes it's an old situation you haven't thought about for a while that rears its head in a new way. The second kind is what happened yesterday. So I figured, why not write it down so maybe I'll remember it if it happens again?
I was on-site with a client yesterday installing a Juniper Mist Wired Assurance Proof of Value (PoV) switching setup and ran into a weird issue. We were installing a two-switch Juniper EX4400 Virtual Chassis (VC) with 25g uplink modules and connecting it to an Arista 7050SX3-48YC8 distribution M-LAG switch pair. The Arista pair was in production, so lots of other connections were already working on it. We had staged the Juniper VC, pushed all the right config, and moved it to the comm room where it would be used for the testing. When we connected the fiber, the links didn't come up. Both sides (Arista and Juniper) reported down/down. No errors, no logs, nothing. Fun!
We swapped the 25g optics for 10g and configured the Juniper side for 10g (since it's a 25g uplink module, you must reconfigure it for 10g operation; see this Juniper KB for details). The links came up with no issues. Even more fun! Since the client wants to use 25g optics longer term, we started down the road of troubleshooting.
- Is it the fiber cables? Nope, both are brand new, and we even swapped one for good measure.
- Is it an SFP wavelength mismatch? Nope, both are 850nm.
- Is it the FS.com optics (usually a good bet)? Nope, swapped both sides for Juniper optics and still had the same issue.
- Is it the Juniper optics (they are from the demo pool after all)? Nope, swapped both sides for FS.com optics and the links still didn't come up.
- Check the optics info (`show interfaces diagnostics optics interface-name`)? All looks good. Both sides see reasonable light.
OK, WTF? This should be stupid simple. I was out of ideas, so I headed to the Googles to find more. I started to see some discussions around Forward Error Correction (FEC) and 25g connections. Some Arista, some Cisco, and some Juniper. Now, I honestly haven't thought about FEC for a while, especially not FEC mismatches, but it makes sense. Since I want to keep this quick, I'll leave you to look up more in-depth conversations around "what is FEC?" (this Cisco article has some good details). The short version is that FEC is a layer 1 (i.e., physical layer) method for dealing with errors in network transmissions. It encodes some extra data alongside the bits and bytes being sent across a network link, and if there are issues, the receiver can use those extra bits to "correct" the problems.
But why does it keep the links from coming up? This is where the fun of multivendor (and, in the case of the Cisco article above, multi-version) implementations begins. There are several different encoding methods for FEC, and when the encoding types on the two ends don't match, the link won't come up (remember, FEC is layer 1).
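To make the mismatch idea concrete, here is a minimal Python sketch. It assumes the simplified rule described above: a link only comes up when both ends use the same FEC encoding (or both have it turned off). The `Fec` enum and `link_can_come_up` function are hypothetical names made up for illustration, not part of any vendor tooling, and the sketch ignores auto-negotiation and per-optic defaults.

```python
from enum import Enum


class Fec(Enum):
    """FEC modes discussed in this post (illustrative, not exhaustive)."""
    NONE = "none"
    FEC74 = "fec74"    # IEEE 802.3 Clause 74 BASE-R (Fire Code) FEC
    FEC108 = "fec108"  # IEEE 802.3 Clause 108 Reed-Solomon FEC for 25G


def link_can_come_up(local: Fec, remote: Fec) -> bool:
    """Simplified model: both ends must run the same FEC mode."""
    return local is remote


# The mismatch from this post: FEC74 on one side, FEC108 on the other.
print(link_can_come_up(Fec.FEC74, Fec.FEC108))  # False
# Both sides set to none.
print(link_can_come_up(Fec.NONE, Fec.NONE))     # True
```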
So, how do we fix this? Since this was a troubleshooting guess, we started by turning FEC off on both sides (Juniper: `set interfaces <interface> gigether-options fec none`; Arista: `no error-correction encoding` under the interface). The switches are connected with a 1m fiber cable; if we see errors across that, we can deal with them. As soon as we applied the config, poof, the links came up! Since we were at the end of our window with the client for this PoV setup, we called it "good for now".
As I dug into it a little more, it looks like Juniper defaults to FEC74 (the IEEE 802.3 Clause 74 BASE-R "Fire Code" FEC), whereas Arista uses FEC108 (the Clause 108 Reed-Solomon FEC). Since FEC108 is "superior," we'll set up a time to reconfigure both switches. We'll set the Arista back to its default and configure the Juniper EX to use FEC108 (`set interfaces interface-name gigether-options fec fec108`). Since it's a Mist PoV, we added this to the "Other CLI Config" section for the specific switch. But when we talk with the client about templates in our next PoV call, we'll show them how to add it to the template configuration, so that a future rollout covers all switches without configuring individual links.
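If you have more than a couple of uplinks, typing the same line per interface gets old. As a rough sketch, the Python below builds the Junos `set` lines from this post for a list of interfaces so they can be pasted into the CLI config section. The function name `fec_set_commands` and the interface names in the usage example are hypothetical; substitute the uplink names from your own Virtual Chassis.

```python
# FEC keywords used in this post for the Junos gigether-options fec statement.
ALLOWED_MODES = {"none", "fec74", "fec108"}


def fec_set_commands(interfaces, mode="fec108"):
    """Return one Junos 'set' line per interface for the given FEC mode."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported FEC mode: {mode!r}")
    return [
        f"set interfaces {name} gigether-options fec {mode}"
        for name in interfaces
    ]


# Hypothetical uplink names for a two-member VC; adjust to your hardware.
for line in fec_set_commands(["et-0/2/0", "et-1/2/0"]):
    print(line)
```

Passing `mode="none"` produces the "turn it off" lines we used as the first troubleshooting step.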
So there you go. I hope this is helpful if you find your way here. If not, I hope I find my way back here in the future if I forget!