Why Is My SFP Not Working?

  GenericSFP

  It’s 3 am. You’ve just finished installing your new Catalyst switches into the rack and you’re ready to turn them up and complete your cutover. You’ve been fighting for months to get the funding to get these switches so your servers can run at full gigabit speed. You had to cut some corners here and there. You couldn’t buy everything new, so you’re reusing as much of your old infrastructure as possible. Thankfully, the last network guy had the foresight to connect the fiber backbone at gigabit speeds. You turn on your switches and wait for the interminably long ASIC and port tests to complete. As you watch the console spam scroll up on your screen, you catch sight of something that makes your blood run cold:

  %GBIC_SECURITY_CRYPT-4-VN_DATA_CRC_ERROR: GBIC in port 65586 has bad crc

  %PM-4-ERR_DISABLE: gbic-invalid error detected on Gi1/0/50, putting Gi1/0/50 in err-disable state

  Huh?!? Why aren’t my fiber connections coming up? Am I going to have to roll the install back? What is going on here?!?

  You will see this error message if you have a third party SFP inserted into the Catalyst switch. While Cisco (and many others) OEM their SFP transceivers from different companies, they all have a burned-in chip that contains info such as serial number, vendor ID, and security info like a Cyclic Redundancy Check (CRC). If any of this info doens’t match the database on the switch, the OS will mark the SFP as not supported and disable the port. The fiber connection won’t come up and you’ll find yourself screaming at terminal window at 3:30 in the morning.

  Why do vendors do this? Some claim it’s vendor lock in. You are stuck ordering your modules from the vendor at an inflated cost instead of buying them from a different source. Others claim it’s to help TAC troubleshoot the switch better in case of a failure. Still others say that it’s because the manufacturing tolerances on the vendor SFPs is much better than the third party offerings, even from the same OEM. I don’t have the answer, but I can tell you that Cisco, HP, Dell, and many others do this all the time.

  HP is the most curious case that I’ve run into. Their old series A SFP modules (HP calls them mini-GBICs) didn’t even have an HP logo. They bore the information from Finisar, an electroics OEM. The above scenario happened to me when I traded out a couple of HP 2848 swtiches for some newer 2610s. The fiber ports locked up solid and would not come alive for anything. I ended up putting the old switches back in place as glorified fiber media converters until I figured out that new SFPs were needed. While not horribly expensive, it did add a non-trivial cost to my project, not to mention all the extra hours of troubleshooting and banging my head against a wall.

  Cisco has an undocumented and totally unsupported solution to this problem. Once you start getting the console spam from above, just enter these commands:

  service unsupported-transceiver

  no errdisable detect cause gbic-invalid

  These commands are both hidden, so you can’t ? them. When you enter the first command, you get the Ominous Warning Message of Doom:

  Warning: When Cisco determines that a fault or defect can be traced to the use of third-party transceivers installed by a customer or reseller, then, at Cisco’s discretion, Cisco may withhold support under warranty or a Cisco support program. In the course of providing support for a Cisco networking product Cisco may require that the end user install Cisco transceivers if Cisco determines that removing third-party parts will assist Cisco in diagnosing the cause of a support issue.

  It goes without saying that calling TAC with a non-Cisco SFP in the slot is going to get you an immediate punt or request to remove said offending SFP. You’ll likely argue that your know the issue isn’t with the SFP that was working just fine an hour ago. They will counter with not being able to support non-Cisco gear. You’ll complain that removing the SFP will create additional connectivity issues and eventually you’ll hang up in frustration. So, don’t call TAC if you use this command. In fact, I would counsel that you should only use this command as a short term band-aid to get your out of the data center at 3 am so you can order genuine SFPs the next morning. Sadly, I also know how budgets work and how likely you are to get several hundred dollars of extra equipment you “forgot” to order. So caveat implementor.