Upgrading to a Broadcom 9600-24i on TrueNAS Dragonfish


Dracovish, the dragonfish Pokemon eating a Broadcom 9600-24i

Earlier this year, PG&E hiked our electricity rates significantly here in California. What last year was a painful-but-acceptable monthly electric bill rapidly became unacceptable, so we started looking for opportunities to lower our consumption. One of the first things to come to mind was the NAS we use for media storage, backups, and so on. This NAS is slightly overbuilt for home user needs – it is a SuperMicro 2U 6028U-TR4T+ server with 2 Xeon CPUs, 128GB of RAM, 5 SSDs (3 PCIe SSDs, 2 SATA SSDs), and 12 spinning disks connected to a BPN-SAS-826A backplane.

One of the things I did to lower power consumption was to make sure ASPM (Active State Power Management) is enabled for all PCIe devices. I had almost every PCIe device working with ASPM, but similar to this post I ran into an issue getting ASPM enabled on my LSI 3008 HBA. That HBA is pretty old, so I decided to roll the dice on some questionable used hardware on eBay and pick up one of the 9600-24i cards that appear to have somewhat flooded the used market.
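(As a quick aside, you can eyeball the ASPM state of every device with lspci. This is generic pciutils, nothing TrueNAS-specific; run it as root so the capability details are visible.)

# LnkCtl lines report the negotiated state per device, e.g. "ASPM L1 Enabled" or "ASPM Disabled"
lspci -vv | grep -E 'ASPM.*abled'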

Adventures in second-hand hardware

I purchased the aforementioned 9600-24i and ordered a set of 3 SAS cables so that I could (if needed) use all 3 ports on the card, even though I only ended up needing one (and indeed, attempting to use 2 cost me a few hours of troubleshooting). My previous HBA was connected to two of the 4 ports on the backplane, and since I had enough cables, I figured I might as well hook up all of them, right? I mean, at worst, it’ll just do multipath…

Yeah, don’t do that. For whatever reason, connecting all 4 cables just caused the HBA to flap between seeing the drives some of the time and dropping into “safe mode” due to an enclosure error the rest of the time. Popped the lid, disconnected the extra two cables, no more safe mode.

The first thing I did after this was to upgrade the firmware to the latest version. To do that, you need to install storcli2, the new version of the MegaRAID management software; the existing storcli will not work with this card. Since I’m doing this on TrueNAS, I knew I’d need to re-enable apt, which usually required workarounds since they disabled it, but no longer – as of TrueNAS Dragonfish, there is now a command to just enable “dev mode” and allow you to install things as needed. This is particularly great since it will (presumably) survive upgrades, unlike previous hacks. To enable this mode (and install build-essential and a number of other helpful packages), just SSH to your TrueNAS box and run install-dev-tools, as shown below.
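In practice that looks like this (using my hostname; yours will differ):

ssh root@clifford
install-dev-tools   # enables dev mode; installs build-essential and friends

Once it finishes, apt and dpkg work like they would on any ordinary Debian box.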

Next, grab the latest version of the StorCLI utility from the download page for your HBA (in my case this was StorCLI_Avenger_8.9-008.0009.0000.0010.zip) and extract it on your TrueNAS system. At this point I made another critical error, and I want to make sure you don’t: when you list the platforms that are available in this file, you’re going to see the following:

root@clifford[....9-008.0009.0000.0010/Avenger_StorCLI]# ls -al
total 595
drwxr-xr-x 11 root root     14 Apr 29 23:39 .
drwxr-xr-x  3 root root      4 Jun 15 23:07 ..
drwxr-xr-x  5 root root      5 Apr 29 23:39 ARM
drwxr-xr-x  2 root root      3 Apr 29 23:39 FreeBSD
drwxr-xr-x  2 root root      3 Apr  2 01:31 JSON_Schema
drwxr-xr-x  2 root root      4 Apr 29 23:39 Linux
drwxr-xr-x  2 root root     10 Jun 17 13:08 Linux_Lite
-rw-r--r--  1 root root 563739 Feb  6 02:50 ThirdPartyLicenseNotice.pdf
drwxr-xr-x  3 root root      4 Apr 29 23:39 UEFI
drwxr-xr-x  2 root root      3 Apr  2 01:36 Ubuntu
drwxr-xr-x  4 root root      4 Apr 29 23:39 VMware
drwxr-xr-x  2 root root      3 Apr 29 23:39 Windows
-rw-r--r--  1 root root   4200 Jan 22 22:56 readme.txt
-rw-r--r--  1 root root    236 Jun  8  2022 storcli2conf.ini

You might think, “oh, Linux_Lite looks like it doesn’t need to be installed, I’ll just use that”. And while that is true, it is not the only difference. I don’t know why this isn’t documented better, but the “lite” version of storcli2 is missing the ability to issue most commands. You can download firmware with it, but if you try to do something like set a drive to JBOD mode, you’ll get:

root@clifford[....0000.0010/Avenger_StorCLI/Linux_Lite]# ./storcli2Lite /c0/e116/s0 set JBOD
CLI Version = 008.0009.0000.0010 Apr 02, 2024
Operating system = Linux6.6.29-production+truenas
Controller = 0
Status = Failure
Description = Un-supported command.

After being frustrated by this for a few minutes, I decided to just install the proper .deb file from the Ubuntu subdirectory (wait, shouldn’t that be a snap then? 😏) with dpkg -i storcli2_008.0009.0000.0010_amd64.deb, which puts the full version of storcli2 in /opt as /opt/MegaRAID/storcli2/storcli2. I grabbed the latest firmware for the card (9600_24i_Pkg_8.9.1.0-00000-00002 as I’m writing this) and flashed it with:

/opt/MegaRAID/storcli2/storcli2 /c0 download file=9600-24i_full_fw_vsn_pldm_pkg_signed.rom activationtype=offline

The activationtype=offline part is critical here. I then rebooted the server (I did a cold boot) and checked to see what things looked like with storcli2 /c0 show all, which gives you a bunch of information about the controller and the devices attached to it, but in particular this was what I was looking for:

PD LIST :
=======

----------------------------------------------------------------------------------------------------------
EID:Slt PID State Status DG       Size Intf Med SED_Type SeSz Model                Sp LU/NS Count Alt-EID
----------------------------------------------------------------------------------------------------------
116:0    53 JBOD  Good   -  10.914 TiB SATA HDD -        512B ST12000NM0117-2GY101 -            1 -
116:1    54 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:2    55 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:3    56 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:4    57 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:5    58 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:6    59 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:7    60 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:8    61 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:9    62 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:10   63 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
116:11   64 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
----------------------------------------------------------------------------------------------------------

Here you can see all 12 of my drives in a UConf (unconfigured) state, which is exactly what I wanted. This means the drive is there and the controller sees it, but because the controller can tell the drive already has data on it, it isn’t going to do anything with it out of the box. The first field in the output, EID:Slt, tells you the ID of the enclosure (EID) and the slot within that enclosure the drive is in. You can use this information to get even more detailed information about a specific enclosure and disk with commands such as:

root@clifford[~/hba/9600_24i_Pkg_8.9.1.0-00000-00002]# /opt/MegaRAID/storcli2/storcli2 /c0/e116/s1 show
CLI Version = 008.0009.0000.0010 Apr 02, 2024
Operating system = Linux6.6.29-production+truenas
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

----------------------------------------------------------------------------------------------------------
EID:Slt PID State Status DG       Size Intf Med SED_Type SeSz Model                Sp LU/NS Count Alt-EID
----------------------------------------------------------------------------------------------------------
116:1    54 UConf Good   -  10.913 TiB SATA HDD -        512B ST12000NM0117-2GY101 U            1 -
----------------------------------------------------------------------------------------------------------

You can also do show all to see even more information.
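For example, to dump everything the controller knows about the slot 1 drive from above:

/opt/MegaRAID/storcli2/storcli2 /c0/e116/s1 show all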

Now that we can see the drives, we need to tell the HBA to pass them through in JBOD (Just a Bunch Of Disks) mode so that ZFS can re-import them. I hadn’t used this controller before, so I wanted to be super sure that the JBOD command was non-destructive, and I initially only did it for one drive (I started with the slot 0 drive, but this example is for the slot 1 drive):

root@clifford[~/hba/9600_24i_Pkg_8.9.1.0-00000-00002]# /opt/MegaRAID/storcli2/storcli2 /c0/e116/s1 set JBOD
CLI Version = 008.0009.0000.0010 Apr 02, 2024
Operating system = Linux6.6.29-production+truenas
Controller = 0
Status = Success
Description = Set PD JBOD Succeeded.

You can then check to see what the operating system now sees with lsblk -f:

sde
├─sde1      zfs_member        5000  Sweetums       16017566545575716942
└─sde9

So we have our drive back and can confirm it is being seen as an unimported ZFS member of the pool “Sweetums” (most of my devices/etc. are named after Muppets 🤷🏻‍♀️).

Since that was non-destructive, let’s script converting the rest:

# slots 0 and 1 were already converted above
for i in {2..11}; do /opt/MegaRAID/storcli2/storcli2 /c0/e116/s${i} set JBOD; done
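After the loop finishes, listing the drives again should show every slot in the JBOD state instead of UConf:

/opt/MegaRAID/storcli2/storcli2 /c0 show all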

At this point you should have all of your ZFS member disks exposed to the operating system, and you just need to import them back into ZFS. Since TrueNAS can be a bit picky, I opted to do this through the UI: go to the “Storage” dashboard, where the newly exposed disks should show up.

Annoyingly, it also initially picked up a flash drive I still had plugged into one of the USB ports, and there didn’t seem to be a way to exclude it, so I just ran over and unplugged it, giving me the right count of 14 (12 spinning drives, 2 ZIL drives).

Which is when I ran into the next problem: I hadn’t actually exported the pool, so while it was in a perfectly fine state to import, TrueNAS wouldn’t let me, because it still knew about the old pool in its internal database.

So much for doing this the GUI way; let’s see if we can just do a zpool import:

root@clifford[~/hba/9600_24i_Pkg_8.9.1.0-00000-00002]# zpool import
   pool: Sweetums
     id: 16017566545575716942
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	Sweetums                                  ONLINE
	  raidz3-0                                ONLINE
	    6e6601b4-adc0-2947-a7f8-f3209d0955a6  ONLINE
	    9db21b33-3de7-354f-b427-4d6e42f6cd74  ONLINE
	    9ed89515-9daa-2948-abed-67155c9dc9a9  ONLINE
	    cf6a7b5c-384b-9a45-be86-06bd54d4d6c8  ONLINE
	    e14c565e-5aaa-4641-9fbe-b972cddf308d  ONLINE
	    cfae0055-c07e-584b-ae8c-b658bd81025a  ONLINE
	    8f353860-8d24-ce40-b293-3d18ebbb3477  ONLINE
	    4642ca3d-b26c-a445-8a4c-697e25529178  ONLINE
	    bb1bdddf-3e56-dc43-b983-603c1fed42a5  ONLINE
	    4004126b-cb89-3647-ae1a-3a0a87023bd2  ONLINE
	    f7dc3c7d-8fd5-3448-85b4-1e69a18a95ef  ONLINE
	    557db7e0-72c9-de41-8e15-0e0e800fdd52  ONLINE
	logs
	  mirror-1                                ONLINE
	    3b4623d4-6f01-ae49-b3cc-70f008fbbdd2  ONLINE
	    b709736a-238b-ac41-a681-c30dddc5ccbf  ONLINE

Looks promising!

Next I did zpool import Sweetums, which succeeded but with a caveat:

Import was successful, but unable to mount some datasets

I suspect these will clear themselves up on a reboot; otherwise things are looking good, and Sweetums is back in the TrueNAS UI.
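(If you don’t want to wait for a reboot, my hedged guess is that plain ZFS can mount the stragglers directly. This is stock ZFS behavior, not something I verified on this box, and TrueNAS normally wants its middleware managing mounts, so the reboot is probably the cleaner option.)

zfs mount -a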

So all we have to do now is verify that the thing that started this whole journey is actually fixed. We can use lspci -vvv to get information on the new HBA, including whether or not ASPM is enabled:
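The output is long, but the lines that matter are LnkCap (what the device supports) and LnkCtl (what was actually negotiated). It looks something like this, where the PCI address is illustrative (use whatever your HBA enumerated as) and the exact capability strings will vary by platform:

lspci -vvv -s 03:00.0 | grep -i aspm
        LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <64us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+

As long as LnkCtl reports “ASPM L1 Enabled” rather than “ASPM Disabled”, the link is actually being power-managed.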

At this point, everything is looking good, we’re up and running on the new HBA, and ASPM is enabled on the card. Victory!


