We have the following setup in a Supermicro server:
- LSI 9400 -> expander -> 10 x HDD
- LSI 9500 -> expander -> 2 x NVMe
|------------|                             |-----------|
|  LSI 9400  |      |--------------| ----->| HDD x 10  |
|------------| ---->|   Expander   |       |-----------|
                    |              |
|------------| ---->|              |       |-----------|
|  LSI 9500  |      |--------------| ----->| NVMe Intel|
|------------|                         |   |-----------|
                                       |
                                       |   |-----------|
                                       |-->| NVMe Intel|
                                           |-----------|
We have no problem blinking any of the bays hosting HDDs, but blinking the NVMe bays does nothing.
I would like to achieve either of these two solutions:
- Optimal solution - blink the bays containing the NVMe drives attached to the tri-mode 9500 controller
- Alternate solution - find a link/value/information that will allow me to associate an NVMe drive with a physical port on the LSI 9500 controller. I am thinking about something like "Look in the file /<some_path>/<some_file> and there you will find the ID of the port." More complex associations are also welcome. No problem if there are several values we have to correlate.
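To illustrate the kind of correlation I have in mind for the alternate solution, here is a minimal Python sketch that collects candidate values per SCSI device from sysfs in one place. The function names and the `root` parameter are hypothetical. `sas_address` and `sas_device_handle` are the attribute names that appear under the sysfs paths quoted later in this post; I am assuming they are also reachable through `/sys/class/scsi_device/<H:C:T:L>/device/`. Missing attributes are reported as `None` rather than treated as errors.

```python
import os

ATTRS = ("vendor", "model", "sas_address", "sas_device_handle")


def read_attr(device_dir, name):
    """Return the stripped content of a sysfs attribute, or None if unreadable."""
    try:
        with open(os.path.join(device_dir, name)) as fh:
            return fh.read().strip()
    except OSError:
        return None


def list_scsi_devices(root="/sys/class/scsi_device"):
    """Collect one dict per H:C:T:L entry found under root."""
    if not os.path.isdir(root):
        return []
    rows = []
    for hctl in sorted(os.listdir(root)):
        device_dir = os.path.join(root, hctl, "device")
        row = {"hctl": hctl}
        for name in ATTRS:
            row[name] = read_attr(device_dir, name)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in list_scsi_devices():
        print(row)
```

Any value in these rows that stays constant for a given bay across reinsertions and reboots would be a usable key.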
Operating system: Rocky Linux, fully under our control, we can do anything on it, no restrictions. Server configuration: It runs ESXi with both controllers in passthrough to the Rocky Linux VM.
So far I have done the following investigations and experiments:
- Try blinking it with ledctl -> no error, no blinking
- Try blinking with sg_ses -> no error, no blinking. Here are some commands, trimmed to eliminate the rest of the disks.
Basically, what I want to know is: If a drive fails, which one should we remove? The answer can be a blinking LED or a command that would say "top drive" or something like that.
[root@echo-development ~]# lsscsi -g
[1:0:0:0] enclosu BROADCOM VirtualSES 03 - /dev/sg2
[1:2:0:0] disk NVMe INTEL SSDPE2KX01 01B1 /dev/sdb /dev/sg3
[1:2:1:0] disk NVMe INTEL SSDPE2KX02 0131 /dev/sdc /dev/sg4
[root@echo-development ~]# sg_ses -vvv --dsn=0 --set=ident /dev/sg2
open /dev/sg2 with flags=0x802
request sense cmd: 03 00 00 00 fc 00
duration=0 ms
request sense: pass-through requested 252 bytes (data-in) but got 18 bytes
Request Sense near startup detected something:
Sense key: No Sense, additional: Additional sense: No additional sense information
... continue
Receive diagnostic results command for Configuration (SES) dpage
Receive diagnostic results cdb: 1c 01 01 ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 60 bytes
Receive diagnostic results: response:
01 00 00 38 00 00 00 00 11 00 02 24 30 01 62 b2
07 eb 55 80 42 52 4f 41 44 43 4f 4d 56 69 72 74
75 61 6c 53 45 53 00 00 00 00 00 00 30 33 00 00
17 28 00 00 19 08 00 00 00 00 00 00
Receive diagnostic results command for Enclosure Status (SES) dpage
Receive diagnostic results cdb: 1c 01 02 ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 208 bytes
Receive diagnostic results: response:
02 00 00 cc 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
Receive diagnostic results command for Element Descriptor (SES) dpage
Receive diagnostic results cdb: 1c 01 07 ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 432 bytes
Receive diagnostic results: response, first 256 bytes:
07 00 01 ac 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 1c 43 30 2e 30 00 00 00 00 00 00 00 00
00 00 00 00 4e 4f 42 50 4d 47 4d 54 00 00 00 00
00 00 00 1c 43 30 2e 30 00 00 00 00 00 00 00 00
00 00 00 00 4e 4f 42 50 4d 47 4d 54 00 00 00 00
00 00 00 1c 43 30 2e 30 00 00 00 00 00 00 00 00
Receive diagnostic results command for Additional Element Status (SES-2) dpage
Receive diagnostic results cdb: 1c 01 0a ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 1448 bytes
Receive diagnostic results: response, first 256 bytes:
0a 00 05 a4 00 00 00 00 16 22 00 00 01 00 00 04
10 00 00 08 50 00 62 b2 07 eb 55 80 3c d2 e4 a6
23 29 01 00 00 00 00 00 00 00 00 00 96 22 00 01
01 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
96 22 00 02 01 00 00 ff 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 96 22 00 03 01 00 00 ff 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 16 22 00 04 01 00 00 06
10 00 00 08 50 00 62 b2 07 eb 55 84 3c d2 e4 99
70 1d 01 00 00 00 00 00 00 00 00 00 96 22 00 05
01 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
96 22 00 06 01 00 00 ff 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
s_byte=2, s_bit=1, n_bits=1
Applying mask to element status [etc=23] prior to modify then write
Send diagnostic command page name: Enclosure Control (SES)
Send diagnostic cdb: 1d 10 00 00 d0 00
Send diagnostic parameter list:
02 00 00 cc 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 80 00 02 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
Send diagnostic timeout: 60 seconds
duration=0 ms
[root@echo-development ~]# sg_ses -vvv --dsn=6 --set=ident /dev/sg2
open /dev/sg2 with flags=0x802
request sense cmd: 03 00 00 00 fc 00
duration=0 ms
request sense: pass-through requested 252 bytes (data-in) but got 18 bytes
Request Sense near startup detected something:
Sense key: No Sense, additional: Additional sense: No additional sense information
... continue
Receive diagnostic results command for Configuration (SES) dpage
Receive diagnostic results cdb: 1c 01 01 ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 60 bytes
Receive diagnostic results: response:
01 00 00 38 00 00 00 00 11 00 02 24 30 01 62 b2
07 eb 55 80 42 52 4f 41 44 43 4f 4d 56 69 72 74
75 61 6c 53 45 53 00 00 00 00 00 00 30 33 00 00
17 28 00 00 19 08 00 00 00 00 00 00
Receive diagnostic results command for Enclosure Status (SES) dpage
Receive diagnostic results cdb: 1c 01 02 ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 208 bytes
Receive diagnostic results: response:
02 00 00 cc 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
Receive diagnostic results command for Element Descriptor (SES) dpage
Receive diagnostic results cdb: 1c 01 07 ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 432 bytes
Receive diagnostic results: response, first 256 bytes:
07 00 01 ac 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 1c 43 30 2e 30 00 00 00 00 00 00 00 00
00 00 00 00 4e 4f 42 50 4d 47 4d 54 00 00 00 00
00 00 00 1c 43 30 2e 30 00 00 00 00 00 00 00 00
00 00 00 00 4e 4f 42 50 4d 47 4d 54 00 00 00 00
00 00 00 1c 43 30 2e 30 00 00 00 00 00 00 00 00
Receive diagnostic results command for Additional Element Status (SES-2) dpage
Receive diagnostic results cdb: 1c 01 0a ff fc 00
duration=0 ms
Receive diagnostic results: pass-through requested 65532 bytes (data-in) but got 1448 bytes
Receive diagnostic results: response, first 256 bytes:
0a 00 05 a4 00 00 00 00 16 22 00 00 01 00 00 04
10 00 00 08 50 00 62 b2 07 eb 55 80 3c d2 e4 a6
23 29 01 00 00 00 00 00 00 00 00 00 96 22 00 01
01 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
96 22 00 02 01 00 00 ff 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 96 22 00 03 01 00 00 ff 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 16 22 00 04 01 00 00 06
10 00 00 08 50 00 62 b2 07 eb 55 84 3c d2 e4 99
70 1d 01 00 00 00 00 00 00 00 00 00 96 22 00 05
01 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
96 22 00 06 01 00 00 ff 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
s_byte=2, s_bit=1, n_bits=1
Applying mask to element status [etc=23] prior to modify then write
Send diagnostic command page name: Enclosure Control (SES)
Send diagnostic cdb: 1d 10 00 00 d0 00
Send diagnostic parameter list:
02 00 00 cc 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 80 00 02 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
Send diagnostic timeout: 60 seconds
duration=0 ms
We used dsn=0 and dsn=6 because it seems like end devices are connected to these two ports (output trimmed to relevant results):
[root@echo-development ~]# sg_ses -j /dev/sg2
BROADCOM VirtualSES 03
Primary enclosure logical identifier (hex): 300162b207eb5580
[0,-1] Element type: Array device slot
Enclosure Status:
Predicted failure=0, Disabled=0, Swap=0, status: Unsupported
OK=0, Reserved device=0, Hot spare=0, Cons check=0
In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
Ready to insert=0, RMV=0, Ident=0, Report=0
App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
[0,0] Element type: Array device slot
Enclosure Status:
Predicted failure=0, Disabled=0, Swap=1, status: Unsupported
OK=0, Reserved device=0, Hot spare=0, Cons check=0
In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
Ready to insert=0, RMV=0, Ident=0, Report=0
App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
Additional Element Status:
Transport protocol: SAS
number of phys: 1, not all phys: 0, device slot number: 4
phy index: 0
SAS device type: end device
initiator port for:
target port for: SSP
attached SAS address: 0x500062b207eb5580
SAS address: 0x3cd2e4dd23290100
phy identifier: 0x0
[0,4] Element type: Array device slot
Enclosure Status:
Predicted failure=0, Disabled=0, Swap=1, status: Unsupported
OK=0, Reserved device=0, Hot spare=0, Cons check=0
In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
Ready to insert=0, RMV=0, Ident=0, Report=0
App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
Additional Element Status:
Transport protocol: SAS
number of phys: 1, not all phys: 0, device slot number: 6
phy index: 0
SAS device type: end device
initiator port for:
target port for: SSP
attached SAS address: 0x500062b207eb5584
SAS address: 0x3cd2e4a623290100
phy identifier: 0x0
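The element index, device slot number and SAS address in the output above can be extracted mechanically instead of by eye. This is a minimal Python sketch, assuming the text layout shown above; the function name and the `SAMPLE` string (a few lines copied from that output) are only for illustration.

```python
import re

ELEMENT_RE = re.compile(r"\[(\d+),(-?\d+)\]\s+Element type:\s*(.+)")
SLOT_RE = re.compile(r"device slot number:\s*(\d+)")
# Anchored at line start so "attached SAS address" is not matched.
SAS_RE = re.compile(r"^\s*SAS address:\s*(0x[0-9a-fA-F]+)")

SAMPLE = """\
[0,-1] Element type: Array device slot
[0,0] Element type: Array device slot
number of phys: 1, not all phys: 0, device slot number: 4
attached SAS address: 0x500062b207eb5580
SAS address: 0x3cd2e4dd23290100
[0,4] Element type: Array device slot
number of phys: 1, not all phys: 0, device slot number: 6
attached SAS address: 0x500062b207eb5584
SAS address: 0x3cd2e4a623290100
"""


def parse_join_output(text):
    """Return one dict per element header found in `sg_ses -j` style text."""
    elements = []
    current = None
    for line in text.splitlines():
        header = ELEMENT_RE.search(line)
        if header:
            current = {
                "type_index": int(header.group(1)),
                "element_index": int(header.group(2)),
                "element_type": header.group(3).strip(),
                "slot": None,
                "sas_address": None,
            }
            elements.append(current)
            continue
        if current is None:
            continue
        slot = SLOT_RE.search(line)
        if slot:
            current["slot"] = int(slot.group(1))
        sas = SAS_RE.match(line)
        if sas:
            current["sas_address"] = int(sas.group(1), 16)
    return elements


if __name__ == "__main__":
    for element in parse_join_output(SAMPLE):
        if element["sas_address"] is not None:
            print(element["element_index"], element["slot"],
                  f"{element['sas_address']:#018x}")
```

On a live system the text would come from running `sg_ses -j /dev/sg2` rather than from the embedded sample.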
- Find the SAS address from the output above in the list of drives. Our SAS address, 0x3cd2e4a623290100, should be found on a drive (NVMe, SSD, HDD, whatever), at least as I understood from the sg_ses documentation and from blog posts and forums on the Internet. But the SAS addresses on the NVMe drives are different, and the SAS address reported by the controller cannot be found on any device.
[root@echo-development ~]# cat "/sys/bus/pci/devices/0000:04:00.0/host1/target1:2:1/1:2:1:0/sas_address"
0x00012923a6e4d25c
- Rely on HCTL -> does not work because the HCTL changes when I remove and reinsert a drive in the bay. It also resets on reboot to 1:2:0:0 and 1:2:1:0.
- Associate /sys/bus/pci/devices/0000:04:00.0/host1/target1:2:1/1:2:1:0/sas_device_handle with a port on the controller -> does not work, it increments every time a device is removed and reinserted.
- Try to find any other association between an NVMe drive and the controller port -> I couldn't find any.
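One observation on the SAS address comparison in the first item of this list: the sysfs value 0x00012923a6e4d25c, read byte by byte in reverse order, is 0x5cd2e4a623290100, which differs from the SES value 0x3cd2e4a623290100 only in the top four bits. I have not confirmed that this is documented behaviour of the driver or firmware, so treat it as an assumption drawn from these two values alone. A minimal Python sketch of that comparison follows; the helper names and the 60-bit mask are hypothetical.

```python
LOW_60_BITS = 0x0FFFFFFFFFFFFFFF


def byteswap64(value):
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


def looks_like_same_device(sysfs_addr, ses_addr, mask=LOW_60_BITS):
    """Compare a sysfs sas_address with an SES SAS address.

    Assumption (not confirmed): the sysfs value is byte-reversed and the
    top nibble may differ, so only the lower 60 bits are compared.
    """
    return (byteswap64(sysfs_addr) & mask) == (ses_addr & mask)


if __name__ == "__main__":
    sysfs_addr = 0x00012923A6E4D25C  # from .../1:2:1:0/sas_address
    ses_addr = 0x3CD2E4A623290100    # from sg_ses -j, element [0,4]
    print(hex(byteswap64(sysfs_addr)))                   # 0x5cd2e4a623290100
    print(looks_like_same_device(sysfs_addr, ses_addr))  # True
```

If this relation holds for every drive, the device slot number reported by sg_ses for the matching element would identify the bay, independent of HCTL and sas_device_handle.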
Please let me know if there is anything else I could try or if you need any further information.