[Gen4] qat_instance_handles potential memory violation under large number of instances in multiple-threaded case
Software
- QAT driver: QAT20.L.1.1.50-00003
- QAT Engine: v1.6.0
- openssl: OpenSSL 1.1.1k FIPS 25 Mar 2021
Hardware
- Xeon server with 2 sockets INTEL(R) XEON(R) GOLD 6554S
- QAT physical devices: 2 sockets * 4 per socket
The system has 128 QAT VFs, each configured with 2 CyInstances and LimitDevAccess set to 0, so a total of 512 instances (sym or asym) should be generated. A segmentation fault occurs when running openssl engine -c -t -v qatengine to validate the setup.
Steps to reproduce:
# install qat OOT driver
$ ./configure --enable-icp-debug --enable-icp-trace --enable-icp-sriov=host
$ make -j && make install
$ cat /etc/4xxx_dev0.conf
[GENERAL]
ServicesEnabled = asym;sym
...
[SSL]
NumberCyInstances = 2
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0
# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 1
# Crypto - User instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 2
$ cat /etc/4xxxvf_dev0.conf
[GENERAL]
ServicesEnabled = asym;sym
[SHIM]
NumberCyInstances = 2
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0
# Crypto - User instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 0
# Crypto - User instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 1
$ systemctl restart qat
# install qat Engine
$ ./autogen.sh && ./configure --with-qat_hw_dir=path_to_qat_driver && make -j && make install
$ openssl engine -c -t -v qatengine
Segmentation fault (core dumped)
After debugging, the root cause appears to be that QAT_MAX_CRYPTO_INSTANCES is predefined as 256. In our case it needs to be at least 512 to accommodate all instances.
After adding some logging to qat_hw_init.c:
for (instNum = 0; instNum < qat_num_instances; instNum++) {
    /* Retrieve CpaInstanceInfo2 structure for that instance */
    printf("addr ptr of pInstanceInfo2: %lu \n", (unsigned long)&qat_instance_details[instNum].qat_instance_info);
    printf("sizeof CpaInstanceInfo2: %zu \n", sizeof(CpaInstanceInfo2));
    printf("addr ptr of qat_instance_handles: %lu\n", (unsigned long)&qat_instance_handles);
    status = cpaCyInstanceGetInfo2(qat_instance_handles[instNum],
                                   &qat_instance_details[instNum].qat_instance_info);
before cpaCySetAddressTranslation; instNum: 255; qat_instance_handle: 0x5555562be9d0
cpaCySetAddressTranslation() - : Called with params (0x5555562be9d0, 0x7ffff60112a0)
cpaCyStartInstance() - : Called with params (0x5555562be9d0)
cpaCyInstanceGetInfo2() - : Called with params (0x5555562be9d0, 0x7fffffffd5e0)
addr ptr of pInstanceInfo2: 140737323112320
sizeof CpaInstanceInfo2: 932
addr ptr of qat_instance_handles: 140737323112376
cpaCyInstanceGetInfo2() - : Called with params (0x5555562d1d10, 0x7ffff6269780)
Hardware watchpoint 1: qat_instance_handles
Old value = (CpaInstanceHandle *) 0x555556d60150
New value = (CpaInstanceHandle *) 0x0
0x00007ffff64d0163 in __memset_avx2_unaligned_erms () from /lib64/libc.so.6
(gdb) where
#0 0x00007ffff64d0163 in __memset_avx2_unaligned_erms () from /lib64/libc.so.6
#1 0x00007ffff5c8c43a in osalMemSet (ptr=0x7ffff6269780 <qat_instance_mutex>, filler=0 '\000', count=932) at /root/QAT20/quickassist/utilities/osal/src/linux/user_space/OsalServices.c:285
#2 0x00007ffff5c78a59 in cpaCyInstanceGetInfo2 (instanceHandle_in=0x5555562d1d10, pInstanceInfo2=0x7ffff6269780 <qat_instance_mutex>) at /root/QAT20/quickassist/lookaside/access_layer/src/common/ctrl/sal_crypto.c:3064
#3 0x00007ffff6011d50 in qat_hw_init (e=e@entry=0x55555582f2b0) at qat_hw_init.c:642
#4 0x00007ffff600eff0 in qat_engine_init (e=0x55555582f2b0) at e_qat.c:607
#5 0x00007ffff75564fd in engine_unlocked_init () from /lib64/libcrypto.so.1.1
#6 0x00007ffff7556658 in ENGINE_init () from /lib64/libcrypto.so.1.1
#7 0x000055555559dd29 in engine_main ()
#8 0x00005555555a3244 in do_cmd ()
#9 0x000055555558bf59 in main ()
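When instNum reaches 256, pInstanceInfo2 points to 140737323112320 (0x7ffff6269780, which gdb resolves to qat_instance_mutex rather than to an element of the instance details array). cpaCyInstanceGetInfo2 then memsets 932 bytes starting at that address. That range covers qat_instance_handles at 140737323112376, only 56 bytes further on, so the pointer is overwritten with 0x0, as the watchpoint shows.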
After changing QAT_MAX_CRYPTO_INSTANCES to 512, the error disappears:
// e_qat.h
# define QAT_MAX_CRYPTO_INSTANCES 512 /* default is 256 */
Thanks @Kewei-Lu for reporting the issue. We will look into this.
@Kewei-Lu, I see that the file /etc/4xxx_dev0.conf has both SSL and SHIM sections. Is that intentional?
No. The [SSL] section name exists only in the PF configuration; in the VF configuration the section is named [SHIM].
Updated the maximum crypto instances macro to 2048: https://github.com/intel/QAT_Engine/commit/1ba1227438409491cbacc6dc2e4594d57ef69fee. The change is part of the v2.0.0 release.