qat_timer_poll_func deadlock?
nginx: https://github.com/alibaba/tengine
qat: QAT_Engine-1.5.0 + QAT.L.4.23.0-00001
qat config reference: https://tengine.taobao.org/document/tengine_qat_ssl.html
os: centos7u5
kernel: 5.10
run command: ./sbin/nginx -c ./conf/nginx.conf -t
/var/log/messages: dh895xcc 0000:60:00.0: Process 16680 nginx exit with orphan rings
nginx master process stack info:
Thread 2 (Thread 0x7f20c8d3b700 (LWP 29076)):
#0 0x00007f20cc2694ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f20cc264dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f20cc264c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f20cac066cf in qat_timer_poll_func (ih=
@Yogaraj-Alamenda @venkatesh6911 can you help me solve or analyze the problem?
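For anyone trying to correlate the "orphan rings" message from /var/log/messages with a specific nginx process, here is a minimal Python sketch that extracts the device, PCI address, PID and process name from such lines. The regular expression is an assumption based only on the single log line quoted in this report, and `find_orphan_ring_exits` is a hypothetical helper name, not part of the QAT driver or engine.

```python
import re

# Assumed line shape, taken from the one message quoted in this issue:
#   dh895xcc 0000:60:00.0: Process 16680 nginx exit with orphan rings
ORPHAN_RE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Process (?P<pid>\d+) (?P<name>\S+) exit with orphan rings"
)


def find_orphan_ring_exits(log_text):
    """Return one dict per 'exit with orphan rings' line found in log_text."""
    results = []
    for line in log_text.splitlines():
        match = ORPHAN_RE.search(line)
        if match:
            results.append({
                "driver": match["driver"],
                "pci": match["pci"],
                "pid": int(match["pid"]),
                "name": match["name"],
            })
    return results


if __name__ == "__main__":
    line = "dh895xcc 0000:60:00.0: Process 16680 nginx exit with orphan rings"
    for entry in find_orphan_ring_exits(line):
        print(entry)
```

The extracted PID can then be compared with the nginx PIDs that were running at the time, to see which process exited without releasing its rings.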
Hi @lastpepole , thanks for reporting the issue. We will have someone look into this.
@venkatesh6911 Are there any new results?
@lastpepole, we have created an internal defect to track this. Can you share the nginx.conf?
@venkatesh6911 I am using the default nginx.conf file, without any modifications.
nginx: https://github.com/alibaba/tengine
qat: QAT_Engine-1.5.0 + QAT.L.4.23.0-00001
qat config reference: https://tengine.taobao.org/document/tengine_qat_ssl.html
os: centos7u5
kernel: 5.10
@venkatesh6911 Can you reproduce this case? Are there any new results?
We are unable to reproduce the deadlock scenario in our test environment. The QAT Engine offload seems fine.
Driver: QAT.L.4.23.0-00001
OpenSSL version: openssl-3.0.13
QAT Engine: v1.5.0
Tengine: master branch
Certificate and cipher used for the test:
certs: server-rsa2k
protocol: TLSv1.2
Cipher: AES128-GCM-SHA256
The nginx.conf used is attached: nginx_conf.txt
Please let us know if you find any deviations from your setup in the configs mentioned above. Can you also let us know which cipher and certificate you are using?
kernel: 5.10
os: centos7u5
qat driver: 4.25.0
qat engine: 1.5.0
openssl: 1.1.1w
tengine: tengine-2.4.0
step 1: compile openssl and the qat engine module.
qat engine: v1.5.0
openssl: OpenSSL 1.1.1w, directory is /usr/local/openssl
step 2: modify openssl.cnf (/usr/local/openssl/ssl/openssl.cnf)
openssl_conf = openssl_def

[openssl_def]
engines = engine_section

[engine_section]
qatengine = qat_section

[qat_section]
engine_id = qatengine
dynamic_path = /usr/local/openssl/lib/engines-1.1/qatengine.so
default_algorithms = RSA
step 3: modify the auto/lib/openssl/conf file of tengine (diff hunk 39,42c39,42).
before:
CORE_INCS="$CORE_INCS $OPENSSL/.openssl/include"
CORE_DEPS="$CORE_DEPS $OPENSSL/.openssl/include/openssl/ssl.h"
CORE_LIBS="$CORE_LIBS $OPENSSL/.openssl/lib/libssl.a"
CORE_LIBS="$CORE_LIBS $OPENSSL/.openssl/lib/libcrypto.a"
after:
CORE_INCS="$CORE_INCS $OPENSSL/include"
CORE_DEPS="$CORE_DEPS $OPENSSL/include/openssl/ssl.h"
CORE_LIBS="$CORE_LIBS $OPENSSL/lib/libssl.so"
CORE_LIBS="$CORE_LIBS $OPENSSL/lib/libcrypto.so"
step 4: compile tengine
./configure --prefix=/tmp/tengine --with-openssl-async --with-openssl=/usr/local/openssl --with-debug
make -j 20
make install
step 5: set environment variables so that ./sbin/nginx can find the OpenSSL libraries
cd /tmp/tengine
export LD_LIBRARY_PATH=/usr/local/openssl/lib:/usr/local/openssl/lib/engines-1.1:$LD_LIBRARY_PATH
export OPENSSL_CONF=/usr/local/openssl/ssl/openssl.cnf
step 6: modify nginx.conf and run tengine
nginx.conf:
http {
    ......
    server {
        listen 443 ssl;
        server_name localhost;
        access_log off;
        ssl_certificate server.crt;
        ssl_certificate_key server.key;
        ssl_session_cache shared:SSL:1m;
        ssl_session_timeout 5m;
        ssl_async on;  #### very important, enable async mode
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;
        location / {
            root html;
            index index.html index.htm;
        }
    }
}
run command:
./sbin/nginx -c ./conf/nginx.conf
step 7: run pstack on the tengine nginx PID; the problem can be reproduced.
pstack XXXX
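The manual check in step 7 can also be scripted. Below is a minimal Python sketch that scans pstack output and reports threads whose innermost frame is `__lll_lock_wait`, which is the pattern seen in the master process stack at the top of this issue. `blocked_threads` is a hypothetical helper, and the two regular expressions are assumptions based on the pstack output pasted in this thread. A thread waiting on a mutex at one instant is not proof of a deadlock, so the check is only meaningful if the same thread shows the same stack across repeated samples.

```python
import re

# Assumed pstack line shapes, based on the traces pasted in this issue.
THREAD_RE = re.compile(r"^Thread \d+ \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def blocked_threads(pstack_text, wait_symbol="__lll_lock_wait"):
    """Map LWP id -> list of frame function names for threads whose
    innermost frame (#0) is wait_symbol.

    Single-threaded pstack output has no 'Thread' header; its frames are
    collected under the key 0.
    """
    threads = {}
    current = 0
    for raw in pstack_text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame:
            threads.setdefault(current, []).append(frame.group(1))
    return {
        lwp: frames
        for lwp, frames in threads.items()
        if frames and frames[0] == wait_symbol
    }


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): pstack <pid> | python3 this_script.py
    for lwp, frames in blocked_threads(sys.stdin.read()).items():
        print(f"LWP {lwp} waiting on a mutex: {' <- '.join(frames)}")
```

Running it a few seconds apart against the same PID and comparing the reported LWPs gives a rough indication of whether a thread is permanently stuck.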
From my analysis, I think the problem is in the QAT engine, not in OpenSSL or nginx. @venkatesh6911 @Yogaraj-Alamenda Please help me analyze this problem. Thanks.
@venkatesh6911 Can you reproduce this case? Are there any new results?
Hi @lastpepole , we see "Process
- worker processes configured in nginx.conf
- Driver config file info (number of instances, NumProcesses)
- What client are you using, and what are the client-side error messages (if any)?
We also observed that in step 3 you are using the OpenSSL path "$OPENSSL/lib64" instead of "$OPENSSL/lib". OpenSSL 1.1.1 does not have a "lib64" directory.
@venkatesh6911 Sorry, for OpenSSL 1.1.1 the correct settings in step 3 are as follows:
CORE_INCS="$CORE_INCS $OPENSSL/include"
CORE_DEPS="$CORE_DEPS $OPENSSL/include/openssl/ssl.h"
CORE_LIBS="$CORE_LIBS $OPENSSL/lib/libssl.so"
CORE_LIBS="$CORE_LIBS $OPENSSL/lib/libcrypto.so"
- worker processes configured in nginx.conf
user root;
worker_processes 1;
error_log logs/info.log info;
error_log logs/error.log error;
error_log logs/debug.log debug;
#error_log "pipe:rollback logs/error_log interval=1d baknum=7 maxsize=2G";
pid run/nginx.pid;
events {
worker_connections 1024;
}
......
- Driver config file info (number of instances, NumProcesses)
################################################################
# This file is provided under a dual BSD/GPLv2 license. When using or
# redistributing this file, you may do so under either license.
#
# GPL LICENSE SUMMARY
#
# Copyright(c) 2007-2023 Intel Corporation. All rights reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
# The full GNU General Public License is included in this distribution
# in the file called LICENSE.GPL.
#
# Contact Information:
# Intel Corporation
#
# BSD LICENSE
#
# Copyright(c) 2007-2023 Intel Corporation. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
#
################################################################
[GENERAL]
ServicesEnabled = cy
# Set the service profile to determine available features
# =====================================================================
# DEFAULT CRYPTO COMPRESSION CUSTOM1
# Asymmetric Crypto * * *
# Symmetric Crypto * * *
# Hash * * * *
# Cipher * * *
# MGF KeyGen * *
# SSL/TLS KeyGen * * *
# HKDF * *
# Compression * * *
# Decompression (stateless) * * *
# Decompression (stateful) * *
# Service Chaining *
# Device Utilization * * *
# Rate Limiting * * *
# =====================================================================
ServicesProfile = DEFAULT
ConfigVersion = 2
#Default values for number of concurrent requests*/
CyNumConcurrentSymRequests = 512
CyNumConcurrentAsymRequests = 64
#Statistics, valid values: 1,0
statsGeneral = 1
statsDh = 1
statsDrbg = 1
statsDsa = 1
statsEcc = 1
statsKeyGen = 1
statsDc = 1
statsLn = 1
statsPrime = 1
statsRsa = 1
statsSym = 1
# Specify size of intermediate buffers for which to
# allocate on-chip buffers. Legal values are 32 and
# 64 (default is 64). Specify 32 to optimize for
# compressing buffers <=32KB in size.
DcIntermediateBufferSizeInKB = 64
# This flag is to enable device auto reset on heartbeat error
AutoResetOnError = 0
##############################################
# Kernel Instances Section
##############################################
[KERNEL]
NumberCyInstances = 0
NumberDcInstances = 0
##############################################
# User Process Instance Section
##############################################
[SHIM]
NumberCyInstances = 1
NumberDcInstances = 0
NumProcesses = 64
LimitDevAccess = 0
# Crypto - User instance #0
Cy0Name = "UserCY0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 0
- What client are you using, and what are the client-side error messages (if any)?
No client; I only run tengine.
We are able to reproduce the pthread deadlock scenario on more than one QAT machine.
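The instance settings requested above (NumberCyInstances and NumProcesses in the [SHIM] section) can be pulled out of a driver config file with a few lines of Python. This is a minimal sketch, assuming the file parses as INI the way the config pasted above does; `read_shim_settings` is a hypothetical helper and not part of the QAT driver tooling.

```python
import configparser


def read_shim_settings(conf_text):
    """Return selected integer settings from the [SHIM] section.

    Assumes INI-style syntax with '#' comments, as in the pasted config.
    """
    # interpolation=None so a stray '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (NumProcesses, not numprocesses)
    parser.read_string(conf_text)
    shim = parser["SHIM"]
    return {
        "NumberCyInstances": shim.getint("NumberCyInstances"),
        "NumberDcInstances": shim.getint("NumberDcInstances"),
        "NumProcesses": shim.getint("NumProcesses"),
        "LimitDevAccess": shim.getint("LimitDevAccess"),
    }


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python3 this_script.py <path to driver config>
    with open(sys.argv[1]) as handle:
        for key, value in read_shim_settings(handle.read()).items():
            print(f"{key} = {value}")
```

Printing these values next to `worker_processes` from nginx.conf makes it easier to compare setups between machines when one reproduces the problem and another does not.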
@venkatesh6911 My machine environment configuration is as follows:
kernel: 5.10
os: centos7u5
qat driver: 4.25.0
qat engine: 1.5.0
openssl: 1.1.1w
tengine: tengine-2.4.0
We are able to reproduce the pthread deadlock scenario on more than one QAT machine.
Can you tell me about your machine environment (kernel version, OS version, and so on)?
Thank you for sharing the requested details. We will update you shortly on the issue reproduction with OpenSSL 1.1.1.
@venkatesh6911 It has been a long time since I shared this issue. Could you help me analyze and solve it soon? Thank you very much.
Regarding your question about our machine environment: OS: RHEL 8.8, Kernel: 4.18.0
@venkatesh6911 Can you reproduce this case? Are there any new results?
Hi @lastpepole, the pthread deadlock scenario is not occurring on my setup. My setup details are below:
OS: Ubuntu 20.04 LTS
Kernel: 5.4.0-26-generic
QAT Engine: v1.5.0
QAT driver: QAT.L.4.23.0-00001
QAT device: DH895XCC Series
tengine: 2.4.0
[root@cd-qat-36 tengine-insatall]# pstack 188421
#0 0x00007fba1b3b9e4e in sigsuspend () from /lib64/libc.so.6
#1 0x0000000000441d97 in ngx_master_process_cycle (cycle=cycle@entry=0x25bdac0) at src/os/unix/ngx_process_cycle.c:177
#2 0x0000000000413740 in main (argc=3, argv=
Can you confirm that you are using the tag "2.4.0" for tengine? I do not see a "tengine-2.4.0" tag in the repo. I will try to replicate your setup (OS, kernel) and update you.
Hi @venkatesh6911, my tengine is 2.4.0. You can download it from the URL below.
url: https://github.com/alibaba/tengine/archive/refs/tags/2.4.0.tar.gz
My machine environment is as follows:
kernel: 5.10
os: centos7u5
qat driver: 4.25.0
qat engine: 1.5.0
openssl: 1.1.1w
tengine: tengine-2.4.0
Hi @venkatesh6911, are there any results? Thank you very much.
@venkatesh6911 Can you reproduce this case?
@lastpepole We could not reproduce this with our existing setup. I guess the issue might be linked to the kernel version which you are using. I will try with the updated kernel and will let you know at the earliest.
Apologies for the delay.
@venkatesh6911 Looking forward to your reply. Thanks
I tried with an updated kernel version (5.14.0), and the deadlock stack trace is still not seen.
Please help me with the following:
- Update the kernel version to 5.14 or later and check whether the issue still reproduces.
- Use the QAT Engine release v1.6.0 with your existing setup and check whether the issue still reproduces.