[BUG] - parallel_bulk does not work in AWS lambda
`OSError: [Errno 38] Function not implemented`. I started seeing this error after upgrading to Python 3.9. The cause is that the opensearch bulk helper uses the `multiprocessing` module internally, and `multiprocessing.pool.ThreadPool` breaks in Lambda.
```
Traceback (most recent call last):
  ...
  File "/var/lang/lib/python3.9/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.9/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.9/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
OSError: [Errno 38] Function not implemented
```
It looks like:
- `synchronize.Lock` doesn't work in Lambda for any version of Python (Lambda has no `/dev/shm`, and no write access to `/dev` - see: https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda)
- `ThreadPool` uses `synchronize.Lock` starting from version 3.9
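Given those two points, one option is to detect at runtime whether POSIX semaphores are usable and fall back to a Lambda-safe pool when they are not. A minimal sketch (the helper name `semlocks_available` is illustrative, not part of opensearch-py):

```python
import multiprocessing

def semlocks_available():
    """Return True if multiprocessing can create a POSIX semaphore.

    In AWS Lambda (no /dev/shm) multiprocessing.Lock() raises
    OSError: [Errno 38] Function not implemented, so this returns
    False there and True in a normal environment.
    """
    try:
        multiprocessing.Lock()
        return True
    except OSError:
        return False
```

Calling this once at startup lets a library choose between `ThreadPool` and a semaphore-free alternative without hard-coding anything about Lambda.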
To Reproduce
Steps to reproduce the behavior:
- Deploy an application using `opensearch-py==1.0.0` to AWS Lambda
- Invoke the bulk function of opensearch
- See the error
Expected behavior
The opensearch client should work, as it did with Python 3.6.
Plugins
opensearch-py==1.0.0
Host/Environment (please complete the following information):
- OS: AWS Lambda
I'm also seeing this error with Python 3.8
```
[ERROR] OSError: [Errno 38] Function not implemented
Traceback (most recent call last):
  ...
  File "/var/task/opensearchpy/helpers/actions.py", line 469, in parallel_bulk
    pool = BlockingPool(thread_count)
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 925, in __init__
    Pool.__init__(self, processes, initializer, initargs)
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/queues.py", line 336, in __init__
    self._rlock = ctx.Lock()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
```
@jasongilman Did you get this error in a lambda or elsewhere?
@wbeckler It was in a lambda.
@jasongilman Yes it was in aws lambda.
Is anyone up for contributing a patch that addresses this issue when /dev/shm isn't available? There's a potential drop-in replacement for the multiprocessing library: https://pypi.org/project/lambda-multiprocessing/
At a high level, is this issue about adding Python 3.9 support (starting with CI)?
@Aarif1430 @jasongilman Does the bug still persist?
CI with Python 3.9 was added in https://github.com/opensearch-project/opensearch-py/pull/336 and it currently passes. We need a test that reproduces this problem.
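One way to get such a test without an actual Lambda environment is to simulate the missing POSIX semaphores: patch `SemLock.__init__` to raise `OSError: [Errno 38]`, then assert that constructing a `ThreadPool` fails the same way it does in Lambda. A sketch (the function name is illustrative, and this simulates Lambda rather than reproducing it exactly):

```python
import errno
from multiprocessing.pool import ThreadPool
from unittest import mock

def threadpool_fails_without_semlock():
    """Simulate Lambda's missing sem_open and confirm that creating a
    ThreadPool raises OSError [Errno 38] (ENOSYS), as seen in the
    tracebacks above."""
    with mock.patch(
        "multiprocessing.synchronize.SemLock.__init__",
        side_effect=OSError(errno.ENOSYS, "Function not implemented"),
    ):
        try:
            # Pool.__init__ creates self._ctx.SimpleQueue(), which needs
            # ctx.Lock() and therefore a SemLock -- this is where it breaks.
            ThreadPool(2)
        except OSError as exc:
            return exc.errno == errno.ENOSYS
    return False
```

The half-constructed pool may print an ignored `__del__` warning on teardown, but the assertion itself is deterministic.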
I'm able to reproduce the issue:
Create a Lambda with python3.9:
```python
import json
from multiprocessing.pool import ThreadPool

def lambda_handler(event, context):
    print("Hello")
    pool = ThreadPool()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```
Which gives the error:
```json
{
  "errorMessage": "[Errno 38] Function not implemented",
  "errorType": "OSError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 6, in lambda_handler\n    pool = ThreadPool()\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/pool.py\", line 927, in __init__\n    Pool.__init__(self, processes, initializer, initargs)\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/pool.py\", line 196, in __init__\n    self._change_notifier = self._ctx.SimpleQueue()\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/context.py\", line 113, in SimpleQueue\n    return SimpleQueue(ctx=self.get_context())\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/queues.py\", line 341, in __init__\n    self._rlock = ctx.Lock()\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/context.py\", line 68, in Lock\n    return Lock(ctx=self.get_context())\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/synchronize.py\", line 162, in __init__\n    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
    "  File \"/var/lang/lib/python3.9/multiprocessing/synchronize.py\", line 57, in __init__\n    sl = self._semlock = _multiprocessing.SemLock(\n"
  ]
}
```
Looking at https://pypi.org/project/lambda-thread-pool/
> You cannot use "multiprocessing.Queue" or "multiprocessing.Pool" within a Python Lambda environment because the Python Lambda execution environment does not support shared memory for processes.
This means we need to get rid of or be able to swap ThreadPool with LambdaThreadPool in https://github.com/opensearch-project/opensearch-py/blob/da436cbbe8dda34abd607f527d4f0bdacb9b30d8/opensearchpy/helpers/actions.py#L470.
For an immediate workaround you can copy-paste the parallel_bulk implementation and replace BlockingPool with LambdaThreadPool and see if that works. For something maintainable, I would extract BlockingPool from this implementation behind an abstract thread pool interface, implement another one for LambdaThreadPool, and add a configuration parameter to specify which thread pool to use. Does anyone want to give either a try?
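To make the idea concrete, here is a minimal sketch of what a Lambda-safe pool could look like: it relies only on `threading` and `queue`, which need no POSIX semaphores or `/dev/shm`. The class name `SimpleThreadPool` and its `map` method are illustrative, not the real `LambdaThreadPool` API or the interface opensearch-py would adopt:

```python
import queue
import threading

class SimpleThreadPool:
    """Tiny thread pool built only on threading/queue primitives,
    so it works where multiprocessing's SemLock does not (e.g. Lambda)."""

    def __init__(self, num_threads=4):
        self._tasks = queue.Queue()
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_threads)
        ]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: shut this worker down
                self._tasks.task_done()
                break
            func, arg, index, results = item
            try:
                results[index] = func(arg)  # one slot per task, no lock needed
            finally:
                self._tasks.task_done()

    def map(self, func, iterable):
        args = list(iterable)
        results = [None] * len(args)  # preserves input order
        for i, arg in enumerate(args):
            self._tasks.put((func, arg, i, results))
        self._tasks.join()  # block until every task has run
        return results

    def close(self):
        for _ in self._threads:
            self._tasks.put(None)
        for t in self._threads:
            t.join()
```

Usage would look like `SimpleThreadPool(4).map(process_chunk, chunks)`. A real patch would still need the streaming/backpressure behavior of the existing `BlockingPool`, which this sketch does not attempt.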
I renamed this to "parallel_bulk doesn't work in AWS lambda", is there anything else that doesn't?
Thank you. In my case the ThreadPool is used by an SDK, so changing it wouldn't be ideal. We started seeing the issue when upgrading from Python 3.7 to 3.9. We might just find an alternative solution instead of using the SDK.