
Benchmark synthesis

Open mcopik opened this issue 6 years ago • 19 comments

We need the following:

  • [ ] Python
    • [ ] computation in FLOPs/instructions
    • [ ] memory allocation
    • [ ] storage read/write
    • [ ] disk read/write
  • [ ] NodeJS
    • [ ] computation in FLOPs/instructions
    • [ ] memory allocation
    • [ ] storage read/write
    • [ ] disk read/write

mcopik avatar Jan 21 '20 20:01 mcopik

We made progress on this issue on branch meta-benchmarks and in PR #59. However, there is still work to be done - any input and help towards synthesizing benchmarks are welcome!

mcopik avatar Feb 16 '23 21:02 mcopik

I will research this and update here shortly.

veenaamb avatar Mar 15 '23 11:03 veenaamb

I am working on it.

AtulRajput01 avatar Apr 10 '23 13:04 AtulRajput01

@mcopik Can I get some guidance on this issue?

octonawish-akcodes avatar Mar 12 '24 15:03 octonawish-akcodes

@octonawish-akcodes Hi! The overall idea is to synthetically create Python/JS functions that perform CPU computations, memory accesses, and I/O accesses. Given a simple configuration, the generator should produce a function that performs the selected actions with a specified frequency and intensity, e.g., calling some well-established CPU benchmark (like matrix-matrix multiplication), using our interface to make storage calls, etc.

The next step will be to make these functions more varied, e.g., with different loop complexity.
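
As an illustration only (the schema, keys, and values below are assumptions, not an agreed-upon format), such a configuration could look like this:

```python
# Hypothetical configuration for a synthesized benchmark function;
# all keys, units, and values are illustrative.
synthetic_config = {
    "cpu": {"kernel": "matmul", "flops": 2_000_000_000},     # target floating-point operations
    "memory": {"allocated_mb": 256, "accesses": 10_000},     # allocation size and access count
    "storage": {"reads": 5, "writes": 5, "object_kb": 512},  # object-storage I/O via the SeBS interface
    "disk": {"reads": 2, "writes": 2, "file_kb": 1024},      # local filesystem I/O
    "seed": 42,                                              # to make the synthesis reproducible
}
```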

mcopik avatar Mar 12 '24 15:03 mcopik

Can you also provide me with some resources and target files to start from?

octonawish-akcodes avatar Mar 12 '24 15:03 octonawish-akcodes

@octonawish-akcodes I'd look into what can be reused from the prior PR: https://github.com/spcl/serverless-benchmarks/pull/59/files

I wouldn't try to merge new updates into it as it's quite difficult. Instead, I'd cherry-pick some of the files you find useful.

mcopik avatar Mar 12 '24 17:03 mcopik

Hello @mcopik,

Thank you for outlining the specific benchmarks you're interested in: computation in FLOPS/instructions, memory allocation, storage read/write, and disk read/write. I've reviewed our current benchmark suite, and here is what I found:

  • Computation: Our workload/python/function.py benchmark measures computational performance by executing arithmetic operations on numpy arrays.

  • Memory Allocation: The memory/python/function.py benchmark is designed to evaluate memory allocation performance by timing numpy array allocations.

  • Storage Read/Write: The storage/python/function.py benchmark assesses storage operation speeds, focusing on read/write performance.

  • Disk Read/Write: While we don't have a direct benchmark for disk I/O, the disc/python/function.py script performs read/write operations with numpy arrays to and from disk, which might be useful for your disk I/O performance analysis.

Could you please provide more details on the specific improvements or additional metrics you're looking to incorporate? Currently I have a few ideas, but I would appreciate it if you have any additional sources I can look into.

MinhThieu145 avatar Mar 17 '24 00:03 MinhThieu145

@MinhThieu145 I think the best way forward would be to add a generator that accepts a simple config - CPU ops, memory ops, storage ops - and synthesizes a single Python function out of the components you just described. Do you think it's feasible?

I'd like to hear about other ideas you might have here :)
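
A minimal sketch of that generator idea, with simplified stand-in kernels (the real components would be reused from PR #59 and use the storage interface mentioned above):

```python
import time
from typing import Callable, Dict, List


# Simplified stand-ins for the component benchmarks; the actual CPU, memory,
# storage, and disk kernels would come from PR #59.
def cpu_kernel(iterations: int) -> None:
    acc = 0.0
    for i in range(iterations):
        acc += i * 0.5


def memory_kernel(size_mb: int) -> None:
    data = bytearray(size_mb * 1024 * 1024)
    for i in range(0, len(data), 4096):  # touch every page
        data[i] = 1


def synthesize(config: Dict) -> Callable:
    """Compose a single benchmark handler from the components selected in the config."""
    steps: List[Callable[[], None]] = []
    if "cpu" in config:
        steps.append(lambda: cpu_kernel(config["cpu"]["iterations"]))
    if "memory" in config:
        steps.append(lambda: memory_kernel(config["memory"]["size_mb"]))
    # storage and disk components would be appended here in the same way

    def handler(event):
        timings = []
        for step in steps:
            begin = time.monotonic()
            step()
            timings.append(time.monotonic() - begin)
        return {"timings": timings}

    return handler
```

Calling `synthesize({"cpu": {"iterations": 10_000_000}, "memory": {"size_mb": 64}})` would then return a handler ready to be used as the benchmark entry point.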

mcopik avatar Mar 19 '24 17:03 mcopik

@mcopik So did you mean creating simple functions that perform the operations you listed? Can't we reuse the functions proposed in PR #59?

octonawish-akcodes avatar Mar 20 '24 02:03 octonawish-akcodes

@octonawish-akcodes Yes, please feel free to reuse the code snippets.

@octonawish-akcodes @MinhThieu145 Since you are both interested in the issue, it might be beneficial to coordinate.

mcopik avatar Mar 20 '24 10:03 mcopik

Hi @mcopik ,

I'm leaning towards writing functions that are similar to the current, pre-written ones, rather than creating a dynamic generator. From my POV:

  • Current functions are reliable and give consistent results, which is crucial for benchmarks.
  • Easier to implement, since we already have similar functions.
  • We can add customizability by adding parameters to these functions; that way we can easily adjust their behavior, like changing the number of loops or the amount of data they handle, without rewriting them from scratch.

With the pre-written functions, here are some things that could be made dynamic, which I think would be helpful:

  • Loop Control: Introduce parameters to adjust the number of loops in a function, helping us test different levels of computational intensity.
  • Data Size Adjustment: Add parameters to change the size or type of data the functions work with, allowing us to test memory usage more effectively.
  • I/O Intensity: Implement parameters to vary the intensity of input/output operations, giving us a better view of storage and disk performance.
  • Combination Operations: Develop functions that can perform a mix of CPU, memory, and I/O operations, mirroring real-world application scenarios.

This way, we can make the pre-written functions more dynamic. Looking forward to your feedback and any further ideas.
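
To make the idea concrete, here is a small sketch of how an existing memory benchmark could expose such parameters (the function name and defaults are illustrative, not taken from the repository):

```python
import time

import numpy as np


def memory_benchmark(loops: int = 10, array_size: int = 1_000_000) -> float:
    """Allocate and touch numpy arrays `loops` times; return the elapsed seconds."""
    begin = time.monotonic()
    for _ in range(loops):
        data = np.zeros(array_size, dtype=np.float64)
        data += 1.0  # force the allocated pages to actually be written
    return time.monotonic() - begin
```

The same pattern (loop count, data size, I/O size) would apply to the storage and disk functions.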

MinhThieu145 avatar Mar 20 '24 14:03 MinhThieu145

Hi @octonawish-akcodes,

I totally agree with the idea of using the functions we already have. Right now, I'm trying out some different kinds of functions for the new serverless-benchmark issue mentioned here: SEBS New Serverless Benchmarks. But really, the main idea is the same as before.

I'm all for making the most of what we've got and seeing how we can adapt those functions to fit our new needs. Let's keep in touch about how the testing goes!

MinhThieu145 avatar Mar 20 '24 14:03 MinhThieu145

@MinhThieu145 Yes, we should reuse those functions. What I meant by the generator is that we should glue together the functions that already exist in the PR, and synthesize functions that combine different behaviors, e.g., a function that does compute, then some I/O accesses, etc.

It should be reproducible - if the user specifies the same config, they should receive exactly the same function and observe the same behavior :)
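
One way to get that reproducibility (an assumption on my part, not an agreed design) is to derive every random choice in the generator from a hash of the configuration itself, so the same config always yields the same seed:

```python
import hashlib
import json
import random


def rng_from_config(config: dict) -> random.Random:
    """Return a deterministic RNG seeded from a canonical form of the config."""
    canonical = json.dumps(config, sort_keys=True).encode()
    seed = int.from_bytes(hashlib.sha256(canonical).digest()[:8], "big")
    return random.Random(seed)
```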

mcopik avatar Mar 20 '24 15:03 mcopik

Thank you for your input, @mcopik. I've been exploring how functions work together and found a really helpful paper, ServerlessBench: ServerlessBench Paper

This paper dives deep into how serverless functions interact, which is just what we need for our project. Based on this and our existing setup, here's what I'm thinking:

Improving Our Benchmarks

We have four experiments in our toolkit right now (Current Experiments), but they don't fully cover how functions flow and work with each other. The ServerlessBench paper suggests focusing on areas like:

  • Communication Performance: This is about how well functions talk to each other and to other services, which is key for complex applications.
  • Startup Latency: Since serverless functions start on-demand, it's important to know how quickly they get going, especially when many functions start at once.
  • Stateless Execution: This looks at how the lack of saved state affects data sharing and performance.
  • Resource Efficiency and Isolation: It's crucial to use resources wisely and ensure that different functions or workloads don't interfere with each other.

These areas could really enhance how we measure and understand our benchmarks.

Bringing in Function Flows

ServerlessBench outlines two ways to orchestrate functions:

  • Nested Function Chain: This is similar to what I've done with AWS Step Functions, where one function's output directly influences the next.
  • Sequence Function Chain: This could add a fresh perspective, allowing functions to operate in order, but without depending directly on each other.

Ideas for Our Benchmarks

  • Thumbnail and Compression Workflow: We could start with creating a thumbnail (using the Thumbnailer benchmark) and then compress it (using the Compression benchmark). This mirrors a common process in handling media files; a rough sketch of the chain follows below.
  • Dynamic HTML and Uploader Workflow: First, generate HTML content using the 110.dynamic-html benchmark, and then upload it using the 120.uploader. This simulates creating and storing web content.
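
A rough local sketch of the first chain, with hypothetical stand-ins for the Thumbnailer and Compression handlers (on AWS, the composition would be expressed as a Step Functions state machine instead):

```python
def thumbnailer(event: dict) -> dict:
    # ... resize the image referenced by event["bucket"]/event["key"] ...
    return {"bucket": event["bucket"], "key": "thumbnails/" + event["key"]}


def compression(event: dict) -> dict:
    # ... compress the object produced by the previous step ...
    return {"bucket": event["bucket"], "key": event["key"] + ".zip"}


def workflow(event: dict) -> dict:
    # Nested chain: each step consumes the previous step's output.
    return compression(thumbnailer(event))
```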

Thinking About AWS Tools

  • AWS Step Functions: It's a powerful tool for managing function flows but adds complexity. It's worth a deeper look to see how it might fit into our benchmarks.
  • ECR and Docker Containers: Using ECR could help with large benchmarks like 411.image-recognition. We need to balance this with the need to manage containers in ECR carefully to avoid extra costs. Maybe using AWS CDK could help automate this, setting up and removing resources as needed.

I'm actively developing these concepts and would greatly value your insights, particularly regarding the use of Step Functions and ECR. If you have any additional resources or suggestions, please feel free to share. I’m eager to hear your perspective and incorporate your feedback into our ongoing work

MinhThieu145 avatar Mar 21 '24 20:03 MinhThieu145

@mcopik I raised PR #194 here, have a look.

octonawish-akcodes avatar Mar 22 '24 03:03 octonawish-akcodes

@MinhThieu145 Thanks - yes, I know the paper, and it complements our Middleware paper in some aspects.

We already have communication performance benchmarks (unmerged branch using FMI benchmarks), and the invocation-overhead benchmark covers startup latency. Regarding stateless execution and resource efficiency, I'm happy to hear proposals in these areas.

Workflows - we have a branch with results from a paper in submission, and I hope we will be able to merge it soon :) It supports Step Functions, Durable Functions, and Google Cloud Workflows. I don't think we have a workflow covering typical website use cases, but adding something like this could be a good idea; there are also similar ideas for website-based workflows in #140.

ECR and containers - this is a feature we definitely need, but we should also support it on other platforms where possible (Azure also supports this).

mcopik avatar Mar 22 '24 11:03 mcopik

@mcopik So, to give a small recap, we need the following types of computation:

  1. CPU Computation - Functions which make heavy use of the CPU, e.g., MMM (matrix-matrix multiplication) with specified sizes.
  2. GPU Computation - We could use ML training or Torch tensor multiplication on the GPU. The config file could specify the model to be used or the size of the tensors to be multiplied.
  3. Memory Allocation - The memory/python/function.py benchmark measures it by allocating numpy arrays.
  4. Disk Read/Write - We could dynamically generate some random text, write it to disk, then read it back and measure the speed.

Since most of these are already implemented, we could add support for a config which lets you select, for example, how many loops you want for the MMM, or which gives more fine-grained control. Any suggestions?

As for your suggestion to have a single config file: when you say CPU ops, do you mean the number of FLOPs we perform; by memory ops, the amount of data we store and use in RAM; and by storage ops, the number of bytes we read and write from disk? For example, given a specific config file, would we generate a Python script with three function calls inside, one for CPU ops, one for memory ops, and one for storage ops? Example: fun1(input1) // CPU intensive, fun2(input2) // memory intensive, fun3(input3) // disk intensive.
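
To illustrate the CPU part of that mapping (my own sketch, not a fixed design): a FLOP budget from the config can be translated into a matrix size, since an n x n matrix-matrix multiplication performs roughly 2 * n^3 floating-point operations:

```python
import numpy as np


def cpu_intensive(flop_budget: int) -> None:
    """Run one dense matmul sized to roughly match the requested FLOP budget."""
    n = max(1, round((flop_budget / 2) ** (1 / 3)))
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    np.matmul(a, b)  # ~2 * n^3 floating-point operations
```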

entiolliko avatar Mar 26 '24 10:03 entiolliko

@entiolliko @octonawish-akcodes @MinhThieu145 Linear algebra as the CPU workload is a good idea; we can use LAPACK for that, and it can be quite flexible. I'd put GPU as the next feature, as it's a different category.

Yes, that is what I mean. I think the ideal result would be a single serverless function that performs the computations specified in the configuration (and which can, of course, be composed of many local functions).
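
For the linear-algebra CPU component, a minimal sketch using LAPACK through SciPy (assuming SciPy is available in the benchmark environment; the choice of routine, here an LU factorization, is just one example of a flexible kernel):

```python
import numpy as np
from scipy.linalg import lapack


def cpu_kernel_lapack(n: int, repetitions: int = 1) -> None:
    """Factorize a random n x n matrix with LAPACK's dgetrf (O(n^3) work per repetition)."""
    matrix = np.random.rand(n, n)
    for _ in range(repetitions):
        lu, piv, info = lapack.dgetrf(matrix)
        assert info == 0  # 0 means the factorization succeeded
```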

mcopik avatar Mar 27 '24 17:03 mcopik