Questions about performance testing
Hello, I encountered some issues while conducting performance testing. I reviewed previous issues, but they did not resolve my problem. Could you please help me with a detailed explanation? I would greatly appreciate it.
I have implemented a simple HTTP proxy with both Nginx (OpenResty and Nginx-Rust) and Pingora. Below is the Pingora code, based on the [modify_response] example:
use async_trait::async_trait;
use clap::Parser;
use pingora_core::server::configuration::Opt;
use pingora_core::server::Server;
use pingora_core::upstreams::peer::HttpPeer;
use pingora_core::Result;
use pingora_proxy::{ProxyHttp, Session};
use std::net::ToSocketAddrs;

pub struct Json2Yaml {
    addr: std::net::SocketAddr,
}

// per-request context, as in the modify_response example
pub struct MyCtx {
    buffer: Vec<u8>,
}

#[async_trait]
impl ProxyHttp for Json2Yaml {
    type CTX = MyCtx;

    fn new_ctx(&self) -> Self::CTX {
        MyCtx { buffer: vec![] }
    }

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        // plain HTTP to the upstream, no TLS and therefore no SNI
        let peer = Box::new(HttpPeer::new(self.addr, false, "".to_string()));
        Ok(peer)
    }
}

fn main() {
    env_logger::init();
    let opt = Opt::parse();
    let mut my_server = Server::new(Some(opt)).unwrap();
    my_server.bootstrap();

    let mut my_proxy = pingora_proxy::http_proxy_service(
        &my_server.configuration,
        Json2Yaml {
            // hardcode the IP of ip.jsontest.com for now
            addr: ("172.24.1.1", 80)
                .to_socket_addrs()
                .unwrap()
                .next()
                .unwrap(),
        },
    );

    my_proxy.add_tcp("0.0.0.0:6191");
    my_server.add_service(my_proxy);
    my_server.run_forever();
}
config:
---
version: 1
threads: 8
My testing was conducted on an Ubuntu system with 8 cores and 16 GB of memory. Nginx started 8 worker processes.
1. Using wrk for testing:
wrk -t10 -c1000 -d30s http://172.24.1.2:6191
The result of Nginx:
Thread Stats Avg Stdev Max +/- Stdev
Latency 206.88ms 307.06ms 1.88s 81.37%
Req/Sec 3.02k 1.07k 9.78k 74.21%
903397 requests in 30.10s, 4.27GB read
Socket errors: connect 0, read 0, write 0, timeout 748
Requests/sec: 30014.11
Transfer/sec: 145.21MB
Total CPU usage is around 50%, and the memory usage of each worker is negligible.
The result of Pingora:
Thread Stats Avg Stdev Max +/- Stdev
Latency 180.33ms 288.71ms 1.81s 83.00%
Req/Sec 2.99k 0.87k 5.78k 67.27%
893573 requests in 30.02s, 4.22GB read
Socket errors: connect 0, read 0, write 0, timeout 795
Requests/sec: 29766.67
Transfer/sec: 144.01MB
Total CPU usage is around 70%, and memory usage increases after each test run and is not released (0% -> 0.9% -> 1.2%).
- Q1: In terms of throughput, Nginx performs slightly better than Pingora, while Pingora shows slightly lower latency than Nginx. (Isn't that a bit strange?) Overall, the differences between the two are not significant. Does this align with your expectations?
- Q2: In terms of CPU usage, Pingora's overhead is significantly higher than Nginx's. Is this in line with your expectations? Regarding memory, I've noticed that memory usage increases after each test and does not recover. Could this indicate a memory leak?
2. Using ab for testing:
ab -n 10000 -c 100 http://172.24.1.2:6191/
When I perform testing with ab, Pingora times out:
Benchmarking 172.24.19.185 (be patient)
apr_pollset_poll: The timeout specified has expired (70007)
The packet capture shows that a GET request was sent at the beginning, but Pingora never returned a response.
Nginx can be tested normally using the same command, and the packet capture shows that it responded properly.
ab uses HTTP/1.0, but I verified that this is not the cause of the problem.
Additionally, I also used Siege for testing, and the results were similar to those obtained with wrk.
3. Summary
Pingora is a remarkable project, and I’m very interested in its potential improvements over Nginx. However, I would like to know:
- Am I missing any configuration, or what can I do to improve performance and reduce CPU and memory usage?
- Is it unfair to compare Pingora with Nginx in such a simple scenario? In other words, is Pingora's advantage more apparent in more complex scenarios? (If so, I will use Pingora in more complex scenarios.)
I really appreciate your support.
Maybe you can try increasing upstream_keepalive_pool_size to 1000, and setting tcp_keepalive in the peer options?
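Roughly something like this, as a sketch only (the exact paths and PeerOptions / TcpKeepalive fields may differ between pingora versions, so double-check against the one you're using): upstream_keepalive_pool_size goes in the server config YAML, and the TCP keepalive is set on the peer inside upstream_peer() in your ProxyHttp impl:

use std::time::Duration;
// path assumed from the version I have locally
use pingora_core::protocols::l4::ext::TcpKeepalive;

// server config YAML also gets: upstream_keepalive_pool_size: 1000

async fn upstream_peer(
    &self,
    _session: &mut Session,
    _ctx: &mut Self::CTX,
) -> Result<Box<HttpPeer>> {
    let mut peer = Box::new(HttpPeer::new(self.addr, false, "".to_string()));
    // field names assumed; adjust to whatever PeerOptions/TcpKeepalive expose in your version
    peer.options.tcp_keepalive = Some(TcpKeepalive {
        idle: Duration::from_secs(60),
        interval: Duration::from_secs(5),
        count: 5,
    });
    Ok(peer)
}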
Thank you for your response! I seem to have discovered some issues:
Initially, my server was configured for short connections (with keepalive_timeout set to 0), and under those conditions, Pingora did not perform well. Later, I tested the server with long connections, and Pingora demonstrated its advantages. I also tested the configuration changes as you suggested. The detailed results are as follows:
Nginx test results are as follows:
Thread Stats Avg Stdev Max +/- Stdev
Latency 260.76ms 434.25ms 7.20s 84.93%
Req/Sec 3.07k 1.20k 7.16k 73.84%
909551 requests in 30.02s, 4.30GB read
Requests/sec: 30296.15
Transfer/sec: 146.75MB
CPU: 49%
The Pingora test results before applying your suggested changes are as follows:
Thread Stats Avg Stdev Max +/- Stdev
Latency 98.75ms 190.03ms 3.43s 90.47%
Req/Sec 4.95k 1.34k 11.83k 74.45%
1475976 requests in 30.03s, 6.97GB read
Requests/sec: 49156.43
Transfer/sec: 237.83MB
CPU: 80%. Memory still increases after each test and is not released.
The Pingora test results after applying your suggested changes are as follows:
Thread Stats Avg Stdev Max +/- Stdev
Latency 72.02ms 126.64ms 3.20s 88.49%
Req/Sec 5.15k 1.39k 11.51k 73.82%
1534099 requests in 30.10s, 7.25GB read
Requests/sec: 50968.27
Transfer/sec: 246.61MB
In summary, thanks for the response; it has resolved some of my issues. However, the memory increase and other problems still persist. I will continue to monitor this.
Hey! I've been trying to debug ever-increasing memory utilization in our Pingora proxy service (an HTTP proxy with TLS and h2), which matches what is described in this issue and in https://github.com/cloudflare/pingora/issues/447, so other Pingora users seem to have similar problems.
I can easily reproduce the issue with k6 load tests: at the start of the test, memory utilization increases quickly, and hours after the test it remains high. It keeps growing indefinitely until the service goes OOM, or until we restart it. In the graph I captured, the load test ran for 5 minutes at ~20:00. This is on an AWS ECS Fargate service with 0.5 vCPU and 1 GB of memory.
First I tried to see whether we had introduced a memory leak in our own code, but if we did, I haven't been able to find it. I've tried valgrind memcheck with leak detection, as well as valgrind massif for heap profiling.
Then I tried to figure out whether some connection pool in Pingora was ever-growing. The service is behind an AWS network load balancer, and its metrics show that downstream connections are not held open, so I don't believe that is the cause. I also tried to disable the upstream connection pool as instructed in https://github.com/cloudflare/pingora/blob/main/docs/user_guide/pooling.md, but the default size of that pool is 128, so it shouldn't be able to grow without bound and drive the service OOM. In any case, disabling it and re-running the test did not resolve the ever-growing memory.
To summarize, I realize this is most likely an error on our end, since I know you run Pingora in production yourselves, and I assume you don't have this problem. However, perhaps you have seen this behavior before? Do you have any recommendations for what config I might tweak to resolve it? Any advice is highly appreciated, but I fully understand if you don't have time to help me with this. I'll tag you for visibility @drcaramelsyrup, apologies in advance!
If you have time to take a look, here is our setup code:
pub fn start() {
    std::thread::spawn(|| {
        // don't drop the rt
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(async move {
            setup_tracing(tracing_subscriber::registry());
            info!("started tracing subscriber with otel exporter in tokio rt");
            // keep this runtime running, so that the otel exporter keeps running
            std::future::pending::<()>().await;
        });
    });
    let args = Args::parse();
    info!(args = ?args, "Starting proxy");
    let mut server = Server::new(None).unwrap();
    server.bootstrap();
    // Attach the Prometheus service
    let mut prometheus_service = Service::prometheus_http_service();
    let prometheus_address = format!("0.0.0.0:{}", args.metrics_port);
    info!("Serving prometheus metrics on address {prometheus_address}");
    prometheus_service.add_tcp(&prometheus_address);
    server.add_service(prometheus_service);
    // XXX: is it fine to just have a runtime like that ?
    // It might mess up with the autoreload feature of Pingora, but I don't think we're going
    // to use that.
    let rt = tokio::runtime::Runtime::new().unwrap();
    let aws_config = rt.block_on(
        aws_config::defaults(BehaviorVersion::latest())
            .timeout_config(
                TimeoutConfig::builder()
                    // Increase the connection timeout
                    // See https://github.com/awslabs/aws-sdk-rust/issues/871#issuecomment-1690842996
                    .connect_timeout(Duration::from_secs(10))
                    .build(),
            )
            .load(),
    );
    let conn_opts = PgConnectOptions::new()
        .host(&args.db.host)
        .port(args.db.port)
        .username(&args.db.user)
        .password(&args.db.password)
        .database(&args.db.name);
    let pool = rt
        .block_on(sqlx::Pool::connect_with(conn_opts))
        .expect("connect to postgres");
    let db = Database::new(pool);
    // HTTPS server
    let tls_resolver = Box::new(TlsResolver::new(
        db.clone(),
        args.wildcard_fqdn,
        Arc::new(ChainAndPrivateKey::new(
            args.cert.wildcard_cert_full_chain,
            args.cert.wildcard_cert_private_key,
        )),
        Duration::from_secs(args.certificates_ttl_seconds),
    ));
    let host_resolver = CachingHostResolver::new(
        CloudmapHostResolver::new(ServiceDiscoveryClient::new(&aws_config), db),
        Duration::from_secs(args.resolver_ttl_seconds),
    );
    let mut proxy = pingora::prelude::http_proxy_service(
        &server.configuration,
        EcsProxy::new(args.user_app_port, host_resolver, args.max_rps),
    );
    let proxy_address = format!("0.0.0.0:{}", args.proxy_port);
    info!("Running proxy with TLS on address {proxy_address}");
    let mut tls_settings = TlsSettings::with_callbacks(tls_resolver).unwrap();
    tls_settings.enable_h2();
    proxy.add_tls_with_settings(&proxy_address, None, tls_settings);
    server.add_service(proxy);
    server.run_forever();
}
Try using tikv-jemallocator. It helped me reduce memory usage growth in cases involving a large number of new upstream connections. I think this improvement is related to reduced memory fragmentation.
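In case it helps: wiring jemalloc in is just the standard global-allocator swap (a minimal sketch; the crate version below is only an example, add whatever current tikv-jemallocator release you prefer to Cargo.toml):

// Cargo.toml: tikv-jemallocator = "0.5"
use tikv_jemallocator::Jemalloc;

// Route all heap allocations in this binary through jemalloc instead of the
// system allocator; jemalloc tends to fragment less under heavy connection churn.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;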
@SsuiyueL @oddgrd @ermakov-oleg @github2023spring
I keep running different benchmarks, and it seems Pingora performs best when it does not use all cores (i.e. not the "0 threads" config). For example, if the server has 8 cores, it is best to set 6 threads; the req/s results are much higher. It looks like Pingora needs some free resources for other background tasks.
Server: 8-core amd64, AlmaLinux 9, disk cache. Benchmark results:
All 8 cores ("0 threads" config): 1,283,569 successful requests in 60s, 30.91 GB received
6 cores ("6 threads" config): 1,830,414 successful requests in 60s, 44.19 GB received
I test with https://loader.io/, using the Pingap web server, which is based on Pingora and created by @vicanso.
Hi @SsuiyueL, how did you add the tcp_keepalive option? I am using the code below to set keepalive, but when I send a request to the server, it never responds. It works fine with session.set_keepalive(None);
use async_trait::async_trait;
use pingora::{
    prelude::{HttpPeer, Opt},
    server::Server,
    Error,
};
use pingora_http::ResponseHeader;
use pingora_proxy::{ProxyHttp, Session};

pub struct MyProxy {}

pub struct MyCtx {}

#[async_trait]
impl ProxyHttp for MyProxy {
    type CTX = MyCtx;

    fn new_ctx(&self) -> Self::CTX {
        MyCtx {}
    }

    async fn request_filter(
        &self,
        session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<bool, Box<Error>> {
        session.set_keepalive(Some(75));
        let header = ResponseHeader::build(200, None).unwrap();
        session
            .write_response_header(Box::new(header), true)
            .await?;
        session
            .write_response_body(Some(bytes::Bytes::from_static(b"Hello Dakia!")), true)
            .await?;
        Ok(true)
    }

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>, Box<Error>> {
        let addr = ("127.0.0.1", 3000);
        let peer = Box::new(HttpPeer::new(addr, false, "one.one.one.one".to_string()));
        Ok(peer)
    }
}

fn main() {
    // Read command line arguments
    let mut opt = Opt::parse_args();
    opt.conf = Some("./pingora.conf.yaml".to_string());
    let mut my_server = Server::new(opt).unwrap();
    my_server.bootstrap();
    let mut my_proxy: pingora::services::listening::Service<pingora_proxy::HttpProxy<MyProxy>> =
        pingora_proxy::http_proxy_service(&my_server.configuration, MyProxy {});
    my_proxy.add_tcp("0.0.0.0:8080");
    my_server.add_service(my_proxy);
    println!("Started Server on port 8080");
    my_server.run_forever();
}
Config
threads: 8
daemon: false
upstream_keepalive_pool_size: 1000