Playing around with a simple benchmarking workload. It's a highly simplified generic order processing workload. We read a random customer from the customer table, then pick between 1 and 10 products for them to order, reading each product from the product table. Finally we create records in the orders and orders_detail tables. The workload is generated by a custom multithreaded Java app that I wrote.
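To make the shape of one transaction concrete, here is a minimal sketch of the client side. It is not the actual app: the class and method names (OrderWorkloadSketch, buildOrder, runWorkload) are made up for illustration, and the database reads and inserts are replaced by plain in-memory random picks, so it only shows the "random customer, 1 to 10 products, N worker threads" structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class OrderWorkloadSketch {
    static final int CUSTOMERS = 1000; // table sizes used in this test
    static final int PRODUCTS = 800;

    /** One simulated order: a customer id plus 1 to 10 product ids. */
    static final class Order {
        final int customerId;
        final List<Integer> productIds;

        Order(int customerId, List<Integer> productIds) {
            this.customerId = customerId;
            this.productIds = productIds;
        }
    }

    // Stand-in for one transaction. A real client would SELECT the customer
    // and each product, then INSERT into orders and orders_detail.
    static Order buildOrder() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int customerId = rnd.nextInt(1, CUSTOMERS + 1); // 1..1000
        int lineCount = rnd.nextInt(1, 11);             // 1..10 products
        List<Integer> productIds = new ArrayList<>(lineCount);
        for (int i = 0; i < lineCount; i++) {
            productIds.add(rnd.nextInt(1, PRODUCTS + 1)); // 1..800
        }
        return new Order(customerId, productIds);
    }

    // Runs the same loop on `threads` workers and returns the number of
    // orders completed. In the real test each worker would hold its own
    // database connection.
    static long runWorkload(int threads, int ordersPerThread) {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < ordersPerThread; i++) {
                    buildOrder();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("orders completed: " + runWorkload(4, 1000));
    }
}
```

One thread per database connection keeps the thread count and the connection count in lockstep, which is how the thread counts below should be read.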
For this test we have a small number of customers (1000) and products (800), which means our reads are 100% cached in RAM. If they are not in the Postgres buffers they will be in the filesystem buffer on this system. Disk is almost never accessed except when Postgres needs to flush stuff out to disk. The machine is a quad core with 16 GB of RAM.
IO is not the limiting factor here. CPU load is split between the Java client and the Postgres processes.
What we see above is that our max throughput per thread is hit when the number of threads equals the number of processor cores. However, the max total system throughput is hit when we have one more Java thread/database connection than processor cores.
I repeated the whole cycle a second time to see if the results held. Overall there was a slight increase in throughput across the board, but the peaks still held: max per-thread throughput at # of processors, and max total throughput at # of processors plus one.
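The two peaks can sit at different thread counts because per-thread throughput is just total throughput divided by the number of threads. A small sketch of that arithmetic follows. The numbers in it are hypothetical placeholders chosen only to show the effect, not the measurements from this test, and the class and method names are made up.

```java
public class ThroughputPeaks {
    /** Index of the largest value in the array. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    /** total[i] is the total throughput measured with i + 1 threads. */
    static double[] perThread(double[] total) {
        double[] result = new double[total.length];
        for (int i = 0; i < total.length; i++) {
            result[i] = total[i] / (i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL totals for 1..6 threads, not data from this benchmark.
        double[] total = {100, 200, 300, 520, 560, 540};
        double[] each = perThread(total);

        System.out.println("per-thread peak at " + (argMax(each) + 1) + " threads");
        System.out.println("total peak at " + (argMax(total) + 1) + " threads");
        // prints:
        // per-thread peak at 4 threads
        // total peak at 5 threads
    }
}
```

With these placeholder values, going from 4 to 5 threads still raises the total (520 to 560) while the share per thread drops (130 to 112), which is the same pattern as the cores versus cores-plus-one result.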
One item of note is the consistently large jump when going from 3 to 4 threads. It's very nonlinear, and as of right now I cannot explain it.