Object Storage/S3 users, how is your I/O performance?

Fullmental

Active member
We're currently in the process of migrating away from VPS attached storage because our storage costs have outgrown the cost of the rest of our resources by a factor of 6-8.

We are trying to figure out what I/O performance expectations are reasonable for S3 storage vs. block storage. Our host's block storage is $0.10/GB/month, but S3 storage through Backblaze B2 is only $0.005/GB/month, literally 95% cheaper. As we are entirely funded by donations from our members, we'd like to avoid extra cost where possible, but we did note during a test of S3 storage that attachment fetching in particular seemed rather slow. We also aren't 100% sure what bandwidth and transaction charges will be in normal operation, but we don't expect them to come close to block storage costs.

Curious what folks here have done in the same situation to improve speed, or whether they just live with the slower I/O? We're looking into a way to get the content served via CDN, but it doesn't seem straightforward with Cloudflare, our current CDN provider, and of course all the guides we've found use outdated configurations based on an older version of the Cloudflare dashboard...

Any advice or anecdotes are welcome.

Thank you.
 
Have you considered a dedicated server (as opposed to a VM) where bulk storage (even SSD) is often far cheaper than it is on VMs?
 
Have you considered a dedicated server (as opposed to a VM) where bulk storage (even SSD) is often far cheaper than it is on VMs?

Let me put it to you this way: our entire VPS cost, without storage factored in, is $25 a month, and that includes managed backups, simple 1-click node scaling, instant cloning for test purposes, and a bunch of other QoL options. If we were to even consider dedicated servers, that quickly balloons to at least $70 a month for an entirely self-managed instance with similar CPU/RAM and a 1TB hard drive. We don't have any full-time staff to manage the additional tasks, and our site is entirely donation funded, so it would either put extra time/financial strain on volunteers or we'd have to roughly double that price for managed hosting. It quickly becomes a nonviable option.
 
I see your predicament then. I can't offer much advice on S3 provider comparisons, unfortunately.
 
This is what I did:
 
For Cloudflare, check out @digitalpoint's addon. The CDN part itself is pretty straightforward. However, if you use attachment permissions or care about accurate hit metrics, you simply can't serve attachments through a CDN, since cached responses bypass permission checks and hit counting.
 
We are trying to figure out what I/O performance expectations are reasonable for S3 storage vs. block storage. Our host's block storage is $0.10/GB/month, but S3 storage through Backblaze B2 is only $0.005/GB/month, literally 95% cheaper. As we are entirely funded by donations from our members, we'd like to avoid extra cost where possible, but we did note during a test of S3 storage that attachment fetching in particular seemed rather slow. We also aren't 100% sure what bandwidth and transaction charges will be in normal operation, but we don't expect them to come close to block storage costs.
One problem with S3 is you get charged for more than just storage. It's storage + requests + data transfer (bandwidth), and on top of that, all of it varies depending on the region you're in.
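To put rough numbers on it, here's a back-of-the-envelope sketch in shell. The rates below are placeholder assumptions loosely based on B2's published pricing (storage at the $0.005/GB/month quoted above, plus per-GB download and per-request charges), so check your provider's current price sheet before relying on them:

Bash:
# Rough monthly cost estimate for S3-style storage.
# All rates are placeholder assumptions - verify against your
# provider's current price sheet.
STORAGE_GB=500          # total data stored
STORAGE_RATE=0.005      # $/GB/month storage (the B2 rate quoted above)
EGRESS_GB=200           # data downloaded per month
EGRESS_RATE=0.01        # $/GB download (placeholder)
REQUESTS=1000000        # read/download requests per month
REQUEST_RATE=0.0000004  # $/request (placeholder, i.e. $0.004 per 10,000)

echo "$STORAGE_GB*$STORAGE_RATE + $EGRESS_GB*$EGRESS_RATE + $REQUESTS*$REQUEST_RATE" | bc
# prints 4.9000000 -> roughly $4.90/month in this example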

If you want to go the route of cloud-based storage, I'd definitely take a look at Cloudflare's R2 service. There are no data transfer costs, no variable pricing based on region, no minimum billed amount, and a pretty generous free tier: you can use it for free for up to 10GB. Beyond 10GB, it's $0.015 per GB per month (again, no other costs like bandwidth usage to worry about, as you have with S3).

See this post from the other day where someone shared a screenshot: 121GB worth of data stored on R2 cost them a total of $1.68 per month.
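For what it's worth, that figure lines up with the quoted rates: (121GB − 10GB free) × $0.015/GB ≈ $1.67/month, with no separate bandwidth line item.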

Also, to answer your original question about the CDN: R2 integrates nicely into Cloudflare's CDN system for the public-facing stuff (things like avatars and attachment thumbnails), so that's taken care of too as a bonus.
 
Yes, multiple folks have suggested the Cloudflare addon! We didn't even know R2 was a thing; we're already using Cloudflare, but we set it up many years ago, before R2 existed. We're looking into it now. Thank you very much for all the suggestions!
 
If you want to go the route of cloud-based storage, I'd definitely take a look at Cloudflare's R2 service. There are no data transfer costs, no variable pricing based on region, no minimum billed amount, and a pretty generous free tier: you can use it for free for up to 10GB. Beyond 10GB, it's $0.015 per GB per month (again, no other costs like bandwidth usage to worry about, as you have with S3).

Yup, I'm using Cloudflare R2 storage more these days myself :D Updated all my backup scripts to support R2, including my AWS CLI configuration tool https://awscli-get.centminmod.com/ :)

Curious what folks here have done in the same situation to improve speed, or whether they just live with the slower I/O?

Local caching to speed up S3 I/O: I use Cloudflare R2 with a local JuiceFS mount and sqlite3 local metadata caching, though you can use other local databases to store the metadata: https://github.com/centminmod/centminmod-juicefs

JuiceFS implements an architecture that separates "data" and "metadata" storage. When using JuiceFS to store data, the data itself is persisted in object storage (e.g., Amazon S3, OpenStack Swift, Ceph, Azure Blob or MinIO), and the corresponding metadata can be persisted in various databases (Metadata Engines) such as Redis, Amazon MemoryDB, MariaDB, MySQL, TiKV, etcd, SQLite, KeyDB, PostgreSQL, BadgerDB, or FoundationDB.
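For reference, a minimal sketch of what that R2 + sqlite3 setup looks like. The bucket URL, account ID, keys, paths and cache size are all placeholders (the exact config I use is in the repo linked above):

Bash:
# Create the JuiceFS filesystem: file data goes to the R2 bucket,
# metadata goes to a local sqlite3 database.
juicefs format --storage s3 \
    --bucket https://myjuicefs.<accountid>.r2.cloudflarestorage.com \
    --access-key $R2_ACCESS_KEY \
    --secret-key $R2_SECRET_KEY \
    sqlite3:///home/juicefs/myjuicefs.db myjuicefs

# Mount it with a local disk cache so hot data is served from local SSD
# instead of R2 (--cache-size is in MiB).
juicefs mount -d \
    --cache-dir /home/juicefs_cache \
    --cache-size 10240 \
    sqlite3:///home/juicefs/myjuicefs.db /home/juicefs_mount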

JuiceFS benchmarks with Cloudflare R2 storage locally mounted on a server with 2x 240GB SSDs in RAID 1, rated at ~400-450MB/s :)

Bash:
juicefs bench -p 4 /home/juicefs_mount/                    
  Write big blocks count: 4096 / 4096 [======================================================]  done  
   Read big blocks count: 4096 / 4096 [======================================================]  done  
Write small blocks count: 400 / 400 [========================================================]  done  
 Read small blocks count: 400 / 400 [========================================================]  done  
  Stat small files count: 400 / 400 [========================================================]  done  
Benchmark finished!
BlockSize: 1 MiB, BigFileSize: 1024 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 4
Time used: 29.5 s, CPU: 51.7%, Memory: 1317.1 MiB
+------------------+------------------+---------------+
|       ITEM       |       VALUE      |      COST     |
+------------------+------------------+---------------+
|   Write big file |     253.86 MiB/s |  16.13 s/file |
|    Read big file |     418.69 MiB/s |   9.78 s/file |
| Write small file |    312.3 files/s | 12.81 ms/file |
|  Read small file |   5727.4 files/s |  0.70 ms/file |
|        Stat file |  29605.6 files/s |  0.14 ms/file |
|   FUSE operation | 71271 operations |    1.95 ms/op |
|      Update meta |  1289 operations |   74.78 ms/op |
|       Put object |   204 operations | 1214.46 ms/op |
|       Get object |   143 operations | 1032.30 ms/op |
|    Delete object |     0 operations |    0.00 ms/op |
| Write into cache |  1567 operations | 1808.73 ms/op |
|  Read from cache |  1286 operations |   62.66 ms/op |
+------------------+------------------+---------------+

FIO tests with pre-warmed local metadata caching

Pre-warmed cache directory fio test
Code:
ls -lah /home/juicefs_mount/fio                             
total 4.1G
drwxr-xr-x 2 root root 4.0K May 26 01:23 .
drwxrwxrwx 3 root root 4.0K May 26 01:15 ..
-rw-r--r-- 1 root root 1.0G May 26 01:16 sequential-read.0.0
-rw-r--r-- 1 root root 1.0G May 26 01:20 sequential-read.1.0
-rw-r--r-- 1 root root 1.0G May 26 01:24 sequential-read.2.0
-rw-r--r-- 1 root root 1.0G May 26 01:23 sequential-read.3.0
warmup
Code:
juicefs warmup -p 2 /home/juicefs_mount/fio                
Warmed up paths count: 1 / 1 [==============================================================]  done  
2022/05/26 01:38:00.362641 juicefs[45285] <INFO>: Successfully warmed up 1 paths [warmup.go:209]

FIO benchmark :)
Bash:
fio --name=sequential-read --directory=/home/juicefs_mount/fio --rw=read --refill_buffers --bs=4M --size=1G --numjobs=4

sequential-read: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
...
fio-3.7
Starting 4 processes
Jobs: 4 (f=4)
sequential-read: (groupid=0, jobs=1): err= 0: pid=47804: Thu May 26 01:38:12 2022
   read: IOPS=179, BW=716MiB/s (751MB/s)(1024MiB/1430msec)
    clat (usec): min=1688, max=15592, avg=5571.03, stdev=1390.95
     lat (usec): min=1689, max=15592, avg=5572.39, stdev=1390.89
    clat percentiles (usec):
     |  1.00th=[ 2278],  5.00th=[ 3884], 10.00th=[ 4359], 20.00th=[ 4621],
     | 30.00th=[ 4948], 40.00th=[ 5276], 50.00th=[ 5473], 60.00th=[ 5669],
     | 70.00th=[ 5932], 80.00th=[ 6325], 90.00th=[ 6783], 95.00th=[ 7439],
     | 99.00th=[ 9241], 99.50th=[14615], 99.90th=[15533], 99.95th=[15533],
     | 99.99th=[15533]
   bw (  KiB/s): min=704512, max=720896, per=24.30%, avg=712704.00, stdev=11585.24, samples=2
   iops        : min=  172, max=  176, avg=174.00, stdev= 2.83, samples=2
  lat (msec)   : 2=0.78%, 4=4.69%, 10=93.75%, 20=0.78%
  cpu          : usr=0.14%, sys=46.61%, ctx=2730, majf=0, minf=1055
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=256,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
sequential-read: (groupid=0, jobs=1): err= 0: pid=47805: Thu May 26 01:38:12 2022
   read: IOPS=180, BW=721MiB/s (756MB/s)(1024MiB/1420msec)
    clat (usec): min=2722, max=12203, avg=5530.93, stdev=1193.63
     lat (usec): min=2723, max=12204, avg=5532.24, stdev=1193.64
    clat percentiles (usec):
     |  1.00th=[ 3490],  5.00th=[ 4080], 10.00th=[ 4359], 20.00th=[ 4686],
     | 30.00th=[ 4948], 40.00th=[ 5145], 50.00th=[ 5407], 60.00th=[ 5604],
     | 70.00th=[ 5866], 80.00th=[ 6128], 90.00th=[ 6849], 95.00th=[ 7635],
     | 99.00th=[11994], 99.50th=[12125], 99.90th=[12256], 99.95th=[12256],
     | 99.99th=[12256]
   bw (  KiB/s): min=696320, max=737280, per=24.44%, avg=716800.00, stdev=28963.09, samples=2
   iops        : min=  170, max=  180, avg=175.00, stdev= 7.07, samples=2
  lat (msec)   : 4=3.52%, 10=95.31%, 20=1.17%
  cpu          : usr=0.00%, sys=47.71%, ctx=2751, majf=0, minf=1054
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=256,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
sequential-read: (groupid=0, jobs=1): err= 0: pid=47806: Thu May 26 01:38:12 2022
   read: IOPS=179, BW=716MiB/s (751MB/s)(1024MiB/1430msec)
    clat (usec): min=1880, max=13391, avg=5570.19, stdev=1200.55
     lat (usec): min=1881, max=13393, avg=5571.52, stdev=1200.50
    clat percentiles (usec):
     |  1.00th=[ 2540],  5.00th=[ 4113], 10.00th=[ 4424], 20.00th=[ 4752],
     | 30.00th=[ 5014], 40.00th=[ 5211], 50.00th=[ 5473], 60.00th=[ 5735],
     | 70.00th=[ 5997], 80.00th=[ 6259], 90.00th=[ 6849], 95.00th=[ 7177],
     | 99.00th=[ 8717], 99.50th=[12387], 99.90th=[13435], 99.95th=[13435],
     | 99.99th=[13435]
   bw (  KiB/s): min=688128, max=737280, per=24.30%, avg=712704.00, stdev=34755.71, samples=2
   iops        : min=  168, max=  180, avg=174.00, stdev= 8.49, samples=2
  lat (msec)   : 2=0.39%, 4=3.52%, 10=95.31%, 20=0.78%
  cpu          : usr=0.56%, sys=46.61%, ctx=2806, majf=0, minf=1055
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=256,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
sequential-read: (groupid=0, jobs=1): err= 0: pid=47807: Thu May 26 01:38:12 2022
   read: IOPS=179, BW=719MiB/s (754MB/s)(1024MiB/1425msec)
    clat (usec): min=2478, max=11410, avg=5550.24, stdev=1014.45
     lat (usec): min=2480, max=11411, avg=5551.59, stdev=1014.37
    clat percentiles (usec):
     |  1.00th=[ 3392],  5.00th=[ 4146], 10.00th=[ 4424], 20.00th=[ 4817],
     | 30.00th=[ 5080], 40.00th=[ 5276], 50.00th=[ 5473], 60.00th=[ 5669],
     | 70.00th=[ 5866], 80.00th=[ 6259], 90.00th=[ 6718], 95.00th=[ 7111],
     | 99.00th=[ 8225], 99.50th=[ 9241], 99.90th=[11469], 99.95th=[11469],
     | 99.99th=[11469]
   bw (  KiB/s): min=720896, max=761856, per=25.28%, avg=741376.00, stdev=28963.09, samples=2
   iops        : min=  176, max=  186, avg=181.00, stdev= 7.07, samples=2
  lat (msec)   : 4=4.30%, 10=95.31%, 20=0.39%
  cpu          : usr=0.14%, sys=46.98%, ctx=2771, majf=0, minf=1054
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=256,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=2864MiB/s (3003MB/s), 716MiB/s-721MiB/s (751MB/s-756MB/s), io=4096MiB (4295MB), run=1420-1430msec

JuiceFS has their own benchmarks too at https://juicefs.com/docs/community/benchmark/

JuiceFS architecture overview https://juicefs.com/docs/community/architecture

 
FYI, I set up JuiceFS with Cloudflare R2 S3 object storage on my other server, which has 2x 960GB NVMe in RAID 1.

JuiceFS allows you to shard the R2 buckets used for storage for better performance, which seems to have helped a bit, at least for big-file reads and 1MB big-file writes :) PUT/GET object latencies are still relatively slow from my Dallas server due to the R2 locations available, but still adequate for my needs so far :D

The tables below compare a 10x Cloudflare R2 sharded JuiceFS mount vs a 5x Cloudflare R2 sharded JuiceFS mount vs a 1x Cloudflare R2 JuiceFS mount (default, unsharded). All R2 buckets use the North American East location hint, as Cloudflare R2 doesn't have a Dallas/mid-USA location right now.
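For anyone wanting to reproduce the sharded setup, a rough sketch of the format command under the same placeholder assumptions as before; per the JuiceFS data-sharding docs, --shards hashes data blocks across N buckets, with the bucket name taking a %d placeholder for the shard index:

Bash:
# Shard JuiceFS data blocks across 10 R2 buckets by hash of object key.
# Buckets juicefs-shard-0 through juicefs-shard-9 must already exist in R2.
juicefs format --storage s3 --shards 10 \
    --bucket "https://juicefs-shard-%d.<accountid>.r2.cloudflarestorage.com" \
    --access-key $R2_ACCESS_KEY \
    --secret-key $R2_SECRET_KEY \
    sqlite3:///home/juicefs/myjuicefs.db myjuicefs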

For 1024MB big file size

| ITEM | VALUE (10x R2 Sharded) | COST (10x R2 Sharded) | VALUE (5x R2 Sharded) | COST (5x R2 Sharded) | VALUE (1x R2 Default) | COST (1x R2 Default) |
| --- | --- | --- | --- | --- | --- | --- |
| Write big file | 906.04 MiB/s | 4.52 s/file | 960.47 MiB/s | 4.26 s/file | 1374.08 MiB/s | 2.98 s/file |
| Read big file | 223.19 MiB/s | 18.35 s/file | 174.17 MiB/s | 23.52 s/file | 152.23 MiB/s | 26.91 s/file |
| Write small file | 701.2 files/s | 5.70 ms/file | 777.4 files/s | 5.15 ms/file | 780.3 files/s | 5.13 ms/file |
| Read small file | 6378.3 files/s | 0.63 ms/file | 7940.0 files/s | 0.50 ms/file | 8000.9 files/s | 0.50 ms/file |
| Stat file | 21123.7 files/s | 0.19 ms/file | 29344.7 files/s | 0.14 ms/file | 27902.2 files/s | 0.14 ms/file |
| FUSE operation | 71555 operations | 2.16 ms/op | 71597 operations | 2.67 ms/op | 71649 operations | 3.06 ms/op |
| Update meta | 6271 operations | 9.01 ms/op | 6041 operations | 4.09 ms/op | 6057 operations | 2.50 ms/op |
| Put object | 1152 operations | 403.23 ms/op | 1136 operations | 428.27 ms/op | 1106 operations | 547.32 ms/op |
| Get object | 1034 operations | 278.61 ms/op | 1049 operations | 299.50 ms/op | 1030 operations | 301.80 ms/op |
| Delete object | 316 operations | 124.32 ms/op | 60 operations | 120.73 ms/op | 29 operations | 234.02 ms/op |
| Write into cache | 1424 operations | 24.92 ms/op | 1424 operations | 83.12 ms/op | 1424 operations | 12.91 ms/op |
| Read from cache | 400 operations | 0.05 ms/op | 400 operations | 0.05 ms/op | 400 operations | 0.04 ms/op |

For 1MB big file size

| ITEM | VALUE (10x R2 Sharded) | COST (10x R2 Sharded) | VALUE (5x R2 Sharded) | COST (5x R2 Sharded) | VALUE (1x R2 Default) | COST (1x R2 Default) |
| --- | --- | --- | --- | --- | --- | --- |
| Write big file | 452.66 MiB/s | 0.01 s/file | 448.20 MiB/s | 0.01 s/file | 230.82 MiB/s | 0.02 s/file |
| Read big file | 1545.95 MiB/s | 0.00 s/file | 1376.38 MiB/s | 0.00 s/file | 1276.38 MiB/s | 0.00 s/file |
| Write small file | 682.8 files/s | 5.86 ms/file | 792.5 files/s | 5.05 ms/file | 675.7 files/s | 5.92 ms/file |
| Read small file | 6299.4 files/s | 0.63 ms/file | 7827.1 files/s | 0.51 ms/file | 7833.1 files/s | 0.51 ms/file |
| Stat file | 21365.2 files/s | 0.19 ms/file | 24308.1 files/s | 0.16 ms/file | 28226.1 files/s | 0.14 ms/file |
| FUSE operation | 5757 operations | 0.42 ms/op | 5750 operations | 0.38 ms/op | 5756 operations | 0.41 ms/op |
| Update meta | 5814 operations | 0.72 ms/op | 5740 operations | 0.74 ms/op | 5770 operations | 0.70 ms/op |
| Put object | 107 operations | 282.68 ms/op | 94 operations | 286.35 ms/op | 118 operations | 242.35 ms/op |
| Get object | 0 operations | 0.00 ms/op | 0 operations | 0.00 ms/op | 0 operations | 0.00 ms/op |
| Delete object | 133 operations | 116.84 ms/op | 59 operations | 117.93 ms/op | 95 operations | 83.94 ms/op |
| Write into cache | 404 operations | 0.12 ms/op | 404 operations | 0.12 ms/op | 404 operations | 0.14 ms/op |
| Read from cache | 408 operations | 0.06 ms/op | 408 operations | 0.05 ms/op | 408 operations | 0.06 ms/op |
 
Also did some benchmarks with the 10x Cloudflare R2 sharded JuiceFS mount, but switching from sqlite3 to a Redis server for metadata caching https://github.com/centminmod/centm...-10x-r2-sharded-mount--redis-metadata-caching :)
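Switching the metadata engine is just a different META-URL at format/mount time. A minimal sketch, assuming a local Redis instance with a password on database 1 (address, password and db number are placeholders):

Bash:
# Same sharded R2 object storage, but metadata stored in Redis instead
# of sqlite3. JuiceFS meta URL format: redis://[:password@]host:port/db
juicefs format --storage s3 --shards 10 \
    --bucket "https://juicefs-shard-%d.<accountid>.r2.cloudflarestorage.com" \
    --access-key $R2_ACCESS_KEY \
    --secret-key $R2_SECRET_KEY \
    redis://:mypassword@127.0.0.1:6379/1 myjuicefs

juicefs mount -d --cache-dir /home/juicefs_cache --cache-size 10240 \
    redis://:mypassword@127.0.0.1:6379/1 /home/juicefs_mount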

Default 1024MB big file.

| ITEM | VALUE (10x R2 Sharded + Redis) | COST (10x R2 Sharded + Redis) | VALUE (10x R2 Sharded) | COST (10x R2 Sharded) | VALUE (5x R2 Sharded) | COST (5x R2 Sharded) | VALUE (1x R2 Default) | COST (1x R2 Default) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Write big file | 1904.61 MiB/s | 2.15 s/file | 906.04 MiB/s | 4.52 s/file | 960.47 MiB/s | 4.26 s/file | 1374.08 MiB/s | 2.98 s/file |
| Read big file | 201.00 MiB/s | 20.38 s/file | 223.19 MiB/s | 18.35 s/file | 174.17 MiB/s | 23.52 s/file | 152.23 MiB/s | 26.91 s/file |
| Write small file | 1319.8 files/s | 3.03 ms/file | 701.2 files/s | 5.70 ms/file | 777.4 files/s | 5.15 ms/file | 780.3 files/s | 5.13 ms/file |
| Read small file | 10279.8 files/s | 0.39 ms/file | 6378.3 files/s | 0.63 ms/file | 7940.0 files/s | 0.50 ms/file | 8000.9 files/s | 0.50 ms/file |
| Stat file | 15890.1 files/s | 0.25 ms/file | 21123.7 files/s | 0.19 ms/file | 29344.7 files/s | 0.14 ms/file | 27902.2 files/s | 0.14 ms/file |
| FUSE operation | 71338 operations | 2.23 ms/op | 71555 operations | 2.16 ms/op | 71597 operations | 2.67 ms/op | 71649 operations | 3.06 ms/op |
| Update meta | 1740 operations | 0.27 ms/op | 6271 operations | 9.01 ms/op | 6041 operations | 4.09 ms/op | 6057 operations | 2.50 ms/op |
| Put object | 1083 operations | 390.88 ms/op | 1152 operations | 403.23 ms/op | 1136 operations | 428.27 ms/op | 1106 operations | 547.32 ms/op |
| Get object | 1024 operations | 294.63 ms/op | 1034 operations | 278.61 ms/op | 1049 operations | 299.50 ms/op | 1030 operations | 301.80 ms/op |
| Delete object | 754 operations | 125.28 ms/op | 316 operations | 124.32 ms/op | 60 operations | 120.73 ms/op | 29 operations | 234.02 ms/op |
| Write into cache | 1424 operations | 4.85 ms/op | 1424 operations | 24.92 ms/op | 1424 operations | 83.12 ms/op | 1424 operations | 12.91 ms/op |
| Read from cache | 400 operations | 0.05 ms/op | 400 operations | 0.05 ms/op | 400 operations | 0.05 ms/op | 400 operations | 0.04 ms/op |


Default 1MB big file.

| ITEM | VALUE (10x R2 Sharded + Redis) | COST (10x R2 Sharded + Redis) | VALUE (10x R2 Sharded) | COST (10x R2 Sharded) | VALUE (5x R2 Sharded) | COST (5x R2 Sharded) | VALUE (1x R2 Default) | COST (1x R2 Default) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Write big file | 530.10 MiB/s | 0.01 s/file | 452.66 MiB/s | 0.01 s/file | 448.20 MiB/s | 0.01 s/file | 230.82 MiB/s | 0.02 s/file |
| Read big file | 1914.40 MiB/s | 0.00 s/file | 1545.95 MiB/s | 0.00 s/file | 1376.38 MiB/s | 0.00 s/file | 1276.38 MiB/s | 0.00 s/file |
| Write small file | 2715.4 files/s | 1.47 ms/file | 682.8 files/s | 5.86 ms/file | 792.5 files/s | 5.05 ms/file | 675.7 files/s | 5.92 ms/file |
| Read small file | 10069.0 files/s | 0.40 ms/file | 6299.4 files/s | 0.63 ms/file | 7827.1 files/s | 0.51 ms/file | 7833.1 files/s | 0.51 ms/file |
| Stat file | 16545.3 files/s | 0.24 ms/file | 21365.2 files/s | 0.19 ms/file | 24308.1 files/s | 0.16 ms/file | 28226.1 files/s | 0.14 ms/file |
| FUSE operation | 5767 operations | 0.09 ms/op | 5757 operations | 0.42 ms/op | 5750 operations | 0.38 ms/op | 5756 operations | 0.41 ms/op |
| Update meta | 1617 operations | 0.19 ms/op | 5814 operations | 0.72 ms/op | 5740 operations | 0.74 ms/op | 5770 operations | 0.70 ms/op |
| Put object | 37 operations | 290.94 ms/op | 107 operations | 282.68 ms/op | 94 operations | 286.35 ms/op | 118 operations | 242.35 ms/op |
| Get object | 0 operations | 0.00 ms/op | 0 operations | 0.00 ms/op | 0 operations | 0.00 ms/op | 0 operations | 0.00 ms/op |
| Delete object | 48 operations | 103.83 ms/op | 133 operations | 116.84 ms/op | 59 operations | 117.93 ms/op | 95 operations | 83.94 ms/op |
| Write into cache | 404 operations | 0.11 ms/op | 404 operations | 0.12 ms/op | 404 operations | 0.12 ms/op | 404 operations | 0.14 ms/op |
| Read from cache | 408 operations | 0.06 ms/op | 408 operations | 0.06 ms/op | 408 operations | 0.05 ms/op | 408 operations | 0.06 ms/op |
 