Hacker News
5 years ago by jonplackett

Backblaze is awesome. Anyone not using it should give it a try.

But I have a question - why do they share this info? Is it to show they’re reliable or just for curiosity? Or some other reason?

5 years ago by atYevP

Yev from Backblaze here! We do it because it's interesting! Initially, since we're pretty transparent, we were hoping that others would join in the fun and share their stats so we'd know how our environment and hard drive stacked up against others - but no one at our scale has really done that yet. We do have a lot of off-the-record confirmation that our experience is somewhat similar to others, which is neat - but we were just trying to be transparent and share something interesting from our infrastructure. A lot of folks jumped on it and found it interesting so we keep it going!

Plus along the way some folks find out about us and sign up for the services we offer (B2 Cloud Storage and Computer Backup) and that's nice too! Plus we also like these conversations and at the end of the day, it's fun!

5 years ago by storrgie

The consistent transparency has guided many of my purchasing decisions, both personal and professional. It also drove me to seriously examine B2, which I now use personally and professionally as well.

I'd like to encourage your organization to keep publishing work like this, and work like your Storage Pod designs. It's really spurred on a lot of innovation and sharing.

5 years ago by jonplackett

Hey Yev thanks for the info! Yet more transparency from Backblaze!

Thanks for a great product. I've signed up so many people.

5 years ago by jw1224

It's marketing (and effective, too). I became a Backblaze customer after reading one of these reports (many years ago!).

5 years ago by mbotner

Me too!

5 years ago by cortesoft

Me three. I appreciate knowing the way they operate.

Loved the service so far, too. Have had a few times over the years where I needed to recover data from the backup, and it worked perfectly every time.

5 years ago by bayindirh

> why do they share this info? Is it to show they’re reliable or just for curiosity? Or some other reason?

It has many facets. It's attractive to us nerds. It also helps their suppliers spot problematic models. It shows off their technical prowess. And lastly, I always take a look at the worst-failing models and try to avoid them in the data center, if I can.

5 years ago by radicality

Is it? Have you had good experiences restoring from the backups? I was recently on the lookout for an offsite backup solution and came across this blog post, which didn't inspire confidence in Backblaze.

https://messengergeek.wordpress.com/2018/03/09/backblaze-rev...

5 years ago by treesknees

They've added a separate backup client that is supposed to be better than the web interface.

5 years ago by guerby

What would be interesting for SSDs is the percentage of advertised TBW reached when (or just before) the SSD failed, i.e. 100% if the SSD fails at exactly its advertised TBW, 50% if it fails at half the TBW, and 200% if it lasts twice the advertised TBW.
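As a sketch, the metric described here is just terabytes written at failure divided by the rated endurance (the drive numbers below are made up for illustration):

```python
def tbw_utilization(tb_written_at_failure: float, advertised_tbw: float) -> float:
    """Percent of the advertised TBW endurance consumed when the drive died."""
    return 100.0 * tb_written_at_failure / advertised_tbw

# A drive rated for 600 TBW that died after 300 TB written:
print(tbw_utilization(300, 600))   # 50.0
# One that outlived its rating, dying at 1200 TB written:
print(tbw_utilization(1200, 600))  # 200.0
```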

In case someone from Backblaze reads this :)

5 years ago by londons_explore

Well-written SSD firmware simply slows down with age. It will never reach outright failure, because the slowdown becomes so extreme the drive is unusable first. The drive also gets slightly smaller over time (by passing write failures up to the OS so sectors get marked bad).

That's because a "worn out" flash sector is never fully worn out: it can still store some data, just with a higher error rate than the error correction can handle on its own. It's possible to combine two sectors, using the extra space for error-correction data, and still recover a sector reliably. But now you have less than half the performance.

Worn-out flash also doesn't hold data for long: perhaps only a few hours before too many bits have flipped and it becomes unreadable. To fix that, you need to keep rewriting the data, which slows everything down even more.

And now that you have a bunch of unreliable sectors, you also need "super-sectors" which can do sector-based hierarchical erasure coding to recover data from sectors where even the methods above have caused data to be lost. This slows down writes even more.

In the worst case, reading a single sector requires reading every sector on the drive to reconstruct it. Clearly that's going to be slow enough that the drive will have stopped being used long before that point.

Sadly, some drive firmware doesn't implement some or all of the above, so those drives appear to have "failed" and become unreadable, which IMO is inexcusable when it's very easy to design firmware so that worn-out drives become slow instead of dead.
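The "super-sector" erasure-coding idea can be sketched with simple XOR parity across sectors, RAID-5 style. This is a toy model, not how any real firmware works, but it shows how one extra parity sector lets you rebuild any single lost sector:

```python
from functools import reduce

def xor_parity(sectors):
    """XOR all sectors together byte-wise to form one parity sector."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*sectors))

def rebuild_lost(surviving, parity):
    """XOR the surviving sectors with the parity to recover the lost one."""
    return xor_parity(surviving + [parity])

sectors = [b"sector-0 payload", b"sector-1 payload", b"sector-2 payload"]
parity = xor_parity(sectors)

# Pretend sector 1 became unreadable; rebuild it from the rest:
rebuilt = rebuild_lost([sectors[0], sectors[2]], parity)
assert rebuilt == sectors[1]
```

The cost is exactly what the comment describes: rebuilding one sector means reading all the others.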

5 years ago by labawi

While I agree SSDs should either slow down or fail read-only, the rest seems like wishful thinking and/or extreme exaggeration.

Do you have any examples of, or references for, drives that implement your suggested algorithms?

I wouldn't expect drives to have sector-splicing and super-sectors (though multi-level cells do regularly fall back to storing fewer bits per cell), to indefinitely shrink their capacity via write failures, or to rewrite data every few hours. Frequent rewriting in particular would self-destruct the drive, if it weren't already preceded by catastrophic data loss.

5 years ago by capableweb

Super interesting. In your experience, do you have any examples of drives with this kind of proper firmware? It sounds much better to own one of those than something that suddenly fails.

5 years ago by londons_explore

Sadly not. Black-box testing an SSD for these kinds of features takes months and hundreds of drives, and manufacturers will never talk about the inner workings of their firmware. Many big SSD users develop their own SSD hardware and firmware partly for this reason.

SSD firmware also sits on a spectrum between "correct" and "performant", and I know of no SSDs that, for example, guarantee all acknowledged data survives a power failure. Sure, many SSDs may typically manage that, but it isn't guaranteed when the power fails under worst-case conditions.

Old Apple SSDs, for example, have a special extra wire on the connector specifically to signal "impending power failure" to help them do that. PC SSDs don't even have a standard message meaning "power failure expected in 250 milliseconds".

5 years ago by bl4ckneon

Wow, this was a really in-depth comment!

5 years ago by wing-_-nuts

One thing I've always wondered: are these drives what people would recommend you stick in a desktop or NAS, or are they 'datacenter' drives that are overkill for consumer use?

5 years ago by robotmay

I would definitely recommend skipping any current large WD drives for home use. Those under 6TB are SMR and problematic in a NAS, and the 6TB+ models have a very irritating noise that no one seems able to diagnose (it sounds like a scanner when idle). I bought some 8TB WD Golds; I was expecting them to be louder, but they do something very weird when idling and it's an extremely penetrating sound. It seems to be present on most large models: https://community.wd.com/t/strange-noise-coming-from-10tb-dr...

5 years ago by elorant

Backblaze has said in the past that they don't buy enterprise drives because they didn't notice any difference from consumer ones. I don't know if that's still their policy.

5 years ago by dcm360

The large Toshiba drives they're using are enterprise drives. A couple of weeks ago these drives were among the cheapest (€/TB) drives to get here in the Netherlands, so I assume Backblaze just bought them because of their price.

5 years ago by bayindirh

Disclaimer: This is my personal experience from being an HPC sysadmin and old school computer enthusiast.

If there's one trend I've seen across generations of HDDs (excluding some problematic generations, like the first SATA Seagate Barracudas and early WD Caviars, which died for no reason at all), it's that newer-generation HDDs are always more reliable than the previous generation, regardless of their class (datacenter / consumer).

For the last 10 years or so (starting with the introduction of the first WD Green / Blue / Black series), HDDs have been exceptionally reliable unless you abuse them on purpose (with continuous random read/write benchmarking, for example).

This year I replaced two 11-year-old WD Blacks, which had never had any problems, with two IronWolf Pro NAS drives, because I wanted something dense and PMR. At the office, I replaced an old Seagate Constellation ES.2 (aka enterprise Barracuda) drive since it had started to develop bad sectors (I had pulled it from an old disk storage unit anyway). IIRC, it was around 10 years old too, with a much heavier workload history.

It looks like the most differentiating factors between enterprise and consumer drives are the command sets they support and the features they bundle. NAS and other enterprise drives have features that make them more reliable in harsher conditions (heat, vibration, the operational knocks induced by hot swapping, etc.).

If you're getting enterprise disks as part of a storage unit, you're probably also getting disks with special firmware developed for that brand anyway, so they're not off-the-shelf enterprise drives.

At the end of the day, under normal operating conditions, device class doesn't matter for the home user; but for density and speed, you might need to get an enterprise drive anyway.

5 years ago by OldTimeCoffee

I can only speak to the Exos drives, but the IronWolf (Pro) drives are basically just Exos drives relabeled for the consumer market. I'm actually about to buy six 14TB Exos X16 drives to replace seven 5-year-old 6TB Exos 7E8 drives. Frequently you can get the Exos drives cheaper than IronWolf anyway.

5 years ago by lostlogin

> Frequently you can get the Exos drives cheaper than Ironwolf anyway.

You can. I've been getting the 16s and they are great, but with caveats. The size makes volume creation and expansion insanely long (like a week per added drive); not Seagate's fault, I know. And the noise: they are loud, either chirping away or grinding away.

5 years ago by louwrentius

I once had a 20-drive NAS with 1 TB Samsung Spinpoint F1s.

The NAS has since been replaced, but I still have the drives for labbing / testing purposes.

I never had a drive failure during the lifetime of the NAS. Probably because it was off most of the time and only powered-on with wakeonlan when needed.

So those drives don't have many hours on them. But recently they started dying: I lost 3 of them this year during some tests.

Imagine that these drives are probably 10+ years old.

Age does seem to matter.

Obviously this is a small, uncontrolled sample, but it seems you really should keep this in mind when running a NAS at home. Keep an eye on the SMART parameters as suggested by Backblaze, and seriously consider replacing drives at some point. I would be afraid of drives starting to die at the same time due to age.
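One low-effort way to watch those SMART parameters is to parse `smartctl -A` output and alert on the attributes Backblaze's posts have called out as failure predictors. A rough sketch (the attribute names are the common ATA ones, the sample output is typical smartctl formatting, and raw values are assumed to be plain integers, which isn't true of every vendor):

```python
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable", "Reported_Uncorrect")

def worrying_attributes(smartctl_a_output: str) -> dict:
    """Return watched SMART attributes whose raw value is non-zero."""
    bad = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # smartctl -A attribute rows have 10 columns; raw value is last.
        if len(fields) >= 10 and fields[1] in WATCH:
            raw = int(fields[9])
            if raw > 0:
                bad[fields[1]] = raw
    return bad

sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
print(worrying_attributes(sample))  # {'Reallocated_Sector_Ct': 12}
```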

5 years ago by londons_explore

Does backblaze ever power cycle the drives (either on a schedule or due to planned/unforeseen circumstances)?

If so, it would be interesting to know how many drives failed on the day of a power cycle vs days with no power cycle.

I know other providers have found that "power cycle days" can be 100x more deadly for drives than "non-power-cycle days". It can have a massive impact when estimating data-loss probabilities, since unforeseen power-cycle days tend to affect more than one drive at a time...
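The comparison is straightforward to express if you bucket drive-days by whether a power cycle happened; all the counts below are hypothetical, just to show the shape of the calculation:

```python
def daily_failure_rate(failures: int, drive_days: int) -> float:
    """Failures per drive-day within one bucket of observations."""
    return failures / drive_days

# Hypothetical counts: 30 failures across 10k drive-days that included a
# power cycle, vs. 3 failures across 100k ordinary drive-days.
cycle_rate = daily_failure_rate(30, 10_000)
steady_rate = daily_failure_rate(3, 100_000)
print(cycle_rate / steady_rate)  # the "100x more deadly" ratio
```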

5 years ago by andy4blaze

Andy for Backblaze here: I looked at that 3-4 years ago. It looks like power cycling increased failure rates, but we don't power cycle our systems very often, maybe 1-2 a year, so not the best use case. This is on my list for a relook one of these days, if we find anything interesting we'll let folks know.

5 years ago by drexlspivey

I bought a Toshiba NAS drive mainly due to Backblaze's stats, and it came bricked out of the box :(

5 years ago by tinus_hn

On the bright side, it didn't lose you your data!

5 years ago by dragontamer

That very well could be down to the delivery person or the packing of the hard drive.

Hard drives are quite fragile: if they're dropped hard by the delivery person (ex: dropped onto your concrete steps), they could break.

5 years ago by undefined
[deleted]
5 years ago by agumonkey

black swan spotted

5 years ago by londons_explore

Think of the classic "bathtub" curve (which says that young drives fail more frequently, old drives fail more frequently, and mid-age drives are most reliable).

That curve doesn't seem to match the data here. Or if it does, it says the "old" increase in failure rate happens at over 5 years.

I would guess Backblaze will replace these old drives for being too small, too slow, or too power-hungry before they replace them for being too unreliable.

5 years ago by andy4blaze

Andy for Backblaze here: A while back we did an analysis of drive failure over time, i.e. the bathtub curve. It is probably a good idea to update that, as I believe we are seeing lower failure rates upfront these days.

5 years ago by nikisweeting

I feel like SSD failure has little correlation with hours powered on and much more to do with TBW. It would be nice to see some read/write totals in these stats going forward.

5 years ago by toast0

Depends on the SSDs and the cause of failure. There have been high profile cases where a firmware bug meant an absolute cap on hours running.

I've been involved in server farms with thousands of (mostly Intel) SSDs and (mostly WD) spinning drives; the spinning drives tended to give pre-failure indicators, but we couldn't find any indicators before SSD failure, and generally they would just completely disappear from the bus when they did fail. The failure rate was significantly lower, though. Our write rate wasn't very high and tended to be small writes; for busy disks, more than what we could do with a spinning disk, but usually nowhere near the capability of the drives.

5 years ago by sekh60

Just a homelabber, but in my 40+ drive homelab I've noticed the same. HDDs normally throw a couple of SMART errors when dying. I've gone through maybe 10 SSDs so far, and they just suddenly kick the bucket without warning.

5 years ago by nikisweeting

My experience with WD SSDs is that they just slow down more and more as they get close to the TBW limits, without failing fully / dropping off the bus (as another user describes here: https://news.ycombinator.com/item?id=27040491).
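If you want to watch your own drive's position relative to its TBW rating, many SATA SSDs expose SMART attribute 241 (Total_LBAs_Written). A sketch of the conversion; note the unit varies by vendor (512 bytes is common, but some drives report in 32 MiB chunks, so check your model's documentation):

```python
def tb_written(total_lbas_written: int, unit_bytes: int = 512) -> float:
    """Convert SMART attribute 241 (Total_LBAs_Written) to terabytes written."""
    return total_lbas_written * unit_bytes / 1e12

# A drive reporting 1.5 billion 512-byte LBAs written:
print(tb_written(1_500_000_000))  # 0.768 TB
```

Divide that by the drive's advertised TBW and you have the utilization percentage discussed further up the thread.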
