Fuzzing between the lines in popular barcode software

8 months ago/58 comments/blog.trailofbits.com

> You might ask: how do you know whether or not software has been fuzzed?

zbar has great barcode reading performance! I've seen far newer software that's nowhere near as good in terms of real-world performance.

But it seems the original developer hasn't updated it since 2009 [1] - and fuzz testing only rose to prominence in ~2012 with the rise of tools like afl-fuzz.

I would be absolutely astonished if it had ever been fuzzed.

> Cut out any unnecessary features to limit attack vectors. ZBar by default scans all code types, which means that an attacker can trigger a bug in any of the scanners. If you only need to scan QR codes for instance, then ZBar can be configured to do so in the code

Absolutely sensible, yes.

Not just for security, but also because packages sometimes have extra barcodes. If you're scanning an EAN-13 on a pack of pasta, decoding a QR code for a pasta recipe website is just going to confuse things :)

[1] https://sourceforge.net/projects/zbar/files/zbar/
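
For the "extra barcodes confuse things" half of this, here is a minimal sketch in Python of an application-level allow-list. It assumes the decoder hands back (symbology, payload) pairs; the `Decoded` type, the `keep_allowed` helper and the symbology names are hypothetical and are not ZBar's API. Filtering after the fact like this only avoids the confusion: the decoders have already run, so it does not shrink the attack surface the way disabling them in the scanner configuration would.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Decoded:
    """One decoded symbol (hypothetical shape, not a real library type)."""
    symbology: str  # illustrative names such as "EAN13" or "QRCODE"
    payload: bytes


def keep_allowed(results: Iterable[Decoded], allowed: FrozenSet[str]) -> List[Decoded]:
    """Drop every result whose symbology is not explicitly allowed."""
    return [r for r in results if r.symbology in allowed]


# A pack of pasta carrying both a product code and a recipe QR code.
scan = [
    Decoded("QRCODE", b"https://example.com/pasta-recipe"),
    Decoded("EAN13", b"4006381333931"),
]
products = keep_allowed(scan, frozenset({"EAN13"}))
```

With the allow-list set to EAN-13 only, `products` contains just the product code and the recipe link is ignored.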

8 months ago by zootboy

I've seen the "overzealous barcode scanner" issue happen with some gas station POS systems, to the point where the seasoned cashiers know to cover the QR codes with their fingers before attempting to scan an item.

8 months ago by neilv

Sounds like the POS software isn't controlling the reader well, maybe because it wasn't adjusted for this model of reader. Or the reader's firmware could have been misconfigured relative to what that POS setup expects.

Modern reader firmware tends to have multiple modes and many options. Some modes are as simple as "scan whatever you see out of the many formats you support, and spit out the decoded value of something as USB Serial". Or, worse, "...as USB Keyboard".

You can imagine how easy those modes are to integrate with POS software, without implementing the proprietary protocol for that device, and you can also imagine how poorly that can work out.

If you owned a store with a POS setup with flaky reader behavior like this, and were stuck with it, you could try reconfiguring the reader (to, say, disable QR support). This reprogramming can sometimes be done via a documented protocol, via sketchy Windows software, or via... barcode... Careful you don't make it worse.

(Our startup used modern readers (multiple 1D formats, QR, NFC) for a factory station, and had to do a lot of experimenting with different brands and models, to get the behavior and speed we needed. We even managed to brick a reader, just with configuration changes, not flashing firmware.)

8 months ago by kevincox

The shop may use QR codes for coupons or loyalty programs even if the merchandise doesn't use it. So being able to scan these items without switching mode is often an important feature.

8 months ago by masfuerte

I went to a meeting the other day in a building with a touch screen registration system. The woman in front of me was struggling with it. Every time she tapped the register button the system decided that some part of her was a badly formed barcode, printed an error message and exited back to the menu. She eventually got it working by moving to the side until it wanted to take her picture.

8 months ago by EvanAnderson

Absolutely. I helped with a physical inventory count project using smartphones as the "terminals". The barcode app we used didn't allow us to selectively turn off symbologies. We ended up with a ton of links to recipes, websites, etc. in the data.

8 months ago by 01HNNWZ0MV43FF

Reminds me of the Jurassic Park novel where they ask the computer to find 10 velociraptors on the island and it finds 10. And they actually have 20.

8 months ago by devmor

It's also a common annoyance in grocery store apps.

Kroger, for example, has an app that allows you to scan items to add them to a virtual cart as you shop and avoid scanning them at the register... however the same app is used to read QR codes on in-store coupons, which are "helpfully" placed very close to the price tags with UPC barcodes on them.

If I want to scan one of those coupon QR codes, I need to either start with the camera very close to the QR code or cover the barcode with my finger.

8 months ago by bragr

It appears to have been forked: https://github.com/mchehab/zbar

8 months ago by DylanSp

That's the repo Trail of Bits was working with; the PR they ended up submitting is at https://github.com/mchehab/zbar/pull/294.

8 months ago by billpg

I once reported a bug to a barcode decoding library, reporting that it crashed when the barcode contained a zero byte. They responded that they wouldn't fix it because barcodes aren't supposed to contain zero bytes.

"But it crashed. That's bad. I can't stop people scanning bad barcodes."

8 months ago by unnouinceput

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." - Rick Cook

8 months ago by lazide

Combined with the all too human reflex of engineers to insist that it isn’t their implementation/design that is wrong, it is reality which is wrong. Clearly.

Because if we just didn’t do that, then it would all work.

In particular, see folks talking about Self Driving hah.

8 months ago by TacticalCoder

> They responded that they wouldn't fix it because barcodes aren't supposed to contain zero bytes.

Sad. What a poor understanding of our field.

The number one rule of them all is: "Never trust (user) input".

A slightly more powerful variation being: "assume all input is malicious until proven otherwise".

I mean: on one hand there are people who fuzz, who test, who think about edge cases, who think about security, who think about uptime, etc. And OTOH you have people saying "such input shouldn't happen". It's just really pathetic.

8 months ago by adolph

I think a difference between an application and a library (or module, etc) is that it is ok for the latter to expect sanitized input and be wrapped in try/catch blocks. The world is less finite than code and a module might be deployed in a variety of contexts which might make some checks undesirable.

In computing, the robustness principle is a design guideline for software that states: "be conservative in what you do, be liberal in what you accept from others". It is often reworded as: "be conservative in what you send, be liberal in what you accept". The principle is also known as Postel's law, after Jon Postel, who used the wording in an early specification of TCP.

https://en.wikipedia.org/wiki/Robustness_principle

8 months ago by david422

If that's the case, the library should also have another function or method that can validate the barcode if the application should so choose. The library is the barcode expert, the app is the business logic expert. Expecting every app to now become barcode experts doesn't make sense.
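
A rough sketch in Python of what such a library-side validation function could look like, using EAN-13 as the example. The function name is hypothetical. The check digit rule (weights 1 and 3 alternating over the first 12 digits) is the standard EAN-13 one. The point is that it returns False for anything malformed instead of raising or crashing.

```python
def is_valid_ean13(payload: bytes) -> bool:
    """Return True only for 13 ASCII digits with a correct check digit."""
    # Reject wrong lengths and any non-digit byte (including NUL) up front.
    if len(payload) != 13 or not all(0x30 <= b <= 0x39 for b in payload):
        return False
    digits = [b - 0x30 for b in payload]
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]
```

An application that wants strict input calls this before doing any business logic; one that does not care can skip it.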

Also, that law gets quoted, and IMO is a rather large design mistake.

8 months ago by bitexploder

The library also has the best chance to fix and prevent security issues systemically. I have played this game for a while now. Library engineers often want to pass the buck onto users of their tools. That is not good developer or user experience. Also crashing is the opposite of robust.

8 months ago by 0cf8612b2e1e

Malformed data is a fact of life. A parser should gracefully fail when this eventuality happens.

8 months ago by alex_suzuki

Do you by chance remember which library, and which barcode symbology? (barcode library developer here :-)

8 months ago by billpg

I do remember it was a large 2D barcode. Like QR but with a square in the middle. (AZTEC?)

I was trying random barcodes I had lying around to test my own component. The one with the zero byte happened to be a large one they had added to my passport when I visited the USA. It had "US-VISIT" printed next to it in big letters.

The device was a rugged industrial handheld device with a screen and a camera, designed for mailrooms and warehouses. This was around 20 years ago and I remember the OS (including the barcode component) was completely bespoke and it ran without any process protections. This meant that the barcode would crash the whole device and you had to perform a hard reset.

8 months ago by alex_suzuki

Square in the middle sure sounds like Aztec. It's used a lot for airline boarding passes. What's more common with zero bytes instead of crashes is truncation… some part of the code assumed the zero byte terminates a string. Thanks for replying!
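
The truncation failure mode can be shown with a small Python sketch. `as_c_string` is a hypothetical stand-in that mimics code treating the payload as a NUL-terminated string; it is not taken from any real decoder. Carrying the payload as length-aware bytes keeps everything after the zero byte.

```python
def as_c_string(payload: bytes) -> bytes:
    """Mimic code that stops reading at the first zero byte."""
    return payload.split(b"\x00", 1)[0]


def has_embedded_nul(payload: bytes) -> bool:
    """Let the caller detect the case explicitly instead of losing data silently."""
    return b"\x00" in payload


payload = b"ABC\x00DEF"          # 7 bytes, with a zero byte in the middle
truncated = as_c_string(payload)  # everything after the zero byte is gone
```

Here `truncated` is `b"ABC"` while the original payload still holds all 7 bytes.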

8 months ago by mmsc

> Surprisingly, libFuzzer struggled to figure out that input should be of size 1024 and couldn’t start fuzzing.

Is this surprising? Does libFuzzer support Redqueen or laf-intel like AFL++ [0][1], which will pick up on any comparisons (like a comparison to size=1024) and fuzz with the intention of changing that comparison to become true or false (to put it overly simply)?

0: https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...

1: https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...
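
A toy Python illustration of the comparison-splitting idea behind laf-intel, under loud assumptions: the 4-byte constant and both functions are made up for the example, and the real thing is a compiler pass over the target, not anything like this code. It only shows why a byte-wise cascade gives a fuzzer partial progress to reward, where a single whole-value comparison gives none.

```python
MAGIC = (1024).to_bytes(4, "little")  # hypothetical constant the target compares against


def whole_compare_progress(candidate: bytes) -> int:
    """Single comparison: feedback is all-or-nothing (0 or 1)."""
    return 1 if candidate[:4] == MAGIC else 0


def split_compare_progress(candidate: bytes) -> int:
    """Byte-wise cascade: count leading bytes that match before the first miss."""
    matched = 0
    for got, want in zip(candidate[:4], MAGIC):
        if got != want:
            break
        matched += 1
    return matched


half_right = b"\x00\x04\xff\xff"  # first two bytes correct, last two wrong
```

For `half_right`, the whole comparison reports no progress at all, while the split version reports two matched bytes that a coverage-guided fuzzer could keep and build on.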

8 months ago by circus1540

libfuzzer has features to solve comparisons including a comparison table and value profile. in either case, it should be pretty easy to find that a 1024 size input unlocks new coverage without any of those fancy features. i doubt that was the problem here.
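
Whatever the actual cause was, one common way to take a fixed-size requirement out of the fuzzer's hands is to normalize the input inside the harness. A minimal Python sketch, assuming the target wants exactly 1024 bytes as in the quoted sentence; `normalize` is a hypothetical helper and this is not what the article's harness does.

```python
EXPECTED_SIZE = 1024  # size taken from the quoted sentence


def normalize(data: bytes, size: int = EXPECTED_SIZE) -> bytes:
    """Truncate long inputs and zero-pad short ones to exactly `size` bytes."""
    return data[:size].ljust(size, b"\x00")
```

The trade-off is that the size check itself is no longer exercised, so this only makes sense when that check is not the code you want to test.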

8 months ago by firefax

If I wanted to learn more about fuzzing, does anyone have suggestions?

I'd love to get to a point I could fuzz a program but the gulf of execution is vast -- I enjoyed attempting OSCP, but I can't keep paying for lab extensions.

(I also have a gut feeling there's a lot of unfuzzed apps which people don't look at because they're utilitarian and don't use the network much. So if I can phish you, then leverage some innocuous tool for RCE or whatever... useful.)

But I've struggled to find resources on this topic -- anyone know of a book, course, or wiki?

8 months ago by woodruffw

The authors of this blog (FD: my company) have a testing handbook[1], which has a full chapter dedicated to fuzzing[2]. We're always open to feedback on it!

[1]: https://appsec.guide/

[2]: https://appsec.guide/docs/fuzzing/

8 months ago by djoel

This is great - thanks for posting!

8 months ago by rwmj

I would start with the AFL++ documentation (https://aflplus.plus/features/), and an open source program that you want to fuzz. The easiest programs to fuzz with AFL are ones that parse a file format from the command line, the smaller the better and written in C or C++ (just for ease of recompiling with instrumentation).

Parsing network protocols and ABIs is possible, but usually requires a fair amount of coding.

8 months ago by firefax

>The easiest programs to fuzz with AFL are ones that parse a file format from the command line, the smaller the better and written in C or C++ (just for ease of recompiling with instrumentation).

Thanks, this is useful context -- it's easy to get overwhelmed and quit early on with these sorts of things. It looks like someone else posted a set of exercises[1] using AFL that seem to be aimed at smaller programs like you describe.

[1] https://github.com/antonio-morales/Fuzzing101

8 months ago by JonChesterfield

LLVM ships with a fuzzing library, docs at https://llvm.org/docs/LibFuzzer.html. I get the impression that AFL is considered better. The authors of llvm fuzz stopped working on it in favour of some other thing, which they then stopped working on in favour of https://github.com/google/fuzztest, which seems to be broadly useless as a fuzzer implementation. But whatever, the llvm fuzzer lives on and has uses in tree and occasional updates. I found it much easier to get started with than AFL.

I wrote a program that takes a byte array as input and drives the library under test with it, attached that to llvm's fuzzer and left it running. You end up with a lot of files containing some bytes that did something vaguely interesting with the program. Good experience overall.
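
The "bytes in, drive the library" shape described above can be sketched in Python without any fuzzing library. Everything here is hypothetical: `parse_record` is a toy target with a deliberate bug, and the loop is blind random generation, not the coverage-guided mutation that libFuzzer or AFL perform. It only shows the harness pattern and the pile of interesting inputs you end up with.

```python
import random
from typing import Callable, List


def parse_record(data: bytes) -> bytes:
    """Toy target: [length][body...][checksum]. Bug: it trusts the length byte."""
    length = data[0]
    body = data[1:1 + length]
    checksum = data[1 + length]  # IndexError when the input is shorter than claimed
    return body if checksum == sum(body) % 256 else b""


def fuzz(target: Callable[[bytes], object], iterations: int = 1000, seed: int = 0) -> List[bytes]:
    """Feed random byte strings to `target` and collect the ones that raise."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    crashes: List[bytes] = []
    for _ in range(iterations):
        size = rng.randrange(1, 16)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception:
            crashes.append(data)  # a real fuzzer would write this to a file
    return crashes


crashes = fuzz(parse_record)
```

Each entry in `crashes` is an input that can be replayed against the target later, which is the role the saved files play with a real fuzzer.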

You might get some meaning out of https://github.com/JonChesterfield/bigint/tree/trunk/fuzz_bi... but ymmv, I got sidetracked by interesting stuff at work ~3 months back and don't currently remember what state that repo was in when I paused work on it.

8 months ago by firefax

> get the impression that AFL is considered better. The authors of llvm fuzz stopped working on it in favour of some other thing, which they then stopped working on in favour of https://github.com/google/fuzztest

Thanks, this kind of social stuff can be useful -- it looks like all the resources folks shared seem to favor AFL.

8 months ago by r9295

https://github.com/antonio-morales/Fuzzing101

Is a good course

8 months ago by grumbelbart2

I don't quite follow the input - does this mean they created Barcodes or Data Codes that crashed the library? I.e. something that I can print out and that might break a few devices if printed on, for example, my luggage before checking it in?

8 months ago by michaelt

Crashing the library - and potential arbitrary code execution!

However, zbar isn't used all that widely in industry. The airport's baggage handling system is much more likely to have a self-contained scanner from Cognex or Omron or Zebra running proprietary, closed-source software.

8 months ago by EvanAnderson

You got it. Crashing the device where the barcode is being interpreted (and possibly getting arbitrary code execution).

Secondarily, there's probably also a rich vein to be mined scanning barcodes like "'); DROP TABLE Item" that would exploit systems further up the chain. That's not what this article is covering (since they're just looking at the barcode scanning library).
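
For the systems further up the chain, the usual defence is to bind the scanned value as a query parameter and never splice it into the SQL text. A minimal sketch with Python's built-in sqlite3 module; the table layout, the sample row and the `lookup` helper are assumptions made for the example.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Item VALUES (?, ?)", ("4006381333931", "pasta"))


def lookup(db: sqlite3.Connection, scanned: str) -> Optional[str]:
    """Look up an item by scanned code; the value is bound, never interpolated."""
    row = db.execute("SELECT name FROM Item WHERE code = ?", (scanned,)).fetchone()
    return row[0] if row else None


hostile = "'); DROP TABLE Item"  # the payload from the comment above
result = lookup(conn, hostile)   # treated as an ordinary (unknown) code
```

The hostile string simply fails to match any row, and the table is still there afterwards.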

There would be some fun in carrying around a bunch of "edge case" barcodes ("programming" barcodes for various kinds of scanners, SQL injection attacks, etc) and feeding them to unsupervised barcode scanners "in the wild" to see what happens.

8 months ago by OkGoDoIt

My interpretation of the original article is they use the fuzzer to find an arbitrary very small bitmap input which when passed to the library causes it to crash. It’s unclear if the input image is even a valid bitmap image format that would correctly open in an image viewer.

This is definitely still a problem because there might be situations where you’re allowing an end user to pass an image file in and are then passing it unmodified to this library to interpret the barcode in it, but it’s not the same as some special barcode that encodes data that crashes the library.

So for example this blog entry does not describe a situation where you can just print out a barcode and when you scan the barcode then the library crashes or has the opportunity for arbitrary code execution. That would be a very exciting exploit. They don’t actually rule out the possibility, but they didn’t get anywhere near fuzzing at that level in this blog post.

8 months ago by azeirah

I'm working with barcode scanners and difficulties handling a variety of inputs.

My boss keeps telling me "it's not that difficult". I keep telling him "it's more difficult than you believe".

8 months ago by bspammer

I think this really demonstrates how valuable nixpkgs is. It’s the Wikipedia of building packages, and 10 years ago I wouldn’t believe it could exist, or be this good.

8 months ago by orng

Only slightly related but on the topic of barcodes and security I'd like to recommend this excellent talk by Felix Lindner, it is quite a few years old but I'd guess stuff like barcode scanners are not the most frequently updated things:

Toying with barcodes - https://www.youtube.com/watch?v=QCtdEYnlykA
