ComputerGuru 1 day ago

Image as in “filesystem snapshot” not as in “media file”.

  • ValentineC 1 day ago

    Speaking of images (media file), I found out that iCloud double-counts storage space for rotated photos, so I need to subscribe to a higher tier just because the iPhone often can't get orientation right, especially for food photos.

    Would be nice if they moved to storing diff sidecars.

saagarjha 1 day ago

I have to admit that using C syntax as a string to parse something from Python is definitely a choice. I'm not even sure I would use C structs to lay things out in C…

  • goeiedaggoeie 1 day ago

    In the video/image space, most code we deal with day to day is still C. There are a lot more Rust plugins in the GStreamer ecosystem now, but 90%+ is still C.

    • saagarjha 1 day ago

      Sure, but it's a greenfield Python script

    • flohofwoe 1 day ago

      The article is about disk images though :)

      But yeah, while I think the `cstruct` helper function to describe a binary data layout in Python is more elegant than the builtin alternatives, it would have been much less painful to just go with a minimal C command line program (or any other programming language where a struct directly maps to memory). Python and most other scripting languages have been built for manipulating text data, but suck when working with binary data.

  • quietbritishjim 1 day ago

    I asked an LLM to rewrite it for me using the Python built-in struct module, and it gave me this:

       import sys
       import struct
       from collections import namedtuple
       
       # Bake the layout once into a reusable, precompiled object.
       HEADER = struct.Struct(">4sIIIQQ16sQQIIII")
       
       # struct only knows positions, not names — pair it with a namedtuple
       # to recover the named-field access that cstruct gives you for free.
       Header = namedtuple("Header", [
           "magic", "field4", "field8", "fieldC",
           "field10", "field18", "field20",
           "field30", "field38",
           "field40", "field44", "field48", "field4C",
       ])
       
       with open(sys.argv[1], "rb") as fh:
           header = Header._make(HEADER.unpack(fh.read(HEADER.size)))
       
       print(header)
    

    To me, this seems significantly less readable... less Pythonic, even. The printed output is also less readable.
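
    Part of the readability gap comes from the format string and the field names living in two separate places. A minimal sketch that keeps them side by side while staying with the standard library is below. The field names and widths are copied from the snippet above, big-endian is assumed as it is there, and `FIELDS`, `parse_header` and `describe` are hypothetical helper names, not part of any library.

```python
import struct
from collections import namedtuple

# (name, struct format code) pairs in on-disk order. Names and widths come
# from the snippet above; big-endian (">") is an assumption carried over from it.
FIELDS = [
    ("magic", "4s"),
    ("field4", "I"), ("field8", "I"), ("fieldC", "I"),
    ("field10", "Q"), ("field18", "Q"),
    ("field20", "16s"),
    ("field30", "Q"), ("field38", "Q"),
    ("field40", "I"), ("field44", "I"), ("field48", "I"), ("field4C", "I"),
]

# Build the precompiled layout and the matching named tuple from one list.
HEADER = struct.Struct(">" + "".join(code for _, code in FIELDS))
Header = namedtuple("Header", [name for name, _ in FIELDS])


def parse_header(data: bytes) -> Header:
    """Unpack the first HEADER.size bytes of data into a named tuple."""
    return Header._make(HEADER.unpack_from(data))


def describe(header: Header) -> str:
    """Format one field per line, with integers and byte strings shown in hex."""
    lines = []
    for name, value in header._asdict().items():
        shown = value.hex() if isinstance(value, bytes) else hex(value)
        lines.append(f"{name:>8} = {shown}")
    return "\n".join(lines)
```

    Calling `print(describe(parse_header(data)))` gives one field per line, which is closer to what the cstruct version prints than a raw namedtuple repr.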

    • saagarjha 1 day ago

      No, I think Python's struct module is also really bad. My point is: if you are making a new DSL for laying out arbitrary formats, why not do something better than what we have?

      • toast0 1 day ago

        I would assume dissect.cstruct was written for interop with C programs using C structs, or to use formats documented as C structs. Not as a greenfield tool for arbitrary formats.

        C structs seem less bad than python structs, so why not use them? Especially why write a struct parser and create a DSL for it, when there's already one that you can use that uses a well known DSL you might already understand.

      • schamper 1 day ago

        Author here, this is a valid point but there are also valid reasons to choose C structures. The larger framework that this is a part of is primarily targeted towards people working in cybersecurity, not software engineers. Cybersecurity people are very often not great software engineers and there is a high throughput of “throwaway” scripts, or “make a quick hacky change”. C is commonly already well understood, a bespoke DSL usually is not and requires a learning step. You can “hit the ground running”, so to say.

        And, as a bonus, creating, say, a filesystem implementation is now often as easy as copy/pasting existing C structure definitions, either from the original source (which is usually C) or from reversing tools such as IDA/Ghidra.

        There’s no right or wrong way in my opinion, just preferences.

        • a_e_k 1 day ago

          I became a fan of Kaitai Struct [0] when doing some amateur sleuthing last year. It has a web-based IDE [1] for writing and testing structure definitions against hex dumps, and can generate binary parsers in Python (and many other languages) right from the Web IDE.

          [0] https://doc.kaitai.io/user_guide.html

          [1] https://ide.kaitai.io/devel/

      • quietbritishjim 1 day ago

        OK, so what's your alternative then? It's easy to say you don't like something, but the onus is on you to show there's something actually better.

        The library used in the author's post seems perfectly readable to me, enough that it didn't even register until I read your comment. Could it be tweaked slightly to not use C syntax? Sure, but it's still going to need roughly the same pattern of identifier + type (including size). Types in C are straightforward so long as you don't have functions/pointers (which have the "inside out" problem, but they're not needed for binary encodings), so you're going to be looking at pretty trivial changes to syntax. Certainly not enough to warrant this level of quibbling.

        • saagarjha 17 hours ago

          idk just spitballing I would maybe do something like

            from parser import struct, packed, array, u8, u32, u64
            
            @struct(packed)
            class ASIF:
                magic: array[u8, 4]
                field4: u32
                field8: u32
                fieldC: u32
                field10: u64
                field18: u64
                field20: array[u8, 16]
                field30: u64
                field38: u64
                field40: u32
                field44: u32
                field48: u32
                field4C: u32
            
            asif = ASIF.from_bytes(...)
            print(asif.fieldC)
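
          For what it's worth, an annotation-based layout like this can be sketched in a few dozen lines on top of the standard `struct` module. Everything below is hypothetical and not the API of any existing package: `FieldType`, `binary_struct` (standing in for `struct(packed)` so it does not shadow the stdlib module) and the big-endian default are all assumptions made for illustration.

```python
import struct


class FieldType:
    """A fixed-width field, described by its struct-module format code."""

    def __init__(self, code: str):
        self.code = code


u8, u32, u64 = FieldType("B"), FieldType("I"), FieldType("Q")


class array:
    """array[u8, 4] means four consecutive bytes, unpacked as one bytes object."""

    def __class_getitem__(cls, params):
        element, count = params
        if element is not u8:
            raise TypeError("this sketch only supports arrays of u8")
        return FieldType(f"{count}s")


def binary_struct(byte_order: str = ">"):
    """Class decorator: derive the layout from annotations and add from_bytes()."""

    def wrap(cls):
        names = list(cls.__annotations__)
        codes = "".join(t.code for t in cls.__annotations__.values())
        # An explicit byte-order prefix also disables padding, so the layout is packed.
        packer = struct.Struct(byte_order + codes)

        def from_bytes(klass, data: bytes):
            obj = klass()
            for name, value in zip(names, packer.unpack_from(data)):
                setattr(obj, name, value)
            return obj

        cls.SIZE = packer.size
        cls.from_bytes = classmethod(from_bytes)
        return cls

    return wrap


@binary_struct(">")
class ASIF:
    magic: array[u8, 4]
    field4: u32
    field8: u32
    fieldC: u32
    field10: u64
    field18: u64
    field20: array[u8, 16]
    field30: u64
    field38: u64
    field40: u32
    field44: u32
    field48: u32
    field4C: u32
```

          With that in place, `ASIF.from_bytes(data).fieldC` works as in the snippet above, and `ASIF.SIZE` gives the number of bytes to read.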

          • quietbritishjim 3 hours ago

            I'll admit I do really like that.

            I still think it proves my point: your original objection was about the syntax being C-like and, as I predicted, the differences in syntax in your idea (where the type goes, colon vs positional, etc.) are all trivialities that don't affect usability.

            What's better about your idea is that it's actual Python code rather than being embedded in a string. Maybe that was your point originally and I misunderstood.

            Looks like this package works like this: https://harrymander.xyz/dataclasses-struct/

fragmede 1 day ago

I like a good jaunt with IDA Pro as much as the next RE, but my question is what does ASIF do that Qcow2 doesn't?

My other question is why does it take so long to copy an app out of a dmg and into /Applications. Like, just change some pointers to pointers to data on disk and shit.

  • donatj 1 day ago

    > what does ASIF do that Qcow2 doesn't

    Mount natively in macOS

    > why does it take so long to copy [...] out of a dmg

    Compression mostly. DMG contents can optionally be compressed using zlib, lzfse, or slow as molasses bzip2.

    Also Gatekeeper.
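
    On the compression point, a rough way to get a feel for the zlib versus bzip2 gap is to time both on the same payload. This is only a sketch with Python's standard library on synthetic data: it does not read DMG files, lzfse is left out because the standard library has no binding for it, and `time_decompress` is a hypothetical helper name.

```python
import bz2
import time
import zlib


def time_decompress(decompress, blob: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` decompressions."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic, moderately compressible payload (about 1.6 MB).
# Real app bundles will compress and decompress differently.
payload = b"".join(i.to_bytes(4, "big") * 8 for i in range(50_000))

blobs = {
    "zlib": (zlib.compress(payload), zlib.decompress),
    "bzip2": (bz2.compress(payload), bz2.decompress),
}

for name, (blob, decompress) in blobs.items():
    seconds = time_decompress(decompress, blob)
    print(f"{name:>5}: {len(blob):>8} bytes compressed, "
          f"{seconds * 1000:.2f} ms to decompress")
```

    The numbers depend heavily on the data and the machine, so treat the output as a relative comparison only.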

    • 1e1a 1 day ago

      Additionally, while I don't know much about APFS, I don't think it would be beneficial to point the extracted app to blocks that are also part of the dmg file, i.e. some copying has to happen anyway.

      • fragmede 1 day ago

        In a perfect theoretical filesystem, copy-on-write means copying is as cheap as moving a file, though the decompression time makes sense.

        • galad87 1 day ago

          Of course, but that works only if the files are already in the same partition. A dmg is a virtual image, even if it's stored in the same partition, once mounted it acts like another partition.
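
          The "same partition" condition is easy to check from a script: two paths can only share blocks through a copy-on-write clone if they sit on the same device. A minimal sketch using `os.stat` follows; `same_volume` is a hypothetical helper name and the paths in the comment are examples only.

```python
import os


def same_volume(path_a: str, path_b: str) -> bool:
    """Return True if both paths are on the same device (equal st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Hypothetical usage on macOS (paths are examples only):
#   same_volume("/Volumes/SomeApp/SomeApp.app", "/Applications")
# A mounted disk image shows up as its own device, so this would be expected
# to return False even when the .dmg file itself is stored on the boot volume.
```

          When this returns False, the data has to be copied for real, whatever the filesystem could do within one volume.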

        • xoa 1 day ago

          A perfect theoretical filesystem can still have subjective, user-configurable choices though, right? Like case sensitivity, UTF normalization, checksum hash function, extra copies of data/metadata to store for redundancy/healing, etc. (as well as compression/encryption). I think ZFS is a pretty strong real-world example of a CoW FS, but you can still set a lot of different properties between sub-filesystems and then need to copy when you go between them to get the structural changes.

          Disk images are supposed to function as if they're attached storage I think, and have different properties from what FS you're running on boot or your home folder (which themselves can be different, I run my home folder on my main Mac off a NAS via iSCSI). I'm not sure any underlying FS would avoid a copy operation there in general?

ARTKILL 1 day ago

Worth noting ASIF's compression tradeoff also affects Spotlight indexing — since the content is opaque until mounted, you lose searchability on unmounted disk images that you'd get with a regular folder structure.