With the recent release of VirtualBox 7 and Siege beginning trial evaluations of our new R4NG3R framework for cyber-experimentation, I want to take a look back at a fun technique that we used for getting files transferred into VirtualBox virtual machines when using the VirtualBox API exposed through the web service bindings.
Background
During our cyber-experimentation process, we create and provision virtual machines (VMs) with whatever files they need for the current test run. In their most basic form, these files are:
- an archive (TAR file or ZIP file, depending on whether the VM runs a Unix or Windows operating system); and
- one or more extraction scripts (`sh`-compatible on Unix, and a mix of `.bat`, `.vbs`, and PowerShell on Windows).

Some of what these scripts do could probably be done more easily with more fully featured programming languages, but our aim is maximum compatibility: these scripting environments are very widely available in default installations.
At the end of a test run, more scripts (that arrived in the archive) are used to bundle up collected data into archives that are then transferred back out from the VM. The archive to be pulled back from each VM is a TAR on Unix and a ZIP on Windows.
The Odyssey
VirtualBox has a remotely accessible API, but some of its operations essentially require local access to the host. Examples are the operations that transfer files to and from VMs, since files can only be transferred between the VirtualBox host and the VM. The VirtualBox API doesn’t have methods for transferring arbitrary files between a remote client and the VirtualBox host. What’s a desperate developer to do? And is there a solution that makes the experience the same whether we’re working with a local or a remote VirtualBox?

Attempt One: A Shared Fileserver
In our first solution to the problem, we just set up a shared fileserver, mounted it from both the VirtualBox host and the client machine, and added some logic to map between “as seen by the client” and “as seen by VirtualBox” paths. For instance, after setting up a network attached storage (NAS) device:
- a share on the NAS is connected as `V:\vboxshare` on the client; and
- the same share is connected at `/mnt/vboxshare` on the VirtualBox host.

Then, when the client wants to transfer some new file, `archive.tar`, into a VM, it copies it to `V:\vboxshare\archive.tar` and then invokes the method `fileCopyToGuest` with `/mnt/vboxshare/archive.tar` as the `source` argument.
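The path-mapping logic can be sketched in a few lines. This is a minimal illustration, not our actual code. The share roots are the example ones above, and `client_to_host` is a hypothetical helper name. It uses Python's pure-path classes, so it runs on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical share roots, matching the example above.
CLIENT_SHARE = PureWindowsPath(r"V:\vboxshare")   # as seen by the client
HOST_SHARE = PurePosixPath("/mnt/vboxshare")      # as seen by the VirtualBox host

def client_to_host(client_path: str) -> str:
    """Translate a client-side path on the share into the same file's host-side path."""
    # Raises ValueError if the path is not inside the client's share.
    relative = PureWindowsPath(client_path).relative_to(CLIENT_SHARE)
    return str(HOST_SHARE.joinpath(*relative.parts))

# The host-side path is what would be passed as the `source` of fileCopyToGuest.
print(client_to_host(r"V:\vboxshare\archive.tar"))
```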

This works, but it requires setting up a file share, and it can take up to twice as much network traffic as simply moving a file from one place to another. A simple optimization when the client code is running on the VirtualBox host directly is to skip the intermediate transfer to the NAS. That saves the network overhead, but it makes the implementation more complex.
Despite its shortcomings, this approach got us moving along quickly. It wasn’t ideal, though. Besides the network overhead, it required additional configuration at the endpoints to connect the network shares, and we’d really like to minimize the amount of configuration that end users have to do.
Attempt Two: A Series of Clever Tricks
The bare minimum needed to achieve the desired behavior was a mechanism to transfer arbitrary files between the client and the VirtualBox host. We scoured the API documentation to make sure we hadn’t missed something, but didn’t find any methods for arbitrary file management. Some methods, like those that create VMs, do result in new files being created, but not with arbitrary content.
Getting Files to Virtual Machines
Before a deeper dive, the short version is that:
- The API doesn’t support sending arbitrary files to the VirtualBox host, but it does allow creating disk media files.
- The disk media files have to be legal disk images though, not just arbitrary bytes.
- Except that RAW floppy images can be arbitrary bytes. They just have to be sector aligned, meaning their size must be a multiple of 512 bytes.
- Scripts and ZIP files can be padded (with whitespace and comments, respectively), and it turns out that the TAR format ensures that TAR files are always multiples of 512 bytes.
- End result: pad files as needed, write their content to RAW floppy images on the VirtualBox host using the API, and then transfer them into VMs.

We started looking for any methods for reading and writing sequences of bytes. We ended up at:
- `IMedium`, which “represents virtual storage for a machine’s hard disks, CD/DVD or floppy drives. It will typically represent a disk image on the host, for example a VDI or VMDK file representing a virtual hard disk, or an ISO or RAW file representing virtual removeable media”; and
- `IMediumIO`, an interface “used to access and modify the content of a medium”.
This piqued our interest, because there are methods for creating `IMedium` instances and for reading and writing their contents:
- `IVirtualBox::createMedium`, which allows specifying the path on the host of a file to create; and
- `IMediumIO::read` and `IMediumIO::write`, which allow reading and writing the contents of those files.
A plan was born! To transfer files to a VM, we’d use `createMedium` and `write` to create a file on disk, and then transfer it into the VM. It shouldn’t really matter if the content of the “medium” wasn’t actually a legal ISO, or VDI, or disk image, right?
Turns out it does matter. `createMedium` only starts creating the medium. The actual file on disk is created by a subsequent call to `createBaseStorage` (or `createDiffStorage`), and ISOs are expected to be real ISOs, VDIs real VDIs, and so on.
What was needed was a medium format that could hold arbitrary blobs of bytes. Fortunately, the RAW format (for “floppies”, broadly speaking) allows just that. A couple of quick tests transferring some TAR files into some Linux VMs seemed to work just fine, and it looked like we were all set.
As soon as we started trying to move some ZIP files and scripts around, things screeched to a halt. Luckily, the exceptions and documentation made the problem clear: RAW floppy images have to be multiples of 512 bytes. This is because disk sectors are traditionally 512 bytes, and RAW images must be sector aligned. By happy coincidence, the TAR file format ensures that TAR files always have sizes that are multiples of 512 bytes:
The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes. The original tar implementation did not care about the contents of the padding bytes, and left the buffer data unaltered, but most modern tar implementations fill the extra space with zeros. The end of an archive is marked by at least two consecutive zero-filled records. (The origin of tar’s record size appears to be the 512-byte disk sectors used in the Version 7 Unix file system.) The final block of an archive is padded out to full length with zeros. –tar (computing)
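A quick way to see this property is to build an archive in memory and check its size. This sketch uses Python's standard `tarfile` module, and the member names and contents are made up for illustration. Other tar implementations may add different amounts of trailing padding, but per the format the total should always come out as a multiple of 512 bytes.

```python
import io
import tarfile

def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed TAR archive in memory from name -> content pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()

# Member contents are deliberately not multiples of 512 bytes.
archive = make_tar({"setup.sh": b"#!/bin/sh\necho hello\n", "notes.txt": b"x" * 1000})
print(len(archive), len(archive) % 512)  # the remainder is 0
```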
We still needed a way to transfer scripts and ZIP files. The scripts were relatively easy. They are text files in which trailing whitespace is not significant, so they can be padded with whitespace (e.g., spaces or newlines) with no change in behavior.
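Padding a script comes down to a little modular arithmetic: append `(-size) mod 512` filler bytes. The sketch below assumes that a single-byte filler (a newline by default) is harmless at the end of the script. `pad_script` and `padding_needed` are hypothetical names.

```python
SECTOR = 512

def padding_needed(size: int, sector: int = SECTOR) -> int:
    """Number of bytes to append so that `size` becomes a multiple of `sector`."""
    return -size % sector

def pad_script(script: bytes, filler: bytes = b"\n") -> bytes:
    """Append single-byte whitespace until the script is sector aligned."""
    if len(filler) != 1 or not filler.isspace():
        raise ValueError("filler must be a single whitespace byte")
    return script + filler * padding_needed(len(script))

padded = pad_script(b"#!/bin/sh\ntar -xf archive.tar\n")
print(len(padded))  # 512
```

Using one filler byte keeps the arithmetic exact. A multi-byte filler such as a CR LF pair could overshoot the target by one byte.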
The ZIP files were a bit trickier, but they have a feature that allows them to be padded as well. ZIP files are common in polyglots (files that are legal instances of multiple file formats) because of some useful features of the format. Of particular interest to us, a ZIP file ends with an “end of central directory (EOCD) record”. The last field in the EOCD record is a comment of n bytes, where the length n is specified by the two bytes preceding the comment. With two bytes to specify the length, the comment can be up to 65535 bytes long. That is clearly enough room to pad any ZIP file with comment bytes out to a multiple of 512, making the ZIP file a legal RAW floppy image.
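The same idea for ZIP files can be sketched as follows. It rests on a few assumptions: the EOCD signature (`PK\x05\x06`) does not appear inside an existing comment, spaces are acceptable comment bytes, and the existing comment length is consistent with the file. `pad_zip` is a hypothetical helper. It locates the EOCD record, appends the padding, and updates the two-byte little-endian comment length stored at offset 20 of the record.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end of central directory signature
EOCD_SIZE = 22             # fixed part of the EOCD record, before the comment
MAX_COMMENT = 0xFFFF       # the comment length is a 2-byte little-endian field
SECTOR = 512

def find_eocd(data: bytes) -> int:
    """Offset of the EOCD record, searching only the last 22 + 65535 bytes."""
    start = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        raise ValueError("no end of central directory record found")
    return pos

def pad_zip(data: bytes, filler: bytes = b" ") -> bytes:
    """Grow the ZIP comment so that the whole file is a multiple of 512 bytes."""
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    eocd = find_eocd(data)
    (comment_len,) = struct.unpack_from("<H", data, eocd + 20)
    if eocd + EOCD_SIZE + comment_len != len(data):
        raise ValueError("comment length does not reach the end of the file")
    pad = -len(data) % SECTOR
    if comment_len + pad > MAX_COMMENT:
        raise ValueError("not enough room left in the comment field")
    out = bytearray(data)
    struct.pack_into("<H", out, eocd + 20, comment_len + pad)  # new comment length
    return bytes(out) + filler * pad

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello, world\n")
original = buf.getvalue()
padded = pad_zip(original)
```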
To recap:
- The VirtualBox API doesn’t have generic file management methods, but does allow creating arbitrary files for medium storage, as long as they conform to the right file format.
- The RAW floppy format allows for arbitrary bytes, so long as the total file size is a multiple of 512 bytes.
- TAR files are always multiples of 512 bytes.
- Scripts can be padded with whitespace to multiples of 512 bytes.
- ZIP files can be padded to multiples of 512 bytes, so long as the “comment length” field is also updated.
This is enough to get almost arbitrary files onto the VirtualBox host, and then into VMs.
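On the host side, the padded bytes still have to be pushed through `IMediumIO::write`. The sketch below assumes only an object with a `write(offset, data)` method that returns the number of bytes written, loosely modeled on that call. Several details are left out: how the real `IMediumIO` object is obtained and closed, and any limits the API places on chunk size or alignment. The 64 KiB chunk size is an arbitrary choice, and `InMemoryMedium` is a hypothetical stand-in so the sketch runs without VirtualBox.

```python
from typing import Protocol

SECTOR = 512

class MediumWriter(Protocol):
    """Anything with a write(offset, data) -> bytes_written method (assumed, IMediumIO-like)."""
    def write(self, offset: int, data: bytes) -> int: ...

def write_medium(medium_io: MediumWriter, payload: bytes, chunk_size: int = 64 * 1024) -> None:
    """Copy an already padded payload onto a RAW medium, one chunk at a time."""
    if len(payload) % SECTOR:
        raise ValueError("pad the payload to a multiple of 512 bytes first")
    offset = 0
    while offset < len(payload):
        written = medium_io.write(offset, payload[offset:offset + chunk_size])
        if written <= 0:
            raise OSError("medium write made no progress")
        offset += written  # a short write just resumes from where it stopped

class InMemoryMedium:
    """Hypothetical stand-in for a RAW medium, for trying this out locally."""
    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def write(self, offset: int, data: bytes) -> int:
        self.data[offset:offset + len(data)] = data
        return len(data)

payload = b"#!/bin/sh\necho hello\n".ljust(SECTOR, b"\n")
medium = InMemoryMedium(len(payload))
write_medium(medium, payload)
```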
Getting Files from Virtual Machines
Starting again with a quick introduction before a deeper dive:
- The API doesn’t support receiving arbitrary files from the VirtualBox host, but it does allow reading the content of disk media files.
- The disk media files have to be legal disk images though, not just arbitrary bytes.
- Except that RAW floppy images can be arbitrary bytes. They just have to be sector aligned, meaning their size must be a multiple of 512 bytes.
- The archives that need to be pulled back from VMs are created inside those VMs with very limited programming environments.
- We found a way to use Windows Batch scripting to pad ZIP files to sizes that are multiples of 512 bytes.
- The result is a slightly malformed ZIP file, but one that can be corrected once it’s pulled back.
- End result: use Windows Batch to pad ZIP files as needed, read their content as RAW floppy images on the VirtualBox host using the API, and then transfer them back to the client.
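Reading a file back is the mirror image of writing one. This sketch assumes an object with a `read(offset, size)` method that returns bytes, loosely modeled on `IMediumIO::read`. It also assumes the caller already knows the medium's size. Both are assumptions, and `InMemoryMedium` is again a hypothetical stand-in.

```python
from typing import Protocol

class MediumReader(Protocol):
    """Anything with a read(offset, size) -> bytes method (assumed, IMediumIO-like)."""
    def read(self, offset: int, size: int) -> bytes: ...

def read_medium(medium_io: MediumReader, size: int, chunk_size: int = 64 * 1024) -> bytes:
    """Pull `size` bytes off a RAW medium, one chunk at a time."""
    parts = []
    offset = 0
    while offset < size:
        chunk = bytes(medium_io.read(offset, min(chunk_size, size - offset)))
        if not chunk:
            raise OSError("medium read made no progress")
        parts.append(chunk)
        offset += len(chunk)
    return b"".join(parts)

class InMemoryMedium:
    """Hypothetical stand-in for a RAW medium holding a padded archive pulled from a VM."""
    def __init__(self, data: bytes) -> None:
        self.data = bytes(data)

    def read(self, offset: int, size: int) -> bytes:
        return self.data[offset:offset + size]

medium = InMemoryMedium(bytes(range(256)) * 8)
copy = read_medium(medium, 2048, chunk_size=1000)
```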

As mentioned earlier, we try to keep the scripts that are used within the VMs to a bare minimum, using only the most common tools that are available on the widest range of platforms. It’s occasionally a tradeoff between elegance and portability, and we want to maximize portability.
At the end of a test run, the various data files within the VM are packaged up into a TAR or ZIP file and retrieved from the VM. Any file can be transferred out of the VM and into the VirtualBox host, but the VirtualBox API only provides methods for reading files that can be treated as storage media. We assumed that RAW floppy images would again be the safest bet, but needed a way within the VMs, using only the limited shell scripting environments available, to ensure that the TAR and ZIP files generated within the VMs would have 512-byte multiple sizes.
The TAR files, by virtue of their format, automatically comply. ZIP files were only a little more complicated. There are few options in Windows batch for appending bytes to files, and even fewer for overwriting the length field in the EOCD record. However, we did find that in Windows `cmd` we could use these commands:
- `echo . >> file.zip` adds 4 bytes to a file: the `.`, the space before `>>`, and a CR LF line ending (any character could be used in place of `.`).
- `echo. >> file.zip` adds 3 bytes to a file. This adds one fewer byte because `echo.`, with no space between `echo` and `.`, is used to print a newline, so only the space before `>>` and the CR LF are written.
Combinations of 3 and 4 bytes can’t necessarily pad every file to the next multiple of 512 bytes. For instance, they can’t be used to pad a 510-byte file to 512 bytes. But, because they have a difference of 1, they can achieve some multiple of 512. In fact, even though the next multiple of 512 might not always be attainable (e.g., 512 from 510), the next one after that will be, so this method will never waste much more than 512 bytes. The proof (it’s not all that difficult) is an exercise for the reader.
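The arithmetic can be made concrete with a small planner that decides how many of each `echo` command to run. This is a sketch, not our production code. It takes the 3- and 4-byte counts of the commands above as given, and it prefers 4-byte lines only to keep the number of commands down. `plan_echo_padding` and `batch_lines` are hypothetical names.

```python
SECTOR = 512

def plan_echo_padding(size: int, sector: int = SECTOR) -> tuple[int, int]:
    """Return (threes, fours): counts of 3-byte and 4-byte appends that make size a multiple of sector."""
    deficit = -size % sector
    while True:
        # Try as many 4-byte appends as fit, then fewer, until the rest splits into 3-byte appends.
        for fours in range(deficit // 4, -1, -1):
            rest = deficit - 4 * fours
            if rest % 3 == 0:
                return rest // 3, fours
        # Deficits of 1, 2 or 5 bytes can't be built from 3s and 4s; aim for the following multiple.
        deficit += sector

def batch_lines(size: int, target: str = "file.zip") -> list[str]:
    """Hypothetical generator of the cmd lines that perform the padding."""
    threes, fours = plan_echo_padding(size)
    return [f"echo . >> {target}"] * fours + [f"echo. >> {target}"] * threes

print(plan_echo_padding(510))  # (2, 127)
```

For a 510-byte ZIP, the next multiple (512) is out of reach, so the planner targets 1024 instead. 127 four-byte lines and 2 three-byte lines add 514 bytes.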
Though the ZIP file will now have a size that aligns with disk sectors, its comment length field was never changed, so it might not be a strictly legal ZIP file anymore. But do we care? Should we? The file-format purists among us say, “yes, of course we should! We don’t want malformed files around!”
It turns out that Windows Explorer doesn’t seem to care. When you click on a downloaded ZIP file, you’re not shown the comment, so does it really matter whether it’s there, and whether the length field is right? Well, Windows Explorer is a nice test, but we have Java code that needs to process the ZIP file, too. We were a bit surprised that even Java APIs that provide some access to ZIP comments, like java.util.zip.ZipInputStream and java.util.zip.ZipFile don’t seem to mind either!
We also appreciate Java’s Zip File System Provider, but it chokes when the comment and the comment length field don’t match. Fortunately, we know that the ZIP as originally created had a comment length of 0, and that we padded it with non-zero bytes. That makes it easy enough to fix the comment length field in the EOCD record post hoc, or even just to truncate the comment bytes off the end of the file.
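Here is one way to do that repair, as a sketch. `fix_zip_comment` is a hypothetical helper that either declares the trailing padding as the comment or truncates it. It assumes the EOCD signature does not occur inside the padding, which holds for the `.`, space, CR, and LF bytes produced by the `echo` commands. The usage below simulates those bytes rather than running `cmd`.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end of central directory signature
EOCD_SIZE = 22             # fixed part of the EOCD record, before the comment
MAX_COMMENT = 0xFFFF       # the comment length is a 2-byte little-endian field

def fix_zip_comment(data: bytes, truncate: bool = False) -> bytes:
    """Repair a ZIP whose padding was appended without updating the EOCD comment length."""
    start = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    eocd = data.rfind(EOCD_SIG, start)
    if eocd < 0 or eocd + EOCD_SIZE > len(data):
        raise ValueError("no end of central directory record found")
    if truncate:
        # Drop everything after the fixed-size record and declare an empty comment.
        out = bytearray(data[:eocd + EOCD_SIZE])
        struct.pack_into("<H", out, eocd + 20, 0)
    else:
        # Keep the padding and declare it as the comment.
        out = bytearray(data)
        struct.pack_into("<H", out, eocd + 20, len(data) - eocd - EOCD_SIZE)
    return bytes(out)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("results/log.txt", "collected data\n")
original = buf.getvalue()
# Simulated in-VM padding: three `echo . >>` lines, then two `echo. >>` lines.
padded = original + b". \r\n" * 3 + b" \r\n" * 2
repaired = fix_zip_comment(padded)                 # padding becomes the comment
trimmed = fix_zip_comment(padded, truncate=True)   # padding is cut off
```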

Recap
The VirtualBox API does not support arbitrary file transfers, but it does allow creating, reading, and writing disk images on the VirtualBox host. To transfer files between a remote client and virtual machines using the VirtualBox API, we turned scripts and ZIP archives into polyglot files that also work as RAW floppy images. In one direction, this required padding files using Windows Batch and some modular arithmetic. These files are then transferred into and out of virtual machines, priming our cyber-experimentation process and wrapping up data collection for later analysis.