Embedded WAVE, thanks HP 👋

Digital Preservation is all about identifying risks. This is done through a process which includes identification, validation, and metadata extraction. The more you know about the digital data you need to preserve over time, the more you can do to minimize those risks with the goal of making the data accessible over time.

Many formats are pretty straight forward, they are identifiable through a header and then have some binary bits or plain text that is readable by certain software. Others are more complicated. A common practice for more complex needs is to use a container. Word processing programs started out with plain text with maybe some formatting codes mixed in, then many moved to the Microsoft OLE container so you could have additional content embedded in a single file. Today file formats such as DOCX use a ZIP container, which houses all the text, images, formatting and anything else the format supports. Knowing what the format is and knowing what it may contain is important to preservation.


I collect older digital cameras, specifically cameras with unique file formats, raw and otherwise. When I picked up a HP (Hewlett-Packard) point and shoot camera awhile back, I was initially unimpressed as it would only capture in a JPEG format and only 3 quality settings. While looking at a copy of the manual, I saw the camera was capable of capturing audio clips or voice memos for each photo taken. This can be handy when taking many photos and need a reminder about the context. This was not unique to HP, as many cameras could do this, normally a JPG was captured and the Audio would have the same name connecting the two. But when I recorded some audio on my little HP, placed the SD card in my computer, I couldn’t find the additional audio file. I also not the only one to ask about this.

There are many types of JPG files. Raw Streams, JPEG File Interchange Format (JFIF), and Exchangeable Image File Format (EXIF). Normally these formats have raster image data sprinkled with metadata. I have seen JPEG files embedded into other formats and containers, such as MP3, PDF, etc, but JPEG’s are not container formats. Or so I thought…..

View of HP Photosmart 433 folder in HP Photo & Imaging Gallery

Lets take a look at an image I took with my HP Photosmart 433. We’ll start with identification:

siegfried   : 1.10.1
scandate    : 2023-05-25T12:27:04-06:00
signature   : default.sig
created     : 2023-05-22T08:43:02-06:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V112.xml; container-signature-20230510.xml'
filename : 'GitHub/digicam_corpus/HP/Photosmart 433/IM000959.JPG'
filesize : 178922
modified : 2023-05-25T11:23:32-06:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'x-fmt/391'
    format  : 'Exchangeable Image File Format (Compressed)'
    version : '2.2'
    mime    : 'image/jpeg'
    class   : 'Image (Raster)'
    basis   : 'extension match jpg; byte match at [[0 16] [366 12] [178907 2]] (signature 2/2)'
    warning : 

IM000959.JPG was identified as x-fmt/391 which is a compressed Exchangeable Image File Format. version 2.2. Pretty straight forward. Next lets look at validation:

Jhove (Rel. 1.28.0, 2023-05-18)
 Date: 2023-05-25 12:35:16 MDT
 RepresentationInformation: GitHub/digicam_corpus/HP/Photosmart 433/IM000959.JPG
  ReportingModule: JPEG-hul, Rel. 1.5.4 (2023-03-16)
  LastModified: 2023-05-25 11:23:32 MDT
  Size: 178922
  Format: JPEG
  Status: Well-Formed and valid
  ErrorMessage: Tag 41492 out of sequence
   Offset: 606
  MIMEtype: image/jpeg
   CompressionType: Huffman coding, Baseline DCT
    Number: 1
      FormatName: image/jpeg
      ByteOrder: big_endian
      CompressionScheme: JPEG
      ImageWidth: 640
      ImageHeight: 480
      ColorSpace: YCbCr
      DateTimeCreated: 2021-11-16T09:04:04
      ScannerManufacturer: Hewlett-Packard
      ScannerModelName: hp PhotoSmart 43x series
      DigitalCameraManufacturer: Hewlett-Packard
      DigitalCameraModelName: hp PhotoSmart 43x series
      FNumber: 4
      ExifVersion: 0220
      FlashpixVersion: 0100
      ColorSpace: sRGB
      ComponentsConfiguration: 1, 2, 3, 0
      CompressedBitsPerPixel: 1.568
      PixelXDimension: 640
      PixelYDimension: 480
      MakerNote: 0, 97, 48, 101, 114, 32, 78, 111, 116, 101, 115, 0, 0, 0, 0, 0
      DateTimeOriginal: 2021:11:16 09:04:04
      DateTimeDigitized: 2021:11:16 09:04:04
   ApplicationSegments: APP1, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2, APP2

I removed a few lines to show important parts, but we get some similar information about the format, a JPEG with EXIF version 2.2. We also learn that HP improperly ordered their tags and put Tag 41492 out of sequence, but we can ignore that for now. Looking close at the output does not give us any indication of audio formats. There is a clue when we see the mention of a Flashpix version and additional Application Segments.

Since this is an image with EXIF data, lets also take a look at the output of Exiftool.

ExifTool Version Number         : 12.62
File Name                       : IM000959.JPG
Directory                       : .
File Size                       : 179 kB
File Modification Date/Time     : 2023:05:25 11:23:32-06:00
File Access Date/Time           : 2023:05:25 11:24:42-06:00
File Inode Change Date/Time     : 2023:05:25 11:24:39-06:00
File Permissions                : -rwxr-xr-x
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
Exif Byte Order                 : Little-endian (Intel, II)
Image Description               : IM000959.JPG
Make                            : Hewlett-Packard
Camera Model Name               : hp PhotoSmart 43x series
Orientation                     : Horizontal (normal)
X Resolution                    : 72
Y Resolution                    : 72
Resolution Unit                 : inches
Software                        : 1.400
Modify Date                     : 2021:11:16 09:04:04
Y Cb Cr Positioning             : Co-sited
Copyright                       : Copyright 2002-2003
Exposure Time                   : 1/29
F Number                        : 4.0
ISO                             : 100
Exif Version                    : 0220
Date/Time Original              : 2021:11:16 09:04:04
Create Date                     : 2021:11:16 09:04:04
Components Configuration        : Y, Cb, Cr, -
Compressed Bits Per Pixel       : 1.567552083
Shutter Speed Value             : 1/30
Aperture Value                  : 4.0
Exposure Compensation           : 0
Max Aperture Value              : 4.0
Subject Distance                : 1 m
Metering Mode                   : Average
Light Source                    : Unknown
Flash                           : Auto, Did not fire
Focal Length                    : 5.7 mm
Warning                         : [minor] Unrecognized MakerNotes
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 640
Exif Image Height               : 480
Interoperability Index          : R98 - DCF basic file (sRGB)
Interoperability Version        : 0100
Digital Zoom Ratio              : 1
Subject Location                : 0
Compression                     : JPEG (old-style)
Thumbnail Offset                : 2046
Thumbnail Length                : 7112
Code Page                       : Unicode UTF-16, little endian
Used Extension Numbers          : 1, 31
Extension Name                  : Audio
Extension Class ID              : 10000100-6FC0-11D0-BD01-00609719A180
Extension Persistence           : Always Valid
Audio Stream                    : (Binary data 117820 bytes, use -b option to extract)
Image Width                     : 640
Image Height                    : 480
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:2 (2 1)
Aperture                        : 4.0
Image Size                      : 640x480
Megapixels                      : 0.307
Shutter Speed                   : 1/29
Thumbnail Image                 : (Binary data 7112 bytes, use -b option to extract)
Focal Length                    : 5.7 mm
Light Value                     : 8.9

Ohh, what do we have here? Exiftool mentions an audio stream. An audio stream inside the JPEG? How is this possible? The Flashpix format was originally developed by Kodak in which collaborated with HP. This was later added to the EXIF specifications. Below is an screenshot from the Exif Version 2.2 spec.

Exiftool mentioned Flashpix and additional APP2 segments. Lets take a look at the raw file in a hex editor.

Ahhh….. In one of the App2 segments we can see something familiar. A RIFF WAVE header! Lets see if we can extract the WAVE file.

exiftool -b -AudioStream IM000959.JPG > IM000959.WAV

mediainfo IM000959.WAV
Complete name                            : IM000959.WAV
Format                                   : Wave
Format settings                          : WaveFormatEx
File size                                : 115 KiB
Duration                                 : 10 s 681 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 88.2 kb/s

Format                                   : ADPCM
Codec ID                                 : 11
Codec ID/Hint                            : Intel
Duration                                 : 10 s 681 ms
Bit rate mode                            : Constant
Bit rate                                 : 88.2 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 22.05 kHz
Bit depth                                : 4 bits
Stream size                              : 115 KiB (100%)

MediaInfo can give us details on the embedded WAVE file, which is pretty terrible quality but is a PCM audio stream.

Embedded audio inside a raster image is not common. Most software which can render a JPEG image will most likely ignore the embedded WAVE and not even give a warning it exists. IM000959.JPG opens fine in Adobe Photoshop, but saving to a new format or making any edits will delete the WAVE file. Imagemagick also will remove the WAVE with any editing with no warning.

In order to ensure the embedded audio stream is preserved we first need to know it is there, this is where tools like exiftool can be used to extract metadata from the file and the image can be associated with having an audio stream and handled differently than any other JPEG file. More work is needed, Exiftool may mention an Audio Stream, but currently does not have the ability to pull any data from the stream.

Leave a Reply

Your email address will not be published. Required fields are marked *