LSB for audio

A method to hide data in audio frame bits

This article covers audio steganography using the Least Significant Bit (LSB) method.

Audio steganography

With images, we hid data in pixels, each of which we split into 3 color channels. Audio has no pixels; instead, it has audio frames. Before explaining frames, let's talk about audio formats.

There are many audio formats, such as MP3, AAC, WAV, FLAC, OGG and AIFF, but we will be working with WAV. Almost all LSB-based audio steganography uses the WAV format.

Why WAV?

Because it's uncompressed, lossless and easy to manipulate at the sample level without losing hidden data. Among the other formats, some are lossy, meaning the encoder discards audio detail, so hidden bits are destroyed when the file is encoded or re-encoded; others are compressed, meaning direct bit-level access to the samples is not straightforward.

Here's the thing: if you look around the internet, you will find relatively few WAV files. Whether you search for songs, tones, ringtones or any other kind of audio, you will mostly find MP3s, and if not MP3, then other formats; you will rarely see a file in WAV format. Not only that: I don't know if you have noticed, but most AI assistants you use daily, like ChatGPT, Claude, Gemini or Grok, won't let you upload a WAV file. So if you want to use a WAV file, you will either have to convert your audio to WAV or download one (though it will be some random audio file).

But why are there fewer WAV files around, and why do AI platforms avoid them?

WAV files are uncompressed, and therefore large: for example, a 1-minute stereo WAV file at 44.1 kHz with 16-bit samples is about 10 MB. (Stereo? We will talk about it later; you will understand as you read this article.) Because WAV files are large, uploading and processing them increases server load, latency and cost. In addition, WAV files are used for data hiding (which is exactly what we are doing), and instead of a harmless secret file, someone could hide malware or another malicious file in the audio. These are the reasons there are fewer WAV resources out there.
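The roughly 10 MB figure is just the size of the uncompressed PCM data, assuming 16-bit samples (the WAV header adds only a small amount on top):

```python
# Uncompressed PCM size = sample rate x duration x bytes per sample x channels
sample_rate = 44_100  # samples per second (44.1 kHz)
duration_s = 60       # one minute
sample_size = 2       # 16-bit samples = 2 bytes (assumed)
channels = 2          # stereo

size_bytes = sample_rate * duration_s * sample_size * channels
print(size_bytes)                        # 10584000
print(round(size_bytes / 1_000_000, 1))  # 10.6
```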

Here's another question: why can't we just ban or eliminate WAV?

Well, WAV is the standard format for uncompressed audio: it stores raw PCM samples, which is perfect for professional work like editing, mastering and analysis. Studios, broadcasters, researchers and engineers need uncompressed formats, and WAV is bit-perfect and simple. It has a simple structure (the RIFF format) and is universally supported across operating systems such as Windows, Linux and macOS. WAV is also used in research and development areas such as signal processing, machine learning and audio forensics.

Now that you understand what WAV is and why we are going to use it, let's move on to the next topic. WAV files are larger than files in other audio formats, and that works in our favor: the larger your WAV file, the more data you can embed.

Channels

Mono and stereo are the most common channel types you will see in an audio file: mono has 1 channel, while stereo has 2 channels, left and right. You may also encounter other layouts, such as 2.1, quadraphonic, 5.1 surround, 7.1 surround and Dolby Atmos. You can use any of them as a cover, but the more channels an audio file has, the more space you get to embed data. For example, mono (1 channel) gives you the least space, stereo (2 channels) gives you twice as much for the same duration and sample rate, and quadraphonic (4 channels) gives you twice as much as stereo.

The amount of data you can embed depends directly on the number of channels and the size of the audio.

The size of the audio data, in bytes, can be calculated as follows:

Mono (1 channel): total bytes = sample_rate x duration x sample_size x 1

Stereo (2 channels): total bytes = sample_rate x duration x sample_size x 2

Quadraphonic (4 channels): total bytes = sample_rate x duration x sample_size x 4

For surround layouts, the multiplier grows accordingly: 6 for 5.1 and 8 for 7.1. Because LSB embedding hides one bit in each of these bytes, the maximum hidden payload is total bytes / 8.
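Here is a small sketch of that calculation. The function name lsb_capacity_bytes is hypothetical, and it assumes one hidden bit per audio byte, as in the embedding loop described later. The start and end markers also use up part of this space.

```python
def lsb_capacity_bytes(sample_rate: int, duration_s: int, sample_size: int, channels: int) -> int:
    """Hidden-payload capacity when every audio byte carries one LSB."""
    total_bytes = sample_rate * duration_s * sample_size * channels  # size of the audio data
    return total_bytes // 8  # 8 carrier bytes are needed per hidden byte


for name, channels in [("mono", 1), ("stereo", 2), ("quadraphonic", 4), ("5.1", 6), ("7.1", 8)]:
    # 10 seconds of 16-bit (2-byte) audio at 44.1 kHz
    print(f"{name:>12}: {lsb_capacity_bytes(44_100, 10, 2, channels):,} bytes")
```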

The sample rate is a frequency (measured in Hz): how many samples per second the audio stores, for example 44,100 Hz or 48,000 Hz. The duration varies, such as 10 seconds, 30 seconds or 1 minute. The sample size is the width of one sample in bytes; 16 bits (2 bytes) is the most common standard for PCM WAV files.

Frames: a frame holds one sample for every channel at the same instant, so its size is the number of channels times the sample size:

Mono = 1 channel x sample(16bits) = 16 bits

Stereo = 2 channels x sample(16bits) = 32 bits

Quadraphonic = 4 channels x sample(16bits) = 64 bits
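You can confirm these frame sizes with Python's wave module: getnchannels() x getsampwidth() gives the bytes per frame, and readframes() returns that many bytes per frame. This sketch writes a short silent WAV into memory; the helper name frame_layout is hypothetical.

```python
import io
import wave


def frame_layout(nchannels: int, sampwidth: int = 2, nframes: int = 100) -> tuple:
    """Return (bits per frame, bytes returned by readframes) for a silent test WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)  # 2 bytes = 16-bit samples
        w.setframerate(44_100)
        w.writeframes(b"\x00" * nframes * nchannels * sampwidth)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        frame_bytes = r.readframes(r.getnframes())
        return r.getnchannels() * r.getsampwidth() * 8, len(frame_bytes)


for channels in (1, 2, 4):
    print(channels, frame_layout(channels))  # (16, 200), (32, 400), (64, 800)
```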

Now we will use the LSB method to embed data in audio files.

LSB stands for Least Significant Bit. If you have read the previous section, you already have an idea of how we are going to use LSB on audio: instead of pixels, we have frames here.

As mentioned earlier, frames differ in size depending on the channel layout of the audio; the number of bytes available is the only thing that differs between layouts.
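The LSB replacement itself is one line of bit arithmetic. The sketch below (set_lsb is a hypothetical helper name) also shows a detail worth knowing about 16-bit WAV audio: samples are stored little-endian, so the first byte of each sample is the low byte and the second is the high byte. Changing the LSB of the low byte moves the sample value by 1, while changing the LSB of the high byte moves it by 256.

```python
import struct


def set_lsb(byte_value: int, bit: int) -> int:
    """Clear bit 0 with & 0xFE, then write the payload bit with | bit."""
    return (byte_value & 0xFE) | bit


# One 16-bit little-endian sample with value 1000 -> bytes [0xE8, 0x03]
low, high = struct.pack("<h", 1000)

changed_low = struct.unpack("<h", bytes([set_lsb(low, 1), high]))[0]
changed_high = struct.unpack("<h", bytes([low, set_lsb(high, 0)]))[0]
print(changed_low)   # 1001: off by 1
print(changed_high)  # 744: off by 256
```

If audible noise is a concern, restricting carriers to the even (low-byte) positions of 16-bit audio keeps every change to plus or minus 1, at the cost of half the capacity.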

Implementing the code

Before starting, make sure you have:

  • Read our AES Encryption guide.

  • The aes.py file from the AES encryption page

We will be working only with built-in modules here, so there are no external dependencies to install.

We need an audio file (WAV format) as the cover file, a secret file to hide, and a key. The key serves as the password for encryption and for frame randomization, which is crucial for making our tool more secure. Without this key, it is practically impossible for anyone to extract the hidden data from the stego audio file.

The workflow diagram (embedding process and extraction process) shows how our tool operates.

Embedding

Let's prepare our payload first. We import the wave module (part of Python's standard library), which reads and writes WAV files. Next, we import the random module for frame randomization. Finally, we import the aes module (our aes.py from AES Encryption) for encryption, decryption and seed generation.

We start with an embed() function that takes 3 arguments: cover_path for the cover file, payload_path for the secret file, and key for encryption and randomization.

First, we open the cover file in read-binary mode with the wave module, as song. Then we read all the frames from the cover file; since we are reading frames, they come back as raw bytes, and this is how we divide the audio into frame bytes. We also keep a 'params' variable (a tuple) holding the actual audio settings. It has exactly 6 values:

  • nchannels: number of channels

  • sampwidth: sample width in bytes

  • framerate: sample rate (Hz)

  • nframes: total number of frames

  • comptype: compression type ('NONE' for PCM WAV, meaning uncompressed)

  • compname: human-readable compression name ('not compressed' for WAV)

The output of params for a WAV file looks like this:
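To reproduce that output, here is a minimal sketch of this step, assuming the cover is a PCM WAV file. The helper names make_test_wav and read_cover are hypothetical; the test WAV is generated in memory only so the example runs on its own, and in the tool you would pass cover_path instead.

```python
import io
import wave


def make_test_wav(nchannels: int = 2, nframes: int = 1000) -> io.BytesIO:
    """Build a short silent 16-bit WAV in memory to stand in for a real cover file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(2)
        w.setframerate(44_100)
        w.writeframes(b"\x00" * nframes * nchannels * 2)
    buf.seek(0)
    return buf


def read_cover(cover):
    """Return (params, mutable frame bytes), as the embed step does."""
    with wave.open(cover, "rb") as song:
        params = song.getparams()  # (nchannels, sampwidth, framerate, nframes, comptype, compname)
        frame_bytes = bytearray(song.readframes(song.getnframes()))
    return params, frame_bytes


params, frame_bytes = read_cover(make_test_wav())
print(params)
print(len(frame_bytes))  # 4000 = 1000 frames x 2 channels x 2 bytes
```

Wrapping the frames in a bytearray makes them mutable, which the embedding loop needs, since a bytes object can't be modified in place.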

We don't need to think much more about 'params': we only extract it so we can copy the same settings when we create the stego file, so the original and stego files have identical settings.

Next, we open the secret file, read it in bytes and assign it to the payload variable. Now that we have the payload bytes, we can encrypt them so no one can read them without the key. The encryption function takes 2 arguments: the payload and the key. The imported aes module is my customized script from AES Encryption; it encrypts our secret file's bytes using AES.

Now we have to add markers so the extractor knows where our payload starts and where it ends. Since the markers are hardcoded and fixed, the extractor can also tell whether the stego file contains a payload or whether it is corrupted. The start marker is attached before the encrypted payload and the end marker after it, giving us the full payload: start marker + encrypted payload + end marker.
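The framing step can be sketched like this. The marker values below are placeholders of my own choosing (the tool's real markers aren't shown here); any fixed byte strings that are unlikely to appear inside the ciphertext will do.

```python
# Hypothetical marker values; the tool's actual markers may differ
START_MARKER = b"##SB_START##"
END_MARKER = b"##SB_END##"


def frame_payload(encrypted: bytes) -> bytes:
    """Wrap the encrypted payload as start marker + ciphertext + end marker."""
    return START_MARKER + encrypted + END_MARKER


full_payload = frame_payload(b"\x8f\x02\x7a")  # stand-in ciphertext bytes
print(full_payload)
```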

Next, a for loop converts each byte of full_payload into 8 bits, which are collected in payload_bits.
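Converting bytes to bits can be written as a single comprehension. This sketch (to_bits is a hypothetical name) emits the most significant bit first, which matches the bit << (7 - j) reconstruction used during extraction:

```python
def to_bits(data: bytes) -> list:
    """Expand each byte into 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


payload_bits = to_bits(b"A")  # 'A' = 65 = 0b01000001
print(payload_bits)           # [0, 1, 0, 0, 0, 0, 0, 1]
```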

Moving on to the last phase of the code:

Since every frame byte can carry one bit, the number of frame bytes is our capacity, so we can check whether the payload fits in the cover file. If the payload is too large, it throws an error saying the payload is too large.
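As a sketch, the capacity check only has to compare two lengths, because each frame byte carries exactly one payload bit (check_capacity is a hypothetical name):

```python
def check_capacity(frame_bytes: bytes, payload_bits: list) -> None:
    """Raise if there are fewer carrier bytes than payload bits."""
    if len(payload_bits) > len(frame_bytes):
        raise ValueError(
            f"Payload too large: needs {len(payload_bits)} frame bytes, "
            f"but the cover only has {len(frame_bytes)}"
        )


check_capacity(bytes(64), [1] * 64)  # fits exactly: no error
try:
    check_capacity(bytes(64), [1] * 65)
except ValueError as err:
    print(err)
```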

We have finally arrived at the most crucial part of the code: randomizing the order of the frame bytes, so the data is not hidden in a predictable place. First we call the to_seed() function, which takes one argument, the key (described earlier in the AES Encryption section). Then we pass that seed to random.Random() to create a pseudo-random generator object. The same key always produces the same seed, and the same seed always produces the same pseudo-random sequence. This ensures a consistent frame byte order for both embedding and extraction.

In short, instead of embedding bits in a predictable order like indexes 0, 1, 2, 3, we embed them in a key-dependent random order, making extraction impossible without the correct key. For example:
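As an example of the key-dependent ordering, here is a sketch. The real tool derives the seed with to_seed() from aes.py; since that script isn't shown here, this example substitutes a SHA-256 hash of the key as a stand-in (an assumption, not the tool's method). Note that random.Random is a Mersenne Twister, which is not cryptographically secure: the shuffle hides where the bits are, while AES is what protects their content.

```python
import hashlib
import random


def shuffled_indexes(key: str, count: int) -> list:
    """Return the frame-byte positions 0..count-1 in a key-dependent order."""
    # Stand-in for aes.to_seed(key): SHA-256 of the key as a big integer (assumption)
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)  # same seed -> same pseudo-random sequence
    indexes = list(range(count))
    rng.shuffle(indexes)       # in-place Fisher-Yates shuffle
    return indexes


print(shuffled_indexes("my secret key", 10))
print(shuffled_indexes("my secret key", 10))   # identical to the line above
print(shuffled_indexes("my secret key!", 10))  # almost certainly different
```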

Now comes the main part of our tool, where we actually hide the secret file's data in the audio frame bytes. The for loop uses bit_idx (the bit index) and bit, both of which come from enumerate(payload_bits). payload_bits is a list of 0s and 1s; when you call enumerate() on it, you get two things for each element: its index and the bit itself. Next, 'i = indexes[bit_idx]' gets the shuffled position for that bit, and 'frame_bytes[i] = (frame_bytes[i] & 0xFE) | bit' replaces the last bit of the frame byte at that position with our secret file's bit: '& 0xFE' clears the lowest bit, and '| bit' sets it to the payload bit. Each iteration hides one payload bit in one shuffled frame position, and the loop runs until all of the secret file's bits are embedded.
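The loop described above fits in a few lines. In this sketch, hide_bits is a hypothetical wrapper; frame_bytes must be a bytearray (bytes objects can't be modified in place), and indexes is the shuffled position list from the previous step:

```python
def hide_bits(frame_bytes: bytearray, payload_bits: list, indexes: list) -> bytearray:
    """Write one payload bit into the LSB of each shuffled frame byte."""
    for bit_idx, bit in enumerate(payload_bits):
        i = indexes[bit_idx]                            # shuffled position for this bit
        frame_bytes[i] = (frame_bytes[i] & 0xFE) | bit  # clear bit 0, then set it to the payload bit
    return frame_bytes


frames = bytearray([200, 201, 202, 203])
print(list(hide_bits(frames, [1, 0, 1], [2, 0, 3])))  # [200, 201, 203, 203]
```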

Next, we open a new file with wave.open() in write mode, apply 'params' (the original audio settings), write the stego frame bytes produced by the loop, and return the stego audio. That's all: we have succeeded in hiding the secret file in an audio file.
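Writing the stego file could look like the sketch below (write_stego is a hypothetical name). setparams() must be called before writeframes(), and the wave module updates the frame count in the header when the file is closed. The example round-trips through memory so it runs on its own; in the tool, out would be a file path.

```python
import io
import wave


def write_stego(out, params, frame_bytes: bytearray) -> None:
    """Write modified frame bytes using the cover's original audio settings."""
    with wave.open(out, "wb") as stego:
        stego.setparams(params)  # same channels, sample width, rate, compression
        stego.writeframes(bytes(frame_bytes))


# Build a tiny cover in memory, then copy its settings into the stego file
cover = io.BytesIO()
with wave.open(cover, "wb") as w:
    w.setparams((1, 2, 44_100, 0, "NONE", "not compressed"))
    w.writeframes(b"\x10\x00" * 50)
cover.seek(0)
with wave.open(cover, "rb") as song:
    params = song.getparams()
    frame_bytes = bytearray(song.readframes(song.getnframes()))

frame_bytes[0] |= 1  # pretend one bit was embedded
stego = io.BytesIO()
write_stego(stego, params, frame_bytes)
stego.seek(0)
with wave.open(stego, "rb") as check:
    same_settings = check.getparams() == params
    stego_frames = check.readframes(check.getnframes())
print(same_settings)  # True
```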

Extracting

Let's implement the extraction function. If you want to put it in a separate file, make sure to import the same modules you imported for the embedding function; otherwise, you can define it right after the embedding function.

The extract function takes 2 arguments: stego_path for the stego file, and the key, used both for decryption and for reproducing the order used during embedding. Using the wave module, we open stego_path in read-binary mode, then read the frame bytes as we did in the embed function. Next, we generate the seed from the same key we used for embedding; the seed will be identical as long as the key is correct. We pass the seed to random.Random() to create the pseudo-random order, which will also be identical as long as the key is correct. Even a slight change in the key produces a completely different order, which leads to an error. The index-shuffling process is the same as in the embed function, so we end up with the same order of shuffled positions.

Now a for loop extracts the last bit of the frame byte at every shuffled position and appends it to the 'extracted_bits' list. Because the bits are read in the same order they were written, the start of this list holds the original secret file's bits (framed by the markers); the remaining positions just hold the cover's original LSBs.
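Reading the bits back is the mirror image of embedding. This sketch (read_bits is a hypothetical name) takes the frame bytes and the same shuffled index list:

```python
def read_bits(frame_bytes: bytes, indexes: list) -> list:
    """Collect the LSB of every frame byte, in the key-dependent order."""
    return [frame_bytes[i] & 1 for i in indexes]


stego_frames = bytes([200, 201, 203, 203])
print(read_bits(stego_frames, [2, 0, 3, 1]))  # [1, 0, 1, 1]
```

In this toy example, imagine that only the first three positions received payload bits; the last position was never written, so its bit is just leftover cover data, which is why the markers are needed to find where the payload ends.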

Next, we have to convert that list of bits back into bytes so we can reconstruct the actual secret file. First we create an empty bytearray() to store the reconstructed bytes, then a for loop runs from 0 to the length of extracted_bits in steps of 8, so that each step covers exactly one byte. Inside the loop, an if condition makes sure there are at least 8 bits left to form a full byte. Then 'extracted_bits[i:i+8]' slices out exactly 8 bits for this byte: for example, if i = 0, byte_bits might be [1,0,1,1,0,0,1,0], and if i = 8, it might be [0,1,1,0,1,1,0,1]. This continues until all the bits have been grouped.

Next, byte_value starts at zero, and we build an integer from the bits. A nested loop takes the position j and the actual bit (0 or 1) from enumerate(byte_bits). The line 'byte_value |= bit << (7-j)' shifts the bit to its position (the first bit becomes the most significant) and ORs it into byte_value; many people don't know '|=', but it sets that bit without clearing the bits already set. Finally, we append byte_value to extracted_bytes.
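Put together, the byte reconstruction described above looks like this sketch (bits_to_bytes is a hypothetical name; the loop body follows the steps in the text):

```python
def bits_to_bytes(extracted_bits: list) -> bytes:
    """Group bits into bytes, most significant bit first; drop an incomplete tail."""
    extracted_bytes = bytearray()
    for i in range(0, len(extracted_bits), 8):
        if i + 8 <= len(extracted_bits):      # at least 8 bits left for a full byte
            byte_bits = extracted_bits[i:i + 8]
            byte_value = 0
            for j, bit in enumerate(byte_bits):
                byte_value |= bit << (7 - j)  # set bit (7 - j) without clearing the others
            extracted_bytes.append(byte_value)
    return bytes(extracted_bytes)


print(bits_to_bytes([1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]))  # b'\xb2m'
```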

After getting the bytes, we have to find the start and end markers to make sure the payload is present and not corrupted. The markers are hardcoded and have the same fixed values as in the embed function. Inside a try/except block, we first look for the start marker in extracted_bytes using find(), which returns -1 when there is no match. If the start marker isn't found, we raise an error telling the user that the file may not contain hidden data. We do the same for the end marker: if the start marker was found but the end marker wasn't, the hidden data is corrupted.
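Here is a sketch of the marker search. The marker values below are placeholders (the tool's real markers aren't shown); they must match whatever was used when embedding. Because bytes.find() returns -1 rather than raising, this version checks the return value directly; unwrap_payload is a hypothetical name.

```python
# Hypothetical marker values; they must match the ones used when embedding
START_MARKER = b"##SB_START##"
END_MARKER = b"##SB_END##"


def unwrap_payload(extracted_bytes: bytes) -> bytes:
    """Return the encrypted bytes between the start and end markers."""
    start = extracted_bytes.find(START_MARKER)
    if start == -1:
        raise ValueError("Start marker not found: the file may not contain hidden data, or the key is wrong")
    body_start = start + len(START_MARKER)
    end = extracted_bytes.find(END_MARKER, body_start)  # search only after the start marker
    if end == -1:
        raise ValueError("End marker not found: the hidden data may be corrupted")
    return extracted_bytes[body_start:end]


noise = b"\x13\xa7"  # leftover cover LSBs after the payload
print(unwrap_payload(START_MARKER + b"ciphertext" + END_MARKER + noise))  # b'ciphertext'
```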

Once both markers are found, we can slice the actual payload out of extracted_bytes. All we need now is to decrypt the encrypted bytes using the same key, and then use the resulting bytes to reconstruct the original secret file. The extracted file has no extension to tell the OS whether it is a PDF, an image or an audio file, but its bytes still carry the original format's signature and metadata, so the type can be identified and the file renamed by hand. Most people won't bother, so I created a customized script, 'detect_file_type', which matches the file's contents against hardcoded file types and adds the original extension to the file name. Finally, we save those bytes to a new file with the original extension using Python's built-in open() in write-binary ('wb') mode, and return the recovered secret file. It's done: we have successfully embedded a secret file in an audio file and extracted it from the stego audio.
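The tool's detect_file_type script isn't shown here, so the following is only a sketch of the general idea, using a hypothetical guess_extension() that checks a few well-known file signatures (magic bytes). The recovered bytes are then saved with Python's built-in open() in 'wb' mode; save_recovered and the default file name are also hypothetical.

```python
# A few well-known file signatures (magic bytes); extend as needed
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF8", ".gif"),
    (b"PK\x03\x04", ".zip"),  # also the container for .docx, .xlsx, .apk, ...
]


def guess_extension(data: bytes) -> str:
    """Guess a file extension from the leading bytes of the recovered payload."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    return ".bin"  # unknown type


def save_recovered(data: bytes, base_name: str = "recovered") -> str:
    """Write the recovered bytes to disk with the guessed extension."""
    path = base_name + guess_extension(data)
    with open(path, "wb") as f:  # built-in open(), not wave.open()
        f.write(data)
    return path


print(guess_extension(b"%PDF-1.7\n..."))  # .pdf
```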

To use this tool, check my repository:

I named it ShadowBits because it hides secret data in the shadows. The tool also includes image steganography using the LSB method, better error handling, and the detect_file_type script for recovering the original extension. Clone the repository to use it.

Thanks.
