LSB for Image

A method to hide the data in pixel bits.

This article covers image steganography using the Least Significant Bit (LSB) method.

As you know, everything a computer handles, be it text, images, audio, or video, is ultimately binary: a stream of 0s and 1s. This isn't just a convenience; it's fundamental to how computers work at the electrical level.

When it comes to images or audio files, how does the machine determine the file type?

The machine doesn't understand data the way humans do. It simply follows protocols we have defined:

A PNG image or a WAV file starts with a binary header that tells the OS, software, or a tool, "Hey, I am an image" or "I am an audio file". Essentially, files carry their own identity.
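As a small illustration of this idea, the sketch below checks the first bytes of a file against two well-known signatures: PNG files begin with the 8 bytes 89 50 4E 47 0D 0A 1A 0A, and WAV files begin with "RIFF", a 4-byte size, and then "WAVE". The function name sniff_type is hypothetical and is not part of the tool built in this article.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG signature


def sniff_type(header: bytes) -> str:
    """Guess a file type from its first 12 bytes (hypothetical helper)."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    # WAV layout: 'RIFF' + 4-byte chunk size + 'WAVE'
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "unknown"


print(sniff_type(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"))  # png
print(sniff_type(b"RIFF\x24\x00\x00\x00WAVEfmt "))        # wav
print(sniff_type(b"hello world"))                          # unknown
```

In practice you would pass the result of reading the first 12 bytes of a file opened in 'rb' mode.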

Image Steganography

LSB, which stands for Least Significant Bit, is widely used for image steganography. This method changes the last few bits of each byte in a pixel to hide data. The more bits you use in a single byte, the more the image deteriorates, which increases the chance that a person can see the difference between the original and the stego image. Therefore, we will work with only the last bit of each byte. Images also come in many modes. If you haven't heard of them, simply put, a pixel can contain one or more color channels. For example:

RGB - Red, Green, Blue - 3 Bytes/ 24 bits
RGBA - Red, Green, Blue, Alpha (transparency) - 4 Bytes/ 32 bits
CMYK - Cyan, Magenta, Yellow, Black - 4 Bytes/ 32 bits

These are just examples of image modes; there are more. Every mode uses a specific number of bytes per pixel. A byte contains 8 bits, and we will embed our data in the last bit of each byte.

This is what a byte looks like. The binary you see in this byte represents the data that the computer reads. As mentioned, a byte contains 8 bits, and we will replace the highlighted bit (the last one) with a bit of our data. We will use RGB mode because it has 3 color channels per pixel, which gives us 3 bytes or 24 bits. We can replace the last bit of every byte in a single pixel to hide our data.

As shown in the image above, this is what each pixel of an image in RGB mode looks like. Each block represents the color value of its respective channel. The channels are R, G and B, which stand for Red, Green and Blue.

Each byte holds a value from 0 to 255. Changing the last bit of a channel from 11111111 to 11111110 only changes the color value from 255 to 254. To the naked eye, this creates a nearly imperceptible change in color while still allowing us to encode data inside the image.
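The bit replacement described here can be sketched with two bitwise operations. This is a minimal illustration; the helper names set_lsb and get_lsb are made up for this example.

```python
def set_lsb(value: int, bit: int) -> int:
    """Return value with its least significant bit replaced by bit (0 or 1)."""
    return (value & ~1) | bit  # clear the last bit, then write the new one


def get_lsb(value: int) -> int:
    """Return the least significant bit of value."""
    return value & 1


original = 0b11111111            # 255
modified = set_lsb(original, 0)  # hide a 0 in the last bit

print(f"{original:08b} -> {modified:08b}")  # 11111111 -> 11111110
print(original, "->", modified)             # 255 -> 254
print(get_lsb(modified))                    # 0
```

A channel value changes by at most 1, and only when its last bit differs from the bit being hidden.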

The LSB method works well for media files, where slightly changing a byte value creates only imperceptible changes. However, for text stored as ASCII, this approach is problematic because one character is one byte (8 bits). If a single bit changes, the character becomes a different one: flipping the last bit of 'A' (01000001) gives '@' (01000000).

LSB embedding also affects color intensity, even if only slightly. The more bits we replace per byte, the more storage capacity becomes available, and the larger the image, the more data it can hold. However, the more bits we replace, the more the image deteriorates.

As I mentioned before, we will use only the PNG format in RGB mode and apply the LSB method to hide any type of file.
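To make the capacity trade-off concrete, here is a rough calculation of how many bytes fit in an image for a given number of replaced bits per channel. The function name capacity_bytes and the 1920x1080 example size are assumptions for illustration, and the figure ignores any overhead the tool adds, such as markers and encryption padding.

```python
def capacity_bytes(width: int, height: int,
                   channels: int = 3, bits_per_channel: int = 1) -> int:
    """Raw LSB capacity of an image in whole bytes (hypothetical helper)."""
    total_bits = width * height * channels * bits_per_channel
    return total_bits // 8


# A 1920x1080 RGB image, using only the last bit of each channel
print(capacity_bytes(1920, 1080))                      # 777600
# Using the last 2 bits doubles capacity, at the cost of more visible change
print(capacity_bytes(1920, 1080, bits_per_channel=2))  # 1555200
```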

Why only PNG?

PNG uses lossless compression, meaning no pixel data is discarded during saving. JPEG/JPG, in contrast, uses lossy compression: saving alters pixel values to reduce file size, which would corrupt our hidden data.

PNG supports true color channels, which gives us full control over individual RGB values. That control is everything for bit-level manipulation.

PNG supports 8 bits per channel. In our case of RGB mode, this totals 24 bits per pixel, providing us more room to embed data.

Let's begin implementing the code.

Before starting, ensure you have:

Read our AES Encryption guide.
The aes.py file from the AES encryption page

Install dependencies first:

pip install pillow pycryptodome

We will need an image file as the cover file, a secret file to hide, and a key.

The key serves as a password for both encryption and pixel randomization, which is crucial for making our tool more secure. Without this key, extracting the hidden data from the image becomes impractical.

This workflow diagram shows how our tool operates.

from PIL import Image
from aes import encryption, decryption, to_seed
import random

def embed(cover_path, payload_path, key):

    img = Image.open(cover_path)
    mode = img.mode

    if mode != 'RGB':
        img = img.convert('RGB')

    pixels = list(img.getdata())

    with open(payload_path, 'rb') as f:
        payload = f.read()

    payload = encryption(payload, key)

We import the Pillow library, which we installed earlier. Pillow is a large imaging library that can create and manipulate images, but we specifically need its Image module. The "aes" module is the custom script we discussed earlier in the AES Encryption section. We also import the random module for randomization.

Next, we create the embed() function with 3 arguments: cover_path for the cover file, payload_path for the secret file, and key, as shown in the workflow.

First, we open the cover file using Image.open() and check the mode of the image with "mode = img.mode". The mode could be RGB, RGBA, CMYK, etc., so we add an if condition that converts any non-RGB image to RGB, as explained above.

Next, we list all the pixels of the image in RGB mode. If you are wondering what the pixels look like, here is a small sample from debugging:

[(114, 113, 113), (189, 188, 189), (255, 253, 254), (254, 254, 254), (254, 254, 254), (254, 254, 253), (254, 254, 254)]

Each tuple represents a pixel which has the value of R, G and B.

Now let's prepare our secret file. We open payload_path (the secret file) in 'rb' mode, meaning we read it as bytes. Next, we encrypt those bytes using the custom encryption() function. The imported aes module is my customized script from the AES Encryption guide.

Now you understand what "payload = encryption(payload, key)" does: it encrypts the secret file's bytes using AES encryption from the aes module. Let's move on to the next stage.

Continuing from "payload = encryption(payload, key)":

    payload = encryption(payload, key)

    # Markers placed before and after the encrypted payload
    starting = b'###START###'
    ending = b'###END###'

    # 4-byte big-endian length of the encrypted payload
    length = len(payload).to_bytes(4, 'big')
    data = starting + length + payload + ending

    # One '0'/'1' character per bit, 8 per byte
    bits = ''.join(f'{byte:08b}' for byte in data)

    max_bits = len(pixels) * 3  # 3 color channels per pixel
    if len(bits) > max_bits:
        raise ValueError('Payload too large to embed in cover image.')

    # Key-derived shuffle of pixel positions
    seed = to_seed(key)
    prng = random.Random(seed)
    indexes = list(range(len(pixels)))
    prng.shuffle(indexes)
    new_pixels = [None] * len(pixels)

The starting and ending variables are markers that are attached before and after the encrypted payload. The length field plays an important role: it embeds the total length of the payload, telling the extractor how long the payload is. The to_bytes() method converts the length into a 4-byte binary representation, and the "big" argument means big-endian byte order (most significant byte first).

Now we have data, which is the combination of all the pieces we prepared to embed in the image. We convert this entire data into a single string of bits (i.e., 0s and 1s), with 8 bits per byte.
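To see what this framing produces, here is a small standalone sketch using the same markers and the same 4-byte big-endian length. The function names build_frame and to_bits are made up for this example, and the payload b"hi" stands in for the encrypted bytes.

```python
START = b"###START###"
END = b"###END###"


def build_frame(payload: bytes) -> bytes:
    """Wrap payload as START + 4-byte big-endian length + payload + END."""
    length = len(payload).to_bytes(4, "big")
    return START + length + payload + END


def to_bits(data: bytes) -> str:
    """Convert bytes to a string of '0'/'1' characters, 8 per byte."""
    return "".join(f"{byte:08b}" for byte in data)


frame = build_frame(b"hi")
print(frame)                # b'###START###\x00\x00\x00\x02hi###END###'
print(len(frame))           # 26 bytes: 11 + 4 + 2 + 9
print(len(to_bits(frame)))  # 208 bits
print(to_bits(b"A"))        # 01000001
```

The markers and length field add a fixed 24 bytes on top of the payload, which is worth remembering when estimating how much a cover image can hold.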

Next, we add an if condition to check if the payload can be embedded in the cover image.

Here comes the most important part of this tool: randomization. What are we doing here? First, we get a seed from the to_seed() function as described earlier, then pass that seed to random.Random() to create a pseudo-random number generator (PRNG) object. If the same key is used, the same seed generates the same random sequence. This ensures a consistent pixel order for both embedding and extraction.

Next, we create a list of pixel indexes: not pixel values, pixel indexes. Then we shuffle these indexes. Why shuffle them? To prevent predictable embedding. Shuffling spreads the data across pixel positions in a key-dependent order, which hides patterns and makes extraction much harder without the key. Then we create new_pixels, a list with the same length as the original pixel list. It starts out filled with None placeholders; we will fill it in, and it will eventually become the new image data (the stego image).

We have reached the last section of our tool, where we replace the last bit of every byte and embed our bits one by one.
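The key property here is that the shuffle is reproducible. The sketch below demonstrates it in isolation. Note that key_to_seed is a hypothetical stand-in: the real to_seed() lives in the aes module and may derive the seed differently. Hashing the key with SHA-256 is simply an assumption made for this example.

```python
import hashlib
import random


def key_to_seed(key: str) -> int:
    """Hypothetical stand-in for to_seed(): hash the key into an integer."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shuffled_indexes(count: int, key: str) -> list:
    """Return the indexes 0..count-1 in a key-dependent order."""
    prng = random.Random(key_to_seed(key))  # private PRNG, global state untouched
    indexes = list(range(count))
    prng.shuffle(indexes)
    return indexes


first = shuffled_indexes(10, "secret")
second = shuffled_indexes(10, "secret")

print(first == second)                   # True: same key, same order
print(sorted(first) == list(range(10)))  # True: every index appears exactly once
```

A different key will almost certainly give a different order, which is why the extractor reads garbage without the right key.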

    bit_idx = 0

    for i in range(len(indexes)):
        pixel_idx = indexes[i]
        r, g, b = pixels[pixel_idx]

        # Replace the LSB of each channel while bits remain
        if bit_idx < len(bits):
            r = (r & ~1) | int(bits[bit_idx])
            bit_idx += 1
        if bit_idx < len(bits):
            g = (g & ~1) | int(bits[bit_idx])
            bit_idx += 1
        if bit_idx < len(bits):
            b = (b & ~1) | int(bits[bit_idx])
            bit_idx += 1

        new_pixels[pixel_idx] = (r, g, b)

    img.putdata(new_pixels)
    out_path = "steg_file.png"
    img.save(out_path)  # Image.save() returns None, so return the path instead

    return out_path

The for loop walks through the list of shuffled pixel indexes, which were pseudo-randomly generated from the key the user provided. pixel_idx takes the actual pixel position from the shuffled order rather than sequentially, which makes the embedding order unpredictable.

We extract the RGB values into r, g and b from the original pixels, then start the actual embedding. bit_idx starts at 0, and once it reaches len(bits), the three if conditions stop matching, so the remaining pixels are copied unchanged. For the red channel, (r & ~1) clears the least significant bit (LSB) of r. Then | int(bits[bit_idx]) sets the LSB to the current bit of our secret data. Finally, bit_idx += 1 moves to the next bit. This process repeats for the green and blue channels until bit_idx equals len(bits).

new_pixels[pixel_idx] = (r, g, b)

This final line in the loop writes the newly modified RGB tuple to the new_pixels list at the same pixel position (pixel_idx). After the loop, we put the new pixel data into the image and save it. That's it for embedding.

Now let's move on to the extraction process:
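To watch the embedding loop work on real numbers, here is a standalone sketch that runs on a plain list of tuples, without Pillow. It is a compact variation rather than the exact code above: it copies the pixel list up front and stops once all bits are written. The fixed order list stands in for the key-derived shuffle.

```python
def embed_bits(pixels, bits, indexes):
    """Write bits into channel LSBs, visiting pixels in the given order."""
    new_pixels = list(pixels)  # untouched pixels are kept as they are
    bit_idx = 0
    for pixel_idx in indexes:
        if bit_idx >= len(bits):
            break  # nothing left to embed
        channels = list(pixels[pixel_idx])
        for c in range(3):  # R, G, B
            if bit_idx < len(bits):
                channels[c] = (channels[c] & ~1) | int(bits[bit_idx])
                bit_idx += 1
        new_pixels[pixel_idx] = tuple(channels)
    return new_pixels


pixels = [(114, 113, 113), (189, 188, 189), (255, 253, 254)]
order = [2, 0, 1]  # stands in for the shuffled indexes
stego = embed_bits(pixels, "01101", order)
print(stego)  # [(114, 113, 113), (189, 188, 189), (254, 253, 255)]
```

Pixel 2 receives the first three bits and pixel 0 the last two. Pixel 0 looks unchanged only because its last bits already matched the data.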

def extract(stego_path, key):

        img = Image.open(stego_path)
        pixels = list(img.getdata())
        
        seed = to_seed(key)
        prng = random.Random(seed)
        indexes = list(range(len(pixels)))
        prng.shuffle(indexes)

        bits = ''
        for i in range(len(pixels)):
            pixel_idx = indexes[i]  
            r, g, b = pixels[pixel_idx]
            bits += str(r & 1)
            bits += str(g & 1)
            bits += str(b & 1)

Import the same modules if you are creating a separate file for extraction, or just add this new function after the embed function.

As you can see, we are following the workflow. The extract function takes 2 arguments: stego_path for the stego file and the key used to embed the secret file. We open the image and list its pixels, then derive the seed from our key. We create the pseudo-random number generator object; since the key is the same, the generator produces the same sequence. The shuffling process is the same as in embedding, but the resulting order depends on the key. If the key is even slightly different, the pixels are visited in a different order, the extracted bits won't match what was embedded, and extraction fails with an error. If the correct key is used, the order matches the embedding process, so the bits are read back in the same order and the secret file can be reconstructed.

Let's discuss this loop. The bits variable starts empty. The loop runs over range(len(pixels)); in each iteration we take the next shuffled index and read the values of r, g and b from that pixel of the stego image. Then we collect the last bit of each of the r, g and b channels and append it to bits, which eventually holds the full string of bits. Easy to extract, right?
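The reading side can be tried on a tiny hand-made example as well. In this sketch the pixel values and the visiting order are made up for illustration. Collecting the pieces in a list and joining once is a small variation that avoids repeated string concatenation.

```python
def read_bits(pixels, indexes):
    """Collect the LSB of every channel, visiting pixels in the given order."""
    parts = []
    for pixel_idx in indexes:
        r, g, b = pixels[pixel_idx]
        parts.extend((str(r & 1), str(g & 1), str(b & 1)))
    return "".join(parts)


stego = [(114, 113, 113), (189, 188, 189), (254, 253, 255)]
order = [2, 0, 1]  # must match the order used when embedding

print(read_bits(stego, order))             # 011011101
print(read_bits(stego, sorted(order)))     # 011101011 (wrong order, wrong bits)
```

Every pixel contributes three bits, whether or not it carried hidden data, so the tail of the string is noise that the markers and the length field let us ignore.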

Now we have the payload (embedded secret file) in bits form. We need to convert it to bytes and then extract the actual payload. Wait, what do I mean by actual payload? Remember the markers we added? We need to find them in the extracted bits first, then separate them from the payload to recover our original secret file.

        if len(bits) % 8 != 0:
            bits = bits[:-(len(bits) % 8)]  # Remove incomplete byte
        
        bytes_list = bytes([int(bits[i:i+8], 2) for i in range(0, len(bits), 8)])
        
        starting = b'###START###'
        ending = b'###END###'

        start_point = bytes_list.find(starting)
        if start_point == -1:
            raise ValueError("Starting point of the payload not found in image or key is incorrect")
        
        data_start = start_point + len(starting)

        if len(bytes_list) < data_start + 4:
            raise ValueError("Not enough data to extract length")
        
        length = int.from_bytes(bytes_list[data_start:data_start + 4], 'big')

        payload_start = data_start + 4
        payload_end = payload_start + length    
    
        if len(bytes_list) < payload_end:
            raise ValueError(f"Not enough data to extract payload of length {length}")
        
        payload = bytes_list[payload_start:payload_end]

        ### verify the ending 
        end_start = payload_end
        end_end = end_start + len(ending)

        if len(bytes_list) < end_end:
            raise ValueError(f"Not enough data to verify the ending.")
        
        extract_end = bytes_list[end_start:end_end]
        if extract_end != ending:
            raise ValueError(f"Ending not found or corrupted")
        
        ### decryption
        payload = decryption(payload, key)
            
        extension, mime_type = detect_file_type(payload)
        out_path = f"extracted_file.{extension}"

        with open(out_path, 'wb') as f:
            f.write(payload)

        return out_path

Now we actually need some error handling here so that we can get the actual data of our secret file.

First, we create bytes from bits. Since each byte contains 8 bits, we split our string of bits into 8-bit pieces; if there are leftover bits that don't form a complete byte, we remove them. A loop then converts each 8-bit piece into a byte, giving us a bytes object. Next, we define the same markers in our extraction function, which help us find where our payload starts. We look for the start marker first; if it isn't there, either the key is incorrect or there was no payload to begin with. If we find it, we set a variable pointing to where the data starts, right after the bytes of the start marker. Remember, we embedded a 4-byte length value that indicates how long the encrypted payload is. We recover the length by converting those 4 bytes back into an integer.
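The parsing steps can be exercised without an image at all. The following is a condensed sketch of the same logic, assuming the same markers and 4-byte big-endian length; the names bits_to_bytes and parse_frame are made up for this example, and decryption is left out.

```python
START = b"###START###"
END = b"###END###"


def bits_to_bytes(bits: str) -> bytes:
    """Convert a '0'/'1' string to bytes, dropping an incomplete last byte."""
    usable = len(bits) - (len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def parse_frame(data: bytes) -> bytes:
    """Return the payload between the markers, or raise ValueError."""
    start = data.find(START)
    if start == -1:
        raise ValueError("start marker not found (wrong key or no payload)")
    length_at = start + len(START)
    length = int.from_bytes(data[length_at:length_at + 4], "big")
    payload_at = length_at + 4
    payload = data[payload_at:payload_at + length]
    if len(payload) != length:
        raise ValueError("not enough data for the declared length")
    end_at = payload_at + length
    if data[end_at:end_at + len(END)] != END:
        raise ValueError("end marker missing or corrupted")
    return payload


# Two trailing bytes imitate the noise read from pixels that held no data
frame = START + (2).to_bytes(4, "big") + b"hi" + END + b"\x5a\xc3"
print(parse_frame(frame))           # b'hi'
print(bits_to_bytes("0100000101"))  # b'A' (the 2 leftover bits are dropped)
```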

Now that we know the length, we set the payload start and end variables, which mark the slice holding the encrypted bytes of the secret file. We could decrypt right here and get the original file, but we should also verify the end marker. If the end marker is not where we expect it, the data is corrupted. After verifying the end marker, we decrypt the payload using the decryption() function, which I explained earlier, and write the decrypted bytes to a file. That's it. We have successfully embedded the secret file in an image and extracted it from the stego image.

Note: detect_file_type() is a function I created to detect the file type. We can recover the original secret file, but without determining its extension, it will always appear as an unknown file to the OS. However, the file still carries metadata from its original format, so I read that metadata to detect the extension and return the original file with its proper extension.

To use this tool, check my repository:

GitHub - kaizoku73/ShadowBits: A powerful steganography tool that allows you to hide files within images and audio files using LSB (Least Significant Bit) techniques.

I named it ShadowBits because we hide secret data in the shadows. The tool also includes audio steganography using the LSB method, and I have added better error handling there. Clone the repository to use it.

Thanks for reading.

