Read a file bytewise and find information

Apr 12, 2025 - 05:22

I am a hobby PHP dev and have the following problem:

I have a RAR file whose header is damaged, so it cannot be fully extracted. The contents are stored uncompressed and can be read with, e.g., a hex editor. The archive contains JPEG files, but because of the damaged header, some of them cannot be extracted in WinRAR. I have tried repairing the archive, but a good portion of it remains broken.

I want to read the file and look for the bytes that mark the start and end of a JPEG (from what I know, they are FFD8FF for the start and FFD9FF for the end). Using a hex editor, I have managed to find some of those bytes, and extracting an image into a file and viewing it does work. Since the file is 500 MB, I want to do this automatically, and since I am PHP-friendly, I would like to do it there :-)
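A side note on the markers: in the JPEG format, the start-of-image marker is the two bytes FF D8 (normally followed by the FF that opens the next marker), and the end-of-image marker is the two bytes FF D9; a third FF after it is not guaranteed. The sketch below shows how such byte patterns can be written as PHP strings and located with `strpos`, with no conversion to hex needed. The sample data and variable names are made up for illustration.

```php
<?php
// "\xNN" escapes are only interpreted inside double-quoted strings.
$start = "\xFF\xD8\xFF"; // start of image (FF D8) plus the FF of the next marker
$end   = "\xFF\xD9";     // end of image is two bytes

// Made-up sample standing in for a slice of the archive.
$data = "junk" . "\xFF\xD8\xFF\xE0payload\xFF\xD9" . "more junk";

$from = strpos($data, $start);        // offset of the first start marker
$to   = strpos($data, $end, $from);   // offset of the next end marker after it
$jpeg = substr($data, $from, $to - $from + strlen($end));

// bin2hex is only needed for printing bytes in readable form.
echo bin2hex(substr($jpeg, 0, 3)), "\n"; // ffd8ff
```

Both `strpos` calls return `false` when nothing is found, so real code should compare with `=== false` before using the offsets.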

I know how to read a file bytewise (fread). What I am having difficulty with is parsing the file correctly in hex so that I can identify the starts and ends. What I have in mind is something like this (in pseudocode):

while( READ FILE UNTIL EOF ){
    if( CURRENTBYTES == FFD8FF ){
        $jpeg_file = READ FILE UNTIL CURRENTBYTES == FFD9FF
        fwrite($jpeg_file, "xyz.jpg");
        // return to while, looking for next FFD8FF
    }
}

Could someone give me a hint on how the reading and identifying would best be done? The main question is: how can I read a file up to a start marker, save what follows, and then look for the next start marker? Efficiency, security, and code beauty are no concern; I just want all the pics :-)

Many thanks for any help.
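One possible way to turn the pseudocode into PHP is sketched below. It is not a definitive implementation: the function name `extract_jpegs`, its parameters, and the `img_0001.jpg` naming scheme are made up for illustration, and it assumes the output directory already exists. The file is read in chunks so that 500 MB never has to sit in memory at once, and a few trailing bytes are carried over between chunks in case a marker is split across a chunk boundary. It searches for FF D9 as the end marker. One known limitation: JPEGs with an embedded thumbnail contain a nested start/end pair, so such an image would be cut at the thumbnail's end marker.

```php
<?php
// Hypothetical helper: copies every byte range from FF D8 FF up to the
// next FF D9 into its own file and returns the number of images started.
function extract_jpegs(string $inPath, string $outDir, int $chunkSize = 1048576): int
{
    $soi = "\xFF\xD8\xFF"; // start-of-image marker plus the FF of the next marker
    $eoi = "\xFF\xD9";     // end-of-image marker

    $in = fopen($inPath, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inPath");
    }

    $buffer = '';  // bytes read but not yet consumed
    $out = null;   // handle of the image being written, null between images
    $count = 0;

    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $buffer .= $chunk;

        while (true) {
            if ($out === null) {
                // Between images: look for the next start marker.
                $pos = strpos($buffer, $soi);
                if ($pos === false) {
                    // Keep the last 2 bytes: a marker may straddle two chunks.
                    $buffer = (string) substr($buffer, -2);
                    break;
                }
                $count++;
                $out = fopen(sprintf('%s/img_%04d.jpg', $outDir, $count), 'wb');
                if ($out === false) {
                    throw new RuntimeException("Cannot write to $outDir");
                }
                $buffer = substr($buffer, $pos);
            } else {
                // Inside an image: look for the end marker.
                $pos = strpos($buffer, $eoi);
                if ($pos === false) {
                    // Flush all but the last byte, which might be the FF of FF D9.
                    fwrite($out, (string) substr($buffer, 0, -1));
                    $buffer = (string) substr($buffer, -1);
                    break;
                }
                fwrite($out, substr($buffer, 0, $pos + 2));
                fclose($out);
                $out = null;
                $buffer = substr($buffer, $pos + 2);
            }
        }
    }

    // An image without an end marker is written out truncated.
    if ($out !== null) {
        fwrite($out, $buffer);
        fclose($out);
    }
    fclose($in);
    return $count;
}
```

A call could look like `extract_jpegs('broken.rar', 'out');`, where both names are placeholders. Files that turn out to be only thumbnails or fragments can then be sorted out by size afterwards.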