sed - Extract data between two matched patterns in a binary file containing non-ASCII characters using bash


I am trying to extract a JPEG image from a binary text file. I want to extract all the data between 0xFF 0xD8 (Start of Image) and 0xFF 0xD9 (End of Image), inclusive. Earlier, I ran the following command to get the desired image.jpg from a single-paragraph file, received.txt:

sed 's/.*\xff\xd8/\xff\xd8/; s/\xff\xd9.*/\xff\xd9/' received.txt > image.jpg 

But when I tried to run the same operation on a different file, it didn't work. I tried using

sed -n '/\xff\xd8/,/\xff\xd9/p' received.txt > temp.txt
sed 's/.*\xff\xd8/\xff\xd8/; s/\xff\xd9.*/\xff\xd9/' temp.txt > image.jpg

to remove any lines before or after the matched lines, but had no success.

Although the file is large, I have pasted the hex dump of the relevant portion below:

0a 55 57 5d 50 cf ff d8 ff fe ff ff ff d9 df 47 fe e7 c9 3b e9 9b 6b 55 c4 57 9b 98 73 fd 15 f7 77 7e f7 95 dd 55 f7 55 05 cc 55 97 55 dd 62 d1 1f 51 ef f1 ef fb e9 bf ed 5f bf f2 9d 75 af fe 6b fb bf 8f f7 f7 7e ff d3 bf 8e d5 5f df 57 75 fe 77 7b bf d7 af df 5d fb 0a 47 de d5 ff c1 23 9b 20 08 20 65 3c 06 83 11 05 30 50 a0 20 55 20 84 41 04 c2 59 50 89 64 44 44 10 05 20 87 28 1d a9 

The hex dump of the desired output in this case is:

ff d8 ff fe ff ff ff d9 

Update

While trying to resolve the issue, I found that the sed command removes the characters before or after the matched pattern only up to a non-ASCII character (0x80 - 0xFF); it does not go beyond that non-ASCII character. For example, if you try:

echo 55 57 5d 50 cf 50 65 7f ff d8 ff fe ff ff ff d9 | xxd -r -p | sed 's/.*\xff\xd8/\xff\xd8/' > output 

The hex dump of the output can be viewed with:

xxd output 

which is:

55 57 5d 50 cf ff d8 ff fe ff ff ff d9 

As can be seen, the characters between the non-ASCII character (0xcf) and the matched pattern are removed, but the characters before the non-ASCII character are not.
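This behaviour is consistent with how GNU sed handles input in a multibyte (for example UTF-8) locale: `.` matches only valid characters, so `.*` cannot cross a byte such as 0xcf that does not begin a valid sequence. A minimal sketch, assuming GNU sed and assuming the locale is indeed the cause, forces the C locale so that every byte counts as one character. It uses printf and od instead of xxd only to stay with standard tools; the variable name `hex` is illustrative.

```shell
# Assumption: GNU sed (the \xHH escapes are a GNU extension).
# Same 16 bytes as in the example above, written with octal escapes:
# 55 57 5d 50 cf 50 65 7f ff d8 ff fe ff ff ff d9
hex=$(printf 'UW]P\317Pe\177\377\330\377\376\377\377\377\331' |
      LC_ALL=C sed 's/.*\xff\xd8/\xff\xd8/' |
      od -An -v -tx1 | tr -d ' \n')

# Hex dump of what sed produced, as one unbroken string.
echo "$hex"   # ffd8fffeffffffd9
```

Note that `.*` is greedy, so on a line containing several 0xFF 0xD8 pairs this removes everything up to the last one, and sed still processes the input line by line, so a 0x0a byte inside the image splits it.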


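Alternative solution (not perfect)

I used the following commands to work around the problem. First, insert a newline (0x0a) before 0xFF 0xD8 and after 0xFF 0xD9, so that the image starts and ends on a line boundary: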

sed 's/\xff\xd8/\x0a\xff\xd8/; s/\xff\xd9/\xff\xd9\x0a/' received.txt > temp.txt 

Then run the following command (which works only if there is no newline character (0x0a) anywhere between 0xFF 0xD8 and 0xFF 0xD9):

sed -n '/\xff\xd8/{/\xff\xd9/p}' temp.txt > image.jpg 

But if the image.jpg file is empty (after execution of the above command), then run the following command:

sed -n '/\xff\xd8/,/\xff\xd9/p' temp.txt > image.jpg 

These commands do the desired job, except that they put a 0x0a at the end of the image.jpg file (i.e., after 0xFF 0xD9). In my case, this did not create an issue, as JPEG decoders generally ignore any data after the 0xFF 0xD9 marker.
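The two-step fallback can be written as a single script using the shell's `[ -s file ]` test, which is true when the file exists and is not empty. This is a minimal sketch, assuming GNU sed and the C locale; the temporary directory and the short stand-in for received.txt are illustrative only.

```shell
# Assumptions: GNU sed; LC_ALL=C so that every byte is one character.
export LC_ALL=C
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Illustrative stand-in for received.txt:
# 0a 55 57 5d 50 cf | ff d8 ff fe ff ff ff d9 | df 47 0a 47
printf '\012UW]P\317\377\330\377\376\377\377\377\331\337G\012G' > received.txt

# Put the image on its own line(s).
sed 's/\xff\xd8/\x0a\xff\xd8/; s/\xff\xd9/\xff\xd9\x0a/' received.txt > temp.txt

# Case 1: the start and end markers are on the same line.
sed -n '/\xff\xd8/{/\xff\xd9/p}' temp.txt > image.jpg

# Case 2: nothing was printed, so the image spans several lines.
if [ ! -s image.jpg ]; then
    sed -n '/\xff\xd8/,/\xff\xd9/p' temp.txt > image.jpg
fi

hex=$(od -An -v -tx1 image.jpg | tr -d ' \n')
echo "$hex"   # ffd8fffeffffffd90a (note the trailing 0a)
```

The same-line case is tried first because a sed address range that starts on a line never ends on that same line; it keeps printing until a later line matches the end pattern.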

I was stuck on the implementation of the 'if the image file is empty' condition when @chaos came up with a perfect solution. So, I am following that solution. Thanks a lot, @chaos!

Please follow the link below for chaos's solution:
https://unix.stackexchange.com/questions/231289/extract-data-between-two-matched-patterns-in-a-binary-file


Notes:

Here is how you can get the actual data back from the hex dump, so that you can pipe it to the sed command:

echo 0a 55 57 5d 50 cf ff d8 ff fe ff ff ff d9 df 47 fe e7 c9 3b e9 9b 6b 55 c4 57 9b 98 73 fd 15 f7 77 7e f7 95 dd 55 f7 55 05 cc 55 97 55 dd 62 d1 1f 51 ef f1 ef fb e9 bf ed 5f bf f2 9d 75 af fe 6b fb bf 8f f7 f7 7e ff d3 bf 8e d5 5f df 57 75 fe 77 7b bf d7 af df 5d fb 0a 47 de d5 ff c1 23 9b 20 08 20 65 3c 06 83 11 05 30 50 a0 20 55 20 84 41 04 c2 59 50 89 64 44 44 10 05 20 87 28 1d a9 | xxd -r -p 

And you can see the hex dump of a file with:

xxd file.txt 

