file/type list/libmagic overhaul
Christos Zoulas
christos at zoulas.com
Wed Aug 20 11:02:53 EEST 2008
On Aug 19, 9:04am, filemaillist at adaptivetime.com (Gravis) wrote:
-- Subject: Re: file/type list/libmagic overhaul
| > That is a good idea, although you'll need a lot of manual fixes because
| > people usually give weak magic descriptions that match ~everything. I
| > would also suggest that people submit sample files so that we can write
| > unit-tests.
|
| I agree that people will give weak descriptions, which is why I think an
| online "bombardment test" script would be a good idea: basically, a
| repository of files of known formats that the submitted magic description
| would be tested against. Of course, submitters will need to upload a file
| they are looking to identify. Assuming the description passes a basic test
| showing that it doesn't misidentify files, it would then be tested on a
| larger, more extensive collection of files. Either way, there needs to be
| a way of getting formats added to the list.
Right. This is why we need to start collecting samples, so that we can
perform such tests.
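
As a rough illustration of what such a test could look like (a minimal sketch only, not anything that exists in file or libmagic): the Signature tuple, the bombard() function and the tiny in-memory corpus below are all made up for the example. A real harness would run actual magic entries against real sample files.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified stand-in for a magic entry:
# (offset, bytes expected at that offset).
Signature = Tuple[int, bytes]


def matches(sig: Signature, data: bytes) -> bool:
    """Return True if the expected bytes appear at the given offset."""
    offset, expected = sig
    return data[offset:offset + len(expected)] == expected


def bombard(sig: Signature, claimed: str,
            corpus: Dict[str, List[bytes]]) -> Dict[str, int]:
    """Count hits, misses and false positives against labelled samples."""
    result = {"hits": 0, "misses": 0, "false_positives": 0}
    for label, samples in corpus.items():
        for data in samples:
            hit = matches(sig, data)
            if label == claimed:
                result["hits" if hit else "misses"] += 1
            elif hit:
                # The description matched a file of some other known format.
                result["false_positives"] += 1
    return result


# Toy corpus keyed by known format; real samples would be files on disk.
CORPUS = {
    "png": [b"\x89PNG\r\n\x1a\n" + b"\x00" * 8],
    "gif": [b"GIF89a" + b"\x00" * 10],
    "text": [b"APNG notes\n", b"hello\n"],
}

strong = (0, b"\x89PNG\r\n\x1a\n")  # full 8-byte PNG signature
weak = (1, b"PNG")                  # deliberately weak description

print(bombard(strong, "png", CORPUS))
print(bombard(weak, "png", CORPUS))
```

The weak description also matches the text sample that happens to have "PNG" at offset 1, which is the kind of misidentification a larger sample collection is meant to catch.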
| > Well, file and the magic format specification have a POSIX definition. Most
| > commercial and non-commercial OSes use this implementation of file and I
| > doubt that they would appreciate a change in the magic format.
|
| Hmm... I didn't realize it was POSIX. Do you have a copy of the spec I
| could have? From what I can tell, the only way to get a copy from IEEE
| is to be a member of IEEE, which I am not. If speed truly isn't an issue
| here, I guess changing the file format is a moot point.
Most POSIX material is free now (except the compliance tests):
http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
| Anyway, my main concern is to get more formats added, because not having
| basic file magic descriptions like one for PNG is just ridiculous.
I don't know why you say this; here's the current entry for PNG from
the images magic file:
# PNG [Portable Network Graphics, or "PNG's Not GIF"] images
# (Greg Roelofs, newt at uchicago.edu)
# (Albert Cahalan, acahalan at cs.uml.edu)
#
# 137 P N G \r \n ^Z \n [4-byte length] H E A D [HEAD data] [HEAD crc] ...
#
0 string \x89PNG\x0d\x0a\x1a\x0a PNG image
!:mime image/png
>>16 belong x \b, %ld x
>>20 belong x %ld,
>>24 byte x %d-bit
>>25 byte 0 grayscale,
>>25 byte 2 \b/color RGB,
>>25 byte 3 colormap,
>>25 byte 4 gray+alpha,
>>25 byte 6 \b/color RGBA,
#>>26 byte 0 deflate/32K,
>>28 byte 0 non-interlaced
>>28 byte 1 interlaced
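
To make the offsets in that entry concrete, here is a small sketch that reads the same fields (width at 16, height at 20, bit depth at 24, color type at 25, interlace flag at 28) from the start of a PNG file. It is an illustration only, not how file evaluates magic entries; describe_png() is a hypothetical helper, and its output is a simplified string rather than a byte-for-byte copy of what file prints.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type values tested at offset 25 in the magic entry.
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "colormap",
               4: "gray+alpha", 6: "RGBA"}
# Interlace values tested at offset 28.
INTERLACE = {0: "non-interlaced", 1: "interlaced"}


def describe_png(data: bytes) -> str:
    """Describe a PNG from its first 29 bytes, using the entry's offsets."""
    if len(data) < 29 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG header")
    # Two big-endian 32-bit values at offsets 16 and 20 ("belong").
    width, height = struct.unpack(">II", data[16:24])
    bit_depth = data[24]
    color = COLOR_TYPES.get(data[25], "unknown color type")
    interlace = INTERLACE.get(data[28], "unknown interlace")
    return f"PNG image, {width} x {height}, {bit_depth}-bit {color}, {interlace}"
```

It could be used as describe_png(open(path, "rb").read(29)), where path names any PNG file.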
christos