file/type list/libmagic overhaul

Christos Zoulas christos at zoulas.com
Wed Aug 20 11:02:53 EEST 2008


On Aug 19,  9:04am, filemaillist at adaptivetime.com (Gravis) wrote:
-- Subject: Re: file/type list/libmagic overhaul

| > That is a good idea, although you'll need a lot of manual fixes because
| > people usually give weak magic descriptions that match ~everything. I
| > would also suggest that people submit sample files so that we can write
| > unit-tests.
| 
| i agree that people will give weak descriptions, which is why i think an
| online "bombardment test" script would be a good idea.  basically, a
| repository of known-format files that the submitted magic description
| would be tested against.  of course they will need to upload a file they
| are looking to identify.  assuming they get past a basic test that it
| doesn't misidentify files, it would be tested on a larger, more extensive
| collection of files.  either way, there needs to be a way of
| getting formats added to the list.

Right. This is why we need to start collecting samples, so that we can
perform such tests.

| > Well, file and the magic format specification have a POSIX definition. Most
| > commercial and non-commercial OS's use this implementation of file and I
| > doubt that they would appreciate a change in the magic format.
| 
| hmm... i didn't realize it was POSIX.  do you have a copy of the spec i
| could have?  from what i can tell the only way to get a copy from IEEE
| is to be a member of IEEE, which i am not.  if speed truly isn't an issue
| here, i guess changing the file format is a moot point.

Most of the POSIX material is freely available now (except the compliance tests):

http://www.opengroup.org/onlinepubs/009695399/utilities/file.html

| anyway, my main concern is to get more formats added because not having
| basic file magic descriptions like one for PNG is just ridiculous.

I don't know why you say this; here's the current entry for PNG from
the images magic file:

# PNG [Portable Network Graphics, or "PNG's Not GIF"] images
# (Greg Roelofs, newt at uchicago.edu)
# (Albert Cahalan, acahalan at cs.uml.edu)
#
# 137 P N G \r \n ^Z \n [4-byte length] H E A D [HEAD data] [HEAD crc] ...
#
0       string          \x89PNG\x0d\x0a\x1a\x0a         PNG image
!:mime  image/png
>>16    belong          x               \b, %ld x
>>20    belong          x               %ld,
>>24    byte            x               %d-bit
>>25    byte            0               grayscale,
>>25    byte            2               \b/color RGB,
>>25    byte            3               colormap,
>>25    byte            4               gray+alpha,
>>25    byte            6               \b/color RGBA,
#>>26   byte            0               deflate/32K,
>>28    byte            0               non-interlaced
>>28    byte            1               interlaced
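
To make the offsets in that entry concrete, here is a rough Python sketch
that reads the same fields: width and height as big-endian 32-bit values at
offsets 16 and 20 (belong), then the bit depth, colour type and interlace
bytes at offsets 24, 25 and 28. These fields sit in the first chunk, which
the PNG specification names IHDR. The function name describe_png and the
synthetic 29-byte header are assumptions for illustration; this is not how
file itself is implemented, and the output only approximates what file
prints.

```python
import struct
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour-type byte at offset 25. Entries that use "\b" in the magic file
# attach directly to the preceding "%d-bit"; the others start with a space.
COLOR_TYPES = {
    0: " grayscale,",
    2: "/color RGB,",
    3: " colormap,",
    4: " gray+alpha,",
    6: "/color RGBA,",
}
INTERLACE = {0: " non-interlaced", 1: " interlaced"}


def describe_png(data: bytes) -> Optional[str]:
    """Return a file-like description, or None if the signature does not match."""
    if len(data) < 29 or not data.startswith(PNG_SIGNATURE):
        return None
    width, height = struct.unpack_from(">ii", data, 16)   # belong at 16 and 20
    depth, color = struct.unpack_from("BB", data, 24)     # byte at 24 and 25
    interlace = data[28]                                  # byte at 28
    return (f"PNG image, {width} x {height}, {depth}-bit"
            + COLOR_TYPES.get(color, "")
            + INTERLACE.get(interlace, ""))


# Synthetic header: signature, chunk length 13, chunk type, then the fields.
SAMPLE = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0))

if __name__ == "__main__":
    print(describe_png(SAMPLE))
```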


christos


