From file-libmagic at uukgoblin.net Wed Jan 7 13:29:03 2009
From: file-libmagic at uukgoblin.net (Robert G)
Date: Wed, 7 Jan 2009 11:29:03 +0000
Subject: [PATCH] fix memleak in libmagic in file 4.26
Message-ID: <20090107112903.GA32140@robert.goliasz@uk.clara.net>

libmagic is leaking memory if someone calls magic_buffer() (magic_file()
will probably cause it too) multiple times with one magic_set cookie.
This small patch should fix it.
--
Robert G
-------------- next part --------------
diff --git a/src/funcs.c b/src/funcs.c
index af98605..280c715 100644
--- a/src/funcs.c
+++ b/src/funcs.c
@@ -235,6 +235,7 @@ file_reset(struct magic_set *ms)
         file_error(ms, 0, "no magic files loaded");
         return -1;
     }
+    free(ms->o.buf);
     ms->o.buf = NULL;
     ms->haderr = 0;
     ms->error = -1;

From dnovotny at redhat.com Mon Jan 12 14:10:19 2009
From: dnovotny at redhat.com (Daniel Novotny)
Date: Mon, 12 Jan 2009 07:10:19 -0500 (EST)
Subject: BTRFS filesystem magic entry
In-Reply-To: <320116548.113081231762161779.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
Message-ID: <2090521046.113201231762219065.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>

hello,

I got this from Eric Sandeen:

# BTRFS
0x10040 string _BHRfS_M BTRFS Filesystem
>0x1012b string >\0 (label "%s",
>0x10090 lelong x sectorsize %d,
>0x10094 lelong x nodesize %d,
>0x10098 lelong x leafsize %d)

sending also in patch form

have a nice day,

Daniel Novotny
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-4.26-btrfs.patch
Type: text/x-patch
Size: 851 bytes
Desc: not available
URL:

From obbad at msn.com Mon Jan 12 20:01:55 2009
From: obbad at msn.com (M. obbad)
Date: Mon, 12 Jan 2009 18:01:55 +0000
Subject: Micorsoft 2007 documents ?
In-Reply-To: <2090521046.113201231762219065.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
References: <320116548.113081231762161779.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
 <2090521046.113201231762219065.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
Message-ID:

I would like to get the Microsoft 2007 libmagic numbers. Can someone send
me a pointer to the pre-release magic file?

Thank you,
-Marc

From christos at zoulas.com Tue Jan 27 19:28:48 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 27 Jan 2009 12:28:48 -0500
Subject: [PATCH] file-4.26: add wireless-regdb magic file format
In-Reply-To: <20090127165416.GJ19242@tesla> from "Luis R. Rodriguez" (Jan 27, 8:54am)
Message-ID: <20090127172848.5A6A55654E@rebar.astron.com>

On Jan 27, 8:54am, lrodriguez at atheros.com ("Luis R. Rodriguez") wrote:
-- Subject: [PATCH] file-4.26: add wireless-regdb magic file format

| Christos, this patch adds the wireless-regdb magic file format
| to the file Magdir. We use this binary file format in Linux for our
| wireless regulatory database parsed by CRDA. If there are any questions
| please feel free to poke us on the linux-wireless list.
|
| Do you have plans on releasing a file-4.27 any time soon?
|
| Thanks to Johannes for coming up with this.
|
| Luis
|
| --- /dev/null 2009-01-18 12:51:54.000000000 -0800
| +++ file-4.26/magic/Magdir/wireless-regdb 2009-01-27 08:40:43.000000000 -0800
| @@ -0,0 +1,6 @@
| +# wireless-regdb: file(1) magic for CRDA wireless-regdb file format
| +#
| +# CRDA Regulatory database file
| +# http://wireless.kernel.org/en/developers/Regulatory
| +0 belong 0x52474442
| +>4 belong 19 (Version 1)

Isn't that belong really:

0 string RGDB CRDA wireless regulatory database file
>4 belong 19 (Version 1)

???
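
As a rough illustration of why the two spellings are interchangeable here:
the big-endian value 0x52474442 is the byte sequence 'R' 'G' 'D' 'B'. The C
sketch below is not code from file(1); read_belong(), match_belong() and
match_string() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value, as a "belong" test would. */
static uint32_t read_belong(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical equivalent of "0 belong 0x52474442". */
static int match_belong(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 4 && read_belong(buf) == 0x52474442u;
}

/* Hypothetical equivalent of "0 string RGDB". */
static int match_string(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 4 && memcmp(buf, "RGDB", 4) == 0;
}
```

Both helpers accept or reject the same four leading bytes; the string form
mainly makes the rule easier to read.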
christos

From dnovotny at redhat.com Mon Feb 2 16:47:58 2009
From: dnovotny at redhat.com (Daniel Novotny)
Date: Mon, 2 Feb 2009 09:47:58 -0500 (EST)
Subject: [patch] JPC (JPEG-2000 Code Stream Bitmap) magic
In-Reply-To: <665844726.112021233586010877.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <1876370459.112041233586078657.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

hello,

this patch adds magic for JPEG-2000 Code Stream Bitmap files

regards,

Daniel Novotny
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-4.26-jpeg2000.patch
Type: text/x-patch
Size: 582 bytes
Desc: not available
URL:

From christos at zoulas.com Wed Feb 4 01:36:56 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 3 Feb 2009 18:36:56 -0500
Subject: file-5.00 is now available
Message-ID: <20090203233656.70EE35654E@rebar.astron.com>

From: ftp://ftp.astron.com:/pub/file/file-5.00.tar.gz

Thanks to everyone who tested, wrote code, or contributed fixes!

I bumped the version to 5.00 because there were a lot of changes in the
code that might affect how file behaves now compared to in the past
(I have not found any regressions, but there might be).

- add CDF file support (Microsoft Office <= 2007)
- add recursive magic, so we can handle ID3 files properly
- str{cat,cpy} -> strl{cpy,cat}
- --mime* flags now really work
- add --apple magic
- many encoding fixes
- read ~/.magic in addition to the default magic file not instead of, as
  documented in the man page
- many magic changes

christos

From colin at e-e.com Wed Feb 4 21:41:32 2009
From: colin at e-e.com (Colin Bartolome)
Date: Wed, 04 Feb 2009 11:41:32 -0800
Subject: file-5.00 is now available
In-Reply-To:
References:
Message-ID: <4989EF6C.3050800@e-e.com>

Are the magic rules that detect Office 2007 files not part of this
version? My Office 2007 files are still being reported as ZIP files.
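
For context, Office 2007 (OOXML) documents are ZIP containers, so a rule
keyed only on the ZIP signature matches them. A minimal C sketch of one
possible heuristic follows, assuming the first archive member is named
[Content_Types].xml, which is common but not guaranteed. It is not how
file 5.00 works, and is_zip() and looks_like_ooxml() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ZIP local file header: 4-byte signature "PK\003\004", member name
 * length at offset 26 (16-bit little-endian), member name at offset 30. */
static int is_zip(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 4 && memcmp(buf, "PK\003\004", 4) == 0;
}

/* Assumption: an OOXML file stores [Content_Types].xml as its first member. */
static int looks_like_ooxml(const unsigned char *buf, size_t len)
{
    static const char first[] = "[Content_Types].xml";
    size_t want = sizeof(first) - 1;
    size_t name_len;

    if (!is_zip(buf, len) || len < 30)
        return 0;
    name_len = (size_t)buf[26] | ((size_t)buf[27] << 8);
    if (name_len != want || len < 30 + want)
        return 0;
    return memcmp(buf + 30, first, want) == 0;
}
```

A plain ZIP archive passes is_zip() but fails looks_like_ooxml(), which is
the distinction the question above is asking for.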
From christos at zoulas.com Thu Feb 5 00:57:35 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 4 Feb 2009 17:57:35 -0500
Subject: file-5.00 is now available
In-Reply-To: <4989EF6C.3050800@e-e.com> from Colin Bartolome (Feb 4, 11:41am)
Message-ID: <20090204225735.D5D485654E@rebar.astron.com>

On Feb 4, 11:41am, colin at e-e.com (Colin Bartolome) wrote:
-- Subject: Re: file-5.00 is now available

| Are the magic rules that detect Office 2007 files not part of this
| version? My Office 2007 files are still being reported as ZIP files.

That should be < 2007 in the doc. the xml ones are still parsed as zip
files. Next version...

christos

From christos at zoulas.com Wed Feb 11 16:17:02 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 11 Feb 2009 09:17:02 -0500
Subject: file-5.00 fails to compile on IRIX
In-Reply-To: <4992D170.9020405@openobjects.com> from Stuart Shelton (Feb 11, 1:24pm)
Message-ID: <20090211141702.9FC6B5654E@rebar.astron.com>

On Feb 11, 1:24pm, stuart at openobjects.com (Stuart Shelton) wrote:
-- Subject: file-5.00 fails to compile on IRIX

|
| I notice during the configure stage that the following is output:
|
| checking for gcc compiler warnings... ./configure: line 23935: test: =:
| unary operator expected
| yes
|
| ... for which nothing out of the ordinary is logged in config.log.
|
| (Note that on IRIX, the -Wxxx compiler options are accepted but have no
| effect, and the strndup declaration is in not , so
| this is misdetected as not being available)

I guess you can #undef HAVE_GETOPT_H in the generated config.h to compile
it. I will try to add an autoconf test for it.
christos

From christos at zoulas.com Wed Feb 11 19:45:36 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 11 Feb 2009 12:45:36 -0500
Subject: file-5.00 fails to compile on IRIX
In-Reply-To: <4992EFD3.3010606@openobjects.com> from Stuart Shelton (Feb 11, 3:33pm)
Message-ID: <20090211174536.10F405654E@rebar.astron.com>

On Feb 11, 3:33pm, stuart at openobjects.com (Stuart Shelton) wrote:
-- Subject: Re: file-5.00 fails to compile on IRIX

|
| Hi Christos,
|
| With HAVE_GETOPT_H unset it certainly gets further... the build now
| fails with:
|
| Making all in magic
| make[2]: Entering directory `/opt/gnu/var/tmp/build/file-5.00/magic'
| ../src/file -C -m ../magic/Magdir
| 21181373:/opt/gnu/var/tmp/build/file-5.00/src/.libs/lt-file: rld: Fatal
| Error: attempted access to unresolvable symbol in
| /opt/gnu/var/tmp/build/file-5.00/src/.libs/lt-file: getopt_long
| make[2]: *** [magic.mgc] Error 1
| make[2]: Leaving directory `/opt/gnu/var/tmp/build/file-5.00/magic'
| make[1]: *** [all-recursive] Error 1
| make[1]: Leaving directory `/opt/gnu/var/tmp/build/file-5.00'
| make: *** [all] Error 2
|
| ... so it looks as if, without HAVE_GETOPT_H and with HAVE_GETOPT_LONG
| unset (which is correct for this platform), the code attempts to use
| getopt_long() without defining it.
|

Is getopt_long.c compiled and linked in?

christos

From christos at zoulas.com Thu Feb 12 15:51:49 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 12 Feb 2009 08:51:49 -0500
Subject: file-5.00 fails to compile on IRIX
In-Reply-To: <49940DD0.9050004@openobjects.com> from Stuart Shelton (Feb 12, 11:53am)
Message-ID: <20090212135150.09DB05654E@rebar.astron.com>

On Feb 12, 11:53am, stuart at openobjects.com (Stuart Shelton) wrote:
-- Subject: Re: file-5.00 fails to compile on IRIX

|
| Hi Christos,
|
| getopt_long.c is being compiled, as various .o files are being written:
|
| $ find . -name \*getopt_long\*
| ./src/getopt_long.c
| ./src/.deps/getopt_long.Plo
| ./src/.libs/getopt_long.o
| ./src/getopt_long.o
| ./src/getopt_long.lo
|
| Other than when it is built, the only other references to getopt_long are:
|
| /opt/gnu/bin/bash ../libtool --tag=CC --mode=link cc -Wall
| -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith
| -Wmissing-declarations -Wredundant-decls -Wnested-externs
| -Wsign-compare -Wreturn-type -Wswitch -Wshadow -Wcast-qual
| -Wwrite-strings -Wextra -Wunused-parameter -c99 -O2 -n32 -mips4 -r14000
| -float_const -use_readonly_const
| -TARG:isa=mips4:platform=ip35:processor=r14000 -TENV:zeroinit_in_bss=ON
| -OPT:fast_io=ON:Olimit=8192:reorg_common=ON:swp=ON
| -LNO:auto_dist=ON:fusion_peeling_limit=8:gather_scatter=2 -diag_error
| 1035 -woff 1174,1183,1185,1552,3968,3970 -no-undefined -version-info
| 1:0:0
| -Wl,-s,-x,-n32,-mips4,-rdata_shared,-allow_jump_at_eop,-rpath,/opt/gnu/usr/lib:/opt/gnu/lib
| -L/opt/gnu/usr/lib -L/opt/gnu/lib -o libmagic.la -rpath /opt/gnu/usr/lib
| magic.lo apprentice.lo softmagic.lo ascmagic.lo encoding.lo compress.lo
| is_tar.lo readelf.lo print.lo fsmagic.lo funcs.lo apptype.lo cdf.lo
| cdf_time.lo readcdf.lo getopt_long.lo asprintf.lo vasprintf.lo -lz
| /opt/gnu/sbin/ld -n32 -shared .libs/magic.o .libs/apprentice.o
| .libs/softmagic.o .libs/ascmagic.o .libs/encoding.o .libs/compress.o
| .libs/is_tar.o .libs/readelf.o .libs/print.o .libs/fsmagic.o
| .libs/funcs.o .libs/apptype.o .libs/cdf.o .libs/cdf_time.o
| .libs/readcdf.o .libs/getopt_long.o .libs/asprintf.o .libs/vasprintf.o
| -L/opt/gnu/usr/lib -L/opt/gnu/lib -lz -lc -s -x -n32 -mips4
| -rdata_shared -allow_jump_at_eop -rpath /opt/gnu/usr/lib:/opt/gnu/lib
| -soname libmagic.so.2 `test -n "sgi2.0" && echo -set_version sgi2.0`
| -update_registry .libs/so_locations -o .libs/libmagic.so.2.0
| (cd .libs && rm -f libmagic.so.2 && ln -s libmagic.so.2.0 libmagic.so.2)
| (cd .libs && rm -f libmagic.so && ln -s libmagic.so.2.0 libmagic.so)
| (cd .libs && rm -f libmagic.so && ln -s libmagic.so.2.0 libmagic.so)
| ar cru .libs/libmagic.a magic.o apprentice.o softmagic.o ascmagic.o
| encoding.o compress.o is_tar.o readelf.o print.o fsmagic.o funcs.o
| apptype.o cdf.o cdf_time.o readcdf.o getopt_long.o asprintf.o vasprintf.o
| ranlib .libs/libmagic.a
| creating libmagic.la
| (cd .libs && rm -f libmagic.la && ln -s ../libmagic.la libmagic.la)
|
|
| Might any of this be due to 'configure's "nm" test failing?:
|
| configure:6456: checking command to parse nm output from cc object
| configure:6552: cc -c -c99 -O2 -n32 -mips4 -r14000 -float_const
| -use_readonly_const -TARG:isa=mips4:platform=ip35:processor=r14000
| -TENV:zeroinit_in_bss=ON
| -OPT:fast_io=ON:Olimit=8192:reorg_common=ON:swp=ON
| -LNO:auto_dist=ON:fusion_peeling_limit=8:gather_scatter=2 -diag_error
| 1035 -woff 1174,1183,1185,1552,3968,3970 -I/opt/gnu/usr/include
| conftest.c >&5
| configure:6555: $? = 0
| configure:6559: nm conftest.o \| sed -n -e 's/^.*[
| ]\([BCDEGRST][BCDEGRST]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1
| \2 \2/p' \> conftest.nm
| configure:6562: $? = 0
| cannot run sed -n -e 's/^.*[ ]\([BCDEGRST][BCDEGRST]*\)[ ][
| ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2 \2/p'
| configure:6552: cc -c -c99 -O2 -n32 -mips4 -r14000 -float_const
| -use_readonly_const -TARG:isa=mips4:platform=ip35:processor=r14000
| -TENV:zeroinit_in_bss=ON
| -OPT:fast_io=ON:Olimit=8192:reorg_common=ON:swp=ON
| -LNO:auto_dist=ON:fusion_peeling_limit=8:gather_scatter=2 -diag_error
| 1035 -woff 1174,1183,1185,1552,3968,3970 -I/opt/gnu/usr/include
| conftest.c >&5
| configure:6555: $? = 0
| configure:6559: nm conftest.o \| sed -n -e 's/^.*[
| ]\([BCDEGRST][BCDEGRST]*\)[ ][
| ]*_\([_A-Za-z][_A-Za-z0-9]*\)$/\1 _\2 \2/p' \> conftest.nm
| configure:6562: $? = 0
| cannot run sed -n -e 's/^.*[ ]\([BCDEGRST][BCDEGRST]*\)[ ][
| ]*_\([_A-Za-z][_A-Za-z0-9]*\)$/\1 _\2 \2/p'
| configure:6652: result: failed

Yes, looks that way...
But since getopt_long is included in the link line, I don't see why
the link fails. Does nm getopt_long.o show a getopt_long symbol?

christos

From dnovotny at redhat.com Thu Feb 12 12:05:30 2009
From: dnovotny at redhat.com (Daniel Novotny)
Date: Thu, 12 Feb 2009 05:05:30 -0500 (EST)
Subject: file 5.00 error with a French .doc file
In-Reply-To: <1556170325.294671234433096812.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <1906130661.294731234433130943.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

hello,

the new file 5.00 returns error exit code for a .DOC file:

$ file ~/Download/PMD.doc
/home/pmatilai/Download/PMD.doc: ERROR: CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252vasprintf failed (Invalid or incomplete multibyte or wide character)

I am attaching the file for analysis

(btw this corrupts rpmbuild process, which checks every file it packs)

best regards,

Daniel Novotny
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PMD.doc
Type: application/msword
Size: 32256 bytes
Desc: not available
URL:

From vapier at gentoo.org Fri Feb 13 19:44:49 2009
From: vapier at gentoo.org (Mike Frysinger)
Date: Fri, 13 Feb 2009 12:44:49 -0500
Subject: file 5.00 error with a French .doc file
In-Reply-To: <1906130661.294731234433130943.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
References: <1906130661.294731234433130943.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <200902131244.55244.vapier@gentoo.org>

On Thursday 12 February 2009 05:05:30 Daniel Novotny wrote:
> the new file 5.00 returns error exit code for a .DOC file:
>
> $ file ~/Download/PMD.doc
> /home/pmatilai/Download/PMD.doc: ERROR: CDF V2 Document, Little Endian, Os:
> Windows, Version 5.1, Code page: 1252vasprintf failed (Invalid or
> incomplete multibyte or wide character)
>
> I am attaching the file for analysis
>
> (btw this corrupts rpmbuild process, which checks every file it packs)

it's probably because you're using a unicode based locale. set LC_ALL to C
and i bet the `file` will work again.
-mike

From christos at zoulas.com Fri Feb 13 20:24:53 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Fri, 13 Feb 2009 13:24:53 -0500
Subject: [PATCH] fix memleak in libmagic in file 4.26
In-Reply-To: <20090107112903.GA32140@robert.goliasz@uk.clara.net> from Robert G (Jan 7, 11:29am)
Message-ID: <20090213182454.12DB95654E@rebar.astron.com>

On Jan 7, 11:29am, file-libmagic at uukgoblin.net (Robert G) wrote:
-- Subject: [PATCH] fix memleak in libmagic in file 4.26

| libmagic is leaking memory if someone calls magic_buffer() (magic_file()
| will probably cause it too) multiple times with one magic_set cookie.
| This small patch should fix it.
| --
| Robert G

Thanks, fixed!
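
The pattern behind the fix can be sketched as follows. This is a minimal
illustration, not libmagic code: struct out_state, set_result(),
reset_state() and the live_buffers counter are hypothetical stand-ins for
the output buffer kept in struct magic_set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the output buffer inside struct magic_set. */
struct out_state {
    char *buf;
};

static int live_buffers = 0;    /* buffers currently allocated */

/* Store a copy of a new result; the caller is expected to reset first. */
static void set_result(struct out_state *o, const char *text)
{
    size_t n = strlen(text) + 1;

    o->buf = malloc(n);
    assert(o->buf != NULL);
    memcpy(o->buf, text, n);
    live_buffers++;
}

/* Reset as in the patch: free the previous result before clearing the
 * pointer, otherwise every reuse of the cookie leaks one buffer. */
static void reset_state(struct out_state *o)
{
    if (o->buf != NULL) {
        free(o->buf);
        live_buffers--;
    }
    o->buf = NULL;
}
```

With the free() in place, calling reset_state() followed by set_result()
any number of times leaves at most one buffer allocated.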
christos

From christos at zoulas.com Fri Feb 13 20:46:12 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Fri, 13 Feb 2009 13:46:12 -0500
Subject: file 5.00 error with a French .doc file
In-Reply-To: <1906130661.294731234433130943.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Feb 12, 5:05am)
Message-ID: <20090213184612.2EA775654E@rebar.astron.com>

On Feb 12, 5:05am, dnovotny at redhat.com (Daniel Novotny) wrote:
-- Subject: file 5.00 error with a French .doc file

|
| ------=_Part_10359_582458326.1234433130940
| Content-Type: text/plain; charset=utf-8
| Content-Transfer-Encoding: 7bit
|
| hello,
|
| the new file 5.00 returns error exit code for a .DOC file:
|
| $ file ~/Download/PMD.doc
| /home/pmatilai/Download/PMD.doc: ERROR: CDF V2 Document, Little Endian, Os:
| Windows, Version 5.1, Code page: 1252vasprintf failed (Invalid or incomplete
| multibyte or wide character)
|
| I am attaching the file for analysis
|
| (btw this corrupts rpmbuild process, which checks every file it packs)
|
| best regards,

$ file -m ../magic/magic.mgc PMD.doc
PMD.doc: CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Title: \225, Subject: , Author: Bekrar, Keywords: , Comments: , Template: Normal.dot, Last Saved By: Bekrar, Revision Number: 2, Name of Creating Application: Microsoft Office Word, Total Editing Time: 01:00, Create Time/Date: Mon Aug 15 10:51:00 2005, Last Saved Time/Date: Mon Aug 15 11:02:00 2005, Number of Pages: 2, Number of Words: 646, Number of Characters: 3555, Security: 0

It must be the \225 character. Since we don't handle localization, I will
just eat the bad strings.
Here is the output after the change:

PMD.doc: CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Bekrar, Template: Normal.dot, Last Saved By: Bekrar, Revision Number: 2, Name of Creating Application: Microsoft Office Word, Total Editing Time: 01:00, Create Time/Date: Mon Aug 15 10:51:00 2005, Last Saved Time/Date: Mon Aug 15 11:02:00 2005, Number of Pages: 2, Number of Words: 646, Number of Characters: 3555, Security: 0

Index: readcdf.c
===================================================================
RCS file: /p/file/cvsroot/file/src/readcdf.c,v
retrieving revision 1.11
diff -u -u -r1.11 readcdf.c
--- readcdf.c 3 Feb 2009 20:27:51 -0000 1.11
+++ readcdf.c 13 Feb 2009 18:45:33 -0000
@@ -75,9 +75,23 @@
             if (len > 1) {
                 s = info[i].pi_str.s_buf;
                 if (NOTMIME(ms)) {
-                    if (file_printf(ms, ", %s: %.*s", buf,
-                        len, s) == -1)
-                        return -1;
+                    char vbuf[1024];
+                    size_t j;
+                    for (j = 0; j < sizeof(vbuf) && len--;
+                        j++, s++) {
+                        if (*s == '\0')
+                            break;
+                        if (isprint((unsigned char)*s))
+                            vbuf[j] = *s;
+                    }
+                    if (j == sizeof(vbuf))
+                        --j;
+                    vbuf[j] = '\0';
+                    if (vbuf[0]) {
+                        if (file_printf(ms, ", %s: %s",
+                            buf, vbuf) == -1)
+                            return -1;
+                    }
                 } else if (info[i].pi_id ==
                     CDF_PROPERTY_NAME_OF_APPLICATION) {
                     if (strstr(s, "Word"))

christos

From vapier at gentoo.org Sat Feb 14 04:41:59 2009
From: vapier at gentoo.org (Mike Frysinger)
Date: Fri, 13 Feb 2009 21:41:59 -0500
Subject: outdated info in README
Message-ID: <200902132141.59854.vapier@gentoo.org>

the README talks about file-4.x ... looks like that should be updated ;)
-mike

From vapier at gentoo.org Sat Feb 14 04:52:33 2009
From: vapier at gentoo.org (Mike Frysinger)
Date: Fri, 13 Feb 2009 21:52:33 -0500
Subject: default data file storage location
Message-ID: <200902132152.34230.vapier@gentoo.org>

it seems with file-5.00, only one data file is installed now ? with
file-4.xx, there'd be 4 files: magic{,.mgc,.mime,.mime.mgc}. now i only see
one file installed with file-5.00: magic.mgc.

assuming this is correct, can we get the default data dir for file changed
from /usr/share/file/ to /usr/share/misc/ ? the former makes sense when there
is a bunch of files, but when there is just one, the latter makes more sense.
-mike

From christos at zoulas.com Sat Feb 14 17:16:56 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Sat, 14 Feb 2009 10:16:56 -0500
Subject: outdated info in README
In-Reply-To: <200902132141.59854.vapier@gentoo.org> from Mike Frysinger (Feb 13, 9:41pm)
Message-ID: <20090214151656.ACC095654E@rebar.astron.com>

On Feb 13, 9:41pm, vapier at gentoo.org (Mike Frysinger) wrote:
-- Subject: outdated info in README

| the README talks about file-4.x ... looks like that should be updated ;)
| -mike

Thanks, fixed!

This is Release 5.x of Ian Darwin's (copyright but distributable)
file(1) command. This version is the standard "file" command for Linux,
*BSD, and other systems. (See "patchlevel.h" for the exact release number).

The major changes for 5.x are CDF file parsing, indirect magic, and
overhaul in mime and ascii encoding handling.
christos

From christos at zoulas.com Sat Feb 14 17:22:23 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Sat, 14 Feb 2009 10:22:23 -0500
Subject: default data file storage location
In-Reply-To: <200902132152.34230.vapier@gentoo.org> from Mike Frysinger (Feb 13, 9:52pm)
Message-ID: <20090214152223.A29B75654E@rebar.astron.com>

On Feb 13, 9:52pm, vapier at gentoo.org (Mike Frysinger) wrote:
-- Subject: default data file storage location

| it seems with file-5.00, only one data file is installed now ? with
| file-4.xx, there'd be 4 files: magic{,.mgc,.mime,.mime.mgc}. now i only see
| one file installed with file-5.00: magic.mgc.
|
| assuming this is correct, can we get the default data dir for file changed
| from /usr/share/file/ to /usr/share/misc/ ? the former makes sense when there
| is a bunch of files, but when there is just one, the latter makes more sense.
| -mike

Will do,

christos

From schwehr at ccom.unh.edu Mon Feb 16 15:29:19 2009
From: schwehr at ccom.unh.edu (Kurt Schwehr)
Date: Mon, 16 Feb 2009 08:29:19 -0500
Subject: unsigned verses signed numbers
Message-ID: <49996A2F.4000803@ccom.unh.edu>

Hi,

Is it possible to specify signed vs. unsigned binary numbers in magic
entries?

Also, I've started working through file formats for multibeam sonars,
seismic data, and lidar bathymetry. If anyone is up for looking through
what I've done and telling me how it looks, I would really appreciate it.
Is there an easy way to tell if I've got some sort of collision with
other rules? Some of these files only have one or two stable bytes to
trigger off of. The list is probably a long way from usable for many
of the formats and we've got a lot more formats to add.

If any of these formats are good enough, I'd like to start submitting
some to be included with file.

Thanks,
-kurt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: magic-marine-sciences
URL:

From maillist at jg555.com Tue Feb 17 07:08:16 2009
From: maillist at jg555.com (Jim Gifford)
Date: Mon, 16 Feb 2009 21:08:16 -0800
Subject: Cross-Compiling for Powerpc
Message-ID: <499A4640.4010603@jg555.com>

I have been trying to track down this error for a few days. Any
suggestions on where to start?
Doing a cross-compile build on a x86_64 to a powerpc, this was not an
issue on the 4.26 version, but seems to be a big issue on 5.00.

make[2]: Leaving directory `/mnt/clfs/var/build_system/work/file-5.00/src'
Making all in magic
make[2]: Entering directory `/mnt/clfs/var/build_system/work/file-5.00/magic'
file -C -m ../magic/Magdir
../magic/Magdir/audio, 296: Warning: indirect offset type `I' invalid
../magic/Magdir/audio, 296: Warning: type `indirect x \b, contains: ' invalid
file: Unknown !: entry `!:apple 8BIMJPEG'
make[2]: *** [magic.mgc] Error 1
make[2]: Leaving directory `/mnt/clfs/var/build_system/work/file-5.00/magic'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/clfs/var/build_system/work/file-5.00'
make: *** [all] Error 2

From schwehr at ccom.unh.edu Tue Feb 17 13:50:01 2009
From: schwehr at ccom.unh.edu (Kurt Schwehr)
Date: Tue, 17 Feb 2009 06:50:01 -0500
Subject: Cross-Compiling for Powerpc
In-Reply-To: <499A4640.4010603@jg555.com>
References: <499A4640.4010603@jg555.com>
Message-ID: <499AA469.4050708@ccom.unh.edu>

Hi Jim,

I don't know if this will help you, but it works for me on a G4 PowerPC
Mac running 10.5.6 and gcc 4.0.1.

# modify fink file.info to 5.00 and fink install file
file --version
file-5.00
magic file from /sw/share/file/magic
file -C -m ../file-5.00/magic/Magdir/audio
echo $?
0
md5 ../file-5.00/magic/Magdir/audio
MD5 (../file-5.00/magic/Magdir/audio) = a73c3ad07a7ea10d741189c0bb7f420f

Perhaps your audio file got modified?

-kurt

Jim Gifford wrote:
> I have been trying to track down this error for a few days. Any
> suggestions on where to start?
> Doing a cross-compile build on a x86_64 to a powerpc, this was not an
> issue on the 4.26 version, but seems to be a big issue on 5.00.
>
> make[2]: Leaving directory
> `/mnt/clfs/var/build_system/work/file-5.00/src'
> Making all in magic
> make[2]: Entering directory
> `/mnt/clfs/var/build_system/work/file-5.00/magic'
> file -C -m ../magic/Magdir
> ../magic/Magdir/audio, 296: Warning: indirect offset type `I' invalid
> ../magic/Magdir/audio, 296: Warning: type `indirect x \b,
> contains: ' invalid
> file: Unknown !: entry `!:apple 8BIMJPEG'
> make[2]: *** [magic.mgc] Error 1
> make[2]: Leaving directory
> `/mnt/clfs/var/build_system/work/file-5.00/magic'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/mnt/clfs/var/build_system/work/file-5.00'
> make: *** [all] Error 2
>
> _______________________________________________
> File mailing list
> File at mx.gw.com
> http://mx.gw.com/mailman/listinfo/file

From christos at zoulas.com Tue Feb 17 15:53:42 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 17 Feb 2009 08:53:42 -0500
Subject: unsigned verses signed numbers
In-Reply-To: <49996A2F.4000803@ccom.unh.edu> from Kurt Schwehr (Feb 16, 8:29am)
Message-ID: <20090217135342.ED5BA5654E@rebar.astron.com>

On Feb 16, 8:29am, schwehr at ccom.unh.edu (Kurt Schwehr) wrote:
-- Subject: unsigned verses signed numbers

| Hi,
|
| Is it possible to specify signed vs. unsigned binary numbers in magic
| entries?

No, but you can print unsigned with %u. Signed vs unsigned should not
matter for comparisons.

| Also, I've started working through file formats for multibeam sonars,
| seismic data, and lidar bathymetry. If anyone is up for looking through
| what I've done and telling me how it looks, I would really appreciate it.
| Is there an easy way to tell if I've got some sort of collision with
| other rules? Some of these files only have one or two stable bytes to
| trigger off of.
| The list is probably a long ways from usable for many
| of the formats and we've got a lot more formats to add.
|
| If any of these formats are good enough, I'd like to start submitting
| some to be included with file.

the 0 byte 67 rule catches any file starting with 'C' so that's a no-go.
Any magic with just one or two bytes of magic will produce spurious matches.
The rest look fine.

christos

From christos at zoulas.com Tue Feb 17 15:54:48 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 17 Feb 2009 08:54:48 -0500
Subject: Cross-Compiling for Powerpc
In-Reply-To: <499A4640.4010603@jg555.com> from Jim Gifford (Feb 16, 9:08pm)
Message-ID: <20090217135448.F35A95654F@rebar.astron.com>

On Feb 16, 9:08pm, maillist at jg555.com (Jim Gifford) wrote:
-- Subject: Cross-Compiling for Powerpc

| I have been trying to track down this error for a few days. Any
| suggestions on where to start?
| Doing a cross-compile build on a x86_64 to a powerpc, this was not an
| issue on the 4.26 version, but seems to be a big issue on 5.00.

You are probably picking up a different file program that is an older
version...

christos

From folti at balabit.hu Tue Feb 17 12:28:29 2009
From: folti at balabit.hu (Pal Tamas)
Date: Tue, 17 Feb 2009 11:28:29 +0100
Subject: Autorun.inf magic
Message-ID: <20090217102829.GA31102@balabit.hu>

Hello,

This is the magic for Microsoft's autorun.inf file:

# Autorun File
0 string/c [autorun]\r\n Microsoft Windows Autorun file.

According to http://filext.com/file-extension/inf the MIME type is either
text/inf or application/x-setupscript. As far as I saw, Windows uses the
latter.
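
A rough C sketch of what the /c flag in the entry above asks for, assuming
the magic(5) semantics in which a lower-case character in the pattern
matches either case in the data while every other character must match
exactly. match_string_c() is a hypothetical helper, not a libmagic function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if data starts with pattern under the assumed string/c rules:
 * lower-case pattern characters match either case in the data, all other
 * pattern characters must match byte for byte. */
static int match_string_c(const unsigned char *data, size_t len,
                          const char *pattern)
{
    size_t i;

    assert(data != NULL && pattern != NULL);
    for (i = 0; pattern[i] != '\0'; i++) {
        unsigned char p = (unsigned char)pattern[i];

        if (i >= len)
            return 0;               /* data shorter than the pattern */
        if (islower(p)) {
            if (tolower(data[i]) != p)
                return 0;
        } else if (data[i] != p) {
            return 0;
        }
    }
    return 1;
}
```

With the pattern "[autorun]\r\n", a file starting with "[AutoRun]" followed
by CR LF matches, while one using a bare LF line ending does not.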
--
Pal Tamas/Folti
folti at balabit.hu

From christos at zoulas.com Tue Feb 17 16:00:40 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 17 Feb 2009 09:00:40 -0500
Subject: Autorun.inf magic
In-Reply-To: <20090217102829.GA31102@balabit.hu> from Pal Tamas (Feb 17, 11:28am)
Message-ID: <20090217140040.7A6415654E@rebar.astron.com>

On Feb 17, 11:28am, folti at balabit.hu (Pal Tamas) wrote:
-- Subject: Autorun.inf magic

| Hello,
|
| This is the magic for Microsoft's autorun.inf file:
|
| # Autorun File
| 0 string/c [autorun]\r\n Microsoft Windows Autorun file.
|
| According to http://filext.com/file-extension/inf the MIME type is either
| text/inf or application/x-setupscript. As far as I saw, Windows uses the
| latter.

Thanks,

christos

From schwehr at ccom.unh.edu Tue Feb 17 16:27:53 2009
From: schwehr at ccom.unh.edu (Kurt Schwehr)
Date: Tue, 17 Feb 2009 09:27:53 -0500
Subject: unsigned verses signed numbers
In-Reply-To: <20090217135342.ED5BA5654E@rebar.astron.com>
References: <20090217135342.ED5BA5654E@rebar.astron.com>
Message-ID: <499AC969.6040002@ccom.unh.edu>

Hi Christos,

Thanks for the quick reply. We will keep working on the rules to make
them better. I was pretty sure that counting on C was just not going to
cut it :) I will try some other tests that will likely be fairly good.

Didn't think about %u... does the trick!

Thanks,
-kurt

Christos Zoulas wrote:
> No, but you can print unsigned with %u. Signed vs unsigned should not
> matter for comparisons.
>
>
> the 0 byte 67 rule catches any file starting with 'C' so that's a no-go.
> Any magic with just one or two bytes of magic will produce spurious matches.
> The rest look fine.
>
> christos
>

From asbjorn at asbjorn.biz Tue Feb 17 17:00:03 2009
From: asbjorn at asbjorn.biz (Asbjørn Sloth Tønnesen)
Date: Tue, 17 Feb 2009 15:00:03 +0000
Subject: Google Earth related magic
Message-ID: <499AD0F3.7030404@asbjorn.biz>

Hi,

I have created some magic lines[1] describing the Google Earth formats
KML and KMZ, in both the Google version and the new standardized OpenGIS
version.

I have previously discussed this entry on the OpenGIS forum[2], and
Michael Ashbridge, from Google, wants any zip containing file with the
kml extension to be matched. However I think that such a magic would be
too aggressive. I see this more as a fault in the specification, as KMZ
is just described as a ZIP file containing a KML file.

[1] The magic lines
http://asbjorn.it/pub/misc/magic.kml.txt

[2] OpenGIS: KML support in libmagic
http://feature.opengeospatial.org/forumbb/viewtopic.php?p=2630#2630

--
Best regards
Asbjørn Sloth Tønnesen
Lila ApS
http://lila.io/

From christos at zoulas.com Tue Feb 17 18:58:50 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 17 Feb 2009 11:58:50 -0500
Subject: Google Earth related magic
In-Reply-To: <499AD0F3.7030404@asbjorn.biz> from Asbjørn Sloth Tønnesen (Feb 17, 3:00pm)
Message-ID: <20090217165850.604B25654E@rebar.astron.com>

On Feb 17, 3:00pm, asbjorn at asbjorn.biz (Asbjørn Sloth Tønnesen) wrote:
-- Subject: Google Earth related magic

| Hi,
|
| I have created some magic lines[1] describing the Google Earth formats
| KML and KMZ, in both the Google version and the new standardized OpenGIS
| version.
| | I have previously discussed this entry on the OpenGIS forum[2], and | Michael Ashbridge, from Google, wants any zip containing a file with the | kml extension to be matched. However I think that such a magic would be | too aggressive. I see this more as a fault in the specification, as KMZ | is just described as a ZIP file containing a KML file. | | [1] The magic lines | http://asbjorn.it/pub/misc/magic.kml.txt | | [2] OpenGIS: KML support in libmagic | http://feature.opengeospatial.org/forumbb/viewtopic.php?p=2630#2630 | | - -- | Best regards | Asbjørn Sloth Tønnesen | Lila ApS | http://lila.io/ Thanks, I committed it. christos From kimmo at global-wire.fi Tue Feb 17 19:18:50 2009 From: kimmo at global-wire.fi (Kimmo Suominen) Date: Tue, 17 Feb 2009 19:18:50 +0200 Subject: WANTED: New moderator Message-ID: Hi! I'd like to pass the mailing list moderator torch onto someone who has an active interest in the development of file. If you are interested, please reply to this message. (You may want to reply directory to me, in which case you'll need to work around the reply-to header inserted by the mailing list.) I typically review the moderation queue once a day, when the mailing list software has sent me a summary. There's typically only something there only once or twice a week. Best regards, + Kimmo From vapier at gentoo.org Tue Feb 17 19:57:29 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Tue, 17 Feb 2009 12:57:29 -0500 Subject: WANTED: New moderator In-Reply-To: References: Message-ID: <200902171257.30602.vapier@gentoo.org> On Tuesday 17 February 2009 12:18:50 Kimmo Suominen wrote: > I'd like to pass the mailing list moderator torch onto someone who has > an active interest in the development of file. If you are interested, > please reply to this message. (You may want to reply directory to me, > in which case you'll need to work around the reply-to header inserted > by the mailing list.) 
> > I typically review the moderation queue once a day, when the mailing > list software has sent me a summary. There's typically only something > there only once or twice a week. is everyone moderated ? what's the moderation for ? -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: This is a digitally signed message part. URL: From kimmo at global-wire.fi Tue Feb 17 20:06:42 2009 From: kimmo at global-wire.fi (Kimmo Suominen) Date: Tue, 17 Feb 2009 20:06:42 +0200 Subject: WANTED: New moderator In-Reply-To: <200902171257.30602.vapier@gentoo.org> References: <200902171257.30602.vapier@gentoo.org> Message-ID: The moderator needs to approve new subscriptions. Subscriptions must be moderated, because otherwise we will get spam bots subscribed to the list. The moderator needs to advise people, who submit messages to the list without subscribing first. The current list policy is that one must first subscribe. There are two reasons for this: we don't want open submissions as this would just be a way for spam to get in, and allowing non-subscribers to post puts too much work on the moderator and creates out-of-sync correspondence when direct replies and on-list replies are mixed. Cheers, + Kimmo On Tue, Feb 17, 2009 at 19:57, Mike Frysinger wrote: > On Tuesday 17 February 2009 12:18:50 Kimmo Suominen wrote: >> I'd like to pass the mailing list moderator torch onto someone who has >> an active interest in the development of file. If you are interested, >> please reply to this message. (You may want to reply directory to me, >> in which case you'll need to work around the reply-to header inserted >> by the mailing list.) >> >> I typically review the moderation queue once a day, when the mailing >> list software has sent me a summary. There's typically only something >> there only once or twice a week. > > is everyone moderated ? what's the moderation for ? 
> -mike > From vapier at gentoo.org Tue Feb 17 20:45:22 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Tue, 17 Feb 2009 13:45:22 -0500 Subject: WANTED: New moderator In-Reply-To: References: <200902171257.30602.vapier@gentoo.org> Message-ID: <200902171345.23247.vapier@gentoo.org> On Tuesday 17 February 2009 13:06:42 Kimmo Suominen wrote: > The moderator needs to approve new subscriptions. Subscriptions must > be moderated, because otherwise we will get spam bots subscribed to > the list. > > The moderator needs to advise people, who submit messages to the list > without subscribing first. The current list policy is that one must > first subscribe. There are two reasons for this: we don't want open > submissions as this would just be a way for spam to get in, and > allowing non-subscribers to post puts too much work on the moderator > and creates out-of-sync correspondence when direct replies and on-list > replies are mixed. ok, i wasnt sure if the list was moderated for everyone or just people who arent subscribed cant mailman be configured to autoreject non-subscribers with an explanatory message ? as for the new subscriptions, you could set up an alias so that multiple people can assist ? -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: This is a digitally signed message part. URL: From kimmo at global-wire.fi Tue Feb 17 21:32:59 2009 From: kimmo at global-wire.fi (Kimmo Suominen) Date: Tue, 17 Feb 2009 21:32:59 +0200 Subject: WANTED: New moderator In-Reply-To: <200902171345.23247.vapier@gentoo.org> References: <200902171257.30602.vapier@gentoo.org> <200902171345.23247.vapier@gentoo.org> Message-ID: Messages from subscribers are automatically approved and forwarded. A list can be configured with multiple moderators. 
You'll probably want to work out some sort of a schedule in that case, so you don't unnecessarily login to moderate the queue just to find out someone already did it. I'm fine with the new moderator(s) reconfiguring the list settings, as long as it doesn't result in spam being forwarded through the system. We wouldn't want to be listed as a spam source in blacklists and have messages from all mailing lists rejected. Cheers, + Kim On Tue, Feb 17, 2009 at 20:45, Mike Frysinger wrote: > On Tuesday 17 February 2009 13:06:42 Kimmo Suominen wrote: >> The moderator needs to approve new subscriptions. Subscriptions must >> be moderated, because otherwise we will get spam bots subscribed to >> the list. >> >> The moderator needs to advise people, who submit messages to the list >> without subscribing first. The current list policy is that one must >> first subscribe. There are two reasons for this: we don't want open >> submissions as this would just be a way for spam to get in, and >> allowing non-subscribers to post puts too much work on the moderator >> and creates out-of-sync correspondence when direct replies and on-list >> replies are mixed. > > ok, i wasnt sure if the list was moderated for everyone or just people who > arent subscribed > > cant mailman be configured to autoreject non-subscribers with an explanatory > message ? > > as for the new subscriptions, you could set up an alias so that multiple > people can assist ? 
> -mike > From dnovotny at redhat.com Wed Feb 18 16:14:13 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Wed, 18 Feb 2009 09:14:13 -0500 (EST) Subject: Error: file-5.00: Thumbs.db : Cannot read short stream (Invalid argument) In-Reply-To: <1352288757.265031234966372240.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <1353298101.265111234966453371.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, because the Fedora rpm building process runs "file" on every file it packs and fails if there is an error, I have another bug report with a crash in file $ file ~/work/file-testfiles/Thumbs.db /home/dnovotny/work/file-testfiles/Thumbs.db: ERROR: Cannot read short stream (Invalid argument) the previous version 4.26 goes like this: $ file /usr/share/FlightGear/Aircraft/c172p/Models/Immat/Thumbs.db /usr/share/FlightGear/Aircraft/c172p/Models/Immat/Thumbs.db: Microsoft Office Document the file is probably not MS Office document at all, it just confuses the file logic and this remained not repaired, because it did not return error, which now does regards, Daniel Novotny -------------- next part -------------- A non-text attachment was scrubbed... Name: Thumbs.db Type: application/octet-stream Size: 7195 bytes Desc: not available URL: From vapier at gentoo.org Thu Feb 19 03:58:50 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Wed, 18 Feb 2009 20:58:50 -0500 Subject: WANTED: New moderator In-Reply-To: References: <200902171345.23247.vapier@gentoo.org> Message-ID: <200902182058.50946.vapier@gentoo.org> On Tuesday 17 February 2009 14:32:59 Kimmo Suominen wrote: > A list can be configured with multiple moderators. You'll probably > want to work out some sort of a schedule in that case, so you don't > unnecessarily login to moderate the queue just to find out someone > already did it. > > I'm fine with the new moderator(s) reconfiguring the list settings, as > long as it doesn't result in spam being forwarded through the system. 
> We wouldn't want to be listed as a spam source in blacklists and have > messages from all mailing lists rejected. i dont mind being on said list ... i would just leave it open and try to recruit multiple people ;) -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: This is a digitally signed message part. URL: From christos at zoulas.com Thu Feb 19 15:58:52 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 19 Feb 2009 08:58:52 -0500 Subject: Fwd: Patch to file-4.26 to allow build of magic file In-Reply-To: <200811281855.45398.vapier@gentoo.org> from Mike Frysinger (Nov 28, 6:55pm) Message-ID: <20090219135852.CF6355654E@rebar.astron.com> On Nov 28, 6:55pm, vapier at gentoo.org (Mike Frysinger) wrote: -- Subject: Fwd: Patch to file-4.26 to allow build of magic file | Subject: Patch to file-4.26 to allow build of magic file | Date: Tuesday 11 November 2008 | From: Bill | To: vapier at gentoo.org, cardoe at gentoo.org, ranger at gentoo.org | | Hello, | | The magic file magic/Magdir/epoc has first line commented out, similar to | earlier bug which was reported and patched. Here's the patch to that file | to allow file-4.26 to emerge: | This is fixed in 5.0. 
christos From christos at zoulas.com Fri Feb 20 17:49:11 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 20 Feb 2009 10:49:11 -0500 Subject: Error: file-5.00: Thumbs.db : Cannot read short stream (Invalid argument) In-Reply-To: <1353298101.265111234966453371.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Feb 18, 9:14am) Message-ID: <20090220154911.5144B5654E@rebar.astron.com> On Feb 18, 9:14am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: Error: file-5.00: Thumbs.db : Cannot read short stream (Invalid | hello, | | because the Fedora rpm building process runs "file" on every file it packs | and fails if there is an error, I have another bug report with | a crash in file | | $ file ~/work/file-testfiles/Thumbs.db | /home/dnovotny/work/file-testfiles/Thumbs.db: ERROR: Cannot read short stream (Invalid argument) | | the previous version 4.26 goes like this: | | $ file /usr/share/FlightGear/Aircraft/c172p/Models/Immat/Thumbs.db | /usr/share/FlightGear/Aircraft/c172p/Models/Immat/Thumbs.db: Microsoft Office | Document | | the file is probably not MS Office document at all, it just confuses the file logic | and this remained not repaired, because it did not return error, which now does | | regards, | | Daniel Novotny Here's a patch. 
christos Index: cdf.c =================================================================== RCS file: /p/file/cvsroot/file/src/cdf.c,v retrieving revision 1.17 diff -u -u -r1.17 cdf.c --- cdf.c 3 Feb 2009 20:27:51 -0000 1.17 +++ cdf.c 20 Feb 2009 15:45:39 -0000 @@ -239,7 +239,9 @@ cdf_unpack_header(h, buf); cdf_swap_header(h); if (h->h_magic != CDF_MAGIC) { - DPRINTF(("Bad magic 0x%x != 0x$x\n", h->h_magic, CDF_MAGIC)); + DPRINTF(("Bad magic 0x%llx != 0x%llx\n", + (unsigned long long)h->h_magic, + (unsigned long long)CDF_MAGIC)); errno = EFTYPE; return -1; } @@ -539,10 +541,11 @@ if (dir->dir_tab[i].d_type == CDF_DIR_TYPE_ROOT_STORAGE) break; + /* If the it is not there, just fake it; some docs don't have it */ if (i == dir->dir_len) { - DPRINTF(("Cannot find root storage node\n")); - errno = EFTYPE; - return -1; + scn->sst_tab = NULL; + scn->sst_len = 0; + return 0; } d = &dir->dir_tab[i]; Index: readcdf.c =================================================================== RCS file: /p/file/cvsroot/file/src/readcdf.c,v retrieving revision 1.12 diff -u -u -r1.12 readcdf.c --- readcdf.c 13 Feb 2009 18:46:48 -0000 1.12 +++ readcdf.c 20 Feb 2009 15:45:39 -0000 @@ -129,7 +129,7 @@ case CDF_CLIPBOARD: break; default: - file_error(ms, 0, "Internal parsing error"); + errno = EFTYPE; return -1; } } @@ -202,6 +202,7 @@ cdf_stream_t sst, scn; cdf_dir_t dir; int i; + const char *expn = ""; (void)&nbytes; (void)&buf; @@ -214,7 +215,7 @@ #endif if (cdf_read_sat(fd, &h, &sat) == -1) { - file_error(ms, errno, "Can't read SAT"); + expn = "Can't read SAT"; return -1; } #ifdef CDF_DEBUG @@ -222,7 +223,7 @@ #endif if ((i = cdf_read_ssat(fd, &h, &sat, &ssat)) == -1) { - file_error(ms, errno, "Can't read SAT"); + expn = "Can't read SSAT"; goto out1; } #ifdef CDF_DEBUG @@ -230,12 +231,12 @@ #endif if ((i = cdf_read_dir(fd, &h, &sat, &dir)) == -1) { - file_error(ms, errno, "Can't read directory"); + expn = "Can't read directory"; goto out2; } if ((i = cdf_read_short_stream(fd, &h, &sat, 
&dir, &sst)) == -1) { - file_error(ms, errno, "Cannot read short stream"); + expn = "Cannot read short stream"; goto out3; } @@ -244,19 +245,14 @@ #endif if ((i = cdf_read_summary_info(fd, &h, &sat, &ssat, &sst, &dir, &scn)) == -1) { - /* Some files don't have summary info! */ -#ifdef notyet - file_error(ms, errno, "Can't read summary_info"); -#else - i = 0; -#endif + expn = ""; goto out4; } #ifdef CDF_DEBUG cdf_dump_summary_info(&h, &scn); #endif if ((i = cdf_file_summary_info(ms, &scn)) == -1) - file_error(ms, errno, "Can't expand summary_info"); + expn = "Can't expand summary_info"; free(scn.sst_tab); out4: free(sst.sst_tab); @@ -266,5 +262,13 @@ free(ssat.sat_tab); out1: free(sat.sat_tab); + if (i != 1) { + if (file_printf(ms, "CDF V2 Document") == -1) + return -1; + if (*expn) + if (file_printf(ms, ", corrupt: %s", expn) == -1) + return -1; + i = 1; + } return i; } From vapier at gentoo.org Sun Feb 22 01:35:15 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Sat, 21 Feb 2009 18:35:15 -0500 Subject: Error: file-5.00: Thumbs.db : Cannot read short stream (Invalid argument) In-Reply-To: <20090220154911.5144B5654E@rebar.astron.com> References: <20090220154911.5144B5654E@rebar.astron.com> Message-ID: <200902211835.16249.vapier@gentoo.org> On Friday 20 February 2009 10:49:11 Christos Zoulas wrote: > Index: readcdf.c > =================================================================== > RCS file: /p/file/cvsroot/file/src/readcdf.c,v > retrieving revision 1.12 > diff -u -u -r1.12 readcdf.c is the cvs tree available publicly somewhere ? i dont imagine i'm the only person who like to browse it ... > --- readcdf.c 13 Feb 2009 18:46:48 -0000 1.12 > +++ readcdf.c 20 Feb 2009 15:45:39 -0000 > @@ -129,7 +129,7 @@ > case CDF_CLIPBOARD: > break; > default: > - file_error(ms, 0, "Internal parsing error"); > + errno = EFTYPE; > return -1; > } > } using this patch straight results in an error as EFTYPE is handled only in cdf.c currently ... 
you've probably already fixed this in your latest tree, so this is more of a heads up to everyone else -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: This is a digitally signed message part. URL: From christos at zoulas.com Sun Feb 22 01:41:45 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sat, 21 Feb 2009 18:41:45 -0500 Subject: Error: file-5.00: Thumbs.db : Cannot read short stream (Invalid argument) In-Reply-To: <200902211835.16249.vapier@gentoo.org> from Mike Frysinger (Feb 21, 6:35pm) Message-ID: <20090221234145.58C005654E@rebar.astron.com> On Feb 21, 6:35pm, vapier at gentoo.org (Mike Frysinger) wrote: -- Subject: Re: Error: file-5.00: Thumbs.db : Cannot read short stream (Inval | is the cvs tree available publicly somewhere ? i dont imagine i'm the only | person who like to browse it ... I'll make it available when I upgrade the machine, which needs upgrading because it has faulty memory and really old software. | > --- readcdf.c 13 Feb 2009 18:46:48 -0000 1.12 | > +++ readcdf.c 20 Feb 2009 15:45:39 -0000 | > @@ -129,7 +129,7 @@ | > case CDF_CLIPBOARD: | > break; | > default: | > - file_error(ms, 0, "Internal parsing error"); | > + errno = EFTYPE; | > return -1; | > } | > } | | using this patch straight results in an error as EFTYPE is handled only in | cdf.c currently ... you've probably already fixed this in your latest tree, so | this is more of a heads up to everyone else | -mike Just removing setting errno should work, right? It should not matter anyway. 
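For anyone building the patch above on a system whose <errno.h> has no EFTYPE (it is a BSD errno value, and glibc does not define it), one workaround is a local fallback definition. A minimal sketch, assuming EINVAL is an acceptable substitute (which would be consistent with the "(Invalid argument)" text in the subject line); the function name reject_unknown_type is hypothetical and only stands in for the default: branch quoted above:

```c
#include <errno.h>

/*
 * EFTYPE ("inappropriate file type or format") exists on the BSDs.
 * Assumption: where it is missing, mapping it to EINVAL is good enough.
 */
#ifndef EFTYPE
#define EFTYPE EINVAL
#endif

/* Hypothetical stand-in for the "default:" case in the quoted hunk. */
static int
reject_unknown_type(void)
{
	errno = EFTYPE;
	return -1;
}
```

With the fallback in place the assignment compiles everywhere; dropping the assignment altogether, as suggested above, avoids the question entirely.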
christos From vapier at gentoo.org Sun Feb 22 02:06:20 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Sat, 21 Feb 2009 19:06:20 -0500 Subject: Error: file-5.00: Thumbs.db : Cannot read short stream (Invalid argument) In-Reply-To: <20090221234145.58C005654E@rebar.astron.com> References: <20090221234145.58C005654E@rebar.astron.com> Message-ID: <200902211906.21666.vapier@gentoo.org> On Saturday 21 February 2009 18:41:45 Christos Zoulas wrote: > On Feb 21, 6:35pm, vapier at gentoo.org (Mike Frysinger) wrote: > -- Subject: Re: Error: file-5.00: Thumbs.db : Cannot read short stream > | > --- readcdf.c 13 Feb 2009 18:46:48 -0000 1.12 > | > +++ readcdf.c 20 Feb 2009 15:45:39 -0000 > | > @@ -129,7 +129,7 @@ > | > case CDF_CLIPBOARD: > | > break; > | > default: > | > - file_error(ms, 0, "Internal parsing error"); > | > + errno = EFTYPE; > | > return -1; > | > } > | > } > | > | using this patch straight results in an error as EFTYPE is handled only > | in cdf.c currently ... you've probably already fixed this in your latest > | tree, so this is more of a heads up to everyone else > > Just removing setting errno should work, right? It should not matter > anyway. sure, because then EFTYPE wouldnt be used, so there'd be no way to hit an undefined error ;) -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: This is a digitally signed message part. URL: From sorcer76 at gmail.com Wed Mar 4 23:23:29 2009 From: sorcer76 at gmail.com (lynn kelkeys) Date: Wed, 4 Mar 2009 13:23:29 -0800 Subject: sef-extracting archive - EICAR file - not recognised Message-ID: <9577160a0903041323w6cf74cc8n5dd31ddeb314e9@mail.gmail.com> The self-extracting archives at http://www.csm-testcenter.org/test?do=show&subdo=antimalware&test=archives are not identified by the 'msdos' magic file. Is there an update for this file? If not, any ideas how to identify these files? 
The ChangeLog file shows 2008-08-30 as last entry. Thanks for any help! Lynn -------------- next part -------------- An HTML attachment was scrubbed... URL: From christos at zoulas.com Thu Mar 5 16:08:10 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 5 Mar 2009 09:08:10 -0500 Subject: sef-extracting archive - EICAR file - not recognised In-Reply-To: <9577160a0903041323w6cf74cc8n5dd31ddeb314e9@mail.gmail.com> from lynn kelkeys (Mar 4, 1:23pm) Message-ID: <20090305140811.0BACC56550@rebar.astron.com> On Mar 4, 1:23pm, sorcer76 at gmail.com (lynn kelkeys) wrote: -- Subject: sef-extracting archive - EICAR file - not recognised | The self-extracting archives at | http://www.csm-testcenter.org/test?do=show&subdo=antimalware&test=archives are | not identified by the 'msdos' magic file. | | is there an update for this file? If not, any ideas how to identify these | files? | | The ChangeLog file shows 2008-08-30 as last entry. | | Thanks for any help! | | Lynn some of them are not executables. which one do you have in mind? christos From asbjorn at asbjorn.biz Thu Mar 5 16:47:29 2009 From: asbjorn at asbjorn.biz (Asbjørn Sloth Tønnesen) Date: Thu, 05 Mar 2009 14:47:29 +0000 Subject: romfs Message-ID: <49AFE601.2070101@asbjorn.biz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > > # romfs filesystems - Juan Cespedes > 0 string -rom1fs-\0 romfs filesystem, version 1 remove the \0 to add support for huge romfs, the zero isn't required in the documentation. And triggers my early romfs image, little endian, to fail the match, whereas without the null byte it would just have a wrong filesize screaming check endianness. >>8 belong x %d bytes, >>16 string x named %s. 
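The effect of dropping the \0 can be sketched in C. This is only an illustration of the magic lines above, not libmagic's own code: romfs_probe is a hypothetical helper that compares the eight signature bytes without a trailing NUL and reads the size at offset 8 as a big-endian 32-bit value (belong). With the \0 included, the comparison would also cover the first size byte, so presumably any image whose first size byte is non-zero (16 MiB and up, or some little-endian images) would fail to match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 and stores the size field in *size if
 * buf starts with the 8-byte romfs signature, 0 otherwise.
 */
static int
romfs_probe(const unsigned char *buf, size_t len, uint32_t *size)
{
	/* Only the 8 visible bytes are compared, not the terminating NUL. */
	static const char sig[] = "-rom1fs-";

	if (len < 12 || memcmp(buf, sig, 8) != 0)
		return 0;

	/* Offset 8: 32-bit big-endian size ("belong" in magic terms). */
	*size = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
	    ((uint32_t)buf[10] << 8) | (uint32_t)buf[11];
	return 1;
}
```

A byte-swapped image still matches under this check and simply reports an implausible size, which is the "check endianness" hint described above.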
romfs documentation http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/romfs.txt;h=2d2a7b2a16b9f64b7aaaaa50e9ce48adc32514d7;hb=HEAD - -- Best regards Asbjørn Sloth Tønnesen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkmv5gEACgkQSViWlxucwupASACfU7AdKnp4CkxJDH/zkqWOOsM8 xiYAoICEUkvcwyJQd8rHoHaEcMNb2c06 =6WDW -----END PGP SIGNATURE----- From christos at zoulas.com Thu Mar 5 17:37:18 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 5 Mar 2009 10:37:18 -0500 Subject: romfs In-Reply-To: <49AFE601.2070101@asbjorn.biz> from Asbjørn Sloth Tønnesen (Mar 5, 2:47pm) Message-ID: <20090305153718.98FCF5654E@rebar.astron.com> On Mar 5, 2:47pm, asbjorn at asbjorn.biz (Asbjørn Sloth Tønnesen) wrote: -- Subject: romfs | > | > # romfs filesystems - Juan Cespedes | > 0 string -rom1fs-\0 romfs filesystem, version 1 | remove the \0 to add support for huge romfs, the zero isn't required in | the documentation. And triggers my early romfs image, little endian, | to fail the match, whereas without the null byte it would just have a | wrong filesize screaming check endianness. | | >>8 belong x %d bytes, | >>16 string x named %s. | | romfs documentation | http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/romfs.txt;h=2d2a7b2a16b9f64b7aaaaa50e9ce48adc32514d7;hb=HEAD Thanks, fixed! christos From vapier at gentoo.org Mon Mar 9 02:28:20 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Sun, 8 Mar 2009 19:28:20 -0500 Subject: crash with large postscript and file-5.00 Message-ID: <200903082028.20951.vapier@gentoo.org> a user reported that file-5.00 crashes when running on a large postscript file. 
the file in question is like 10 megs, so here is the URL for it (it doesnt want to compress at all): http://noc.axelspringer.pl/bug/p002.pps here's the backtrace with vanilla file-5.00: $ gdb --args ./file ~/p002.pps .... (gdb) bt #0 0x00007f73591721e5 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007f7359173703 in *__GI_abort () at abort.c:88 #2 0x00007f73591ad998 in __libc_message (do_abort=0x2, fmt=0x7f735925e9b8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170 #3 0x00007f73591b3138 in malloc_printerr (action=0x2, str=0x7f735925e9e8 "munmap_chunk(): invalid pointer", ptr=) at malloc.c:5994 #4 0x0000000000416aa4 in cdf_read_sat (fd=0x6, h=0x7fff616ae450, sat=0x7fff616ae440) at cdf.c:324 #5 0x000000000040c12c in file_trycdf (ms=0x12c1f20, fd=0x6, buf=0x7f7359666010 "??\021?\032?", nbytes=0x40000) at readcdf.c:202 #6 0x000000000040b263 in file_buffer (ms=0x12c1f20, fd=0x6, inname=0x7fff616b1695 "/root/p002.pps", buf=0x7f7359666010, nb=0x40000) at funcs.c:222 #7 0x0000000000403940 in file_or_fd (ms=0x12c1f20, inname=0x7fff616b1695 "/root/p002.pps", fd=0x6) at magic.c:357 #8 0x000000000040365e in magic_file (ms=0x12c1f20, inname=0x7fff616b1695 "/root/p002.pps") at magic.c:254 #9 0x0000000000402e28 in process (ms=0x12c1f20, inname=0x7fff616b1695 "/root/p002.pps", wid=0xe) at file.c:431 #10 0x0000000000402aa4 in main (argc=0x2, argv=0x7fff616b0a08) at file.c:343 -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part. 
URL: From christos at zoulas.com Mon Mar 9 03:50:22 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 8 Mar 2009 21:50:22 -0400 Subject: crash with large postscript and file-5.00 In-Reply-To: <200903082028.20951.vapier@gentoo.org> from Mike Frysinger (Mar 8, 7:28pm) Message-ID: <20090309015022.36FF35654F@rebar.astron.com> On Mar 8, 7:28pm, vapier at gentoo.org (Mike Frysinger) wrote: -- Subject: crash with large postscript and file-5.00 | a user reported that file-5.00 crashes when running on a large postscript | file. the file in question is like 10 megs, so here is the URL for it (it | doesnt want to compress at all): | http://noc.axelspringer.pl/bug/p002.pps Are you using ftp://ftp.astron.com/pri/file-5.00.tar.gz which has the latest fixes? christos From vapier at gentoo.org Mon Mar 9 04:39:57 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Sun, 8 Mar 2009 21:39:57 -0500 Subject: crash with large postscript and file-5.00 In-Reply-To: <20090309015022.36FF35654F@rebar.astron.com> References: <20090309015022.36FF35654F@rebar.astron.com> Message-ID: <200903082239.58031.vapier@gentoo.org> On Sunday 08 March 2009 21:50:22 Christos Zoulas wrote: > On Mar 8, 7:28pm, vapier at gentoo.org (Mike Frysinger) wrote: > -- Subject: crash with large postscript and file-5.00 > | a user reported that file-5.00 crashes when running on a large postscript > | file. the file in question is like 10 megs, so here is the URL for it > | (it doesnt want to compress at all): > | http://noc.axelspringer.pl/bug/p002.pps > > Are you using ftp://ftp.astron.com/pri/file-5.00.tar.gz which has the > latest fixes? no, because i wasnt expecting the tarball to be randomly changed. can we please discontinue this practice ? changing tarballs makes distros' life a pain because we d/l, mirror, and hash these things. plus, we have no idea that the file was changed in the first place. btw, the readcdf.c error i referred to in a previous thread still applies ... 
unpacking the tarball and running configure/make ends in an error. -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part. URL: From christos at zoulas.com Mon Mar 9 04:43:25 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 8 Mar 2009 22:43:25 -0400 Subject: crash with large postscript and file-5.00 In-Reply-To: <200903082239.58031.vapier@gentoo.org> from Mike Frysinger (Mar 8, 9:39pm) Message-ID: <20090309024325.498C45654E@rebar.astron.com> On Mar 8, 9:39pm, vapier at gentoo.org (Mike Frysinger) wrote: -- Subject: Re: crash with large postscript and file-5.00 | On Sunday 08 March 2009 21:50:22 Christos Zoulas wrote: | > On Mar 8, 7:28pm, vapier at gentoo.org (Mike Frysinger) wrote: | > -- Subject: crash with large postscript and file-5.00 | > | a user reported that file-5.00 crashes when running on a large postscript | > | file. the file in question is like 10 megs, so here is the URL for it | > | (it doesnt want to compress at all): | > | http://noc.axelspringer.pl/bug/p002.pps | > | > Are you using ftp://ftp.astron.com/pri/file-5.00.tar.gz which has the | > latest fixes? | | no, because i wasnt expecting the tarball to be randomly changed. can we | please discontinue this practice ? changing tarballs makes distros' life a | pain because we d/l, mirror, and hash these things. plus, we have no idea | that the file was changed in the first place. 
| | btw, the readcdf.c error i referred to in a previous thread still applies ... | unpacking the tarball and running configure/make ends in an error. | -mike This is not a release tarball, hence the /pri/. I never change release tarballs. It is at best a beta, and might have more bugs. Can you please paste the error again? christos From vapier at gentoo.org Mon Mar 9 04:57:00 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Sun, 8 Mar 2009 21:57:00 -0500 Subject: crash with large postscript and file-5.00 In-Reply-To: <20090309015022.36FF35654F@rebar.astron.com> References: <20090309015022.36FF35654F@rebar.astron.com> Message-ID: <200903082257.01224.vapier@gentoo.org> On Sunday 08 March 2009 21:50:22 Christos Zoulas wrote: > On Mar 8, 7:28pm, vapier at gentoo.org (Mike Frysinger) wrote: > -- Subject: crash with large postscript and file-5.00 > > | a user reported that file-5.00 crashes when running on a large postscript > | file. the file in question is like 10 megs, so here is the URL for it > | (it doesnt want to compress at all): > | http://noc.axelspringer.pl/bug/p002.pps > > Are you using ftp://ftp.astron.com/pri/file-5.00.tar.gz which has the > latest fixes? i just fetched that tarball (md5 c615e6797e3afa79c3e19e39f454b8d2) and the problem still exists -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part. URL: From sorcer76 at gmail.com Tue Mar 10 01:04:08 2009 From: sorcer76 at gmail.com (lynn kelkeys) Date: Mon, 9 Mar 2009 16:04:08 -0700 Subject: self-extracting archive - EICAR file - not recognized Message-ID: <9577160a0903091604n5dbc6adfm9eae7c4cbe3cf5a8@mail.gmail.com> In particular, the self extracting zip file at http://www.csm-testcenter.org/download/archives/zip/eicar.exe. 
It is labeled "ZIP-Archive (Self-Extracted)" on the Archives page at http://www.csm-testcenter.org/test?do=show&subdo=antimalware&test=archives. The other self-extracting files are also not recognized. Thanks for replying. -- Lynn -------------- next part -------------- An HTML attachment was scrubbed... URL: From vapier at gentoo.org Tue Mar 10 09:31:09 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Tue, 10 Mar 2009 03:31:09 -0400 Subject: crash with large postscript and file-5.00 In-Reply-To: <20090309024325.498C45654E@rebar.astron.com> References: <20090309024325.498C45654E@rebar.astron.com> Message-ID: <200903100331.11632.vapier@gentoo.org> On Sunday 08 March 2009 22:43:25 Christos Zoulas wrote: > Can you please paste the error again? i fetched the latest tarball (md5 = c615e6797e3afa79c3e19e39f454b8d2) and here is a transcript. built with CFLAGS '-g -ggdb -O0'. -mike $ gdb --args ./file ~/p002.pps (gdb) r Starting program: /home/vapier/file-5.00/src/file /home/vapier/p002.pps *** glibc detected *** /home/vapier/file-5.00/src/file: munmap_chunk(): invalid pointer: 0x0000000000bdaf50 *** ======= Backtrace: ========= /lib/libc.so.6[0x7f3a4dd83138] /home/vapier/file-5.00/src/file[0x416c16] /home/vapier/file-5.00/src/file[0x40c1e9] /home/vapier/file-5.00/src/file[0x40b25b] /home/vapier/file-5.00/src/file[0x403940] /home/vapier/file-5.00/src/file[0x40365e] /home/vapier/file-5.00/src/file[0x402e28] /home/vapier/file-5.00/src/file[0x402aa4] /lib/libc.so.6(__libc_start_main+0xe6)[0x7f3a4dd2e5c6] /home/vapier/file-5.00/src/file[0x4023b9] ======= Memory map: ======== 00400000-00421000 r-xp 00000000 08:03 804298 /home/vapier/file-5.00/src/file 00620000-00621000 r--p 00020000 08:03 804298 /home/vapier/file-5.00/src/file 00621000-00622000 rw-p 00021000 08:03 804298 /home/vapier/file-5.00/src/file 00bcc000-00bed000 rw-p 00bcc000 00:00 0 [heap] 7f3a4d3d5000-7f3a4d3eb000 r-xp 00000000 08:03 329697 /lib64/libgcc_s.so.1 7f3a4d3eb000-7f3a4d5ea000 ---p 00016000 08:03 
329697 /lib64/libgcc_s.so.1 7f3a4d5ea000-7f3a4d5eb000 r--p 00015000 08:03 329697 /lib64/libgcc_s.so.1 7f3a4d5eb000-7f3a4d5ec000 rw-p 00016000 08:03 329697 /lib64/libgcc_s.so.1 7f3a4d5ec000-7f3a4d7ed000 rw-p 7f3a4d5ec000 00:00 0 7f3a4d7ed000-7f3a4d997000 rw-p 00000000 08:03 4137184 /usr/share/misc/magic.mgc 7f3a4d997000-7f3a4dd10000 r--p 00000000 08:03 1570495 /usr/lib64/locale/locale-archive 7f3a4dd10000-7f3a4de5b000 r-xp 00000000 08:03 2927064 /lib64/libc-2.9.so 7f3a4de5b000-7f3a4e05b000 ---p 0014b000 08:03 2927064 /lib64/libc-2.9.so 7f3a4e05b000-7f3a4e05f000 r--p 0014b000 08:03 2927064 /lib64/libc-2.9.so 7f3a4e05f000-7f3a4e060000 rw-p 0014f000 08:03 2927064 /lib64/libc-2.9.so 7f3a4e060000-7f3a4e065000 rw-p 7f3a4e060000 00:00 0 7f3a4e065000-7f3a4e082000 r-xp 00000000 08:03 2927071 /lib64/ld-2.9.so 7f3a4e118000-7f3a4e11a000 rw-p 7f3a4e118000 00:00 0 7f3a4e11a000-7f3a4e12e000 r-xp 00000000 08:03 1620419 /lib64/libz.so.1.2.3 7f3a4e12e000-7f3a4e22d000 ---p 00014000 08:03 1620419 /lib64/libz.so.1.2.3 7f3a4e22d000-7f3a4e22f000 rw-p 00013000 08:03 1620419 /lib64/libz.so.1.2.3 7f3a4e236000-7f3a4e278000 rw-p 7f3a4e236000 00:00 0 7f3a4e278000-7f3a4e27f000 r--s 00000000 08:03 7621424 /usr/lib64/gconv/gconv-modules.cache 7f3a4e27f000-7f3a4e281000 rw-p 7f3a4e27f000 00:00 0 7f3a4e281000-7f3a4e282000 r--p 0001c000 08:03 2927071 /lib64/ld-2.9.so 7f3a4e282000-7f3a4e283000 rw-p 0001d000 08:03 2927071 /lib64/ld-2.9.so 7fff5626c000-7fff56282000 rw-p 7ffffffe9000 00:00 0 [stack] 7fff563fd000-7fff563fe000 r-xp 7fff563fd000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] /home/vapier/p002.pps: Program received signal SIGABRT, Aborted. 0x00007f3a4dd421e5 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. 
in ../nptl/sysdeps/unix/sysv/linux/raise.c (gdb) bt #0 0x00007f3a4dd421e5 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007f3a4dd43703 in *__GI_abort () at abort.c:88 #2 0x00007f3a4dd7d998 in __libc_message (do_abort=0x2, fmt=0x7f3a4de2e9b8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170 #3 0x00007f3a4dd83138 in malloc_printerr (action=0x2, str=0x7f3a4de2e9e8 "munmap_chunk(): invalid pointer", ptr=) at malloc.c:5994 #4 0x0000000000416c16 in cdf_read_sat (info=0x7fff5627bc20, h=0x7fff5627ba20, sat=0x7fff5627ba10) at cdf.c:350 #5 0x000000000040c1e9 in file_trycdf (ms=0xbccf20, fd=0x6, buf=0x7f3a4e236010 "??\021?\032?", nbytes=0x40000) at readcdf.c:219 #6 0x000000000040b25b in file_buffer (ms=0xbccf20, fd=0x6, inname=0x7fff562800e2 "/home/vapier/p002.pps", buf=0x7f3a4e236010, nb=0x40000) at funcs.c:222 #7 0x0000000000403940 in file_or_fd (ms=0xbccf20, inname=0x7fff562800e2 "/home/vapier/p002.pps", fd=0x6) at magic.c:335 #8 0x000000000040365e in magic_file (ms=0xbccf20, inname=0x7fff562800e2 "/home/vapier/p002.pps") at magic.c:248 #9 0x0000000000402e28 in process (ms=0xbccf20, inname=0x7fff562800e2 "/home/vapier/p002.pps", wid=0x15) at file.c:431 #10 0x0000000000402aa4 in main (argc=0x2, argv=0x7fff5627dff8) at file.c:343 (gdb) bt full #0 0x00007f3a4dd421e5 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 pid = selftid = #1 0x00007f3a4dd43703 in *__GI_abort () at abort.c:88 act = { __sigaction_handler = { sa_handler = 0x7fff5627aea0, sa_sigaction = 0x7fff5627aea0 }, sa_mask = { __val = {0x7fff5627af30, 0x700000000, 0x7fff5627af70, 0x7fff562800c2, 0x1f, 0x7f3a4de2d25b, 0x3, 0x7fff5627af6a, 0x6, 0x7f3a4de2d25f, 0x2, 0x7fff5627af5e, 0x2, 0x7f3a4de2be4a, 0x1, 0x7f3a4de2d25b} }, sa_flags = 0x3, sa_restorer = 0x7fff5627af64 } sigs = { __val = {0x20, 0x0 } } #2 0x00007f3a4dd7d998 in __libc_message (do_abort=0x2, fmt=0x7f3a4de2e9b8 "*** glibc detected *** %s: %s: 0x%s ***\n") at 
../sysdeps/unix/sysv/linux/libc_fatal.c:170 ap = {{ gp_offset = 0x28, fp_offset = 0x30, overflow_arg_area = 0x7fff5627b8c0, reg_save_area = 0x7fff5627b7d0 }} ap_copy = {{ gp_offset = 0x10, fp_offset = 0x30, overflow_arg_area = 0x7fff5627b8c0, reg_save_area = 0x7fff5627b7d0 }} fd = 0x7 on_2 = list = nlist = cp = written = 0x6 #3 0x00007f3a4dd83138 in malloc_printerr (action=0x2, str=0x7f3a4de2e9e8 "munmap_chunk(): invalid pointer", ptr=) at malloc.c:5994 buf = "0000000000bdaf50" cp = #4 0x0000000000416c16 in cdf_read_sat (info=0x7fff5627bc20, h=0x7fff5627ba20, sat=0x7fff5627ba10) at cdf.c:350 i = 0xe9 j = 0x0 k = 0x7c ss = 0x200 msa = (cdf_secid_t *) 0xbdaf50 mid = 0x503e #5 0x000000000040c1e9 in file_trycdf (ms=0xbccf20, fd=0x6, buf=0x7f3a4e236010 "??\021?\032?", nbytes=0x40000) at readcdf.c:219 info = { i_fd = 0x6, i_buf = 0x7f3a4e236010 "??\021?\032?", i_len = 0x40000 } h = { h_magic = 0xe11ab1a1e011cfd0, h_uuid = {0x0, 0x0}, h_revision = 0x3e, h_version = 0x3, h_byte_order = 0xfffe, h_sec_size_p2 = 0x9, h_short_sec_size_p2 = 0x6, h_unused0 = "\000\000\000\000\000\000\000\000\000", h_num_sectors_in_sat = 0xa1, h_secid_first_directory = 0x5045, h_unused1 = "\000\000\000", h_min_size_standard_stream = 0x1000, h_secid_first_sector_in_short_sat = 0x5046, h_num_sectors_in_short_sat = 0x1, h_secid_first_sector_in_master_sat = 0x503e, h_num_sectors_in_master_sat = 0x1, h_master_sat = {0x4d89, 0x4d8a, 0x4d8b, 0x4d8c, 0x4d8d, 0x4d8e, 0x4d8f, 0x4d90, 0x4d91, 0x4d92, 0x4d93, 0x4d94, 0x4d95, 0x4d96, 0x4d97, 0x4d98, 0x4d99, 0x4d9a, 0x4d9b, 0x4d9c, 0x4d9d, 0x4d9e, 0x4d9f, 0x4da0, 0x4da1, 0x4da2, 0x4da3, 0x4da4, 0x4da5, 0x4da6, 0x4da7, 0x4da8, 0x4da9, 0x4daa, 0x4dab, 0x4dac, 0x4dad, 0x4dae, 0x4daf, 0x4db0, 0x4db1, 0x4db2, 0x4db3, 0x4db4, 0x4db5, 0x4db6, 0x4db7, 0x4db8, 0x4db9, 0x4dba, 0x4dbb, 0x4dbc, 0x4dbd, 0x4dbe, 0x4dbf, 0x4dc0, 0x4dc1, 0x4dc2, 0x4dc3, 0x4dc4, 0x4dc5, 0x4dc6, 0x4dc7, 0x4dc8, 0x4dc9, 0x4dca, 0x4dcb, 0x4dcc, 0x4dcd, 0x4dce, 0x4dcf, 0x4dd0, 0x4dd1, 0x4dd2, 
0x4dd3, 0x4dd4, 0x4dd5, 0x4dd6, 0x4dd7, 0x4dd8, 0x4dd9, 0x4dda, 0x4ddb, 0x4ddc, 0x4ddd, 0x4dde, 0x4ddf, 0x4de0, 0x4de1, 0x4de2, 0x4de3, 0x4de4, 0x4de5, 0x4de6, 0x4de7, 0x4de8, 0x4de9, 0x4dea, 0x4deb, 0x4dec, 0x4ded, 0x4dee, 0x4def, 0x4df0, 0x4df1, 0x4df2, 0x4df3, 0x4df4, 0x4df5} } sat = { sat_tab = 0xbcd340, sat_len = 0x6e } ssat = { sat_tab = 0x7fff5627bb68, sat_len = 0x7f3a4dd13ba8 } sst = { sst_tab = 0x7f3a00000001, sst_len = 0x0, sst_dirlen = 0x1 } scn = { sst_tab = 0x0, sst_len = 0x7f3a4e06e5fe, sst_dirlen = 0x0 } dir = { dir_tab = 0x22494966, dir_len = 0x7fff5627ba90 } i = 0x0 expn = 0x41c9f4 "" #6 0x000000000040b25b in file_buffer (ms=0xbccf20, fd=0x6, inname=0x7fff562800e2 "/home/vapier/p002.pps", buf=0x7f3a4e236010, nb=0x40000) at funcs.c:222 m = 0x0 rv = 0x0 looks_text = 0x0 mime = 0x0 ubuf = (const unsigned char *) 0x7f3a4e236010 "??\021?\032?" u8buf = (unichar *) 0x7f3a4d5ec010 ulen = 0x2 code = 0x0 code_mime = 0x41c7d9 "binary" type = 0x41bfc4 "binary" #7 0x0000000000403940 in file_or_fd (ms=0xbccf20, inname=0x7fff562800e2 "/home/vapier/p002.pps", fd=0x6) at magic.c:335 rv = 0xffffffff buf = (unsigned char *) 0x7f3a4e236010 "??\021?\032?" sb = { st_dev = 0x803, st_ino = 0x222f2a, st_nlink = 0x1, st_mode = 0x81a4, st_uid = 0x0, st_gid = 0x0, __pad0 = 0x0, st_rdev = 0x0, st_size = 0xa09200, st_blksize = 0x1000, st_blocks = 0x5070, st_atim = { tv_sec = 0x49b46092, tv_nsec = 0x0 }, st_mtim = { tv_sec = 0x49aa851e, tv_nsec = 0x0 }, st_ctim = { tv_sec = 0x49b61333, tv_nsec = 0x0 }, __unused = {0x0, 0x0, 0x0} } nbytes = 0x40000 ispipe = 0x0 #8 0x000000000040365e in magic_file (ms=0xbccf20, inname=0x7fff562800e2 "/home/vapier/p002.pps") at magic.c:248 No locals. #9 0x0000000000402e28 in process (ms=0xbccf20, inname=0x7fff562800e2 "/home/vapier/p002.pps", wid=0x15) at file.c:431 type = 0x15
std_in = 0x0 #10 0x0000000000402aa4 in main (argc=0x2, argv=0x7fff5627dff8) at file.c:343 j = 0x2 wid = 0x15 nw = 0x15 c = 0xffffffff i = 0x418f00 action = 0x0 didsomefiles = 0x0 errflg = 0x0 flags = 0x0 e = 0x0 home = 0x7fff56281d1a "/home/vapier" usermagic = 0x0 magic = (struct magic_set *) 0xbccf20 magicpath = "/home/vapier/.magic", '\0' , "\027?\006N:\177", '\0' , "?\024\000\000\000\000\000?\024\000\000\000\000\000?\024", '\0' , "\005\000\000\000\000\000\000\000\000?4\000\000\000\000\000\000\0005\000\000\000\000\000\230?4\000\000\000\000\000\230B5\000\000\000\000\000\000?\024\000\000\000\000\000\003", '\0' , "f?\006N:\177", '\0' , "@\001\000\000\000\000\000\2246\001\000\000\000\000\000\2246\001", '\0' , "\005\000\000\000\000\000\000\000\0000\021\000\000\000\000\000\000P\021\000\000\000\000\000"... longindex = 0x0 magicfile = 0x41907d "/usr/share/misc/magic" (gdb) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part. URL: From christos at zoulas.com Wed Mar 11 03:43:38 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 10 Mar 2009 21:43:38 -0400 Subject: crash with large postscript and file-5.00 In-Reply-To: <200903082257.01224.vapier@gentoo.org> from Mike Frysinger (Mar 8, 9:57pm) Message-ID: <20090311014339.053525654F@rebar.astron.com> On Mar 8, 9:57pm, vapier at gentoo.org (Mike Frysinger) wrote: -- Subject: Re: crash with large postscript and file-5.00 | > Are you using ftp://ftp.astron.com/pri/file-5.00.tar.gz which has the | > latest fixes? | | i just fetched that tarball (md5 c615e6797e3afa79c3e19e39f454b8d2) and the | problem still exists Can you try it again?
MD5 (/p/astron/ftp/pri/file-5.00.tar.gz) = a0d2193a9c7f794a7b1617212e20a1cb christos From vapier at gentoo.org Wed Mar 11 04:11:48 2009 From: vapier at gentoo.org (Mike Frysinger) Date: Tue, 10 Mar 2009 22:11:48 -0400 Subject: crash with large postscript and file-5.00 In-Reply-To: <20090311014339.053525654F@rebar.astron.com> References: <20090311014339.053525654F@rebar.astron.com> Message-ID: <200903102211.49063.vapier@gentoo.org> On Tuesday 10 March 2009 21:43:38 Christos Zoulas wrote: > On Mar 8, 9:57pm, vapier at gentoo.org (Mike Frysinger) wrote: > -- Subject: Re: crash with large postscript and file-5.00 > > | > Are you using ftp://ftp.astron.com/pri/file-5.00.tar.gz which has the > | > latest fixes? > | > | i just fetched that tarball (md5 c615e6797e3afa79c3e19e39f454b8d2) and > | the problem still exists > > Can you try it again? > > MD5 (/p/astron/ftp/pri/file-5.00.tar.gz) = a0d2193a9c7f794a7b1617212e20a1cb still fails -mike -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part. URL: From christos at zoulas.com Wed Mar 11 04:28:21 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 10 Mar 2009 22:28:21 -0400 Subject: crash with large postscript and file-5.00 In-Reply-To: <200903102211.49063.vapier@gentoo.org> from Mike Frysinger (Mar 10, 10:11pm) Message-ID: <20090311022821.9855F56550@rebar.astron.com> On Mar 10, 10:11pm, vapier at gentoo.org (Mike Frysinger) wrote: -- Subject: Re: crash with large postscript and file-5.00 | > MD5 (/p/astron/ftp/pri/file-5.00.tar.gz) = a0d2193a9c7f794a7b1617212e20a1cb | | still fails | -mike I guess I'll have to try it on linux... 
christos From christos at zoulas.com Wed Mar 11 04:30:08 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 10 Mar 2009 22:30:08 -0400 Subject: crash with large postscript and file-5.00 In-Reply-To: <20090311022821.9855F56550@rebar.astron.com> from Christos Zoulas (Mar 10, 10:28pm) Message-ID: <20090311023008.486035654F@rebar.astron.com> On Mar 10, 10:28pm, christos at zoulas.com (Christos Zoulas) wrote: -- Subject: Re: crash with large postscript and file-5.00 | On Mar 10, 10:11pm, vapier at gentoo.org (Mike Frysinger) wrote: | -- Subject: Re: crash with large postscript and file-5.00 | | | > MD5 (/p/astron/ftp/pri/file-5.00.tar.gz) = a0d2193a9c7f794a7b1617212e20a1cb | | | | still fails | | -mike | On NetBSD I get: ./file -m ../magic/magic.mgc p002.pps p002.pps: CDF V2 Document, corrupt: Can't read SSAT christos From sorcer76 at gmail.com Fri Mar 13 02:15:38 2009 From: sorcer76 at gmail.com (lynn kelkeys) Date: Thu, 12 Mar 2009 17:15:38 -0700 Subject: self-extracting archive - EICAR file - not recognized In-Reply-To: <9577160a0903091604n5dbc6adfm9eae7c4cbe3cf5a8@mail.gmail.com> References: <9577160a0903091604n5dbc6adfm9eae7c4cbe3cf5a8@mail.gmail.com> Message-ID: <9577160a0903121715y41333561y7f5df492898374ba@mail.gmail.com> Here is my last post again. Thanks. On Mon, Mar 9, 2009 at 4:04 PM, lynn kelkeys wrote: > In particular, the self extracting zip file at > http://www.csm-testcenter.org/download/archives/zip/eicar.exe. It is > labeled "ZIP-Archive (Self-Extracted)" on the Archives page at > http://www.csm-testcenter.org/test?do=show&subdo=antimalware&test=archives > . > > The other self-extracting files are also not recognized. Thanks for > replying. > > -- > Lynn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dnovotny at redhat.com Mon Mar 23 15:04:38 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Mon, 23 Mar 2009 09:04:38 -0400 (EDT) Subject: file descriptor leak in compress.c In-Reply-To: <2058902275.37251237813432067.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <2006223992.37321237813478088.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, while using libmagic on a very large number of zipped files in a RPM, we found a file descriptor leak, which can be reproduced by running "file" with the "-z" switch: [pmatilai at localhost ~]$ valgrind --track-fds=yes file -z /usr/lib64/openoffice.org/basis3.0/share/config/images.zip ==11616== Memcheck, a memory error detector. ==11616== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al. ==11616== Using LibVEX rev 1884, a library for dynamic binary translation. ==11616== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP. ==11616== Using valgrind-3.4.1, a dynamic binary instrumentation framework. ==11616== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al. ==11616== For more details, rerun with: -v ==11616== /usr/lib64/openoffice.org/basis3.0/share/config/images.zip: PNG image, 16 x 16, 8-bit/color RGBA, non-interlaced (Zip archive data, at least v2.0 to extract) ==11616== ==11616== FILE DESCRIPTORS: 4 open at exit. ==11616== Open file descriptor 4: ==11616== at 0x3E20AD8BD7: pipe (in /lib64/libc-2.9.so) ==11616== by 0x4C2F882: file_zmagic (compress.c:383) ==11616== by 0x4C34DA3: file_buffer (funcs.c:206) ==11616== by 0x4C272C6: file_or_fd (magic.c:357) ==11616== by 0x401228: process (file.c:431) ==11616== by 0x401CDB: main (file.c:343) ==11616== the patch is attached regards, Daniel Novotny -------------- next part -------------- A non-text attachment was scrubbed... 
Name: file-5.00-fdleak.patch Type: text/x-patch Size: 310 bytes Desc: not available URL: From christos at zoulas.com Mon Mar 23 16:20:11 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 23 Mar 2009 10:20:11 -0400 Subject: file descriptor leak in compress.c In-Reply-To: <2006223992.37321237813478088.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Mar 23, 9:04am) Message-ID: <20090323142011.5EA9B5654E@rebar.astron.com> On Mar 23, 9:04am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: file descriptor leak in compress.c | the patch is attached | | regards, | | Daniel Novotny Thanks a lot! christos | | ------=_Part_1106_1056931247.1237813478087 | Content-Type: text/x-patch; name=file-5.00-fdleak.patch | Content-Transfer-Encoding: 7bit | Content-Disposition: attachment; filename=file-5.00-fdleak.patch | | diff -up file-5.00/src/compress.c.fdleak file-5.00/src/compress.c | --- file-5.00/src/compress.c.fdleak 2009-03-23 13:42:38.000000000 +0200 | +++ file-5.00/src/compress.c 2009-03-23 13:43:42.000000000 +0200 | @@ -486,6 +486,7 @@ err: | #else | (void)wait(NULL); | #endif | + (void) close(fdin[0]); | return n; | } | } | | ------=_Part_1106_1056931247.1237813478087 | Content-Type: text/plain; charset="us-ascii" | MIME-Version: 1.0 | Content-Transfer-Encoding: 7bit | Content-Disposition: inline | | _______________________________________________ | File mailing list | File at mx.gw.com | http://mx.gw.com/mailman/listinfo/file | | ------=_Part_1106_1056931247.1237813478087-- -- End of excerpt from Daniel Novotny From tledouxfr at gmail.com Tue Mar 24 13:13:56 2009 From: tledouxfr at gmail.com (Thomas Ledoux) Date: Tue, 24 Mar 2009 12:13:56 +0100 Subject: Add mimetype for icc profile files Message-ID: <248062180903240413n5bf72e7dh88c8852d18e8db50@mail.gmail.com> Hello, could it be possible to add the mimetype of the icc profile files as specified by the iana 
http://www.iana.org/assignments/media-types/application/vnd.iccprofile In the magic/Magdir/sun file, add the two lines # Microsoft ICM color profile 36 string acspMSFT Microsoft ICM Color Profile !:mime application/vnd.iccprofile # Sun KCMS 36 string acsp Kodak Color Management System, ICC Profile !:mime application/vnd.iccprofile Thanks in advance Thomas From asbjorn at asbjorn.biz Thu Apr 2 16:26:51 2009 From: asbjorn at asbjorn.biz (Asbjørn Sloth Tønnesen) Date: Thu, 02 Apr 2009 13:26:51 +0000 Subject: Match strength Message-ID: <49D4BD1B.4060108@asbjorn.biz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, In a file conversion project that I am working on I need to figure out the mime type of a file based on the content and file extension. I would like to use the power of File as much as possible so we can also handle stuff with a wrong file extension correctly. Until now we have just been using all magics, however we need to use the file extension in some cases. 1) Check strong magic rules 2) Guess based on file extension 3) Check weak magic rules Or 1) Check all magic rules 2) Look up the strength for magic matches, if any 3) Guess based on file extension 4) Fall back to weak magic match Is there a standard way to measure magic rule strength? Number of levels?
Examples where we need to fall back to file extensions: example.bmp: application/octet-stream example.xpm: text/x-c - -- Best regards Asbjørn Sloth Tønnesen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAknUvRoACgkQSViWlxucwurgtACdEywErzpmJBFHkyPcm3qtoo81 v/UAnimKCAoPEOkyxx2+Ho/h+hdNKvQ6 =waqI -----END PGP SIGNATURE----- From christos at zoulas.com Thu Apr 2 16:43:05 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 2 Apr 2009 09:43:05 -0400 Subject: Match strength In-Reply-To: <49D4BD1B.4060108@asbjorn.biz> from Asbjørn Sloth Tønnesen (Apr 2, 1:26pm) Message-ID: <20090402134305.4C61856550@rebar.astron.com> On Apr 2, 1:26pm, asbjorn at asbjorn.biz (Asbjørn Sloth Tønnesen) wrote: -- Subject: Match strength | Hi, | | I a file conversion project that I am working on I need to figure out | the mime type of a file based on the content and file extention. | | I would like to use the power of File as much as possible so we can also | handle stuff with a wrong file extension correctly. | | Until now we have just been using all magics, however we need to use the | file extension in some cases. | | 1) Check strong magic rules | 2) Guess based on file extension | 3) Check weak magic rules | | Or | | 1) Check all magic rules | 2) Lookup the strength for magic matches, if any | 3) Guess based on file extension | 4) Fall back to weak magic match | | Is there a standard way to measure magic rule strength? Number of levels? | | Examples where we need to fall back to file extensions: | | example.bmp: application/octet-stream | example.xpm: text/x-c Look in apprentice.c for "strength"; right now the API does not expose the strength factor of the particular magic selected. Also the strength does not deal with multi-level magic entries that use the "if no description keep going" rule.
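A minimal sketch of the kind of fallback discussed in this thread: keep the MIME type reported by magic unless it is one of a few known-weak answers, and only then guess from the file extension. Nothing here is part of libmagic. The names pick_mime, weak_mime and ext_table are hypothetical, the table entries are illustrative assumptions, and the caller is assumed to have already obtained magic_mime from libmagic (or NULL on failure).

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>    /* strcasecmp (POSIX) */

/* MIME types treated as "weak" magic answers; hypothetical list taken
 * from the two examples given in the thread. */
static const char *const weak_mime[] = {
    "application/octet-stream",
    "text/x-c",
};

/* Extension -> MIME table used for the fallback; illustrative entries. */
static const struct {
    const char *ext;
    const char *mime;
} ext_table[] = {
    { "bmp", "image/bmp" },
    { "xpm", "image/x-xpixmap" },
};

static int is_weak(const char *mime)
{
    size_t i;

    for (i = 0; i < sizeof(weak_mime) / sizeof(weak_mime[0]); i++)
        if (strcmp(mime, weak_mime[i]) == 0)
            return 1;
    return 0;
}

/* Return the text after the last '.' of the final path component,
 * or NULL if there is no usable extension. */
static const char *extension_of(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    const char *dot = strrchr(base, '.');

    /* No dot, or a leading dot (hidden file): nothing to go on. */
    if (dot == NULL || dot == base)
        return NULL;
    return dot + 1;
}

/* Prefer a non-black-listed magic result, then the extension table,
 * then whatever magic said (possibly NULL). */
const char *pick_mime(const char *magic_mime, const char *path)
{
    const char *ext;
    size_t i;

    if (magic_mime != NULL && !is_weak(magic_mime))
        return magic_mime;

    ext = extension_of(path);
    if (ext != NULL)
        for (i = 0; i < sizeof(ext_table) / sizeof(ext_table[0]); i++)
            if (strcasecmp(ext, ext_table[i].ext) == 0)
                return ext_table[i].mime;

    return magic_mime;
}
```

The black-list keeps the decision outside libmagic, which matters here because the strength value is not available through the API.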
I think that you can black-list certain mime types such as the two above and fall back to extensions. christos From dnovotny at redhat.com Fri Apr 3 15:40:51 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Fri, 3 Apr 2009 08:40:51 -0400 (EDT) Subject: two new magic entries for fonts In-Reply-To: <1418401818.1186511238762437324.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <72259026.1186561238762451564.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> 1) for PostScript Type1 fonts: the "search" directive does not work for some reason, added a string match at position 0 2) a brand new entry: True Type font collection patch attached. best regards, Daniel Novotny -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.00-fonts-ttc-pfa.patch Type: text/x-patch Size: 845 bytes Desc: not available URL: From christos at zoulas.com Fri Apr 3 16:51:24 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 3 Apr 2009 09:51:24 -0400 Subject: two new magic entries for fonts In-Reply-To: <72259026.1186561238762451564.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Apr 3, 8:40am) Message-ID: <20090403135124.4DCEB56550@rebar.astron.com> On Apr 3, 8:40am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: two new magic entries for fonts | 1) for PostScript Type1 fonts: the "search" directive does not work for some reason, | added a string match at position 0 That's strange, I just tested it and it works... Can you run file -d on a magic file that has just the PostScript Type1 magic that uses search and see what happens?
| 2) a brand new entry: True Type font collection Thanks, christos From dnovotny at redhat.com Fri Apr 3 17:15:25 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Fri, 3 Apr 2009 10:15:25 -0400 (EDT) Subject: two new magic entries for fonts In-Reply-To: <1220675027.1221681238768055023.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <118442457.1221811238768125061.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> ----- "Christos Zoulas" wrote: > On Apr 3, 8:40am, dnovotny at redhat.com (Daniel Novotny) wrote: > -- Subject: two new magic entries for fonts > > | 1) for PostScript Type1 fonts: the "search" directive does not work > for some reason, > | added a string match at position 0 > > That's strange, I just tested it and it works... Can you run file -d > on > a magic file that has just the PostScript Type1 magic that uses > search > and see what happens? the problem is that it says "PostScript document text" and our new font management tools require the string "font" to appear when it's a font file (that is why I made this change) [dnovotny at dhcp-lab-180 fonts]$ file -d NachlieliCLM-Bold.pfa 2> file-d.txt NachlieliCLM-Bold.pfa: PostScript document text "file-d.txt" attached the font file attached Daniel Novotny -------------- next part -------------- A non-text attachment was scrubbed... Name: NachlieliCLM-Bold.pfa Type: application/x-font-type1 Size: 55837 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: file-d.txt URL: From file at bzzt.net Wed Apr 8 02:06:55 2009 From: file at bzzt.net (Arnout Engelen) Date: Wed, 08 Apr 2009 01:06:55 +0200 Subject: Specify java version for class files Message-ID: <20090407230654.GE4490@bzzt.net> Hi, `file' already reports the major/minor version numbers of Java class files (e.g. '50.0'). It'd be convenient if those 'internal' numbers were converted to the more commonly known version numbers (e.g.
'1.6'): In the cafebabe magic file: 0 belong 0xcafebabe !:mime application/x-java-applet >4 belong >30 compiled Java class data, >>6 beshort x version %d. >>4 beshort x \b%d # Which is which? #>>4 belong 0x032d (Java 1.0) #>>4 belong 0x032d (Java 1.1) >>4 belong 0x002e (Java 1.2) >>4 belong 0x002f (Java 1.3) >>4 belong 0x0030 (Java 1.4) >>4 belong 0x0031 (Java 1.5) >>4 belong 0x0032 (Java 1.6) Kind regards, Arnout From christos at zoulas.com Mon Apr 13 06:09:59 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 12 Apr 2009 23:09:59 -0400 Subject: Specify java version for class files In-Reply-To: <20090407230654.GE4490@bzzt.net> from Arnout Engelen (Apr 8, 1:06am) Message-ID: <20090413030959.342305654E@rebar.astron.com> On Apr 8, 1:06am, file at bzzt.net (Arnout Engelen) wrote: -- Subject: Specify java version for class files | Hi, | | `file' already reports the major/minor version numbers of Java class files | (e.g. '50.0'). It'd be convenient of those 'internal' numbers were converted | to the more commonly known version numbers (e.g. '1.6'): | | In the cafebabe magic file: | | 0 belong 0xcafebabe | !:mime application/x-java-applet | >4 belong >30 compiled Java class data, | >>6 beshort x version %d. | >>4 beshort x \b%d | # Which is which? | #>>4 belong 0x032d (Java 1.0) | #>>4 belong 0x032d (Java 1.1) | >>4 belong 0x002e (Java 1.2) | >>4 belong 0x002f (Java 1.3) | >>4 belong 0x0030 (Java 1.4) | >>4 belong 0x0031 (Java 1.5) | >>4 belong 0x0032 (Java 1.6) | I am not sure if I like this (because it will need constant updating), but I think I'll put it in anyway. 
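For comparison, the same mapping can be sketched in C outside of the magic language. A class file starts with the 4-byte magic 0xcafebabe, followed by a big-endian 2-byte minor version and a 2-byte major version, so the `belong` at offset 4 in the proposal reads minor and major together. The function name java_release is hypothetical, and the table stops at the releases named in the thread. Major version 45 is used by both 1.0 and 1.1, which is why those two entries cannot be separated on the major number alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Map the header of a Java class file to a release name.
 * Returns NULL if buf is not a class file header or the major
 * version is outside the range covered by the thread (45..50). */
const char *java_release(const unsigned char *buf, size_t len)
{
    static const char *const names[] = {
        "1.2", "1.3", "1.4", "1.5", "1.6"   /* major 46..50 */
    };
    uint32_t magic;
    unsigned major;

    if (buf == NULL || len < 8)
        return NULL;

    /* Bytes 0-3: magic number, big-endian. */
    magic = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
            ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    if (magic != 0xcafebabeU)
        return NULL;

    /* Bytes 4-5: minor version (ignored here); bytes 6-7: major version. */
    major = ((unsigned)buf[6] << 8) | (unsigned)buf[7];

    if (major == 45)
        return "1.0 or 1.1";
    if (major >= 46 && major <= 50)
        return names[major - 46];
    return NULL;
}
```

Like the magic entries, a table of this kind has to be extended for every new release, which is the maintenance cost mentioned in the reply.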
christos From christos at zoulas.com Mon Apr 13 06:16:31 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 12 Apr 2009 23:16:31 -0400 Subject: two new magic entries for fonts In-Reply-To: <118442457.1221811238768125061.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Apr 3, 10:15am) Message-ID: <20090413031631.BC2405654F@rebar.astron.com> On Apr 3, 10:15am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: Re: two new magic entries for fonts | [dnovotny at dhcp-lab-180 fonts]$ file -d NachlieliCLM-Bold.pfa 2> file-d.txt | NachlieliCLM-Bold.pfa: PostScript document text | | "file-d.txt" attached | the font file attached | | Daniel Novotny Thanks. I reverted the search/1 to string and everything works now. christos From Sven.Hartrumpf at FernUni-Hagen.de Mon Apr 27 12:26:50 2009 From: Sven.Hartrumpf at FernUni-Hagen.de (Sven Hartrumpf) Date: Mon, 27 Apr 2009 11:26:50 +0200 (CEST) Subject: shared library versions Message-ID: <20090427.112650.71104817.Sven.Hartrumpf@fernuni-hagen.de> Hi all. I have witnessed a crash when a "file" vers. 5 binary runs (accidentally) with a libmagic.so.1 from vers. 4.24. > uname -a Linux ... 2.6.27.19-3.2-default #1 SMP 2009-02-25 15:40:44 +0100 x86_64 x86_64 x86_64 GNU/Linux So, my dumb question: Should not the library version be increased? (in order to allow coexistence of different "file" versions) Sven From christos at zoulas.com Mon Apr 27 17:33:34 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 27 Apr 2009 10:33:34 -0400 Subject: shared library versions In-Reply-To: <20090427.112650.71104817.Sven.Hartrumpf@fernuni-hagen.de> from Sven Hartrumpf (Apr 27, 11:26am) Message-ID: <20090427143334.CCFE15654E@rebar.astron.com> On Apr 27, 11:26am, Sven.Hartrumpf at FernUni-Hagen.de (Sven Hartrumpf) wrote: -- Subject: shared library versions | Hi all. | | I have witnessed a crash when a "file" vers. 5 binary | runs (accidentally) with a libmagic.so.1 from vers. 4.24. 
| > uname -a | Linux ... 2.6.27.19-3.2-default #1 SMP 2009-02-25 15:40:44 +0100 x86_64 x86_64 x86_64 GNU/Linux | | So, my dumb question: | | Should not the library version be increased? | (in order to allow coexistence of different "file" versions) I don't think that the API's have changed, perhaps the crash is due to a bad/old magic compiled file? christos From dnovotny at redhat.com Tue Apr 28 11:57:32 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Tue, 28 Apr 2009 04:57:32 -0400 (EDT) Subject: file crashes when run on an MSI file In-Reply-To: <1847390298.45651240909031004.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <436532989.45671240909052424.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, When running file on an MSI file, file crashes. The following link causes a crash with file 5.x: http://www.python.org/ftp/python/2.6.2/python-2.6.2.msi % file python-2.6.2.msi *** glibc detected *** file: munmap_chunk(): invalid pointer: 0x0000000001a8cf50 *** Tested with file 4.x and the file is properly identified. best regards, Daniel Novotny From dnovotny at redhat.com Wed Apr 29 11:00:33 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Wed, 29 Apr 2009 04:00:33 -0400 (EDT) Subject: missing quotes in Erlang magic definition In-Reply-To: <1447565535.164241240991904935.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <752347850.164361240992033513.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, due to missing quotes in Erlang magic definition, every file with "Tue" at fourth byte is misrecognized to be erlang file, with additional date printed: [dnovotny at dhcp-lab-180 .libs]$ echo '1234Tue' | file - /dev/stdin: Jan 22 14:32:44 MET 1991\011Erlang JAM file - version 4.2 [dnovotny at dhcp-lab-180 .libs]$ the spaces in the "string" declaration in the magic file have to be escaped, the patch is in the attachment regards, Daniel Novotny p.s. 
see also https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619 , https://bugzilla.redhat.com/show_bug.cgi?id=498036 -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.00-erlang.patch Type: text/x-patch Size: 755 bytes Desc: not available URL: From christos at zoulas.com Wed Apr 29 15:49:11 2009 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 29 Apr 2009 08:49:11 -0400 Subject: missing quotes in Erlang magic definition In-Reply-To: <752347850.164361240992033513.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Apr 29, 4:00am) Message-ID: <20090429124911.E279A5654E@rebar.astron.com> On Apr 29, 4:00am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: missing quotes in Erlang magic definition | hello, | | due to missing quotes in Erlang magic definition, every file with "Tue" at fourth byte | is misrecognized to be erlang file, with additional date printed: | | [dnovotny at dhcp-lab-180 .libs]$ echo '1234Tue' | file - | /dev/stdin: Jan 22 14:32:44 MET 1991\011Erlang JAM file - version 4.2 | [dnovotny at dhcp-lab-180 .libs]$ | | the spaces in the "string" declaration in the magic file have to be escaped, | the patch is in the attachment Thanks, already fixed! christos From christos at zoulas.com Fri May 1 00:27:01 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 30 Apr 2009 17:27:01 -0400 Subject: file 5.01 is now available Message-ID: <20090430212701.BE23F5654E@rebar.astron.com> Hello, File-5.01 is now available from: ftp://ftp.astron.com/pub/file/file-5.01.tar.gz This is as expected a bug fix release: * Fix more cdf lossage. All the documents I have right now print the correct information. * don't print \012- separators in the same magic entry if it consists of multiple magic printing lines. 
* Avoid file descriptor leak in compress code from (Daniel Novotny) * Allow escaping of relation characters, so that we can say \^[A-Z] and the ^ is not eaten as a relation char. * Fix troff and fortran to their previous glory using regex. This was broken since their removal from ascmagic. * don't use strlen in strndup() (Toby Peterson) * avoid c99 syntax. * make the cdf code use the buffer first if available, and then the fd code. * look for struct option to determine if getopt.h is usable for IRIX. * sanitize cdf document strings * fix OS/2 warnings. The cdf parser is now much more robust and I can't find yet a document where it fails. There are also quite a lot magic fixes that were regressions from 4.x. Right now, I believe that most regressions to 4.x have been fixed, and there is no reason to support 4.x anymore. Please correct me if I am wrong. Enjoy, christos From lists-file at cappella.us Fri May 1 00:53:11 2009 From: lists-file at cappella.us (Mike Cappella) Date: Thu, 30 Apr 2009 14:53:11 -0700 Subject: file 5.01 is now available In-Reply-To: <20090430212701.BE23F5654E@rebar.astron.com> References: <20090430212701.BE23F5654E@rebar.astron.com> Message-ID: <49FA1DC7.9090902@cappella.us> Christos, On 4/30/2009 2:27 PM, Christos Zoulas wrote: > File-5.01 is now available from: > Thanks for your continued support from just one of the many happy users. > This is as expected a bug fix release: > > * Fix more cdf lossage. All the documents I have > right now print the correct information. > > The cdf parser is now much more robust and I can't find yet a > document where it fails. There are also quite a lot magic fixes Late last year, I sent you a document that was created with Word, yet file outputs as created with Excel.
See: http://mx.gw.com/pipermail/file/2008/000287.html This still seems to be the case: $ /usr/local/bin/file --version file-5.01 magic file from /usr/local/share/misc/magic $ /usr/local/bin/file 'Word11 complex.doc' Word11 complex.doc: CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Mike Cappella, Last Saved By: Mike Cappella, Name of Creating Application: Microsoft Excel, Create ^^^^^^^^^^^^^^^ Time/Date: Wed Dec 10 18:17:45 2008, Last Saved Time/Date: Wed Dec 10 18:17:48 2008, Security: 0 I'll send a copy of the document if you want it. > that were regressions from 4.x. Right now, I believe that most > regressions to 4.x have been fixed, and there is no reason to > support 4.x anymore. Please correct me if I am wrong. I'm done with it! :-) Mike From christos at zoulas.com Fri May 1 01:32:43 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 30 Apr 2009 18:32:43 -0400 Subject: file 5.01 is now available In-Reply-To: <49FA1DC7.9090902@cappella.us> from Mike Cappella (Apr 30, 2:53pm) Message-ID: <20090430223243.D2A965654E@rebar.astron.com> On Apr 30, 2:53pm, lists-file at cappella.us (Mike Cappella) wrote: -- Subject: Re: file 5.01 is now available | Christos, | | On 4/30/2009 2:27 PM, Christos Zoulas wrote: | | > File-5.01 is now available from: | > | | Thanks for your continued support from just one of the many happy users. | | > This is as expected a bug fix release: | > | > * Fix more cdf lossage. All the documents I have | > right now print the correct information. | | | > | > The cdf parser is now much more robust and I can't find yet a | > document where it fails. There are also quite a lot magic fixes | | Late last year, I sent you a document that was created with Word, yet | file outputs as created with Excel. 
See:
|
| http://mx.gw.com/pipermail/file/2008/000287.html
|
| This still seems to be the case:
|
| $ /usr/local/bin/file --version
| file-5.01
| magic file from /usr/local/share/misc/magic
|
| $ /usr/local/bin/file 'Word11 complex.doc'
| Word11 complex.doc: CDF V2 Document, Little Endian, Os: Windows, Version
| 5.1, Code page: 1252, Author: Mike Cappella, Last Saved By: Mike
| Cappella, Name of Creating Application: Microsoft Excel, Create
|                                         ^^^^^^^^^^^^^^^
| Time/Date: Wed Dec 10 18:17:45 2008, Last Saved Time/Date: Wed Dec 10
| 18:17:48 2008, Security: 0
|
| I'll send a copy of the document if you want it.
|

Sure, please do!

| > that were regressions from 4.x. Right now, I believe that most
| > regressions to 4.x have been fixed, and there is no reason to
| > support 4.x anymore. Please correct me if I am wrong.
|
| I'm done with it! :-)

I am too, I will import 5.01 to NetBSD if my baby sleeps early tonight!

christos

From christos at zoulas.com Mon May 4 18:31:35 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 4 May 2009 11:31:35 -0400
Subject: file-5.02 is now available
Message-ID: <20090504153135.8E61F5654E@rebar.astron.com>

Hello,

Unfortunately, just after I released file, Drew Yao pointed out to me some buffer overflows caused by unchecked values read from a corrupt cdf causing integer overflows... File-5.02 fixes this issue, and adds even more checks to cdf parsing...

ftp://ftp.astron.com/pub/file-5.02.tar.gz

christos

2009-05-01 18:37 Christos Zoulas
* Buffer overflow fixes from Drew Yao

From christos at zoulas.com Wed May 6 23:56:26 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 6 May 2009 16:56:26 -0400
Subject: file-5.03 is now available
Message-ID: <20090506205626.6F8C456550@rebar.astron.com>

More cdf bug fixes. Hopefully the last ones.
ftp://ftp.astron.com/pub/file/file-5.03.tar.gz Famous last words, christos From christos at zoulas.com Wed May 13 17:44:40 2009 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 13 May 2009 10:44:40 -0400 Subject: fix last capability lost problem In-Reply-To: <1242209514-28790-1-git-send-email-crquan@gmail.com> from Cheng Renquan (May 13, 6:11pm) Message-ID: <20090513144440.BE1075654E@rebar.astron.com> On May 13, 6:11pm, crquan at gmail.com (Cheng Renquan) wrote: -- Subject: fix last capability lost problem | the '>=' comparison is incorrect, it causes problem that `file` | do not report the last capability . Fixed, thanks! christos From sacha at ssl.co.uk Fri May 22 19:29:35 2009 From: sacha at ssl.co.uk (Sacha Varma) Date: Fri, 22 May 2009 17:29:35 +0100 Subject: Moving from file-4.19 to file-5.03 Message-ID: <4A16D2EF.9020906@ssl.co.uk> Hi, Apologies if these are known issues. We've recently moved from file-4.19 to file-5.03 (UNIX & Windows) and I thought I'd let you know about some issues we came across. Our code is only interested in MIME types, so we have a simple magic.mime file that has lines like this: 2048 string PCD_IPI image/x-photo-cd 0 beshort 0xffd8 image/jpeg 0 string GIF8 image/gif This is what our code looks like: m = magic_open(MAGIC_MIME | MAGIC_ERROR) if (magic_load(m, "magic") != 0) { ... } mime_type = magic_file(m, filename) Issue #1: In 4.19, the magic_load() line would append a .mime suffix to the file and load it (i.e. "magic.mime"). In 5.03 this doesn't happen; I've stepped through the code, and while the function apprentice.c:mkdbname() has some backwards-compatibility code, it looks only for "magic.mime.mgc", not "magic.mime". 
A workaround for us (to maintain backwards compatibility) is to explicitly try magic.mime: if (magic_load(m, "magic") != 0 && magic_load(m, "magic.mime") != 0) Issue #2: When used with a JPEG file, file-5.03 seems to successfully match using the supplied magic.mime file, but then returns application/octet-stream. The MAGIC_DEBUG output ends with this: [15 0 beshort&,=-40,"image/jpeg"] 18446744073709551576 == 18446744073709551576 = 1 This appears to indicate that the comparison has succeeded, however the code then ultimately returns "application/octet-stream". Removing the MAGIC_MIME flag to magic_open() resolves this, although we now also get additional information which needs to be trimmed off: setgid sticky image/jpeg (The file does have weird permissions, but file-4.19 did not report these.) Without having debugged it any further, my hunch would be that the format of the magic file has changed between versions. I've had a quick look at the ChangeLog in the 5.03 source but with without version numbers in it it's hard to get one's bearings. In any case, this is mostly all FYI as I think we've been able to work around these backwards-compatibility issues. Thanks, Sacha. From christos at zoulas.com Sat May 23 15:50:51 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sat, 23 May 2009 08:50:51 -0400 Subject: Moving from file-4.19 to file-5.03 In-Reply-To: <4A16D2EF.9020906@ssl.co.uk> from Sacha Varma (May 22, 5:29pm) Message-ID: <20090523125051.59C055654E@rebar.astron.com> On May 22, 5:29pm, sacha at ssl.co.uk (Sacha Varma) wrote: -- Subject: Moving from file-4.19 to file-5.03 | Hi, | | Apologies if these are known issues. | | We've recently moved from file-4.19 to file-5.03 (UNIX & Windows) and I | thought I'd let you know about some issues we came across. 
| | Our code is only interested in MIME types, so we have a simple | magic.mime file that has lines like this: | | 2048 string PCD_IPI image/x-photo-cd | 0 beshort 0xffd8 image/jpeg | 0 string GIF8 image/gif We prefer to incorporate the mime magic tags in the standard magic file as attributes so that we don't duplicate magic. | This is what our code looks like: | | m = magic_open(MAGIC_MIME | MAGIC_ERROR) | if (magic_load(m, "magic") != 0) { ... } | mime_type = magic_file(m, filename) | | | Issue #1: In 4.19, the magic_load() line would append a .mime suffix to | the file and load it (i.e. "magic.mime"). In 5.03 this doesn't happen; | I've stepped through the code, and while the function | apprentice.c:mkdbname() has some backwards-compatibility code, it looks | only for "magic.mime.mgc", not "magic.mime". Yes because the new file does not even install the uncompiled files by default. We don't concatenate the magic fragments to make a large file in the default case either. | A workaround for us (to maintain backwards compatibility) is to | explicitly try magic.mime: | | if (magic_load(m, "magic") != 0 && magic_load(m, "magic.mime") != 0) | | | Issue #2: When used with a JPEG file, file-5.03 seems to successfully | match using the supplied magic.mime file, but then returns | application/octet-stream. The MAGIC_DEBUG output ends with this: | | [15 0 beshort&,=-40,"image/jpeg"] | 18446744073709551576 == 18446744073709551576 = 1 | | This appears to indicate that the comparison has succeeded, however the | code then ultimately returns "application/octet-stream". | | Removing the MAGIC_MIME flag to magic_open() resolves this, although we | now also get additional information which needs to be trimmed off: | | setgid sticky image/jpeg | | (The file does have weird permissions, but file-4.19 did not report these.) This is expected, because you are parsing the mime file as a regular file, not attributes. I will see if I can replicate it. 
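The two identical numbers in the MAGIC_DEBUG line quoted above look odd, but they are consistent with the magic value 0xffd8 being read as a signed 16-bit quantity (-40, matching the "=-40" in the debug line) and then printed as an unsigned 64-bit integer. A minimal C sketch of that arithmetic follows. read_beshort() and widen_signed() are hypothetical helper names used for illustration, not libmagic functions, and the assumption that the debug printer widens the value this way is inferred from the numbers alone.

```c
#include <stdint.h>

/* Hypothetical helper: read a big-endian 16-bit value and interpret it
 * as signed, the way a signed "beshort" comparison would. */
static int16_t read_beshort(const unsigned char *p)
{
    int32_t u = (p[0] << 8) | p[1];              /* 0 .. 65535 */
    return (int16_t)(u >= 0x8000 ? u - 0x10000 : u);
}

/* Hypothetical helper: sign-extend to 64 bits, then reinterpret the
 * result as unsigned, which is what an unsigned 64-bit print would show. */
static uint64_t widen_signed(int16_t v)
{
    return (uint64_t)(int64_t)v;
}
```

With the JPEG bytes ff d8, read_beshort() gives -40 and widen_signed(-40) gives 18446744073709551576, that is 2^64 - 40, so under this reading the comparison itself did succeed and the wrong MIME type comes from a later step.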
| Without having debugged it any further, my hunch would be that the | format of the magic file has changed between versions. I've had a quick | look at the ChangeLog in the 5.03 source but with without version | numbers in it it's hard to get one's bearings. | | | In any case, this is mostly all FYI as I think we've been able to work | around these backwards-compatibility issues. Ok, thanks. christos From rbock at eudoxos.de Tue Jun 16 09:57:01 2009 From: rbock at eudoxos.de (Roland Bock) Date: Tue, 16 Jun 2009 08:57:01 +0200 Subject: HTML files classified as application/octet-stream and text/plain Message-ID: <4A37423D.7090107@eudoxos.de> Hi, after having had a few minor problems with file-4.21 which comes with Ubuntu-8.04, I upgraded to 5.03. Now I have two problems, an old one, and a new one: 1) A HTML file with leading blank lines is classified as text/plain: I had the same problem with 4.21. I wonder what I should do? I assume that it is not generally advisable to remove all blank lines in a file's content before handing it over to magic_buffer? 2) A HTML file (nothing special as far as I can see) is classified as application octet stream: The old version detected text/html Other HTML files are classified correctly (did not test more than ten though). What's the best way to proceed? Should I send the files to this list? Regards, Roland From dnovotny at redhat.com Tue Jun 16 12:55:17 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Tue, 16 Jun 2009 05:55:17 -0400 (EDT) Subject: a few font issues In-Reply-To: <2071485033.74741245145784614.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <418384601.74851245146117584.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, Because of the Fedora's new feature, which issues handling packages with fonts differently, we need all the fonts to be identified (with file) as fonts. 
These are a few magic entries of PostScript fonts, that we had to add 1) PostScript font files in groff have "%!PS-Adobe-3.0\ Resource-Font" header 2) PostScript font files in texlive-texmf have "%!FontType1" header on 6th byte 3) a minor thing: "OpenType font data" magic entry has one more trailing space " " in the description, I removed it patch attached best regards, Daniel Novotny, Red Hat inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.03-fonts-postscript.patch Type: text/x-patch Size: 907 bytes Desc: not available URL: From dnovotny at redhat.com Tue Jun 16 13:18:17 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Tue, 16 Jun 2009 06:18:17 -0400 (EDT) Subject: a few font issues - followup In-Reply-To: <970749349.75151245147480721.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <64209291.75201245147497532.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, this is more up-to-date version of the patch in the last mail: the %!FontType1 header can be also on zeroth byte. Please use this patch instead of the previous one, thanks sorry for the confusion Daniel >hello, > >Because of the Fedora's new feature, which issues handling packages >with fonts differently, we need all the fonts to be identified (with file) >as fonts. These are a few magic entries of PostScript fonts, that we had to add >1) PostScript font files in groff have "%!PS-Adobe-3.0\ Resource-Font" header >2) PostScript font files in texlive-texmf have "%!FontType1" header on 6th byte >3) a minor thing: "OpenType font data" magic entry has one more trailing space " " in the description, I removed it >patch attached >best regards, > Daniel Novotny, Red Hat inc. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: file-5.03-fonts-postscript.patch Type: text/x-patch Size: 966 bytes Desc: not available URL: From christos at zoulas.com Tue Jun 16 15:42:44 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 16 Jun 2009 08:42:44 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4A37423D.7090107@eudoxos.de> from Roland Bock (Jun 16, 8:57am) Message-ID: <20090616124244.224325654E@rebar.astron.com> On Jun 16, 8:57am, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: HTML files classified as application/octet-stream and text/plain | Hi, | | after having had a few minor problems with file-4.21 which comes with | Ubuntu-8.04, I upgraded to 5.03. Now I have two problems, an old one, | and a new one: | | 1) A HTML file with leading blank lines is classified as text/plain: | I had the same problem with 4.21. I wonder what I should do? I assume | that it is not generally advisable to remove all blank lines in a file's | content before handing it over to magic_buffer? | | 2) A HTML file (nothing special as far as I can see) is classified as | application octet stream: | The old version detected text/html | Other HTML files are classified correctly (did not test more than ten | though). | | | What's the best way to proceed? Should I send the files to this list? Yes, send the file to me. christos From christos at zoulas.com Tue Jun 16 15:47:34 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 16 Jun 2009 08:47:34 -0400 Subject: a few font issues In-Reply-To: <418384601.74851245146117584.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Jun 16, 5:55am) Message-ID: <20090616124734.E454E5654E@rebar.astron.com> On Jun 16, 5:55am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: a few font issues | hello, | | Because of the Fedora's new feature, which issues handling packages | with fonts differently, we need all the fonts to be identified (with file) | as fonts. 
These are a few magic entries of PostScript fonts, that we had to add | | 1) PostScript font files in groff have "%!PS-Adobe-3.0\ Resource-Font" header | 2) PostScript font files in texlive-texmf have "%!FontType1" header on 6th byte | 3) a minor thing: "OpenType font data" magic entry has one more trailing | space " " in the description, I removed it | | patch attached | | best regards, | | Daniel Novotny, Red Hat inc. Applied, thanks christos From christos at zoulas.com Tue Jun 16 15:49:07 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 16 Jun 2009 08:49:07 -0400 Subject: a few font issues - followup In-Reply-To: <64209291.75201245147497532.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Jun 16, 6:18am) Message-ID: <20090616124907.CC3BD5654E@rebar.astron.com> On Jun 16, 6:18am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: a few font issues - followup | hello, | this is more up-to-date version of the patch in the last mail: | the %!FontType1 header can be also on zeroth byte. | Please use this patch instead of the previous one, thanks | sorry for the confusion | | Daniel Got it. christos From rbock at eudoxos.de Tue Jun 16 16:04:27 2009 From: rbock at eudoxos.de (Roland Bock) Date: Tue, 16 Jun 2009 15:04:27 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090616124244.224325654E@rebar.astron.com> References: <20090616124244.224325654E@rebar.astron.com> Message-ID: <4A37985B.7020803@eudoxos.de> Christos Zoulas wrote: > On Jun 16, 8:57am, rbock at eudoxos.de (Roland Bock) wrote: > -- Subject: HTML files classified as application/octet-stream and text/plain > > | Hi, > | > | after having had a few minor problems with file-4.21 which comes with > | Ubuntu-8.04, I upgraded to 5.03. Now I have two problems, an old one, > | and a new one: > | > | 1) A HTML file with leading blank lines is classified as text/plain: > | I had the same problem with 4.21. 
I wonder what I should do? I assume > | that it is not generally advisable to remove all blank lines in a file's > | content before handing it over to magic_buffer? > | > | 2) A HTML file (nothing special as far as I can see) is classified as > | application octet stream: > | The old version detected text/html > | Other HTML files are classified correctly (did not test more than ten > | though). > | > | > | What's the best way to proceed? Should I send the files to this list? > > Yes, send the file to me. > > christos Files are attached :-) This is what I did and got: ./src/file --magic-file magic/magic.mgc -i ../file/* ../file/seed_biz_yahoo_archive.html: application/octet-stream; charset=binary ../file/test.html: text/plain; charset=us-ascii Regards, Roland -------------- next part -------------- A non-text attachment was scrubbed... Name: file.zip Type: application/zip Size: 6866 bytes Desc: not available URL: From s.marechal at jejik.com Tue Jun 30 11:36:53 2009 From: s.marechal at jejik.com (Sander Marechal) Date: Tue, 30 Jun 2009 10:36:53 +0200 Subject: [PATCH] Better OpenDocument support Message-ID: <4A49CEA5.9010506@jejik.com> Hello, I have made a couple of improvements to the magic rules for OpenDocument format. See the attached patch (against file-5.0.3). The patch fixes three things: 1) It adds the mimetypes for all OpenDocument types. Currently only the mimetype for OpenDocument Text files is present. The other file types only have a description but no mimetype. 2) It adds the missing "OpenDocument Image Template" 3) It removes the Zip version number from the test. Some ODF applications generate ODF files that use Zip 1.0 for the ODF container instead of Zip 2.0 (which OpenOffice uses). These files are now detected as Zip archives instead of OpenDocument files. According to the ODF spec you can use any Zip version. So, the test is removed. 
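For reference, the field that point 3 refers to is the 16-bit little-endian "version needed to extract" value stored at offset 4 of a Zip local file header, right after the PK\003\004 signature; 20 means Zip 2.0 and 10 means Zip 1.0. A small C sketch of reading it, where zip_version_needed() is a hypothetical helper and not part of file or libmagic:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the "version needed to extract" field of a
 * Zip local file header (20 for 2.0, 10 for 1.0, ...), or -1 if buf does
 * not start with the local file header signature. */
static int zip_version_needed(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 6 || memcmp(buf, "PK\003\004", 4) != 0)
        return -1;
    return buf[4] | (buf[5] << 8);   /* little-endian 16-bit value */
}
```

A detector that insists on one particular return value here would miss ODF containers written with a different Zip version, which is the failure the patch addresses.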
See also: http://www.jejik.com/articles/2009/06/fixing_opendocument_mime_magic_on_linux/ Kind regards, -- Sander Marechal -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.0.3-opendocument.patch Type: text/x-patch Size: 5910 bytes Desc: not available URL: From christos at zoulas.com Tue Jun 30 16:00:21 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 30 Jun 2009 09:00:21 -0400 Subject: [PATCH] Better OpenDocument support In-Reply-To: <4A49CEA5.9010506@jejik.com> from Sander Marechal (Jun 30, 10:36am) Message-ID: <20090630130021.190795654E@rebar.astron.com> On Jun 30, 10:36am, s.marechal at jejik.com (Sander Marechal) wrote: -- Subject: [PATCH] Better OpenDocument support | Hello, | | I have made a couple of improvements to the magic rules for OpenDocument | format. See the attached patch (against file-5.0.3). The patch fixes | three things: | | 1) It adds the mimetypes for all OpenDocument types. Currently only the | mimetype for OpenDocument Text files is present. The other file types | only have a description but no mimetype. | | 2) It adds the missing "OpenDocument Image Template" | | 3) It removes the Zip version number from the test. Some ODF | applications generate ODF files that use Zip 1.0 for the ODF container | instead of Zip 2.0 (which OpenOffice uses). These files are now detected | as Zip archives instead of OpenDocument files. According to the ODF spec | you can use any Zip version. So, the test is removed. | | See also: | http://www.jejik.com/articles/2009/06/fixing_opendocument_mime_magic_on_linux/ | For some reason the patch does not apply. Is that diff against the original 5.03? 
christos From s.marechal at jejik.com Tue Jun 30 16:24:42 2009 From: s.marechal at jejik.com (Sander Marechal) Date: Tue, 30 Jun 2009 15:24:42 +0200 Subject: [PATCH] Better OpenDocument support In-Reply-To: <20090630130021.190795654E@rebar.astron.com> References: <20090630130021.190795654E@rebar.astron.com> Message-ID: <4A4A121A.6060901@jejik.com> Christos Zoulas wrote: > For some reason the patch does not apply. Is that diff against the original > 5.03? Yes. It was made against: ftp://ftp.astron.com/pub/file/file-5.03.tar.gz I made the patch as follows: $ ls -1 file-5.03 file-5.03-opendocument $ diff -Naur file-5.03 file-5.03-opendocument > \ file-5.0.3-opendocument.patch Perhaps you're using the wrong value for the -p parameter when patching? If it still does not work I can also send the changed magic/Magdir/archive file so you can diff/patch it yourself. -- Sander Marechal From christos at zoulas.com Tue Jun 30 18:00:30 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 30 Jun 2009 11:00:30 -0400 Subject: [PATCH] Better OpenDocument support In-Reply-To: <4A4A121A.6060901@jejik.com> from Sander Marechal (Jun 30, 3:24pm) Message-ID: <20090630150030.DB2935654E@rebar.astron.com> On Jun 30, 3:24pm, s.marechal at jejik.com (Sander Marechal) wrote: -- Subject: Re: [PATCH] Better OpenDocument support | Christos Zoulas wrote: | > For some reason the patch does not apply. Is that diff against the original | > 5.03? | | Yes. It was made against: ftp://ftp.astron.com/pub/file/file-5.03.tar.gz | | I made the patch as follows: | | $ ls -1 | file-5.03 | file-5.03-opendocument | $ diff -Naur file-5.03 file-5.03-opendocument > \ | file-5.0.3-opendocument.patch | | Perhaps you're using the wrong value for the -p parameter when patching? | If it still does not work I can also send the changed | magic/Magdir/archive file so you can diff/patch it yourself. | No, it rejects the patch, can you send me a copy of your file please? 
christos

From s.marechal at jejik.com Wed Jul 1 02:05:50 2009
From: s.marechal at jejik.com (Sander Marechal)
Date: Wed, 01 Jul 2009 01:05:50 +0200
Subject: [PATCH] Better OpenDocument support
In-Reply-To: <20090630150030.DB2935654E@rebar.astron.com>
References: <20090630150030.DB2935654E@rebar.astron.com>
Message-ID: <4A4A9A4E.3040402@jejik.com>

Christos Zoulas wrote:
> On Jun 30, 3:24pm, s.marechal at jejik.com (Sander Marechal) wrote:
> | Perhaps you're using the wrong value for the -p parameter when patching?
> | If it still does not work I can also send the changed
> | magic/Magdir/archive file so you can diff/patch it yourself.
> |
>
> No, it rejects the patch, can you send me a copy of your file please?

Sure. Here you go :-)

--
Sander Marechal

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: archive
URL:

From christos at zoulas.com Wed Jul 1 03:08:15 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 30 Jun 2009 20:08:15 -0400
Subject: [PATCH] Better OpenDocument support
In-Reply-To: <4A4A9A4E.3040402@jejik.com> from Sander Marechal (Jul 1, 1:05am)
Message-ID: <20090701000815.A5DAD5654E@rebar.astron.com>

On Jul 1, 1:05am, s.marechal at jejik.com (Sander Marechal) wrote:
-- Subject: Re: [PATCH] Better OpenDocument support

| > No, it rejects the patch, can you send me a copy of your file please?
|
| Sure. Here you go :-)

Thanks, committed.

christos

From asbjorn at asbjorn.biz Mon Jul 6 16:10:04 2009
From: asbjorn at asbjorn.biz (=?ISO-8859-1?Q?Asbj=F8rn_Sloth_T=F8nnesen?=)
Date: Mon, 06 Jul 2009 13:10:04 +0000
Subject: Font mime types patch
Message-ID: <4A51F7AC.7090901@asbjorn.biz>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Let's start out with the straightforward stuff first:

Spline Font Database
====================

In May 2008 George Williams registered application/vnd.font-fontforge-sfd with IANA, so there's no doubt that it's the official MIME type. Good work.
http://www.iana.org/assignments/media-types/application/vnd.font-fontforge-sfd

TrueType
========

No registered MIME type, but application/x-font-ttf seems to be the most widely used, followed by application/x-truetype-font, and in the Adobe Flash world application/x-font-truetype is used.

http://www.google.com/search?q="application/x-font-ttf"
http://www.google.com/search?q="application/x-truetype-font"
http://www.google.com/search?q="application/x-font-truetype"
http://www.google.com/search?hl=en&q=mime+application+truetype

ISO/IEC JTC 1/SC34 are working on a new top-level font media type. On the other hand, they also recognize "application/x-font-ttf" as being the experimental (read: not standardized) de facto MIME type for TrueType fonts.

OpenType
========

Format developed by Adobe and Microsoft.

Specs hosted by Adobe, according to Wikipedia,
http://partners.adobe.com/public/developer/opentype/index_spec.html
but Adobe redirects to Microsoft:
http://www.microsoft.com/typography/otspec/

In Microsoft's XPS docs, they say that the OpenType media type is application/vnd.ms-opentype

Google search for application/vnd.ms-opentype on microsoft.com
http://www.google.com/search?q=application%2Fvnd.ms-opentype+site%3Amicrosoft.com

However, neither Microsoft, Adobe, nor other members of the ISO wg have registered this MIME type with IANA.
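The mapping proposed in the three sections above can be summarized as a lookup from leading bytes to MIME type. The sketch below is illustrative only: the signatures are the ones tested in magic/Magdir/fonts (00 01 00 00 00 for TrueType, "OTTO" for OpenType, "SplineFontDB:" for SFD), while struct font_sig and font_mime() are hypothetical names, not libmagic code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: leading bytes, their length, and the MIME
 * type suggested for them in this thread. */
struct font_sig {
    const char *magic;
    size_t len;
    const char *mime;
};

static const struct font_sig font_sigs[] = {
    { "\000\001\000\000\000", 5,  "application/x-font-ttf" },
    { "OTTO",                 4,  "application/vnd.ms-opentype" },
    { "SplineFontDB:",        13, "application/vnd.font-fontforge-sfd" },
};

/* Return the suggested MIME type for a buffer, or NULL if none of the
 * three signatures matches at offset 0. */
static const char *font_mime(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof font_sigs / sizeof font_sigs[0]; i++) {
        if (len >= font_sigs[i].len &&
            memcmp(buf, font_sigs[i].magic, font_sigs[i].len) == 0)
            return font_sigs[i].mime;
    }
    return NULL;
}
```

Only the SFD type is IANA-registered according to the text above; the other two strings are the de facto or vendor names and would need changing if a registered font media type appears.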
Patch
=====

diff -Naur file-5.03/magic/Magdir/fonts file/magic/Magdir/fonts
- --- file-5.03/magic/Magdir/fonts 2009-04-13 03:14:37.000000000 +0000
+++ file/magic/Magdir/fonts 2009-07-06 13:04:58.975889861 +0000
@@ -51,6 +51,7 @@
 
 # True Type fonts
 0 string \000\001\000\000\000 TrueType font data
+!:mime application/x-font-ttf
 0 string \007\001\001\000Copyright\ (c)\ 199 Adobe Multiple Master font
 0 string \012\001\001\000Copyright\ (c)\ 199 Adobe Multiple Master font
 
@@ -59,7 +60,9 @@
 
 # Opentype font data from Avi Bercovich
 0 string OTTO OpenType font data
+!:mime application/vnd.ms-opentype
 
 # Gürkan Sengün , www.linuks.mine.nu
 0 string SplineFontDB: Spline Font Database
+!:mime application/vnd.font-fontforge-sfd
 >14 string x version %s

- --
Best regards
Asbjørn Sloth Tønnesen
Backend System Architect
Lila ApS
http://lila.io/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpR96wACgkQSViWlxucwureiQCdHV3ZX6h5Ujsd3w/BZfRKSYFm
QDoAnRrvhrcRbwEWHcXar4RTUN9bgWDn
=R6AZ
-----END PGP SIGNATURE-----

From oscaruser at programmer.net Fri Jul 10 00:31:53 2009
From: oscaruser at programmer.net (Oscar Usifer)
Date: Thu, 9 Jul 2009 16:31:53 -0500
Subject: file repo read access
Message-ID: <20090709213153.F080B478024@ws1-5.us4.outblaze.com>

Greets Christos,

Can I access the repo to get the latest version of the sources? If so, what is the repo URL and which source control tool would I need (e.g. cvs, svn, etc.)? Also, is there a web page dedicated to this project?

Thanks,
OSC

--
Be Yourself @ mail.com!
Choose From 200+ Email Addresses Get a Free Account at www.mail.com From oscaruser at programmer.net Fri Jul 10 04:58:24 2009 From: oscaruser at programmer.net (Oscar Usifer) Date: Thu, 9 Jul 2009 20:58:24 -0500 Subject: Fw: Re: [PATCH] Better OpenDocument support Message-ID: <20090710015824.279F6326701@ws1-8.us4.outblaze.com> ----- Original Message ----- From: "Oscar Usifer" To: s.marechal at jejik.com Cc: oscaruser at programmer.net, file-request at mx.gw.com Subject: Re: [PATCH] Better OpenDocument support Date: Thu, 9 Jul 2009 19:26:30 -0500 Hi Sander, I used your patched version of file magic archive and built from source a new magic.mgc. I stripped down the magic types to only contain the archive magic file, with no others in order to isolate this test. Running it was not recognizing the docx format properly during tests as I had hoped. Here is the output of the run. Also, looking at the file man page does not list test type 'ubelong' or 'ubeshort'. I am guessing the 'u' stands for uncompress matching(?). Thanks, -OSC [osc at host /usr/home/osc/file/file-5.03]$ uname -a FreeBSD host 7.2-RELEASE-p1 FreeBSD 7.2-RELEASE-p1 #0: Tue Jun 9 18:02:21 UTC 2009 root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 [osc at host /usr/home/osc/file/file-5.03]$ echo $LD_LIBRARY_PATH /usr/home/osc/file/file-5.03/dist/lib [osc at host /usr/home/osc/file/file-5.03]$ ldd ./dist/bin/file ./dist/bin/file: libmagic.so.1 => /usr/home/osc/file/file-5.03/dist/lib/libmagic.so.1 (0x800635000) libz.so.4 => /lib/libz.so.4 (0x80074e000) libc.so.7 => /lib/libc.so.7 (0x800862000) [osc at host /usr/home/osc/file/file-5.03]$ ./dist/bin/file -d -b -i -k -m /usr/home/osc/file/file-5.03/dist/share/misc/magic.mgc -z --help Usage: file [OPTION...] [FILE...] Determine type of FILEs. 
--help display this help and exit -v, --version output version information and exit -m, --magic-file LIST use LIST as a colon-separated list of magic number files -z, --uncompress try to look inside compressed files -b, --brief do not prepend filenames to output lines -c, --checking-printout print the parsed form of the magic file, use in conjunction with -m to debug a new magic file before installing it -e, --exclude TEST exclude TEST from the list of test to be performed for file. Valid tests are: ascii, apptype, compress, elf, soft, tar, tokens, troff -f, --files-from FILE read the filenames to be examined from FILE -F, --separator STRING use string as separator instead of `:' -i, --mime output MIME type strings (--mime-type and --mime-encoding) --apple output the Apple CREATOR/TYPE --mime-type output the MIME type --mime-encoding output the MIME encoding -k, --keep-going don't stop at the first match -L, --dereference follow symlinks (default) -h, --no-dereference don't follow symlinks -n, --no-buffer do not buffer output -N, --no-pad do not pad output -0, --print0 terminate filenames with ASCII NUL -p, --preserve-date preserve access times on files -r, --raw don't translate unprintable chars to \ooo -s, --special-files treat special (block/char devices) files as ordinary ones -C, --compile compile file specified by -m -d, --debug print debugging messages [osc at host /usr/home/osc/file/file-5.03]$ [osc at host /usr/home/osc/file/file-5.03]$ ./dist/bin/file -d -b -i -k -m /usr/home/osc/file/file-5.03/dist/share/misc/magic.mgc -z ./test/docx/test.docx mget @0: PK\003\004\024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[C [9 0 string,=PK\003\004,""] 0 == 0 = 1 mget @30: [Content_Types].xml \242\004\002(\240\000\002\000\000\000\000\000 [10> 30 ubelong&,!1835625829,""] 1531146094 != 1835625829 = 1 mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte [11>> 4 byte&,=0,"Zip 
archive data"] 20 == 0 = 0 mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte [13>> 4 byte&,=9,"Zip archive data, at least v0.9 to extract"] 20 == 9 = 0 mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte [15>> 4 byte&,=10,"Zip archive data, at least v1.0 to extract"] 20 == 10 = 0 mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte [17>> 4 byte&,=11,"Zip archive data, at least v1.1 to extract"] 20 == 11 = 0 mget @353: \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000 [19>> 353 string,=WINZIP,"Zip archive data, WinZIP self-extracting"] 18446744073709551529 == 0 = 0 mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte [21>> 4 byte&,=20,"Zip archive data, at least v2.0 to extract"] 20 == 20 = 1 softmagic 1 zmagic 1 application/x-empty compressed-encoding=application/zip; charset=binary; charset=binary [osc at host /usr/home/osc/file/file-5.03]$ -- Be Yourself @ mail.com! Choose From 200+ Email Addresses Get a Free Account at www.mail.com -- Be Yourself @ mail.com! Choose From 200+ Email Addresses Get a Free Account at www.mail.com From christos at zoulas.com Fri Jul 10 14:12:28 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 10 Jul 2009 07:12:28 -0400 Subject: file repo read access In-Reply-To: <20090709213153.F080B478024@ws1-5.us4.outblaze.com> from "Oscar Usifer" (Jul 9, 4:31pm) Message-ID: <20090710111228.B12225654E@rebar.astron.com> On Jul 9, 4:31pm, oscaruser at programmer.net ("Oscar Usifer") wrote: -- Subject: file repo read access | Greets Christos, | | Can I access the repo to get the latest version of the sources? | If so what is the repo URL and which source control tool would I | need (e.g. cvs, svn, etc) ? Also is there a web page dedicated to | this project? 

No, I have not created a public repository yet, mainly due to lack of time.
On the other hand, I frequently put a tar file of head on
ftp.astron.com:/pri/file-X.YY.tar.gz where X.YY is the latest version.

christos

From asbjorn at asbjorn.biz Fri Jul 10 16:51:30 2009
From: asbjorn at asbjorn.biz (=?ISO-8859-1?Q?Asbj=F8rn_Sloth_T=F8nnesen?=)
Date: Fri, 10 Jul 2009 13:51:30 +0000
Subject: Adobe Photoshop magic patch
Message-ID: <4A574762.7050008@asbjorn.biz>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Christos,

Here you have a patch for making the PSD magic a little bit more verbose.

Examples:
Adobe Photoshop Image, 4 x 1, indexed, 8-bit channel
Adobe Photoshop Image, 2688 x 2736, CMYKA, 5x 8-bit channels
Adobe Photoshop Image, 32 x 32, grayscale with alpha, 2x 8-bit channels
Adobe Photoshop Image, 518 x 259, RGB, 3x 8-bit channels
Adobe Photoshop Image, 42 x 42, RGB, 3x 8-bit channels

PNG example for comparison:
PNG image, 4224 x 2376, 8-bit/color RGB, non-interlaced

Based on the specs. from the original CS release in Oct. 2003.

Attached patch is against file-5.03.tar.gz

- --
Best regards
Asbjørn Sloth Tønnesen
Backend System Architect
Lila ApS
http://lila.io/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpXR2IACgkQSViWlxucwuogVwCglQPu7tcFCnbw3qMaTu8725RI
fMsAn2gOJHw9iNztYxm8L/7G/jJy3Oqu
=y7Ee
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: psd.patch
Type: text/x-patch
Size: 845 bytes
Desc: not available
URL:

From christos at zoulas.com Fri Jul 10 19:44:34 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Fri, 10 Jul 2009 12:44:34 -0400
Subject: Adobe Photoshop magic patch
In-Reply-To: <4A574762.7050008@asbjorn.biz> from =?ISO-8859-1?Q?Asbj=F8rn_Sloth_T=F8nnesen?= (Jul 10, 1:51pm)
Message-ID: <20090710164434.2D40A5654E@rebar.astron.com>

On Jul 10, 1:51pm, asbjorn at asbjorn.biz (=?ISO-8859-1?Q?Asbj=F8rn_Sloth_T=F8nnesen?=) wrote:
-- Subject: Adobe Photoshop magic patch

| Hi Christos,
|
| Here you have a patch for making the PSD magic a little bit more verbose.
|
| Examples:
| Adobe Photoshop Image, 4 x 1, indexed, 8-bit channel
| Adobe Photoshop Image, 2688 x 2736, CMYKA, 5x 8-bit channels
| Adobe Photoshop Image, 32 x 32, grayscale with alpha, 2x 8-bit channels
| Adobe Photoshop Image, 518 x 259, RGB, 3x 8-bit channels
| Adobe Photoshop Image, 42 x 42, RGB, 3x 8-bit channels
|
| PNG example for comparison:
| PNG image, 4224 x 2376, 8-bit/color RGB, non-interlaced
|
| Based on the specs. from the original CS release in Oct. 2003.
|
| Attached patch is against file-5.03.tar.gz

Thanks a lot!

christos

From oscaruser at programmer.net Sat Jul 11 03:05:56 2009
From: oscaruser at programmer.net (Oscar Usifer)
Date: Fri, 10 Jul 2009 19:05:56 -0500
Subject: magic test for compressed file
Message-ID: <20090711000556.42731BE407E@ws1-9.us4.outblaze.com>

Folks,

Is it possible to write a magic test so that it compares the uncompressed file input sigs even if the input stream is compressed? OF docx file types have this issue given they are zip files. One thing I noticed while testing was it looked like the uncompressed stream was being tested during debug runs.
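
A note on the docx case: ZIP stores each member's file name uncompressed in the local file header, so some container formats can be told apart without inflating anything. What follows is only a minimal sketch in C, not code from file(1) or libmagic. The name looks_like_ooxml is hypothetical, and the check assumes that the [Content_Types].xml member comes first in the archive, as it does in the test.docx sample in this thread; archives that order their members differently would not match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libmagic): returns 1 if buf starts
 * with a ZIP local file header whose first member is named
 * "[Content_Types].xml", 0 otherwise.
 */
int
looks_like_ooxml(const unsigned char *buf, size_t len)
{
	static const char name[] = "[Content_Types].xml";
	const size_t namelen = sizeof(name) - 1;
	size_t stored;

	assert(buf != NULL);

	/* The fixed part of a local file header is 30 bytes long. */
	if (len < 30 || memcmp(buf, "PK\003\004", 4) != 0)
		return 0;

	/* File name length: little-endian 16-bit value at offset 26. */
	stored = (size_t)buf[26] | ((size_t)buf[27] << 8);
	if (stored != namelen || len < 30 + namelen)
		return 0;

	/* The member name follows the fixed header, stored as plain bytes. */
	return memcmp(buf + 30, name, namelen) == 0;
}
```

Offset 30 is the same position the "mget @30" lines in the debug output read from.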

Thanks,
-OSC

$ file -d test.docx
mget @0: PK\003\004\024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[C
[9 0 string,=PK\003\004,""] 0 == 0 = 1
mget @30: [Content_Types].xml \242\004\002(\240\000\002\000\000\000\000\000
[10> 30 ubelong&,!1835625829,""] 1531146094 != 1835625829 = 1
mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte
[11>> 4 byte&,=0,"Zip archive data"] 20 == 0 = 0
mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte
[13>> 4 byte&,=9,"Zip archive data, at least v0.9 to extract"] 20 == 9 = 0
mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte
[15>> 4 byte&,=10,"Zip archive data, at least v1.0 to extract"] 20 == 10 = 0
mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte
[17>> 4 byte&,=11,"Zip archive data, at least v1.1 to extract"] 20 == 11 = 0
mget @353: \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
[19>> 353 string,=WINZIP,"Zip archive data, WinZIP self-extracting"] 18446744073709551529 == 0 = 0
mget @4: \024\000\006\000\b\000\000\000!\000\335\374\2257f\001\000\000 \005\000\000\023\000\b\002[Conte
[21>> 4 byte&,=20,"Zip archive data, at least v2.0 to extract"] 20 == 20 = 1
mget @30: [Content_Types].xml \242\004\002(\240\000\002\000\000\000\000\000
[28> 30 string,=mimetype,""] 18446744073709551598 == 0 = 0
softmagic 1
test.docx: Zip archive data, at least v2.0 to extract

From christos at zoulas.com Sat Jul 11 17:24:04 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Sat, 11 Jul 2009 10:24:04 -0400
Subject: magic test for compressed file
In-Reply-To: <20090711000556.42731BE407E@ws1-9.us4.outblaze.com> from "Oscar Usifer" (Jul 10, 7:05pm)
Message-ID: <20090711142404.8D1D55654E@rebar.astron.com>

On Jul 10, 7:05pm, oscaruser at programmer.net ("Oscar Usifer") wrote:
-- Subject: magic test for compressed file

| Folks,
|
| Is it possible to write a magic test so that it compares the uncompressed file input sigs even if the input stream is compressed? OF docx file types have this issue given they are zip files. One thing I noticed while testing was it looked like the uncompressed stream was being tested during debug runs.
|
| Thanks,

I am planning to add support for Microsoft's compressed XML documents
using libzip or libarchive.

christos

From oscaruser at programmer.net Sun Jul 12 00:04:35 2009
From: oscaruser at programmer.net (Oscar Usifer)
Date: Sat, 11 Jul 2009 16:04:35 -0500
Subject: Fw: Crystal Reports file test reports 'CDF V2 Document, corrupt'
Message-ID: <20090711210435.A30A2606865@ws1-4.us4.outblaze.com>

Folks,

File-5.03 file test is reporting 'corrupt' for crystal reports file test. Any ideas why? I added a magic for this, but even before getting to the test, file pid exits. File version 4.23 does not suffer from the same issue, and reports file as 'Microsoft Office Document' also incorrectly, but with a different result.
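
For context on why a Crystal Reports file can be picked up as a CDF document at all: OLE2 compound documents (what file calls CDF) start with the same eight bytes, d0 cf 11 e0 a1 b1 1a e1, and the hexdump of the .rpt sample posted later in this thread shows exactly those bytes at offset 0. A minimal sketch of that check in C follows. The name has_cdf_signature is hypothetical and is not a function in file(1) or libmagic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libmagic): returns 1 if buf begins
 * with the 8-byte OLE2 compound document signature, 0 otherwise.
 */
int
has_cdf_signature(const unsigned char *buf, size_t len)
{
	static const unsigned char sig[8] = {
		0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1
	};

	assert(buf != NULL);

	/* Too-short input cannot carry the signature. */
	return len >= sizeof(sig) && memcmp(buf, sig, sizeof(sig)) == 0;
}
```

A match identifies only the container. Telling Office documents and Crystal Reports apart needs the property data inside it, which is the step where 5.03 prints "Can't expand summary_info".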
Thanks, -OSC $ file -d test.rpt cdf 1 CDF V2 Document, corrupt: Can't expand summary_info; charset=binary truss file -d test.rpt __sysctl(0x7fffffffe590,0x2,0x7fffffffe5ac,0x7fffffffe5a0,0x0,0x0) = 0 (0x0) mmap(0x0,608,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34365149184 (0x800529000) munmap(0x800529000,608) = 0 (0x0) __sysctl(0x7fffffffe600,0x2,0x800631288,0x7fffffffe5f8,0x0,0x0) = 0 (0x0) mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34365149184 (0x800529000) issetugid(0x80052a015,0x800524aa9,0x800634bd0,0x800634ba0,0x556c,0x7fffffffe5f8) = 0 (0x0) open("/etc/libmap.conf",O_RDONLY,0666) ERR#2 'No such file or directory' access("/usr/home/osc/file/file-5.03/dist/lib/libmagic.so.1",0) = 0 (0x0) open("/usr/home/osc/file/file-5.03/dist/lib/libmagic.so.1",O_RDONLY,030610540) = 3 (0x3) fstat(3,{ mode=-rwxr-xr-x ,inode=1532794,size=306994,blksize=4096 }) = 0 (0x0) fstatfs(0x3,0x7fffffffe3e0,0x800631160,0x80050dcfc,0xffffffff80b3ed40,0x7fffffffe3d8) = 0 (0x0) read(3,"\^?ELF\^B\^A\^A\t\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) mmap(0x0,1150976,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_NOCORE,3,0x0) = 34366246912 (0x800635000) mprotect(0x80064b000,4096,PROT_READ|PROT_WRITE|PROT_EXEC) = 0 (0x0) mprotect(0x80064b000,4096,PROT_READ|PROT_EXEC) = 0 (0x0) mmap(0x80074c000,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED,3,0x17000) = 34367389696 (0x80074c000) close(3) = 0 (0x0) access("/usr/home/osc/file/file-5.03/dist/lib/libz.so.4",0) ERR#2 'No such file or directory' access("/usr/home/osc/file/file-5.03/dist/lib/libz.so.4",0) ERR#2 'No such file or directory' open("/var/run/ld-elf.so.hints",O_RDONLY,030610540) = 3 (0x3) read(3,"Ehnt\^A\0\0\0\M^@\0\0\0\M^F\0\0"...,128) = 128 (0x80) lseek(3,0x80,SEEK_SET) = 128 (0x80) read(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,134) = 134 (0x86) close(3) = 0 (0x0) access("/lib/libz.so.4",0) = 0 (0x0) open("/lib/libz.so.4",O_RDONLY,030610540) = 3 (0x3) fstat(3,{ mode=-r--r--r-- ,inode=49488,size=84240,blksize=4096 }) = 0 (0x0) 
fstatfs(0x3,0x7fffffffe3e0,0x800631160,0x80050dcfc,0xffffffff80b3edc0,0x7fffffffe3d8) = 0 (0x0) read(3,"\^?ELF\^B\^A\^A\t\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) mmap(0x0,1130496,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_NOCORE,3,0x0) = 34367397888 (0x80074e000) mprotect(0x800760000,4096,PROT_READ|PROT_WRITE|PROT_EXEC) = 0 (0x0) mprotect(0x800760000,4096,PROT_READ|PROT_EXEC) = 0 (0x0) mmap(0x800860000,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED,3,0x12000) = 34368520192 (0x800860000) close(3) = 0 (0x0) access("/usr/home/osc/file/file-5.03/dist/lib/libc.so.7",0) ERR#2 'No such file or directory' access("/usr/home/osc/file/file-5.03/dist/lib/libc.so.7",0) ERR#2 'No such file or directory' access("/lib/libc.so.7",0) = 0 (0x0) open("/lib/libc.so.7",O_RDONLY,030610540) = 3 (0x3) fstat(3,{ mode=-r--r--r-- ,inode=49457,size=1184384,blksize=4096 }) = 0 (0x0) fstatfs(0x3,0x7fffffffe3e0,0x800631160,0x80050dcfc,0xffffffff80b3ed40,0x7fffffffe3d8) = 0 (0x0) read(3,"\^?ELF\^B\^A\^A\t\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) mmap(0x0,2244608,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_NOCORE,3,0x0) = 34368528384 (0x800862000) mprotect(0x800951000,4096,PROT_READ|PROT_WRITE|PROT_EXEC) = 0 (0x0) mprotect(0x800951000,4096,PROT_READ|PROT_EXEC) = 0 (0x0) mmap(0x800a52000,118784,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED,3,0xf0000) = 34370560000 (0x800a52000) mmap(0x800a6f000,94208,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_ANON,-1,0x0) = 34370678784 (0x800a6f000) close(3) = 0 (0x0) sysarch(0x81,0x7fffffffe680,0x80052d1c8,0x0,0xffffffffffad5930,0x8080808080808080) = 0 (0x0) mmap(0x0,848,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34365181952 (0x800531000) munmap(0x800531000,848) = 0 (0x0) mmap(0x0,3200,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34365181952 (0x800531000) munmap(0x800531000,3200) = 0 (0x0) mmap(0x0,2080,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34365181952 (0x800531000) munmap(0x800531000,2080) = 0 (0x0) mmap(0x0,42256,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34365181952 (0x800531000) 
munmap(0x800531000,42256) = 0 (0x0) sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0) sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0) __sysctl(0x7fffffffe620,0x2,0x800a6f9e0,0x7fffffffe618,0x0,0x0) = 0 (0x0) sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0) sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0) access("/home/osc/.magic",4) ERR#2 'No such file or directory' __sysctl(0x7fffffffdbc0,0x2,0x800a73a48,0x7fffffffdbd8,0x0,0x0) = 0 (0x0) __sysctl(0x7fffffffd710,0x2,0x800a826d8,0x7fffffffd708,0x0,0x0) = 0 (0x0) __sysctl(0x7fffffffd750,0x2,0x7fffffffd76c,0x7fffffffd760,0x0,0x0) = 0 (0x0) readlink("/etc/malloc.conf",0x7fffffffd7b0,1024) ERR#2 'No such file or directory' issetugid(0x800949d2a,0x7fffffffd7b0,0xffffffffffffffff,0x0,0xffffffff80b3edc0,0x7fffffffd788) = 0 (0x0) break(0x600000) = 0 (0x0) __sysctl(0x7fffffffdaa0,0x2,0x7fffffffdabc,0x7fffffffdab0,0x0,0x0) = 0 (0x0) mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34370772992 (0x800a86000) mmap(0x800b86000,499712,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34371821568 (0x800b86000) munmap(0x800a86000,499712) = 0 (0x0) open("/usr/home/osc/file/file-5.03/dist/share/misc/magic.mgc",O_RDONLY,066) = 3 (0x3) fstat(3,{ mode=-rw-r--r-- ,inode=1532822,size=12200,blksize=4096 }) = 0 (0x0) mmap(0x0,12200,PROT_READ|PROT_WRITE,MAP_PRIVATE,3,0x0) = 34365181952 (0x800531000) close(3) = 0 (0x0) fstat(1,{ mode=crw--w---- ,inode=106,size=0,blksize=4096 }) = 0 (0x0) ioctl(1,TIOCGETA,0xffffd320) = 0 (0x0) lstat("test.rpt",{ mode=-rw-r--r-- ,inode=1532800,size=39936,blksize=4096 }) = 0 (0x0) stat("test.rpt",{ mode=-rw-r--r-- ,inode=1532800,size=39936,blksize=4096 }) = 0 (0x0) 
open("test.rpt",O_RDONLY,037777755660) = 3 (0x3) fcntl(3,F_GETFL,) = 0 (0x0) fcntl(3,F_SETFL,0x0) = 0 (0x0) read(3,"\M-P\M-O\^Q\M-`\M-!\M-1\^Z\M-a\0"...,262144) = 39936 (0x9c00) cdf 1 write(2,"cdf 1\n",6) = 6 (0x6) close(3) = 0 (0x0) test.rpt: CDF V2 Document, corrupt: Can't expand summary_info write(1,"test.rpt: CDF V2 Docu"...,73) = 73 (0x49) munmap(0x800531000,12200) = 0 (0x0) sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0) sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0) sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0) sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0) process exit, rval = 0 -- Be Yourself @ mail.com! Choose From 200+ Email Addresses Get a Free Account at www.mail.com From christos at zoulas.com Sun Jul 12 00:12:22 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sat, 11 Jul 2009 17:12:22 -0400 Subject: Fw: Crystal Reports file test reports 'CDF V2 Document, corrupt' In-Reply-To: <20090711210435.A30A2606865@ws1-4.us4.outblaze.com> from "Oscar Usifer" (Jul 11, 4:04pm) Message-ID: <20090711211223.05B125654E@rebar.astron.com> On Jul 11, 4:04pm, oscaruser at programmer.net ("Oscar Usifer") wrote: -- Subject: Fw: Crystal Reports file test reports 'CDF V2 Document, corrupt' | Folks, | | File-5.03 file test is reporting 'corrupt' for crystal reports file | test. Any ideas why? I added a magic for this, but even before | getting to the test, file pid exits. File version 4.23 does not | suffer from the same issue, and reports file as 'Microsoft Office | Document' also incorrectly, but with a different result. 
| | Thanks, | -OSC | | $ file -d test.rpt | cdf 1 | CDF V2 Document, corrupt: Can't expand summary_info; charset=binary It seems that the crystal reports document is a CDF file (OLE2) but does not have the same structure as office documents do. Perhaps if I can look at a few samples I can figure out what is different. christos From oscaruser at programmer.net Sun Jul 12 00:34:56 2009 From: oscaruser at programmer.net (Oscar Usifer) Date: Sat, 11 Jul 2009 16:34:56 -0500 Subject: Fw: Crystal Reports file test reports 'CDF V2 Document, corrupt' Message-ID: <20090711213456.F270C326701@ws1-8.us4.outblaze.com> Hi Christos, I'll send you over a few off list. I added magic test file 'crystal' based on the hexdump sig that I saw from a few samples, and seems to work (thou limited testing) for file v.4.26. Thanks $ hexdump.exe -C Campaign\ Effectiveness.rpt | head 00000000 d0 cf 11 e0 a1 b1 1a e1 00 00 00 00 00 00 00 00 |................| 00000010 00 00 00 00 00 00 00 00 3e 00 03 00 fe ff 09 00 |........>.......| 00000020 06 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 10 00 00 09 00 00 00 |................| 00000040 02 00 00 00 fe ff ff ff 00 00 00 00 01 00 00 00 |................| 00000050 27 00 00 00 e4 00 00 00 ff ff ff ff ff ff ff ff |'...............| 00000060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................| * 00000200 52 00 6f 00 6f 00 74 00 20 00 45 00 6e 00 74 00 |R.o.o.t. 
.E.n.t.|
00000210 72 00 79 00 00 00 00 00 00 00 00 00 00 00 00 00 |r.y.............|
[osc at host ~/file/src/file-4.26/magic/Magdir]$ cat crystal
#------------------------------------------------------------------------------
# Crystal Reports: file(1) magic for Business Objects Crystal Reports format
#
0 ubelong 0xd0cf11e0
>4 ubelong 0xa1b11ae1 Crystal Reports
!:mime application/x-rpt
[osc at host ~/file/src/file-4.26/magic/Magdir]$

From christos at zoulas.com Sun Jul 12 01:02:04 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Sat, 11 Jul 2009 18:02:04 -0400
Subject: Fw: Crystal Reports file test reports 'CDF V2 Document, corrupt'
In-Reply-To: <20090711213456.F270C326701@ws1-8.us4.outblaze.com> from "Oscar Usifer" (Jul 11, 4:34pm)
Message-ID: <20090711220204.BA5075654E@rebar.astron.com>

On Jul 11, 4:34pm, oscaruser at programmer.net ("Oscar Usifer") wrote:
-- Subject: Re: Fw: Crystal Reports file test reports 'CDF V2 Document, corru

| Hi Christos,
|
| I'll send you over a few off list. I added magic test file 'crystal' based on the hexdump sig that I saw from a few samples, and seems to work (thou limited testing) for file v.4.26.
|

Thanks!

christos

From oscaruser at programmer.net Mon Jul 13 20:58:51 2009
From: oscaruser at programmer.net (Oscar Usifer)
Date: Mon, 13 Jul 2009 12:58:51 -0500
Subject: file-5.03 build has no magic
Message-ID: <20090713175851.A833010613@ws1-3.us4.outblaze.com>

Folks,

I pun, in a way. 'make DESTDIR=/opt/rpm/build/file/dist-file-5.03 install' left absent '/usr/local/share/misc/magic', '/usr/local/share/file/magic.mime', and '/usr/local/share/file/magic.mime.mgc'. Is this an issue with the Makefile, and if so is there a quick fix?
Thanks, -OSC [osc at host /opt/rpm/build/file]$ find dist-file-5.03 dist-file-5.03 dist-file-5.03/usr dist-file-5.03/usr/lib dist-file-5.03/usr/lib/libmagic.so.1.0.0 dist-file-5.03/usr/lib/libmagic.so.1 dist-file-5.03/usr/lib/libmagic.so dist-file-5.03/usr/lib/libmagic.la dist-file-5.03/usr/lib/libmagic.a dist-file-5.03/usr/local dist-file-5.03/usr/local/bin dist-file-5.03/usr/local/bin/file dist-file-5.03/usr/local/include dist-file-5.03/usr/local/include/magic.h dist-file-5.03/usr/local/share ** issue as listed below dist-file-5.03/usr/local/share/misc dist-file-5.03/usr/local/share/misc/magic.mgc ** dist-file-5.03/usr/local/share/man dist-file-5.03/usr/local/share/man/man1 dist-file-5.03/usr/local/share/man/man1/file.1 dist-file-5.03/usr/local/share/man/man3 dist-file-5.03/usr/local/share/man/man3/libmagic.3 dist-file-5.03/usr/local/share/man/man4 dist-file-5.03/usr/local/share/man/man4/magic.4 dist-file-5.03/usr/local/share/man/man5 -- Be Yourself @ mail.com! Choose From 200+ Email Addresses Get a Free Account at www.mail.com From christos at zoulas.com Mon Jul 13 21:10:06 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 13 Jul 2009 14:10:06 -0400 Subject: file-5.03 build has no magic In-Reply-To: <20090713175851.A833010613@ws1-3.us4.outblaze.com> from "Oscar Usifer" (Jul 13, 12:58pm) Message-ID: <20090713181006.315235654E@rebar.astron.com> On Jul 13, 12:58pm, oscaruser at programmer.net ("Oscar Usifer") wrote: -- Subject: file-5.03 build has no magic | Folks, | | I pun, in a way. 'make DESTDIR=/opt/rpm/build/file/dist-file-5.03 install' left absent '/usr/local/share/misc/magic', '/usr/local/share/file/magic.mime', and '/usr/local/share/file/magic.mime.mgc'. Is this an issue with the Makefile, and if so is there a quick fix? | | Thanks, | -OSC I don't understand what you are asking, but we only install a single magic.mgc file in 5.03. 

christos

From vapier at gentoo.org Mon Jul 13 21:15:09 2009
From: vapier at gentoo.org (Mike Frysinger)
Date: Mon, 13 Jul 2009 14:15:09 -0400
Subject: file-5.03 build has no magic
In-Reply-To: <20090713175851.A833010613@ws1-3.us4.outblaze.com>
References: <20090713175851.A833010613@ws1-3.us4.outblaze.com>
Message-ID: <200907131415.09711.vapier@gentoo.org>

On Monday 13 July 2009 13:58:51 Oscar Usifer wrote:
> I pun, in a way. 'make DESTDIR=/opt/rpm/build/file/dist-file-5.03 install'
> left absent '/usr/local/share/misc/magic',
> '/usr/local/share/file/magic.mime', and
> '/usr/local/share/file/magic.mime.mgc'. Is this an issue with the Makefile,
> and if so is there a quick fix?

this is correct behavior for file-5.03
-mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
URL:

From oscaruser at programmer.net Tue Jul 14 00:29:39 2009
From: oscaruser at programmer.net (Oscar Usifer)
Date: Mon, 13 Jul 2009 16:29:39 -0500
Subject: file-5.03 build has no magic
Message-ID: <20090713212944.55D9410612@ws1-3.us4.outblaze.com>

That was because of what file itself prints. It led me to believe the logic was still actively checking for the existence of this file, and reading it if present. Also someone requested that I send them my magic file, but I could not find this file in the 5.03 dist.

Thanks

$ file -v
file-5.03
magic file from /usr/share/misc/magic

From christos at zoulas.com Tue Jul 14 01:07:17 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 13 Jul 2009 18:07:17 -0400
Subject: file-5.03 build has no magic
In-Reply-To: <20090713212944.55D9410612@ws1-3.us4.outblaze.com> from "Oscar Usifer" (Jul 13, 4:29pm)
Message-ID: <20090713220717.F163E5654E@rebar.astron.com>

On Jul 13, 4:29pm, oscaruser at programmer.net ("Oscar Usifer") wrote:
-- Subject: Re: file-5.03 build has no magic

| That was because of what file itself prints. It led me to believe the logic was still actively checking for the existence of this file, and reading it if present. Also someone requested that I send them my magic file, but I could not find this file in the 5.03 dist.
|
| Thanks
|
| $ file -v
| file-5.03
| magic file from /usr/share/misc/magic

That's a bug!

christos

From christos at zoulas.com Wed Jul 15 00:51:17 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 14 Jul 2009 17:51:17 -0400
Subject: crystal reports patch
Message-ID: <20090714215117.46B905654E@rebar.astron.com>

The following adds the types crystal reports needs to be recognized. Does
anyone know what mime type to set?
christos Index: cdf.c =================================================================== RCS file: /p/file/cvsroot/file/src/cdf.c,v retrieving revision 1.32 diff -u -u -r1.32 cdf.c --- cdf.c 8 May 2009 23:25:46 -0000 1.32 +++ cdf.c 14 Jul 2009 21:49:10 -0000 @@ -770,6 +770,7 @@ if (inp[i].pi_type & (CDF_ARRAY|CDF_BYREF|CDF_RESERVED)) goto unknown; switch (inp[i].pi_type & CDF_TYPEMASK) { + case CDF_NULL: case CDF_EMPTY: break; case CDF_SIGNED16: @@ -804,6 +805,7 @@ inp[i].pi_u64 = CDF_TOLE8((uint64_t)u64); break; case CDF_LENGTH32_STRING: + case CDF_LENGTH32_WSTRING: if (nelements > 1) { size_t nelem = inp - *info; if (*maxcount > CDF_PROP_LIMIT @@ -1112,12 +1114,14 @@ cdf_timestamp_t tp; struct timespec ts; char buf[64]; - size_t i; + size_t i, j; for (i = 0; i < count; i++) { cdf_print_property_name(buf, sizeof(buf), info[i].pi_id); (void)fprintf(stderr, "%zu) %s: ", i, buf); switch (info[i].pi_type) { + case CDF_NULL: + break; case CDF_SIGNED16: (void)fprintf(stderr, "signed 16 [%hd]\n", info[i].pi_s16); @@ -1135,6 +1139,13 @@ info[i].pi_str.s_len, info[i].pi_str.s_len, info[i].pi_str.s_buf); break; + case CDF_LENGTH32_WSTRING: + (void)fprintf(stderr, "string %u [", + info[i].pi_str.s_len); + for (j = 0; j < info[i].pi_str.s_len - 1; j++) + (void)fputc(info[i].pi_str.s_buf[j << 1], stderr); + (void)fprintf(stderr, "]\n"); + break; case CDF_FILETIME: tp = info[i].pi_tp; if (tp < 1000000000000000LL) { Index: readcdf.c =================================================================== RCS file: /p/file/cvsroot/file/src/readcdf.c,v retrieving revision 1.19 diff -u -u -r1.19 readcdf.c --- readcdf.c 8 May 2009 17:41:59 -0000 1.19 +++ readcdf.c 14 Jul 2009 21:49:10 -0000 @@ -55,6 +55,8 @@ for (i = 0; i < count; i++) { cdf_print_property_name(buf, sizeof(buf), info[i].pi_id); switch (info[i].pi_type) { + case CDF_NULL: + break; case CDF_SIGNED16: if (NOTMIME(ms) && file_printf(ms, ", %s: %hd", buf, info[i].pi_s16) == -1) @@ -71,22 +73,26 @@ return -1; break; case 
CDF_LENGTH32_STRING: + case CDF_LENGTH32_WSTRING: len = info[i].pi_str.s_len; if (len > 1) { + char vbuf[1024]; + size_t j, k = 1; + + if (info[i].pi_type == CDF_LENGTH32_WSTRING) + k++; s = info[i].pi_str.s_buf; + for (j = 0; j < sizeof(vbuf) && len--; + j++, s += k) { + if (*s == '\0') + break; + if (isprint((unsigned char)*s)) + vbuf[j] = *s; + } + if (j == sizeof(vbuf)) + --j; + vbuf[j] = '\0'; if (NOTMIME(ms)) { - char vbuf[1024]; - size_t j; - for (j = 0; j < sizeof(vbuf) && len--; - j++, s++) { - if (*s == '\0') - break; - if (isprint((unsigned char)*s)) - vbuf[j] = *s; - } - if (j == sizeof(vbuf)) - --j; - vbuf[j] = '\0'; if (vbuf[0]) { if (file_printf(ms, ", %s: %s", buf, vbuf) == -1) @@ -94,11 +100,11 @@ } } else if (info[i].pi_id == CDF_PROPERTY_NAME_OF_APPLICATION) { - if (strstr(s, "Word")) + if (strstr(vbuf, "Word")) str = "msword"; - else if (strstr(s, "Excel")) + else if (strstr(vbuf, "Excel")) str = "vnd.ms-excel"; - else if (strstr(s, "Powerpoint")) + else if (strstr(vbuf, "Powerpoint")) str = "vnd.ms-powerpoint"; } } From oscaruser at programmer.net Thu Jul 16 20:50:39 2009 From: oscaruser at programmer.net (Oscar Usifer) Date: Thu, 16 Jul 2009 12:50:39 -0500 Subject: crystal reports patch Message-ID: <20090716175039.326C4BE407E@ws1-9.us4.outblaze.com> Based on looking at the filext.com site, I set as follows. Also I tried patching the sources file-5.03 sources, but received Hunk patch fails. Thanks application/x-rpts; charset=binary http://filext.com/file-extension/RPT MIME Type application/x-rpt magnus-internal/rpt [jay at phase ~/file/file-5.03/src]$ patch < patch.diff Hmm... Looks like a unified diff to me... 
The text leading up to this was:
--------------------------
|Index: cdf.c
|===================================================================
|RCS file: /p/file/cvsroot/file/src/cdf.c,v
|retrieving revision 1.32
|diff -u -u -r1.32 cdf.c
|--- cdf.c 8 May 2009 23:25:46 -0000 1.32
|+++ cdf.c 14 Jul 2009 21:49:10 -0000
--------------------------
Patching file cdf.c using Plan A...
Hunk #1 failed at 770.
Hunk #2 failed at 805.
Hunk #3 failed at 1114.
Hunk #4 failed at 1139.
4 out of 4 hunks failed--saving rejects to cdf.c.rej
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: readcdf.c
|===================================================================
|RCS file: /p/file/cvsroot/file/src/readcdf.c,v
|retrieving revision 1.19
|diff -u -u -r1.19 readcdf.c
|--- readcdf.c 8 May 2009 17:41:59 -0000 1.19
|+++ readcdf.c 14 Jul 2009 21:49:10 -0000
--------------------------
Patching file readcdf.c using Plan A...
Hunk #1 failed at 55.
Hunk #2 failed at 73.
Hunk #3 failed at 100.
3 out of 3 hunks failed--saving rejects to readcdf.c.rej
done

From christos at zoulas.com Thu Jul 16 21:03:45 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 16 Jul 2009 14:03:45 -0400
Subject: crystal reports patch
In-Reply-To: <20090716175039.326C4BE407E@ws1-9.us4.outblaze.com> from "Oscar Usifer" (Jul 16, 12:50pm)
Message-ID: <20090716180346.062245654E@rebar.astron.com>

On Jul 16, 12:50pm, oscaruser at programmer.net ("Oscar Usifer") wrote:
-- Subject: Re: crystal reports patch

| Based on looking at the filext.com site, I set as follows. Also I tried patching the sources file-5.03 sources, but received Hunk patch fails.
|
| Thanks
|
|
| application/x-rpts; charset=binary
|
| http://filext.com/file-extension/RPT
|
| MIME Type
| application/x-rpt
| magnus-internal/rpt

Hmm, I cannot find any documentation on those mime types...
grab a complete tar from ftp://ftp.astron.com/pri/file-5.03.tar.gz

christos

From dnovotny at redhat.com Wed Jul 22 16:53:07 2009
From: dnovotny at redhat.com (Daniel Novotny)
Date: Wed, 22 Jul 2009 09:53:07 -0400 (EDT)
Subject: magic entry for xfs metadump images
In-Reply-To: <749681855.823541248270755564.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <1112285629.823601248270787252.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

hello,

xfs metadump images are generated from xfs filesystems with the "xfs_metadump"
utility, similar to e2image for ext2/3/4

attached a patch with the new magic

best regards,
Daniel Novotny, Red Hat inc.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.03-xfsdump.patch Type: text/x-patch Size: 664 bytes Desc: not available URL: From christos at zoulas.com Wed Jul 22 18:46:50 2009 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 22 Jul 2009 11:46:50 -0400 Subject: magic entry for xfs metadump images In-Reply-To: <1112285629.823601248270787252.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Jul 22, 9:53am) Message-ID: <20090722154650.7CEDC5654E@rebar.astron.com> On Jul 22, 9:53am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: magic entry for xfs metadump images | hello, | | xfs metadump images are generated from xfs filesystems with the "xfs_metadump" | utility, similar to e2image for ext2/3/4 | | attached a patch with the new magic | | best regards, Committed, thanks! christos From dnovotny at redhat.com Tue Jul 28 15:20:48 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Tue, 28 Jul 2009 08:20:48 -0400 (EDT) Subject: file is confused by string "/* (if any) */" in C header and claims it "Lisp/Scheme program text" In-Reply-To: <1466342919.100981248783557221.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <1228973646.101161248783648042.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, in the patterns for "Lisp/Scheme" program text, there is one which searches for the string "(if" - while it's a lisp command (can be a macro or a special form) it can be contained in natural language texts and/or comments in non-Lisp source code in a cases like "(if any)", "(if you please)" and such. the other patterns in lisp magic file (like "(setq" and "(defun" ) will help identify Lisp files and cannot be mistaken, so I solved this situation by removing the "(if" Patch is attached. for the specific case which did us trouble see Red Hat bug report: https://bugzilla.redhat.com/show_bug.cgi?id=510429 best regards, Daniel Novotny, Red Hat inc. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: file-5.03-ifany.patch Type: text/x-patch Size: 507 bytes Desc: not available URL: From christos at zoulas.com Tue Jul 28 15:24:39 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 28 Jul 2009 08:24:39 -0400 Subject: file is confused by string "/* (if any) */" in C header and claims it "Lisp/Scheme program text" In-Reply-To: <1228973646.101161248783648042.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Jul 28, 8:20am) Message-ID: <20090728122439.5DB765654E@rebar.astron.com> On Jul 28, 8:20am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: file is confused by string "/* (if any) */" in C header and claim | hello, | | in the patterns for "Lisp/Scheme" program text, there is one which | searches for the string "(if" - while it's a lisp command (can be a macro or a special form) | it can be contained in natural language texts and/or comments in non-Lisp source | code in a cases like "(if any)", "(if you please)" and such. | | the other patterns in lisp magic file (like "(setq" and "(defun" ) will help | identify Lisp files and cannot be mistaken, so I solved this situation | by removing the "(if" | | Patch is attached. | | for the specific case which did us trouble see Red Hat bug report: | https://bugzilla.redhat.com/show_bug.cgi?id=510429 | | best regards, | | Daniel Novotny, Red Hat inc. I will remove it, thanks. christos From ian at darwinsys.com Tue Jul 28 18:08:25 2009 From: ian at darwinsys.com (Ian Darwin) Date: Tue, 28 Jul 2009 11:08:25 -0400 Subject: Contact/Other In-Reply-To: <1210403570.1248776338849.JavaMail._tomcat@localhost.darwinsys.com> References: <1210403570.1248776338849.JavaMail._tomcat@localhost.darwinsys.com> Message-ID: <4A6F1469.5000906@darwinsys.com> Paul Mannino wrote: > I recently ran your "File for Windows" program across my server. The program crashed on a number of files. Specifically, a bunch of MS artifacts from Visual Studio compilations. 
The file extensions were .IDB, .PDB, and .NCB. It also failed to identify the new MS Office Documents as such. The program thinks they're ZIP files. I work in computer forensics and have an interest in this stuff. I can send you sample files if you wish. > > First, please go back to my contact form and follow the link: "If you are *reporting a problem with my software*, please read this first" . Please do not think you're exempt from doing your homework because you "work in forensics" :-) Secondly, where did you get it from? We only build the Source Code and a few Unix versions; others build the code for other platforms. The problem could be an old version (see the latest version number here: ftp://ftp.astron.com/pub/file/), or a bad build. Or you might have a genuine problem with the file's "magic" data, in which case we'd be glad to work on making the software work better. Please reply to the mailing list, thank you. From jean.revertera at no-log.org Sun Aug 16 13:04:58 2009 From: jean.revertera at no-log.org (Jean Revertera) Date: Sun, 16 Aug 2009 12:04:58 +0200 (CEST) Subject: Duplicate/Wrong magic entry Message-ID: <11971.AQYCXllbDX4=.1250417098.squirrel@webmail.no-log.org> I've just noticed (in file version 5.03) that there are two entries for TNEF files. One of them is in 'Magdir/mail.news': ----- # TNEF files... 0 lelong 0x223E9F78 Transport Neutral Encapsulation Format ----- And the other one in 'Magdir/msdos': ----- # TNEF magic From "Joomy" # Microsoft Outlook's Transport Neutral Encapsulation Format (TNEF) 0 leshort 0x223e9f78 TNEF !:mime application/vnd.ms-tnef ----- The latter one, more complete, should probably be kept... However, the specified value type seems incorrect, since 0x223e9f78 obviously won't fit in a short. 
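To make the size mismatch concrete, here is a minimal Python sketch (not part of the original message). It only illustrates how the same leading bytes read as a 32-bit versus a 16-bit little-endian integer; it makes no claim about how file(1) itself handles an oversized leshort value. The variable names are chosen for illustration.

```python
import struct

# Signature value quoted in both magic entries above.
TNEF_SIGNATURE = 0x223E9F78

# The first four bytes of such a file: the signature stored little-endian.
header = struct.pack("<I", TNEF_SIGNATURE)

# A 32-bit little-endian read (what "lelong" compares) recovers the value.
as_lelong = struct.unpack("<I", header)[0]

# A 16-bit little-endian read (what "leshort" compares) sees only two bytes.
as_leshort = struct.unpack("<H", header[:2])[0]

# A 16-bit field can hold at most 0xFFFF, so the full value cannot match it.
fits_in_short = TNEF_SIGNATURE <= 0xFFFF

print(header.hex(), hex(as_lelong), hex(as_leshort), fits_in_short)
```

A 16-bit comparison could at best be written against the low half of the value, which is why switching the type to lelong is the natural correction.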
Suggested fix: # TNEF magic From "Joomy" # Microsoft Outlook's Transport Neutral Encapsulation Format (TNEF) 0 lelong 0x223e9f78 Transport Neutral Encapsulation Format (TNEF) !:mime application/vnd.ms-tnef Hope this helps, JR From dnovotny at redhat.com Mon Aug 24 12:00:10 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Mon, 24 Aug 2009 05:00:10 -0400 (EDT) Subject: how much machine-dependent is the magic.mgc file? In-Reply-To: <1620051688.139911251104301588.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <110979750.140021251104410025.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, we started having a problem with the magic.mgc file: it used to be the same for i386 and x86_64 architectures, but now it's not (something changed in our build system, perhaps newer gcc or target i586/i686) the problem is with people who use multilib environment and need both i386 and x86_64 file-libs packages: when the packaging system sees both files are the same, it works, but when they differ, it complains and fail to install both packages... seems both files have the same size, but differ inside: [dnovotny at dhcp-0-118 mgc]$ ls -l total 3432 -rw-rw-r-- 1 dnovotny dnovotny 1751200 2009-08-06 13:34 m32b -rw-rw-r-- 1 dnovotny dnovotny 1751200 2009-08-20 16:35 m64b [dnovotny at dhcp-0-118 mgc]$ cmp m32b m64b m32b m64b differ: byte 221, line 1 [dnovotny at dhcp-0-118 mgc]$ is there any configure switch or something like that, which may help me in this situation? thanks, best regards, Daniel Novotny, Red Hat inc. From christos at zoulas.com Mon Aug 24 15:52:16 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 24 Aug 2009 08:52:16 -0400 Subject: how much machine-dependent is the magic.mgc file? 
In-Reply-To: <110979750.140021251104410025.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Aug 24, 5:00am) Message-ID: <20090824125216.6A67C5654E@rebar.astron.com> On Aug 24, 5:00am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: how much machine-dependent is the magic.mgc file? | hello, | | we started having a problem with the magic.mgc file: | it used to be the same for i386 and x86_64 architectures, | but now it's not (something changed in our build system, | perhaps newer gcc or target i586/i686) | | the problem is with people who use multilib environment | and need both i386 and x86_64 file-libs packages: | when the packaging system sees both files are the | same, it works, but when they differ, it complains | and fail to install both packages... | | seems both files have the same size, but differ inside: | | [dnovotny at dhcp-0-118 mgc]$ ls -l | total 3432 | -rw-rw-r-- 1 dnovotny dnovotny 1751200 2009-08-06 13:34 m32b | -rw-rw-r-- 1 dnovotny dnovotny 1751200 2009-08-20 16:35 m64b | [dnovotny at dhcp-0-118 mgc]$ cmp m32b m64b | m32b m64b differ: byte 221, line 1 | [dnovotny at dhcp-0-118 mgc]$ | | is there any configure switch or something like that, | which may help me in this situation? | | thanks, best regards, | | Daniel Novotny, Red Hat inc. Looks like byte 221 is around the mimetype[] array in the the file struct. I looked and there is a memset() when we allocate new magic struct so there could not be junk there. Can you od both files and take a look? christos From dnovotny at redhat.com Mon Aug 24 17:57:27 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Mon, 24 Aug 2009 10:57:27 -0400 (EDT) Subject: how much machine-dependent is the magic.mgc file? 
In-Reply-To: <20090824125216.6A67C5654E@rebar.astron.com> Message-ID: <2037545240.159991251125847371.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> ----- "Christos Zoulas" wrote: > > Looks like byte 221 is around the mimetype[] array in the the file > struct. I looked and there is a memset() when we allocate new magic > struct so there could not be junk there. Can you od both files and > take a look? > > christos [dnovotny at dhcp-0-118 mgc]$ od -x -j 200 m32b | head -20 0000310 0000 0020 1f3d 0005 0000 0000 0000 0000 0000330 0000 0000 007c 0000 0000 0000 0000 0000 0000350 6f43 6162 746c 4e20 7465 6f77 6b72 2073 0000370 6e49 2e63 460a 7269 776d 7261 2065 0076 0000410 6150 6567 2064 4f43 4142 544c 6220 6f6f 0000430 2074 6f72 006d 0000 0000 0000 0000 0000 0000450 0000 0000 0000 0000 0000 0000 0000 0000 * 0000610 0000 0000 0000 0000 0001 0000 0078 0005 0000630 0000 0000 0026 0000 0000 0000 007d 0000 0000650 0000 0000 0000 0000 0000 0000 0000 0000 * 0000710 0000 0000 0000 0000 2556 342e 0073 0000 0000730 0000 0000 0000 0000 0000 0000 0000 0000 * 0001130 0000 0020 1f3d 0005 0000 0000 0000 0000 0001150 0000 0000 0026 0000 0000 0000 0000 0000 0001170 2a28 6854 7369 6920 2073 2061 614d 6874 0001210 6d65 7461 6369 2061 6962 616e 7972 0020 0001230 614d 6874 6d65 7461 6369 2061 6962 616e [dnovotny at dhcp-0-118 mgc]$ od -x -j 200 m64b | head -20 0000310 0000 0020 1f3d 0005 0000 0000 0000 0000 0000330 0000 0000 0229 0000 0000 0000 0000 0000 0000350 4353 3836 4d20 7375 6369 662d 6c69 2065 0000370 202f 6328 2029 4228 4e65 6a29 6d61 0069 0000410 6373 3836 4120 6174 6972 5320 2054 756d 0000430 6973 0063 0000 0000 0000 0000 0000 0000 0000450 0000 0000 0000 0000 0000 0000 0000 0000 * 0000610 0000 0000 0000 0000 0000 0020 1f3d 0005 0000630 0000 0000 0000 0000 0000 0000 0021 0000 0000650 0000 0000 0000 0000 3643 2034 6174 6570 0000670 6920 616d 6567 6620 6c69 0065 0000 0000 0000710 0000 0000 0000 0000 3654 2034 6174 6570 0000730 4920 616d 6567 0000 0000 0000 0000 0000 0000750 
0000 0000 0000 0000 0000 0000 0000 0000 * 0001130 0001 0000 0078 000a 0000 0000 0020 0000 0001150 0000 0000 0022 0000 0000 0000 0000 0000 0001170 0000 0000 0000 0000 0000 0000 0000 0000 btw the files are on http://danielsoft.sweb.cz/f/m32b http://danielsoft.sweb.cz/f/m64b Daniel From christos at zoulas.com Mon Aug 24 21:47:41 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 24 Aug 2009 14:47:41 -0400 Subject: how much machine-dependent is the magic.mgc file? In-Reply-To: <2037545240.159991251125847371.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Aug 24, 10:57am) Message-ID: <20090824184741.5B2695654F@rebar.astron.com> On Aug 24, 10:57am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: Re: how much machine-dependent is the magic.mgc file? | | ----- "Christos Zoulas" wrote: | > | > Looks like byte 221 is around the mimetype[] array in the the file | > struct. I looked and there is a memset() when we allocate new magic | > struct so there could not be junk there. Can you od both files and | > take a look? | > | > christos | | [dnovotny at dhcp-0-118 mgc]$ od -x -j 200 m32b | head -20 | 0000310 0000 0020 1f3d 0005 0000 0000 0000 0000 | 0000330 0000 0000 007c 0000 0000 0000 0000 0000 | 0000350 6f43 6162 746c 4e20 7465 6f77 6b72 2073 | 0000370 6e49 2e63 460a 7269 776d 7261 2065 0076 | 0000410 6150 6567 2064 4f43 4142 544c 6220 6f6f | 0000430 2074 6f72 006d 0000 0000 0000 0000 0000 | 0000450 0000 0000 0000 0000 0000 0000 0000 0000 Cobalt Networks Inc. 
Firmware vPaged COBALT boot rom | * | 0000610 0000 0000 0000 0000 0001 0000 0078 0005 | 0000630 0000 0000 0026 0000 0000 0000 007d 0000 | 0000650 0000 0000 0000 0000 0000 0000 0000 0000 | * | 0000710 0000 0000 0000 0000 2556 342e 0073 0000 | 0000730 0000 0000 0000 0000 0000 0000 0000 0000 | * | 0001130 0000 0020 1f3d 0005 0000 0000 0000 0000 | 0001150 0000 0000 0026 0000 0000 0000 0000 0000 | 0001170 2a28 6854 7369 6920 2073 2061 614d 6874 | 0001210 6d65 7461 6369 2061 6962 616e 7972 0020 | 0001230 614d 6874 6d65 7461 6369 2061 6962 616e | [dnovotny at dhcp-0-118 mgc]$ od -x -j 200 m64b | head -20 | 0000310 0000 0020 1f3d 0005 0000 0000 0000 0000 | 0000330 0000 0000 0229 0000 0000 0000 0000 0000 | 0000350 4353 3836 4d20 7375 6369 662d 6c69 2065 | 0000370 202f 6328 2029 4228 4e65 6a29 6d61 0069 | 0000410 6373 3836 4120 6174 6972 5320 2054 756d | 0000430 6973 0063 0000 0000 0000 0000 0000 0000 | 0000450 0000 0000 0000 0000 0000 0000 0000 0000 SC68 Music-file / (c) (BeN)jamisc68 Atari ST music | * | 0000610 0000 0000 0000 0000 0000 0020 1f3d 0005 | 0000630 0000 0000 0000 0000 0000 0000 0021 0000 | 0000650 0000 0000 0000 0000 3643 2034 6174 6570 | 0000670 6920 616d 6567 6620 6c69 0065 0000 0000 | 0000710 0000 0000 0000 0000 3654 2034 6174 6570 | 0000730 4920 616d 6567 0000 0000 0000 0000 0000 | 0000750 0000 0000 0000 0000 0000 0000 0000 0000 | * | 0001130 0001 0000 0078 000a 0000 0000 0020 0000 | 0001150 0000 0000 0022 0000 0000 0000 0000 0000 | 0001170 0000 0000 0000 0000 0000 0000 0000 0000 | | btw the files are on http://danielsoft.sweb.cz/f/m32b http://danielsoft.sweb.cz/f/m64b Sorting of the magic entries is different/broken? christos From dnovotny at redhat.com Tue Aug 25 14:15:28 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Tue, 25 Aug 2009 07:15:28 -0400 (EDT) Subject: how much machine-dependent is the magic.mgc file? 
In-Reply-To: <607005564.215061251198886404.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <1849956602.215081251198928483.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> ----- "Christos Zoulas" wrote: > Sorting of the magic entries is different/broken? > > christos > in Makefile.am you have: $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR) this can produce different order on different environments (filesystem etc.) I have change it to $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR)/* ensuring alphabetic order patch attached, Daniel Novotny, Red Hat inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.03-multilib.patch Type: text/x-patch Size: 458 bytes Desc: not available URL: From christos at zoulas.com Tue Aug 25 15:36:38 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 25 Aug 2009 08:36:38 -0400 Subject: how much machine-dependent is the magic.mgc file? In-Reply-To: <1849956602.215081251198928483.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Aug 25, 7:15am) Message-ID: <20090825123638.A657F5654E@rebar.astron.com> On Aug 25, 7:15am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: Re: how much machine-dependent is the magic.mgc file? | > Sorting of the magic entries is different/broken? | > | > christos | > | | in Makefile.am you have: | | $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR) | | this can produce different order on different environments (filesystem etc.) | I have change it to | | $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR)/* | | ensuring alphabetic order | | patch attached, | | Daniel Novotny, Red Hat inc. 
| ------=_Part_6834_908321754.1251198928482 | Content-Type: text/x-patch; name=file-5.03-multilib.patch | Content-Transfer-Encoding: 7bit | Content-Disposition: attachment; filename=file-5.03-multilib.patch | | diff -up file-5.03/magic/Makefile.am.multilib file-5.03/magic/Makefile.am | --- file-5.03/magic/Makefile.am.multilib 2009-08-25 12:45:46.000000000 +0200 | +++ file-5.03/magic/Makefile.am 2009-08-25 12:50:09.000000000 +0200 | @@ -234,5 +234,5 @@ FILE_COMPILE_DEP = $(FILE_COMPILE) | endif | | ${MAGIC}: $(EXTRA_DIST) $(FILE_COMPILE_DEP) | - $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR) | + $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR)/* | @mv $(MAGIC_FRAGMENT_BASE).mgc $@ | Ok, I will do that for now! Thanks, christos From asbjorn at asbjorn.biz Wed Aug 26 19:59:42 2009 From: asbjorn at asbjorn.biz (Asbjørn Sloth Tønnesen) Date: Wed, 26 Aug 2009 16:59:42 +0000 Subject: [repost] Font mime types patch Message-ID: <4A9569FE.7020104@asbjorn.biz> Hi, Let's start out with the straightforward stuff first: Spline Font Database ==================== In May 2008 George Williams registered application/vnd.font-fontforge-sfd with IANA, so there's no doubt that it's the official MIME type. Good work. http://www.iana.org/assignments/media-types/application/vnd.font-fontforge-sfd Truetype ======== No registered MIME type, but application/x-font-ttf seems to be the most widely used, followed by application/x-truetype-font, and in the Adobe Flash world application/x-font-truetype is used. http://www.google.com/search?q="application/x-font-ttf" http://www.google.com/search?q="application/x-truetype-font" http://www.google.com/search?q="application/x-font-truetype" http://www.google.com/search?hl=en&q=mime+application+truetype ISO/IEC JTC 1/SC34 are working on a new font top-level media type. But on the other hand they also recognize "application/x-font-ttf" as being the experimental (read: not standardized) de facto MIME type for Truetype fonts. 
OpenType ======== Format developed by Adobe and Microsoft Specs hosted by Adobe, according to Wikipedia, http://partners.adobe.com/public/developer/opentype/index_spec.html but Adobe redirects to Microsoft: http://www.microsoft.com/typography/otspec/ In Microsoft's XPS docs, they say that the OpenFont media type is application/vnd.ms-opentype Google search for application/vnd.ms-opentype on microsoft.com http://www.google.com/search?q=application%2Fvnd.ms-opentype+site%3Amicrosoft.com However, neither Microsoft, Adobe, nor other members of the ISO wg have registered this MIME type with IANA. -- Best regards Asbjørn Sloth Tønnesen Backend System Architect Lila ApS http://lila.io/ -------------- next part -------------- A non-text attachment was scrubbed... Name: fontmimes.patch Type: text/x-patch Size: 804 bytes Desc: not available URL: From rbock at eudoxos.de Sat Sep 12 20:01:12 2009 From: rbock at eudoxos.de (Roland Bock) Date: Sat, 12 Sep 2009 19:01:12 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4A37985B.7020803@eudoxos.de> References: <20090616124244.224325654E@rebar.astron.com> <4A37985B.7020803@eudoxos.de> Message-ID: <4AABD3D8.7090602@eudoxos.de> Hi Christos, never heard back from you about this one. Were you able to reproduce my results? Or did I do something wrong? Regards, Roland Roland Bock wrote: > Christos Zoulas wrote: >> On Jun 16, 8:57am, rbock at eudoxos.de (Roland Bock) wrote: >> -- Subject: HTML files classified as application/octet-stream and >> text/plain >> >> | Hi, >> | | after having had a few minor problems with file-4.21 which comes >> with | Ubuntu-8.04, I upgraded to 5.03. Now I have two problems, an >> old one, | and a new one: >> | | 1) A HTML file with leading blank lines is classified as text/plain: >> | I had the same problem with 4.21. I wonder what I should do? 
I >> assume | that it is not generally advisable to remove all blank lines >> in a file's | content before handing it over to magic_buffer? >> | | 2) A HTML file (nothing special as far as I can see) is classified >> as | application octet stream: >> | The old version detected text/html >> | Other HTML files are classified correctly (did not test more than >> ten | though). >> | | | What's the best way to proceed? Should I send the files to this >> list? >> >> Yes, send the file to me. >> >> christos > > Files are attached :-) > > This is what I did and got: > > ./src/file --magic-file magic/magic.mgc -i ../file/* > ../file/seed_biz_yahoo_archive.html: application/octet-stream; > charset=binary > ../file/test.html: text/plain; charset=us-ascii > > > > Regards, > > Roland > > > ------------------------------------------------------------------------ > > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file From quel at quelrod.net Sat Sep 12 23:15:13 2009 From: quel at quelrod.net (James Nobis) Date: Sat, 12 Sep 2009 15:15:13 -0500 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AABD3D8.7090602@eudoxos.de> References: <20090616124244.224325654E@rebar.astron.com> <4A37985B.7020803@eudoxos.de> <4AABD3D8.7090602@eudoxos.de> Message-ID: <4AAC0151.5010603@quelrod.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Roland, I ran into similar issues as well. When we upgraded from Debian Etch to Debian Lenny and file went from 4.17 to 4.26 we started having Excel files identify as octet-stream. In 4.17 I found which custom rule in /etc/magic.mime got us application/vnd.ms-excel instead of a blank result but that custom rule in 4.26 made no difference. Also, tried going the other way and removing the one octet-stream difference in /usr/share/file/magic.mime. Tried every variation of adding, removing, regenerating the .mgc, removing the .mgc, etc. 
without any change. custom rule: 0 string \xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1 >512 string \x09\x08 application/vnd.ms-excel 5.0.3 out of the box with no custom modifications identifies this fine. I backported 5.0.3 from Debian Squeeze and wrote a wrapper to handle that the output from -i changed. James -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iQIcBAEBCgAGBQJKrAFRAAoJEEMnv34Ar9phMNoQAMewEaudX8W3z94PfC3nUAtO peUIjbh8HzwF9bC+hzid+PpqBwvk0l29ZYGR9n6K6smzk/hIIUlNLtzeL1SyOepR MiDbs88vfloZThbhsa44HdA1+ZKBZ/bFNaM87chYlOCX0ADoF6OtZLlR6Bu0PngO E55RpVDhJX0hQmufmwer2dcA43PzraQw4RPga9uZG4+C0NtaKKsEnjX/ZyDqAqLw XpXlh8qVrgtGRFxaRBz2hIOsw3NqiadVZXNk1CL1ulkyOCNtsP0CUeEnhl9lFDf6 mbgJ9OxQEChOhd4adYJTj2UJkFq8c9zvaILNySmA2vVNNvN6KiC+2iFTJvzGgtW1 oF4Wwq7CrVVsUnUQCuK2LKyz5gKFqxhfSfMHlqxLVgf25rCx4ppdUJLM8ZccZfNR Y07hdvfz3P8zD9L3J8R91oVgLHEJ2UVikhF8XoAJ6i17CjenIADhUtDVUHG8I263 2IuuvdK39qWypYoMIlnNCI01ntpHcPl43jkH+U0IbCtEEigfOWYi9Pt4Tl31SSxR 5aeMhOT+l5JH+6FFsMMwBLE8fpQi8P82iYKNz11BtpOYQ+ANtBN96uIly3O1uU50 g5pYmtea4sKcbENbYA0VuV3IbLeyWLk1MITwoJ4sYXvspJJPbfADpKSJyQvLW7x+ qvjjqhmGn522QIHbzm43 =ilvv -----END PGP SIGNATURE----- From rbock at eudoxos.de Sun Sep 13 12:35:40 2009 From: rbock at eudoxos.de (Roland Bock) Date: Sun, 13 Sep 2009 11:35:40 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AAC0151.5010603@quelrod.net> References: <20090616124244.224325654E@rebar.astron.com> <4A37985B.7020803@eudoxos.de> <4AABD3D8.7090602@eudoxos.de> <4AAC0151.5010603@quelrod.net> Message-ID: <4AACBCEC.2030602@eudoxos.de> James, thanks for the info. Since I experience problems with 4.21 and 5.03, maybe I'll try older versions... Currently, I have several hundred thousand files to classify, most of them are html, but several thousand get mis-interpreted by libmagic (or at least my application using libmagic and Ubuntu's file application). I still hope that this is actually just a misconfiguration problem. 
Regards, Roland James Nobis wrote: > Roland, > > I ran into similar issues as well. When we upgraded from Debian Etch to > Debian Lenny and file went from 4.17 to 4.26 we started having Excel > files identify as octet-stream. > > In 4.17 I found which custom rule in /etc/magic.mime got us > application/vnd.ms-excel instead of a blank result but that custom rule > in 4.26 made no difference. Also, tried going the other way and > removing the one octet-stream difference in /usr/share/file/magic.mime. > Tried every variation of adding, removing, regenerating the .mgc, > removing the .mgc, etc. without any change. > > custom rule: > 0 string \xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1 >> 512 string \x09\x08 application/vnd.ms-excel > > 5.0.3 out of the box with no custom modifications identifies this fine. > I backported 5.0.3 from Debian Squeeze and wrote a wrapper to handle > that the output from -i changed. > > James _______________________________________________ File mailing list File at mx.gw.com http://mx.gw.com/mailman/listinfo/file From gab at no-log.org Mon Sep 14 03:53:02 2009 From: gab at no-log.org (gab) Date: Mon, 14 Sep 2009 02:53:02 +0200 (CEST) Subject: [patch] Fix handling of ~/.magic for python wrapper Message-ID: <34990.AQoGVQwGXno=.1252889582.squirrel@webmail.no-log.org> Hello everyone, I am encountering a discrepancy between the file command and the python magic module: file appends ~/.magic (if it exists) to the default magic file list, while the python wrapping does not. The reason is the handling of ~/.magic is done in the main function of file.c, while the python wrapping directly calls magic_load from libmagic. To refresh the ideas about the handling of magic files, here is the priority (as I understood it from the code, so please, correct me if I am wrong) to select the magic files to use: - retrieve the file(s) from the parameter of the -m option invocation. - if -m was not set, check the MAGIC environment variable. 
- if neither -m nor MAGIC is used, pick the default magic file set by the configure script. In case of a LOAD operation (not for CHECK/COMPILE), also use the ~/.magic file if it exists.

As it would be convenient (at least for me ;) to have a homogeneous behavior for the python wrapping and the file command, computing the default magic files to use should be done in code that ends up in libmagic.

First, I allowed magic_load to handle a NULL parameter for magicfile. The issue with doing that is, in main of file.c, magic_check and magic_compile also start receiving a NULL magicfile parameter. Thus, magic_check and magic_compile also have to be updated to be able to retrieve a default magic path. Still there is an issue (at least in my opinion), because all the magic_* functions now have to be updated to do almost exactly the same thing when magicfile is NULL. Actually, as all those functions map to file_apprentice, providing it an action parameter, it is possible to factor the setting of the default magic path from file_apprentice.

This is why the proposed patch does the following:
- file.c:
  - remove the knowledge of default magic path/file from main.
  - initialize magicfile to NULL
  - compute file (using a function exported from the library) for the -v option
- magic.c:
  - add functions (get_magicpath/get_default_magic) to define the path to magic file(s) according to the following algo:
      if user provided a magic path
        use it
      else if MAGIC env variable is set
        use it
      else if action is LOAD
        use the magic path set at configure
        if ~/.magic is available
          use it as well
      else (action is CHECK or COMPILE)
        use the magic path set at configure time
- magic.h:
  - add prototype for the function computing the magic path (get_magicpath)
- apprentice.c:
  - replace the current behavior when fn is NULL to use

The diff files are the result of diff -C 3 from the file 5.03 source code. Please tell me if you would like another kind of diff.
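The selection order described above can be sketched in a few lines of Python. This is only an illustration of the algorithm as stated in this message, not the code from the attached diffs: the function name `resolve_magic_path`, its parameters, and the injectable `exists` check are all hypothetical, and the relative order of the default path and ~/.magic in the result is an assumption.

```python
import os


def resolve_magic_path(user_path, action, default_path, env=None,
                       exists=os.path.exists):
    """Pick the magic path(s) following the priority described above.

    All names here are hypothetical; `action` is one of "LOAD",
    "CHECK" or "COMPILE", and `default_path` stands for the path
    chosen at configure time.
    """
    env = os.environ if env is None else env

    # 1. An explicit path (the -m option or a caller-supplied argument) wins.
    if user_path:
        return [user_path]

    # 2. Otherwise the MAGIC environment variable is used.
    if env.get("MAGIC"):
        return [env["MAGIC"]]

    # 3. Otherwise fall back to the configure-time default ...
    paths = [default_path]

    # ... and, for LOAD only, also use ~/.magic when it is present.
    # (Appending it after the default is an assumption of this sketch.)
    if action == "LOAD":
        home = env.get("HOME")
        if home:
            candidate = os.path.join(home, ".magic")
            if exists(candidate):
                paths.append(candidate)
    return paths
```

Keeping this logic in one place, reachable from the library, is what would let the file command and the python wrapper agree: a caller passing no path at all would get the same result either way.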
I really hope you'll like this patch as it is ;) but I'll be thankful for any comment (on the patch, or on this already too long email!), and willing to update it according to what you think would be the best for the project. From what I saw in the code, I have further comments for the file interface, but I'll keep them for another round ;) Cheers, -- gab -------------- next part -------------- A non-text attachment was scrubbed... Name: diff_apprentice.c Type: text/x-csrc Size: 636 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: diff_file.c Type: text/x-csrc Size: 2497 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: diff_magic.c Type: text/x-csrc Size: 1197 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: diff_magic.h Type: text/x-chdr Size: 437 bytes Desc: not available URL: From rbock at eudoxos.de Mon Sep 14 07:10:02 2009 From: rbock at eudoxos.de (Roland Bock) Date: Mon, 14 Sep 2009 06:10:02 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090913190107.3B9465654E@rebar.astron.com> References: <20090913190107.3B9465654E@rebar.astron.com> Message-ID: <4AADC21A.5030005@eudoxos.de> Christos, thanks for the patch and the explanation. With internet browsers being as relaxed about standards as they are, I wonder, though, whether a single NUL byte should really disqualify for HTML? Of course, it cannot be a standards conforming HTML file if NUL bytes are included. But since browsers don't care, and show it as HTML anyway, I wonder if "file" should be more strict than they. Just my 2c. 
Regards, Roland Christos Zoulas wrote: > On Sep 13, 11:24am, rbock at eudoxos.de (Roland Bock) wrote: > -- Subject: Re: HTML files classified as application/octet-stream and text/pl > > | Hi, > | > | after resending the mail, your mail system told me the problem: > | > | 550 Error: Sorry, we do not accept .zip file types. > | > | > | Hmm. Can you take the zip from the mailing list archive? Here's the link: > | > | http://mx.gw.com/pipermail/file/attachments/20090616/8a2f228b/attachment.zip > | > | > | Thanks and regards, > | > | Roland > > For the first file, the following patch fixes the problem. The seed* file, > contains a NUL towards the end, so it does not qualify for text tests that > is why it is not classified as HTML. > > christos > > Index: sgml > =================================================================== > RCS file: /p/file/cvsroot/file/magic/Magdir/sgml,v > retrieving revision 1.21 > diff -u -u -r1.21 sgml > --- sgml 22 Jul 2008 15:59:06 -0000 1.21 > +++ sgml 13 Sep 2009 18:59:54 -0000 > @@ -19,18 +19,18 @@ > # HyperText Markup Language (HTML) is an SGML document type, > # from Daniel Quinlan (quinlan at yggdrasil.com) > # adapted to string extenstions by Anthon van der Neut -0 search/1/cB \ +0 search/400/cB \ !:mime text/html > -0 search/1/cb \ +0 search/400/cb \ !:mime text/html > -0 search/1/cb \ +0 search/400/cb \<title HTML document text > !:mime text/html > -0 search/1/cb \<html HTML document text > +0 search/400/cb \<html HTML document text > !:mime text/html > > # Extensible markup language (XML), a subset of SGML > # from Marc Prud'hommeaux (marc at apocalypse.org) > -0 search/1/cb \<?xml XML document text > +0 search/1/cb \<?xml XML document text > !:mime application/xml > 0 string \<?xml\ version\ " XML > !:mime application/xml From christos at zoulas.com Mon Sep 14 14:19:26 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 14 Sep 2009 07:19:26 -0400 Subject: HTML files classified as application/octet-stream and 
text/plain In-Reply-To: <4AADC21A.5030005@eudoxos.de> from Roland Bock (Sep 14, 6:10am) Message-ID: <20090914111926.926FB5654E@rebar.astron.com> On Sep 14, 6:10am, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | Christos, | | thanks for the patch and the explanation. | | With internet browsers being as relaxed about standards as they are, I | wonder, though, whether a single NUL byte should really disqualify for HTML? | | Of course, it cannot be a standards conforming HTML file if NUL bytes | are included. But since browsers don't care, and show it as HTML anyway, | I wonder if "file" should be more strict than they. | I was thinking about changing this to relax the binary test, but it was not clear to me what to do. Ignore one NUL, all NUL's? How about other chars? It is a slippery slope, and if people produce buggy code, there is not much that can be done without having false positives. christos From rbock at eudoxos.de Mon Sep 14 15:14:44 2009 From: rbock at eudoxos.de (Roland Bock) Date: Mon, 14 Sep 2009 14:14:44 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090914111926.926FB5654E@rebar.astron.com> References: <20090914111926.926FB5654E@rebar.astron.com> Message-ID: <4AAE33B4.9070706@eudoxos.de> Christos Zoulas wrote: > On Sep 14, 6:10am, rbock at eudoxos.de (Roland Bock) wrote: > -- Subject: Re: HTML files classified as application/octet-stream and text/pl > > | Christos, > | > | thanks for the patch and the explanation. > | > | With internet browsers being as relaxed about standards as they are, I > | wonder, though, whether a single NUL byte should really disqualify for HTML? > | > | Of course, it cannot be a standards conforming HTML file if NUL bytes > | are included. But since browsers don't care, and show it as HTML anyway, > | I wonder if "file" should be more strict than they. 
> | > > I was thinking about changing this to relax the binary test, but it was not > clear to me what to do. Ignore one NUL, all NUL's? How about other chars? > It is a slippery slope, and if people produce buggy code, there is not much > that can be done without having false positives. > > christos Yes, I know its slippery and I'd rather see the browser developers enforcing syntactically correct HTML, but any hope in this direction is in vain, I am afraid. As a result, I have more problems with "false" negatives. HTML is special in this respect, I think, due to the nature of the Internet. I admit not being good at reading magic file yet, so I have to ask: Is NUL currently the only special character which disqualifies a text? In that case, personally, I would ignore all NULs (for HTML). Otherwise I would go for a certain percentage which is allowed, say 1% bogus characters are accepted. Another option might be to let the user define a confidence value, which decides whether a "close match" is OK or not. The default would be the "strict" handling as it is right now, but I could decide that a certain amount of non-matching noise is OK, thereby saying that I accept false positives. Regards, Roland From christos at zoulas.com Mon Sep 14 16:56:00 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 14 Sep 2009 09:56:00 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AAE33B4.9070706@eudoxos.de> from Roland Bock (Sep 14, 2:14pm) Message-ID: <20090914135600.57C645654E@rebar.astron.com> On Sep 14, 2:14pm, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | Yes, I know its slippery and I'd rather see the browser developers | enforcing syntactically correct HTML, but any hope in this direction is | in vain, I am afraid. | As a result, I have more problems with "false" negatives. 
HTML is | special in this respect, I think, due to the nature of the Internet. Well, in this case I can probably get yahoo to fix their problem... | I admit not being good at reading magic file yet, so I have to ask: | Is NUL currently the only special character which disqualifies a text? | In that case, personally, I would ignore all NULs (for HTML). The problem is that at the time you strip the NULs you don't know that you have HTML. | Otherwise I would go for a certain percentage which is allowed, say 1% | bogus characters are accepted. That would lead to a lot of false positives. | Another option might be to let the user define a confidence value, which | decides whether a "close match" is OK or not. The default would be the | "strict" handling as it is right now, but I could decide that a certain | amount of non-matching noise is OK, thereby saying that I accept false | positives. We could special-case the NUL test though, and that might be good enough. christos From rbock at eudoxos.de Mon Sep 14 18:28:40 2009 From: rbock at eudoxos.de (Roland Bock) Date: Mon, 14 Sep 2009 17:28:40 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090914135600.57C645654E@rebar.astron.com> References: <20090914135600.57C645654E@rebar.astron.com> Message-ID: <4AAE6128.5090402@eudoxos.de> Christos Zoulas wrote: > | Yes, I know its slippery and I'd rather see the browser developers > | enforcing syntactically correct HTML, but any hope in this direction is > | in vain, I am afraid. > | As a result, I have more problems with "false" negatives. HTML is > | special in this respect, I think, due to the nature of the Internet. > > Well, in this case I can probably get yahoo to fix their problem... I am not even sure if the problem still persists because it is an older document. > | I admit not being good at reading magic file yet, so I have to ask: > | Is NUL currently the only special character which disqualifies a text? 
> | In that case, personally, I would ignore all NULs (for HTML). > > The problem is that at the time you strip the NULs you don't know that > you have HTML. I feared so. > | Otherwise I would go for a certain percentage which is allowed, say 1% > | bogus characters are accepted. > > That would lead to a lot of false positives. You certainly have more experience here. What kind of documents would that be? Some kind of binary files with embedded HTML? > | Another option might be to let the user define a confidence value, which > | decides whether a "close match" is OK or not. The default would be the > | "strict" handling as it is right now, but I could decide that a certain > | amount of non-matching noise is OK, thereby saying that I accept false > | positives. > > We could special-case the NUL test though, and that might be good enough. I could run a test with several thousand files and analyze the differences if you send me the appropriate patch for 5.03. My test would be targeted towards "false" negatives, since most of the samples are supposed to be HTML. Regards, Roland From christos at zoulas.com Mon Sep 14 18:48:53 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 14 Sep 2009 11:48:53 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AAE6128.5090402@eudoxos.de> from Roland Bock (Sep 14, 5:28pm) Message-ID: <20090914154853.B98925654E@rebar.astron.com> On Sep 14, 5:28pm, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | Christos Zoulas wrote: | > | Yes, I know its slippery and I'd rather see the browser developers | > | enforcing syntactically correct HTML, but any hope in this direction is | > | in vain, I am afraid. | > | As a result, I have more problems with "false" negatives. HTML is | > | special in this respect, I think, due to the nature of the Internet. 
| > | > Well, in this case I can probably get yahoo to fix their problem... | | I am not even sure if the problem still persists because it is an older | document. Well, anyway I've contacted my friend at Yahoo, and he might be sending you mail asking for details :-) They definitely want to fix their problems. | > | I admit not being good at reading magic file yet, so I have to ask: | > | Is NUL currently the only special character which disqualifies a text? | > | In that case, personally, I would ignore all NULs (for HTML). | > | > The problem is that at the time you strip the NULs you don't know that | > you have HTML. | | I feared so. | | > | Otherwise I would go for a certain percentage which is allowed, say 1% | > | bogus characters are accepted. | > | > That would lead to a lot of false positives. | | You certainly have more experience here. What kind of documents would | that be? Some kind of binary files with embedded HTML? Yes, exactly. Well, I guess just excluding NULs is not too bad in that case. But the issue is that I'd have to re-work the order of the logic and I am not feeling up to it right now :-) | > | Another option might be to let the user define a confidence value, which | > | decides whether a "close match" is OK or not. The default would be the | > | "strict" handling as it is right now, but I could decide that a certain | > | amount of non-matching noise is OK, thereby saying that I accept false | > | positives. | > | > We could special-case the NUL test though, and that might be good enough. | | I could run a test with several thousand files and analyze the | differences if you send me the appropriate patch for 5.03. | | My test would be targeted towards "false" negatives, since most of the | samples are supposed to be HTML. Ok, so you are seeing lots of HTML files with NULs in them? All from Yahoo? Or is there another parsing bug?
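[Editorial illustration, not part of the original message.] The idea of special-casing NUL discussed above can be sketched in Python. This is only a rough model under stated assumptions: the helper names, the set of allowed control bytes and the 400-byte search window are hypothetical choices for the example, and none of it is libmagic's actual text test or API.

```python
import re

# Hypothetical pattern: a few HTML openers, matched case-insensitively.
HTML_HINT = re.compile(rb"<(!doctype\s+html|html|head|title)", re.IGNORECASE)

# Control bytes tolerated by this crude text test (tab, LF, FF, CR, ESC).
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def strip_nuls(data: bytes) -> bytes:
    """Drop every NUL byte before classification (the special case)."""
    return data.replace(b"\x00", b"")


def looks_like_text(data: bytes) -> bool:
    """Reject any buffer containing a control byte outside the allowed set."""
    return all(b >= 0x20 or b in ALLOWED_CONTROLS for b in data)


def guess_html(data: bytes, window: int = 400) -> bool:
    """Ignore NULs, require text, then look for an HTML opener near the start."""
    cleaned = strip_nuls(data)
    return looks_like_text(cleaned) and HTML_HINT.search(cleaned[:window]) is not None
```

With this sketch a page that has leading blank lines and a stray NUL still counts as HTML, while a buffer containing other control bytes is still rejected, which is the trade-off between false negatives and false positives that the thread is weighing.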
christos From rbock at eudoxos.de Mon Sep 14 19:42:33 2009 From: rbock at eudoxos.de (Roland Bock) Date: Mon, 14 Sep 2009 18:42:33 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090914154853.B98925654E@rebar.astron.com> References: <20090914154853.B98925654E@rebar.astron.com> Message-ID: <4AAE7279.3020801@eudoxos.de> Christos Zoulas wrote: > | > | Otherwise I would go for a certain percentage which is allowed, say 1% > | > | bogus characters are accepted. > | > > | > That would lead to a lot of false positives. > | > | You certainly have more experience here. What kind of documents would > | that be? Some kind of binary files with embedded HTML? > > Yes, exactly. Well, I guess just excluding NUL's is not too bad in that > case. But the issue is that I'd have to re-work the order of the logic > and I am not feeling up to it right now :-) No problem. Knowing this, i can prepare my data accordingly :-) > | > | Another option might be to let the user define a confidence value, which > | > | decides whether a "close match" is OK or not. The default would be the > | > | "strict" handling as it is right now, but I could decide that a certain > | > | amount of non-matching noise is OK, thereby saying that I accept false > | > | positives. > | > > | > We could special-case the NUL test though, and that might be good enough. > | > | I could run a test with several thousand files and analyze the > | differences if you send me the appropriate patch for 5.03. > | > | My test would be targeted towards "false" negatives, since most of the > | samples are supposed to be HTML. > > Ok, so you are seeing lots of html files with NUL's in them? All from yahoo? > Or there is another parsing bug? Good questions (seems I was too focused with the first two effects): *) Only a small percentage of documents are from Yahoo. The others come from a large variety of hosts. 
*) About half of the problematic documents are OK when leading spaces and/or newlines are skipped *) Several are classified as application/octet-stream (I have to check if NUL is always the reason *) There are several classified as c++ or pascal, but I have to check those against 5.03 (most of the time I am using 4.26 which comes with Ubuntu-8.04). The ones I took a closer look at, start with comments, sometimes syntactically invalid comments... *) Some are classified as xml but are in fact xhtml. Thus, the classification is not wrong, but it is not right either... Those are the things from the top of my head. I will try to compile a set of examples. What would be your preferred format? Zip was rejected by your mail server last time... Would tar be OK? Regards, Roland From christos at zoulas.com Mon Sep 14 20:48:27 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 14 Sep 2009 13:48:27 -0400 Subject: [patch] Fix handling of ~/.magic for python wrapper In-Reply-To: <34990.AQoGVQwGXno=.1252889582.squirrel@webmail.no-log.org> from "gab" (Sep 14, 2:53am) Message-ID: <20090914174827.7526F5654E@rebar.astron.com> On Sep 14, 2:53am, gab at no-log.org ("gab") wrote: -- Subject: [patch] Fix handling of ~/.magic for python wrapper | Hello everyone, | | I am encountering a discrepancy between the file command and the python magic | module: file appends ~/.magic (if it exists) to the default magic file list, | while the python wrapping does not. | The reason is the handling of ~/.magic is done in the main function of file.c, | while the python wrapping directly calls magic_load from libmagic. | | To refresh the ideas about the handling of magic files, here is the priority | (as I understood it from the code, so please, correct me if I am wrong) to | select the magic files to use: | - retrieve the file(s) from the parameter of the -m option invocation. | - if -m was not set, check the MAGIC environment variable. 
| - if neither -m nor MAGIC are used, pick the default magic file set by the | configure script. | In case of LOAD operation (not for CHECK/COMPILE), also use the ~/.magic file | if it exist. | | As it would be convenient (at least for me ;) to have an homogeneous behavior | for the python wrapping and the file command, computing the default magic | files | to use should be done within some code ending into libmagic. | | First, I allowed magic_load to handle a NULL parameter for magicfile. The issue | with doing that is, in main of file.c, magic_check and magic_compile also start | receiving a NULL magicfile parameter. Thus, magic_check and magic_compile also | have to be updated to be able to retrieve a default magic path. | Still there is an issue (at least in my opinion), because all the magic_* | function now have to be updated to do almost exactly the same thing when | magicfile is NULL. | Actually, as all those functions maps to file_apprentice, providing it an action | parameter, it is possible to factor the setting of the default magic path from | file_apprentice. | | This is why the proposed patch does the following: | - file.c: | - removes the knowledge of default magic path/file from main. | - initialize magicfile to NULL | - compute file (using a function exported from the library) for the -v | option | | - magic.c: | - adding functions (get_magicpath/get_default_magic) to define the path to | magic file(s) according to the following algo: | | if user provided a magic path | use it | else | if MAGIC env variable is set | use it | else | if action is LOAD | use the magic path set at configure | if ~/.magic is available | use it as well | else (action is CHECK or COMPILE) | use the magic path set at configure time | | - magic.h: | - add prototype for the function computing the magic path (get_magicpath) | | - apprentice.c: | - replace the current behavior when fn is NULL to use | | The diff files are the result of diff -C 3 from the file 5.03 source code. 
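[Editorial illustration, not part of the quoted message.] The selection order in the quoted pseudocode can be sketched in Python. The name get_magicpath is taken from the patch description; the signature, the colon separator between list entries, and placing the configured default before ~/.magic are assumptions made for this example, not the code that was committed.

```python
import os


def get_magicpath(user_path, action, default_path, env=None, home=None):
    """Pick the magic file list following the priority order described above.

    user_path    -- value given by the caller (e.g. the -m option), or None
    action       -- "LOAD", "CHECK" or "COMPILE"
    default_path -- path chosen at configure time
    env, home    -- injectable for testing; default to the real environment
    """
    env = os.environ if env is None else env
    if user_path:
        return user_path
    if env.get("MAGIC"):
        return env["MAGIC"]
    if action == "LOAD":
        home = home or env.get("HOME")
        if home:
            personal = os.path.join(home, ".magic")
            if os.path.exists(personal):
                # Assumed ordering: configured default first, then ~/.magic.
                return default_path + ":" + personal
    # CHECK/COMPILE, or LOAD without a usable ~/.magic.
    return default_path
```

Keeping this logic in one library-side function is the point of the proposal: the file command and any wrapper that calls magic_load with no explicit path would then resolve the same list.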
| Please tell me if you would like another kind of diff. | | I really hope you'll like this patch as it is ;) | but I'll be thankful for any comment (on the patch, or on this already too | long email!), and willing to update it according to what you think would be | the | best for the project. | | >From what I saw in the code, I have further comments for the file interface, | but I'll keep them for another round ;) I committed a change based on yours! Thanks, christos From christos at zoulas.com Mon Sep 14 20:52:54 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 14 Sep 2009 13:52:54 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AAE7279.3020801@eudoxos.de> from Roland Bock (Sep 14, 6:42pm) Message-ID: <20090914175254.C58485654E@rebar.astron.com> On Sep 14, 6:42pm, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | Good questions (seems I was too focused with the first two effects): | | *) Only a small percentage of documents are from Yahoo. The others come | from a large variety of hosts. | | *) About half of the problematic documents are OK when leading spaces | and/or newlines are skipped | | *) Several are classified as application/octet-stream (I have to check | if NUL is always the reason | | *) There are several classified as c++ or pascal, but I have to check | those against 5.03 (most of the time I am using 4.26 which comes with | Ubuntu-8.04). The ones I took a closer look at, start with comments, | sometimes syntactically invalid comments... | | *) Some are classified as xml but are in fact xhtml. Thus, the | classification is not wrong, but it is not right either... | | Those are the things from the top of my head. I will try to compile a | set of examples. What would be your preferred format? Zip was rejected | by your mail server last time... Would tar be OK? 
Posting them to the gw bugs list and sending a link works very well. This way we can keep a log of them in case others need to access them. Thanks, christos From kimmo at suominen.com Mon Sep 14 22:55:38 2009 From: kimmo at suominen.com (Kimmo Suominen) Date: Mon, 14 Sep 2009 22:55:38 +0300 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090914175254.C58485654E@rebar.astron.com> References: <4AAE7279.3020801@eudoxos.de> <20090914175254.C58485654E@rebar.astron.com> Message-ID: <b32e77100909141255w3c191babh9b893db3536d0284@mail.gmail.com> On Mon, Sep 14, 2009 at 20:52, Christos Zoulas <christos at zoulas.com> wrote: > Posting them to the gw bugs list and sending a link works very well. > This way we can keep a log of them in case others need to access them. > On that cue, I've created a "file" project on http://bugs.gw.com/ :) I haven't kept current on file(1) development, so I didn't know what versions should be filled in. Christos and I can currently add more versions, if you've got a list handy. I could also add more people with privileges, which could be useful for maintaining the tickets. I don't think CVS integration works anymore from Astron, so maybe that should be looked at as well... (It only works from GW UNIX systems, but CVS has been moved to Astron quite some time ago.) Cheers, + Kim -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mx.gw.com/pipermail/file/attachments/20090914/4c95ba75/attachment-0001.html> From christos at zoulas.com Mon Sep 14 23:12:34 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 14 Sep 2009 16:12:34 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <b32e77100909141255w3c191babh9b893db3536d0284@mail.gmail.com> from Kimmo Suominen (Sep 14, 10:55pm) Message-ID: <20090914201234.8C4605654E@rebar.astron.com> On Sep 14, 10:55pm, kimmo at suominen.com (Kimmo Suominen) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | On Mon, Sep 14, 2009 at 20:52, Christos Zoulas <christos at zoulas.com> wrote: | | > Posting them to the gw bugs list and sending a link works very well. | > This way we can keep a log of them in case others need to access them. | > | | On that cue, I've created a "file" project on http://bugs.gw.com/ :) I always assumed it existed, good thing you are reading this list Kim :-) | I haven't kept current on file(1) development, so I didn't know what | versions should be filled in. Christos and I can currently add more | versions, if you've got a list handy. I could also add more people with | privileges, which could be useful for maintaining the tickets. I will add some versions. | I don't think CVS integration works anymore from Astron, so maybe that | should be looked at as well... (It only works from GW UNIX systems, but CVS | has been moved to Astron quite some time ago.) Yes, we could look into it... This machine needs to be upgraded badly. It has failing memory and ancient software. I have the new hardware, I just need time. 
christos From rbock at eudoxos.de Tue Sep 15 00:40:44 2009 From: rbock at eudoxos.de (Roland Bock) Date: Mon, 14 Sep 2009 23:40:44 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090914175254.C58485654E@rebar.astron.com> References: <20090914175254.C58485654E@rebar.astron.com> Message-ID: <4AAEB85C.7010807@eudoxos.de> Christos Zoulas wrote: > On Sep 14, 6:42pm, rbock at eudoxos.de (Roland Bock) wrote: > -- Subject: Re: HTML files classified as application/octet-stream and text/pl > > | Good questions (seems I was too focused with the first two effects): > | > | *) Only a small percentage of documents are from Yahoo. The others come > | from a large variety of hosts. > | > | *) About half of the problematic documents are OK when leading spaces > | and/or newlines are skipped > | > | *) Several are classified as application/octet-stream (I have to check > | if NUL is always the reason > | > | *) There are several classified as c++ or pascal, but I have to check > | those against 5.03 (most of the time I am using 4.26 which comes with > | Ubuntu-8.04). The ones I took a closer look at, start with comments, > | sometimes syntactically invalid comments... > | > | *) Some are classified as xml but are in fact xhtml. Thus, the > | classification is not wrong, but it is not right either... > | > | Those are the things from the top of my head. I will try to compile a > | set of examples. What would be your preferred format? Zip was rejected > | by your mail server last time... Would tar be OK? I posted a gzipped tarball. It currently awaits moderator approval because of its size (600k), oops, about 40 real-world examples... If that's too much for this list, we'll have to think of something else :-) Some statistics: I ran MIME-type detection in 3 different configurations on 50,000 internet files. Most of them are HTML, some are PDF, some are text, some are trash... I decided to trust text/html and application/pdf.
Never saw a false positive there. Classified to be something else were 5302 files with 5.03 1400 files with 5.03 and the patch you sent 1162 files with 5.03, plus the patch and NUL removed The improvements from patch and ignoring NUL are quite good already. The samples in the named tarball are from the last run. About 90% of the 1162 files are HTML (if you ask a browser). The samples have been chosen to represent the different types of files I saw. I know that some of these files will probably never be classified "correctly". But browsers and humans would not hesitate to classify most of these files as HTML, I'd say. Regards, Roland From rbock at eudoxos.de Tue Sep 15 10:49:13 2009 From: rbock at eudoxos.de (Roland Bock) Date: Tue, 15 Sep 2009 09:49:13 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AAEB85C.7010807@eudoxos.de> References: <20090914175254.C58485654E@rebar.astron.com> <4AAEB85C.7010807@eudoxos.de> Message-ID: <4AAF46F9.80002@eudoxos.de> Roland Bock wrote: > I posted a gzipped tarball. It currently awaits moderator approval > because of its size (600k), oops, about 40 real word examples... If > that's too much for this list, we'll have to think of something else :-) OK, it got dismissed for the obvious reason :-) As suggested by the rejection message, I filed a bug report with the attachment here http://bugs.gw.com/view.php?id=89 Regards, Roland From christos at zoulas.com Wed Sep 16 00:19:53 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 15 Sep 2009 17:19:53 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AAF46F9.80002@eudoxos.de> from Roland Bock (Sep 15, 9:49am) Message-ID: <20090915211953.330725654F@rebar.astron.com> On Sep 15, 9:49am, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | Roland Bock wrote: | > I posted a gzipped tarball. 
It currently awaits moderator approval | > because of its size (600k), oops, about 40 real word examples... If | > that's too much for this list, we'll have to think of something else :-) | | OK, it got dismissed for the obvious reason :-) | | As suggested by the rejection message, I filed a bug report with the | attachment here | | http://bugs.gw.com/view.php?id=89 Thanks, will take a look :-) christos From christos at zoulas.com Wed Sep 16 01:33:45 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 15 Sep 2009 18:33:45 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090915211953.330725654F@rebar.astron.com> from Christos Zoulas (Sep 15, 5:19pm) Message-ID: <20090915223345.438C75654E@rebar.astron.com> On Sep 15, 5:19pm, christos at zoulas.com (Christos Zoulas) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | On Sep 15, 9:49am, rbock at eudoxos.de (Roland Bock) wrote: | -- Subject: Re: HTML files classified as application/octet-stream and text/pl | | | Roland Bock wrote: | | > I posted a gzipped tarball. It currently awaits moderator approval | | > because of its size (600k), oops, about 40 real word examples... If | | > that's too much for this list, we'll have to think of something else :-) | | | | OK, it got dismissed for the obvious reason :-) | | | | As suggested by the rejection message, I filed a bug report with the | | attachment here | | | | http://bugs.gw.com/view.php?id=89 | This patch adds /bt [binary|text] to the regex and string tests, and changes b->w and B->W. Search lengths and keywords are added and adjusted. This is just a proposal, which is why there is no doc update or version bump of the format yet. Please let me know what you think.
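[Editorial illustration, not part of the original message.] A minimal Python sketch of the b->w and B->W rename described above, assuming it is meant to touch only the modifier suffix of the type column in string and search entries and to leave the offset, pattern and description columns alone. The helper names are hypothetical and this is not the tooling that produced the patch.

```python
import re

# Old modifier letters mapped to the proposed new ones.
RENAMES = str.maketrans({"b": "w", "B": "W"})


def _is_count(part: str) -> bool:
    """True for a numeric search range such as 400 or 0x1b."""
    try:
        int(part, 0)
        return True
    except ValueError:
        return False


def rename_flags(line: str) -> str:
    """Rename b/B to w/W in the type field of one magic entry."""
    if line.startswith(("#", "!:")):
        return line  # comments and !:mime / !:strength annotations
    m = re.match(r"(\S+\s+)(\S+)(.*)", line, re.DOTALL)
    if m is None:
        return line
    offset, type_field, rest = m.groups()
    name, *mods = type_field.split("/")
    if name not in ("string", "search"):
        return line
    mods = [p if _is_count(p) else p.translate(RENAMES) for p in mods]
    return offset + "/".join([name, *mods]) + rest
```

Restricting the substitution to the type column matters because the pattern column is full of literal b characters, for example in interpreter paths such as /bin/sh, which a plain global replace would also rewrite.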
christos Index: magic/Magdir/animation =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/animation,v retrieving revision 1.37 diff -u -u -r1.37 animation --- magic/Magdir/animation 9 Jun 2009 21:11:59 -0000 1.37 +++ magic/Magdir/animation 15 Sep 2009 22:31:13 -0000 @@ -29,7 +29,7 @@ #!:mime image/x-quicktime 4 string pckg Apple QuickTime compressed archive !:mime application/x-quicktime-player -4 string/B jP JPEG 2000 image +4 string/W jP JPEG 2000 image !:mime image/jp2 4 string ftyp ISO Media >8 string isom \b, MPEG v4 system, version 1 @@ -41,7 +41,7 @@ !:mime video/mp4 >8 string mp7t \b, MPEG v4 system, MPEG v7 XML >8 string mp7b \b, MPEG v4 system, MPEG v7 binary XML ->8 string/B jp2 \b, JPEG 2000 +>8 string/W jp2 \b, JPEG 2000 !:mime image/jp2 >8 string 3gp \b, MPEG v4 system, 3GPP !:mime video/3gpp @@ -52,13 +52,13 @@ !:mime video/mp4 >8 string avc1 \b, MPEG v4 system, 3GPP JVT AVC !:mime video/3gpp ->8 string/B M4A \b, MPEG v4 system, iTunes AAC-LC +>8 string/W M4A \b, MPEG v4 system, iTunes AAC-LC !:mime audio/mp4 ->8 string/B M4V \b, MPEG v4 system, iTunes AVC-LC +>8 string/W M4V \b, MPEG v4 system, iTunes AVC-LC !:mime video/mp4 ->8 string/B M4P \b, MPEG v4 system, iTunes AES encrypted ->8 string/B M4B \b, MPEG v4 system, iTunes bookmarked ->8 string/B qt \b, Apple QuickTime movie +>8 string/W M4P \b, MPEG v4 system, iTunes AES encrypted +>8 string/W M4B \b, MPEG v4 system, iTunes bookmarked +>8 string/W qt \b, Apple QuickTime movie !:mime video/quicktime # MPEG sequences @@ -720,16 +720,16 @@ 3 string \x0D\x0AVersion:Vivo Vivo video data # VRML (Virtual Reality Modelling Language) -0 string/b #VRML\ V1.0\ ascii VRML 1 file +0 string/w #VRML\ V1.0\ ascii VRML 1 file !:mime model/vrml -0 string/b #VRML\ V2.0\ utf8 ISO/IEC 14772 VRML 97 file +0 string/w #VRML\ V2.0\ utf8 ISO/IEC 14772 VRML 97 file !:mime model/vrml # X3D (Extensible 3D) [http://www.web3d.org/specifications/x3d-3.0.dtd] # From 
Michel Briand <michelbriand at free.fr> 0 string \<?xml\ version=" !:strength +1 ->20 search/1000/cb \<!DOCTYPE\ X3D X3D (Extensible 3D) model xml text +>20 search/1000/cw \<!DOCTYPE\ X3D X3D (Extensible 3D) model xml text !:mime model/x3d #--------------------------------------------------------------------------- Index: magic/Magdir/cddb =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/cddb,v retrieving revision 1.3 diff -u -u -r1.3 cddb --- magic/Magdir/cddb 7 Mar 2008 19:51:02 -0000 1.3 +++ magic/Magdir/cddb 15 Sep 2009 22:31:13 -0000 @@ -7,4 +7,4 @@ # CDDB-enabled CD player applications. # -0 search/1/b #\040xmcd CDDB(tm) format CD text data +0 search/1/w #\040xmcd CDDB(tm) format CD text data Index: magic/Magdir/commands =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/commands,v retrieving revision 1.34 diff -u -u -r1.34 commands --- magic/Magdir/commands 8 Apr 2008 12:22:14 -0000 1.34 +++ magic/Magdir/commands 15 Sep 2009 22:31:13 -0000 @@ -3,66 +3,66 @@ # commands: file(1) magic for various shells and interpreters # #0 string : shell archive or script for antique kernel text -0 string/b #!\ /bin/sh POSIX shell script text executable +0 string/w #!\ /bin/sh POSIX shell script text executable !:mime text/x-shellscript -0 string/b #!\ /bin/csh C shell script text executable +0 string/w #!\ /bin/csh C shell script text executable !:mime text/x-shellscript # korn shell magic, sent by George Wu, gwu at clyde.att.com -0 string/b #!\ /bin/ksh Korn shell script text executable +0 string/w #!\ /bin/ksh Korn shell script text executable !:mime text/x-shellscript -0 string/b #!\ /bin/tcsh Tenex C shell script text executable +0 string/w #!\ /bin/tcsh Tenex C shell script text executable !:mime text/x-shellscript -0 string/b #!\ /usr/local/tcsh Tenex C shell script text executable +0 string/w #!\ /usr/local/tcsh Tenex C shell script text
executable !:mime text/x-shellscript -0 string/b #!\ /usr/local/bin/tcsh Tenex C shell script text executable +0 string/w #!\ /usr/local/bin/tcsh Tenex C shell script text executable !:mime text/x-shellscript # # zsh/ash/ae/nawk/gawk magic from cameron at cs.unsw.oz.au (Cameron Simpson) -0 string/b #!\ /bin/zsh Paul Falstad's zsh script text executable +0 string/w #!\ /bin/zsh Paul Falstad's zsh script text executable !:mime text/x-shellscript -0 string/b #!\ /usr/bin/zsh Paul Falstad's zsh script text executable +0 string/w #!\ /usr/bin/zsh Paul Falstad's zsh script text executable !:mime text/x-shellscript -0 string/b #!\ /usr/local/bin/zsh Paul Falstad's zsh script text executable +0 string/w #!\ /usr/local/bin/zsh Paul Falstad's zsh script text executable !:mime text/x-shellscript -0 string/b #!\ /usr/local/bin/ash Neil Brown's ash script text executable +0 string/w #!\ /usr/local/bin/ash Neil Brown's ash script text executable !:mime text/x-shellscript -0 string/b #!\ /usr/local/bin/ae Neil Brown's ae script text executable +0 string/w #!\ /usr/local/bin/ae Neil Brown's ae script text executable !:mime text/x-shellscript -0 string/b #!\ /bin/nawk new awk script text executable +0 string/w #!\ /bin/nawk new awk script text executable !:mime text/x-nawk -0 string/b #!\ /usr/bin/nawk new awk script text executable +0 string/w #!\ /usr/bin/nawk new awk script text executable !:mime text/x-nawk -0 string/b #!\ /usr/local/bin/nawk new awk script text executable +0 string/w #!\ /usr/local/bin/nawk new awk script text executable !:mime text/x-nawk -0 string/b #!\ /bin/gawk GNU awk script text executable +0 string/w #!\ /bin/gawk GNU awk script text executable !:mime text/x-gawk -0 string/b #!\ /usr/bin/gawk GNU awk script text executable +0 string/w #!\ /usr/bin/gawk GNU awk script text executable !:mime text/x-gawk -0 string/b #!\ /usr/local/bin/gawk GNU awk script text executable +0 string/w #!\ /usr/local/bin/gawk GNU awk script text executable !:mime text/x-gawk #
-0 string/b #!\ /bin/awk awk script text executable +0 string/w #!\ /bin/awk awk script text executable !:mime text/x-awk -0 string/b #!\ /usr/bin/awk awk script text executable +0 string/w #!\ /usr/bin/awk awk script text executable !:mime text/x-awk # update to distinguish from *.vcf files # this is broken because postscript has /EBEGIN{ for example. -#0 search/Bb BEGIN { awk script text +#0 search/Ww BEGIN { awk script text # AT&T Bell Labs' Plan 9 shell -0 string/b #!\ /bin/rc Plan 9 rc shell script text executable +0 string/w #!\ /bin/rc Plan 9 rc shell script text executable # bash shell magic, from Peter Tobias (tobias at server.et-inf.fho-emden.de) -0 string/b #!\ /bin/bash Bourne-Again shell script text executable +0 string/w #!\ /bin/bash Bourne-Again shell script text executable !:mime text/x-shellscript -0 string/b #!\ /usr/local/bin/bash Bourne-Again shell script text executable +0 string/w #!\ /usr/local/bin/bash Bourne-Again shell script text executable !:mime text/x-shellscript # using env 0 string #!/usr/bin/env a >15 string >\0 %s script text executable 0 string #!\ /usr/bin/env a >16 string >\0 %s script text executable # PHP scripts @@ -73,9 +73,9 @@ !:mime text/x-php 0 search/1 =<?\r PHP script text !:mime text/x-php -0 search/1/b #!\ /usr/local/bin/php PHP script text executable +0 search/1/w #!\ /usr/local/bin/php PHP script text executable !:mime text/x-php -0 search/1/b #!\ /usr/bin/php PHP script text executable +0 search/1/w #!\ /usr/bin/php PHP script text executable !:mime text/x-php 0 string Zend\x00 PHP script Zend Optimizer data Index: magic/Magdir/inform =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/inform,v retrieving revision 1.4 diff -u -u -r1.4 inform --- magic/Magdir/inform 7 Mar 2008 19:51:05 -0000 1.4 +++ magic/Magdir/inform 15 Sep 2009 22:31:13 -0000 @@ -5,4 +5,4 @@ # URL: http://www.inform-fiction.org/
# From: Reuben Thomas <rrt at sc3d.org> -0 search/cB/100 constant\ story Inform source text +0 search/100/cW constant\ story Inform source text Index: magic/Magdir/lua =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/lua,v retrieving revision 1.4 diff -u -u -r1.4 lua --- magic/Magdir/lua 25 Aug 2008 23:56:32 -0000 1.4 +++ magic/Magdir/lua 15 Sep 2009 22:31:13 -0000 @@ -4,9 +4,9 @@ # From: Reuben Thomas <rrt at sc3d.org>, Seo Sanghyeon <tinuviel at sparcs.kaist.ac.kr> # Lua scripts -0 search/1/b #!\ /usr/bin/lua Lua script text executable +0 search/1/w #!\ /usr/bin/lua Lua script text executable !:mime text/x-lua -0 search/1/b #!\ /usr/local/bin/lua Lua script text executable +0 search/1/w #!\ /usr/local/bin/lua Lua script text executable !:mime text/x-lua 0 search/1 #!/usr/bin/env\ lua Lua script text executable !:mime text/x-lua Index: magic/Magdir/msdos =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/msdos,v retrieving revision 1.64 diff -u -u -r1.64 msdos --- magic/Magdir/msdos 12 Dec 2008 21:05:21 -0000 1.64 +++ magic/Magdir/msdos 15 Sep 2009 22:31:13 -0000 @@ -6,13 +6,13 @@ # .BAT files (Daniel Quinlan, quinlan at yggdrasil.com) # updated by Joerg Jenderek at Oct 2008 0 string @ ->1 string/cB \ echo\ off DOS batch file text +>1 string/cW \ echo\ off DOS batch file text !:mime text/x-msdos-batch ->1 string/cB echo\ off DOS batch file text +>1 string/cW echo\ off DOS batch file text !:mime text/x-msdos-batch ->1 string/cB rem\ DOS batch file text +>1 string/cW rem\ DOS batch file text !:mime text/x-msdos-batch ->1 string/cB set\ DOS batch file text +>1 string/cW set\ DOS batch file text !:mime text/x-msdos-batch Index: magic/Magdir/perl =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/perl,v retrieving revision 1.15 diff -u -u -r1.15 perl --- magic/Magdir/perl 2 Dec 2008 
16:26:17 -0000 1.15 +++ magic/Magdir/perl 15 Sep 2009 22:31:13 -0000 @@ -4,15 +4,15 @@ # The `eval' lines recognizes an outrageously clever hack. # Keith Waclena <keith at cerberus.uchicago.edu> # Send additions to <perl5-porters at perl.org> -0 search/1/b #!\ /bin/perl Perl script text executable +0 search/1/w #!\ /bin/perl Perl script text executable !:mime text/x-perl 0 search/1 eval\ "exec\ /bin/perl Perl script text !:mime text/x-perl -0 search/1/b #!\ /usr/bin/perl Perl script text executable +0 search/1/w #!\ /usr/bin/perl Perl script text executable !:mime text/x-perl 0 search/1 eval\ "exec\ /usr/bin/perl Perl script text !:mime text/x-perl -0 search/1/b #!\ /usr/local/bin/perl Perl script text executable +0 search/1/w #!\ /usr/local/bin/perl Perl script text executable !:mime text/x-perl 0 search/1 eval\ "exec\ /usr/local/bin/perl Perl script text !:mime text/x-perl @@ -33,12 +33,12 @@ # Perl POD documents # From: Tom Hukins <tom at eborcom.com> -0 search/1/B \=pod\n Perl POD document text -0 search/1/B \n\=pod\n Perl POD document text -0 search/1/B \=head1\ Perl POD document text -0 search/1/B \n\=head1\ Perl POD document text -0 search/1/B \=head2\ Perl POD document text -0 search/1/B \n\=head2\ Perl POD document text +0 search/1/W \=pod\n Perl POD document text +0 search/1/W \n\=pod\n Perl POD document text +0 search/1/W \=head1\ Perl POD document text +0 search/1/W \n\=head1\ Perl POD document text +0 search/1/W \=head2\ Perl POD document text +0 search/1/W \n\=head2\ Perl POD document text # Perl Storable data files. 
0 string perl-store perl Storable (v0.6) data Index: magic/Magdir/python =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/python,v retrieving revision 1.10 diff -u -u -r1.10 python --- magic/Magdir/python 27 May 2009 22:25:48 -0000 1.10 +++ magic/Magdir/python 15 Sep 2009 22:31:13 -0000 @@ -16,9 +16,9 @@ 0 belong 0xd1f20d0a python 2.6 byte-compiled -0 search/1/b #!\ /usr/bin/python Python script text executable +0 search/1/w #!\ /usr/bin/python Python script text executable !:mime text/x-python -0 search/1/b #!\ /usr/local/bin/python Python script text executable +0 search/1/w #!\ /usr/local/bin/python Python script text executable !:mime text/x-python 0 search/1 #!/usr/bin/env\ python Python script text executable !:mime text/x-python Index: magic/Magdir/ruby =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/ruby,v retrieving revision 1.2 diff -u -u -r1.2 ruby --- magic/Magdir/ruby 27 May 2009 22:25:48 -0000 1.2 +++ magic/Magdir/ruby 15 Sep 2009 22:31:13 -0000 @@ -4,9 +4,9 @@ # From: Reuben Thomas <rrt at sc3d.org> # Ruby scripts -0 search/1/b #!\ /usr/bin/ruby Ruby script text executable +0 search/1/w #!\ /usr/bin/ruby Ruby script text executable !:mime text/x-ruby -0 search/1/b #!\ /usr/local/bin/ruby Ruby script text executable +0 search/1/w #!\ /usr/local/bin/ruby Ruby script text executable !:mime text/x-ruby 0 search/1 #!/usr/bin/env\ ruby Ruby script text executable !:mime text/x-ruby Index: magic/Magdir/sgml =================================================================== RCS file: /p/file/cvsroot/file/magic/Magdir/sgml,v retrieving revision 1.22 diff -u -u -r1.22 sgml --- magic/Magdir/sgml 13 Sep 2009 19:02:03 -0000 1.22 +++ magic/Magdir/sgml 15 Sep 2009 22:31:13 -0000 @@ -3,15 +3,15 @@ # From: Noel Torres <tecnico at ejerciciosresueltos.com> 0 string \<?xml\ version=" >15 string >\0 ->>23 search/400 \<svg SVG Scalable Vector 
Graphics image +>>23 search/2048 \<svg SVG Scalable Vector Graphics image !:mime image/svg+xml ->>23 search/400 \<gnc-v2 GnuCash file +>>23 search/2048 \<gnc-v2 GnuCash file !:mime application/x-gnucash # Sitemap file 0 string \<?xml\ version=" >15 string >\0 ->>23 search/400 \<urlset XML Sitemap document text +>>23 search/2048 \<urlset XML Sitemap document text !:mime application/xml-sitemap #------------------------------------------------------------------------------ @@ -19,18 +19,24 @@ # HyperText Markup Language (HTML) is an SGML document type, # from Daniel Quinlan (quinlan at yggdrasil.com) # adapted to string extenstions by Anthon van der Neut <anthon at mnt.org) -0 search/400/cB \<!doctype\ html HTML document text +0 search/2048/cWbt \<!doctype\ html HTML document text !:mime text/html -0 search/400/cb \<head HTML document text +0 search/2048/cwbt \<head HTML document text !:mime text/html -0 search/400/cb \<title HTML document text +0 search/2048/cwbt \<title HTML document text !:mime text/html -0 search/400/cb \<html HTML document text +0 search/2048/cwbt \<html HTML document text +!:mime text/html +0 search/2048/cwbt \<script HTML document text +!:mime text/html +0 search/2048/cwbt \<style HTML document text +!:mime text/html +0 search/2048/cwbt \<table HTML document text !:mime text/html # Extensible markup language (XML), a subset of SGML # from Marc Prud'hommeaux (marc at apocalypse.org) -0 search/1/cb \<?xml XML document text +0 search/1/cwbt \<?xml XML document text !:mime application/xml 0 string \<?xml\ version\ " XML !:mime application/xml @@ -44,16 +50,16 @@ >15 search/1 >\0 %.3s document text >>23 search/1 \<xsl:stylesheet (XSL stylesheet) >>24 search/1 \<xsl:stylesheet (XSL stylesheet) -0 search/1/b \<?xml XML document text +0 search/1/wbt \<?xml XML document text !:mime application/xml -0 search/1/b \<?XML broken XML document text +0 search/1/wbt \<?XML broken XML document text !:mime application/xml # SGML, mostly from rph at sq -0 
search/1/cb \<!doctype exported SGML document text -0 search/1/cb \<!subdoc exported SGML subdocument text -0 search/1/cb \<!-- exported SGML document text +0 search/2048/cwbt \<!doctype exported SGML document text +0 search/2048/cwbt \<!subdoc exported SGML subdocument text +0 search/2048/cwbt \<!-- exported SGML document text # Web browser cookie files # (Mozilla, Galeon, Netscape 4, Konqueror..) Index: src/apprentice.c =================================================================== RCS file: /p/file/cvsroot/file/src/apprentice.c,v retrieving revision 1.157 diff -u -u -r1.157 apprentice.c --- src/apprentice.c 14 Sep 2009 17:50:38 -0000 1.157 +++ src/apprentice.c 15 Sep 2009 22:31:13 -0000 @@ -592,10 +592,21 @@ break; case FILE_REGEX: case FILE_SEARCH: + /* Check for override */ + if (mstart->str_flags & STRING_BINTEST) + mstart->flag |= BINTEST; + if (mstart->str_flags & STRING_TEXTTEST) + mstart->flag |= TEXTTEST; + + if (mstart->flag & (TEXTTEST|BINTEST)) + break; + /* binary test if pattern is not text */ if (file_looks_utf8(m->value.us, (size_t)m->vallen, NULL, NULL) <= 0) mstart->flag |= BINTEST; + else + mstart->flag |= TEXTTEST; break; case FILE_DEFAULT: /* can't deduce anything; we shouldn't see this at the @@ -949,14 +960,14 @@ } break; case FILE_REGEX: - if ((m->str_flags & STRING_COMPACT_BLANK) != 0) { + if ((m->str_flags & STRING_COMPACT_WHITESPACE) != 0) { file_magwarn(ms, "'/%c' not allowed on regex\n", - CHAR_COMPACT_BLANK); + CHAR_COMPACT_WHITESPACE); return -1; } - if ((m->str_flags & STRING_COMPACT_OPTIONAL_BLANK) != 0) { + if ((m->str_flags & STRING_COMPACT_OPTIONAL_WHITESPACE) != 0) { file_magwarn(ms, "'/%c' not allowed on regex\n", - CHAR_COMPACT_OPTIONAL_BLANK); + CHAR_COMPACT_OPTIONAL_WHITESPACE); return -1; } break; @@ -1329,12 +1340,12 @@ "zero range"); l = t - 1; break; - case CHAR_COMPACT_BLANK: - m->str_flags |= STRING_COMPACT_BLANK; + case CHAR_COMPACT_WHITESPACE: + m->str_flags |= STRING_COMPACT_WHITESPACE; break; - case 
CHAR_COMPACT_OPTIONAL_BLANK: + case CHAR_COMPACT_OPTIONAL_WHITESPACE: m->str_flags |= - STRING_COMPACT_OPTIONAL_BLANK; + STRING_COMPACT_OPTIONAL_WHITESPACE; break; case CHAR_IGNORE_LOWERCASE: m->str_flags |= STRING_IGNORE_LOWERCASE; @@ -1345,6 +1356,12 @@ case CHAR_REGEX_OFFSET_START: m->str_flags |= REGEX_OFFSET_START; break; + case CHAR_BINTEST: + m->str_flags |= STRING_BINTEST; + break; + case CHAR_TEXTTEST: + m->str_flags |= STRING_TEXTTEST; + break; default: if (ms->flags & MAGIC_CHECK) file_magwarn(ms, Index: src/file.h =================================================================== RCS file: /p/file/cvsroot/file/src/file.h,v retrieving revision 1.122 diff -u -u -r1.122 file.h --- src/file.h 15 Jul 2009 15:16:52 -0000 1.122 +++ src/file.h 15 Sep 2009 22:31:13 -0000 @@ -139,7 +139,7 @@ #define NOSPACE 0x10 /* suppress space character before output */ #define BINTEST 0x20 /* test is for a binary type (set only for top-level tests) */ -#define TEXTTEST 0 /* for passing to file_softmagic */ +#define TEXTTEST 0x40 /* for passing to file_softmagic */ uint8_t factor; @@ -274,16 +274,20 @@ }; #define BIT(A) (1 << (A)) -#define STRING_COMPACT_BLANK BIT(0) -#define STRING_COMPACT_OPTIONAL_BLANK BIT(1) -#define STRING_IGNORE_LOWERCASE BIT(2) -#define STRING_IGNORE_UPPERCASE BIT(3) -#define REGEX_OFFSET_START BIT(4) -#define CHAR_COMPACT_BLANK 'B' -#define CHAR_COMPACT_OPTIONAL_BLANK 'b' -#define CHAR_IGNORE_LOWERCASE 'c' -#define CHAR_IGNORE_UPPERCASE 'C' -#define CHAR_REGEX_OFFSET_START 's' +#define STRING_COMPACT_WHITESPACE BIT(0) +#define STRING_COMPACT_OPTIONAL_WHITESPACE BIT(1) +#define STRING_IGNORE_LOWERCASE BIT(2) +#define STRING_IGNORE_UPPERCASE BIT(3) +#define REGEX_OFFSET_START BIT(4) +#define STRING_TEXTTEST BIT(5) +#define STRING_BINTEST BIT(6) +#define CHAR_COMPACT_WHITESPACE 'W' +#define CHAR_COMPACT_OPTIONAL_WHITESPACE 'w' +#define CHAR_IGNORE_LOWERCASE 'c' +#define CHAR_IGNORE_UPPERCASE 'C' +#define CHAR_REGEX_OFFSET_START 's' +#define CHAR_TEXTTEST 
't' +#define CHAR_BINTEST 'b' #define STRING_IGNORE_CASE (STRING_IGNORE_LOWERCASE|STRING_IGNORE_UPPERCASE) #define STRING_DEFAULT_RANGE 100 Index: src/print.c =================================================================== RCS file: /p/file/cvsroot/file/src/print.c,v retrieving revision 1.66 diff -u -u -r1.66 print.c --- src/print.c 3 Feb 2009 20:27:51 -0000 1.66 +++ src/print.c 15 Sep 2009 22:31:13 -0000 @@ -77,10 +77,10 @@ if (IS_STRING(m->type)) { if (m->str_flags) { (void) fputc('/', stderr); - if (m->str_flags & STRING_COMPACT_BLANK) - (void) fputc(CHAR_COMPACT_BLANK, stderr); - if (m->str_flags & STRING_COMPACT_OPTIONAL_BLANK) - (void) fputc(CHAR_COMPACT_OPTIONAL_BLANK, + if (m->str_flags & STRING_COMPACT_WHITESPACE) + (void) fputc(CHAR_COMPACT_WHITESPACE, stderr); + if (m->str_flags & STRING_COMPACT_OPTIONAL_WHITESPACE) + (void) fputc(CHAR_COMPACT_OPTIONAL_WHITESPACE, stderr); if (m->str_flags & STRING_IGNORE_LOWERCASE) (void) fputc(CHAR_IGNORE_LOWERCASE, stderr); Index: src/softmagic.c =================================================================== RCS file: /p/file/cvsroot/file/src/softmagic.c,v retrieving revision 1.137 diff -u -u -r1.137 softmagic.c --- src/softmagic.c 8 May 2009 23:25:46 -0000 1.137 +++ src/softmagic.c 15 Sep 2009 22:31:13 -0000 @@ -123,7 +123,7 @@ int flush = 0; struct magic *m = &magic[magindex]; - if ((m->flag & BINTEST) != mode) { + if ((m->flag & mode) != mode) { /* Skip sub-tests */ while (magic[magindex + 1].cont_level != 0 && ++magindex < nmagic) @@ -1636,7 +1636,7 @@ if ((v = toupper(*b++) - *a++) != '\0') break; } - else if ((flags & STRING_COMPACT_BLANK) && + else if ((flags & STRING_COMPACT_WHITESPACE) && isspace(*a)) { a++; if (isspace(*b++)) { @@ -1648,7 +1648,7 @@ break; } } - else if ((flags & STRING_COMPACT_OPTIONAL_BLANK) && + else if ((flags & STRING_COMPACT_OPTIONAL_WHITESPACE) && isspace(*a)) { a++; while (isspace(*b)) From rbock at eudoxos.de Wed Sep 16 11:23:51 2009 From: rbock at eudoxos.de (Roland Bock) 
Date: Wed, 16 Sep 2009 10:23:51 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090915223345.438C75654E@rebar.astron.com> References: <20090915223345.438C75654E@rebar.astron.com> Message-ID: <4AB0A097.3000006@eudoxos.de> Christos Zoulas wrote: > On Sep 15, 5:19pm, christos at zoulas.com (Christos Zoulas) wrote: > -- Subject: Re: HTML files classified as application/octet-stream and text/pl > > | On Sep 15, 9:49am, rbock at eudoxos.de (Roland Bock) wrote: > | -- Subject: Re: HTML files classified as application/octet-stream and text/pl > | > | | Roland Bock wrote: > | | > I posted a gzipped tarball. It currently awaits moderator approval > | | > because of its size (600k), oops, about 40 real word examples... If > | | > that's too much for this list, we'll have to think of something else :-) > | | > | | OK, it got dismissed for the obvious reason :-) > | | > | | As suggested by the rejection message, I filed a bug report with the > | | attachment here > | | > | | http://bugs.gw.com/view.php?id=89 > | > > This patch changes adds /bt [binary|text] to the regex and string tests, > and changes b->w and B->W. Search lengths and keywords are added and adjusted. > This is just a proposal, that's why no doc or version bump of the format > yet. Please let me know what you think. > > christos Wow, that was fast! Bad luck: The machine I use for the tests is down for maintenance today, so I won't be able to do any tests before this evening. Before that: Could you send me a patch based on the original 5.03 release? 
The patch program did not know how to handle python and ruby and for some reason compilation failed for animation: ../src/file -C -m ../magic/Magdir ../magic/Magdir/animation, 32: Warning: string extension `B' invalid ../magic/Magdir/animation, 33: Warning: Current entry already has a MIME type `application/x-quicktime-player', new type ` image/jp2' ../magic/Magdir/animation, 44: Warning: string extension `B' invalid ../magic/Magdir/animation, 55: Warning: string extension `B' invalid ../magic/Magdir/animation, 57: Warning: string extension `B' invalid ../magic/Magdir/animation, 59: Warning: string extension `B' invalid ../magic/Magdir/animation, 60: Warning: string extension `B' invalid ../magic/Magdir/animation, 61: Warning: string extension `B' invalid Alternatively, I can remove animation from the Makefile and use the original python/ruby files, of course. Regards, Roland From christos at zoulas.com Wed Sep 16 15:48:25 2009 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 16 Sep 2009 08:48:25 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AB0A097.3000006@eudoxos.de> from Roland Bock (Sep 16, 10:23am) Message-ID: <20090916124825.E09D75654E@rebar.astron.com> On Sep 16, 10:23am, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | Christos Zoulas wrote: | > On Sep 15, 5:19pm, christos at zoulas.com (Christos Zoulas) wrote: | > -- Subject: Re: HTML files classified as application/octet-stream and text/pl | > | > | On Sep 15, 9:49am, rbock at eudoxos.de (Roland Bock) wrote: | > | -- Subject: Re: HTML files classified as application/octet-stream and text/pl | > | | > | | Roland Bock wrote: | > | | > I posted a gzipped tarball. It currently awaits moderator approval | > | | > because of its size (600k), oops, about 40 real word examples... 
If | > | | > that's too much for this list, we'll have to think of something else :-) | > | | | > | | OK, it got dismissed for the obvious reason :-) | > | | | > | | As suggested by the rejection message, I filed a bug report with the | > | | attachment here | > | | | > | | http://bugs.gw.com/view.php?id=89 | > | | > | > This patch changes adds /bt [binary|text] to the regex and string tests, | > and changes b->w and B->W. Search lengths and keywords are added and adjusted. | > This is just a proposal, that's why no doc or version bump of the format | > yet. Please let me know what you think. | > | > christos | | Wow, that was fast! Bad luck: The machine I use for the tests is down | for maintenance today, so I won't be able to do any tests before this | evening. | | Before that: | Could you send me a patch based on the original 5.03 release? The patch | program did not know how to handle python and ruby and for some reason | compilation failed for animation: | | ../src/file -C -m ../magic/Magdir | ../magic/Magdir/animation, 32: Warning: string extension `B' invalid | ../magic/Magdir/animation, 33: Warning: Current entry already has a MIME | type `application/x-quicktime-player', new type ` image/jp2' | ../magic/Magdir/animation, 44: Warning: string extension `B' invalid | ../magic/Magdir/animation, 55: Warning: string extension `B' invalid | ../magic/Magdir/animation, 57: Warning: string extension `B' invalid | ../magic/Magdir/animation, 59: Warning: string extension `B' invalid | ../magic/Magdir/animation, 60: Warning: string extension `B' invalid | ../magic/Magdir/animation, 61: Warning: string extension `B' invalid | | | Alternatively, I can remove animation from the Makefile and use the | original python/ruby files, of course. Change the B to a W. And you should use file -e tokens. I will make a tar file for you once I've committed the changes. 
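For reference, the B-to-W change can be applied mechanically to a magic file that the patch did not touch, such as magic/Magdir/animation. What follows is a minimal sketch, not code from the file distribution: the helper name convert_line is hypothetical, and it assumes the rename affects only the flag letters in the type field of string and search entries (B becomes W, b becomes w), as in the diff above. It is meant to be run once over old-format files, because b takes on a new meaning (binary test) after the patch.

```python
import re

# Old flag letter -> new flag letter, following the rename in the patch:
# CHAR_COMPACT_BLANK 'B' -> 'W', CHAR_COMPACT_OPTIONAL_BLANK 'b' -> 'w'.
_RENAMES = str.maketrans({"B": "W", "b": "w"})

# offset field (with trailing whitespace), type field, rest of the entry
_ENTRY = re.compile(r"^(\S+\s+)(\S+)(.*)$", re.DOTALL)


def convert_line(line):
    """Rewrite old-style /B and /b flags on one magic entry (hypothetical helper)."""
    # Comments and "!:mime" style annotations carry no type field.
    if line.startswith("#") or line.startswith("!:"):
        return line
    match = _ENTRY.match(line)
    if match is None:
        return line
    offset, type_field, rest = match.groups()
    name, *modifiers = type_field.split("/")
    # Assumption: only string and search entries use these flag letters.
    if name not in ("string", "search"):
        return line
    # Purely numeric modifiers are search ranges, so leave them alone.
    modifiers = [
        mod if mod.isdigit() else mod.translate(_RENAMES) for mod in modifiers
    ]
    return offset + "/".join([name] + modifiers) + rest
```

Applied line by line to an old magic file, this leaves numeric types such as belong and all test strings untouched, and only rewrites the flag letters that trigger the "string extension `B' invalid" warning.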
christos From rbock at eudoxos.de Wed Sep 16 19:35:19 2009 From: rbock at eudoxos.de (Roland Bock) Date: Wed, 16 Sep 2009 18:35:19 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090916124825.E09D75654E@rebar.astron.com> References: <20090916124825.E09D75654E@rebar.astron.com> Message-ID: <4AB113C7.4020708@eudoxos.de> Christos Zoulas wrote: > On Sep 16, 10:23am, rbock at eudoxos.de (Roland Bock) wrote: [...] > Change the B to a W. And you should use file -e tokens. I will make a tar > file for you once I've committed the changes. > > christos Those changes are very cool. They got rid of all the C, C++, Pascal classifications. I am left with 736 files which are not classified as html. The two main fractions are: text/plain: 270 application/xml: 476 There are some other formats detected like message/rfc822, gzip, application/mac-binhex40 which look OK (definitely not HTML :-) ) One application/octet-stream remains which looks like clean HTML with the exception that there is no html/head/body tag. When I call grep -l -i '<html' textPlain/* applicationXml/* I get nothing but HTML files (which partially has to do with the fact that most of the files are HTML, of course). text/plain: 61 application/xml: 373 The latter samples are xhtml files, so application/xml is not wrong, but application/xhtml+xml would be more specific (and preferable, from my point of view). How shall we proceed? I could add more samples to the ticket, of course. BTW: I am not using the file command but my own software which calls magic_open like this: magic_open(MAGIC_MIME | MAGIC_CHECK | MAGIC_NO_CHECK_TOKENS); The MAGIC_NO_CHECK_TOKENS does not seem to have any influence in my tests. Any other flags you want me to set? 
Regards, Roland From christos at zoulas.com Wed Sep 16 22:17:08 2009 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 16 Sep 2009 15:17:08 -0400 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <4AB113C7.4020708@eudoxos.de> from Roland Bock (Sep 16, 6:35pm) Message-ID: <20090916191708.433EF5654E@rebar.astron.com> On Sep 16, 6:35pm, rbock at eudoxos.de (Roland Bock) wrote: -- Subject: Re: HTML files classified as application/octet-stream and text/pl | Those changes are very cool. They got rid of all the C, C++, Pascal | classifications. I am left with 736 files which are not classified as | html. The two main fractions are: | | text/plain: 270 | application/xml: 476 | | There are some other formats detected like message/rfc822, gzip, | application/mac-binhex40 which look OK (definitely not HTML :-) ) | | One application/octet-stream remains which looks like clean HTML with | the exception that there is no html/head/body tag. | | When I call | | grep -l -i '<html' textPlain/* applicationXml/* | | I get nothing but HTML files (which partially has to do with the fact | that most of the files are HTML, of course). | | text/plain: 61 | application/xml: 373 | | The latter samples are xhtml files, so application/xml is not wrong, but | application/xhtml+xml would be more specific (and preferable, from my | point of view). | | How shall we proceed? I could add more samples to the ticket, of course. Create two zip files, one with all the xhtml files and the other with the text/plain files, and post them to the ticket... | BTW: I am not using the file command but my own software which calls | magic_open like this: | | magic_open(MAGIC_MIME | MAGIC_CHECK | MAGIC_NO_CHECK_TOKENS); | | The MAGIC_NO_CHECK_TOKENS does not seem to have any influence in my tests. | | Any other flags you want me to set? No, that's fine... 
christos From gab at no-log.org Wed Sep 16 23:52:56 2009 From: gab at no-log.org (gab) Date: Wed, 16 Sep 2009 22:52:56 +0200 (CEST) Subject: [patch] Fix handling of ~/.magic for python wrapper In-Reply-To: <20090914174827.7526F5654E@rebar.astron.com> References: <20090914174827.7526F5654E@rebar.astron.com> Message-ID: <49127.AQoGVQwGXno=.1253134376.squirrel@webmail.no-log.org> Christos Zoulas wrote: > On Sep 14, 2:53am, gab at no-log.org ("gab") wrote: > | From what I saw in the code, I have further comments for the file > | interface, but I'll keep them for another round ;) > > I committed a change based on yours! Cool =) I hope you didn't have too much cleaning to do to merge it. By the way, I saw no reference to a public source repository. Is there one available ? As hinted in my previous email, there are some other concerns about the file interface that I'd like to give some more attention: 1/ sensitivity to command line options order As some processing is done while the command line is being parsed, the result of the process sometimes may depend on the order of the parameters when invoking file. Here is a explicit example: $ file -v -m /usr/share/misc/magic file-5.03 magic file from /etc/magic:/usr/share/misc/magic $ file -m /usr/share/misc/magic -v file-5.03 magic file from /usr/share/misc/magic Another side effect of processing some options within the getopt loop is that the behavior for programs directly using the library (say the python wrapper ;) is different from the behavior of the command line (which I find a bit upsetting) I'd like to remove all the processing outside of the getopt loop, so the parsing and interpretation of the options are completely separate. When possible, I also would like to try and move some of the code of main inside the library. The underlying idea is to try to provide as many features of file as possible to the external programs using the magic library. 2/ a new option to append some magic file to the default ones ? 
Maybe it could be interesting to add a new flag option modifying the handling of the -m option. Without this flag, the file command would use the magic files provided by the -m option. With the flag set, it would append the argument of the -m option to the default magic file. 3/ what about python ? For the command line, even if a bit inconvenient, it is possible to retrieve the default magic files (with file -v) and use them to build a -m option with some personal magic files. But I see no straightforward way to reach the same result in python. I have not really thought of how this could be done yet, as it probably depends on what is doable regarding the first two points. To conclude, those points matter enough for me that I am willing to spend some time on them. As this is a bit less anecdotal than my first patch, I'd be grateful to have some feedback from the list, (pointing out stupid ideas and/or providing guidance) before starting working on this. Regards, -- gab From christos at zoulas.com Thu Sep 17 15:42:36 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 17 Sep 2009 08:42:36 -0400 Subject: [patch] Fix handling of ~/.magic for python wrapper In-Reply-To: <49127.AQoGVQwGXno=.1253134376.squirrel@webmail.no-log.org> from "gab" (Sep 16, 10:52pm) Message-ID: <20090917124236.E0D7D5654E@rebar.astron.com> On Sep 16, 10:52pm, gab at no-log.org ("gab") wrote: -- Subject: Re: [patch] Fix handling of ~/.magic for python wrapper | Cool =) | I hope you didn't have too much cleaning to do to merge it. | By the way, I saw no reference to a public source repository. Is there one | available ? Not much cleaning needed and no public repository yet. Hopefully now that kim is back, we'll add some new hardware and make a public repo. 
| As hinted in my previous email, there are some other concerns about the | file interface that I'd like to give some more attention: | | 1/ sensitivity to command line options order | As some processing is done while the command line is being parsed, | the result of the process sometimes may depend on the order of the | parameters when invoking file. Here is a explicit example: | | $ file -v -m /usr/share/misc/magic | file-5.03 | magic file from /etc/magic:/usr/share/misc/magic | | $ file -m /usr/share/misc/magic -v | file-5.03 | magic file from /usr/share/misc/magic | | Another side effect of processing some options within the getopt loop is that | the behavior for programs directly using the library (say the python wrapper ;) | is different from the behavior of the command line (which I find a bit | upsetting) | I'd like to remove all the processing outside of the getopt loop, so the | parsing and interpretation of the options are completely separate. | When possible, I also would like to try and move some of the code of | main inside the library. | The underlying idea is to try to provide as many features of file as possible | to the external programs using the magic library. That is fine. | 2/ a new option to append some magic file to the default ones ? | | Maybe it could be interesting to add a new flag option modifying the handling | of the -m option. Without this flag, the file command would use the magic files | provided by the -m option. With the flag set, it would append the argument of | the -m option to the default magic file. I think that there is a way to do this in the POSIX spec for file. If there is, we should follow how it is done there. | | 3/ what about python ? | | For the command line, even if a bit inconvenient, it is possible to retrieve | the default magic files (with file -v) and use them to build a -m option | with some personal magic files. | But I see no straightforward way to reach the same result in python. 
| I have not really thought of how this could be done yet, as it probably | depends on what is doable regarding the first two points. | | To conclude, those points matter enough for me that I am willing to spend | some time on them. As this is a bit less anecdotal than my first patch, | I'd be grateful to have some feedback from the list, (pointing out stupid | ideas and/or providing guidance) before starting working on this. You'd need to add some more public functions to the Python API to do that. christos From rbock at eudoxos.de Mon Sep 21 08:20:16 2009 From: rbock at eudoxos.de (Roland Bock) Date: Mon, 21 Sep 2009 07:20:16 +0200 Subject: HTML files classified as application/octet-stream and text/plain In-Reply-To: <20090916191708.433EF5654E@rebar.astron.com> References: <20090916191708.433EF5654E@rebar.astron.com> Message-ID: <4AB70D10.7060108@eudoxos.de> For those who might have followed this thread: We continued the exchange of data for a bit outside the list. The final results are phenomenal, many thanks to Christos! With the original 5.03 release of file, I had about 5,000 mis-classifications in 50,000 web documents. After applying the attached patch (which sums up the changes discussed in this thread), I am down to zero(!!) mis-classifications when using a two-stage approach: 1) Each document is checked with the MAGIC_NO_CHECK_TOKENS flag set. 2) If the document is determined to be text/plain, it is checked again with MAGIC_NO_CHECK_TOKENS not set. Any text/*** result is considered valid. Otherwise text/plain is assumed. It will take some time for Christos to make a formal release. Until then, it would be cool if many of you could check for any degradation of accuracy due to the attached patch. Personally, I love the results so far, but it would not be good to have perfect classification of my collection of web documents if other formats suffered :-) Thanks and regards, Roland -------------- next part -------------- A non-text attachment was scrubbed... 
Name: file.patch Type: text/x-diff Size: 26115 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20090921/6e0867be/attachment.bin> From mtmkls at freemail.hu Sun Sep 27 20:54:37 2009 From: mtmkls at freemail.hu (Mate Miklos) Date: Sun, 27 Sep 2009 19:54:37 +0200 Subject: some mime detection magic Message-ID: <200909271954.37479.mtmkls@freemail.hu> Hi, These changes improved the detection for me, but they might not be 100% correct. Some details: - I'm not entirely sure the mpeg4-generic is the correct IANA mime type for that particular container format, but it seemed like a good candidate. Maybe someone on this list knows it better than me... - The DjVu in its original state didn't work - The matroska one might be audio/matroska as well, but the check is not sophisticated enough to distinguish them. Audio only matroska files are rare though... MM -- #!/usr/bin/perl -l m,[A-Z],?@;:@?=(@;=>$_)for(split$,,[A-Z]);push@@,lc(pack(q=c*==> ord($;[$[+1])-chr(ord@;)));$_?${@}[++$#{@}]:$,=chr(ord($@[$^W])- ($#{y-++--;qq,;,}>>$_))for(0,1);$_.=join$",@@,ke and print split -------------- next part -------------- A non-text attachment was scrubbed... Name: filemime.diff Type: text/x-patch Size: 2223 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20090927/ce6eda62/attachment.bin> From christos at zoulas.com Sun Sep 27 22:03:46 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 27 Sep 2009 15:03:46 -0400 Subject: some mime detection magic In-Reply-To: <200909271954.37479.mtmkls@freemail.hu> from Mate Miklos (Sep 27, 7:54pm) Message-ID: <20090927190346.6814C5654E@rebar.astron.com> On Sep 27, 7:54pm, mtmkls at freemail.hu (Mate Miklos) wrote: -- Subject: some mime detection magic | Hi, | | These changes improved the detection for me, but they might not be 100% | correct.
| | Some details: | - I'm not entirely sure the mpeg4-generic is the correct IANA mime type for | that particular container format, but it seemed like a good candidate. Maybe | someone on this list knows it better than me... | - The DjVu in its original state didn't work Why? I think the idea is that if you just find AT&TFORM then you ignore it if you don't find the other strings. | - The matroska one might be audio/matroska as well, but the check is not | sophisticated enough to distinguish them. Audio only matroska files are rare | though... Thanks, christos From mtmkls at freemail.hu Mon Sep 28 00:21:59 2009 From: mtmkls at freemail.hu (Mate Miklos) Date: Sun, 27 Sep 2009 23:21:59 +0200 Subject: some mime detection magic In-Reply-To: <20090927190346.6814C5654E@rebar.astron.com> References: <20090927190346.6814C5654E@rebar.astron.com> Message-ID: <200909272321.59297.mtmkls@freemail.hu> On Sunday, 27 September 2009, Christos Zoulas wrote: > On Sep 27, 7:54pm, mtmkls at freemail.hu (Mate Miklos) wrote: > -- Subject: some mime detection magic > > | Hi, > | > | These changes improved the detection for me, but they might not be 100% > | correct. > | > | Some details: > | - I'm not entirely sure the mpeg4-generic is the correct IANA mime type > | for that particular container format, but it seemed like a good candidate. > | Maybe someone on this list knows it better than me... > | - The DjVu in its original state didn't work > > Why? I think the idea is that if you just find AT&TFORM then you ignore > it if you don't find the other strings.
If this false positive is a real concern (I doubt it), then the following is the right solution: --- file-5.03-orig/magic/Magdir/images 2009-02-02 16:55:49.000000000 +0100 +++ file-5.03/magic/Magdir/images 2009-09-27 23:09:11.000000000 +0200 @@ -546,11 +546,14 @@ # Submitted by: Stephane Loeuillet <stephane.loeuillet at tiscali.fr> # Modified by (1): Abel Cheung <abelcheung at gmail.com> 0 string AT&TFORM -!:mime image/vnd.djvu >12 string DJVM DjVu multiple page document +!:mime image/vnd.djvu >12 string DJVU DjVu image or single page document +!:mime image/vnd.djvu >12 string DJVI DjVu shared document +!:mime image/vnd.djvu >12 string THUM DjVu page thumbnails +!:mime image/vnd.djvu The main point is that the ``!:mime'' line must come after a verdict line, which was not the case in the original solution. MM -- #!/usr/bin/perl -l m,[A-Z],?@;:@?=(@;=>$_)for(split$,,[A-Z]);push@@,lc(pack(q=c*==> ord($;[$[+1])-chr(ord@;)));$_?${@}[++$#{@}]:$,=chr(ord($@[$^W])- ($#{y-++--;qq,;,}>>$_))for(0,1);$_.=join$",@@,ke and print split From christos at zoulas.com Mon Sep 28 00:48:32 2009 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 27 Sep 2009 17:48:32 -0400 Subject: some mime detection magic In-Reply-To: <200909272321.59297.mtmkls@freemail.hu> from Mate Miklos (Sep 27, 11:21pm) Message-ID: <20090927214832.4ED585654E@rebar.astron.com> On Sep 27, 11:21pm, mtmkls at freemail.hu (Mate Miklos) wrote: -- Subject: Re: some mime detection magic | On Sunday, 27 September 2009, Christos Zoulas wrote: | > On Sep 27, 7:54pm, mtmkls at freemail.hu (Mate Miklos) wrote: | > -- Subject: some mime detection magic | > | > | Hi, | > | | > | These changes improved the detection for me, but they might not be 100% | > | correct. | > | | > | Some details: | > | - I'm not entirely sure the mpeg4-generic is the correct IANA mime type | > | for that particular container format, but it seemed like a good candidate. | > | Maybe someone on this list knows it better than me...
| > | - The DjVu in its original state didn't work | > | > Why? I think the idea is that if you just find AT&TFORM then you ignore | > it if you don't find the other strings. | If this false positive is a real concern (I doubt it), then the following is | the right solution: | | --- file-5.03-orig/magic/Magdir/images 2009-02-02 16:55:49.000000000 +0100 | +++ file-5.03/magic/Magdir/images 2009-09-27 23:09:11.000000000 +0200 | @@ -546,11 +546,14 @@ | # Submitted by: Stephane Loeuillet <stephane.loeuillet at tiscali.fr> | # Modified by (1): Abel Cheung <abelcheung at gmail.com> | 0 string AT&TFORM | -!:mime image/vnd.djvu | >12 string DJVM DjVu multiple page document | +!:mime image/vnd.djvu | >12 string DJVU DjVu image or single page document | +!:mime image/vnd.djvu | >12 string DJVI DjVu shared document | +!:mime image/vnd.djvu | >12 string THUM DjVu page thumbnails | +!:mime image/vnd.djvu | | The main point is that the ``!:mime'' line must come after a verdict line, | which was not the case in the original solution. Ok, I will change it. Thanks, christos From Jens.deSmit at surfnet.nl Wed Oct 7 22:05:05 2009 From: Jens.deSmit at surfnet.nl (Jens de Smit) Date: Wed, 07 Oct 2009 21:05:05 +0200 Subject: Magic info for MPEG-4 in SonyPSP format Message-ID: <4ACCE661.70108@surfnet.nl> Hello list, I've encountered some video files in our video platform that come from a Flip HD (a cellphone-sized video camera). These are classified by file(1) as file type "ISO media" and MIME type "application/octet-stream". They are actually MPEG-4 video in SonyPSP format. I've attached a patch for magic/Magdir/animation to add this media type. Could this be integrated into the next release? Thanks! Regards, Jens de Smit -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: animation.patch URL: <http://mx.gw.com/pipermail/file/attachments/20091007/85d56116/attachment.ksh> From dottedmag at dottedmag.net Fri Oct 16 10:26:21 2009 From: dottedmag at dottedmag.net (Mikhail Gusarov) Date: Fri, 16 Oct 2009 14:26:21 +0700 Subject: Plain text file interpreted as Lisp/Scheme Message-ID: <87vdidui4d.fsf@vertex.dottedmag.net> Hello. file-5.03 interprets the attached file as Lisp/Scheme, though it's just plain text without lots of parentheses. -- http://fossarchy.blogspot.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091016/6e68b4b5/attachment.bin> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: index.help URL: <http://mx.gw.com/pipermail/file/attachments/20091016/6e68b4b5/attachment.ksh> From ldv at altlinux.org Mon Oct 19 15:07:48 2009 From: ldv at altlinux.org (Dmitry V. Levin) Date: Mon, 19 Oct 2009 16:07:48 +0400 Subject: Plain text file interpreted as Lisp/Scheme In-Reply-To: <87vdidui4d.fsf@vertex.dottedmag.net> References: <87vdidui4d.fsf@vertex.dottedmag.net> Message-ID: <20091019120748.GA17721@wo.int.altlinux.org> Hi, On Fri, Oct 16, 2009 at 02:26:21PM +0700, Mikhail Gusarov wrote: > > file-5.03 interprets the attached file as Lisp/Scheme, though it's just > plain text without lots of parentheses. It seems to be a regression: my good old file-4.26-alt3 package recognizes this file as "ASCII English text". -- ldv -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091019/88bf8776/attachment.bin> From dnovotny at redhat.com Mon Oct 19 15:23:53 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Mon, 19 Oct 2009 08:23:53 -0400 (EDT) Subject: Plain text file interpreted as Lisp/Scheme In-Reply-To: <1249268779.341861255954999087.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <1966933609.341881255955033590.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, there's a string "(if" inside... I posted a fix a while ago: http://mx.gw.com/pipermail/file/2009/000422.html - Daniel Novotny, Red Hat inc. ----- "Mikhail Gusarov" <dottedmag at dottedmag.net> wrote: > Hello. > > file-5.03 interprets the attached file as Lisp/Scheme, though it's > just plain text without lots of parentheses. > > -- > http://fossarchy.blogspot.com/ > > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.03-ifany.patch Type: text/x-patch Size: 507 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091019/24a33224/attachment.bin> From christos at zoulas.com Mon Oct 19 16:10:46 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 19 Oct 2009 09:10:46 -0400 Subject: Plain text file interpreted as Lisp/Scheme In-Reply-To: <87vdidui4d.fsf@vertex.dottedmag.net> from Mikhail Gusarov (Oct 16, 2:26pm) Message-ID: <20091019131046.EE3855654E@rebar.astron.com> On Oct 16, 2:26pm, dottedmag at dottedmag.net (Mikhail Gusarov) wrote: -- Subject: Plain text file interpreted as Lisp/Scheme | Hello. | | file-5.03 interprets the attached file as Lisp/Scheme, though it's just | plain text without lots of parentheses. Thanks, this has been fixed on head.
christos From dnovotny at redhat.com Wed Oct 21 17:14:30 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Wed, 21 Oct 2009 10:14:30 -0400 (EDT) Subject: new magic entry: Linux swap file/device for PowerPC In-Reply-To: <40224919.501751256134449881.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <564723347.501831256134470500.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, due to one of our Bugzilla reports, we found out that even if there is magic for Linux swap file or swap device (devices can be tested with "file -s"), it is only for Intel platform. Swapfile for PowerPC has the same identifier/header, but due to different page size it is on different offset this new magic entry describes the identifier: 65526 string SWAPSPACE2 Linux/ppc swap file I've put it in the "linux" magic file, patch against 5.03 attached regards, Daniel Novotny, Red Hat inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.03-ppcswap.patch Type: text/x-patch Size: 595 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091021/a145ae96/attachment.bin> From cemeyer at u.washington.edu Thu Oct 22 02:04:15 2009 From: cemeyer at u.washington.edu (Conrad Meyer) Date: Wed, 21 Oct 2009 16:04:15 -0700 Subject: File errors recognizing a file Message-ID: <200910211604.15215.cemeyer@u.washington.edu> Hi, My use case is that I am a Fedora packager and I'm trying to include some files in a package. rpmbuild uses file at the very end of the rpm creation process, and quits early if file(1) returns non-zero. file(1) happens to return 1 for a certain file, which makes it difficult to package. I *think* this is a bug in file(1). So, I'm looking for input. (Is it a bug? Is it not a bug? If not, should rpm be fixed and not care about file's return value?) 
Version of file(1): 5.03 Example file that fails: http://abgx360.net/Apps/StealthFiles/DMI_12C11D10.bin Output from file(1): DMI_12C11D10.bin: ERROR: Infocom game data (Z-machine 2, Release 0 / vasprintf failed (Invalid or incomplete multibyte or wide character) Thanks, -- Conrad Meyer <cemeyer at u.washington.edu> From kimmo at suominen.com Thu Oct 22 15:46:38 2009 From: kimmo at suominen.com (Kimmo Suominen) Date: Thu, 22 Oct 2009 08:46:38 -0400 Subject: File errors recognizing a file In-Reply-To: <200910211604.15215.cemeyer@u.washington.edu> References: <200910211604.15215.cemeyer@u.washington.edu> Message-ID: <b32e77100910220546n465b175cgd63df2027164f5b9@mail.gmail.com> Hi Conrad, If you set your LANG (and/or LC_CTYPE) environment variable to a single-byte locale or unset it completely, do you still get the error? Best regards, + Kim On Wed, Oct 21, 2009 at 19:04, Conrad Meyer <cemeyer at u.washington.edu>wrote: > Hi, > > My use case is that I am a Fedora packager and I'm trying to include some > files in a package. rpmbuild uses file at the very end of the rpm creation > process, and quits early if file(1) returns non-zero. file(1) happens to > return 1 for a certain file, which makes it difficult to package. I *think* > this is a bug in file(1). So, I'm looking for input. (Is it a bug? Is it > not a > bug? If not, should rpm be fixed and not care about file's return value?) > > Version of file(1): 5.03 > Example file that fails: > http://abgx360.net/Apps/StealthFiles/DMI_12C11D10.bin > > Output from file(1): > DMI_12C11D10.bin: ERROR: Infocom game data (Z-machine 2, Release 0 / > vasprintf > failed (Invalid or incomplete multibyte or wide character) > > Thanks, > -- > Conrad Meyer <cemeyer at u.washington.edu> > > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mx.gw.com/pipermail/file/attachments/20091022/5bbf2f54/attachment.html> From christos at zoulas.com Thu Oct 22 17:18:46 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 22 Oct 2009 10:18:46 -0400 Subject: File errors recognizing a file In-Reply-To: <200910211604.15215.cemeyer@u.washington.edu> from Conrad Meyer (Oct 21, 4:04pm) Message-ID: <20091022141846.45C3F5654E@rebar.astron.com> On Oct 21, 4:04pm, cemeyer at u.washington.edu (Conrad Meyer) wrote: -- Subject: File errors recognizing a file | Hi, | | My use case is that I am a Fedora packager and I'm trying to include some | files in a package. rpmbuild uses file at the very end of the rpm creation | process, and quits early if file(1) returns non-zero. file(1) happens to | return 1 for a certain file, which makes it difficult to package. I *think* | this is a bug in file(1). So, I'm looking for input. (Is it a bug? Is it not a | bug? If not, should rpm be fixed and not care about file's return value?) | | Version of file(1): 5.03 | Example file that fails: http://abgx360.net/Apps/StealthFiles/DMI_12C11D10.bin | | Output from file(1): | DMI_12C11D10.bin: ERROR: Infocom game data (Z-machine 2, Release 0 / vasprintf | failed (Invalid or incomplete multibyte or wide character) What OS version and locale settings do you have? christos From ordo.ad at gmail.com Thu Oct 22 17:35:04 2009 From: ordo.ad at gmail.com (Alessandro Doro) Date: Thu, 22 Oct 2009 16:35:04 +0200 Subject: File errors recognizing a file In-Reply-To: <b32e77100910220546n465b175cgd63df2027164f5b9@mail.gmail.com> References: <200910211604.15215.cemeyer@u.washington.edu> <b32e77100910220546n465b175cgd63df2027164f5b9@mail.gmail.com> Message-ID: <20091022143504.GA25404@M50> On Thu, Oct 22, 2009 at 08:46:38AM -0400, Kimmo Suominen wrote: > If you set your LANG (and/or LC_CTYPE) environment variable to a single-byte > locale or unset it completely, do you still get the error? 
Hi, Machine 1 (Arch Linux i686) $ uname -srvmpio Linux 2.6.31-ARCH #1 SMP PREEMPT Tue Oct 13 13:36:23 CEST 2009 i686 Intel(R) Pentium(R) 4 CPU 1400MHz GenuineIntel GNU/Linux Machine 2 (Arch Linux x86_64) $ uname -srvmpio Linux 2.6.31-ARCH #1 SMP PREEMPT Tue Oct 13 11:33:39 CEST 2009 x86_64 Intel(R) Core(TM)2 Duo CPU T9550 @ 2.66GHz GenuineIntel GNU/Linux Same results: $ file DMI_12C11D10.bin DMI_12C11D10.bin: ERROR: Infocom game data (Z-machine 2, Release 0 / vasprintf failed (Invalid or incomplete multibyte or wide character) $ LANG=C file DMI_12C11D10.bin DMI_12C11D10.bin: Infocom game data (Z-machine 2, Release 0 / Serial ;9\002\307) $ locale LANG=it_IT.utf8 LC_CTYPE="it_IT.utf8" LC_NUMERIC="it_IT.utf8" LC_TIME="it_IT.utf8" LC_COLLATE=C LC_MONETARY="it_IT.utf8" LC_MESSAGES="it_IT.utf8" LC_PAPER="it_IT.utf8" LC_NAME="it_IT.utf8" LC_ADDRESS="it_IT.utf8" LC_TELEPHONE="it_IT.utf8" LC_MEASUREMENT="it_IT.utf8" LC_IDENTIFICATION="it_IT.utf8" LC_ALL= Tried also with en_US.utf8 (gives error), it_IT.iso885915 at euro (no errors). From cemeyer at u.washington.edu Thu Oct 22 19:32:51 2009 From: cemeyer at u.washington.edu (Conrad Meyer) Date: Thu, 22 Oct 2009 09:32:51 -0700 Subject: File errors recognizing a file In-Reply-To: <20091022141846.45C3F5654E@rebar.astron.com> References: <20091022141846.45C3F5654E@rebar.astron.com> Message-ID: <200910220932.51147.cemeyer@u.washington.edu> On Thursday 22 October 2009 07:18:46 am Christos Zoulas wrote: > On Oct 21, 4:04pm, cemeyer at u.washington.edu (Conrad Meyer) wrote: > -- Subject: File errors recognizing a file > > | Hi, > | > | My use case is that I am a Fedora packager and I'm trying to include some > | files in a package. rpmbuild uses file at the very end of the rpm > | creation process, and quits early if file(1) returns non-zero. file(1) > | happens to return 1 for a certain file, which makes it difficult to > | package. I *think* this is a bug in file(1). So, I'm looking for input. > | (Is it a bug? Is it not a bug? 
If not, should rpm be fixed and not care > | about file's return value?) > | > | Version of file(1): 5.03 > | Example file that fails: > | http://abgx360.net/Apps/StealthFiles/DMI_12C11D10.bin > | > | Output from file(1): > | DMI_12C11D10.bin: ERROR: Infocom game data (Z-machine 2, Release 0 / > | vasprintf failed (Invalid or incomplete multibyte or wide character) > > What OS version and locale settings do you have? > > christos Fedora 11, x86_64. $ locale LANG=en_US.utf8 LC_CTYPE="en_US.utf8" LC_NUMERIC="en_US.utf8" LC_TIME="en_US.utf8" LC_COLLATE="en_US.utf8" LC_MONETARY="en_US.utf8" LC_MESSAGES="en_US.utf8" LC_PAPER="en_US.utf8" LC_NAME="en_US.utf8" LC_ADDRESS="en_US.utf8" LC_TELEPHONE="en_US.utf8" LC_MEASUREMENT="en_US.utf8" LC_IDENTIFICATION="en_US.utf8" LC_ALL= Regards, -- Conrad Meyer <cemeyer at u.washington.edu> From christos at zoulas.com Thu Oct 22 20:04:38 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 22 Oct 2009 13:04:38 -0400 Subject: File errors recognizing a file In-Reply-To: <20091022143504.GA25404@M50> from Alessandro Doro (Oct 22, 4:35pm) Message-ID: <20091022170438.D36505654F@rebar.astron.com> On Oct 22, 4:35pm, ordo.ad at gmail.com (Alessandro Doro) wrote: -- Subject: Re: File errors recognizing a file | On Thu, Oct 22, 2009 at 08:46:38AM -0400, Kimmo Suominen wrote: | > If you set your LANG (and/or LC_CTYPE) environment variable to a single-byte | > locale or unset it completely, do you still get the error? 
| Hi, | | Machine 1 (Arch Linux i686) | $ uname -srvmpio | Linux 2.6.31-ARCH #1 SMP PREEMPT Tue Oct 13 13:36:23 CEST 2009 i686 Intel(R) Pentium(R) 4 CPU 1400MHz GenuineIntel GNU/Linux | | Machine 2 (Arch Linux x86_64) | $ uname -srvmpio | Linux 2.6.31-ARCH #1 SMP PREEMPT Tue Oct 13 11:33:39 CEST 2009 x86_64 Intel(R) Core(TM)2 Duo CPU T9550 @ 2.66GHz GenuineIntel GNU/Linux | | Same results: | | $ file DMI_12C11D10.bin | DMI_12C11D10.bin: ERROR: Infocom game data (Z-machine 2, Release 0 / vasprintf failed (Invalid or incomplete multibyte or wide character) | | $ LANG=C file DMI_12C11D10.bin | DMI_12C11D10.bin: Infocom game data (Z-machine 2, Release 0 / Serial ;9\002\307) | | $ locale | LANG=it_IT.utf8 | LC_CTYPE="it_IT.utf8" | LC_NUMERIC="it_IT.utf8" | LC_TIME="it_IT.utf8" | LC_COLLATE=C | LC_MONETARY="it_IT.utf8" | LC_MESSAGES="it_IT.utf8" | LC_PAPER="it_IT.utf8" | LC_NAME="it_IT.utf8" | LC_ADDRESS="it_IT.utf8" | LC_TELEPHONE="it_IT.utf8" | LC_MEASUREMENT="it_IT.utf8" | LC_IDENTIFICATION="it_IT.utf8" | LC_ALL= | | Tried also with en_US.utf8 (gives error), it_IT.iso885915 at euro (no errors). This is expected, as utf8 is a multi-byte encoding and glibc throws errors when invalid character sequences are found. christos From cemeyer at u.washington.edu Thu Oct 22 19:31:59 2009 From: cemeyer at u.washington.edu (Conrad Meyer) Date: Thu, 22 Oct 2009 09:31:59 -0700 Subject: File errors recognizing a file In-Reply-To: <20091022143504.GA25404@M50> References: <200910211604.15215.cemeyer@u.washington.edu> <b32e77100910220546n465b175cgd63df2027164f5b9@mail.gmail.com> <20091022143504.GA25404@M50> Message-ID: <200910220932.00118.cemeyer@u.washington.edu> On Thursday 22 October 2009 07:35:04 am Alessandro Doro wrote: > On Thu, Oct 22, 2009 at 08:46:38AM -0400, Kimmo Suominen wrote: > > If you set your LANG (and/or LC_CTYPE) environment variable to a > > single-byte locale or unset it completely, do you still get the error? 
> > *snip* > > Same results: > > $ file DMI_12C11D10.bin > DMI_12C11D10.bin: ERROR: Infocom game data (Z-machine 2, Release 0 / > vasprintf failed (Invalid or incomplete multibyte or wide character) > > $ LANG=C file DMI_12C11D10.bin > DMI_12C11D10.bin: Infocom game data (Z-machine 2, Release 0 / Serial > ;9\002\307) > > *snip* > > Tried also with en_US.utf8 (gives error), it_IT.iso885915 at euro (no errors). I get the same results (my LANG is en_US.utf8 by default). -- Conrad Meyer <cemeyer at u.washington.edu> From cemeyer at u.washington.edu Thu Oct 22 21:35:03 2009 From: cemeyer at u.washington.edu (Conrad Meyer) Date: Thu, 22 Oct 2009 11:35:03 -0700 Subject: File errors recognizing a file In-Reply-To: <20091022170438.D36505654F@rebar.astron.com> References: <20091022170438.D36505654F@rebar.astron.com> Message-ID: <200910221135.04049.cemeyer@u.washington.edu> On Thursday 22 October 2009 10:04:38 am Christos Zoulas wrote: > On Oct 22, 4:35pm, ordo.ad at gmail.com (Alessandro Doro) wrote: > | $ LANG=C file DMI_12C11D10.bin > | DMI_12C11D10.bin: Infocom game data (Z-machine 2, Release 0 / Serial > | ;9\002\307) > > This is expected, as utf8 is a multi-byte encoding and glibc throws errors > when invalid character sequences are found. > > christos Ok, so is it wrong for RPM to rely on file exiting with a zero status (though the file type was recognize more or less correctly)? -- Conrad Meyer <cemeyer at u.washington.edu> From guy at alum.mit.edu Thu Oct 22 21:39:47 2009 From: guy at alum.mit.edu (Guy Harris) Date: Thu, 22 Oct 2009 11:39:47 -0700 Subject: File errors recognizing a file In-Reply-To: <20091022170438.D36505654F@rebar.astron.com> References: <20091022170438.D36505654F@rebar.astron.com> Message-ID: <13A0FACD-5701-4864-B5E8-14BC4F5CA750@alum.mit.edu> On Oct 22, 2009, at 10:04 AM, Christos Zoulas wrote: > This is expected, as utf8 is a multi-byte encoding and glibc throws > errors > when invalid character sequences are found. 
Since we cannot guarantee that a file will have only valid UTF-8 strings in the fields we display as strings, should we do one or more of: 1) when displaying strings, display all bytes with the 8th bit set, and all non-printable characters, as escapes; 2) support, in the magic file, specifying the character encoding of strings (just as we specify the byte order of integers), and use the iconv routines to convert them to the display character set; 3) reject all magic file entries that specify strings if the string isn't valid in the encoding we're using; etc.? From kimmo at suominen.com Thu Oct 22 22:41:07 2009 From: kimmo at suominen.com (Kimmo Suominen) Date: Thu, 22 Oct 2009 15:41:07 -0400 Subject: File errors recognizing a file In-Reply-To: <200910221135.04049.cemeyer@u.washington.edu> References: <20091022170438.D36505654F@rebar.astron.com> <200910221135.04049.cemeyer@u.washington.edu> Message-ID: <b32e77100910221241g4396d2b5ne89789790ef00bb1@mail.gmail.com> I don't think it is advisable to ignore the exit code from file. In the rpm build case, it would be best to remove LANG and all LC_* environment variables before proceeding with building the binary package. + Kim On 2009-10-22, Conrad Meyer <cemeyer at u.washington.edu> wrote: > On Thursday 22 October 2009 10:04:38 am Christos Zoulas wrote: >> On Oct 22, 4:35pm, ordo.ad at gmail.com (Alessandro Doro) wrote: >> | $ LANG=C file DMI_12C11D10.bin >> | DMI_12C11D10.bin: Infocom game data (Z-machine 2, Release 0 / Serial >> | ;9\002\307) >> >> This is expected, as utf8 is a multi-byte encoding and glibc throws errors >> when invalid character sequences are found. >> >> christos > > Ok, so is it wrong for RPM to rely on file exiting with a zero status > (though > the file type was recognize more or less correctly)? 
> > -- > Conrad Meyer <cemeyer at u.washington.edu> > > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file > From christos at zoulas.com Thu Oct 22 23:59:50 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 22 Oct 2009 16:59:50 -0400 Subject: File errors recognizing a file In-Reply-To: <200910221135.04049.cemeyer@u.washington.edu> from Conrad Meyer (Oct 22, 11:35am) Message-ID: <20091022205950.9B6FD5654F@rebar.astron.com> On Oct 22, 11:35am, cemeyer at u.washington.edu (Conrad Meyer) wrote: -- Subject: Re: File errors recognizing a file | On Thursday 22 October 2009 10:04:38 am Christos Zoulas wrote: | > On Oct 22, 4:35pm, ordo.ad at gmail.com (Alessandro Doro) wrote: | > | $ LANG=C file DMI_12C11D10.bin | > | DMI_12C11D10.bin: Infocom game data (Z-machine 2, Release 0 / Serial | > | ;9\002\307) | > | > This is expected, as utf8 is a multi-byte encoding and glibc throws errors | > when invalid character sequences are found. | > | > christos | | Ok, so is it wrong for RPM to rely on file exiting with a zero status (though | the file type was recognize more or less correctly)? It is probably wrong to build RPM's with random environment settings. christos From christos at zoulas.com Fri Oct 23 00:07:41 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 22 Oct 2009 17:07:41 -0400 Subject: File errors recognizing a file In-Reply-To: <13A0FACD-5701-4864-B5E8-14BC4F5CA750@alum.mit.edu> from Guy Harris (Oct 22, 11:39am) Message-ID: <20091022210741.2CF6F5654E@rebar.astron.com> On Oct 22, 11:39am, guy at alum.mit.edu (Guy Harris) wrote: -- Subject: Re: File errors recognizing a file | | On Oct 22, 2009, at 10:04 AM, Christos Zoulas wrote: | | > This is expected, as utf8 is a multi-byte encoding and glibc throws | > errors | > when invalid character sequences are found. 
| | Since we cannot guarantee that a file will have only valid UTF-8 | strings in the fields we display as strings, should we do one or more | of: | | 1) when displaying strings, display all bytes with the 8th bit set, | and all non-printable characters, as escapes; That would display things incorrectly sometimes. | 2) support, in the magic file, specifying the character encoding of | strings (just as we specify the byte order of integers), and use the | iconv routines to convert them to the display character set; That would be difficult, and perhaps not appropriate for many magic entries. | 3) reject all magic file entries that specify strings if the string | isn't valid in the encoding we're using; This will make file behave differently depending on the locale setting [find different magic]. What do you think about: 4) check the string for validity in the current locale; if it is valid print it, if not print an escaped version of it. christos From guy at alum.mit.edu Fri Oct 23 00:12:57 2009 From: guy at alum.mit.edu (Guy Harris) Date: Thu, 22 Oct 2009 14:12:57 -0700 Subject: File errors recognizing a file In-Reply-To: <20091022210741.2CF6F5654E@rebar.astron.com> References: <20091022210741.2CF6F5654E@rebar.astron.com> Message-ID: <2C5082EA-A4D7-4CEB-B937-57AB57E64D13@alum.mit.edu> On Oct 22, 2009, at 2:07 PM, Christos Zoulas wrote: > On Oct 22, 11:39am, guy at alum.mit.edu (Guy Harris) wrote: > | 1) when displaying strings, display all bytes with the 8th bit set, > | and all non-printable characters, as escapes; > > That would display things incorrectly sometimes. ... > What do you think about: > > 4) check the string for validity in the current locale; if it is > valid print it, if not print an escaped version of it.
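Option 4 as quoted above could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the change that went into file(1): the names valid_in_locale(), escape_bytes() and format_checked() are hypothetical and do not exist in libmagic, and the caller is assumed to have selected the locale beforehand with setlocale(LC_CTYPE, "").

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: return 1 if the len bytes at s decode cleanly as
 * multibyte characters in the current LC_CTYPE locale, 0 otherwise.
 */
int
valid_in_locale(const char *s, size_t len)
{
	mbstate_t st;
	size_t n;

	memset(&st, 0, sizeof(st));
	while (len > 0) {
		n = mbrtowc(NULL, s, len, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return 0;	/* invalid or truncated sequence */
		if (n == 0)
			n = 1;		/* embedded NUL: step over it */
		s += n;
		len -= n;
	}
	return 1;
}

/*
 * Hypothetical helper: copy s into out, writing every byte outside
 * printable ASCII as a three-digit octal escape.  Returns 0 on success
 * and -1 if out is too small.
 */
int
escape_bytes(const char *s, size_t len, char *out, size_t outsz)
{
	size_t i, o = 0;

	if (outsz == 0)
		return -1;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)s[i];

		if (c >= 0x20 && c < 0x7f) {
			if (o + 1 >= outsz)
				return -1;
			out[o++] = (char)c;
		} else {
			if (o + 4 >= outsz)
				return -1;
			snprintf(out + o, 5, "\\%03o", (unsigned int)c);
			o += 4;
		}
	}
	out[o] = '\0';
	return 0;
}

/*
 * Option 4: keep the string as-is when it is valid in the current
 * locale, otherwise fall back to the escaped form.
 */
int
format_checked(const char *s, size_t len, char *out, size_t outsz)
{
	if (valid_in_locale(s, len)) {
		if (len >= outsz)
			return -1;
		memcpy(out, s, len);
		out[len] = '\0';
		return 0;
	}
	return escape_bytes(s, len, out, outsz);
}
```

For the serial field from the report earlier in this thread (the bytes ';', '9', 0x02, 0xc7), escape_bytes() yields ;9\002\307, the same form that LANG=C file printed. As noted in the reply that follows, the validity check cannot tell whether the bytes were actually written in the current locale's encoding; it only avoids handing an undecodable sequence to the formatting code.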
Well, that could also display things incorrectly sometimes (if the string isn't in the encoding for the current locale, but happens to be valid in that locale), but, at least when the encoding in the file is the same as the current locale's encoding, it will display strings as they're intended to be viewed, unlike 1), which would never display any non-ASCII string as it's intended to be viewed. From christos at zoulas.com Fri Oct 23 19:46:39 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 23 Oct 2009 12:46:39 -0400 Subject: new magic entry: Linux swap file/device for PowerPC In-Reply-To: <564723347.501831256134470500.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Oct 21, 10:14am) Message-ID: <20091023164639.33F325654E@rebar.astron.com> On Oct 21, 10:14am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: new magic entry: Linux swap file/device for PowerPC | hello, | | due to one of our Bugzilla reports, we found out that even if there is magic for Linux swap file or swap device (devices can be tested with "file -s"), it is only for Intel platform. Swapfile for PowerPC has the same identifier/header, but due to different page size it is on different offset | | this new magic entry describes the identifier: | | 65526 string SWAPSPACE2 Linux/ppc swap file | | I've put it in the "linux" magic file, patch against 5.03 attached Thanks, added. christos From dmalcolm at redhat.com Mon Oct 26 22:26:31 2009 From: dmalcolm at redhat.com (David Malcolm) Date: Mon, 26 Oct 2009 16:26:31 -0400 Subject: PATCH: add support for Python 3 bytecode files Message-ID: <1256588791.20689.574.camel@brick> Attached is a patch to file-5.0.3 to add support for python 3 bytecode files. 
With this patch, "file" can detect a python 3 bytecode file: $ file /usr/lib/python3.1/re.pyo /usr/lib/python3.1/re.pyo: python 3.1 byte-compiled I also slightly fixed up an out-of-date comment Caveats: (i) patch is actually against Fedora's somewhat patched downstream file-5.0.3; is there an SCM I should be generating a patch against? (ii) only tested with 3.1; I calculated the value for 3.0 from the source in python's py3k branch's Python/import.c: [snip] Python 3.0a5: 3130 (lexical exception stacking, including POP_EXCEPT) [snip] Python 3.1a0: 3150 (optimize conditional branches: introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE) [snip] Note that the value has 1 added to it in py3k's _PyImport_Init, giving: python 3.0: magic=3131 = 0x0c3b (hex) = 0x3b 0x0c 0x0d 0x0a (header) and python 3.1: magic=3151 = 0x0c4f (hex) = 0x4f 0x0c 0x0d 0x0a (header) FWIW I'm tracking this downstream here: https://bugzilla.redhat.com/show_bug.cgi?id=531082 Hope this is helpful Dave -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.03-add-python-3.patch Type: text/x-patch Size: 1062 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091026/e2700f2e/attachment.bin> From christos at zoulas.com Tue Oct 27 17:10:53 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 27 Oct 2009 11:10:53 -0400 Subject: PATCH: add support for Python 3 bytecode files In-Reply-To: <1256588791.20689.574.camel@brick> from David Malcolm (Oct 26, 4:26pm) Message-ID: <20091027151053.796B45654F@rebar.astron.com> On Oct 26, 4:26pm, dmalcolm at redhat.com (David Malcolm) wrote: -- Subject: PATCH: add support for Python 3 bytecode files | Attached is a patch to file-5.0.3 to add support for python 3 bytecode | files.
| | With this patch, "file" can detect a python 3 bytecode file: | $ file /usr/lib/python3.1/re.pyo | /usr/lib/python3.1/re.pyo: python 3.1 byte-compiled | | I also slightly fixed-up an out-of-date comment | | Caveats: | (i) patch is actually against Fedora's somewhat patched downstream | file-5.0.3; is there a SCM I should be generating a patch against? | (ii) only tested with 3.1; I calculated the value for 3.0 from the source | in python's py3k branch's Python/import.c: | [snip] | Python 3.0a5: 3130 (lexical exception stacking, including POP_EXCEPT) | [snip] | Python 3.1a0: 3150 (optimize conditional branches: | introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE) | [snip] | | Note that the value has 1 added to it in py3k's _PyImport_Init, | giving: | python 3.0: magic=3131 = 0x0c3b (hex) = 0x3b 0x0c 0x0d 0x0a (header) | and | python 3.1: magic=3151 = 0x0c4f (hex) = 0x4f 0x0c 0x0d 0x0a (header) | | FWIW I'm tracking this downstream here: | https://bugzilla.redhat.com/show_bug.cgi?id=531082 Thanks a lot. It is all applied now. christos From vincent.gilfaut at ciridd.org Mon Nov 2 09:48:39 2009 From: vincent.gilfaut at ciridd.org (Vincent Gilfaut) Date: Mon, 02 Nov 2009 08:48:39 +0100 Subject: File 5.03 magic Message-ID: <4AEE8ED7.2030201@ciridd.org> Hi, fileinfo needs magic. I compiled version 5.03 and only magic.mgc is generated. You mentioned in a previous message that this was a bug; is there a patch? Thank you, Vincent -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://mx.gw.com/pipermail/file/attachments/20091102/2afc1d05/attachment.html> From christos at zoulas.com Mon Nov 2 15:29:31 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 2 Nov 2009 08:29:31 -0500 Subject: File 5.03 magic In-Reply-To: <4AEE8ED7.2030201@ciridd.org> from Vincent Gilfaut (Nov 2, 8:48am) Message-ID: <20091102132931.65FDB5654E@rebar.astron.com> On Nov 2, 8:48am, vincent.gilfaut at ciridd.org (Vincent Gilfaut) wrote: -- Subject: File 5.03 magic | Hi, | fileinfo needs magic. | I compiled version 5.03 and only magic.mgc is generated. | You mentioned in a previous message that this was a bug; is there a patch? | I think that there is a misunderstanding. Only magic.mgc will be generated from now on. christos From vincent.gilfaut at ciridd.org Mon Nov 2 16:15:20 2009 From: vincent.gilfaut at ciridd.org (Vincent Gilfaut) Date: Mon, 02 Nov 2009 15:15:20 +0100 Subject: File 5.03 magic In-Reply-To: <20091102132931.65FDB5654E@rebar.astron.com> References: <20091102132931.65FDB5654E@rebar.astron.com> Message-ID: <4AEEE978.3090805@ciridd.org> Thank you for your answer. Does that mean that it isn't possible to use file 5.03 with PHP fileinfo? One possibility is to run the file command through PHP's exec, but as far as I know file is not able to test remote files, is it? vincent On 02/11/2009 14:29, Christos Zoulas wrote: > On Nov 2, 8:48am, vincent.gilfaut at ciridd.org (Vincent Gilfaut) wrote: > -- Subject: File 5.03 magic > > | Hi, > | fileinfo needs magic. > | I compiled version 5.03 and only magic.mgc is generated. > | You mentioned in a previous message that this was a bug; is there a patch? > | > > I think that there is a misunderstanding. Only magic.mgc will be generated > from now on. > > christos > > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://mx.gw.com/pipermail/file/attachments/20091102/9abcf832/attachment.html> From dnovotny at redhat.com Mon Nov 2 16:17:56 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Mon, 2 Nov 2009 09:17:56 -0500 (EST) Subject: File 5.03 magic In-Reply-To: <4AEEE978.3090805@ciridd.org> Message-ID: <341328756.1117411257171476075.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> you can always create the magic file manually with cat magic/Magdir/* > /path/magic ----- "Vincent Gilfaut" <vincent.gilfaut at ciridd.org> wrote: > Thank you for your answer. > Does that mean that it isn't possible to use file 5.03 with PHP fileinfo? > > One possibility is to run the file command through PHP's exec, but as far as I > know file is not able to test remote files, is it? > > vincent > > On 02/11/2009 14:29, Christos Zoulas wrote: > > On Nov 2, 8:48am, vincent.gilfaut at ciridd.org (Vincent Gilfaut) wrote: > > -- Subject: File 5.03 magic > > > > | Hi, > > | fileinfo needs magic. > > | I compiled version 5.03 and only magic.mgc is generated. > > | You mentioned in a previous message that this was a bug; is there a patch? > > | > > > > I think that there is a misunderstanding. Only magic.mgc will be generated > > from now on. > > > > christos > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file From christos at zoulas.com Mon Nov 2 17:24:16 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 2 Nov 2009 10:24:16 -0500 Subject: File 5.03 magic In-Reply-To: <4AEEE978.3090805@ciridd.org> from Vincent Gilfaut (Nov 2, 3:15pm) Message-ID: <20091102152416.24AF65654E@rebar.astron.com> On Nov 2, 3:15pm, vincent.gilfaut at ciridd.org (Vincent Gilfaut) wrote: -- Subject: Re: File 5.03 magic | Thank you for your answer. | Does that mean that it isn't possible to use file 5.03 with PHP fileinfo? | | One possibility is to run the file command through PHP's exec, but as far as I | know file is not able to test remote files, is it?
The extension uses libmagic so it should be perfectly able to parse the binary file format. christos From dnovotny at redhat.com Thu Nov 5 17:46:06 2009 From: dnovotny at redhat.com (Daniel Novotny) Date: Thu, 5 Nov 2009 10:46:06 -0500 (EST) Subject: deltarpm and delta ISO magic In-Reply-To: <1730665816.1352041257435872148.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <167168733.1352241257435966044.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, one of our users lacked magic entries for rpm-only deltarpms (files created by "makedeltarpm" command in rpm-based linux distros) and for delta ISOs (files created by "makedeltaiso" command) I created those entries, patch attached #delta RPM and delta ISO Daniel Novotny (dnovotny at redhat.com) 0 string DISO Delta ISO data >4 belong x version %d 0 string drpm Delta RPM !:mime application/x-rpm >12 string x %s downstream bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=533151 https://www.redhat.com/archives/fedora-devel-list/2009-November/msg00259.html best regards, Daniel Novotny, Red Hat inc. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: file-5.03-delta.patch Type: text/x-patch Size: 903 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091105/803e90ed/attachment.bin> From christos at zoulas.com Fri Nov 6 15:54:07 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 6 Nov 2009 08:54:07 -0500 Subject: deltarpm and delta ISO magic In-Reply-To: <167168733.1352241257435966044.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Nov 5, 10:46am) Message-ID: <20091106135407.98C585654E@rebar.astron.com> On Nov 5, 10:46am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: deltarpm and delta ISO magic | hello, | | one of our users lacked magic entries for rpm-only deltarpms | (files created by "makedeltarpm" command in rpm-based linux distros) | and for delta ISOs (files created by "makedeltaiso" command) | | I created those entries, patch attached | | #delta RPM and delta ISO Daniel Novotny (dnovotny at redhat.com) | 0 string DISO Delta ISO data | >4 belong x version %d | | 0 string drpm Delta RPM | !:mime application/x-rpm | >12 string x %s | | downstream bugzilla reference: | https://bugzilla.redhat.com/show_bug.cgi?id=533151 | https://www.redhat.com/archives/fedora-devel-list/2009-November/msg00259.html | | best regards, | | Daniel Novotny, Red Hat inc. Added, thanks! christos From dottedmag at dottedmag.net Tue Nov 10 21:31:31 2009 From: dottedmag at dottedmag.net (Mikhail Gusarov) Date: Wed, 11 Nov 2009 01:31:31 +0600 Subject: NekoVM magic Message-ID: <87k4xyi3cc.fsf@vertex.dottedmag.net> Hello. Attached are magic file for NekoVM (http://nekovm.org/) bytecode, and simple testcase. -- http://fossarchy.blogspot.com/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091111/c69c0747/attachment.bin> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: neko URL: <http://mx.gw.com/pipermail/file/attachments/20091111/c69c0747/attachment.ksh> -------------- next part -------------- A non-text attachment was scrubbed... Name: test_nekovm.testfile Type: application/octet-stream Size: 278 bytes Desc: not available URL: <http://mx.gw.com/pipermail/file/attachments/20091111/c69c0747/attachment.obj> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_nekovm.result URL: <http://mx.gw.com/pipermail/file/attachments/20091111/c69c0747/attachment-0001.ksh> From christos at zoulas.com Tue Nov 10 22:36:35 2009 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 10 Nov 2009 15:36:35 -0500 Subject: NekoVM magic In-Reply-To: <87k4xyi3cc.fsf@vertex.dottedmag.net> from Mikhail Gusarov (Nov 11, 1:31am) Message-ID: <20091110203635.D8BC45654E@rebar.astron.com> On Nov 11, 1:31am, dottedmag at dottedmag.net (Mikhail Gusarov) wrote: -- Subject: NekoVM magic | Hello. | | Attached are magic file for NekoVM (http://nekovm.org/) bytecode, and | simple testcase. Thanks, added. christos From c.cerbo at gmail.com Wed Nov 11 17:47:00 2009 From: c.cerbo at gmail.com (Costantino Cerbo) Date: Wed, 11 Nov 2009 16:47:00 +0100 Subject: Set of testfiles for Regression testing? Message-ID: <6fbbec30911110747s726fc48ew5294edadc48c2897@mail.gmail.com> Hallo, there is a public avaliable set of testfiles for the 'file' utility? How do we uncover if changes in the source code lead to unintended consequences or bugs? Or are there some unit tests? 
Thanks in advance for the answer, Costantino From c.cerbo at gmail.com Thu Nov 12 18:02:33 2009 From: c.cerbo at gmail.com (Costantino Cerbo) Date: Thu, 12 Nov 2009 17:02:33 +0100 Subject: Set of testfiles for Regression testing? In-Reply-To: <6fbbec30911110747s726fc48ew5294edadc48c2897@mail.gmail.com> References: <6fbbec30911110747s726fc48ew5294edadc48c2897@mail.gmail.com> Message-ID: <6fbbec30911120802g6db8ed7dy69fbdd881a56304b@mail.gmail.com> Hallo, there is a public avaliable set of testfiles for the 'file' utility? How do we uncover if changes in the source code lead to unintended consequences or bugs? Or are there some unit tests? Thanks in advance for the answer, Costantino P.S.: I experienced a problem with my subscription, therefore I don't know if I send this emal twice. If so, sorry for the double posting. From ian at darwinsys.com Thu Nov 12 18:10:56 2009 From: ian at darwinsys.com (Ian Darwin) Date: Thu, 12 Nov 2009 11:10:56 -0500 Subject: Set of testfiles for Regression testing? In-Reply-To: <6fbbec30911120802g6db8ed7dy69fbdd881a56304b@mail.gmail.com> References: <6fbbec30911110747s726fc48ew5294edadc48c2897@mail.gmail.com> <6fbbec30911120802g6db8ed7dy69fbdd881a56304b@mail.gmail.com> Message-ID: <4AFC3390.2010807@darwinsys.com> Costantino Cerbo wrote: > Hallo, > > there is a public avaliable set of testfiles for the 'file' utility? > How do we uncover if changes in the source code lead to unintended > consequences or bugs? > Or are there some unit tests? > > Thanks in advance for the answer, > Costantino > > The OpenBSD project has a small set of basic tests. We could look at backporting this into the file repository. For now you can get it from OpenBSD's CVS (e.g. cvsweb, see their site, I think it's in regress/usr.bin/file/* From christos at zoulas.com Thu Nov 12 20:06:08 2009 From: christos at zoulas.com (Christos Zoulas) Date: Thu, 12 Nov 2009 13:06:08 -0500 Subject: Set of testfiles for Regression testing? 
In-Reply-To: <6fbbec30911110747s726fc48ew5294edadc48c2897@mail.gmail.com> from Costantino Cerbo (Nov 11, 4:47pm) Message-ID: <20091112180608.DA6555654E@rebar.astron.com> On Nov 11, 4:47pm, c.cerbo at gmail.com (Costantino Cerbo) wrote: -- Subject: Set of testfiles for Regression testing? | Hallo, | | there is a public avaliable set of testfiles for the 'file' utility? | How do we uncover if changes in the source code lead to unintended | consequences or bugs? | Or are there some unit tests? | | Thanks in advance for the answer, | Costantino Sorry there are none. I wanted to create a website for people to submit signature files for testing and then write an automatic testsuite perhaps based on autotest for it. christos From tkantani at gmail.com Fri Nov 13 02:40:29 2009 From: tkantani at gmail.com (Toshit Antani) Date: Thu, 12 Nov 2009 16:40:29 -0800 Subject: File command (magic_file api) crashes for specific file Message-ID: <af24dabd0911121640m37e02fdn12c4f4bee464b118@mail.gmail.com> Hello, I am using File command - version 5.03. I am using GnuWin32 port of file and also the magic1.dll. For a specific file, file.exe crashes. Using same file with magic1.dll (version:5.3.3414.16721), I notice that magic_file API crashes. Here is the link to download the file. http://rapidshare.com/files/306197890/MANIFEST.MF.html MD5: 7FB7A3F22F9B43C1622C197486E1C8C9 Thanks, Tk -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mx.gw.com/pipermail/file/attachments/20091112/bd5088a4/attachment.html> From c.cerbo at gmail.com Fri Nov 13 15:58:22 2009 From: c.cerbo at gmail.com (Costantino Cerbo) Date: Fri, 13 Nov 2009 14:58:22 +0100 Subject: Set of testfiles for Regression testing? Message-ID: <6fbbec30911130558m5664be05x98ed0835c173169a@mail.gmail.com> Thank you, Ian. I've found 23 testfiles in http://www.openbsd.org/cgi-bin/cvsweb/src/regress/usr.bin/file/ I also think, that it would be nice to put this into the file repository. 
Besides, if all the members of this mailing list cooperate, we can double the size of the test set, or more, relatively quickly. By the way, is there a list of all the file types that the file utility is able to detect? Many Regards, Costantino > Date: Thu, 12 Nov 2009 11:10:56 -0500 > From: Ian Darwin <ian at darwinsys.com> > Subject: Re: Set of testfiles for Regression testing? > To: File Utility <file at mx.gw.com> > Message-ID: <4AFC3390.2010807 at darwinsys.com> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Costantino Cerbo wrote: >> Hallo, >> >> there is a public avaliable set of testfiles for the 'file' utility? >> How do we uncover if changes in the source code lead to unintended >> consequences or bugs? >> Or are there some unit tests? >> >> Thanks in advance for the answer, >> Costantino >> >> > The OpenBSD project has a small set of basic tests. We could look at > backporting > this into the file repository. For now you can get it from OpenBSD's CVS > (e.g. cvsweb, > see their site, I think it's in regress/usr.bin/file/* > > > > ------------------------------ > > Message: 4 > Date: Thu, 12 Nov 2009 13:06:08 -0500 > From: christos at zoulas.com (Christos Zoulas) > Subject: Re: Set of testfiles for Regression testing? > To: File Utility <file at mx.gw.com> > Message-ID: <20091112180608.DA6555654E at rebar.astron.com> > > On Nov 11, 4:47pm, c.cerbo at gmail.com (Costantino Cerbo) wrote: > -- Subject: Set of testfiles for Regression testing? > > | Hallo, > | > | there is a public avaliable set of testfiles for the 'file' utility? > | How do we uncover if changes in the source code lead to unintended > | consequences or bugs? > | Or are there some unit tests? > | > | Thanks in advance for the answer, > | Costantino > > Sorry there are none. I wanted to create a website for people to submit > signature files for testing and then write an automatic testsuite perhaps > based on autotest for it.
> > christos From ian at darwinsys.com Fri Nov 13 17:32:28 2009 From: ian at darwinsys.com (Ian Darwin) Date: Fri, 13 Nov 2009 10:32:28 -0500 Subject: Set of testfiles for Regression testing? In-Reply-To: <6fbbec30911130558m5664be05x98ed0835c173169a@mail.gmail.com> References: <6fbbec30911130558m5664be05x98ed0835c173169a@mail.gmail.com> Message-ID: <4AFD7C0C.7060104@darwinsys.com> Costantino Cerbo wrote: > Thank you, Ian. > I've found 23 testfiles in > http://www.openbsd.org/cgi-bin/cvsweb/src/regress/usr.bin/file/ > > I also think, that it would be nice to put this into the file repository. > Besides, if all the members of this mailing list cooperate, we can > double or more the size of the test set relative fastly. > Size is not important. Coverage is. See below. > By the way, there is a list of all file types, that the file utility > is able to detect? > There is no separate list per se. There is this file called /etc/magic, which is built from a directory containing a couple of hundred individual fragments, each containing the test(s) for one type of file. It is therefore not practical to test them all, but it would be good if somebody would go through the OpenBSD tests and add a few tests that might exercise *kinds* of tests that are not present already. Looking at the magic file or the man page will reveal the different kinds of tests that are available. From christos at zoulas.com Fri Nov 13 18:50:36 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 13 Nov 2009 11:50:36 -0500 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <af24dabd0911121640m37e02fdn12c4f4bee464b118@mail.gmail.com> from Toshit Antani (Nov 12, 4:40pm) Message-ID: <20091113165036.219D15654E@rebar.astron.com> On Nov 12, 4:40pm, tkantani at gmail.com (Toshit Antani) wrote: -- Subject: File command (magic_file api) crashes for specific file | Hello, | | I am using File command - version 5.03. I am using GnuWin32 port of file and | also the magic1.dll. 
| | For a specific file, file.exe crashes. Using same file with magic1.dll | (version:5.3.3414.16721), I notice that magic_file API crashes. | | Here is the link to download the file. | http://rapidshare.com/files/306197890/MANIFEST.MF.html | MD5: 7FB7A3F22F9B43C1622C197486E1C8C9 | | Thanks, | Tk For me it says: ./file -m ../magic/magic.mgc MANIFEST.MF MANIFEST.MF: Macintosh HFS Extended version 27747 data (spared blocks) (unclean) last mounted by: 'Z1kk', created: Wed Dec 7 20:26:53 1949, last modified: Wed May 5 11:46:06 1909, block size: 795703660, number of blocks: 2037264233, free blocks: 1853121906 christos From christos at zoulas.com Fri Nov 13 18:59:11 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 13 Nov 2009 11:59:11 -0500 Subject: Set of testfiles for Regression testing? In-Reply-To: <6fbbec30911130558m5664be05x98ed0835c173169a@mail.gmail.com> from Costantino Cerbo (Nov 13, 2:58pm) Message-ID: <20091113165911.60C925654E@rebar.astron.com> On Nov 13, 2:58pm, c.cerbo at gmail.com (Costantino Cerbo) wrote: -- Subject: Re: Set of testfiles for Regression testing? | Thank you, Ian. | I've found 23 testfiles in | http://www.openbsd.org/cgi-bin/cvsweb/src/regress/usr.bin/file/ | | I also think, that it would be nice to put this into the file repository. | Besides, if all the members of this mailing list cooperate, we can | double or more the size of the test set relative fastly. | By the way, there is a list of all file types, that the file utility | is able to detect? | | Many Regards, | Costantino I guess you can reverse engineer this from the list of magic entries. But it is fairly large! 
christos From tkantani at gmail.com Fri Nov 13 19:52:54 2009 From: tkantani at gmail.com (Toshit Antani) Date: Fri, 13 Nov 2009 09:52:54 -0800 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <20091113165036.219D15654E@rebar.astron.com> References: <af24dabd0911121640m37e02fdn12c4f4bee464b118@mail.gmail.com> <20091113165036.219D15654E@rebar.astron.com> Message-ID: <af24dabd0911130952w2b3ec437se3304b6254f31dd1@mail.gmail.com> Hello Chirstos, I believe you are using linux/unix version of file command. I am using GnuWin32 port to Windows. I am also using file.exe, magic, and magic.mgc file that come with standard package. http://downloads.sourceforge.net/gnuwin32/file-5.03-setup.exe Is there a separate group that maintains "File for Windows"? Thanks, Tk On Fri, Nov 13, 2009 at 8:50 AM, Christos Zoulas <christos at zoulas.com>wrote: > On Nov 12, 4:40pm, tkantani at gmail.com (Toshit Antani) wrote: > -- Subject: File command (magic_file api) crashes for specific file > > | Hello, > | > | I am using File command - version 5.03. I am using GnuWin32 port of file > and > | also the magic1.dll. > | > | For a specific file, file.exe crashes. Using same file with magic1.dll > | (version:5.3.3414.16721), I notice that magic_file API crashes. > | > | Here is the link to download the file. > | http://rapidshare.com/files/306197890/MANIFEST.MF.html > | MD5: 7FB7A3F22F9B43C1622C197486E1C8C9 > | > | Thanks, > | Tk > > For me it says: > ./file -m ../magic/magic.mgc MANIFEST.MF > MANIFEST.MF: Macintosh HFS Extended version 27747 data (spared blocks) > (unclean) last mounted by: 'Z1kk', created: Wed Dec 7 20:26:53 1949, last > modified: Wed May 5 11:46:06 1909, block size: 795703660, number of blocks: > 2037264233, free blocks: 1853121906 > > christos > > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mx.gw.com/pipermail/file/attachments/20091113/599497c0/attachment.html> From christos at zoulas.com Fri Nov 13 22:10:16 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 13 Nov 2009 15:10:16 -0500 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <af24dabd0911130952w2b3ec437se3304b6254f31dd1@mail.gmail.com> from Toshit Antani (Nov 13, 9:52am) Message-ID: <20091113201016.4FF9C5654F@rebar.astron.com> On Nov 13, 9:52am, tkantani at gmail.com (Toshit Antani) wrote: -- Subject: Re: File command (magic_file api) crashes for specific file | Hello Chirstos, | | I believe you are using linux/unix version of file command. | I am using GnuWin32 port to Windows. | I am also using file.exe, magic, and magic.mgc file that come with standard | package. | http://downloads.sourceforge.net/gnuwin32/file-5.03-setup.exe | | Is there a separate group that maintains "File for Windows"? Not really. But can you run file in gdb and find where it core-dumps? christos From tkantani at gmail.com Mon Nov 16 22:35:06 2009 From: tkantani at gmail.com (Toshit Antani) Date: Mon, 16 Nov 2009 12:35:06 -0800 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <20091113201016.4FF9C5654F@rebar.astron.com> References: <af24dabd0911130952w2b3ec437se3304b6254f31dd1@mail.gmail.com> <20091113201016.4FF9C5654F@rebar.astron.com> Message-ID: <af24dabd0911161235q61ccc11dn7824b7da97af7a12@mail.gmail.com> Hello Christos, I finally got chance to collect the information from gdb. #0 0x77c47879 in strcspn () from /cygdrive/c/WINDOWS/system32/msvcrt.dll #1 0x6998dec8 in magic_errno () from /cygdrive/c/Program Files/GnuWin32/bin/magic1.dll #2 0x00000000 in ?? () Hope this is of use. If there is way to collect more data, let me know and I can try that. 
Thanks, Tk On Fri, Nov 13, 2009 at 12:10 PM, Christos Zoulas <christos at zoulas.com>wrote: > On Nov 13, 9:52am, tkantani at gmail.com (Toshit Antani) wrote: > -- Subject: Re: File command (magic_file api) crashes for specific file > > | Hello Chirstos, > | > | I believe you are using linux/unix version of file command. > | I am using GnuWin32 port to Windows. > | I am also using file.exe, magic, and magic.mgc file that come with > standard > | package. > | http://downloads.sourceforge.net/gnuwin32/file-5.03-setup.exe > | > | Is there a separate group that maintains "File for Windows"? > > Not really. But can you run file in gdb and find where it core-dumps? > > christos > > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mx.gw.com/pipermail/file/attachments/20091116/62f085d4/attachment.html> From christos at zoulas.com Mon Nov 16 23:27:29 2009 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 16 Nov 2009 16:27:29 -0500 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <af24dabd0911161235q61ccc11dn7824b7da97af7a12@mail.gmail.com> from Toshit Antani (Nov 16, 12:35pm) Message-ID: <20091116212729.765055654F@rebar.astron.com> On Nov 16, 12:35pm, tkantani at gmail.com (Toshit Antani) wrote: -- Subject: Re: File command (magic_file api) crashes for specific file | | --===============0690181944== | Content-Type: multipart/alternative; boundary=000e0cd1536cecd3f2047882f158 | | | --000e0cd1536cecd3f2047882f158 | Content-Type: text/plain; charset=ISO-8859-1 | | Hello Christos, | | I finally got chance to collect the information from gdb. | | #0 0x77c47879 in strcspn () from /cygdrive/c/WINDOWS/system32/msvcrt.dll | #1 0x6998dec8 in magic_errno () | from /cygdrive/c/Program Files/GnuWin32/bin/magic1.dll | #2 0x00000000 in ?? () | | Hope this is of use. 
| If there is way to collect more data, let me know and I can try that. | Try compiling with -g; it should give you line numbers... In the meantime, I am having our systems people figure out why our cygwin install is broken on vista64... christos From tkantani at gmail.com Wed Nov 18 21:22:07 2009 From: tkantani at gmail.com (Toshit Antani) Date: Wed, 18 Nov 2009 11:22:07 -0800 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <20091116212729.765055654F@rebar.astron.com> References: <af24dabd0911161235q61ccc11dn7824b7da97af7a12@mail.gmail.com> <20091116212729.765055654F@rebar.astron.com> Message-ID: <af24dabd0911181122p66984974y3750322b9871090a@mail.gmail.com> Crashes when pp==NULL. In print.c (Line 234): pp[strcspn(pp, "\n")] = '\0'; Looking at code, strcspn is also used in similar fashion at other places in code. On Mon, Nov 16, 2009 at 1:27 PM, Christos Zoulas <christos at zoulas.com>wrote: > On Nov 16, 12:35pm, tkantani at gmail.com (Toshit Antani) wrote: > -- Subject: Re: File command (magic_file api) crashes for specific file > > | > | --===============0690181944== > | Content-Type: multipart/alternative; > boundary=000e0cd1536cecd3f2047882f158 > | > | > | --000e0cd1536cecd3f2047882f158 > | Content-Type: text/plain; charset=ISO-8859-1 > | > | Hello Christos, > | > | I finally got chance to collect the information from gdb. > | > | #0 0x77c47879 in strcspn () from /cygdrive/c/WINDOWS/system32/msvcrt.dll > | #1 0x6998dec8 in magic_errno () > | from /cygdrive/c/Program Files/GnuWin32/bin/magic1.dll > | #2 0x00000000 in ?? () > | > | Hope this is of use. > | If there is way to collect more data, let me know and I can try that. > | > > Try compiling with -g; it should give you line numbers... In the meantime, > I am having our systems people figure out why our cygwin install is broken > on vista64... 
> > christos > > _______________________________________________ > File mailing list > File at mx.gw.com > http://mx.gw.com/mailman/listinfo/file > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mx.gw.com/pipermail/file/attachments/20091118/c2330d68/attachment.html> From christos at zoulas.com Wed Nov 18 21:30:21 2009 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Nov 2009 14:30:21 -0500 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <af24dabd0911181122p66984974y3750322b9871090a@mail.gmail.com> from Toshit Antani (Nov 18, 11:22am) Message-ID: <20091118193021.347FD5654E@rebar.astron.com> On Nov 18, 11:22am, tkantani at gmail.com (Toshit Antani) wrote: -- Subject: Re: File command (magic_file api) crashes for specific file | Crashes when pp==NULL. | | In print.c (Line 234): pp[strcspn(pp, "\n")] = '\0'; | | Looking at code, strcspn is also used in similar fashion at other places in | code. But neither asctime() or ctime() are allowed to return NULL. Can you add some printfs and see what's going on? christos From mdorey at bluearc.com Wed Nov 18 22:30:57 2009 From: mdorey at bluearc.com (Martin Dorey) Date: Wed, 18 Nov 2009 12:30:57 -0800 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <20091118193021.347FD5654E@rebar.astron.com> References: <af24dabd0911181122p66984974y3750322b9871090a@mail.gmail.com> from Toshit Antani (Nov 18, 11:22am) <20091118193021.347FD5654E@rebar.astron.com> Message-ID: <54A098E33E92A04EB0DD9A2E8B546CB05F8F1D49@us-ex-mbx1.terastack.bluearc.com> > But neither asctime() or ctime() are allowed to return NULL. I used to assume that. I think I was wrong. 
For an existence proof that, in practice, ctime may return NULL, try this on eg an AMD64 Linux box - a platform where time_t is 64 bits:

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int, const char** argValues)
    {
        ++ argValues;
        time_t time = strtoll(*argValues, 0, 0);
        const char* str = ctime(&time);
        if (str != 0)
        {
            puts(str);
        }
        else
        {
            puts("(null)");
        }
    }

    martind at whitewater:~/playpen$ make ctime && ./ctime 0x8000000000000000
    g++ -W -Wall -pedantic -g -Wno-long-long ctime.cpp -o ctime
    (null)
    martind at whitewater:~/playpen$

Knock a couple of zeroes off to show that the program's sane and other similar tests:

    martind at whitewater:~/playpen$ make ctime && ./ctime 0x80000000000000
    make: `ctime' is up to date.
    Sat Jun 12 22:26:08 1141709097
    martind at whitewater:~/playpen$ make ctime && ./ctime `date +%s`
    make: `ctime' is up to date.
    Wed Nov 18 12:28:48 2009
    martind at whitewater:~/playpen$ make ctime && ./ctime 0
    make: `ctime' is up to date.
    Wed Dec 31 16:00:00 1969
    martind at whitewater:~/playpen$

-----Original Message-----
From: file-bounces at mx.gw.com [mailto:file-bounces at mx.gw.com] On Behalf Of Christos Zoulas
Sent: Wednesday, November 18, 2009 11:30
To: File Utility
Subject: Re: File command (magic_file api) crashes for specific file

On Nov 18, 11:22am, tkantani at gmail.com (Toshit Antani) wrote:
-- Subject: Re: File command (magic_file api) crashes for specific file

| Crashes when pp==NULL.
|
| In print.c (Line 234): pp[strcspn(pp, "\n")] = '\0';
|
| Looking at code, strcspn is also used in similar fashion at other places in
| code.

But neither asctime() or ctime() are allowed to return NULL. Can you add some printfs and see what's going on?
christos _______________________________________________ File mailing list File at mx.gw.com http://mx.gw.com/mailman/listinfo/file From christos at zoulas.com Thu Nov 19 00:56:55 2009 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Nov 2009 17:56:55 -0500 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <54A098E33E92A04EB0DD9A2E8B546CB05F8F1D49@us-ex-mbx1.terastack.bluearc.com> from Martin Dorey (Nov 18, 12:30pm) Message-ID: <20091118225655.133675654E@rebar.astron.com> On Nov 18, 12:30pm, mdorey at bluearc.com (Martin Dorey) wrote: -- Subject: RE: File command (magic_file api) crashes for specific file | martind at whitewater:~/playpen$ make ctime && ./ctime 0x8000000000000000 | g++ -W -Wall -pedantic -g -Wno-long-long ctime.cpp -o ctime | (null) | martind at whitewater:~/playpen$ mx4:~ [5:48pm] 2508>uname -a NetBSD mx4.twosigma.com 5.99.21 NetBSD 5.99.21 (TWOSIGMA) #0: Fri Oct 30 13:37:46 EDT 2009 christos at mx4.twosigma.com:/usr/src/sys/arch/amd64/compile/TWOSIGMA amd64 mx4:~ [5:48pm] 2509>./ctime 0x8000000000000000 Sun Dec 4 10:30:07 219250468 I would file a bug report against linux, and in the meantime I'll protect against NULL. 
christos From mdorey at bluearc.com Thu Nov 19 01:36:13 2009 From: mdorey at bluearc.com (Martin Dorey) Date: Wed, 18 Nov 2009 15:36:13 -0800 Subject: File command (magic_file api) crashes for specific file In-Reply-To: <20091118225655.133675654E@rebar.astron.com> References: <54A098E33E92A04EB0DD9A2E8B546CB05F8F1D49@us-ex-mbx1.terastack.bluearc.com> from Martin Dorey (Nov 18, 12:30pm) <20091118225655.133675654E@rebar.astron.com> Message-ID: <54A098E33E92A04EB0DD9A2E8B546CB05F8F1D92@us-ex-mbx1.terastack.bluearc.com> > mx4:~ [5:48pm] 2509>./ctime 0x8000000000000000 > Sun Dec 4 10:30:07 219250468 But that date is: martind at whitewater:~$ ~/playpen/ctime 0x1894a0000037af Tue Dec 4 10:30:07 219250468 martind at whitewater:~$ > I would file a bug report against linux 0x1894a0000037af has two fewer hex digits than 0x8000000000000000, so, when given the full LONG_LONG_MAX, BSD's libc is returning an answer that has the wrong year, probably due to a wrap-around. Which is worse - no answer or the wrong answer? http://www.opengroup.org/onlinepubs/000095399/functions/ctime.html documents a requirement to return NULL in the case of error but a careful re-reading shows that this requirement is marked as an extension to ISO C, as part of the "Thread-Safe Functions" extension. That, along with the paragraph breaks could be interpreted as meaning that it only applies to ctime_r, and not to ctime. My reading of ISO/IEC 9899 1999-12-01 ("ISO C") is that it makes no provision for error, which lends further weight to the idea that ctime should never return NULL, though ctime_r is clearly free to. Interesting! Thanks for making me check. > in the meantime I'll protect against NULL. An excellent choice, sir. 
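The NULL guard discussed above can be sketched roughly as follows. This is only an illustration, not the change that was applied to print.c: the helper name format_time, the caller-supplied buffer, and the "*Invalid time*" placeholder text are all assumptions made for the example. It shows the one point the thread settles on, namely checking the result of ctime() before handing it to strcspn().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * format_time is a hypothetical helper, not a function from file's print.c.
 * It copies the ctime() text for t into buf without the trailing newline.
 * When ctime() returns NULL (as glibc does for out-of-range values on
 * platforms with a 64-bit time_t), it stores a placeholder instead of
 * dereferencing the NULL pointer.
 */
const char *
format_time(time_t t, char *buf, size_t len)
{
	const char *pp;

	assert(buf != NULL && len > 0);

	pp = ctime(&t);
	if (pp == NULL) {
		/* Placeholder text is an assumption for this sketch. */
		snprintf(buf, len, "*Invalid time*");
		return buf;
	}

	/* Copy first, then cut at the newline that ctime() appends. */
	snprintf(buf, len, "%s", pp);
	buf[strcspn(buf, "\n")] = '\0';
	return buf;
}
```

Copying into a caller-supplied buffer also avoids writing into the static storage that ctime() returns, which is a design choice of this sketch rather than something the thread asks for.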
-----Original Message-----
From: file-bounces at mx.gw.com [mailto:file-bounces at mx.gw.com] On Behalf Of Christos Zoulas
Sent: Wednesday, November 18, 2009 14:57
To: File Utility
Subject: RE: File command (magic_file api) crashes for specific file

On Nov 18, 12:30pm, mdorey at bluearc.com (Martin Dorey) wrote:
-- Subject: RE: File command (magic_file api) crashes for specific file

| martind at whitewater:~/playpen$ make ctime && ./ctime 0x8000000000000000
| g++ -W -Wall -pedantic -g -Wno-long-long ctime.cpp -o ctime
| (null)
| martind at whitewater:~/playpen$

mx4:~ [5:48pm] 2508>uname -a
NetBSD mx4.twosigma.com 5.99.21 NetBSD 5.99.21 (TWOSIGMA) #0: Fri Oct 30 13:37:46 EDT 2009 christos at mx4.twosigma.com:/usr/src/sys/arch/amd64/compile/TWOSIGMA amd64
mx4:~ [5:48pm] 2509>./ctime 0x8000000000000000
Sun Dec 4 10:30:07 219250468

I would file a bug report against linux, and in the meantime I'll protect
against NULL.

christos
_______________________________________________
File mailing list
File at mx.gw.com
http://mx.gw.com/mailman/listinfo/file

From c.cerbo at gmail.com Fri Nov 20 01:13:25 2009
From: c.cerbo at gmail.com (Costantino Cerbo)
Date: Fri, 20 Nov 2009 00:13:25 +0100
Subject: < and > in string tests in the magic file format
Message-ID: <6fbbec30911191513q4243739asb1c130e6fcb08b99@mail.gmail.com>

Hello,

In the magic man page is written that:
    For string values, the string from the file must match
    the specified string. The operators =, < and > (but not
    &) can be applied to strings.

But I don't understand the meaning of < and > in context with string.
For example, how should I interpret this test for the .ogv video format?

>>>>>>(84.b+120)	string		<20000508	(<beta1, prepublic)
>>>>>>(84.b+120)	string		20000508	(1.0 beta 1 or beta 2)
>>>>>>(84.b+120)	string		>20000508
>>>>>>>(84.b+120)	string		<20001031	(beta2-3)

or this other one:

>>>>>>>>43	string		<NO\ NAME	\b, label: "%11.11s"
>>>>>>>>43	string		>NO\ NAME	\b, label: "%11.11s"

Thanks in advance for the reply,
Costantino

From christos at zoulas.com Fri Nov 20 01:19:24 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 19 Nov 2009 18:19:24 -0500
Subject: < and > in string tests in the magic file format
In-Reply-To: <6fbbec30911191513q4243739asb1c130e6fcb08b99@mail.gmail.com> from Costantino Cerbo (Nov 20, 12:13am)
Message-ID: <20091119231924.A12145654E@rebar.astron.com>

On Nov 20, 12:13am, c.cerbo at gmail.com (Costantino Cerbo) wrote:
-- Subject: < and > in string tests in the magic file format

| Hello,
|
| In the magic man page is written that:
|     For string values, the string from the file must match
|     the specified string. The operators =, < and > (but not
|     &) can be applied to strings.
|
| But I don't understand the meaning of < and > in context with string.
| For example, how should I interpret this test for the .ogv video format?
|
| >>>>>>(84.b+120)	string		<20000508	(<beta1, prepublic)
| >>>>>>(84.b+120)	string		20000508	(1.0 beta 1 or beta 2)
| >>>>>>(84.b+120)	string		>20000508
| >>>>>>>(84.b+120)	string		<20001031	(beta2-3)
|
|
| or this other one:
|
| >>>>>>>>43	string		<NO\ NAME	\b, label: "%11.11s"
| >>>>>>>>43	string		>NO\ NAME	\b, label: "%11.11s"

lexicographically compare string from file with string from magic.

christos

From dnovotny at redhat.com Mon Nov 30 13:51:05 2009
From: dnovotny at redhat.com (Daniel Novotny)
Date: Mon, 30 Nov 2009 06:51:05 -0500 (EST)
Subject: how much machine-dependent is the magic.mgc file?
In-Reply-To: <1947473897.662371259581773735.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <2008597905.662391259581865667.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

hello,

after some more issues in Fedora development I found that
my solution is wrong: you cannot pass all the magic
files on the commandline, because the "-m" switch
applies only to one argument

I did not realize this, because I patched Makefile.am,
but for some reason it did not generate a new Makefile.in
(there was no "automake" in the build tree)
and by accident the magic files generated by this "fixed"
build were the same (this can happen sometimes even
without the patch)

after patching Makefile.in I found a way to fix this:
if you cat all the magic files into one and compile
this, you get a compiled magic file with a magic order
which is always the same, thus allowing multilib i386 and x86_64
packages to share the same magic file

	cat $(MAGIC_FRAGMENT_DIR)/* > all-magic
	$(FILE_COMPILE) -C -m all-magic
	rm all-magic

the patch is attached, perhaps it can be cleaned
if you think of some appropriate variable name for the "all-magic"
file

best regards,

Daniel Novotny

----- "Christos Zoulas" <christos at zoulas.com> wrote:

> On Aug 25, 7:15am, dnovotny at redhat.com (Daniel Novotny) wrote:
> -- Subject: Re: how much machine-dependent is the magic.mgc file?
>
> | > Sorting of the magic entries is different/broken?
> | >
> | > christos
> | >
> |
> | in Makefile.am you have:
> |
> | $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR)
> |
> | this can produce different order on different environments (filesystem etc.)
> | I have changed it to
> |
> | $(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR)/*
> |
> | ensuring alphabetic order
> |
> | patch attached,
> |
> | Daniel Novotny, Red Hat inc.
> | ------=_Part_6834_908321754.1251198928482
> | Content-Type: text/x-patch; name=file-5.03-multilib.patch
> | Content-Transfer-Encoding: 7bit
> | Content-Disposition: attachment; filename=file-5.03-multilib.patch
> |
> | diff -up file-5.03/magic/Makefile.am.multilib file-5.03/magic/Makefile.am
> | --- file-5.03/magic/Makefile.am.multilib	2009-08-25 12:45:46.000000000 +0200
> | +++ file-5.03/magic/Makefile.am	2009-08-25 12:50:09.000000000 +0200
> | @@ -234,5 +234,5 @@ FILE_COMPILE_DEP = $(FILE_COMPILE)
> |  endif
> |
> |  ${MAGIC}: $(EXTRA_DIST) $(FILE_COMPILE_DEP)
> | -	$(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR)
> | +	$(FILE_COMPILE) -C -m $(MAGIC_FRAGMENT_DIR)/*
> |  	@mv $(MAGIC_FRAGMENT_BASE).mgc $@
> |
>
> Ok, I will do that for now!
>
> Thanks,
>
> christos
>
> _______________________________________________
> File mailing list
> File at mx.gw.com
> http://mx.gw.com/mailman/listinfo/file
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.03-multilib.patch
Type: text/x-patch
Size: 1192 bytes
Desc: not available
URL: <http://mx.gw.com/pipermail/file/attachments/20091130/3be0878a/attachment.bin>

From christos at zoulas.com Mon Nov 30 18:37:17 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 30 Nov 2009 11:37:17 -0500
Subject: how much machine-dependent is the magic.mgc file?
In-Reply-To: <2008597905.662391259581865667.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Nov 30, 6:51am)
Message-ID: <20091130163717.7652B5654E@rebar.astron.com>

On Nov 30, 6:51am, dnovotny at redhat.com (Daniel Novotny) wrote:
-- Subject: Re: how much machine-dependent is the magic.mgc file?

| hello,
|
| after some more issues in Fedora development I found that
| my solution is wrong: you cannot pass all the magic
| files on the commandline, because the "-m" switch
| applies only to one argument
|
| I did not realize this, because I patched Makefile.am,
| but for some reason it did not generate a new Makefile.in
| (there was no "automake" in the build tree)
| and by accident the magic files generated by this "fixed"
| build were the same (this can happen sometimes even
| without the patch)
|
| after patching Makefile.in I found a way to fix this:
| if you cat all the magic files into one and compile
| this, you get a compiled magic file with a magic order
| which is always the same, thus allowing multilib i386 and x86_64
| packages to share the same magic file
|
| cat $(MAGIC_FRAGMENT_DIR)/* > all-magic
| $(FILE_COMPILE) -C -m all-magic
| rm all-magic
|
| the patch is attached, perhaps it can be cleaned
| if you think of some appropriate variable name for the "all-magic"
| file

Hmm, I went through that path before, and it led to problems (having
the CVS directory in there for example). What we are trying to solve
here is the order of inclusion, right? Then I think it is fixed already
because in apprentice.c:1.156 we qsort() the directory entries. Isn't
that good enough?

christos

From dnovotny at redhat.com Tue Dec 1 13:34:09 2009
From: dnovotny at redhat.com (Daniel Novotny)
Date: Tue, 1 Dec 2009 06:34:09 -0500 (EST)
Subject: how much machine-dependent is the magic.mgc file?
In-Reply-To: <854687638.747211259667137159.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <1917098842.747251259667249957.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

----- "Christos Zoulas" <christos at zoulas.com> wrote:

> Then I think it is fixed
> already
> because in apprentice.c:1.156 we qsort() the directory entries. Isn't
> that good enough?
>
> christos

oh, I see now, there's qsort() there...
but nevertheless,
the magic.mgc files for i686 and x86_64 are sometimes
built different - I'll guess that between those entries
that have the same weight the ordering is random

>Hmm, I went through that path before, and it led to problems (having
>the CVS directory in there for example).

OK, apparently no one else seems to be hurt by this,
so my patch will then stay as a temporary downstream-only
solution (people here need to do multilib testing
and this was blocking them, we have temporary
build trees with no CVS directories inside) and meanwhile I can
try to tweak the comparison function of the qsort()
in the way the order will be more "defined"
and let's see if I can come up with some patch
(some string I can strcmp() if the weight is
the same comes to mind)

best regards,

Daniel Novotny, Red Hat inc.

From christos at zoulas.com Tue Dec 1 15:42:23 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 1 Dec 2009 08:42:23 -0500
Subject: how much machine-dependent is the magic.mgc file?
In-Reply-To: <1917098842.747251259667249957.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Dec 1, 6:34am)
Message-ID: <20091201134223.E4AE25654E@rebar.astron.com>

On Dec 1, 6:34am, dnovotny at redhat.com (Daniel Novotny) wrote:
-- Subject: Re: how much machine-dependent is the magic.mgc file?

|
| ----- "Christos Zoulas" <christos at zoulas.com> wrote:
|
| > Then I think it is fixed
| > already
| > because in apprentice.c:1.156 we qsort() the directory entries. Isn't
| > that good enough?
| >
| > christos
| oh, I see now, there's qsort() there... but nevertheless,
| the magic.mgc files for i686 and x86_64 are sometimes
| built different - I'll guess that between those entries
| that have the same weight the ordering is random
|
| >Hmm, I went through that path before, and it led to problems (having
| >the CVS directory in there for example).
| OK, apparently no one else seems to be hurt by this,
| so my patch will then stay as a temporary downstream-only
| solution (people here need to do multilib testing
| and this was blocking them, we have temporary
| build trees with no CVS directories inside) and meanwhile I can
| try to tweak the comparison function of the qsort()
| in the way the order will be more "defined"
| and let's see if I can come up with some patch
| (some string I can strcmp() if the weight is
| the same comes to mind)

But strcmp in a directory is deterministic (unless there are NLS settings
that are different) since there cannot be equal entries... Something else
must be going on. Are you sure the environment is scrubbed and the same
in both builds?

christos

From dnovotny at redhat.com Tue Dec 1 16:18:06 2009
From: dnovotny at redhat.com (Daniel Novotny)
Date: Tue, 1 Dec 2009 09:18:06 -0500 (EST)
Subject: how much machine-dependent is the magic.mgc file?
In-Reply-To: <20091201134223.E4AE25654E@rebar.astron.com>
Message-ID: <639424779.756211259677086539.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

> | OK, apparently no one else seems to be hurt by this,
> | so my patch will then stay as a temporary downstream-only
> | solution (people here need to do multilib testing
> | and this was blocking them, we have temporary
> | build trees with no CVS directories inside) and meanwhile I can
> | try to tweak the comparison function of the qsort()
> | in the way the order will be more "defined"
> | and let's see if I can come up with some patch
> | (some string I can strcmp() if the weight is
> | the same comes to mind)
>
> But strcmp in a directory is deterministic (unless there are NLS settings
> that are different) since there cannot be equal entries... Something else
> must be going on. Are you sure the environment is scrubbed and the same
> in both builds?
>
> christos
>

currently there's no strcmp in apprentice.c, I tried to
add something like this right now:

diff -up file-5.03/src/apprentice.c.memcmp file-5.03/src/apprentice.c
--- file-5.03/src/apprentice.c.memcmp	2009-12-01 15:00:48.000000000 +0100
+++ file-5.03/src/apprentice.c	2009-12-01 15:03:11.000000000 +0100
@@ -543,7 +543,7 @@ apprentice_sort(const void *a, const voi
 	size_t sa = apprentice_magic_strength(ma->mp);
 	size_t sb = apprentice_magic_strength(mb->mp);
 	if (sa == sb)
-		return 0;
+		return memcmp(a, b, sizeof(struct magic_entry));
 	else if (sa > sb)
 		return -1;
 	else

but the files were *also* different, so you are right, something
else is going on here. For reference, those different files
are stored at
http://danielsoft.sweb.cz/f/m32b
http://danielsoft.sweb.cz/f/m64b
both function the same, but cmp(1) says they are different

- Daniel

From kimmo at suominen.com Tue Dec 1 16:11:55 2009
From: kimmo at suominen.com (Kimmo Suominen)
Date: Tue, 1 Dec 2009 09:11:55 -0500
Subject: how much machine-dependent is the magic.mgc file?
In-Reply-To: <20091201134223.E4AE25654E@rebar.astron.com>
References: <1917098842.747251259667249957.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> <20091201134223.E4AE25654E@rebar.astron.com>
Message-ID: <b32e77100912010611t76e971d8p8525a058866f4912@mail.gmail.com>

I was suspecting NLS as well, so I checked the manual for strcmp, and it
apparently should be ignoring your locale. There is strcoll for
locale-based comparison.

+ Kimmo

On Tue, Dec 1, 2009 at 08:42, Christos Zoulas <christos at zoulas.com> wrote:

> On Dec 1, 6:34am, dnovotny at redhat.com (Daniel Novotny) wrote:
> -- Subject: Re: how much machine-dependent is the magic.mgc file?
>
> |
> | ----- "Christos Zoulas" <christos at zoulas.com> wrote:
> |
> | > Then I think it is fixed
> | > already
> | > because in apprentice.c:1.156 we qsort() the directory entries. Isn't
> | > that good enough?
> | >
> | > christos
> | oh, I see now, there's qsort() there...
> | but nevertheless,
> | the magic.mgc files for i686 and x86_64 are sometimes
> | built different - I'll guess that between those entries
> | that have the same weight the ordering is random
> |
> | >Hmm, I went through that path before, and it led to problems (having
> | >the CVS directory in there for example).
> | OK, apparently no one else seems to be hurt by this,
> | so my patch will then stay as a temporary downstream-only
> | solution (people here need to do multilib testing
> | and this was blocking them, we have temporary
> | build trees with no CVS directories inside) and meanwhile I can
> | try to tweak the comparison function of the qsort()
> | in the way the order will be more "defined"
> | and let's see if I can come up with some patch
> | (some string I can strcmp() if the weight is
> | the same comes to mind)
>
> But strcmp in a directory is deterministic (unless there are NLS settings
> that are different) since there cannot be equal entries... Something else
> must be going on. Are you sure the environment is scrubbed and the same
> in both builds?
>
> christos
>
> _______________________________________________
> File mailing list
> File at mx.gw.com
> http://mx.gw.com/mailman/listinfo/file
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mx.gw.com/pipermail/file/attachments/20091201/97ff8328/attachment.html>

From kimmo at suominen.com Tue Dec 1 16:22:21 2009
From: kimmo at suominen.com (Kimmo Suominen)
Date: Tue, 1 Dec 2009 09:22:21 -0500
Subject: how much machine-dependent is the magic.mgc file?
In-Reply-To: <639424779.756211259677086539.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
References: <20091201134223.E4AE25654E@rebar.astron.com> <639424779.756211259677086539.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <b32e77100912010622s2fda55cci333cff32518a8989@mail.gmail.com>

There are 2 places using qsort in (at least in the CVS version of)
src/apprentice.c.
One calls cmpstrp (which calls strcmp) and the other calls
apprentice_sort for sorting.

+ Kimmo

On Tue, Dec 1, 2009 at 09:18, Daniel Novotny <dnovotny at redhat.com> wrote:

>
> > | OK, apparently no one else seems to be hurt by this,
> > | so my patch will then stay as a temporary downstream-only
> > | solution (people here need to do multilib testing
> > | and this was blocking them, we have temporary
> > | build trees with no CVS directories inside) and meanwhile I can
> > | try to tweak the comparison function of the qsort()
> > | in the way the order will be more "defined"
> > | and let's see if I can come up with some patch
> > | (some string I can strcmp() if the weight is
> > | the same comes to mind)
> >
> > But strcmp in a directory is deterministic (unless there are NLS settings
> > that are different) since there cannot be equal entries... Something else
> > must be going on. Are you sure the environment is scrubbed and the same
> > in both builds?
> >
> > christos
> >
>
> currently there's no strcmp in apprentice.c, I tried to
> add something like this right now:
>
> diff -up file-5.03/src/apprentice.c.memcmp file-5.03/src/apprentice.c
> --- file-5.03/src/apprentice.c.memcmp	2009-12-01 15:00:48.000000000 +0100
> +++ file-5.03/src/apprentice.c	2009-12-01 15:03:11.000000000 +0100
> @@ -543,7 +543,7 @@ apprentice_sort(const void *a, const voi
>  	size_t sa = apprentice_magic_strength(ma->mp);
>  	size_t sb = apprentice_magic_strength(mb->mp);
>  	if (sa == sb)
> -		return 0;
> +		return memcmp(a, b, sizeof(struct magic_entry));
>  	else if (sa > sb)
>  		return -1;
>  	else
>
> but the files were *also* different, so you are right, something
> else is going on here.
> For reference, those different files
> are stored at
> http://danielsoft.sweb.cz/f/m32b
> http://danielsoft.sweb.cz/f/m64b
> both function the same, but cmp(1) says they are different
>
> - Daniel
>
> _______________________________________________
> File mailing list
> File at mx.gw.com
> http://mx.gw.com/mailman/listinfo/file
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mx.gw.com/pipermail/file/attachments/20091201/902f4105/attachment.html>

From martin.hamrle at gmail.com Thu Dec 3 23:35:02 2009
From: martin.hamrle at gmail.com (Martin Hamrle)
Date: Thu, 3 Dec 2009 22:35:02 +0100
Subject: new magic entry for multivolume zip archive
Message-ID: <38e3ae9b0912031335t2eb2d622v5264b4495aac9bc9@mail.gmail.com>

Hello,

I've attached a patch to recognize a multivolume zip archive.

Regards,

Martin Hamrle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.03-zipmultivolume.diff
Type: text/x-patch
Size: 525 bytes
Desc: not available
URL: <http://mx.gw.com/pipermail/file/attachments/20091203/62b12bca/attachment.bin>

From christos at zoulas.com Fri Dec 4 17:00:59 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Fri, 4 Dec 2009 10:00:59 -0500
Subject: new magic entry for multivolume zip archive
In-Reply-To: <38e3ae9b0912031335t2eb2d622v5264b4495aac9bc9@mail.gmail.com> from Martin Hamrle (Dec 3, 10:35pm)
Message-ID: <20091204150059.996A65654E@rebar.astron.com>

On Dec 3, 10:35pm, martin.hamrle at gmail.com (Martin Hamrle) wrote:
-- Subject: new magic entry for multivolume zip archive

| Hello,
|
| I've attached a patch to recognize a multivolume zip archive.
|
| Regards,
|
| Martin Hamrle

Thanks a lot, just added.
christos

From dardoguidobono at gmail.com Fri Dec 11 23:27:20 2009
From: dardoguidobono at gmail.com (Dardo Guidobono)
Date: Fri, 11 Dec 2009 18:27:20 -0300
Subject: crash with specific file in windows (mingw)
Message-ID: <74f8ce50912111327q76b96fedw429870c5f5c51f5e@mail.gmail.com>

I created a tool that iterates over all the files in the fs and uses file
to look up each file's mimetype.

On a Word document the file cmd crashes (downloaded from gnuwin32, version 5.03).
The file cmd works fine on macosx.

So I compiled version 5.03 in mingw, and using gdb I found that in:

readcdf.c
----------------------------------
c = ctime(&ts.tv_sec);
if ((ec = strchr(c, '\n')) != NULL)
	*ec = '\0';
----------------------------------

ctime CAN return NULL, so I changed it to this:

-------------------------------------
if (c == NULL) {
	ec="";
} else {
	if ((ec = strchr(c, '\n')) != NULL)
		*ec = '\0';
}
---------------------

and I got it working.
I saw a lot of ctime calls that are not protected.

Thanks,
Dardo Guidobono
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mx.gw.com/pipermail/file/attachments/20091211/2a2c23aa/attachment.html>

From quel at quelrod.net Fri Dec 25 05:38:43 2009
From: quel at quelrod.net (James Nobis)
Date: Thu, 24 Dec 2009 21:38:43 -0600
Subject: disk image magic additions
Message-ID: <4B3433C3.5040101@quelrod.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Attached are additions for .vdi (Sun Virtualbox) and .vhd (Microsoft
Virtual PC.) I believe these both should go in msdos based on similar
entries already there. I tested them on linux x86 and linux x86-64.

I'm also curious if anyone has any specifications or documentation for
this new zip format that only winzip seems able to handle. It shows up
with a .zip and additional .z01, .z02, etc. files. My first encounter
of it was via http://nvd.nist.gov/fdcc/download_fdcc.cfm downloading vhds.
James -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iQIcBAEBCgAGBQJLNDPDAAoJEEMnv34Ar9phqOIQAJTTBG5qZctfO9EOBrrOtQNo O4Aw5vPp1EDYeoeDfHTKhgC4X6KXQXYxAiYNZ6pIR4OiesimNBgr2vDititLOF93 TgDtaKIa+QioBBByUpadje6RoBNNrGLpArwYrNIqrsIKR1WsvSCbRPyD+3fCae4V +RbRWalxC6ku9UWnVfaX/XfGrfyc1B4+u/GkX4RJ8tlzrgbOF2z3zUoiv3KOlJJ4 GJbZGWwANVUDiXTfDgyUFsn9Vz/FoUdmNMffk/7fcTZtawtjKpuFPoQ+vZ4a/nYu CJJFbq4OOl0uqQjDIOsAmoliEtOa2RJDSgFTRpFWdqkEhOg+IJTg5i+muK2OJ3bN 1DujjsXxwouqmgEChuh0bOhHSdWRbb8zkYyjxXzuO1oHkQ1ydbXS1eriLlaZPkka iKiZGhaLPmA6qTCpCt7xv1xqo7Ryp9PaTIwyq4jMdqFRHj1LGoW1lzVn+MAkX8fJ x9iuSzDAmdf7foD5gq/ztNMNSCuzN9YV5VylGfOOIQHUd5hNJBjaVz78UUaZ8A1H BHtM5opeyWOeadhLg4CGQw2a4+4aYTWOhSywiuVBrXP9EHuS66+p7vmGVcavchNr nPhLscWsbIT7TXu8c3WqQDmGQyQBFVW5m573akPNZVl53Mciziy+L3yDU7hjrYXs YuQZk7b2Sv09ZrTKKUDs =N47w -----END PGP SIGNATURE----- -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: file_additions.txt URL: <http://mx.gw.com/pipermail/file/attachments/20091224/3921e4d9/attachment.txt> From christos at zoulas.com Fri Dec 25 18:03:53 2009 From: christos at zoulas.com (Christos Zoulas) Date: Fri, 25 Dec 2009 11:03:53 -0500 Subject: disk image magic additions In-Reply-To: <4B3433C3.5040101@quelrod.net> from James Nobis (Dec 24, 9:38pm) Message-ID: <20091225160353.7C23D5654E@rebar.astron.com> On Dec 24, 9:38pm, quel at quelrod.net (James Nobis) wrote: -- Subject: disk image magic additions | Attached are additions for .vdi (Sun Virtualbox) and .vhd (Microsoft | Virtual PC.) I believe these both should go in msdos based on similar | entries already there. I tested them on linux x86 and linux x86-64. Thanks. | I'm also curious if anyone has any specifications or documentation for | this new zip format that only winzip seems able to handle. It shows up | with a .zip and additional .z01, .z02, etc. files. My first encounter | of it was via http://nvd.nist.gov/fdcc/download_fdcc.cfm downloading vhds. I have not seen that. 
christos

From breadncup at gmail.com Sun Dec 27 07:11:02 2009
From: breadncup at gmail.com (Daniel (Youngwhan) Song)
Date: Sat, 26 Dec 2009 21:11:02 -0800
Subject: printing mime type.
Message-ID: <549ce3f90912262111x3dd5ceb2te283032cdd520af6@mail.gmail.com>

Hi,

I dumped mysql file by "mysqldump DB > db1.sql", and tried to see the encoding type, so I did "file -i db1.sql" in Ubuntu 9.10 and Cygwin 1.5.25, and found that it gives different output. Ubuntu 9.10 gives "UTF-8", but Cygwin gives "charset=binary".

The "file" in Ubuntu 9.10 is installed when the Ubuntu is installed, and the "file" in Cygwin is installed by myself after downloading it from ftp://ftp.astron.com/pub/file/. Both versions are as same as 5.03.

My questions are

1. Why the result in both cases is different?
2. If the installed "file" in Cygwin gives wrong information, how I can fix the problem?

Any help would be appreciated.

Thank you,
Best Regards,
Daniel (Youngwhan) Song
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mx.gw.com/pipermail/file/attachments/20091226/bdbef6d9/attachment.html>

From breadncup at gmail.com Sun Dec 27 22:52:18 2009
From: breadncup at gmail.com (Daniel (Youngwhan) Song)
Date: Sun, 27 Dec 2009 12:52:18 -0800
Subject: printing mime type
Message-ID: <549ce3f90912271252nf8a176eg9c500111b14ade2@mail.gmail.com>

Hi,

I dumped mysql file by "mysqldump DB > db1.sql", and tried to see the encoding type, so I did "file -i db1.sql" in Ubuntu 9.10 and Cygwin 1.5.25, and found that it gives different output. Ubuntu 9.10 gives "UTF-8", but Cygwin gives "charset=binary".

The "file" in Ubuntu 9.10 is installed when the Ubuntu is installed, and the "file" in Cygwin is installed by myself after downloading it from ftp://ftp.astron.com/pub/file/ . Both versions are as same as 5.03.

My questions are

1. Why the result in both cases is different?
2. If the installed "file" in Cygwin gives wrong information, how I can fix the problem?
Any help would be appreciated. Thank you, Best Regards, Daniel (Youngwhan) Song -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mx.gw.com/pipermail/file/attachments/20091227/8b86bbf7/attachment.html> From quel at quelrod.net Mon Dec 28 19:52:10 2009 From: quel at quelrod.net (James Nobis) Date: Mon, 28 Dec 2009 11:52:10 -0600 Subject: Password Safe (psafe3) magic addition Message-ID: <4B38F04A.8070504@quelrod.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Magic file addition for psafe3 files. It should probably go in msdos though there are native *nix applications as well. Maybe we need a separate file in the Magdir for some cross-platform file formats like virtualization images. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iQIcBAEBCgAGBQJLOPBJAAoJEEMnv34Ar9ph/44P/0x5mR5ejhFuoR2RaUrq/ppT JlwigQXwAFbGwSegjT7vwxTJQI62uNfCaMslSVDoKJ5Yopobu6o7xH2mQ2jsCe4i cqZYrwYW7y7loXP1aLDfFpJPseIzsBtJnJrdezmcGvW3G8tV4vtEzYQggeF8Kfzi l3BG+4GHQu3ce6BVSppBN1Lxfu0oQiQt2Z1nWIiwoIhSImodu8uy2DzxZ4XKGKRH REVQ61kH1holYJPEKpvijqu15gNOnBHZSWBvpIgZoXPEDPzoIZfaQLIzoSSKffLY cT6dK1GDn8d6KYfSlpwc1qXJTISjq13xpO4BwKP6kd2NKJN1QLrmcW136JNczKO5 A7btAtQxBI91IJFWFhSvcvqjuOv1P6X43ICMdotfEWS+7VNORBkaNwRWQ2s5kPE1 ul7i9OxNT/815++crRty/uZ9e3MwM2TfulYNj4tOt7/W4PnIW3kg0pCmsfYXbAvz g8v8RFZtdn37H6rp1EJY/9eVIyBBBLFX5RHPqBHPqpGDQkudYOLFq3fkLw6u+io6 ML4OugEa9tYhDmAiZ/nahnPlEbYHA32pPpc7MgfHbL4WYLOe9IFKiTc3gRoC/aQy WQpbXIpilwuusUiYFch7vTJbvWCNixbsoYdbDWG8E3L2B3wuYaMJvQaAcZuT9QCD RcFvvXQJWxT/2wDLPilr =vv9j -----END PGP SIGNATURE----- -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: psafe3_additions.txt
URL: <http://mx.gw.com/pipermail/file/attachments/20091228/f23f1dd4/attachment.txt>

From ian at darwinsys.com Mon Dec 28 20:16:05 2009
From: ian at darwinsys.com (Ian Darwin)
Date: Mon, 28 Dec 2009 13:16:05 -0500
Subject: Password Safe (psafe3) magic addition
In-Reply-To: <4B38F04A.8070504@quelrod.net>
References: <4B38F04A.8070504@quelrod.net>
Message-ID: <4B38F5E5.9080506@darwinsys.com>

James Nobis wrote:
> Magic file addition for psafe3 files. It should probably go in msdos
Why does everybody say that about anything that might run on MS-Windows?
My original intent was that msdos would contain files provided as part of
that operating system, not that it would become a 1,000-line catch-all for
anything that might conceivably show up on DOS and its derivatives. So yes:
> Maybe we need a separate file in the Magdir for some cross-platform file
> formats like virtualization images.
>
We need separate files. I'm not sure how we get from psafe to
virtualization, but I agree these should not be under msdos.

--Ian

From christos at zoulas.com Mon Dec 28 21:38:48 2009
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 28 Dec 2009 14:38:48 -0500
Subject: Password Safe (psafe3) magic addition
In-Reply-To: <4B38F5E5.9080506@darwinsys.com> from Ian Darwin (Dec 28, 1:16pm)
Message-ID: <20091228193848.0F73B5654E@rebar.astron.com>

On Dec 28, 1:16pm, ian at darwinsys.com (Ian Darwin) wrote:
-- Subject: Re: Password Safe (psafe3) magic addition

| James Nobis wrote:
| > Magic file addition for psafe3 files. It should probably go in msdos
| Why does everybody say that about anything that might run on MS-Windows?
| My original intent was
| that msdos would contain files provided as part of that operating
| system, not that it would become a
| 1,000-line catch-all for anything that might conceivably show up on DOS
| and its derivatives.
So yes: | > Maybe we need a separate file in the Magdir for some cross-platform file | > formats like virtualization images. | > | We need separate files. I'm not sure how we get from psafe to | virtualization, but I agree these should | not be under msdos. Yes, we should have separate files, the Magdir needs some housekeeping... christos From quel at quelrod.net Tue Dec 29 01:40:17 2009 From: quel at quelrod.net (James Nobis) Date: Mon, 28 Dec 2009 17:40:17 -0600 Subject: Password Safe (psafe3) magic addition In-Reply-To: <4B38F5E5.9080506@darwinsys.com> References: <4B38F04A.8070504@quelrod.net> <4B38F5E5.9080506@darwinsys.com> Message-ID: <4B3941E1.4000302@quelrod.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Ian, > We need separate files. I'm not sure how we get from psafe to > virtualization, but I agree these should > not be under msdos. I mentioned virtualization because I emailed a couple additions to the list just a few days ago and all the vmware, qemu, etc. images were stored in msdos. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iQIcBAEBCgAGBQJLOUHhAAoJEEMnv34Ar9phV50P/3RVxfO4sFvBYetDntZY/VzC RYHU55spqH7RqJIhudzLJP4rVuivwvBTWtbgGIZ/GrAKNe61lU/NPFsO96rCkVFV G4SppVqlKuslMP4fWlo8U80Xej9A2gtYD6dtaYFS3ZZZt+HpklTJ3TsO77VYuWRD Fo/ur8j1XA7EaL8pftCsoEPUTi4oQafj8ELdhZknO2UItjAsjkakaBRa6KXtFhjf 8rDs6Ymw2cG+oC2qHH0ZhB6kKwsXNEC3xUrrz0qyQUJn2j7gTu1sUJOr9teUA7C6 buOsCO7+u6aolc0nbhlGc5+9dZtXz5iDXd35UNlbk4mmczmtE3qngaX0e4jWJ1+s pAmZyq/DLv101tr3L7sR1bMI4QsbCRXhIerS/2bNeNO05aMpSnpateL/kIhT4Bdm OvgSp1dH6nnx8zibx0aLQj0doqmSoFiB1Wq94ZIZk91aiUwj5esBjdR/HbUUB747 gNqk7daGxFvwBxcXDJ7cUbnFQm3bpOYhNjhLubGQYdrxoVxt9PFyL6obqOfnvT36 M+p6w9f+bTqzennImdHcGxd78erAjVTYretBqdTunJcKsehcKFMJ/s9DVC80X13/ 9UIjUf1rFS4VnZGquw/XlDFPwGfgTV/08BZDhRLuuZL8eutWJhZEUJWPPRS90qGR DS4ji6zxZH/zSdLVZK4D =2fZb -----END PGP SIGNATURE-----