From endler at eecs.tulane.edu Sat Mar 5 00:36:55 2005
From: endler at eecs.tulane.edu (David Endler)
Date: Sat Mar 5 00:36:55 2005
Subject: Fwd: security vulnerability in File
Message-ID: <200302271513.JAA23277@juno.eecs.tulane.edu>

> Actually, the program headers can be taken advantage of the same way
> I have a more complete patch

Christos, Ian, et al.,

Can you send the final patch along so we can plan a public advisory and
inform the OS vendors?

Thanks,

-dave

From christos at zoulas.com Wed Feb 9 14:43:59 2005
From: christos at zoulas.com (Christos Zoulas)
Date: Sat Mar 5 00:36:57 2005
Subject: file-4.13 is now available
Message-ID: <20050209194359.7C7102AC92@beowulf.gw.com>

Hello,

file-4.13 is now available. It contains a few magic changes, plus the
following from the changelog:

2005-01-12 00:00 Stepan Kasal

    * src/ascmagic.c (file_ascmagic): Fix three bugs about text files:
      If a CRLF text file happens to have CR at offset HOWMANY - 1
      (currently 0xffff), it should not be counted as CR line
      terminator.
      If a line has length exactly MAXLINELEN, it should not yet be
      treated as a ``very long line'', as MAXLINELEN is ``longest
      sane line length''.
      With CRLF, the line length was not computed correctly, and
      even lines of length MAXLINELEN - 1 were treated as ``very
      long''.

2004-12-07 14:15 Christos Zoulas

    * bzip2 needs a lot of input buffer space on some files
      before it can begin uncompressing. This makes file -z
      fail on some bz2 files. Fix it by giving it a copy of
      the file descriptor to read as much as it wants if we
      have access to it.

It is available on: ftp://ftp.astron.com/pub/file/file-4.13.tar.gz

Enjoy,

christos

PS: Don't let the 13 scare you ;)

From christos at zoulas.com Sat Jun 25 19:10:57 2005
From: christos at zoulas.com (Christos Zoulas)
Date: Sat Jun 25 19:36:12 2005
Subject: file-4.14 is now available
Message-ID: <20050625161057.5CB98564D0@rebar.astron.com>

From ftp.astron.com:/pub/file.
This version adds DragonFly support for ELF notes, makes the read buffer
dynamically allocated, and increases its size to 256K. As always, lots of
magic changes.

christos

From ian at darwinsys.com Wed Aug 3 22:59:41 2005
From: ian at darwinsys.com (Ian Darwin)
Date: Wed Aug 3 23:20:18 2005
Subject: Thinking out loud (re: file(1) plugins
In-Reply-To: <20050625161057.5CB98564D0@rebar.astron.com>
References: <20050625161057.5CB98564D0@rebar.astron.com>
Message-ID: <42F1222D.9080908@darwinsys.com>

I had this juxtaposition just now: reading this blog entry...
    http://weblogs.java.net/blog/jive/archive/2005/07/factors_of_succ.html
and, thinking about one of my clients that has a high-level file format
and API that is based on top of both NetCDF and HDF (both listed in
magdir/images).

It would probably not be worth this group's while to write (or maintain)
custom code for identifying such files (you really have to dig to find
out whether it's a "MINC" file or "just" a CDF or HDF file), but it might
be worth the clients' time to write a MINC "plugin" for the file command.
That is, if we supported a plugin API.

In Java it's practically trivial to generate a plugin API; in C you have
to work a bit harder but it's not unreasonable. It would want to be
something like this, called from file_buffer in funcs.c:

    /* Returns 1 if identified and printed, 0 otherwise */
    extern int fileplugin(int fd, ...);

where any number of such plugins would be dlopen'd from .so files in
either /usr/local/libexec/file/ or ~/file/.
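
A rough sketch of what the calling side of such a plugin API might look like, assuming the entry points have already been resolved (for instance with dlopen(3) and dlsym(3) from a single system directory). Only the fileplugin signature comes from the message above; the names fileplugin_fn, run_plugins, never_matches and always_matches are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Signature proposed in the message: returns 1 if the plugin identified
 * (and printed) the file type, 0 otherwise. */
typedef int (*fileplugin_fn)(int fd, ...);

/*
 * Hypothetical dispatcher: try each plugin in order until one claims the
 * file. Returns the index of the plugin that matched, or -1 if none did.
 * In a real implementation the table would be filled from dlsym() lookups.
 */
static int run_plugins(const fileplugin_fn *plugins, size_t n, int fd)
{
    assert(plugins != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (plugins[i] != NULL && plugins[i](fd) == 1)
            return (int)i;
    }
    return -1;
}

/* Stand-in plugins, for illustration only. */
static int never_matches(int fd, ...)
{
    (void)fd;
    return 0;
}

static int always_matches(int fd, ...)
{
    (void)fd;
    return 1;
}
```

Stopping at the first plugin that returns 1 mirrors the "identified and printed" contract in the comment; whether later plugins should still run (as with file -k) is left open here.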
From kim at tac.nyc.ny.us Thu Aug 4 13:15:00 2005
From: kim at tac.nyc.ny.us (Kimmo Suominen)
Date: Thu Aug 4 13:15:17 2005
Subject: Thinking out loud (re: file(1) plugins
In-Reply-To: <42F1222D.9080908@darwinsys.com>
References: <20050625161057.5CB98564D0@rebar.astron.com> <42F1222D.9080908@darwinsys.com>
Message-ID: <20050804101500.GB1970@kimmo.suominen.com>

On Wed, Aug 03, 2005 at 03:59:41PM -0400, Ian Darwin wrote:
> where any number of such plugins would be dlopen'd from .so files in
> either /usr/local/libexec/file/ or ~/file/.

Opening files from $HOME (or any environment variable) is such a can of
worms that is it really worth it?

Using a hardcoded directory for the .so files is much more straight
forward when it comes to security.

Regards,
+ Kimmo
--
Kimmo Suominen

From christos at zoulas.com Thu Aug 4 15:54:06 2005
From: christos at zoulas.com (Christos Zoulas)
Date: Thu Aug 4 16:22:45 2005
Subject: Thinking out loud (re: file(1) plugins
In-Reply-To: <20050804101500.GB1970@kimmo.suominen.com> from Kimmo Suominen (Aug 4, 1:15pm)
Message-ID: <20050804125406.2340856527@rebar.astron.com>

On Aug 4, 1:15pm, kim@tac.nyc.ny.us (Kimmo Suominen) wrote:
-- Subject: Re: Thinking out loud (re: file(1) plugins

| Opening files from $HOME (or any environment variable) is such a can of
| worms that is it really worth it?
|
| Using a hardcoded directory for the .so files is much more straight
| forward when it comes to security.

I agree with kim here. The file plugin idea is nice though. BTW, the
next thing I want to experiment with, is sorting the file magic entries.

What I am considering is sorting them by magic "length" as the primary
key and offset as the secondary.
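
One way the two-key ordering described above could be expressed is as a qsort(3) comparator. This is only a sketch: struct magic_key is a hypothetical, trimmed-down stand-in for the real magic entry, and the direction of each key (longer magic first, then lower offset first) is an assumption, since the message does not say which way either key sorts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a level-zero magic entry. */
struct magic_key {
    size_t magic_len;  /* number of bytes the test compares (primary key) */
    long   offset;     /* offset of the test in the file (secondary key)  */
};

/* Assumed ordering: longer magic first, then smaller offset first. */
static int magic_key_cmp(const void *a, const void *b)
{
    const struct magic_key *ma = a;
    const struct magic_key *mb = b;

    if (ma->magic_len != mb->magic_len)
        return ma->magic_len > mb->magic_len ? -1 : 1;
    if (ma->offset != mb->offset)
        return ma->offset < mb->offset ? -1 : 1;
    return 0;
}

/* Sort an array of keys in place using the comparator above. */
static void sort_magic_keys(struct magic_key *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n > 1)
        qsort(v, n, sizeof(*v), magic_key_cmp);
}
```

Only top-level entries could be reordered this way; dependent ('>') lines would presumably have to travel with their parent entry.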
christos

From ian at darwinsys.com Thu Aug 4 17:08:44 2005
From: ian at darwinsys.com (Ian Darwin)
Date: Thu Aug 4 17:09:02 2005
Subject: Thinking out loud (re: file(1) plugins
In-Reply-To: <20050804125406.2340856527@rebar.astron.com>
References: <20050804125406.2340856527@rebar.astron.com>
Message-ID: <42F2216C.4070209@darwinsys.com>

Christos Zoulas wrote:

>| Using a hardcoded directory for the .so files is much more straight
>| forward when it comes to security.
>
>I agree with kim here.
>
Me three. /usr/{local/}libexec/file, probably.

>The file plugin idea is nice though.
>
Thanks.

>BTW, the
>next thing I want to experiment with, is sorting the file magic entries.
>
>What I am considering is sorting them by magic "length" as the primary
>key and offset as the secondary.
>
>
Sounds good.

From christos at zoulas.com Thu Aug 18 18:59:14 2005
From: christos at zoulas.com (Christos Zoulas)
Date: Thu Aug 18 18:59:36 2005
Subject: file-4.15 is now available
Message-ID: <20050818155914.21F7C56527@rebar.astron.com>

From ftp://ftp.astron.com/pub/file/file-4.15.tar.gz

2005-08-18 09:53 Christos Zoulas

    * Remove erroneous mention of /etc/magic in the file man page
      This is gentoo bug 101639. (Mike Frysinger)

    * Cross-compile support and detection (Mike Frysinger)

2005-08-12 10:17 Christos Zoulas

    * Add -h flag and dereference symlinks if POSIXLY_CORRECT
      is set.

2005-07-29 13:57 Christos Zoulas

    * Avoid search and regex buffer overflows (Kelledin)

2005-07-12 11:48 Christos Zoulas

    * Provide stub implementations for {v,}snprintf() for older
      OS's that don't have them.

    * Change mbstate_t autoconf detection macro from AC_MBSTATE_T
      to AC_TYPE_MBSTATE_T.

And as always many magic additions!
Enjoy, christos From rvokal at redhat.com Wed Sep 21 15:11:28 2005 From: rvokal at redhat.com (Radek =?ISO-8859-1?Q?Vok=E1l?=) Date: Wed Sep 21 15:12:00 2005 Subject: file and berkley db Message-ID: <1127304688.12888.3.camel@localhost.localdomain> File doesn't recognize Berkley db file Steps to Reproduce: 1. echo "one\ntwo"|db_load -T -t hash /tmp/test.db 2. file /tmp/test.db Actual results: /tmp/test.db: (1628767744 words) Expected results: /tmp/test.db: Berkeley DB (Hash, version 8, native byte-order) -- Radek Vok?l -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20050921/1351290c/attachment.bin From ian at darwinsys.com Wed Sep 21 17:15:00 2005 From: ian at darwinsys.com (Ian Darwin) Date: Wed Sep 21 17:47:15 2005 Subject: File Command Web Page? Message-ID: <43316AE4.3020508@darwinsys.com> Hi, is there someplace that you maintain a web page about file? If my page at www.darwinsys.com/freeware/file.html is the only one, I may glitz it up a bit and put in links to things that are based on our code (I was reminded to ask this after somebody contacted me about getting the latest code for a proposed port to C#). Cheers Ian From christos at zoulas.com Wed Sep 21 18:31:16 2005 From: christos at zoulas.com (Christos Zoulas) Date: Wed Sep 21 18:31:33 2005 Subject: File Command Web Page? In-Reply-To: <43316AE4.3020508@darwinsys.com> from Ian Darwin (Sep 21, 10:15am) Message-ID: <20050921153116.9F4D656527@rebar.astron.com> On Sep 21, 10:15am, ian@darwinsys.com (Ian Darwin) wrote: -- Subject: File Command Web Page? | Hi, is there someplace that you maintain a web page about file? 
| If my page at www.darwinsys.com/freeware/file.html is the only one, | I may glitz it up a bit and put in links to things that are based on our | code | (I was reminded to ask this after somebody contacted me about | getting the latest code for a proposed port to C#). | | Cheers | Ian No, the only page I know is the one on FreshMeat... Feel free to add a page and if you want I can host it. christos From rvokal at redhat.com Mon Sep 19 11:24:32 2005 From: rvokal at redhat.com (Radek =?ISO-8859-1?Q?Vok=E1l?=) Date: Thu Sep 22 09:02:27 2005 Subject: [BUG] file has funky output for SPARC core file Message-ID: <1127118272.3018.12.camel@localhost.localdomain> Skipped content of type multipart/mixed-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20050919/9268a814/attachment-0001.bin From christos at zoulas.com Mon Oct 17 20:30:26 2005 From: christos at zoulas.com (Christos Zoulas) Date: Mon Oct 17 20:30:46 2005 Subject: file-4.16 is now available Message-ID: <20051017173027.1144B564FA@rebar.astron.com> from ftp://ftp.astron.com/pub/file/file-4.16.tar.gz Changes: 1. don't close stdin in the library. 2. search for elf notes in shared libraries too. 3. open files with O_BINARY on cygwin And as always magic fixes and additions. christos From christos at zoulas.com Mon Oct 17 22:20:58 2005 From: christos at zoulas.com (Christos Zoulas) Date: Mon Oct 17 22:21:08 2005 Subject: replaced tar file 4.16 Message-ID: <20051017192058.07248564FA@rebar.astron.com> To fix some problems the NetBSD lint found and a broken msdos file. 
christos From vapier at gentoo.org Tue Oct 18 01:01:39 2005 From: vapier at gentoo.org (Mike Frysinger) Date: Tue Oct 18 01:26:48 2005 Subject: segfault when using libmagic and an empty buffer in file-4.16 Message-ID: <200510171801.39294.vapier@gentoo.org> thought i already sent this but i must of forgotten if you call magic_buffer(m, NULL, 0) it'll segfault on you because file_ascmagic() assumes that the count is always at least 2 bytes. once this has been fixed, magic_buffer(m, NULL, 0) will return NULL instead of "empty" ... so, the first fix is: --- src/ascmagic.c +++ src/ascmagic.c @@ -179,6 +179,9 @@ file_ascmagic(struct magic_set *ms, cons } } + if (nbytes <= 1) + goto done; + if ((*buf == 'c' || *buf == 'C') && ISSPC(buf[1])) { subtype_mime = "text/fortran"; subtype = "fortran program"; while the second fix should be something like this i think: file_buffer(struct magic_set *ms, int fd, const void *buf, size_t nb) ... /* abandon hope, all ye who remain here */ if (file_printf(ms, ms->flags & MAGIC_MIME ? (nb ? "application/octet-stream" : "application/empty") : (nb ? "data" : "empty")) == -1) return -1; ... -mike From christos at zoulas.com Tue Oct 18 01:49:25 2005 From: christos at zoulas.com (Christos Zoulas) Date: Tue Oct 18 01:49:45 2005 Subject: segfault when using libmagic and an empty buffer in file-4.16 In-Reply-To: <200510171801.39294.vapier@gentoo.org> from Mike Frysinger (Oct 17, 6:01pm) Message-ID: <20051017224925.49392564FA@rebar.astron.com> On Oct 17, 6:01pm, vapier@gentoo.org (Mike Frysinger) wrote: -- Subject: segfault when using libmagic and an empty buffer in file-4.16 | thought i already sent this but i must of forgotten Nope, I have not seen this before :-) | if you call magic_buffer(m, NULL, 0) it'll segfault on you because | file_ascmagic() assumes that the count is always at least 2 bytes. once this | has been fixed, magic_buffer(m, NULL, 0) will return NULL instead of | "empty" ... 
| | so, the first fix is: | --- src/ascmagic.c | +++ src/ascmagic.c | @@ -179,6 +179,9 @@ file_ascmagic(struct magic_set *ms, cons | } | } | | + if (nbytes <= 1) | + goto done; | + | if ((*buf == 'c' || *buf == 'C') && ISSPC(buf[1])) { | subtype_mime = "text/fortran"; | subtype = "fortran program"; | | while the second fix should be something like this i think: | file_buffer(struct magic_set *ms, int fd, const void *buf, size_t nb) | ... | /* abandon hope, all ye who remain here */ | if (file_printf(ms, ms->flags & MAGIC_MIME ? | (nb ? "application/octet-stream" : "application/empty") : | (nb ? "data" : "empty")) == -1) | return -1; | ... Ok, I will take a look! thanks christos From vapier at gentoo.org Thu Oct 20 02:56:18 2005 From: vapier at gentoo.org (Mike Frysinger) Date: Thu Oct 20 02:54:20 2005 Subject: magic file for rzip archives Message-ID: <200510191956.18276.vapier@gentoo.org> found this in the rzip package: http://rzip.samba.org/ -mike -------------- next part -------------- # Magic local data for file(1) command. # Insert here your local magic data. Format is described in magic(5). # Supplementary magic data for the file(1) command to support # rzip(1). The format is described in magic(5). # # Copyright (C) 2003 by Andrew Tridgell. You may do whatever you want with # this file. 
# 0 string RZIP rzip compressed data >4 byte x - version %d >5 byte x \b.%d >6 belong x (%d bytes) From rvokal at redhat.com Mon Oct 31 08:44:22 2005 From: rvokal at redhat.com (Radek =?ISO-8859-1?Q?Vok=E1l?=) Date: Mon Oct 31 08:45:35 2005 Subject: file-4.16 is now available In-Reply-To: <20051017173027.1144B564FA@rebar.astron.com> References: <20051017173027.1144B564FA@rebar.astron.com> Message-ID: <1130741062.3368.12.camel@localhost.localdomain> file-4.16 stopped reporting a source on core files $ file ./core.12863 ./core.12863: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style (missing from `') Radek On Mon, 2005-10-17 at 13:30 -0400, Christos Zoulas wrote: > from ftp://ftp.astron.com/pub/file/file-4.16.tar.gz > > Changes: > > 1. don't close stdin in the library. > 2. search for elf notes in shared libraries too. > 3. open files with O_BINARY on cygwin > > And as always magic fixes and additions. > > christos > > _______________________________________________ > File mailing list > File@mx.gw.com > http://mx.gw.com/mailman/listinfo/file -- Radek Vok?l -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20051031/582f0a2f/attachment.bin From rvokal at redhat.com Mon Oct 31 09:38:56 2005 From: rvokal at redhat.com (Radek =?ISO-8859-1?Q?Vok=E1l?=) Date: Mon Oct 31 09:40:04 2005 Subject: file-4.16 is now available In-Reply-To: <1130741062.3368.12.camel@localhost.localdomain> References: <20051017173027.1144B564FA@rebar.astron.com> <1130741062.3368.12.camel@localhost.localdomain> Message-ID: <1130744336.3368.21.camel@localhost.localdomain> On Mon, 2005-10-31 at 07:44 +0100, Radek Vok?l wrote: > file-4.16 stopped reporting a source on core files > > $ file ./core.12863 > ./core.12863: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), > SVR4-style > > (missing from `') > The problem seems to be in readelf.c with the FLAGS_DID_CORE patch, I guess it shouldn't return unless the flag is set, eg. the last return statement is useless. Quick patching fixes it... --- file-4.16/src/readelf.c.old 2005-10-17 20:41:44.000000000 +0200 +++ file-4.16/src/readelf.c 2005-10-31 08:34:35.000000000 +0100 @@ -546,11 +546,11 @@ donote(struct magic_set *ms, unsigned ch return size; *flags |= FLAGS_DID_CORE; - } else - return size; + } /* else + return size;*/ } switch (os_style) { Radek > > > On Mon, 2005-10-17 at 13:30 -0400, Christos Zoulas wrote: > > from ftp://ftp.astron.com/pub/file/file-4.16.tar.gz > > > > Changes: > > > > 1. don't close stdin in the library. > > 2. search for elf notes in shared libraries too. > > 3. open files with O_BINARY on cygwin > > > > And as always magic fixes and additions. 
> > > > christos > > > > _______________________________________________ > > File mailing list > > File@mx.gw.com > > http://mx.gw.com/mailman/listinfo/file > _______________________________________________ > File mailing list > File@mx.gw.com > http://mx.gw.com/mailman/listinfo/file -- Radek Vok?l -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20051031/55eb5576/attachment.bin From christos at zoulas.com Mon Oct 31 15:47:41 2005 From: christos at zoulas.com (Christos Zoulas) Date: Mon Oct 31 15:48:05 2005 Subject: file-4.16 is now available In-Reply-To: <1130741062.3368.12.camel@localhost.localdomain> from Radek =?ISO-8859-1?Q?Vok=E1l?= (Oct 31, 7:44am) Message-ID: <20051031134741.36B8F564FA@rebar.astron.com> On Oct 31, 7:44am, rvokal@redhat.com (Radek =?ISO-8859-1?Q?Vok=E1l?=) wrote: -- Subject: Re: file-4.16 is now available | file-4.16 stopped reporting a source on core files | | $ file ./core.12863 | ./core.12863: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), | SVR4-style | | (missing from `') | | Radek Thanks, I will take a look. christos From christos at zoulas.com Mon Oct 31 15:52:58 2005 From: christos at zoulas.com (Christos Zoulas) Date: Mon Oct 31 15:53:05 2005 Subject: file-4.16 is now available In-Reply-To: <1130744336.3368.21.camel@localhost.localdomain> from Radek =?ISO-8859-1?Q?Vok=E1l?= (Oct 31, 8:38am) Message-ID: <20051031135258.41173564FA@rebar.astron.com> On Oct 31, 8:38am, rvokal@redhat.com (Radek =?ISO-8859-1?Q?Vok=E1l?=) wrote: -- Subject: Re: file-4.16 is now available Thanks a lot! christos | The problem seems to be in readelf.c with the FLAGS_DID_CORE patch, I | guess it shouldn't return unless the flag is set, eg. the last return | statement is useless. 
Quick patching fixes it...
|
| --- file-4.16/src/readelf.c.old 2005-10-17 20:41:44.000000000 +0200
| +++ file-4.16/src/readelf.c 2005-10-31 08:34:35.000000000 +0100
| @@ -546,11 +546,11 @@ donote(struct magic_set *ms, unsigned ch
|                         return size;
|                 *flags |= FLAGS_DID_CORE;
| -       } else
| -               return size;
| +       } /* else
| +               return size;*/
|         }
|
|         switch (os_style) {
|

From rvokal at redhat.com Fri Nov 11 10:55:02 2005
From: rvokal at redhat.com (Radek Vokál)
Date: Fri Nov 11 10:55:14 2005
Subject: Colision between Cracklib and Ext filesystems
Message-ID: <43745C66.9030703@redhat.com>

See bug

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172904

with proposed patch

Radek

From vapier at gentoo.org Fri Nov 11 16:53:56 2005
From: vapier at gentoo.org (Mike Frysinger)
Date: Fri Nov 11 16:54:14 2005
Subject: Colision between Cracklib and Ext filesystems
In-Reply-To: <43745C66.9030703@redhat.com>
References: <43745C66.9030703@redhat.com>
Message-ID: <20051111145356.GA20969@toucan.gentoo.org>

On Fri, Nov 11, 2005 at 09:55:02AM +0100, Radek Vokál wrote:
> See bug

this was reported & fixed on the mailing list and is in file-4.16
-mike

From christos at zoulas.com Fri Nov 11 17:10:57 2005
From: christos at zoulas.com (Christos Zoulas)
Date: Fri Nov 11 17:11:12 2005
Subject: Colision between Cracklib and Ext filesystems
In-Reply-To: <43745C66.9030703@redhat.com> from Radek Vokál (Nov 11, 9:55am)
Message-ID: <20051111151057.8913D564FA@rebar.astron.com>

On Nov 11, 9:55am, rvokal@redhat.com (Radek Vokál) wrote:
-- Subject: Colision between Cracklib and Ext filesystems

| See bug
|
| https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172904
|
| with proposed patch

This is what the current version of file has:

0       lelong  0x70775631      Cracklib password index, little endian
>4      long    >0              (%i words)
>4      long    0               ("64-bit")
>>8     long    >-1             (%i words)
0       belong  0x70775631      Cracklib password index, big endian
>4      belong  >-1             (%i words)
4
belong  0x70775631      Cracklib password index, big endian ("64-bit")
>12     belong  >0              (%i words)

I think that this fixes the problem.

christos

From rvokal at redhat.com Sat Nov 12 11:26:18 2005
From: rvokal at redhat.com (Radek Vokál)
Date: Sat Nov 12 11:26:38 2005
Subject: Colision between Cracklib and Ext filesystems
In-Reply-To: <20051111151057.8913D564FA@rebar.astron.com>
References: <20051111151057.8913D564FA@rebar.astron.com>
Message-ID: <4375B53A.20007@redhat.com>

Christos Zoulas wrote:
> On Nov 11, 9:55am, rvokal@redhat.com (Radek Vokál) wrote:
> -- Subject: Colision between Cracklib and Ext filesystems
>
> | See bug
> |
> | https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172904
> |
> | with proposed patch
>
> This is what the current version of file has:
>
> 0 lelong 0x70775631 Cracklib password index, little endian
> >4 long >0 (%i words)
> >4 long 0 ("64-bit")
> >>8 long >-1 (%i words)
> 0 belong 0x70775631 Cracklib password index, big endian
> >4 belong >-1 (%i words)
> 4 belong 0x70775631 Cracklib password index, big endian ("64-bit")
> >12 belong >0 (%i words)
>
> I think that this fixes the problem.
>
> christos
>

Yep, true. I've tested this issue with file-4.14 .. shame on me :)

Radek

From enrio at online.no Tue Nov 15 01:56:41 2005
From: enrio at online.no (Enrique Perez-Terron)
Date: Tue Nov 15 02:21:36 2005
Subject: Forcing "failure" of level-zero test from dependent test, and other issues.
Message-ID:

Hello,

I have been looking into the source a few days, since the cracklib issue.

While that particular issue has been solved nicely by removing a test, it
strikes me that perhaps it would be useful to be able to combine more
than one test to determine the broader category of a file. That is, even
if the first test in a group of dependent tests (like the test for
"0 long 0" in the cracklib case) is successful, there should be a way to
back out of a decision "this is some sort of XXX file".
Just to explore the possibilities I have made a patch, which I include
below. I do not consider it to be "ready for application to the cvs", I
just don't want to spend days testing if nobody wants to know about it.

Along the way I have also changed a number of other things, which are
also included in the patch. But first the main topic.

I suggest introducing a new operator '?' to be used in the "value" or
"test" field of the magic file. The operator is an optional prefix in
addition to any [=<>!x&^] operator. It has no effect on level-zero tests
(those with no leading '>'), but in a dependent test, if the dependent
test fails, it behaves as if the root test had failed. First it removes
any description text delivered by successful uplevel tests and their
siblings and children, and it makes the program continue searching the
magic file for a match starting from the next zero-level test. In the
case of "file -k", only the descriptions added since the last zero-level
test was begun are removed.

The implementation does not store this operator in the
(struct magic).reln member, it renames one of the "dummy" members. I
could perhaps have found an unused flag bit instead.

I sent a mail to Christos Zoulas a few days ago about a few lines of code
that I suggested removed. I hereby transfer that discussion to this
forum. (I was not aware of this list's existence.)

"Christos Zoulas" wrote on Sun, 13 Nov 2005 05:35:19 +0100:

> On Nov 12, 8:49pm, enrio@online.no ("Enrique Perez-Terron") wrote:
> -- Subject: file-4.16, softmagic.c, mcheck(): "0 long 0x78 AlwaysMatch" match
>
> | Hello,
> |
> | A test (magic-file line) of
> |
> |     0 long 0x78 AlwaysMatch
> |
> | matches any file having at least four bytes.
> |
> | Looking into the code I found that the first statements in the function
> | mcheck() reads:
> |
> | #1165 "softmagic.c"
> |     if ( (m->value.s[0] == 'x') && (m->value.s[1] == '\0') ) {
> |         return 1;
> |     }
> |
> | This does not look right, since the match-all operator 'x' is stored in
> | ((struct magic *)m)->reln. [...]
>
> This "x" as a wild-character
> match and it is documented in magic(5):
> [...]
> or x, to specify that any value will match. [...]
>
> This seems silly, but it has always been in the implementation of the file
> command. I believe changing this will make file(1) non POSIX compliant.
>
> christos

My point is that the code I want to delete is not needed for the
functionality that is documented. I have just tested it again, with all
the patch below, i.e. the code removed, and given the files:

$ od -c t.dat
0000000   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p
0000020 020  \0  \0  \0 020  \0  \0  \0   1   2   3   4   5   6   7   8
0000040   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P
0000060   Q   R   S   T   U   V   W   X   Y   Z  \n
0000073

$ cat t.mag
0 string abcd
>&0 string x Here is %s.
>>&(16+(4)) string x And here is %.4s.

$ cat t2.mag
0 long 0x78 Hello World!

The new "file" command works like this:

$ file -m t.mag t.dat
t.dat: Here is efghijklmnop\020. And here is RSTU.
$ file -m t2.mag t.dat
t.dat: data

With the code in question not excluded, the latter command gives:

$ file -m t2.mag t.dat
t.dat: Hello World!

which is not in accordance with the documentation.

In any case, the documentation seems to suggest that 'x' only works as a
catch-all with *numeric* data types. However, the implementation clearly
collects the same operators in parse() in apprentice.c independent of the
data type (line 685 and on), and implements the x operator semantics in
softmagic.c, line 1309 and on, for all data types.

The effect of the lines I suggest removed, is to act on a lone 'x' in the
value buffer instead of acting on the 'x' in the reln member of the
struct magic.
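
To illustrate why a numeric magic value can trip the check quoted above, here is a minimal sketch. It assumes the magic value is kept in a union whose string member overlays the numeric member; value_sketch, lone_x_check, is_little_endian and numeric_value are hypothetical names, not the real libmagic definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the value union in struct magic (assumption). */
union value_sketch {
    uint32_t l;      /* numeric magic value */
    char     s[32];  /* string magic value  */
};

/* Same condition as the quoted mcheck() lines: the string view is "x". */
static int lone_x_check(const union value_sketch *v)
{
    assert(v != NULL);
    return v->s[0] == 'x' && v->s[1] == '\0';
}

/* Runtime byte-order probe: 1 on little-endian hosts, 0 otherwise. */
static int is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}

/* Build a zeroed value holding the given 32-bit magic number. */
static union value_sketch numeric_value(uint32_t n)
{
    union value_sketch v;

    memset(&v, 0, sizeof(v));
    v.l = n;
    return v;
}
```

On a little-endian host the bytes of 0x78 are 'x', 0, 0, 0, so the string view reads as a lone "x" even though the line was a numeric test; on a big-endian host the same happens for 0x78000000 instead.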
The 'x' in a test like

    0 string x Oh My

does not land in the value buffer. It is snatched by the reln member, and
the value buffer remains empty. The code that does this is signed with a
comment "Bill The Cat", in apprentice.c. To actually trigger the code I
suggest removed, you need a test like

    0 long 0x78 Oh No!

where you evade the operator 'x' parsing and place the 'x' byte in the
value buffer by way of a numeric code. I guess you could get the same
effect with

    0 long 0x1110078 Help!

on a little-endian computer. On a big-endian, 0x78000000 would be
required.

----

Enough of that, the patch also changes how the ! operator works, and that
is perhaps not a good idea. The documentation seems to suggest that !
inverts the value of arbitrary tests of numeric types:

    For all tests except string and regex, operation ! specifies that
    the line matches if the test does not succeed

The implementation only implements ! as a "not-equal" operator. You can
only specify one operator (but an additional "=" like in ">=" or "!=" is
ignored). I have changed the implementation so that !, like ?, is an
optional prefix in addition to the "reln" operator. It is my hope that
this change does not break existing magic files. Where the ! operator is
used alone, an implicit "=" will be placed in the "reln" member, and the
test will first test for equality. Then it will invert the result.

This is really a stupid change for no very convincing reason, I just did
it because I was playing with the code to learn about it. It brings the
capability of creating a true "<=" (less-or-equal) test, in the form of
"!>", and correspondingly with the other operators. It permits rephrasing
^ as !& or & as !^. It even makes !x a fail-always.
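
The prefix-style '!' described above could be sketched roughly like this. It is not the patch itself: match_numeric is a hypothetical helper, the comparison is simplified to unsigned 32-bit values, and the meaning of each base operator is taken from the magic(5) wording quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Evaluate one numeric test. `reln` is the base operator, `invert` is 1
 * when the test was written with a leading '!', otherwise 0. A bare '!'
 * would be passed in as reln '=' with invert 1.
 */
static int match_numeric(char reln, int invert, uint32_t target, uint32_t magic)
{
    int ok;

    assert(invert == 0 || invert == 1);

    switch (reln) {
    case 'x': ok = 1; break;                          /* anything matches       */
    case '=': ok = (target == magic); break;
    case '<': ok = (target < magic); break;
    case '>': ok = (target > magic); break;
    case '&': ok = ((target & magic) == magic); break; /* all magic bits set    */
    case '^': ok = ((target & magic) != magic); break; /* some magic bit clear  */
    default:  ok = 0; break;                           /* unknown operator      */
    }
    return invert ? !ok : ok;
}
```

With this shape, "!>" behaves as less-or-equal, "!&" gives the same answer as "^", and "!x" never matches, which are the consequences listed in the message.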
Perhaps it would be interesting to redefine ^ to mean "all bits set in the magic must be zero in the target", since the present meaning can be expressed with !&, and this would also introduce the possibility of !^ as "some bits set in the magic must be set in the target", for whatever that is worth. I have no idea what that would do to POSIX compliance. Posix seems to freeze and perpetuate all stupid design details made in the past. ----- What else? Some bugs, if I read the code right, fixed. Things that are seldom or never exercised, but looked plain wrong. I have tried to update the magic.man page, but now I have read it so many times that I can't see what it says any more. file_printf() had two bugs, only exercisable when the initial 1024 bytes allocated for description is not enough. When computing the new size, it forgets to count the size of what is already in the buffer. This is mostly offset by the additional 1024 byte headroom allocated. Second, when computing the new free room, the old size is used, instead of the new size. The manual says about "string/B": The ??B?? flag compacts whitespace in the target, which must contain at least one whitespace character. If the magic has n consecutive blanks, the target needs at least n consecutive blanks to match. The implementation would swallow all blanks in the target when it saw the first blank in the magic. If the magic has n>=2 consecutive blanks, the implementation would not find any blanks in the target to match with the second and further blanks, and determine the strings were different. I rewrote the code to something I think will match the manual page. Untested. diff -u -r file-4.16/doc/magic.man file-4.16-quique/doc/magic.man --- file-4.16/doc/magic.man 2005-04-13 19:16:22.000000000 +0200 +++ file-4.16-quique/doc/magic.man 2005-11-14 10:53:57.000000000 +0100 @@ -8,7 +8,7 @@ .BR file (__CSECTION__) command, version __VERSION__. 
The -.BR file +.B file command identifies the type of a file using, among other tests, a test for whether the file begins with a certain @@ -18,16 +18,52 @@ specifies what magic numbers are to be tested for, what message to print if a particular magic number is found, and additional information to extract from the file. +If the test succeeds, a message is printed. .PP Each line of the file specifies a test to be performed. A test compares the data starting at a particular offset in the file with a 1-byte, 2-byte, or 4-byte numeric value or a string. -If the test succeeds, a message is printed. -The line consists of the following fields: +.PP +Many formats can be determined with a single test, but some require more. +Such follow-up tests are organized in a tree-like manner. +Lines starting with one +.B > +character are children of the last preceding line without such a character. +Lines starting with +.IR n +1 +.B > +characters are children of the preceding line with +.I n +leading +.B > +characters. +One may consider the tests without leading +.B > +characters to be +.B root +tests of their respective trees, and the entire magic file a forest of trees. +.PP +Root-level tests are executed until one is found that succeeds. +After that no further root-level tests are normally executed. +If the successfull root-level test has child tests, +these are all executed, and those who pass may add their messages to the parent's message. +This is used to add detail information to a basic file type. +For each child test that succeeds, its children are executed too, and so on recursively. +Siblings of a child test are executed regardless of the success or failure of the child in question. +.PP +Occasionally it is desirable to determine from the failure of a child test that the root +test should have failed, so the search for a matching test can continue among the remaining root tests. +For this purpose, see the '?' operator below. 
The "file" command also has a command line option to +find all matching root tests, not just the first one. +.PP +Appart from any leading +.B > +characters, each line consists of the following fields: .IP offset \w'message'u+2n -A number specifying the offset, in bytes, into the file of the data -which is to be tested. +A number specifying the offset, in bytes, into the file of the data which is to be tested. +This field can also contain quite complicated formulas for computing the offset to use. +The expression language is explained with examples below. .IP type The type of the data to be tested. The possible values are: @@ -39,21 +75,30 @@ .IP long A four-byte value (on most systems) in this machine's native byte order. .IP string -A string of bytes. +A string of bytes. Up to 31 bytes are read from the file into a buffer, and padded with a zero. +Any null-valued bytes coming from he file are converted to spaces. +The contents of this buffer can then be compared with the string specified in the +.B test +field of the line, see below. +In the following, the term +.I target +refers to the contents of this buffer. The string type specification can be optionally followed by /[Bbc]*. -The ``B'' flag compacts whitespace in the target, which must -contain at least one whitespace character. -If the magic has -.I n -consecutive blanks, the target needs at least -.I n -consecutive blanks to match. -The ``b'' flag treats every blank in the target as an optional blank. -Finally the ``c'' flag, specifies case insensitive matching: lowercase +.RS +.IP B +compacts whitespace in the target. +This means that where the magic file has one or more white space characters, +the target must have at least as many white space charcacters to compare equal. +.IP b +treats every white space in the magic as an optional blank. +The target can have zero or more white spaces at the corresponding point and still compare equal. 
+.IP c +specifies case insensitive matching: lowercase characters in the magic match both lower and upper case characters in the -targer, whereas upper case characters in the magic, only much uppercase +target, whereas upper case characters in the magic, only match uppercase characters in the target. +.RE .IP date A four-byte value interpreted as a UNIX date. .IP ldate @@ -104,7 +149,7 @@ .B \e escapes for special characters. .RE -.PP +.IP The numeric types may optionally be followed by .B & and a numeric value, @@ -123,56 +168,80 @@ Numeric values may be preceded by a character indicating the operation to be performed. It may be -.BR = , -to specify that the value from the file must equal the specified value, -.BR < , -to specify that the value from the file must be less than the specified -value, -.BR > , -to specify that the value from the file must be greater than the specified -value, -.BR & , -to specify that the value from the file must have set all of the bits -that are set in the specified value, -.BR ^ , -to specify that the value from the file must have clear any of the bits -that are set in the specified value, or -.BR x , +.RS +.IP \fB=\fR +the value from the file must equal the specified value, +.IP \fB<\fR +the value from the file must be less than the specified value, +.IP \fB>\fR +the value from the file must be greater than the specified value, +.IP \fB&\fR +the value from the file must have set all of the bits that are set in the specified value, +.IP \fB^\fR +the value from the file must have clear at least some of the bits that are set in the specified value, or +.IP \fBx\fR to specify that any value will match. +.RE +.IP If the character is omitted, it is assumed to be .BR = . -For all tests except -.B string -and -.B regex, -operation -.BR ! -specifies that the line matches if the test does -.B not -succeed. +.IP +The test field may have two further flag characters prefixed. +These flags are applicable to all tests, not just the numeric ones.
+If present, they must go before any numeric comparison operator. +If both are used in the same test, they can go in any order. +.RS +.IP \fB!\fR +inverts the sense of the test. +If the test without this flag fails, the test with it matches, and vice versa. +.IP \fB?\fR +If this test fails (after having considered any +.B ! +flag if present), the failure is propagated back to the root test. +The processing continues as if the root test had failed in the first place: +No further members of the same tree are processed, +and the search for a match continues with the next root test. +Any messages produced by the tree are discarded. +This flag has no effect when used in a root test. +.RE .IP Numeric values are specified in C form; e.g. -.B 13 -is decimal, -.B 013 -is octal, and +.BR 19 , +.BR 023 , +and .B 0x13 -is hexadecimal. +all represent the same number in decimal, octal, and hexadecimal respectively. .IP For string values, the byte string from the file must match the specified byte string. The operators .BR = , -.B < +.BR < , +.BR > , +.BR x , and -.B > -(but not -.BR & ) -can be applied to strings. -The length used for matching is that of the string argument -in the magic file. -This means that a line can match any string, and -then presumably print that string, by doing +.B ! +can be applied to strings. +If none is specified, +.B = +is assumed. +If a string must begin with a character that has an operator meaning (any of +.BR & , +.BR ^ , +.BR = , +.BR > , +.BR < , +.BR x , +.BR ! , +.BR ? ), +specify the +.B = +operator first, and any subsequent characters will be taken as part of the string. +.IP +The length used for matching is that of the string argument in the magic file. +The strings can contain zero-valued bytes. +This means that a test can match any string, +and then presumably print that string, by doing .B >\e0 (because all strings are greater than the null string).
.IP message @@ -182,28 +251,8 @@ format specification, the value from the file (with any specified masking performed) is printed using the message as the format string. .PP -Some file formats contain additional information which is to be printed -along with the file type or need additional tests to determine the true -file type. -These additional tests are introduced by one or more -.B > -characters preceding the offset. -The number of -.B > -on the line indicates the level of the test; a line with no -.B > -at the beginning is considered to be at level 0. -Tests are arranged in a tree-like hierarchy: -If a the test on a line at level -.IB n -succeeds, all following tests at level -.IB n+1 -are performed, and the messages printed if the tests succeed, untile a line -with level -.IB n -(or less) appears. For more complex files, one can use empty messages to get just the -"if/then" effect, in the following way: +"if/then/else" effect, in the following way: .sp .nf 0 string MZ @@ -211,6 +260,11 @@ >0x18 leshort >0x3f extended PC executable (e.g., MS Windows) .fi .PP +Notice that the tests in this example are complementary: if one is false the other is true. +When tests are chained with increasingly more leading '>' characters, you get an "and" effect. +When tests have the same number of leading '>' characters, you get an "or" effect. +This "or" effect is not exclusive unless the tests are exclusive. +.PP Offsets do not need to be constant, but can also be read from the file being examined. If the first character following the last @@ -223,20 +277,36 @@ The value at that offset is read, and is used again as an offset in the file. Indirect offsets are of the form: -.BI (( x [.[bslBSL]][+\-][ y ]). +.PP +.B ( +.I x +.RI [ type ] +[ +.RB [ ~ ] +.I op y +.RB ] ) +.PP +where +.I op +is one of +.BR +\-*/%&|^ . +.PP The value of .I x -is used as an offset in the file. A byte, short or long is read at that offset -depending on the -.B [bslBSL] -type specifier.
-The capitalized types interpret the number as a big endian -value, whereas the small letter versions interpret the number as a little -endian value. +is used as an offset in the file. A byte, short or long is read at that offset, +depending on the +.I type +specifier, which has the form +.RB [[ . ] bslBSL ]. +The capitalized types interpret the number as a big endian value, +whereas the small letter versions interpret the number as a little endian value. +The default type if one is not specified is long. To that number the value of .I y -is added and the result is used as an offset in the file. -The default type if one is not specified is long. +is added (or multiplied, etc.) and the result is used as an offset in the file. +There is an even more convoluted form, in which +.I y +is parenthesized. See examples below. .PP That way variable length structures can be examined: .sp @@ -252,11 +322,44 @@ .PP This strategy of examining has one drawback: You must make sure that you eventually print something, or users may get empty output (like, when -there is neither PE\e0\e0 nor LE\e0\e0 in the above example) +there is neither PE\e0\e0 nor LE\e0\e0 in the above example). 
+.PP -If this indirect offset cannot be used as-is, there are simple calculations +If you think files starting with "MZ" are MS executables anyhow, you can do +.sp +.nf + # MS Windows executables are also valid MS-DOS executables + 0 string MZ + >0x18 leshort <0x40 MZ executable (MS-DOS) + # skip the whole block below if it is not an extended executable + >0x18 leshort >0x3f + >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) + >>(0x3c.l) string LX\e0\e0 LX executable (OS/2) + >>(0x3c.l) string !PE\e0\e0 + >>>(0x3c.l) string !LX\e0\e0 MS extended executable of some unknown sort +.fi +.PP +If there are other formats unrelated to MS executables that begin with MZ, you can do +.sp +.nf + # MS Windows executables are also valid MS-DOS executables + 0 string MZ + >0x18 leshort <0x40 MZ executable (MS-DOS) + # Check for extended executable + >0x18 leshort >0x3f + >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) + >>(0x3c.l) string LX\e0\e0 LX executable (OS/2) + # if it is neither PE nor LX, it is not an MS format at all + >>(0x3c.l) string !PE\e0\e0 + >>>(0x3c.l) string ?!LX\e0\e0 +.fi +.PP +Here files that are not MS executables, but start with "MZ" nevertheless, +can be recognized by entirely different sets of tests appearing later in the magic file. +.PP +If an indirect offset cannot be used as-is, there are simple calculations possible: appending -.BI [+-*/%&|^] +.RB [ +-*/%&|^ ] +.I number inside parentheses allows one to modify the value read from the file before it is used as an offset: .sp @@ -285,6 +388,12 @@ >>>&0 leshort 0x184 for DEC Alpha .fi .PP +If the uplevel test is a string type test, +the end of the field is after the target string's terminating null character. +However, if the test compares for equality or inequality, +the length is limited by the length of the string specified in the magic file. +Remember that the target is a buffer with at most 31 characters plus a forced terminating null.
+.PP Indirect and relative offsets can be combined: .sp .nf @@ -335,6 +444,17 @@ # these are located 14 and 10 bytes after the section name >>>>(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive .fi +.PP +To see in detail how this example works, assume the location 0x3c in the file contains +the long 0x100. At offset 0x100, assume there is a four byte string PE\e0\e0. +This string ends at 0x104. To this number add 0xf4, giving 0x1f8. +At this offset, start a search for the string .idata. +Assume it is found at 0x300. The string ends at 0x306. To this number add 0xe (i.e., 14), +and (-4) resulting in 0x306 + 0xe + (-4) = 0x310. At this offset find the length field, +let us assume it is 0x400. Then start over with 0x306 +0xe = 0x314, and at this offset +find the start offset of the section. Assume this is at 0x1000. +Then check the string at 0x1000 + 0x400 = 0x1400 for the string PK\e3\e4 + .SH BUGS The formats .IR long , diff -u -r file-4.16/src/apprentice.c file-4.16-quique/src/apprentice.c --- file-4.16/src/apprentice.c 2005-10-17 19:13:13.000000000 +0200 +++ file-4.16-quique/src/apprentice.c 2005-11-14 10:53:57.000000000 +0100 @@ -682,6 +682,7 @@ */ EATAB; + repeat: switch (*l) { case '>': case '<': @@ -697,9 +698,13 @@ } break; case '!': - m->reln = *l; + m->negate = 1; ++l; - break; + goto repeat; + case '?': + m->fail = 1; + ++l; + goto repeat; default: if (*l == 'x' && ((isascii((unsigned char)l[1]) && isspace((unsigned char)l[1])) || !l[1])) { diff -u -r file-4.16/src/file.h file-4.16-quique/src/file.h --- file-4.16/src/file.h 2005-10-17 19:13:13.000000000 +0200 +++ file-4.16-quique/src/file.h 2005-11-14 10:53:57.000000000 +0100 @@ -166,8 +166,8 @@ /* Word 3 */ uint8_t in_op; /* operator for indirection */ uint8_t mask_op; /* operator for mask */ - uint8_t dummy1; - uint8_t dummy2; + uint8_t negate; /* invert match ('!' 
operator) */ + uint8_t fail; /* cancel matches */ #define FILE_OPS "&|^+-*/%" #define FILE_OPAND 0 #define FILE_OPOR 1 diff -u -r file-4.16/src/funcs.c file-4.16-quique/src/funcs.c --- file-4.16/src/funcs.c 2005-10-17 21:03:34.000000000 +0200 +++ file-4.16-quique/src/funcs.c 2005-11-14 10:53:57.000000000 +0100 @@ -46,21 +46,22 @@ file_printf(struct magic_set *ms, const char *fmt, ...) { va_list ap; - size_t len; + size_t len, newsize; char *buf; va_start(ap, fmt); if ((len = vsnprintf(ms->o.ptr, ms->o.len, fmt, ap)) >= ms->o.len) { va_end(ap); - if ((buf = realloc(ms->o.buf, len + 1024)) == NULL) { + newsize = (ms->o.ptr - ms->o.buf) + len + 1024; + if ((buf = realloc(ms->o.buf, newsize)) == NULL) { file_oomem(ms); return -1; } ms->o.ptr = buf + (ms->o.ptr - ms->o.buf); ms->o.buf = buf; + ms->o.size = newsize; ms->o.len = ms->o.size - (ms->o.ptr - ms->o.buf); - ms->o.size = len + 1024; va_start(ap, fmt); len = vsnprintf(ms->o.ptr, ms->o.len, fmt, ap); diff -u -r file-4.16/src/print.c file-4.16-quique/src/print.c --- file-4.16/src/print.c 2005-10-12 21:29:42.000000000 +0200 +++ file-4.16-quique/src/print.c 2005-11-14 10:53:57.000000000 +0100 @@ -92,7 +92,12 @@ } } - (void) fprintf(stderr, ",%c", m->reln); + (void) fputc(',',stderr); + if (m->fail) + (void) fputc('?', stderr); + if (m->negate) + (void) fputc('!', stderr); + (void) fputc(m->reln, stderr); if (m->reln != 'x') { switch (m->type) { diff -u -r file-4.16/src/softmagic.c file-4.16-quique/src/softmagic.c --- file-4.16/src/softmagic.c 2005-10-17 21:04:36.000000000 +0200 +++ file-4.16-quique/src/softmagic.c 2005-11-14 10:57:14.000000000 +0100 @@ -109,6 +109,7 @@ int32_t oldoff = 0; int returnval = 0; /* if a match is found it is set to 1*/ int firstline = 1; /* a flag to print X\n X\n- X */ + int root_pos = ms->o.ptr - ms->o.buf; if (check_mem(ms, cont_level) == -1) return -1; @@ -118,7 +119,7 @@ int flush = !mget(ms, &p, s, &magic[magindex], nbytes, cont_level); if (flush) { - if (magic[magindex].reln == '!') 
flush = 0; + if (magic[magindex].negate) flush = 0; } else { switch (mcheck(ms, &p, &magic[magindex])) { case -1: @@ -135,6 +136,7 @@ * main entry didn't match, * flush its continuations */ + flush_main: while (magindex < nmagic - 1 && magic[magindex + 1].cont_level != 0) magindex++; @@ -179,13 +181,22 @@ flush = !mget(ms, &p, s, &magic[magindex], nbytes, cont_level); - if (flush && magic[magindex].reln != '!') + + if (flush && !magic[magindex].negate) goto done; switch (flush ? 1 : mcheck(ms, &p, &magic[magindex])) { case -1: return -1; case 0: + if (magic[magindex].fail) { + cont_level = 0; + ms->o.ptr = ms->o.buf + root_pos; + ms->o.len = ms->o.size - root_pos; + if (ms->o.len > 0) + ms->o.ptr[0] = '\0'; + goto flush_main; + } break; default: /* @@ -282,7 +293,7 @@ case FILE_PSTRING: case FILE_BESTRING16: case FILE_LESTRING16: - if (m->reln == '=' || m->reln == '!') { + if (m->reln == '=') { if (file_printf(ms, m->desc, m->value.s) == -1) return -1; t = m->offset + m->vallen; @@ -1162,11 +1173,6 @@ uint32_t v; int matched; - if ( (m->value.s[0] == 'x') && (m->value.s[1] == '\0') ) { - return 1; - } - - switch (m->type) { case FILE_BYTE: v = p->b; @@ -1212,26 +1218,31 @@ break; } else { /* combine the others */ while (--len >= 0) { - if ((m->mask & STRING_IGNORE_LOWERCASE) && - islower(*a)) { - if ((v = tolower(*b++) - *a++) != '\0') - break; - } else if ((m->mask & STRING_COMPACT_BLANK) && - isspace(*a)) { + if (islower(*a)) { + if (m->mask & STRING_IGNORE_LOWERCASE) { + if ((v = tolower(*b++) - *a++) != '\0') + break; + } + goto default_compare; + } else if (isspace(*a)) { a++; - if (isspace(*b++)) { + if (m->mask & STRING_COMPACT_BLANK) { + unsigned char *a_save = a, *b_save = b; + while (isspace(*a)) + a++; + while (isspace(*b)) + b++; + if (b - b_save < a - a_save) { + v = 1; + break; + } + } else if (m->mask & STRING_COMPACT_OPTIONAL_BLANK) { while (isspace(*b)) b++; - } else { - v = 1; - break; } - } else if (isspace(*a) && - (m->mask &
STRING_COMPACT_OPTIONAL_BLANK)) { - a++; - while (isspace(*b)) - b++; + goto default_compare; } else { + default_compare: if ((v = *b++ - *a++) != '\0') break; } @@ -1246,7 +1257,7 @@ char errmsg[512]; if (p->search.buf == NULL) - return 0; + return m->negate; rc = regcomp(&rx, m->value.s, REG_EXTENDED|REG_NOSUB|REG_NEWLINE| @@ -1262,7 +1273,7 @@ regfree(&rx); free(p->search.buf); p->search.buf = NULL; - return !rc; + return m->negate?!!rc:!rc; } } case FILE_SEARCH: @@ -1279,7 +1290,7 @@ l = 0; v = 0; if (b == NULL) - return 0; + return m->negate; len = slen; while (++range <= m->mask) { while (len-- > 0 && (v = *b++ - *a++) == 0) @@ -1313,13 +1324,6 @@ matched = 1; break; - case '!': - matched = v != l; - if ((ms->flags & MAGIC_DEBUG) != 0) - (void) fprintf(stderr, "%u != %u = %d\n", - v, l, matched); - break; - case '=': matched = v == l; if ((ms->flags & MAGIC_DEBUG) != 0) Regards, -Enrique From rvokal at redhat.com Tue Nov 29 16:18:39 2005 From: rvokal at redhat.com (Radek Vokál) Date: Tue Nov 29 16:19:15 2005 Subject: Another Cracklib collision .. Message-ID: <1133273919.729.11.camel@localhost.localdomain> See bug report https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=174137 -- Radek Vokál -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20051129/b84d1477/attachment.bin From rvokal at redhat.com Tue Nov 29 16:37:51 2005 From: rvokal at redhat.com (Radek Vokál) Date: Tue Nov 29 16:38:16 2005 Subject: File should not use isprint ... Message-ID: <1133275071.729.15.camel@localhost.localdomain> Skipped content of type multipart/mixed-------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20051129/74e932e8/attachment.bin From christos at zoulas.com Tue Nov 29 17:05:09 2005 From: christos at zoulas.com (Christos Zoulas) Date: Tue Nov 29 17:05:40 2005 Subject: File should not use isprint ... In-Reply-To: <1133275071.729.15.camel@localhost.localdomain> from Radek Vokál (Nov 29, 3:37pm) Message-ID: <20051129150509.702DC564FA@rebar.astron.com> On Nov 29, 3:37pm, rvokal@redhat.com (Radek Vokál) wrote: -- Subject: File should not use isprint ... | Hi, | | Using the provided testcase: | | $ ./file-cannot-handle-utf-1.sh | expected result: | symbolic link to '/tmp/=C2=A5=C2=B5=C3=85=D0=85=D2=B5=D3=95=E4=B8=A5=E5=BC=A5=E6=BC=A5' | | $ file /tmp/sln.test | /tmp/sln.test: symbolic link to | `/tmp/\302\245\302\265\303\205\320\205\322\265\323\225\344\270\245\345\274\245\346\274\245' | | $ file -r /tmp/sln.test | /tmp/sln.test: symbolic link to `/tmp/=C2=A5=C2=B5=C3=85=D0=85=D2=B5=D3=95=E4=B8=A5=E5=BC=A5=E6=BC=A5' | | file confuses what it considers to be "non-printable" characters in file_getbuffer() in | src/funcs.c. | | file shouldn't use "isprint()" to check if a character is printable. Well, this solves the problem with UTF, but what about if the file had \n embedded in it, or other terminal escape sequences? Also what if the string did not come from a symlink, but from a %s magic? Is it really UTF then? christos From rvokal at redhat.com Tue Nov 29 17:29:45 2005 From: rvokal at redhat.com (Radek Vokál) Date: Tue Nov 29 17:30:15 2005 Subject: File should not use isprint ...
In-Reply-To: <20051129150509.702DC564FA@rebar.astron.com> References: <20051129150509.702DC564FA@rebar.astron.com> Message-ID: <1133278185.729.19.camel@localhost.localdomain> On Tue, 2005-11-29 at 10:05 -0500, Christos Zoulas wrote: > On Nov 29, 3:37pm, rvokal@redhat.com (Radek Vokál) wrote: > -- Subject: File should not use isprint ... > > | Hi, > | > | Using the provided testcase: > | > | $ ./file-cannot-handle-utf-1.sh > | expected result: > | symbolic link to '/tmp/=C2=A5=C2=B5=C3=85=D0=85=D2=B5=D3=95=E4=B8=A5=E5=BC=A5=E6=BC=A5' > | > | $ file /tmp/sln.test > | /tmp/sln.test: symbolic link to > | `/tmp/\302\245\302\265\303\205\320\205\322\265\323\225\344\270\245\345\274\245\346\274\245' > | > | $ file -r /tmp/sln.test > | /tmp/sln.test: symbolic link to `/tmp/=C2=A5=C2=B5=C3=85=D0=85=D2=B5=D3=95=E4=B8=A5=E5=BC=A5=E6=BC=A5' > | > | file confuses what it considers to be "non-printable" characters in file_getbuffer() in > | src/funcs.c. > | > | file shouldn't use "isprint()" to check if a character is printable. > > Well, this solves the problem with UTF, but what about if the file had \n > embedded in it, or other terminal escape sequences? Also what if the string > did not come from a symlink, but from a %s magic? Is it really UTF then? > > christos > True, so what about using iswctype(), after converting each mb sequence to a wchar_t, instead of using isprint()? Radek -- Radek Vokál -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20051129/f8ff6d5a/attachment.bin From vapier at gentoo.org Tue Nov 29 17:36:26 2005 From: vapier at gentoo.org (Mike Frysinger) Date: Tue Nov 29 17:36:39 2005 Subject: File should not use isprint ...
In-Reply-To: <1133278185.729.19.camel@localhost.localdomain> References: <20051129150509.702DC564FA@rebar.astron.com> <1133278185.729.19.camel@localhost.localdomain> Message-ID: <20051129153626.GF20234@toucan.gentoo.org> On Tue, Nov 29, 2005 at 04:29:45PM +0100, Radek Vokál wrote: > On Tue, 2005-11-29 at 10:05 -0500, Christos Zoulas wrote: > > On Nov 29, 3:37pm, rvokal@redhat.com (Radek Vokál) wrote: > > | file shouldn't use "isprint()" to check if a character is printable. > > > > Well, this solves the problem with UTF, but what about if the file had \n > > embedded in it, or other terminal escape sequences? Also what if the string > > did not come from a symlink, but from a %s magic? Is it really UTF then? > > True, so what about using iswctype(), after converting each mb sequence > to a wchar_t, instead of using isprint()? or define a file_isprint() function that handles wchar details much like file_mbswidth() does now ... -mike From rvokal at redhat.com Tue Nov 29 18:14:18 2005 From: rvokal at redhat.com (Radek Vokál) Date: Tue Nov 29 18:14:49 2005 Subject: File should not use isprint ... In-Reply-To: <20051129153626.GF20234@toucan.gentoo.org> References: <20051129150509.702DC564FA@rebar.astron.com> <1133278185.729.19.camel@localhost.localdomain> <20051129153626.GF20234@toucan.gentoo.org> Message-ID: <1133280858.729.24.camel@localhost.localdomain> Skipped content of type multipart/mixed-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mx.gw.com/pipermail/file/attachments/20051129/6a588ac1/attachment.bin From vapier at gentoo.org Tue Nov 29 18:55:46 2005 From: vapier at gentoo.org (Mike Frysinger) Date: Tue Nov 29 18:55:58 2005 Subject: File should not use isprint ...
In-Reply-To: <1133280858.729.24.camel@localhost.localdomain> References: <20051129150509.702DC564FA@rebar.astron.com> <1133278185.729.19.camel@localhost.localdomain> <20051129153626.GF20234@toucan.gentoo.org> <1133280858.729.24.camel@localhost.localdomain> Message-ID: <20051129165546.GJ20234@toucan.gentoo.org> On Tue, Nov 29, 2005 at 05:14:18PM +0100, Radek Vokál wrote: > --- file-4.10.bak/src/funcs.c 2004-06-04 15:40:20.000000000 +0100 > +++ file-4.10/src/funcs.c 2005-11-29 16:06:06.855668919 +0000 > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > > #ifndef lint > FILE_RCSID("@(#)$Id: funcs.c,v 1.12 2004/06/04 14:40:20 christos Exp $") you can't assume this header always exists; it needs to be wrapped by HAVE_WCHAR_H -mike From christos at zoulas.com Tue Nov 29 19:49:40 2005 From: christos at zoulas.com (Christos Zoulas) Date: Tue Nov 29 19:50:14 2005 Subject: File should not use isprint ... In-Reply-To: <20051129153626.GF20234@toucan.gentoo.org> from Mike Frysinger (Nov 29, 3:36pm) Message-ID: <20051129174940.6A5C2564FA@rebar.astron.com> On Nov 29, 3:36pm, vapier@gentoo.org (Mike Frysinger) wrote: -- Subject: Re: File should not use isprint ... | On Tue, Nov 29, 2005 at 04:29:45PM +0100, Radek Vokál wrote: | > On Tue, 2005-11-29 at 10:05 -0500, Christos Zoulas wrote: | > > On Nov 29, 3:37pm, rvokal@redhat.com (Radek Vokál) wrote: | > > | file shouldn't use "isprint()" to check if a character is printable. | > > | > > Well, this solves the problem with UTF, but what about if the file had \n | > > embedded in it, or other terminal escape sequences? Also what if the string | > > did not come from a symlink, but from a %s magic? Is it really UTF then? | > | > True, so what about using iswctype(), after converting each mb sequence | > to a wchar_t, instead of using isprint()? | | or define a file_isprint() function that handles wchar details much | like file_mbswidth() does now ... That is a great idea :-) Can either of you implement it?
My i18n foo is limited. christos From christos at zoulas.com Tue Nov 29 19:52:31 2005 From: christos at zoulas.com (Christos Zoulas) Date: Tue Nov 29 19:52:57 2005 Subject: File should not use isprint ... In-Reply-To: <1133280858.729.24.camel@localhost.localdomain> from Radek Vokál (Nov 29, 5:14pm) Message-ID: <20051129175231.574D8564FA@rebar.astron.com> On Nov 29, 5:14pm, rvokal@redhat.com (Radek Vokál) wrote: -- Subject: Re: File should not use isprint ... | ok, here's another patch which copies a part from file_mbswidth .. does | it look better to you? Looks great to me! Thanks, christos From christos at zoulas.com Tue Nov 29 20:22:35 2005 From: christos at zoulas.com (Christos Zoulas) Date: Tue Nov 29 20:23:01 2005 Subject: File should not use isprint ... In-Reply-To: <1133280858.729.24.camel@localhost.localdomain> from Radek Vokál (Nov 29, 5:14pm) Message-ID: <20051129182235.7C557564FA@rebar.astron.com> On Nov 29, 5:14pm, rvokal@redhat.com (Radek Vokál) wrote: -- Subject: Re: File should not use isprint ...
I think that the non-printable case was slightly broken; here is what I am going to use: Index: funcs.c =================================================================== RCS file: /src/pub/file/src/funcs.c,v retrieving revision 1.17 diff -u -u -r1.17 funcs.c --- funcs.c 17 Oct 2005 19:03:34 -0000 1.17 +++ funcs.c 29 Nov 2005 18:21:33 -0000 @@ -30,6 +30,7 @@ #include #include #include +#include #ifndef lint FILE_RCSID("@(#)$Id: funcs.c,v 1.17 2005/10/17 19:03:34 christos Exp $") @@ -152,6 +153,13 @@ return 0; } +#define OCTALIFY(n, o) \ + *(n)++ = '\\', \ + *(n)++ = (((uint32_t)*(o) >> 6) & 3) + '0', \ + *(n)++ = (((uint32_t)*(o) >> 3) & 7) + '0', \ + *(n)++ = (((uint32_t)*(o) >> 0) & 7) + '0', \ + (o)++ + protected const char * file_getbuffer(struct magic_set *ms) { @@ -174,14 +182,50 @@ ms->o.pbuf = nbuf; } +#if defined(HAVE_WCHAR_H) && defined(HAVE_MBRTOWC) && defined(HAVE_WCWIDTH) + { + mbstate_t state; + wchar_t nextchar; + int mb_conv = 1; + size_t bytesconsumed; + char *eop; + (void)memset(&state, 0, sizeof(mbstate_t)); + + np = ms->o.pbuf; + op = ms->o.buf; + eop = op + strlen(ms->o.buf); + + while (op < eop) { + bytesconsumed = mbrtowc(&nextchar, op, eop - op, + &state); + if (bytesconsumed == (size_t)(-1) || + bytesconsumed == (size_t)(-2)) { + mb_conv = 0; + break; + } + + if (iswprint(nextchar) ) { + (void)memcpy(np, op, bytesconsumed); + op += bytesconsumed; + np += bytesconsumed; + } else { + while (bytesconsumed-- > 0) + OCTALIFY(np, op); + } + } + *np = '\0'; + + /* Parsing succeeded as a multi-byte sequence */ + if (mb_conv != 0) + return ms->o.pbuf; + } +#endif + for (np = ms->o.pbuf, op = ms->o.buf; *op; op++) { if (isprint((unsigned char)*op)) { *np++ = *op; } else { - *np++ = '\\'; - *np++ = (((uint32_t)*op >> 6) & 3) + '0'; - *np++ = (((uint32_t)*op >> 3) & 7) + '0'; - *np++ = (((uint32_t)*op >> 0) & 7) + '0'; + OCTALIFY(np, op); } } *np = '\0'; From vapier at gentoo.org Tue Nov 29 20:28:16 2005 From: vapier at gentoo.org (Mike Frysinger) 
Date: Tue Nov 29 20:28:31 2005 Subject: File should not use isprint ... In-Reply-To: <20051129182235.7C557564FA@rebar.astron.com> References: <1133280858.729.24.camel@localhost.localdomain> <20051129182235.7C557564FA@rebar.astron.com> Message-ID: <20051129182816.GL20234@toucan.gentoo.org> On Tue, Nov 29, 2005 at 01:22:35PM -0500, Christos Zoulas wrote: > --- funcs.c 17 Oct 2005 19:03:34 -0000 1.17 > +++ funcs.c 29 Nov 2005 18:21:33 -0000 > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > > #ifndef lint > FILE_RCSID("@(#)$Id: funcs.c,v 1.17 2005/10/17 19:03:34 christos Exp $") still need to wrap the include in HAVE_WCHAR_H since not everyone has the wchar.h header file -mike From christos at zoulas.com Tue Nov 29 20:44:24 2005 From: christos at zoulas.com (Christos Zoulas) Date: Tue Nov 29 20:44:55 2005 Subject: File should not use isprint ... In-Reply-To: <20051129182816.GL20234@toucan.gentoo.org> from Mike Frysinger (Nov 29, 6:28pm) Message-ID: <20051129184424.E627456527@rebar.astron.com> On Nov 29, 6:28pm, vapier@gentoo.org (Mike Frysinger) wrote: -- Subject: Re: File should not use isprint ... | On Tue, Nov 29, 2005 at 01:22:35PM -0500, Christos Zoulas wrote: | > --- funcs.c 17 Oct 2005 19:03:34 -0000 1.17 | > +++ funcs.c 29 Nov 2005 18:21:33 -0000 | > @@ -30,6 +30,7 @@ | > #include | > #include | > #include | > +#include | > | > #ifndef lint | > FILE_RCSID("@(#)$Id: funcs.c,v 1.17 2005/10/17 19:03:34 christos Exp $") | | still need to wrap the include in HAVE_WCHAR_H since not everyone has | the wchar.h header file | -mike Thanks, I did it after I got your mail! christos From rvokal at redhat.com Wed Nov 30 09:34:51 2005 From: rvokal at redhat.com (=?ISO-8859-1?Q?Radek_Vok=E1l?=) Date: Wed Nov 30 09:35:10 2005 Subject: File should not use isprint ... 
In-Reply-To: <20051129182235.7C557564FA@rebar.astron.com> References: <20051129182235.7C557564FA@rebar.astron.com> Message-ID: <438D561B.4010706@redhat.com> Christos Zoulas wrote: > On Nov 29, 5:14pm, rvokal@redhat.com (Radek Vokál) wrote: > -- Subject: Re: File should not use isprint ... > > I think that the non-printable case was slightly broken; here is what > I am going to use: > Tested, works great. The author of the patch is not me this time but Bastien Nocera. Thanks Radek From christos at zoulas.com Wed Nov 30 15:51:02 2005 From: christos at zoulas.com (Christos Zoulas) Date: Wed Nov 30 15:51:35 2005 Subject: File should not use isprint ... In-Reply-To: <438D561B.4010706@redhat.com> from Radek Vokál (Nov 30, 8:34am) Message-ID: <20051130135102.D3F6B564FA@rebar.astron.com> On Nov 30, 8:34am, rvokal@redhat.com (Radek Vokál) wrote: -- Subject: Re: File should not use isprint ... | Christos Zoulas wrote: | > On Nov 29, 5:14pm, rvokal@redhat.com (Radek Vokál) wrote: | > -- Subject: Re: File should not use isprint ... | > | > I think that the non-printable case was slightly broken; here is what | > I am going to use: | > | | Tested, works great. The author of the patch is not me this time but | Bastien Nocera. Thanks! I will change the attribution. christos
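
For anyone following the thread, the escaping approach discussed above (try the buffer as a multi-byte string with mbrtowc() and iswprint(), and fall back to byte-wise isprint() with octal escapes if that fails) can be sketched as a small standalone function. This is only a sketch under stated assumptions, not the code that went into file: escape_nonprintable() and octalify() are hypothetical names, the caller is assumed to supply an output buffer of at least 4 * strlen(in) + 1 bytes, and the HAVE_WCHAR_H style configure guards requested earlier in the thread are left out for brevity.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Append one byte as a three-digit octal escape, e.g. '\n' becomes \012. */
static char *
octalify(char *np, unsigned char c)
{
	*np++ = '\\';
	*np++ = (char)(((c >> 6) & 3) + '0');
	*np++ = (char)(((c >> 3) & 7) + '0');
	*np++ = (char)((c & 7) + '0');
	return np;
}

/*
 * Hypothetical helper, not part of file(1): copy "in" to "out",
 * replacing non-printable characters with octal escapes.
 * "out" must have room for 4 * strlen(in) + 1 bytes (assumption).
 */
static void
escape_nonprintable(const char *in, char *out)
{
	const char *op = in, *eop = in + strlen(in);
	char *np = out;
	mbstate_t state;
	wchar_t wc;
	size_t n;
	int mb_ok = 1;

	memset(&state, 0, sizeof(state));

	/* First pass: treat the input as a multi-byte string. */
	while (op < eop) {
		n = mbrtowc(&wc, op, (size_t)(eop - op), &state);
		if (n == (size_t)-1 || n == (size_t)-2) {
			mb_ok = 0;	/* invalid or truncated sequence */
			break;
		}
		if (iswprint((wint_t)wc)) {
			/* Printable: copy the whole sequence unchanged. */
			memcpy(np, op, n);
			np += n;
			op += n;
		} else {
			/* Not printable: escape every byte of the sequence. */
			while (n-- > 0)
				np = octalify(np, (unsigned char)*op++);
		}
	}

	if (!mb_ok) {
		/* Fallback: start over and classify byte by byte. */
		np = out;
		for (op = in; *op != '\0'; op++) {
			if (isprint((unsigned char)*op))
				*np++ = *op;
			else
				np = octalify(np, (unsigned char)*op);
		}
	}
	*np = '\0';
}
```

Which non-ASCII characters count as printable depends on the locale the caller selected with setlocale(); in the default "C" locale, plain ASCII text passes through unchanged while newlines and terminal escape bytes are octal-escaped, which addresses the concern raised about embedded \n and escape sequences.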