From oscaruser at programmer.net Wed Jan 20 02:30:30 2010
From: oscaruser at programmer.net (oscaruser at programmer.net)
Date: Tue, 19 Jan 2010 19:30:30 -0500
Subject: Crystal Reports Mime Type Designation
Message-ID: <8CC677EE78F3FA6-E00-38DA@web-mmc-d13.sysops.aol.com>

Greetings Christos,

I've been using the MIME type designation 'application/x-rpt;
charset=binary' for Crystal Reports, based on FileExt.com's registry.
http://filext.com/file-extension/RPC. Is there an official answer? Is
this type handled in the code at all? I would like to leave this file
identification up to the source maintainer rather than keep a branch
updated. I have recently inquired with SAP.com (10 minutes ago) to see
if they can (or will) answer this question in some kind of official
manner, but I suspect my request is /dev/null'd.

Thank you,
-OSC :D

From christos at zoulas.com Wed Jan 20 03:13:51 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 19 Jan 2010 20:13:51 -0500
Subject: Crystal Reports Mime Type Designation
In-Reply-To: <8CC677EE78F3FA6-E00-38DA@web-mmc-d13.sysops.aol.com> from oscaruser@programmer.net (Jan 19, 7:30pm)
Message-ID: <20100120011351.8D9E95654E@rebar.astron.com>

On Jan 19, 7:30pm, oscaruser at programmer.net (oscaruser at programmer.net) wrote:
-- Subject: Crystal Reports Mime Type Designation

| Greetings Christos,
|
| I've been using the MIME type designation 'application/x-rpt;
| charset=binary' for Crystal Reports, based on FileExt.com's registry.
| http://filext.com/file-extension/RPC. Is there an official answer? Is
| this type handled in the code at all? I would like to leave this file
| identification up to the source maintainer rather than keep a branch
| updated. I have recently inquired with SAP.com (10 minutes ago) to see
| if they can (or will) answer this question in some kind of official
| manner, but I suspect my request is /dev/null'd.
|

I guess that's ok, and we can make file output that.

christos

From christos at zoulas.com Fri Jan 22 23:51:13 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Fri, 22 Jan 2010 16:51:13 -0500
Subject: file-5.04 is now available
Message-ID: <20100122215113.AF1EE5654E@rebar.astron.com>

Hello,

This is a bug fix and security release. There are no new features.

ftp://ftp.astron.com/pub/file/file-5.04.tar.gz

Enjoy,

christos

2010-01-22 15:45 Christos Zoulas

    * print proper mime for crystal reports file

    * print the last summary information of a cdf document, not the first
      so that nested documents print the right info

2010-01-16 18:42 Charles Longeau

    * bring back some fixes from OpenBSD:
        - make gcc2 builds file
        - fix typos in a magic file comment

2009-11-17 18:35 Christos Zoulas

    * ctime/asctime can return NULL on some OS's although
      they should not (Toshit Antani)

2009-09-14 13:49 Christos Zoulas

    * Centralize magic path handling routines and remove the
      special-casing from file.c so that the python module
      for example comes up with the same magic path
      (Fixes ~/.magic handling) (from Gab)

2009-09-11 23:38 Reuben Thomas

    * When magic argument is a directory, read the files in
      strcmp-sorted order (fixes Debian bug #488562 and our own
      FIXME).

2009-09-11 13:11 Reuben Thomas

    * Combine overlapping epoc and psion magic files into one
      (epoc).

    * Add some more EPOC MIME types.

2009-08-19 15:55 Christos Zoulas

    * Fix 3 bugs (From Ian Darwin):
        - file_showstr could move one past the end of the array
        - parse_apple did not nul terminate the string in the
          overflow case
        - parse_mime truncated the wrong string in the overflow
          case

2009-08-12 12:28 Robert Byrnes

    * Include Localstuff when compiling magic.

2009-07-15 10:05 Christos Zoulas

    * Fix logic for including mygetopts.h

    * Make cdf.c compile again with debugging

    * Add the necessary field handling for crystal reports files
      to work

2009-06-23 01:34 Reuben Thomas

    * Stop "(if" identifying Lisp files, that's plain dumb!

2009-06-09 22:13 Reuben Thomas

    * Add a couple of missing MP3 MIME types.

2009-05-27 23:00 Reuben Thomas

    * Add full range of hash-bang tests for Python and Ruby.

    * Add MIME types for Python and Ruby scripts.

2009-05-13 10:44 Christos Zoulas

    * off by one in parsing hw capabilities in elf
      (Cheng Renquan)

2009-05-08 13:40 Christos Zoulas

    * lint fixes and more from NetBSD

2009-05-06 10:25 Christos Zoulas

    * Avoid null dereference in cdf code (Drew Yao)

    * More cdf bounds checks and overflow checks

From gillen.daniel at gmail.com Thu Jan 28 10:22:06 2010
From: gillen.daniel at gmail.com (Gillen Daniel)
Date: Thu, 28 Jan 2010 09:22:06 +0100
Subject: Question regarding MAGIC_NO_CHECK_* flags
Message-ID: <4B61492E.6020907@gmail.com>

Hi @all

I'm currently developing an application for browsing file systems and
want to implement libmagic in order to get the type of displayed files.
As I don't need a human readable output, but one that I can use
internally, I created my own magic file. During tests, I saw that
libmagic is doing some internal stuff to recognize some file types
(ASCII files for ex.).

I found some MAGIC_NO_CHECK_* defines in the header file which I can use
to disable this behavior. But porting my app from an older library
version to 5.03 (Ubuntu 9.04 -> Ubuntu 9.10), I have to use more flags
to disable all the internal checks.

My question now is if there is some sort of special flag that disables
all internal checks whatever version of libmagic I'm using just leaving
the file recognition to the magic file.

Thx in advance

Dan

From christos at zoulas.com Thu Jan 28 17:06:59 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 28 Jan 2010 10:06:59 -0500
Subject: Question regarding MAGIC_NO_CHECK_* flags
In-Reply-To: <4B61492E.6020907@gmail.com> from Gillen Daniel (Jan 28, 9:22am)
Message-ID: <20100128150659.C69C95654E@rebar.astron.com>

On Jan 28, 9:22am, gillen.daniel at gmail.com (Gillen Daniel) wrote:
-- Subject: Question regarding MAGIC_NO_CHECK_* flags

| Hi @all
|
| I'm currently developing an application for browsing file systems and
| want to implement libmagic in order to get the type of displayed files.
| As I don't need a human readable output, but one that I can use
| internally, I created my own magic file. During tests, I saw that
| libmagic is doing some internal stuff to recognize some file types
| (ASCII files for ex.).
|
| I found some MAGIC_NO_CHECK_* defines in the header file which I can use
| to disable this behavior. But porting my app from an older library
| version to 5.03 (Ubuntu 9.04 -> Ubuntu 9.10), I have to use more flags
| to disable all the internal checks.
|
| My question now is if there is some sort of special flag that disables
| all internal checks whatever version of libmagic I'm using just leaving
| the file recognition to the magic file.

Well,

1. These flags are going to change from version to version because of
   the addition/removal of features.
2. One of the flags is to disable the magic file itself, which is just
   an additional check, so what you really want is NO_CHECK_ALL, but
   check SOFTMAGIC. I don't think it is unreasonable though to add
   NO_CHECK_BUILTIN to do what you want. It will not help you with older
   versions of the library.
3. Some formats cannot be detected with magic files (microsoft documents,
   tar files, etc.) so disabling those checks cripples functionality.
4. If you want standardized output why don't you use mime?
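
A rough sketch of what point 4 offers a caller, assuming the application already holds a MIME string (for example from `file -i`, or from libmagic opened with the MAGIC_MIME flag): dispatching on the top-level type needs nothing more than a string comparison. The helper name `mime_in_group` is hypothetical and is not part of libmagic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a libmagic function.
 * Returns 1 if the top-level type of `mime` (the part before '/')
 * is exactly `group`, and 0 otherwise.
 */
int mime_in_group(const char *mime, const char *group)
{
    const char *slash;
    size_t n;

    assert(mime != NULL && group != NULL);

    slash = strchr(mime, '/');
    if (slash == NULL)
        return 0;               /* not of the form type/subtype */

    n = (size_t)(slash - mime); /* length of the top-level type */
    return strlen(group) == n && strncmp(mime, group, n) == 0;
}
```

With this, `mime_in_group(result, "image")` would select every image type at once, while a full comparison against a specific subtype still works for formats that need their own handler.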

christos

From dnovotny at redhat.com Thu Jan 28 16:52:47 2010
From: dnovotny at redhat.com (Daniel Novotny)
Date: Thu, 28 Jan 2010 09:52:47 -0500 (EST)
Subject: segfault in star.ulaw
In-Reply-To: <1132987823.305851264690204138.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <2081858899.306171264690367028.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

hello,

our user has filed a Fedora bug about a file segfault:

$ file star.ulaw
Segmentation fault (core dumped)

now I was able to analyze the error and propose a fix:
the file is a raw sound sample, I have uploaded it to
http://people.fedoraproject.org/~dnovotny/f/star.ulaw

a gdb analysis of the crash shows line 950 in softmagic.c

    for (lines = linecnt, b = buf;
         lines &&
         ((b = CAST(const char *, memchr(c = b, '\n', CAST(size_t, (end - b))))) ||
          (b = CAST(const char *, memchr(c, '\r', CAST(size_t, (end - c))))));
         lines--, b++) {
            last = b;
            if (b[0] == '\r' && b[1] == '\n')
                    b++;
    }

for some reason the pointer "b" has run outside of the array and
b > end, so the subtraction yields a negative number, which is then
cast to unsigned

my patch adds "b < end && c < end" to the "for" condition, so the loop
will not run away from "our" portion of memory

this seems to fix the bug (the file is then classified as "data"),
I tried a few sanity tests with the patched program and everything
seems to work normally

I attach the backtrace and the patch, I can provide a core file, too,
but the bug is easily reproducible with the test file

downstream bug report link for reference:
https://bugzilla.redhat.com/show_bug.cgi?id=533245

regards,

Daniel Novotny, Red Hat inc.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.04-ulaw-segfault.patch
Type: text/x-patch
Size: 553 bytes
Desc: not available
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: file-ulaw-segfault-backtrace.txt
URL:

From gillen.daniel at gmail.com Thu Jan 28 20:41:41 2010
From: gillen.daniel at gmail.com (Gillen Daniel)
Date: Thu, 28 Jan 2010 19:41:41 +0100
Subject: Question regarding MAGIC_NO_CHECK_* flags
In-Reply-To: <4B61D5C9.1010405@gmail.com>
References: <20100128150659.C69C95654E@rebar.astron.com> <4B61D5C9.1010405@gmail.com>
Message-ID: <4B61DA65.1000602@gmail.com>

Hi again

I'm really sorry but I just realized that I used my stripped magic file
to do the tests :(

Using the shipped magic file, I can differentiate between .tar.gz and
.png files using mime output and I am able to group all images with
"image/*" for example. This should do the trick for me.

Thx anyway for your help.

Dan

From gillen.daniel at gmail.com Thu Jan 28 20:22:01 2010
From: gillen.daniel at gmail.com (Gillen Daniel)
Date: Thu, 28 Jan 2010 19:22:01 +0100
Subject: Question regarding MAGIC_NO_CHECK_* flags
In-Reply-To: <20100128150659.C69C95654E@rebar.astron.com>
References: <20100128150659.C69C95654E@rebar.astron.com>
Message-ID: <4B61D5C9.1010405@gmail.com>

Hi

Thx for your answer.

A flag like NO_CHECK_BUILTIN would be great. Although this won't fix
my issue with older libraries, it would at least fix it in the future :)

Everything that isn't supported by the magic file could be done manually
afterwards by my app. This wouldn't be a problem for me as I only need a
small number of files to be identified. The others could be classified
more generally (binary, text etc...).

I did look at the mime output, but this is already too specific / not
specific enough for me. My app is handing files over to analyzer plugins
depending on the file type. And there I don't need to know whether a
file is for example a c++ text file or a plain ASCII file. I just need
to know if it is plain ASCII or not. While this would be possible with
mime types by looking at the first part before the "/", it doesn't work
for example to differentiate between a .tar.gz and a .png file, as both
produce application/octet-stream and I have to call a different plugin
for .tar.gz than for .png files.

Dan

From daniel.leidert.spam at gmx.net Fri Jan 29 15:42:38 2010
From: daniel.leidert.spam at gmx.net (Daniel Leidert)
Date: Fri, 29 Jan 2010 14:42:38 +0100
Subject: Detection of application/xslt+xml (XSLT)
Message-ID: <1264772558.6243.50.camel@haktar.wgdd.de>

Hi,

When I try to get the MIME type of an XSLT file I receive
application/xml. This is probably due to the magic file. I would suggest
returning application/xslt+xml if xsl:stylesheet or xsl:transform is
found. What do you think?

Sample file attached.

Regards, Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.xsl
Type: application/xslt+xml
Size: 193 bytes
Desc: not available
URL:

From schwehr at ccom.unh.edu Mon Feb 1 20:45:29 2010
From: schwehr at ccom.unh.edu (Kurt Schwehr)
Date: Mon, 01 Feb 2010 13:45:29 -0500
Subject: Marine sciences surveying data formats
Message-ID: <4B672149.8030109@ccom.unh.edu>

Hi Christos and company,

I've been working up file definitions for common data types in marine
geology/geophysics and oceanography. Many of the formats that I've been
working with do not have very good identifiable characteristics, but I
figured that after a year, it is time to submit at least a beginning
batch of files. The complete file that I have been working on is here:

http://vislab-ccom.unh.edu/~schwehr/software/simplesegy/magic

Attached are the definitions that are worthy of consideration for
inclusion in file. Please let me know what you think.

Thanks!
-kurt
Res. Assist. Professor, Ocean Engineering, UNH CCOM/JHC
http://schwehr.org/blog
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: magic-submit
URL:

From christos at zoulas.com Mon Feb 1 23:39:19 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 1 Feb 2010 16:39:19 -0500
Subject: Marine sciences surveying data formats
In-Reply-To: <4B672149.8030109@ccom.unh.edu> from Kurt Schwehr (Feb 1, 1:45pm)
Message-ID: <20100201213919.446AD5654F@rebar.astron.com>

On Feb 1, 1:45pm, schwehr at ccom.unh.edu (Kurt Schwehr) wrote:
-- Subject: Marine sciences surveying data formats

| Hi Christos and company,
|
| I've been working up file definitions for common data types in marine
| geology/geophysics and oceanography. Many of the formats that I've been
| working with do not have very good identifiable characteristics, but I
| figured that after a year, it is time to submit at least a beginning
| batch of files. The complete file that I have been working on is here:
|
| http://vislab-ccom.unh.edu/~schwehr/software/simplesegy/magic
|
| Attached are the definitions that are worthy of consideration for
| inclusion in file. Please let me know what you think.
|
| Thanks!
| -kurt
| Res. Assist. Professor, Ocean Engineering, UNH CCOM/JHC
| http://schwehr.org/blog

Thanks Kurt,

For the most part these look ok. I am a bit worried about the <= 4 bytes
of magic ones, but we'll try them and see how they work out.

christos

From ghelmer at palisadesys.com Sat Feb 6 00:48:53 2010
From: ghelmer at palisadesys.com (Guy Helmer)
Date: Fri, 5 Feb 2010 16:48:53 -0600
Subject: file-5.04 and Excel 2007 files
Message-ID:

I'm trying to identify Excel 2007 files; I thought file 5.04 would be
able to, but I am still getting the result:

Book1.xlsx: Zip archive data, at least v2.0 to extract

Any hints?

Thanks,
Guy Helmer

From christos at zoulas.com Sat Feb 6 01:32:17 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Fri, 5 Feb 2010 18:32:17 -0500
Subject: file-5.04 and Excel 2007 files
In-Reply-To: from Guy Helmer (Feb 5, 4:48pm)
Message-ID: <20100205233217.6E5A45654E@rebar.astron.com>

On Feb 5, 4:48pm, ghelmer at palisadesys.com (Guy Helmer) wrote:
-- Subject: file-5.04 and Excel 2007 files

| I'm trying to identify Excel 2007 files; I thought file 5.04 would be
| able to, but I am still getting the result:
| Book1.xlsx: Zip archive data, at least v2.0 to extract

not yet. I need to link in a libzip so we can look inside.

christos

From schwehr at ccom.unh.edu Sun Feb 7 14:39:02 2010
From: schwehr at ccom.unh.edu (Kurt Schwehr)
Date: Sun, 07 Feb 2010 07:39:02 -0500
Subject: Marine sciences surveying data formats
In-Reply-To: <20100201213919.446AD5654F@rebar.astron.com>
References: <20100201213919.446AD5654F@rebar.astron.com>
Message-ID: <4B6EB466.8070801@ccom.unh.edu>

Sounds good. If some of these don't work, I understand. I threw out a
whole bunch before submitting them already.
Just let me know which ones cause trouble or work and I will flag them
in my development file.

Thanks!
-kurt

From christos at zoulas.com Sun Feb 7 18:58:40 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Sun, 7 Feb 2010 11:58:40 -0500
Subject: Marine sciences surveying data formats
In-Reply-To: <4B6EB466.8070801@ccom.unh.edu> from Kurt Schwehr (Feb 7, 7:39am)
Message-ID: <20100207165840.CA0BF5654E@rebar.astron.com>

On Feb 7, 7:39am, schwehr at ccom.unh.edu (Kurt Schwehr) wrote:
-- Subject: Re: Marine sciences surveying data formats

| Sounds good. If some of these don't work, I understand. I threw out a
| whole bunch before submitting them already. Just let me know which ones
| cause trouble or work and I will flag them in my development file.
|

I put them all in for now. We'll see which ones stick :-)

Thanks,

christos

From oscaruser at programmer.net Tue Feb 9 02:10:11 2010
From: oscaruser at programmer.net (oscaruser at programmer.net)
Date: Mon, 08 Feb 2010 19:10:11 -0500
Subject: file-5.04 and Excel 2007 files
In-Reply-To: <20100205233217.6E5A45654E@rebar.astron.com>
Message-ID: <8CC7733616FFDAF-1B0-2ADD@web-mmc-m06.sysops.aol.com>

This is what we devised for the 'OpenOffice' doc recognition. It is a
front-end Perl script that returns the MIME type, like the file -i
option. It accepts either stream input or a filename passed on the
command line, and assumes the file utility is installed at
/usr/local/bin/file. Comments/updates welcome.

OSC
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mime_type.pl
Type: application/octet-stream
Size: 4272 bytes
Desc: not available
URL:

From daxim at cpan.org Fri Feb 19 19:30:55 2010
From: daxim at cpan.org (Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯)
Date: Fri, 19 Feb 2010 18:30:55 +0100
Subject: POD files with =encoding
Message-ID: <201002191831.08068.daxim@cpan.org>

Hello list,

attached is a patch that makes file recognise POD files that start with
the =encoding command paragraph. It applies cleanly against 5.04.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-http-perldoc.perl.org-perlpod.html-3dencoding-_encod.patch
Type: text/x-patch
Size: 912 bytes
Desc: not available
URL:

From j at hug.gs Wed Feb 24 00:19:44 2010
From: j at hug.gs (Dr. Jesus)
Date: Tue, 23 Feb 2010 14:19:44 -0800
Subject: [patch] Man page fixes
Message-ID:

The "message" column somehow lost its .It macro. While I was fixing that
I noticed some other errors and fixed those as well.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-man.patch
Type: application/octet-stream
Size: 2177 bytes
Desc: not available
URL:

From j at hug.gs Wed Feb 24 00:18:05 2010
From: j at hug.gs (Dr. Jesus)
Date: Tue, 23 Feb 2010 14:18:05 -0800
Subject: [patch] Windows PE fixes
Message-ID:

These attached patches are an attempt to clean up the msdos module. Among
other things, Windows CE binaries and system images are properly recognized
now, and x64 handling works even in the presence of sections containing
MSIL.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-msdos.patch
Type: application/octet-stream
Size: 13410 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-lynx-omf-disambiguate.patch
Type: application/octet-stream
Size: 658 bytes
Desc: not available
URL:

From christos at zoulas.com Wed Feb 24 01:29:32 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 23 Feb 2010 18:29:32 -0500
Subject: [patch] Man page fixes
In-Reply-To: from "Dr. Jesus" (Feb 23, 2:19pm)
Message-ID: <20100223232932.8970B5654E@rebar.astron.com>

On Feb 23, 2:19pm, j at hug.gs ("Dr. Jesus") wrote:
-- Subject: [patch] Man page fixes

Thanks a lot!

christos

From christos at zoulas.com Wed Feb 24 01:37:19 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 23 Feb 2010 18:37:19 -0500
Subject: [patch] Windows PE fixes
In-Reply-To: from "Dr. Jesus" (Feb 23, 2:18pm)
Message-ID: <20100223233719.1AB015654E@rebar.astron.com>

On Feb 23, 2:18pm, j at hug.gs ("Dr. Jesus") wrote:
-- Subject: [patch] Windows PE fixes

| These attached patches are an attempt to clean up the msdos module. Among
| other things, Windows CE binaries and system images are properly recognized
| now, and x64 handling works even in the presence of sections containing
| MSIL.

Applied, thanks.

christos

From j at hug.gs Thu Mar 18 21:07:14 2010
From: j at hug.gs (Dr. Jesus)
Date: Thu, 18 Mar 2010 12:07:14 -0700
Subject: WIM file
Message-ID:

For the msdos module. Sorry, no patch, this is on top of the previous
patch I sent you and I guess the file project isn't under source
control.

# Windows Imaging (WIM) Image
0       string          MSWIM\000\000\000       Windows imaging (WIM) image

From christos at zoulas.com Thu Mar 18 21:51:21 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 18 Mar 2010 15:51:21 -0400
Subject: WIM file
In-Reply-To: from "Dr. Jesus" (Mar 18, 12:07pm)
Message-ID: <20100318195121.CC9E656425@rebar.astron.com>

On Mar 18, 12:07pm, j at hug.gs ("Dr. Jesus") wrote:
-- Subject: WIM file

| For the msdos module. Sorry, no patch, this is on top of the previous
| patch I sent you and I guess the file project isn't under source
| control.
|
| # Windows Imaging (WIM) Image
| 0       string          MSWIM\000\000\000       Windows imaging (WIM) image

Added, thanks.
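
As an illustration of what the one-line magic entry above expresses, here is a minimal C sketch that looks for the same 8-byte signature ("MSWIM" followed by three NUL bytes) at offset 0 of a buffer. The function name `is_wim_image` is made up for this example; file itself evaluates the magic entry and does not contain code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the magic entry
 *   0  string  MSWIM\000\000\000
 * Returns 1 if buf starts with that signature, 0 otherwise.
 */
int is_wim_image(const unsigned char *buf, size_t len)
{
    /* "MSWIM" plus three NUL bytes, spelled out to keep the zeros explicit */
    static const unsigned char sig[8] = { 'M', 'S', 'W', 'I', 'M', 0, 0, 0 };

    assert(buf != NULL);

    /* Check the length first so memcmp never reads past the buffer. */
    return len >= sizeof(sig) && memcmp(buf, sig, sizeof(sig)) == 0;
}
```

The length check comes before the comparison so that a file shorter than the signature is simply reported as not matching.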

christos

From Jens.Schleusener at t-systems-sfr.com Fri Mar 19 12:11:58 2010
From: Jens.Schleusener at t-systems-sfr.com (Jens Schleusener)
Date: Fri, 19 Mar 2010 11:11:58 +0100 (CET)
Subject: Obvious "ASCII text" file detected as "HTML document text"
Message-ID:

Hi,

within the "SfR Fresh" archive, file type detection is done with the
great program "file".

For HTML files a rendered display is tried automatically (a "simple"
text representation is also available).

So I just detected a "file" problem, see

http://www.sfr-fresh.com/unix/www/eprints-3.2.0.tar.gz:a/eprints-3.2.0/NEWS

(the text file is incorrectly interpreted as HTML) and

http://www.sfr-fresh.com/unix/www/eprints-3.2.0.tar.gz:t/eprints-3.2.0/NEWS

where the same file is correctly displayed as text.

Analyzing that file, I got the impression that the mere existence of the
string <head> anywhere in the otherwise pure text file, without any
other HTML indications, caused that erroneous (?) behaviour.

In the mentioned example that is the line:

  * Added more metadata to the <head> of the summary pages, for the
    benefit of search engines.

Regards

Jens

-- 
T-Systems Solutions for Research GmbH
Dr. Jens Schleusener

From ian at darwinsys.com Sat Mar 20 19:34:40 2010
From: ian at darwinsys.com (Ian Darwin)
Date: Sat, 20 Mar 2010 13:34:40 -0400
Subject: Obvious "ASCII text" file detected as "HTML document text"
In-Reply-To:
References:
Message-ID: <4BA50730.8010007@darwinsys.com>

Jens Schleusener wrote:
> Hi,
>
> within the "SfR Fresh"-Archiv filetype detection is done with the
> great program "file".
>
> For HTML files automatically a rendered display is tried ("simple"
> text representation is also available).
>
> So I just detected a "file" problem, see (incorrectly interpreted as
> HTML)
> ...
> In the mentioned example that is the line:
>
>   * Added more metadata to the <head> of the summary pages, for the
>     benefit of search engines.

Thanks, but file(1) is not an AI program. I think you should not put
HTML elements in text files if it is important to you that they be
displayed as plain text.

This includes putting HTML elements in CVS/SVN/... commit logs which get
saved as text. There is no reason the committer could not have written:

> * Added more metadata to the "head" section of the summary pages, ...

Ian

From Jens.Schleusener at t-systems-sfr.com Sun Mar 21 21:38:59 2010
From: Jens.Schleusener at t-systems-sfr.com (Jens Schleusener)
Date: Sun, 21 Mar 2010 20:38:59 +0100 (CET)
Subject: Obvious "ASCII text" file detected as "HTML document text"
In-Reply-To: <4BA50730.8010007@darwinsys.com>
References: <4BA50730.8010007@darwinsys.com>
Message-ID:

Hi Ian,

>> within the "SfR Fresh"-Archiv filetype detection is done with the
>> great program "file".
>>
>> For HTML files automatically a rendered display is tried ("simple"
>> text representation is also available).
>>
>> So I just detected a "file" problem, see (incorrectly interpreted as
>> HTML)
>> ...
>> In the mentioned example that is the line:
>>
>>   * Added more metadata to the <head> of the summary pages, for the
>>     benefit of search engines.
>
> Thanks, but file(1) is not an AI program. I think you should not put
> HTML elements in text files if it is important to you that they be
> displayed as plain text.

Thanks for your answer but

1) I am not the author of the file mentioned as an example (the one
   with the incorrectly detected file type) but the maintainer of the
   mentioned software archive.

2) although "file" isn't an AI program, detecting pure HTML files
   shouldn't be too difficult, see
   http://www.w3.org/TR/html401/struct/global.html:

   An HTML 4 document is composed of three parts:

   1. a line containing HTML version information,
   2. a declarative header section (delimited by the HEAD element),
   3. a body, which contains the document's actual content. The body may
      be implemented by the BODY element or the FRAMESET element.

   White space (spaces, newlines, tabs, and comments) may appear before or
   after each section. Sections 2 and 3 should be delimited by the HTML
   element. Here's an example of a simple HTML document:

   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      "http://www.w3.org/TR/html4/strict.dtd">
   <HTML>
      <HEAD>
         <TITLE>My first HTML document</TITLE>
      </HEAD>
      <BODY>
         <P>Hello world!
      </BODY>
   </HTML>

Ok, a problem may be non-HTML files containing HTML code snippets,

> This includes putting HTML elements in CVS/SVN/... commit logs which get
> saved as text. There is no reason the committer could not have written:
>
> > * Added more metadata to the "head" section of the summary pages, ...

That's right. But after doing some more testing to break down the problem
I have the impression that this is a problem introduced in the current
release 5.04:

Using the following test file "file_html_test.txt"

000000000000000000000000000000000000000000000000000000
11111 This is obviously not a HTML document text 11111
222222222222222222222222222222222222222222222222222222
333333333333333333333333<head>333333333333333333333333
444444444444444444444444444444444444444444444444444444
555555555 But "file" unfortunately think so! 555555555
666666666666666666666666666666666666666666666666666666

I got the following results with self-compiled binaries (under OpenSUSE
11.2):

a) file (release 5.03):

   file_html_test.txt: ASCII text

b) file (release 5.04):

   file_html_test.txt: HTML document text

But in the new Changelog entries of release 5.04 I couldn't find an
entry regarding that different behaviour.

Sorry, I am not a "file" expert. Any debugging hints to find the matching
magic rules?

Regards

Jens

From ian at darwinsys.com Sun Mar 21 23:52:56 2010
From: ian at darwinsys.com (Ian Darwin)
Date: Sun, 21 Mar 2010 17:52:56 -0400
Subject: Obvious "ASCII text" file detected as "HTML document text"
In-Reply-To:
References: <4BA50730.8010007@darwinsys.com>
Message-ID: <4BA69538.1080700@darwinsys.com>

Jens Schleusener wrote:
> Hi Ian,
>
>>> within the "SfR Fresh"-Archiv filetype detection is done with the
>>> great program "file".
>>>
>>> For HTML files automatically a rendered display is tried ("simple"
>>> text representation is also available).
>>>
>>> So I just detected a "file" problem, see (incorrectly interpreted as
>>> HTML)
>>> ...
>>> In the mentioned example that is the line: >>> >>> * Added more metadata to the of the summary pages, for the >>> benefit of search engines. >> >> Thanks, but file(1) is not an AI program. I think you should not put >> HTML elements in text files if it is important to you that they be >> displayed as plain text. > > Thanks for your answer but > > 1) I am not the author of the just as an example mentioned file (with > the incorrect detected file type) but the maintainer of the mentioned > software archive. > > 2) although "file" isn't an AI program detecting pure HTML files > shouldn't be too difficult, see > http://www.w3.org/TR/html401/struct/global.html: > > An HTML 4 document is composed of three parts: > > 1. a line containing HTML version information, > 2. a declarative header section (delimited by the HEAD element), > 3. a body, which contains the document's actual content. The body may > be implemented by the BODY element or the FRAMESET element. > > White space (spaces, newlines, tabs, and comments) may appear before or > after each section. Sections 2 and 3 should be delimited by the HTML > element. Here's an example of a simple HTML document: > Yes, I'm quite familiar with the HTML document format, having been the person who released the world's first commercial HTML editor back in the early 1990's. So. You want "file" to build in an XML/XHTML/HTML4 parser, so we can handle DOCTYPEs, xmlns's, comments, spaces? Not gonna happen. Maybe a regex? \s?(<[^>]+>)+ But it will fail if the user has a valid HTML file like This is not a tag I don't know what the answer is, but it has to be something that doesn't make file even slower than it's already become. I'm guessing we'll end up having to write another C module to handle HTML, XML, XHTML and related stuff. Thoughts? 
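
For illustration, the regex Ian floats above can be tried out with the POSIX regex API. This is only a sketch and not anything in file(1): the function name looks_like_markup is made up here, and because POSIX extended regexes have no \s, [[:space:]] is substituted for it. The sample strings are likewise assumptions (the changelog line assumes the word stripped from the archive was a literal <head> tag).

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if buf contains something that looks
 * like one or more tags, 0 if it does not, -1 if the regex fails to compile.
 */
int looks_like_markup(const char *buf)
{
    regex_t re;
    int rc;

    /* POSIX ERE spelling of \s?(<[^>]+>)+ ; unanchored, as in the mail */
    if (regcomp(&re, "[[:space:]]?(<[^>]+>)+", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* regexec() searches the whole string for a match */
    rc = regexec(&re, buf, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

Under these assumptions a real page such as "<html><body>hi</body></html>" matches, but so do a changelog line that merely mentions <head> and a sentence like "a < b and c > d", which is the kind of false positive this thread is about.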
From christos at zoulas.com Mon Mar 22 01:16:45 2010 From: christos at zoulas.com (Christos Zoulas) Date: Sun, 21 Mar 2010 19:16:45 -0400 Subject: Obvious "ASCII text" file detected as "HTML document text" In-Reply-To: <4BA69538.1080700@darwinsys.com> from Ian Darwin (Mar 21, 5:52pm) Message-ID: <20100321231645.217BE56425@rebar.astron.com> On Mar 21, 5:52pm, ian at darwinsys.com (Ian Darwin) wrote: -- Subject: Re: Obvious "ASCII text" file detected as "HTML document text" | Yes, I'm quite familiar with the HTML document format, having been the | person | who released the world's first commercial HTML editor back in the early | 1990's. | | So. You want "file" to build in an XML/XHTML/HTML4 parser, so we can handle | DOCTYPEs, xmlns's, comments, spaces? Not gonna happen. Maybe a regex? | | \s?(<[^>]+>)+ | | But it will fail if the user has a valid HTML file like | This is not a tag | | I don't know what the answer is, but it has to be something that doesn't | make file | even slower than it's already become. | | I'm guessing we'll end up having to write another C module to handle | HTML, XML, XHTML and related stuff. Thoughts? The problem is that browser will eat anything that remotely resembles html and try to render it. This has created a lot of pages that are not standards conformant, but nevertheless are displayed correctly in browsers. We went through approximately 50000 samples of web pages in the latest version of file, and before the changes to the magic file we had a recognition rate of approximately 60%; after the changes we have 99%. Sure we can write another c module for html, but let's come up with a spec for it first and see if this cannot be implemented as magic. 
christos From dnovotny at redhat.com Wed Mar 24 15:20:48 2010 From: dnovotny at redhat.com (Daniel Novotny) Date: Wed, 24 Mar 2010 09:20:48 -0400 (EDT) Subject: squashfs filesystem version 4.0: magic entry update In-Reply-To: <2059648198.1302691269436836796.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Message-ID: <1814248542.1302711269436848259.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> hello, one of our users discovered, that the magic entry for squashfs doesn't count with changes in the new version 4.0. the superblock format is different now, I had to add this data according to squashfs source code, patch against file 5.04 attached downstream bug report: https://bugzilla.redhat.com/show_bug.cgi?id=550212 regards, Daniel Novotny, Red Hat inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: file-5.04-squashfs.patch Type: text/x-patch Size: 1516 bytes Desc: not available URL: From christos at zoulas.com Wed Mar 24 16:17:58 2010 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 24 Mar 2010 10:17:58 -0400 Subject: squashfs filesystem version 4.0: magic entry update In-Reply-To: <1814248542.1302711269436848259.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Mar 24, 9:20am) Message-ID: <20100324141758.33DA356426@rebar.astron.com> On Mar 24, 9:20am, dnovotny at redhat.com (Daniel Novotny) wrote: -- Subject: squashfs filesystem version 4.0: magic entry update | hello, | | one of our users discovered, that the magic entry for squashfs doesn't count | with changes in the new version 4.0. | | the superblock format is different now, | I had to add this data according to squashfs source code, | patch against file 5.04 attached | | downstream bug report: | https://bugzilla.redhat.com/show_bug.cgi?id=550212 | | regards, | | Daniel Novotny, Red Hat inc. Thanks muchly! 
christos

From dnovotny at redhat.com Wed Apr 14 14:05:43 2010
From: dnovotny at redhat.com (Daniel Novotny)
Date: Wed, 14 Apr 2010 07:05:43 -0400 (EDT)
Subject: the file command returns zero exit code even in case of not existing file being tested
In-Reply-To: <316557957.245871271242751652.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <655364602.246131271243143464.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

hello,

we found, that file returns zero exit code ("success") in case of "no such file or directory" error:

$ file nosuch
nosuch: cannot open `nosuch' (No such file or directory)
$ echo $?
0

the cause is in fsmagic.c:

        ret = stat(fn, sb);     /* don't merge into if; see "ret =" above */

        if (ret) {
                if (ms->flags & MAGIC_ERROR) {
                        file_error(ms, errno, "cannot stat `%s'", fn);
                        return -1;
                }
                if (file_printf(ms, "cannot open `%s' (%s)",
                    fn, strerror(errno)) == -1)
                        return -1;
                return 1;
        }

the error is printed with file_printf and "return 1" means success in this case

changing this to "file_error" and "return -1" all the time
(not just when there is MAGIC_ERROR flag) causes this to work

$ file nosuch
nosuch: ERROR: cannot open `nosuch' (No such file or directory)
$ echo $?
1

patch against 5.04 attached

maybe there is some purpose, why the error condition is printed
with file_printf and no error returned, but I didn't find any
maybe you have some idea?

regards,

Daniel Novotny, Red Hat inc.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.04-retval.patch
Type: text/x-patch
Size: 650 bytes
Desc: not available
URL:

From christos at zoulas.com Wed Apr 14 16:21:31 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 14 Apr 2010 09:21:31 -0400
Subject: the file command returns zero exit code even in case of not existing file being tested
In-Reply-To: <655364602.246131271243143464.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Apr 14, 7:05am)
Message-ID: <20100414132131.857F156425@rebar.astron.com>

On Apr 14, 7:05am, dnovotny at redhat.com (Daniel Novotny) wrote:
-- Subject: the file command returns zero exit code even in case of not exist

| hello,
|
| we found, that file returns zero exit code ("success") in case of "no such file or directory" error:
|
| $ file nosuch
| nosuch: cannot open `nosuch' (No such file or directory)
| $ echo $?
| 0
|
| the cause is in fsmagic.c:
|
|         ret = stat(fn, sb);     /* don't merge into if; see "ret =" above */
|
|         if (ret) {
|                 if (ms->flags & MAGIC_ERROR) {
|                         file_error(ms, errno, "cannot stat `%s'", fn);
|                         return -1;
|                 }
|                 if (file_printf(ms, "cannot open `%s' (%s)",
|                     fn, strerror(errno)) == -1)
|                         return -1;
|                 return 1;
|         }
|
| the error is printed with file_printf and "return 1" means success in this case
|
| changing this to "file_error" and "return -1" all the time
| (not just when there is MAGIC_ERROR flag) causes this to work
|
| $ file nosuch
| nosuch: ERROR: cannot open `nosuch' (No such file or directory)
| $ echo $?
| 1
|
| patch against 5.04 attached
|
| maybe there is some purpose, why the error condition is printed
| with file_printf and no error returned, but I didn't find any
| maybe you have some idea?

Imagine the scenario where we have to classify many files. What should
happen if one of them does not exist/cannot be read?
According to: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html

    If file does not exist, cannot be read, or its file status could
    not be determined, the output shall indicate that the file was
    processed, but that its type could not be determined.

So it is really not an error if something went wrong dealing with a particular
file. "The output" means stdout to me.

christos

From dnovotny at redhat.com Thu Apr 22 18:25:18 2010
From: dnovotny at redhat.com (Daniel Novotny)
Date: Thu, 22 Apr 2010 11:25:18 -0400 (EDT)
Subject: file may trim too much of command line from core file
In-Reply-To: <2122324832.773621271949743197.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <1158966586.774031271949918377.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

hello,

there is a problem with core file analysis:
the command line of the crashed program is sometimes trimmed in a way
the text is not complete and only later part of the string is shown:

$ file core.2493
core.2493: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from
'gee --rid=48373'

$ which gee
/usr/bin/which: no gee in
(/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/stephent/bin)

$ strings core.2493 | head | grep gee
/usr/bin/python /usr/lib/python2.6/site-packages/rpdb2.py --debugee --rid=48373

I have found a Debian bugzilla entry for this, with a patch attached
(the patch searches for other sections in the core file)
The same thing was reported in Fedora, I used the Debian patch
and it successfully works both in our development as well as stable OS release

The patch is attached, most credits go to Arnaud Giersch
from the Debian community

links to downstream bug reports follow -
Debian bugzilla: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422524
Red Hat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=566305

best regards,

Daniel Novotny, Red Hat inc.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.04-core-trim.patch
Type: text/x-patch
Size: 769 bytes
Desc: not available
URL:

From christos at zoulas.com Thu Apr 22 19:55:45 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 22 Apr 2010 12:55:45 -0400
Subject: file may trim too much of command line from core file
In-Reply-To: <1158966586.774031271949918377.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Daniel Novotny (Apr 22, 11:25am)
Message-ID: <20100422165545.D042656425@rebar.astron.com>

On Apr 22, 11:25am, dnovotny at redhat.com (Daniel Novotny) wrote:
-- Subject: file may trim too much of command line from core file

| there is a problem with core file analysis:
| the command line of the crashed program is sometimes trimmed in a way
| the text is not complete and only later part of the string is shown:
|
| $ file core.2493
| core.2493: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from
| 'gee --rid=48373'
|
| $ which gee
| /usr/bin/which: no gee in
| (/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/stephent/bin)
|
| $ strings core.2493 | head | grep gee
| /usr/bin/python /usr/lib/python2.6/site-packages/rpdb2.py --debugee --rid=48373
|
| I have found a Debian bugzilla entry for this, with a patch attached
| (the patch searches for other sections in the core file)
| The same thing was reported in Fedora, I used the Debian patch
| and it successfully works both in our development as well as stable OS release
|
| The patch is attached, most credits go to Arnaud Giersch
| from the Debian community
|
| links to downstream bug reports follow -
| Debian bugzilla: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422524
| Red Hat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=566305
|
| best regards,
|
| Daniel Novotny, Red Hat inc.

got it, thanks a lot!
christos From a.nielsen at shikadi.net Sat Apr 24 16:56:15 2010 From: a.nielsen at shikadi.net (Adam Nielsen) Date: Sat, 24 Apr 2010 23:56:15 +1000 Subject: Bug? UPX .exe files not detected Message-ID: <4BD2F87F.5010806@shikadi.net> Hi all, I've just discovered that while the latest version of file has an apparently correct magic string for detecting UPX-compressed .exe files, for some reason they never seem to be detected. I've uploaded a 1k test file at http://www.shikadi.net/files/file-upxtest.dat which contains enough data for the magic string to match, but all I get is a message saying it's a standard Win32 .exe - no mention of UPX. As far as I can tell the magic string is accurate, am I missing some option or is this a bug in file? Thanks, Adam. From christos at zoulas.com Sat Apr 24 18:09:35 2010 From: christos at zoulas.com (Christos Zoulas) Date: Sat, 24 Apr 2010 11:09:35 -0400 Subject: Bug? UPX .exe files not detected In-Reply-To: <4BD2F87F.5010806@shikadi.net> from Adam Nielsen (Apr 24, 11:56pm) Message-ID: <20100424150935.42E3356425@rebar.astron.com> On Apr 24, 11:56pm, a.nielsen at shikadi.net (Adam Nielsen) wrote: -- Subject: Bug? UPX .exe files not detected | Hi all, | | I've just discovered that while the latest version of file has an apparently | correct magic string for detecting UPX-compressed .exe files, for some reason | they never seem to be detected. | | I've uploaded a 1k test file at http://www.shikadi.net/files/file-upxtest.dat | which contains enough data for the magic string to match, but all I get is a | message saying it's a standard Win32 .exe - no mention of UPX. | | As far as I can tell the magic string is accurate, am I missing some option or | is this a bug in file? 
This is what the current version of file says for me on your data: file-upxtest.dat: PE32 executable (GUI) Intel 80386 (stripped to external PDB), for MS Windows, UPX compressed christos From a.nielsen at shikadi.net Sun Apr 25 03:14:04 2010 From: a.nielsen at shikadi.net (Adam Nielsen) Date: Sun, 25 Apr 2010 10:14:04 +1000 Subject: Bug? UPX .exe files not detected In-Reply-To: <20100424150935.42E3356425@rebar.astron.com> References: <20100424150935.42E3356425@rebar.astron.com> Message-ID: <4BD3894C.2000703@shikadi.net> > | As far as I can tell the magic string is accurate, am I missing some option or > | is this a bug in file? > > This is what the current version of file says for me on your data: > > file-upxtest.dat: PE32 executable (GUI) Intel 80386 (stripped to external PDB), for MS Windows, UPX compressed Thanks for your reply! That's really strange, using file-5.04 this is what I get: $ ./configure && make $ cd src $ export LD_LIBRARY_PATH=.libs/ $ strace ./file -m ../magic/magic.mgc ../file-upxtest.dat ... open(".libs/libmagic.so.1", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0"..., 832) = 832 ... open("../magic/magic.mgc", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=1779600, ...}) = 0 ../file-upxtest.dat: PE32 executable for MS Windows (GUI) Intel 80386 32-bit As far as I can tell this is using the correct binary, correct library and correct magic file. Am I missing something? The message is very obviously different to yours, apparently in an older style too. file-5.04 was the most recent version I could find. Thanks again, Adam. From christos at zoulas.com Sun Apr 25 03:32:43 2010 From: christos at zoulas.com (Christos Zoulas) Date: Sat, 24 Apr 2010 20:32:43 -0400 Subject: Bug? UPX .exe files not detected In-Reply-To: <4BD3894C.2000703@shikadi.net> from Adam Nielsen (Apr 25, 10:14am) Message-ID: <20100425003243.1A79F56425@rebar.astron.com> On Apr 25, 10:14am, a.nielsen at shikadi.net (Adam Nielsen) wrote: -- Subject: Re: Bug? 
UPX .exe files not detected | > | As far as I can tell the magic string is accurate, am I missing some option or | > | is this a bug in file? | > | > This is what the current version of file says for me on your data: | > | > file-upxtest.dat: PE32 executable (GUI) Intel 80386 (stripped to external PDB), for MS Windows, UPX compressed | | Thanks for your reply! That's really strange, using file-5.04 this is what I get: | | $ ./configure && make | $ cd src | $ export LD_LIBRARY_PATH=.libs/ | $ strace ./file -m ../magic/magic.mgc ../file-upxtest.dat | ... | open(".libs/libmagic.so.1", O_RDONLY) = 3 | read(3, "\177ELF\2\1\1\0"..., 832) = 832 | ... | open("../magic/magic.mgc", O_RDONLY) = 3 | fstat(3, {st_mode=S_IFREG|0644, st_size=1779600, ...}) = 0 | | ../file-upxtest.dat: PE32 executable for MS Windows (GUI) Intel 80386 32-bit | | As far as I can tell this is using the correct binary, correct library and | correct magic file. Am I missing something? The message is very obviously | different to yours, apparently in an older style too. file-5.04 was the most | recent version I could find. No, the reason mine works is because I am running stuff at the head of the tree that has particular msdos fixes. I'll release 5.05 soon. christos From a.nielsen at shikadi.net Sun Apr 25 03:43:43 2010 From: a.nielsen at shikadi.net (Adam Nielsen) Date: Sun, 25 Apr 2010 10:43:43 +1000 Subject: Bug? UPX .exe files not detected In-Reply-To: <20100425003243.1A79F56425@rebar.astron.com> References: <20100425003243.1A79F56425@rebar.astron.com> Message-ID: <4BD3903F.6080901@shikadi.net> > No, the reason mine works is because I am running stuff at the head of > the tree that has particular msdos fixes. I'll release 5.05 soon. Ah ok, that explains things, glad to know it's been fixed! I look forward to the next release. Cheers, Adam. 
From tledouxfr at gmail.com Wed May 5 20:28:02 2010 From: tledouxfr at gmail.com (Thomas Ledoux) Date: Wed, 5 May 2010 19:28:02 +0200 Subject: Mimetypes missing Message-ID: Hello, some mimetypes are missing : - in the definition of the icc profile files (as specified by the iana http://www.iana.org/assignments/media-types/application/vnd.iccprofile) - in some of the cases for mpeg sequences The attached patch does add those in the Magdir/sun and Magdir/animation files Thanks Thomas -------------- next part -------------- A non-text attachment was scrubbed... Name: file5.04-mimetypes.patch Type: application/octet-stream Size: 1800 bytes Desc: not available URL: From christos at zoulas.com Wed May 5 20:38:33 2010 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 5 May 2010 13:38:33 -0400 Subject: Mimetypes missing In-Reply-To: from Thomas Ledoux (May 5, 7:28pm) Message-ID: <20100505173833.8661156425@rebar.astron.com> On May 5, 7:28pm, tledouxfr at gmail.com (Thomas Ledoux) wrote: -- Subject: Mimetypes missing | Hello, | | some mimetypes are missing : | - in the definition of the icc profile files | (as specified by the iana | http://www.iana.org/assignments/media-types/application/vnd.iccprofile) | | - in some of the cases for mpeg sequences | | The attached patch does add those in the Magdir/sun and Magdir/animation files | | Thanks | Thomas Thanks, added. christos From woodbrian77 at gmail.com Mon May 31 00:38:45 2010 From: woodbrian77 at gmail.com (Brian Wood) Date: Sun, 30 May 2010 16:38:45 -0500 Subject: File functionality in library format Message-ID: Greetings, I'm working on something called the C++ Middleware Writer. It's an on line service that writes C++ output based on user input. User input files are uploaded to the site as part of the process. I would like to be able to call the file code in a library form from my process rather than making a call to system with file as an argument. 
It would be great if I could get a version of the code that was adapted for that. I've downloaded, built and installed the 5.04 version so have gotten that far. I looked at 2010 and 2009 archives and didn't find anything related to this. I believe with the growth of on line services there's a need for a library form of the functionality. -- Brian Wood Ebenezer Enterprises http://webEbenezer.net (651) 251-9384 From daxim at cpan.org Mon May 31 14:51:55 2010 From: daxim at cpan.org (Lars =?utf-8?b?RMmq4bSH4bSE4bSL4bSP4bShIOi/quaLieaWrw==?=) Date: Mon, 31 May 2010 13:51:55 +0200 Subject: File functionality in library format In-Reply-To: References: Message-ID: <201005311352.05741.daxim@cpan.org> man 3 libmagic (That was easy.) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From sacha at ssl.co.uk Mon May 31 15:16:45 2010 From: sacha at ssl.co.uk (Sacha Varma) Date: Mon, 31 May 2010 13:16:45 +0100 Subject: File functionality in library format In-Reply-To: References: Message-ID: <4C03A8AD.1080101@ssl.co.uk> Brian Wood wrote: > I would like to be able to call the > file code in a library form from my process rather than making a call to system > with file as an argument. It would be great if I could get a version > of the code that was adapted for that. Hi Brian - we use the 'file' package like that by using the 'magic' library that it builds (libmagic.a/.so on UNIX). The calls you need are in magic.h - magic_load(), magic_file() etc. Our experience has been that although you can use the package in this way, it seems to be really developed to support the 'file' command, and things can change a lot between releases (for example, we're stuck on 4.19 for some fundamental reason that I can't now recall). 
From christos at zoulas.com Tue Jun 1 00:44:10 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 31 May 2010 17:44:10 -0400
Subject: File functionality in library format
In-Reply-To: from Brian Wood (May 30, 4:38pm)
Message-ID: <20100531214410.EA3C656425@rebar.astron.com>

On May 30, 4:38pm, woodbrian77 at gmail.com (Brian Wood) wrote:
-- Subject: File functionality in library format

| Greetings,
|
| I'm working on something called the C++ Middleware Writer. It's an on line
| service that writes C++ output based on user input. User input files are
| uploaded to the site as part of the process. I would like to be able
| to call the
| file code in a library form from my process rather than making a call to system
| with file as an argument. It would be great if I could get a version
| of the code
| that was adapted for that. I've downloaded, built and installed the 5.04
| version so have gotten that far. I looked at 2010 and 2009 archives and
| didn't find anything related to this. I believe with the growth of on line
| services there's a need for a library form of the functionality.

After you install a recent version of file, look for libmagic, and
'man libmagic'.

christos

From woodbrian77 at gmail.com Tue Jun 1 03:47:49 2010
From: woodbrian77 at gmail.com (Brian Wood)
Date: Mon, 31 May 2010 19:47:49 -0500
Subject: File functionality in library format
In-Reply-To: <20100531214410.EA3C656425@rebar.astron.com>
References: <20100531214410.EA3C656425@rebar.astron.com>
Message-ID:

On Mon, May 31, 2010 at 4:44 PM, Christos Zoulas wrote:
> On May 30, 4:38pm, woodbrian77 at gmail.com (Brian Wood) wrote:
> -- Subject: File functionality in library format
>
> | Greetings,
> |
> | I'm working on something called the C++ Middleware Writer. It's an on line
> | service that writes C++ output based on user input. User input files are
> | uploaded to the site as part of the process. I would like to be able
> | to call the
> | file code in a library form from my process rather than making a call to system
> | with file as an argument. It would be great if I could get a version
> | of the code
> | that was adapted for that. I've downloaded, built and installed the 5.04
> | version so have gotten that far. I looked at 2010 and 2009 archives and
> | didn't find anything related to this. I believe with the growth of on line
> | services there's a need for a library form of the functionality.
>
> After you install a recent version of file, look for libmagic, and
> 'man libmagic'.
>
> christos
>

Thanks. It's working now. I had some unresolved symbols when I tried
using libmagic.a but they went away when I used the shared library.

--
Brian Wood
Ebenezer Enterprises
http://www.webEbenezer.net
(651) 251-9384

From woodbrian77 at gmail.com Tue Jun 1 04:17:33 2010
From: woodbrian77 at gmail.com (Brian Wood)
Date: Mon, 31 May 2010 20:17:33 -0500
Subject: File functionality in library format
In-Reply-To: <201005311352.05741.daxim@cpan.org>
References: <201005311352.05741.daxim@cpan.org>
Message-ID:

On Mon, May 31, 2010 at 6:51 AM, Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 wrote:
> man 3 libmagic
>
> (That was easy.)
>
> _______________________________________________
> File mailing list
> File at mx.gw.com
> http://mx.gw.com/mailman/listinfo/file
>
>

Thanks. I got it working. When I tried linking with libmagic.a I got
some unresolved symbols, but when I used the shared library they went away.

--
Brian Wood
Ebenezer Enterprises
http://www.webEbenezer.net
(651) 251-9384

From cchittleborough+comp.file at cluemail.com Wed Jun 2 19:57:49 2010
From: cchittleborough+comp.file at cluemail.com (Christopher Chittleborough)
Date: Thu, 03 Jun 2010 02:27:49 +0930
Subject: Patch: minor markup fix in doc/magic.man
Message-ID: <1275497869.14885.1378132559@webmail.messagingengine.com>

Here's a patch to fix a small markup problem in doc/magic.man.
Cheers -- Chris

diff -rbu file-5.04.ORIGINAL/doc/magic.man file-5.04/doc/magic.man
--- file-5.04.ORIGINAL/doc/magic.man 2009-05-09 08:32:44.000000000 +0930
+++ file-5.04/doc/magic.man 2010-06-01 14:02:20.000000000 +0930
@@ -283,7 +283,7 @@
 The special test
 .Em x
 always evaluates to true.
-.Dv message
+.It Dv message
 The message to be printed if the comparison succeeds.
 If the string contains a
 .Xr printf 3

From christos at zoulas.com Wed Jun 2 20:11:08 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 2 Jun 2010 13:11:08 -0400
Subject: Patch: minor markup fix in doc/magic.man
In-Reply-To: <1275497869.14885.1378132559@webmail.messagingengine.com> from "Christopher Chittleborough" (Jun 3, 2:27am)
Message-ID: <20100602171109.0440456425@rebar.astron.com>

On Jun 3, 2:27am, cchittleborough+comp.file at cluemail.com ("Christopher Chittleborough") wrote:
-- Subject: Patch: minor markup fix in doc/magic.man

| Here's a patch to fix a small markup problem in doc/magic.man.
|
| Cheers -- Chris

thanks a lot!

christos

From jkaluza at redhat.com Mon Jun 21 10:39:56 2010
From: jkaluza at redhat.com (Jan Kaluza)
Date: Mon, 21 Jun 2010 03:39:56 -0400 (EDT)
Subject: the file command returns zero exit code even in case of not existing file being tested
In-Reply-To: <148458309.502641277105630350.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Message-ID: <2104358862.502931277105996509.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>

> | hello,
> |
> | we found, that file returns zero exit code ("success") in case of "no such file or directory" error:
> |
> | $ file nosuch
> | nosuch: cannot open `nosuch' (No such file or directory)
> | $ echo $?
> | 0
> |
> | the cause is in fsmagic.c:
> |
> |         ret = stat(fn, sb);     /* don't merge into if; see "ret =" above */
> |
> |         if (ret) {
> |                 if (ms->flags & MAGIC_ERROR) {
> |                         file_error(ms, errno, "cannot stat `%s'", fn);
> |                         return -1;
> |                 }
> |                 if (file_printf(ms, "cannot open `%s' (%s)",
> |                     fn, strerror(errno)) == -1)
> |                         return -1;
> |                 return 1;
> |         }
> |
> | the error is printed with file_printf and "return 1" means success in this case
> |
> | changing this to "file_error" and "return -1" all the time
> | (not just when there is MAGIC_ERROR flag) causes this to work
> |
> | $ file nosuch
> | nosuch: ERROR: cannot open `nosuch' (No such file or directory)
> | $ echo $?
> | 1
> |
> | patch against 5.04 attached
> |
> | maybe there is some purpose, why the error condition is printed
> | with file_printf and no error returned, but I didn't find any
> | maybe you have some idea?
>
> Imagine the scenario where we have to classify many files. What should
> happen if one of them does not exist/cannot be read?

I understand your situation and respect your decision, but I think more common behaviour is to return
an error code even if one file from many doesn't exist, because that's what
"ls", "cat", "chmod", "chown", "sort" and many others do. In my opinion file should
act also like those basic commands.

> According to: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
>
>     If file does not exist, cannot be read, or its file status could
>     not be determined, the output shall indicate that the file was
>     processed, but that its type could not be determined.

That's true, but I think we have to distinguish between the situation when file does not exist or cannot
be read and the situation when file status can't be determined.

I really don't want to somehow force you to do that, I've just wanted to bring some new ideas about this "problem".

> So it is really not an error if something went wrong dealing with a particular
> file. "The output" means stdout to me.
>
> christos

Jan Kaluža

From christos at zoulas.com Mon Jun 21 15:24:26 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 21 Jun 2010 08:24:26 -0400
Subject: the file command returns zero exit code even in case of not existing file being tested
In-Reply-To: <2104358862.502931277105996509.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Jan Kaluza (Jun 21, 3:39am)
Message-ID: <20100621122426.E597656425@rebar.astron.com>

On Jun 21, 3:39am, jkaluza at redhat.com (Jan Kaluza) wrote:
-- Subject: the file command returns zero exit code even in case of not exist

|
| > Imagine the scenario where we have to classify many files. What should
| > happen if one of them does not exist/cannot be read?
|
| I understand your situation and respect your decision, but I think more common behaviour is to return
| an error code even if one file from many doesn't exist, because that's what
| "ls", "cat", "chmod", "chown", "sort" and many others do. In my opinion file should
| act also like those basic commands.
|
| > According to: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
| >
| >     If file does not exist, cannot be read, or its file status could
| >     not be determined, the output shall indicate that the file was
| >     processed, but that its type could not be determined.
|
| That's true, but I think we have to distinguish between the situation when file does not exist or cannot
| be read and the situation when file status can't be determined.
|
| I really don't want to somehow force you to do that, I've just wanted to bring some new ideas about this "problem".
|
| > So it is really not an error if something went wrong dealing with a particular
| > file. "The output" means stdout to me.
| > | > christos | | Jan Kalu??a I did not say I disagree with you, in that it would make more sense for file(1) should exit with non-zero if there was an error condition, but an unreadable file or a broken symlink is not an error according to the posix standard, and changing the behavior will not only violate posix, but break existing scripts. christos From christos at zoulas.com Mon Jun 21 15:25:34 2010 From: christos at zoulas.com (Christos Zoulas) Date: Mon, 21 Jun 2010 08:25:34 -0400 Subject: the file command returns zero exit code even in case of not existing file being tested In-Reply-To: <2104358862.502931277105996509.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> from Jan Kaluza (Jun 21, 3:39am) Message-ID: <20100621122534.ADEAC56427@rebar.astron.com> On Jun 21, 3:39am, jkaluza at redhat.com (Jan Kaluza) wrote: -- Subject: the file command returns zero exit code even in case of not exist We could add an extra flag though to exit with non zero on a broken symlink or an unreadable file. The question again is what to do with multiple files? Keep going or exit on the first bad one? christos From jkaluza at redhat.com Thu Jun 24 17:40:37 2010 From: jkaluza at redhat.com (Jan Kaluza) Date: Thu, 24 Jun 2010 10:40:37 -0400 (EDT) Subject: the file command returns zero exit code even in case of not existing file being tested In-Reply-To: <20100621122426.E597656425@rebar.astron.com> Message-ID: <249122402.808861277390437902.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com> Jan Kalu?a ----- "Christos Zoulas" wrote: > On Jun 21, 3:39am, jkaluza at redhat.com (Jan Kaluza) wrote: > -- Subject: the file command returns zero exit code even in case of > not exist > | > | > Imagine the scenario where we have to classify many files. What > should > | > happen if one of them does not exist/cannot be read? 
> |
> | I understand your situation and respect your decision, but I think the more
> | common behaviour is to return an error code even if one file out of many
> | doesn't exist, because that's what "ls", "cat", "chmod", "chown", "sort" and
> | many others do. In my opinion file should also act like those basic commands.
> |
> | > According to: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
> | >
> | >     If file does not exist, cannot be read, or its file status could
> | >     not be determined, the output shall indicate that the file was
> | >     processed, but that its type could not be determined.
> |
> | That's true, but I think we have to distinguish between the situation when
> | the file does not exist or cannot be read and the situation when the file
> | status can't be determined.
> |
> | I really don't want to force you to do that; I just wanted to bring some new
> | ideas about this "problem".
> |
> | > So it is really not an error if something went wrong dealing with a particular
> | > file. "The output" means stdout to me.
> | >
> | > christos
> |
> | Jan Kaluža
>
> I did not say I disagree with you, in that it would make more sense for
> file(1) to exit with non-zero if there was an error condition, but
> an unreadable file or a broken symlink is not an error according to the
> posix standard, and changing the behavior will not only violate posix,
> but break existing scripts.

I can't find those statements in the posix standard just now, so I will
believe you (although I don't understand why some basic commands return an
error code in those cases). I also agree this change would break existing
scripts.

As you have already mentioned in another email, it would be fine to introduce
a new option which could return an error code when a file can't be read. In
my opinion, there are two ways to handle multiple files. The first one is to
return an error code even if only one file out of many can't be read (I would
say I prefer this one). The second one is to return an error code only when
none of the files can be read.

Thinking about it, maybe we can return an error code only when file(1)
handles just one file and keep the current behaviour when handling more
files, because the error code when handling more files is really debatable.

> christos
>
>
> _______________________________________________
> File mailing list
> File at mx.gw.com
> http://mx.gw.com/mailman/listinfo/file

Jan Kaluza

From guy at alum.mit.edu Thu Jun 24 20:09:05 2010
From: guy at alum.mit.edu (Guy Harris)
Date: Thu, 24 Jun 2010 10:09:05 -0700
Subject: the file command returns zero exit code even in case of not existing file being tested
In-Reply-To: <20100414132131.857F156425@rebar.astron.com>
References: <20100414132131.857F156425@rebar.astron.com>
Message-ID: <9177AC3D-8DE4-4F13-B8EA-C057EE654B97@alum.mit.edu>

On Apr 14, 2010, at 6:21 AM, Christos Zoulas wrote:

> According to: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
>
>     If file does not exist, cannot be read, or its file status could
>     not be determined, the output shall indicate that the file was
>     processed, but that its type could not be determined.

And according to the 2008 (newer) version:

    http://www.opengroup.org/onlinepubs/9699919799/utilities/file.html

"If the file named by the operand does not exist, cannot be read, or the type
of the file named by the operand cannot be determined, this shall not be
considered an error that affects the exit status."

which is even more explicit.

So, yes, we'd need to add a new command-line flag if we want to be
POSIX-compliant but also want to offer the option of exiting with an error
code if a file doesn't exist or can't be read.
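
A minimal sketch of the policy discussed in this thread, assuming a hypothetical opt-in "strict" flag (no such option is named in the thread) and the "keep going, then fail at the end" answer to the multiple-files question. The helper name and its arguments are illustrative only and are not part of file(1).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of file(1).
 * readable[i] is non-zero when operand i could be stat'ed and read.
 * strict is non-zero when the assumed opt-in flag was given.
 */
static int
exit_status_for(const int *readable, size_t n, int strict)
{
	size_t i;

	/* Default: POSIX says unreadable operands do not affect exit status. */
	if (!strict)
		return 0;

	/* Opt-in: every operand is still processed, but any failure counts. */
	for (i = 0; i < n; i++)
		if (!readable[i])
			return 1;
	return 0;
}
```

With the flag off, the result is always 0, so existing scripts see no change; with it on, one unreadable operand out of many is enough to make the status non-zero.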

(As for history: if this is, indeed, the V7 file command:

    http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/file.c

and this is, indeed, the V7 C startup code:

    http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libc/csu/crt0.s

then, if file had a reliable exit code in the "normal" case, it was through
pure luck, as

    1) main() was just falling off the end

and

    2) (the lower 8 bits of) the return value of main ended up being the
       exit status

so the exit status was whatever happened to be in R0 at the time main()
returned. (It also had a "-f" flag, which read file names from the file name
given as an argument to the flag, and reported on those files; if that
failed, it exited with 2, but if it *succeeded*, it exited with 1.))

From jkaluza at redhat.com Tue Jun 29 11:47:10 2010
From: jkaluza at redhat.com (Jan Kaluza)
Date: Tue, 29 Jun 2010 10:47:10 +0200
Subject: Z-machine magic entry update
Message-ID: <201006291047.11272.jkaluza@redhat.com>

Hi,

file(1) sometimes identifies some files (.mp3 or Mono debug files - .mdb) as
"Infocom game data". Most of those cases can be fixed by checking the version
of the Z-machine file (first byte), which is always between 1 and 8 [1] [2]
[3]. I've created a simple patch against file-5.04 which does that.

You can find the patch attached to this downstream bug:
https://bugzilla.redhat.com/show_bug.cgi?id=608922

[1] http://www.gnelson.demon.co.uk/zspec/sect11.html - original standard with
version 1-6.
[2] http://www.jczorkmid.net/~jpenney/ZSpec11-latest.txt - latest standard
with version 7,8.
[3] http://en.wikipedia.org/wiki/Z-machine - mention about version 7,8.
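
The version check described above can be sketched in C as follows. This is only an illustration of the idea (first byte between 1 and 8); the function name is made up, and the actual patch in the bug report changes the magic entry rather than adding C code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 when the first byte of buf could be a
 * Z-machine version number (1 through 8, per the references above).
 * A passing check does not prove the file is a Z-machine story file;
 * it only rules out the obvious false positives.
 */
static int
plausible_zmachine_version(const unsigned char *buf, size_t len)
{
	if (len < 1)
		return 0;
	return buf[0] >= 1 && buf[0] <= 8;
}
```

For example, an MP3 file that begins with an "ID3" tag has 0x49 as its first byte and is rejected by this check.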

regards,
Jan Kaluza

From christos at zoulas.com Tue Jun 29 15:42:47 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 29 Jun 2010 08:42:47 -0400
Subject: Z-machine magic entry update
In-Reply-To: <201006291047.11272.jkaluza@redhat.com> from Jan Kaluza (Jun 29, 10:47am)
Message-ID: <20100629124247.4EE0256425@rebar.astron.com>

On Jun 29, 10:47am, jkaluza at redhat.com (Jan Kaluza) wrote:
-- Subject: Z-machine magic entry update

| Hi,
|
| file(1) sometimes identifies some files (.mp3 or Mono debug files - .mdb) as
| "Infocom game data". Most of those cases can be fixed by checking the version
| of the Z-machine file (first byte), which is always between 1 and 8 [1] [2]
| [3]. I've created a simple patch against file-5.04 which does that.
|
| You can find the patch attached to this downstream bug:
| https://bugzilla.redhat.com/show_bug.cgi?id=608922
|
| [1] http://www.gnelson.demon.co.uk/zspec/sect11.html - original standard with
| version 1-6.
| [2] http://www.jczorkmid.net/~jpenney/ZSpec11-latest.txt - latest standard
| with version 7,8.
| [3] http://en.wikipedia.org/wiki/Z-machine - mention about version 7,8.
|

Thanks a lot; fixed as suggested.

christos

From jkaluza at redhat.com Wed Jul 7 12:05:17 2010
From: jkaluza at redhat.com (Jan Kaluza)
Date: Wed, 7 Jul 2010 11:05:17 +0200
Subject: WebM magic file
Message-ID: <201007071105.17877.jkaluza@redhat.com>

Hi,

I'm attaching a very basic magic file for the WebM video format
(http://www.webmproject.org/). Maybe we can somehow merge that with matroska,
because they both use the same EBML header.

It would be really great to have it in the next file(1) release.

Thanks for your work
Jan Kaluza
-------------- next part --------------

#------------------------------------------------------------------------------
# $File: matroska,v 1.5 2009/09/27 19:02:12 christos Exp $
# webm: file(1) magic for WebM files
#
# See http://www.webmproject.org/
#
# EBML id:
0		belong		0x1a45dfa3
# DocType id:
>0		search/4096	\x42\x82
# DocType contents:
>>&1		string		webm		WebM
!:mime	video/webm

From lkundrak at v3.sk Thu Jul 8 00:48:36 2010
From: lkundrak at v3.sk (Lubomir Rintel)
Date: Wed, 7 Jul 2010 17:48:36 -0400
Subject: [PATCH] Add matches for Parrot
Message-ID: <1278539316-2067-1-git-send-email-lkundrak@v3.sk>

Parrot is a virtual machine for highly portable byte code, primarily
generated by Perl 6 compilers such as Rakudo.
---
 magic/Magdir/parrot |   22 ++++++++++++++++++++++
 magic/Makefile.am   |    1 +
 2 files changed, 23 insertions(+), 0 deletions(-)
 create mode 100644 magic/Magdir/parrot

diff --git a/magic/Magdir/parrot b/magic/Magdir/parrot
new file mode 100644
index 0000000..400b311
--- /dev/null
+++ b/magic/Magdir/parrot
@@ -0,0 +1,22 @@
+#------------------------------------------------------------------------------
+# $File$
+# parrot: file(1) magic for Parrot Virtual Machine
+# URL: http://www.lua.org/
+# From: Lubomir Rintel
+
+# Compiled Parrot byte code
+0	string	\376PBC\r\n\032\n	Parrot bytecode
+>64	byte	x	%d.
+>72	byte	x	\b%d,
+>8	byte	>0	%d byte words,
+>16	byte	0	little-endian,
+>16	byte	1	big-endian,
+>32	byte	0	IEEE-754 8 byte double floats,
+>32	byte	1	x86 12 byte long double floats,
+>32	byte	2	IEEE-754 16 byte long double floats,
+>32	byte	3	MIPS 16 byte long double floats,
+>32	byte	4	AIX 16 byte long double floats,
+>32	byte	5	4-byte floats,
+>40	byte	x	Parrot %d.
+>48	byte	x	\b%d.
+>56	byte	x	\b%d
diff --git a/magic/Makefile.am b/magic/Makefile.am
index e10499d..53b01ae 100644
--- a/magic/Makefile.am
+++ b/magic/Makefile.am
@@ -149,6 +149,7 @@ $(MAGIC_FRAGMENT_DIR)/os9 \
 $(MAGIC_FRAGMENT_DIR)/osf1 \
 $(MAGIC_FRAGMENT_DIR)/palm \
 $(MAGIC_FRAGMENT_DIR)/parix \
+$(MAGIC_FRAGMENT_DIR)/parrot \
 $(MAGIC_FRAGMENT_DIR)/pbm \
 $(MAGIC_FRAGMENT_DIR)/pdf \
 $(MAGIC_FRAGMENT_DIR)/pdp \
-- 
1.6.5.2

From lkundrak at v3.sk Thu Jul 8 00:49:35 2010
From: lkundrak at v3.sk (Lubomir Rintel)
Date: Wed, 7 Jul 2010 17:49:35 -0400
Subject: [PATCH] Add matches for ruby modules
Message-ID: <1278539375-2187-1-git-send-email-lkundrak@v3.sk>

Similar to what's already done for Perl. Existing rules only match shebangs,
which is not useful in most cases.

This was tested to yield no false positives when run against every perl
and python source file I could find on my system (260 CPAN modules and
59 Python modules). Produces a couple of false negatives for a couple of
files from 30 Ruby gems I have installed, but is still an improvement
over existing rules. (Ruby is used to construct DSLs quite often, it would
be really tricky to match those).
---
 magic/Magdir/ruby |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/magic/Magdir/ruby b/magic/Magdir/ruby
index 7030295..be1786c 100644
--- a/magic/Magdir/ruby
+++ b/magic/Magdir/ruby
@@ -14,3 +14,15 @@
 !:mime text/x-ruby
 0	search/1	#!\ /usr/bin/env\ ruby	Ruby script text executable
 !:mime text/x-ruby
+
+# What looks like ruby, but does not have a shebang
+# (modules and such)
+# From: Lubomir Rintel
+0	regex	\^[\ \t]*require[\ \t]'[A-Za-z_\/]+'
+>0	regex	include\ [A-Z]|def\ [a-z]|\ do$
+>>0	regex	\^[\ \t]*end([\ \t]*[;#].*)?$	Ruby script text
+!:mime text/x-ruby
+0	regex	\^[\ \t]*(class|module)[\ \t][A-Z]
+>0	regex	(modul|includ)e\ [A-Z]|def\ [a-z]
+>>0	regex	\^[\ \t]*end([\ \t]*[;#].*)?$	Ruby module source text
+!:mime text/x-ruby
-- 
1.6.5.2

From christos at zoulas.com Thu Jul 8 23:18:59 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Thu, 8 Jul 2010 16:18:59 -0400
Subject: [PATCH] Add matches for Parrot
In-Reply-To: <1278539316-2067-1-git-send-email-lkundrak@v3.sk> from Lubomir Rintel (Jul 7, 5:48pm)
Message-ID: <20100708201859.C14F156425@rebar.astron.com>

On Jul 7, 5:48pm, lkundrak at v3.sk (Lubomir Rintel) wrote:
-- Subject: [PATCH] Add matches for Parrot

| Parrot is a virtual machine for highly portable byte code, primarily
| generated by Perl 6 compilers such as Rakudo.

Added, thanks!

christos

From swnykimo at gmail.com Mon Jul 12 08:23:25 2010
From: swnykimo at gmail.com (Guo Lu)
Date: Mon, 12 Jul 2010 13:23:25 +0800
Subject: cross compile question
Message-ID:

Hi,

My name is chen, and I am trying to cross-compile the "file" tool for an
ARM-based machine. I use the CodeSourcery cross compiler.
The compile commands I used are:

1: ./configure --host=arm-none-linux-gnueabi
2: make

There is no problem configuring to generate the Makefile, but when I
execute the 'make' command, I get what looks like a magic number error:

file: Unknown !: entry `!:apple 8BIMGIFf'
make[2]: *** [magic.mgc] Error 255
make[2]: Leaving directory `/home/chen/tmp/file-5.00/magic'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/chen/tmp/file-5.00'
make: *** [all] Error 2

Can anybody give me any hints to solve this? I'd appreciate it very much.

thanks
chen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From quel at quelrod.net Wed Jul 14 02:07:57 2010
From: quel at quelrod.net (James Nobis)
Date: Tue, 13 Jul 2010 18:07:57 -0500
Subject: .xlsx misidentified as a zip file
Message-ID: <4C3CF1CD.3080201@quelrod.net>

Ok it actually *is* a zip file that contains the Microsoft Excel 2007 XML format.

unzip -x test.xlsx
Archive:  test.xlsx
  inflating: xl/workbook.xml
  inflating: xl/worksheets/sheet1.xml
  inflating: xl/worksheets/sheet2.xml
  inflating: xl/worksheets/sheet3.xml
  inflating: xl/styles.xml
  inflating: xl/_rels/workbook.xml.rels
  inflating: [Content_Types].xml
  inflating: _rels/.rels

James

From quel at quelrod.net Wed Jul 14 02:00:15 2010
From: quel at quelrod.net (James Nobis)
Date: Tue, 13 Jul 2010 18:00:15 -0500
Subject: .xlsx misidentified as a zip file
Message-ID: <4C3CEFFF.3080206@quelrod.net>

The .xlsx is Microsoft Excel 2007. I'm using the 5.04 release and it
gives me this:

Zip archive data, at least v2.0 to extract.

The first 4 bytes of the file:
50 4b 03 04
P K

I believe this matches the archive magic entry:
0	string		PK\003\004

I'm attaching a test file that gets this incorrect identification. I
created the file with Gnumeric (Open Office doesn't appear to let you
save in this format.)

James
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 3459 bytes
Desc: not available
URL:

From christos at zoulas.com Wed Jul 14 15:32:55 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Wed, 14 Jul 2010 08:32:55 -0400
Subject: .xlsx misidentified as a zip file
In-Reply-To: <4C3CF1CD.3080201@quelrod.net> from James Nobis (Jul 13, 6:07pm)
Message-ID: <20100714123255.A6B5856425@rebar.astron.com>

On Jul 13, 6:07pm, quel at quelrod.net (James Nobis) wrote:
-- Subject: .xlsx misidentified as a zip file

| Ok it actually *is* a zip file that contains the Microsoft Excel 2007 XML format.
|
| unzip -x test.xlsx
| Archive:  test.xlsx
|   inflating: xl/workbook.xml
|   inflating: xl/worksheets/sheet1.xml
|   inflating: xl/worksheets/sheet2.xml
|   inflating: xl/worksheets/sheet3.xml
|   inflating: xl/styles.xml
|   inflating: xl/_rels/workbook.xml.rels
|   inflating: [Content_Types].xml
|   inflating: _rels/.rels

I will add support for Office 2007 zipped xml files later this year.

christos

From jkaluza at redhat.com Mon Jul 19 12:30:46 2010
From: jkaluza at redhat.com (Jan Kaluza)
Date: Mon, 19 Jul 2010 11:30:46 +0200
Subject: [PATCH] Fixed bad "from '%s'" for ELF some binaries
Message-ID: <201007191130.46391.jkaluza@redhat.com>

Hi,

the attached patch fixes the bug where a bad "from %s" string was printed for
an ELF binary. The problem is that the "from %s" string is retrieved from the
"note section", from the note with id 3 (NT_PRPSINFO). The function "donote"
in readelf.c, which tries to retrieve that information, is shared between all
types of ELF binaries (ET_CORE, ET_EXEC, ...). But for ET_EXEC, the note with
id 3 is NT_GNU_BUILD_ID. Therefore file(1) returns a bad "from %s" string if
the binary is of ET_EXEC type and contains NT_GNU_BUILD_ID "info".

My solution just adds one condition to detect whether the handled ELF file is
ET_CORE and, if it's not, it doesn't try to handle that note at all.
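
The condition described above can be sketched roughly like this. It is not the attached patch and does not reproduce donote() from readelf.c; the helper name and the SK_ constants are made up for the sketch, with the numeric values copied from the usual <elf.h> definitions so the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Values as found in <elf.h>; renamed here to avoid clashing with it. */
#define SK_ET_EXEC		2	/* executable file */
#define SK_ET_CORE		4	/* core file */
#define SK_NT_PRPSINFO		3	/* note type 3 in a core file */
#define SK_NT_GNU_BUILD_ID	3	/* note type 3 in a "GNU" note */

/*
 * Hypothetical helper: only treat note type 3 as NT_PRPSINFO (the source
 * of the "from '%s'" text) when the ELF file is a core file. For other
 * file types the same number means something else, e.g. a build ID.
 */
static int
note_is_prpsinfo(uint16_t e_type, uint32_t n_type)
{
	return e_type == SK_ET_CORE && n_type == SK_NT_PRPSINFO;
}
```

The point of the sketch is that the note type alone is ambiguous: it has to be read together with the ELF file type before its payload is interpreted.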

Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=599695

If I can ask you, please confirm whether you accept my patches, because as
package maintainer I don't want to use patches which are not accepted by you.
It would help me a lot.

Thanks for your work
Jan Kaluza
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-5.04-core-prpsinfo.patch
Type: text/x-patch
Size: 904 bytes
Desc: not available
URL:

From lrn1986 at gmail.com Sun Jul 18 17:32:41 2010
From: lrn1986 at gmail.com (LRN)
Date: Sun, 18 Jul 2010 18:32:41 +0400
Subject: Mingw compatibility patch
Message-ID: <4C431089.7000704@gmail.com>

Directory diff against file 5.04 is attached.
It is not perfect by any measure, especially the bits in configure.ac
and Makefile.am, and i've chosen to cull away missing functionality
instead of porting it, such as symlink support (MSVCRT does not seem to
be symlink-aware even if Windows is) and archive unpacking with
fork()/exec() (can be rewritten with spawn(), but that's
complicated)...but the code compiles, and file.exe works as intended
(with altered magic.mgc search paths, yay!).

P.S. is there a way to prevent libtool from picking up compiled file.exe
from the source tree when calling 'make' again? Because if this file.exe
doesn't work (which is often the case during development), libtool's
magic tests will fail, and libtool will complain about missing real
libraries.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: file-mingw.diff
Type: text/x-patch
Size: 12693 bytes
Desc: not available
URL:

From christos at zoulas.com Mon Jul 19 17:57:14 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 19 Jul 2010 10:57:14 -0400
Subject: [PATCH] Fixed bad "from '%s'" for ELF some binaries
In-Reply-To: <201007191130.46391.jkaluza@redhat.com> from Jan Kaluza (Jul 19, 11:30am)
Message-ID: <20100719145714.C1FA056426@rebar.astron.com>

On Jul 19, 11:30am, jkaluza at redhat.com (Jan Kaluza) wrote:
-- Subject: [PATCH] Fixed bad "from '%s'" for ELF some binaries

| Hi,
|
| attached patch fixes the bug when bad "from %s" string was printed for ELF
| binary. The problem is that "from %s" string is retrieved from "note section"
| from note with id 3 (NT_PRPSINFO). The function "donote" in readelf.c, which
| tries to retrieve that information, is shared between all types of ELF
| binaries (ET_CORE, ET_EXEC, ...). But for ET_EXEC, the note with id 3 is
| NT_GNU_BUILD_ID. Therefore file(1) returns bad "from %s" string if the binary
| is ET_EXEC type and contains NT_GNU_BUILD_ID "info".
|
| My solution just adds one condition to detect if handled ELF file is ET_CORE
| and if it's not, it doesn't try to handle that note at all.
|
| Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=599695
|
| If I can ask you, please confirm if you accept my patches, because as package
| maintainer I don't want to use patches which are not accepted by you. It would
| help me a lot.
|
| Thanks for your work
| Jan Kaluza

Thanks for the patch, applied to HEAD.

Best,

christos

From christos at zoulas.com Mon Jul 19 20:09:12 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Mon, 19 Jul 2010 13:09:12 -0400
Subject: Mingw compatibility patch
In-Reply-To: <4C431089.7000704@gmail.com> from LRN (Jul 18, 6:32pm)
Message-ID: <20100719170912.6767D56425@rebar.astron.com>

On Jul 18, 6:32pm, lrn1986 at gmail.com (LRN) wrote:

| Content-Transfer-Encoding: 7bit
|
| Directory diff against file 5.04 is attached.
| It is not perfect by any measure, especially the bits in configure.ac
| and Makefile.am, and i've chosen to cull away missing functionality
| instead of porting it, such as symlink support (MSVCRT does not seem to
| be symlink-aware even if Windows is) and archive unpacking with
| fork()/exec() (can be rewritten with spawn(), but that's

I'd rather keep this around as a diff since I would prefer mingw32 to
fix their own issues (printf c99 standard formats etc), rather than
polluting the code.

| (with altered magic.mgc search paths, yay!).
|
| P.S. is there a way to prevent libtool from picking up compiled file.exe
| from the source tree when calling 'make' again? Because if this file.exe
| doesn't work (which is often the case during development), libtool's
| magic tests will fail, and libtool will complain about missing real
| libraries.

This is not desirable again because if the version of the magic file
changes with the new development code, building the magic file with
the old executable will not work.

christos

From lrn1986 at gmail.com Tue Jul 20 08:23:03 2010
From: lrn1986 at gmail.com (LRN)
Date: Tue, 20 Jul 2010 09:23:03 +0400
Subject: Mingw compatibility patch
In-Reply-To: <20100719170912.6767D56425@rebar.astron.com>
References: <20100719170912.6767D56425@rebar.astron.com>
Message-ID: <4C4532B7.8050408@gmail.com>

On 19.07.2010 21:09, Christos Zoulas wrote:
> On Jul 18, 6:32pm, lrn1986 at gmail.com (LRN) wrote:
> | Content-Transfer-Encoding: 7bit
> |
> | Directory diff against file 5.04 is attached.
> | It is not perfect by any measure, especially the bits in configure.ac
> | and Makefile.am, and i've chosen to cull away missing functionality
> | instead of porting it, such as symlink support (MSVCRT does not seem to
> | be symlink-aware even if Windows is) and archive unpacking with
> | fork()/exec() (can be rewritten with spawn(), but that's
>
> I'd rather keep this around as a diff since I would prefer mingw32 to
> fix their own issues (printf c99 standard formats etc), rather than
> polluting the code.

1) There are ways to present these fixes in a nicer way, such as the way
glib does it (and i can do that)
2) It will probably take some years for MinGW maintainers to add
C99-conformant printf implementation, so it's not a temporary issue you
can wait upon (actually, you can, but i wouldn't want you to)
3) There are other fixes which may actually go into code in their
present form, no need to dump everything together. Though i have a bad
habit of presenting independent patches in a single lump, especially in
this particular case, since i didn't have any VCS to work with, just a
simple recursive diff (and that i can fix by creating a local git
repository out of a file(1) source snapshot)
4) How exactly are you going to keep it in a diff form? Will people be
able to learn about it and use it? Will it be applied automatically for
a particular platform? Because the goal is to allow people to compile
file(1) and libmagic out of the box, preferably without extra patching
that needs to be done manually.

> | (with altered magic.mgc search paths, yay!).
> |
> | P.S. is there a way to prevent libtool from picking up compiled file.exe
> | from the source tree when calling 'make' again? Because if this file.exe
> | doesn't work (which is often the case during development), libtool's
> | magic tests will fail, and libtool will complain about missing real
> | libraries.
>
> This is not desirable again because if the version of the magic file
> changes with the new development code, building the magic file with
> the old executable will not work.

How could the version of the magic file that is located within the
source tree (and should be preferably compiled with the version of the
file program built from the same source tree, as you have said) be
connected with actual magic file installed in the system along with the
file program actually used to do things? Either i don't know something
about the shell/libtool/gcc (which is probable), or you misunderstood my
post-scriptum (which is also probable).
I think, actually, that on *nix this problem can be (and probably is)
dodged by chmod'ing newly created executables as non-executable (aren't
they made that way by default?), preventing shell from picking them up
when libtool runs a `file` command.

From mdorey at bluearc.com Tue Jul 20 10:20:26 2010
From: mdorey at bluearc.com (Martin Dorey)
Date: Tue, 20 Jul 2010 00:20:26 -0700
Subject: Mingw compatibility patch
Message-ID: <54A098E33E92A04EB0DD9A2E8B546CB00478AB5485@us-ex-mbx1.terastack.bluearc.com>

> 2) It will probably take some years for
> MinGW maintainers to add
> C99-conformant printf implementation

I doubt they'll ever do it. Their project is not about recreating a
Linux-like environment on Windows (that's Cygwin). Their project is about
being able to compile with gcc for Windows, using Microsoft's C runtime.
Portability from Microsoft's compilers is quite possibly more important to
them than portability from other platforms. It's Microsoft's printf, then,
that's causing the issue. Perhaps Microsoft will eventually get with the C99
program but I wouldn't hold your breath there either.

From christos at zoulas.com Tue Jul 20 20:51:11 2010
From: christos at zoulas.com (Christos Zoulas)
Date: Tue, 20 Jul 2010 13:51:11 -0400
Subject: Mingw compatibility patch
In-Reply-To: <4C4532B7.8050408@gmail.com> from LRN (Jul 20, 9:23am)
Message-ID: <20100720175111.572C256425@rebar.astron.com>

On Jul 20, 9:23am, lrn1986 at gmail.com (LRN) wrote:
-- Subject: Re: Mingw compatibility patch

| 1) There are ways to present these fixes in a nicer way, such as the way
| glib does it (and i can do that)

The fewer ifdefs in the code the better.

| 2) It will probably take some years for MinGW maintainers to add | C99-conformant printf implementation, so it's not a temporary issue you | can wait upon (actually, you can, but i wouldn't want you to) Well, it is 2010 -- it is already 11 years since the standard was published, what are they waiting on? By the time they finish, there will be another c standard published. | 3) There are other fixes which may actually go into code in their | present form, no need to dump everything together. Though i have a bad | habit of presenting independent patches in a single lump, especially in | this particular case, since i didn't have any VCS to work with, just a | simple recursive diff (and that i can fix by creating a local git | repository out of a file(1) source snapshot) I don't mind the single lump. | 4) How exactly are you going to keep it in a diff form? Will people be | able to learn about it and use it? Will it be applied automatically for | a particular platform? Because the goal is to allow people to compile | file(1) and libmagic out of the box, preferably without extra patching | that needs to be done manually. I would just add the diff file to the distribution in README.MIGWIN | How could the version of the magic file that is located within the | source tree (and should be preferably compiled with the version of the | file program built from the same source tree, as you have said) be | connected with actual magic file installed in the system along with the | file program actually used to do things? Either i don't know something | about the shell/libtool/gcc (which is probable), or you misunderstood my | post-scriptum (which is also probable). | I think, actually, that on *nix this problem can be (and probably is) | dodged by chmod'ing newly created executables as non-executable (aren't | they made that way by default?), preventing shell from picking them up | when libtool runs a `file` command. 
The binary you just compiled is used to compile the magic file in the source tree. It needs to be the binary associated with the same version of the file. christos From lrn1986 at gmail.com Tue Jul 20 22:18:59 2010 From: lrn1986 at gmail.com (LRN) Date: Tue, 20 Jul 2010 23:18:59 +0400 Subject: Mingw compatibility patch In-Reply-To: <20100720175111.572C256425@rebar.astron.com> References: <20100720175111.572C256425@rebar.astron.com> Message-ID: <4C45F6A3.1040405@gmail.com> On 20.07.2010 21:51, Christos Zoulas wrote: > On Jul 20, 9:23am, lrn1986 at gmail.com (LRN) wrote: > -- Subject: Re: Mingw compatibility patch > > | 1) There are ways to present these fixes in a nicer way, such as the way > | glib does it (and i can do that) > > The fewer ifdefs in the code the better. How about a single: #ifdef WIN32 #define SIZE_T_FORMAT "" #else #define SIZE_T_FORMAT "z" #endif And then instead of: printf ("There are %zu thingies", some_sizeof); do it this way: printf ("There are %" SIZE_T_FORMAT "u thingies", some_sizeof); That should pre-process into: printf ("There are %zu thingies", some_sizeof); or printf ("There are %u thingies", some_sizeof); Depending on the platform. Glib has a special /usr/lib/glib-2.0/include/glibconfig.h header for such platform-specific code, but i can't imagine libmagic doing something like that, so that might just go into some header other than magic.h > | 2) It will probably take some years for MinGW maintainers to add > | C99-conformant printf implementation, so it's not a temporary issue you > | can wait upon (actually, you can, but i wouldn't want you to) > > Well, it is 2010 -- it is already 11 years since the standard was published, > what are they waiting on? By the time they finish, there will be another > c standard published. What is Microsoft waiting for? For Windows to dominate the market and wipe out POSIX competition. Which won't happen What are MinGW devs waiting for? Hard to tell. 
Martin already pointed out that such a task might be outside their scope. And even if it isn't, they are VERY peculiar about code licensing (MinGW code must be 100% public domain, and must derive 100% from public documentation; although the latter is not a problem in this particular case) > > | 4) How exactly are you going to keep it in a diff form? Will people be > | able to learn about it and use it? Will it be applied automatically for > | a particular platform? Because the goal is to allow people to compile > | file(1) and libmagic out of the box, preferably without extra patching > | that needs to be done manually. > > I would just add the diff file to the distribution in README.MIGWIN Oh, well, let's hope people do read such things. I know i do not :) > | How could the version of the magic file that is located within the > | source tree (and should be preferably compiled with the version of the > | file program built from the same source tree, as you have said) be > | connected with actual magic file installed in the system along with the > | file program actually used to do things? Either i don't know something > | about the shell/libtool/gcc (which is probable), or you misunderstood my > | post-scriptum (which is also probable). > | I think, actually, that on *nix this problem can be (and probably is) > | dodged by chmod'ing newly created executables as non-executable (aren't > | they made that way by default?), preventing shell from picking them up > | when libtool runs a `file` command. > > The binary you just compiled is used to compile the magic file in the source > tree. It needs to be the binary associated with the same version of the > file. Isn't it possible to invoke the just-compiled binary file(1) directly, by relative/absolute path (not relying on shell to find the right version), instead of invoking it by name only? And shove it somewhere deep, so that shell won't find it by itself? 
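The SIZE_T_FORMAT idea from the message above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the file(1) tree: it tests the compiler-predefined _WIN32/_WIN64 macros rather than WIN32, it adds an "I64" branch for 64-bit Windows based on the Microsoft CRT size prefixes, and format_thingies is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Length modifier to place between '%' and 'u' when printing a size_t.
 * Assumption: _WIN32/_WIN64 are the compiler-predefined macros; the
 * message above tests WIN32, which not every compiler defines.
 */
#if defined(_WIN64)
#define SIZE_T_FORMAT "I64"   /* Microsoft CRT prefix for 64-bit integers */
#elif defined(_WIN32)
#define SIZE_T_FORMAT ""      /* size_t is a 32-bit unsigned int here */
#else
#define SIZE_T_FORMAT "z"     /* C99 length modifier for size_t */
#endif

/*
 * Hypothetical helper: writes the example sentence into buf and returns
 * the number of characters snprintf reports (excluding the final NUL).
 */
int format_thingies(char *buf, size_t len, size_t count)
{
    assert(buf != NULL && len > 0);
    /* Adjacent string literals are concatenated by the compiler. */
    return snprintf(buf, len, "There are %" SIZE_T_FORMAT "u thingies", count);
}
```

A caller would pass a buffer and a count, for example format_thingies(buf, sizeof buf, some_sizeof), and then print or log buf. The point of the sketch is that the platform test lives in one private header while every call site keeps a single format string.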
From christos at zoulas.com Tue Jul 20 22:37:19 2010 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 20 Jul 2010 15:37:19 -0400 Subject: Mingw compatibility patch In-Reply-To: <4C45F6A3.1040405@gmail.com> from LRN (Jul 20, 11:18pm) Message-ID: <20100720193719.9EE0E56425@rebar.astron.com> On Jul 20, 11:18pm, lrn1986 at gmail.com (LRN) wrote: -- Subject: Re: Mingw compatibility patch | How about a single: | #ifdef WIN32 | #define SIZE_T_FORMAT "" | #else | #define SIZE_T_FORMAT "z" | #endif | | And then instead of: | printf ("There are %zu thingies", some_sizeof); | do it this way: | printf ("There are %" SIZE_T_FORMAT "u thingies", some_sizeof); | | That should pre-process into: | printf ("There are %zu thingies", some_sizeof); | or | printf ("There are %u thingies", some_sizeof); | Depending on the platform. I think that would work, but might fail where size_t is "unsigned long". Isn't that the case for WIN64? | Glib has a special /usr/lib/glib-2.0/include/glibconfig.h header for | such platform-specific code, but i can't imagine libmagic doing | something like that, so that might just go into some header other than | magic.h It can go to a private header. | > | 4) How exactly are you going to keep it in a diff form? Will people be | > | able to learn about it and use it? Will it be applied automatically for | > | a particular platform? Because the goal is to allow people to compile | > | file(1) and libmagic out of the box, preferably without extra patching | > | that needs to be done manually. | > | > I would just add the diff file to the distribution in README.MIGWIN | Oh, well, let's hope people do read such things. 
I know i do not :) :-) | > | How could the version of the magic file that is located within the | > | source tree (and should be preferably compiled with the version of the | > | file program built from the same source tree, as you have said) be | > | connected with actual magic file installed in the system along with the | > | file program actually used to do things? Either i don't know something | > | about the shell/libtool/gcc (which is probable), or you misunderstood my | > | post-scriptum (which is also probable). | > | I think, actually, that on *nix this problem can be (and probably is) | > | dodged by chmod'ing newly created executables as non-executable (aren't | > | they made that way by default?), preventing shell from picking them up | > | when libtool runs a `file` command. | > | > The binary you just compiled is used to compile the magic file in the source | > tree. It needs to be the binary associated with the same version of the | > file. | Isn't it possible to invoke the just-compiled binary file(1) directly, | by relative/absolute path (not relying on shell to find the right | version), instead of invoking it by name only? And shove it somewhere | deep, so that shell won't find it by itself? I think it already does that. He was worried about making the build fail if there is a bug in the new binary, if I understand correctly. christos From j at hug.gs Tue Jul 20 23:10:34 2010 From: j at hug.gs (Dr. Jesus) Date: Tue, 20 Jul 2010 13:10:34 -0700 Subject: Mingw compatibility patch In-Reply-To: <20100720193719.9EE0E56425@rebar.astron.com> References: <4C45F6A3.1040405@gmail.com> <20100720193719.9EE0E56425@rebar.astron.com> Message-ID: On Tue, Jul 20, 2010 at 12:37 PM, Christos Zoulas wrote: > | That should pre-process into: > | printf ("There are %zu thingies", some_sizeof); > | or > | printf ("There are %u thingies", some_sizeof); > | Depending on the platform. > > I think that would work, but might fail where size_t is "unsigned long".
> Isn't that the case for WIN64? size_t is 8 bytes long on win64. The equivalent of "z" is "I" or "I64" for Microsoft's CRT. http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=VS.71).aspx For what it's worth, I think it might be easier to just build in a small printf module from e.g. the Linux kernel or uclibc or vstr or whatever. Trying to make format strings portable is yet another detour between memory and the screen, and most standards-compliant CRT versions are huge enough as it is. From christos at zoulas.com Wed Jul 21 00:20:14 2010 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 20 Jul 2010 17:20:14 -0400 Subject: Mingw compatibility patch In-Reply-To: from "Dr. Jesus" (Jul 20, 1:10pm) Message-ID: <20100720212014.E806756425@rebar.astron.com> On Jul 20, 1:10pm, j at hug.gs ("Dr. Jesus") wrote: -- Subject: Re: Mingw compatibility patch | On Tue, Jul 20, 2010 at 12:37 PM, Christos Zoulas wrote: | > | That should pre-process into: | > | printf ("There are %zu thingies", some_sizeof); | > | or | > | printf ("There are %u thingies", some_sizeof); | > | Depending on the platform. | > | > I think that would work, but might fail where size_t is "unsigned long". | > Isn't that the case for WIN64? | | size_t is 8 bytes long on win64. The equivalent of "z" is "I" or | "I64" for Microsoft's CRT. | | http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=VS.71).aspx | | For what it's worth, I think it might be easier to just build in a | small printf module from e.g. the Linux kernel or uclibc or vstr or | whatever. Trying to make format strings portable is yet another | detour between memory and the screen, and most standards-compliant CRT | versions are huge enough as it is. there is the ugly and portable solution to use "%lu", (unsigned long)size_t_var christos From j at hug.gs Wed Jul 21 00:27:33 2010 From: j at hug.gs (Dr. 
Jesus) Date: Tue, 20 Jul 2010 14:27:33 -0700 Subject: Mingw compatibility patch In-Reply-To: <20100720212014.E806756425@rebar.astron.com> References: <20100720212014.E806756425@rebar.astron.com> Message-ID: On Tue, Jul 20, 2010 at 2:20 PM, Christos Zoulas wrote: > On Jul 20, 1:10pm, j at hug.gs ("Dr. Jesus") wrote: > -- Subject: Re: Mingw compatibility patch > > | On Tue, Jul 20, 2010 at 12:37 PM, Christos Zoulas wrote: > | > | That should pre-process into: > | > | printf ("There are %zu thingies", some_sizeof); > | > | or > | > | printf ("There are %u thingies", some_sizeof); > | > | Depending on the platform. > | > > | > I think that would work, but might fail where size_t is "unsigned long". > | > Isn't that the case for WIN64? > | > | size_t is 8 bytes long on win64. The equivalent of "z" is "I" or > | "I64" for Microsoft's CRT. > | > | http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=VS.71).aspx > | > | For what it's worth, I think it might be easier to just build in a > | small printf module from e.g. the Linux kernel or uclibc or vstr or > | whatever. Trying to make format strings portable is yet another > | detour between memory and the screen, and most standards-compliant CRT > | versions are huge enough as it is. > > there is the ugly and portable solution to use "%lu", (unsigned long)size_t_var Sure, until you build on a platform where sizeof(size_t) != sizeof(unsigned long). All new macs are in this boat: $ file `which file` /usr/bin/file: Mach-O universal binary with 2 architectures /usr/bin/file (for architecture x86_64): Mach-O 64-bit executable x86_64 /usr/bin/file (for architecture i386): Mach-O executable i386 $ uname -a Darwin hurricane-2.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 i386 One could use %llu instead, but then you need macros to abstract the format string and the cast.
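The "macros to abstract the format string and the cast" mentioned above might look something like the following C sketch. It is only one possible reading of the suggestion, not code anyone in the thread posted: SIZE_FMT, SIZE_ARG and format_size are hypothetical names, the Windows branch assumes the Microsoft CRT "I64" prefix discussed earlier in the thread, and snprintf is assumed to be available on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Always widen the value to unsigned long long, then print it with a
 * conversion that matches that type on the current C runtime.
 */
#if defined(_WIN32)
#define SIZE_FMT "I64u"   /* assumption: older Microsoft CRTs lack "ll" */
#else
#define SIZE_FMT "llu"    /* C99 conversion for unsigned long long */
#endif
#define SIZE_ARG(x) ((unsigned long long)(x))

/*
 * Hypothetical helper: formats value as decimal text into buf and
 * returns the character count reported by snprintf.
 */
int format_size(char *buf, size_t len, size_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" SIZE_FMT, SIZE_ARG(value));
}
```

Because the cast and the conversion are defined side by side, they cannot drift apart the way a bare "%lu" and an (unsigned long) cast can when size_t is wider than unsigned long. The cost is that every call site has to remember to wrap its argument in SIZE_ARG().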
From lrn1986 at gmail.com Wed Jul 21 00:28:14 2010 From: lrn1986 at gmail.com (LRN) Date: Wed, 21 Jul 2010 01:28:14 +0400 Subject: Mingw compatibility patch In-Reply-To: <20100720193719.9EE0E56425@rebar.astron.com> References: <20100720193719.9EE0E56425@rebar.astron.com> Message-ID: <4C4614EE.4070907@gmail.com> On 20.07.2010 23:37, Christos Zoulas wrote: > On Jul 20, 11:18pm, lrn1986 at gmail.com (LRN) wrote: > -- Subject: Re: Mingw compatibility patch > > | How about a single: > | #ifdef WIN32 > | #define SIZE_T_FORMAT "" > | #else > | #define SIZE_T_FORMAT "z" > | #endif > | > | And then instead of: > | printf ("There are %zu thingies", some_sizeof); > | do it this way: > | printf ("There are %" SIZE_T_FORMAT "u thingies", some_sizeof); > | > | That should pre-process into: > | printf ("There are %zu thingies", some_sizeof); > | or > | printf ("There are %u thingies", some_sizeof); > | Depending on the platform. > > I think that would work, but might fail where size_t is "unsigned long". It is. However, this simple testcase: #include <stdio.h> int main (int argc, char **argv) { size_t s = (size_t) -1; printf ("There are %lu thingies", s); return 0; } produces a warning ( warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'size_t' ) when compiled with -Wall, and does not produce one if i change "%lu" to "%u". Both will produce the same output: There are 4294967295 thingies > Isn't that the case for WIN64? OK, MinGW defines #define __SIZE_TYPE__ long unsigned int while MinGW64 defines it as #ifdef _WIN64 #define __SIZE_TYPE__ long long unsigned int #else #define __SIZE_TYPE__ long unsigned int #endif When compiled with MinGW64, the above testcase compiles without a warning only when the "%I64u" format is used (can't verify that the output is correct, because there are some errors during the compilation that i can't fix and don't care to fix, because i'm not interested in using MinGW64 at the moment).
So unless there are other varieties, this can be done as: #ifdef WIN32 #ifdef _WIN64 #define SIZE_T_FORMAT "I64" #else #define SIZE_T_FORMAT "" #endif #else #define SIZE_T_FORMAT "z" #endif > | Glib has a special /usr/lib/glib-2.0/include/glibconfig.h header for > | such platform-specific code, but i can't imagine libmagic doing > | something like that, so that might just go into some header other than > | magic.h > > It can go to a private header. file.h should do, it is included in all source files that use "%z" format > |> | How could the version of the magic file that is located within the > |> | source tree (and should be preferably compiled with the version of the > |> | file program built from the same source tree, as you have said) be > |> | connected with actual magic file installed in the system along with the > |> | file program actually used to do things? Either i don't know something > |> | about the shell/libtool/gcc (which is probable), or you misunderstood my > |> | post-scriptum (which is also probable). > |> | I think, actually, that on *nix this problem can be (and probably is) > |> | dodged by chmod'ing newly created executables as non-executable (aren't > |> | they made that way by default?), preventing shell from picking them up > |> | when libtool runs a `file` command. > |> > |> The binary you just compiled is used to compile the magic file in the source > |> tree. It needs to be the binary associated with the same version of the > |> file. > | Isn't it possible to invoke the just-compiled binary file(1) directly, > | by relative/absolute path (not relying on shell to find the right > | version), instead of invoking it by name only? And shove it somewhere > | deep, so that shell won't find it by itself? > > I think it already does that. If it already does that, why do i get file(1) from the source tree called by libtool? ...Wait a second...How exactly does shell look for executables on *nix? 
I remember reading somewhere that the fact that MSys shell looks for executables first in '.' is a non-POSIX behaviour and is done to maintain compatibility with Win32. *Checks on Debian* Yes, that's it. I've created two myscript.sh - one in my current (home) directory, and another in /bin/ directory, each one will print its location upon execution. When i invoke 'myscript.sh', the shell reads /bin/myscript.sh first; i have to use './myscript.sh' to invoke myscript in current directory. I see why people insist on calling things in current directory with './' prefix, and why that never seemed to make a difference in Msys - because on MSys $PATH begins with '.:/usr/local/bin:/mingw/bin:/bin' by default. I think this is fixable on file(1) side (use different executable name, rename it to 'file' on installation). It is also possible to fix MSys by putting '.' into the end of the PATH, but that might have unforeseen consequences, because '.' is the first entry in the PATH by default (if i am not mistaken), and people tend to rely on defaults... From christos at zoulas.com Wed Jul 21 01:00:27 2010 From: christos at zoulas.com (Christos Zoulas) Date: Tue, 20 Jul 2010 18:00:27 -0400 Subject: Mingw compatibility patch In-Reply-To: from "Dr. Jesus" (Jul 20, 2:27pm) Message-ID: <20100720220028.043995653E@rebar.astron.com> On Jul 20, 2:27pm, j at hug.gs ("Dr. Jesus") wrote: -- Subject: Re: Mingw compatibility patch | > there is the ugly and portable solution to use "%lu", (unsigned long)size_t_var | | Sure, until you build on a platform where sizeof(size_t) != | sizeof(unsigned long).
All new macs are in this boat: | | $ file `which file` | /usr/bin/file: Mach-O universal binary with 2 architectures | /usr/bin/file (for architecture x86_64): Mach-O 64-bit executable x86_64 | /usr/bin/file (for architecture i386): Mach-O executable i386 | $ uname -a | Darwin hurricane-2.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr | 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 i386 | | One could use %llu instead, but then you need macros to abstract the | format string and the cast. Really MacOS is not LP64? christos From j at hug.gs Wed Jul 21 01:32:06 2010 From: j at hug.gs (Dr. Jesus) Date: Tue, 20 Jul 2010 15:32:06 -0700 Subject: Mingw compatibility patch In-Reply-To: <20100720220028.043995653E@rebar.astron.com> References: <20100720220028.043995653E@rebar.astron.com> Message-ID: On Tue, Jul 20, 2010 at 3:00 PM, Christos Zoulas wrote: > On Jul 20, 2:27pm, j at hug.gs ("Dr. Jesus") wrote: > -- Subject: Re: Mingw compatibility patch > > | > there is the ugly and portable solution to use "%lu", (unsigned long)size_t_var > | > | Sure, until you build on a platform where sizeof(size_t) != > | sizeof(unsigned long). All new macs are in this boat: > | > | $ file `which file` > | /usr/bin/file: Mach-O universal binary with 2 architectures > | /usr/bin/file (for architecture x86_64): Mach-O 64-bit executable x86_64 > | /usr/bin/file (for architecture i386): Mach-O executable i386 > | $ uname -a > | Darwin hurricane-2.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr > | 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 i386 > | > | One could use %llu instead, but then you need macros to abstract the > | format string and the cast. > > Really MacOS is not LP64?
When you put it like that it sounds sort of crazy, so I double-checked: $ cat test13.c #include <stdio.h> #include <stdlib.h> int main(void) { printf("%lu %lu\n", sizeof(size_t), sizeof(unsigned long)); return EXIT_SUCCESS; } $ ./test13 8 8 $ file test13 test13: Mach-O 64-bit executable x86_64 Whoops. I must have been thinking of Windows, where DWORDs are unsigned longs and have to be 32 bit or all kinds of things would break. From jkaluza at redhat.com Wed Jul 21 12:30:35 2010 From: jkaluza at redhat.com (Jan Kaluza) Date: Wed, 21 Jul 2010 11:30:35 +0200 Subject: Binary pattern vs Text pattern ideas Message-ID: <201007211130.35921.jkaluza@redhat.com> Hi, sorry if I'm reinventing the wheel with this email, but File is quite an old project and I wasn't able to find out whether somebody has already proposed the idea I want to discuss. Currently File distinguishes between binary magic patterns and text magic patterns. This causes some problems, because binary patterns are tried first and sometimes they are too general. For example, the following pattern: 0 string #!/usr/bin/env a >15 string >\0 %s script text executable ... is matched before: 0 search/1 #!\/usr/bin/env\ python Python script text executable In that example it's hard to fix that by using "search" instead of "string", because we need to get the name of the interpreter somehow. If I'm missing something here and it's doable, please correct me. The same (and in my opinion worse) situation is with search/xxxx/b patterns in sgml. This leads to some scripts being identified as HTML documents just because sgml uses binary search, which is preferred. I think there are three solutions for this situation: 1) Convert all binary patterns, which are currently used to match text files, into text patterns and use binary patterns only if it's really needed for binary formats (I don't think search/b is needed for example in "sgml"). This could fix a lot of bugs, but it would not fix the "string" pattern example I've pasted above.
2) Handle binary and text patterns together and give them priority according to "strength". I think this is a better solution. It's basically the same as treating text patterns as binary patterns. The problem here could be a performance degradation, because we will have to check more patterns for binary files, but for text files it could be faster, because we don't have to go through all binary patterns. I think File would match better with this change. However, it will still not fix some too general patterns. Imagine for example that you have a unified diff of an html file. You can write a very good pattern to recognize a diff, and because unified diff is a very distinctive format, it will almost always match only for a diff. The problem is that its strength is too low to beat: #0 search/4096/cwbt \