current draft of POSIX 1003.2B file

Daniel Quinlan quinlan at proton.pathname.com
Tue Oct 22 02:04:00 EDT 1996


POSIX 1003.2B is a set of changes that are being made to the base
1003.2 document that was published in 1992.  Here I have taken those
changes and applied them to the original standard.

There are two areas that readers of this list may want to focus on:

  1. Incompatibilities between "our" file and this specification.

  2. Problems in the standard itself.  It looks quite similar to the
     SVR4 version of file.  It looks even more similar to the Sun version.

I would like to collect any comments and relay them to the POSIX
standards group.  Since we can't expect another revision of POSIX.2
for a while, we should really push POSIX to get it right this time.

The first RATIONALE is a "rationale" of the changes.  The second is
the RATIONALE for the file specification.  I can detect no changes
since the draft I received a year ago.

==========================================================================

 BEGIN_RATIONALE

 Rationale:  The changes in this clause, except for those related to
 symbolic links, satisfy the following requirement from ISO/IEC
 9945-2:1993 Annex H.1:

    (12)  The file utility should allow user-specified algorithms for file
          type recognition, similar to those used in the historical
          /etc/magic file.

 END_RATIONALE

==========================================================================

 5.14  file - Determine file type

 5.14.1  Synopsis

 file  [-dhi] [-M file] [-m file] file ...

 5.14.2  Description

 The file utility shall perform a series of tests on each specified file
 in an attempt to classify it.

     (1)  If the file is not a regular file, its file type shall be
          identified.  The file types directory, FIFO, block special, and
          character special shall be identified as such.  Other
          implementation-defined file types may also be identified.

     (2)  If the file is a regular file, and

           (a)  The file is zero-length, it shall be identified as an
                empty file.

           (b)  The file is not zero-length, file shall examine an initial
                segment of the file and shall make a guess at identifying
                its contents or whether it is an executable binary file.
                (The answer is not guaranteed to be correct.)

 If file does not exist, cannot be read, or its file status could not be
 determined, the output shall indicate that the file was processed, but
 that its type could not be determined.

 If file is a symbolic link, by default the link shall be resolved and
 file shall test the type of file referenced by the symbolic link.
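The order of tests in 5.14.2 might be sketched as follows. This is only an illustration, not part of the draft: the function name classify and its follow_symlinks parameter are hypothetical, and the guess at the contents of a non-empty regular file is left out (the sketch reports "regular file", as the -i option would).

```python
import os
import stat


def classify(path, follow_symlinks=True):
    """Return a <type> string for path, in the order given by 5.14.2.

    Hypothetical helper; only the strings from Table 5-1 and
    5.14.6.1 come from the draft.
    """
    try:
        st = os.lstat(path)
    except OSError:
        return "cannot open"

    if stat.S_ISLNK(st.st_mode):
        target = os.readlink(path)
        # With -h, or when the link dangles, report the link itself.
        if not follow_symlinks or not os.path.exists(path):
            return "symbolic link to " + target
        st = os.stat(path)  # resolve the link and test what it refers to

    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode):
        if st.st_size == 0:
            return "empty"
        # Placeholder: a real utility would examine the contents here.
        return "regular file"
    return "unknown type"
```

Passing follow_symlinks=False corresponds to the -h option described in 5.14.3.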

 5.14.3  Options

 The file utility shall conform to the utility argument syntax
 guidelines described in 2.10.2.

 The following options shall be supported by the implementation:

    -d          Apply any default system tests to the file.

    -h          When a symbolic link is encountered, identify the file
                as a symbolic link.  If -h is not specified and file is
                a symbolic link that refers to a nonexistent file, file
                shall identify the file as a symbolic link, as if -h
                had been specified.

    -i          If a file is a regular file, do not attempt to classify
                the type of the file further, but identify the file as
                specified in 5.14.6.1, using a <type> string that
                contains the string regular file.

    -M file     Specify the name of a file containing tests that shall
                be applied to a file in order to classify it (see
                5.14.7).  No default system tests shall be applied.

    -m file     Specify the name of a file containing tests that shall
                be applied to a file in order to classify it (see
                5.14.7).

 If multiple instances of the -m, -d, or -M options are specified, the
 concatenation of the tests specified, in the order specified, shall be
 the set of tests that are applied.  If a -M option is specified, no
 tests other than those specified using the -d, -M, and -m options
 shall be applied to the file.  If neither the -d nor -M options are
 specified, any default system tests shall be applied after any tests
 specified using the -m option.
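One way to read the ordering rule above is sketched below. The function build_test_sequence and the DEFAULT marker are hypothetical names used only for illustration; options are given as (flag, argument) pairs in command-line order.

```python
DEFAULT = "<default system tests>"  # stand-in for the implementation's own tests


def build_test_sequence(options):
    """Return the ordered list of test sources for the given options.

    options is a list of (flag, argument) pairs in command-line order,
    for example [("-m", "my.magic"), ("-d", None)].
    """
    sequence = []
    for flag, arg in options:
        if flag in ("-m", "-M"):
            sequence.append(arg)      # user-supplied magic file
        elif flag == "-d":
            sequence.append(DEFAULT)  # default tests at this position
    flags = {flag for flag, _ in options}
    # Without -d or -M, the default tests follow any -m tests.
    if "-d" not in flags and "-M" not in flags:
        sequence.append(DEFAULT)
    return sequence
```

Under this reading, "-m a" yields the tests in a followed by the default tests, "-d -m a" reverses that order, and "-M a" yields the tests in a alone.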

 5.14.4  Operands

 The following operand shall be supported by the implementation:

    file        A pathname of a file to be tested.

 5.14.5  External Influences

 5.14.5.1  Standard Input

 None.

 5.14.5.2  Input Files

 The file can be any file type.

 5.14.5.3  Environment Variables

 The following environment variables shall affect the execution of file:

    LANG               This variable shall determine the locale to use for
                       the locale categories when both LC_ALL and the
                       corresponding environment variable (beginning with
                       LC_) do not specify a locale.  See 2.6.

    LC_ALL             This variable shall determine the locale to be used
                       to override any values for locale categories
                       specified by the settings of LANG or any
                       environment variables beginning with LC_.

    LC_CTYPE           This variable shall determine the interpretation of
                       sequences of bytes of text data as characters
                       (e.g., single- versus multibyte characters in
                       arguments and input files).

    LC_MESSAGES        This variable shall determine the language in which
                       messages should be written.

 5.14.5.4  Asynchronous Events

 Default.

 5.14.6  External Effects

 5.14.6.1  Standard Output

 In the POSIX Locale, the following format shall be used to identify each
 file operand specified:

       "%s: %s\n", <file>, <type>

 The values for <type> are unspecified, except that in the POSIX Locale,
 if file is identified as one of the types listed in Table 5-1, <type>
 shall contain (but is not limited to) the corresponding string.  Each
 space shown in the strings shall be exactly one <space> character.

                     Table 5-1  -  file Output Strings

 _________________________________________________________________________
|            If file is a             |  <type> shall contain the string |
|_____________________________________|__________________________________|
|  Directory                          |  directory                       |
|  FIFO                               |  fifo                            |
|  Block special                      |  block special                   |
|  Character special                  |  character special               |
|  Symbolic link                      |  symbolic link to                |
|  Executable binary                  |  executable                      |
|  Empty regular file                 |  empty                           |
|  ar archive library (see 6.1)       |  archive                         |
|  Extended cpio format (see Section  |  cpio archive                    |
|  10.1.2 of POSIX.1 {8})             |                                  |
|  Extended tar format (see Section   |  tar archive                     |
|  10.1.1 of POSIX.1 {8})             |                                  |
|  Shell script                       |  commands text                   |
|  C-language source                  |  c program text                  |
|  FORTRAN source                     |  fortran program text            |
|  Other text file                    |  text                            |
|_____________________________________|__________________________________|

 If file is identified as a symbolic link (see -h), the following
 alternative output format shall be used:

       "%s: %s %s\n", <file>, <type>, <contents of link>

 If the file named by the file operand does not exist or cannot be
 read, the string cannot open shall be included as part of the <type>
 field, but this shall not be considered an error that affects the exit
 status.  If the type of the file named by the file operand cannot be
 determined, the string unknown type shall be included as part of the
 <type> field, but this shall not be considered an error that affects
 the exit status.
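The two output formats can be illustrated with a short sketch. The function name format_line is hypothetical; the format strings are the ones given above.

```python
def format_line(name, type_string, link_contents=None):
    """Format one line of output as described in 5.14.6.1.

    link_contents is given only when the file was identified as a
    symbolic link, which selects the alternative three-field format.
    """
    if link_contents is not None:
        return "%s: %s %s\n" % (name, type_string, link_contents)
    return "%s: %s\n" % (name, type_string)
```

For example, format_line("lnk", "symbolic link to", "/tmp/x") gives "lnk: symbolic link to /tmp/x" followed by a newline.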

 5.14.6.2  Standard Error

 Used only for diagnostic messages.

 5.14.6.3  Output Files

 None.

 5.14.7  Extended Description

 A file specified as an option-argument to the -m or -M options shall
 contain one test per line, which shall be applied to the file.  If the
 test succeeds, the message field of the line shall be printed and no
 further tests shall be applied, with the exception that tests on
 immediately following lines beginning with a single > character shall
 be applied.
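The control flow described in the paragraph above might look like the following sketch. It assumes each line of the magic file has already been reduced to a (continuation, predicate, message) triple, where continuation is true for lines whose offset begins with >; that representation and the name apply_tests are illustrative, not part of the draft.

```python
def apply_tests(tests, data):
    """Apply tests in file order and return the messages that would print.

    tests is a list of (continuation, predicate, message) triples;
    data is the initial segment of the file as bytes.
    """
    messages = []
    i = 0
    while i < len(tests):
        continuation, predicate, message = tests[i]
        if not continuation and predicate(data):
            messages.append(message)
            i += 1
            # Only the immediately following '>' lines are still applied.
            while i < len(tests) and tests[i][0]:
                if tests[i][1](data):
                    messages.append(tests[i][2])
                i += 1
            return messages  # no further tests after a success
        i += 1  # failed test, or a '>' line whose parent did not succeed
    return messages
```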

 Each line shall be composed of the following four <blank>-separated
 fields:

    offset An unsigned number (optionally preceded by a single >
           character) specifying the offset, in bytes, of the value in
           the file that is to be compared against the value field of
           the line.  If the file is shorter than the specified offset,
           the test shall fail.

           If the offset begins with the character >, the test
           contained in the line shall not be applied to the file
           unless the test on the last line for which the offset did
           not begin with a > was successful.  By default, the offset
           shall be interpreted as an unsigned decimal number.  With a
           leading 0x or 0X, the offset shall be interpreted as a
           hexadecimal number; otherwise, with a leading 0, the offset
           shall be interpreted as an octal number.

    type   The type of the value in the file to be tested.  The type
           shall consist of the type specification characters c, d, f,
           s, and u, specifying character, signed decimal, floating
           point, string, and unsigned decimal, respectively.

           The type string shall be interpreted as the bytes from the
           file starting at the specified offset and including the same
           number of bytes specified by the value field.  If
           insufficient bytes remain in the file past the offset to
           match the value field, the test shall fail.

           The type specification characters d, f, and u can be
           followed by an optional unsigned decimal integer that
           specifies the number of bytes represented by the type.  The
           type specification character f can be followed by an
           optional F, D, or L, indicating that the value is of type
           float, double, or long double, respectively.  The type
           specification characters d and u can be followed by an
           optional C, S, I, or L, indicating that the value is of type
           char, short, int, or long, respectively.

           The default number of bytes represented by the type
           specifiers d, f, and u shall correspond to their respective
           C-language types as follows.  If the system claims
           conformance to the C-Language Development Utilities Option,
           those specifiers shall correspond to the default sizes used
           in the c89 utility.  Otherwise, the default sizes shall be
           implementation defined.

           For the type specifier characters d and u, the default
           number of bytes shall correspond to the size of the basic
           integral data type of the implementation.  For these
           specifier characters, the implementation shall support
           values of the optional number of bytes to be converted
           corresponding to the number of bytes in the C-language types
           char, short, int, or long. These numbers can also be
           specified by an application as the characters C, S, I, and
           L, respectively.  The byte order used when interpreting
           numeric values is implementation defined, but shall
           correspond to the order in which a constant of the
           corresponding type is stored in memory on the system.

           For the type specifier f, the default number of bytes shall
           correspond to the number of bytes in the basic double
           precision floating-point data type of the underlying
           implementation.  The implementation shall support values of
           the optional number of bytes to be converted corresponding
           to the number of bytes in the C-language types float,
           double, and long double. These numbers can also be specified
           by an application as the characters F, D, and L,
           respectively.

           All type specifiers, except for s, can be followed by a mask
           specifier of the form &number. The mask value shall be ANDed
           with the value before the comparison with the value from the
           file is made.  By default, the mask shall be interpreted as
           an unsigned decimal number.  With a leading 0x or 0X, the
           mask shall be interpreted as an unsigned hexadecimal number;
           otherwise, with a leading 0, the mask shall be interpreted
           as an unsigned octal number.

           The strings byte, short, long, and string shall also be
           supported as type fields, being interpreted as dC, dS, dL,
           and s, respectively.

    value  The value to be compared with the value from the file.

           Any value that contains a character that is not a digit,
           other than a leading sign (+ or -) or a leading 0x or 0X,
           shall be interpreted as a string.  The test shall succeed
           only when a string value exactly matches the bytes from the
           file.

           If the value is a string, it can contain the following
           sequences:

              \character
                 The backslash-escape sequences in Table 2-16 (see
                 2.12).  The results of using any other character,
                 other than an octal digit, following the backslash are
                 unspecified.

              \octal
                 Octal sequences that can be used to represent
                 characters with specific coded values.  An octal
                 sequence shall consist of a backslash followed by the
                 longest sequence of one, two, or three octal-digit
                 characters (01234567).  If the size of a byte on the
                 system is greater than 9 bits, the valid escape sequence
                 used to represent a byte is implementation defined.

           By default, any value that is not a string shall be
           interpreted as a signed decimal number.  Any such value,
           with a leading 0x or 0X, shall be interpreted as an unsigned
           hexadecimal number; otherwise, with a leading zero, the
           value shall be interpreted as an unsigned octal number.

           If the value is not a string, it can be preceded by a
           character indicating the comparison to be performed.
           Permissible characters and the comparisons they specify are
           as follows:

              =  The test shall succeed if the value from the file
                 equals the value field.

              <  The test shall succeed if the value from the file is
                 less than the value field.

              >  The test shall succeed if the value from the file is
                 greater than the value field.

              &  The test shall succeed if all of the bits in the value
                 field are set in the value from the file.

              ^  The test shall succeed if at least one of the bits in
                 the value field is not set in the value from the file.

              x  The test shall succeed if there is any value in the
                 file.

    message
           The message to be printed if the test succeeds.  The message
           shall be interpreted using the notation for the printf
           formatting specification; see 4.50.7.  If the value field
           was a string, then the value from the file shall be the
           argument for the printf formatting specification; otherwise,
           the value from the file shall be the argument.
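The number-base rules for the offset and value fields, and the comparison characters, can be sketched as below. The function names are hypothetical. The sketch assumes the caller has already read the value from the file (and applied any mask), so the x comparison simply succeeds.

```python
def parse_number(text):
    """Decimal by default; hexadecimal with 0x or 0X; octal with a leading 0."""
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if text[:2] in ("0x", "0X"):
        return sign * int(text[2:], 16)
    if len(text) > 1 and text[0] == "0":
        return sign * int(text, 8)
    return sign * int(text, 10)


def parse_offset(field):
    """Split an offset field into (is_continuation, byte_offset)."""
    continuation = field.startswith(">")
    return continuation, parse_number(field[1:] if continuation else field)


def compare(file_value, value_field):
    """Apply the comparison named by the first character of a numeric value field."""
    if value_field == "x":
        return True  # any value present in the file satisfies the test
    op = "="
    if value_field[0] in "=<>&^":
        op, value_field = value_field[0], value_field[1:]
    expected = parse_number(value_field)
    if op == "=":
        return file_value == expected
    if op == "<":
        return file_value < expected
    if op == ">":
        return file_value > expected
    if op == "&":
        return file_value & expected == expected  # all bits set
    return file_value & expected != expected      # '^': some bit not set
```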



 5.14.8  Exit Status

 The file utility shall exit with one of the following values:

     0    Successful completion.

    >0    An error occurred.

 5.14.9  Consequences of Errors

 Default.

==========================================================================

 Editor's Note:  The rationale in E.5.14 (IEEE Std 1003.2-1992 pages
 987-88, lines 9703-49) will be replaced by the following:

 BEGIN_RATIONALE

 file Rationale.  (This subclause is not a part of P1003.2)

 Historical systems have used a ``magic file'' named /etc/magic to help
 identify file types.  Because it is generally useful for users and
 scripts to be able to identify special file types, the -m flag and a
 portable format for user-created magic files have been specified.  No
 requirement is made that an implementation of file use this method of
 identifying files, only that users be permitted to add their own
 classifying tests.

 In addition, three options have been added to historical practice.  The
 -d flag has been added to permit users to cause their tests to follow any
 default system tests.  The -i flag has been added to permit users to test
 portably for regular files in shell scripts.  The -M flag has been added
 to permit users to ignore any default system tests.

 The historical -c option was omitted as not particularly useful to users
 or portable shell scripts.  In addition, a reasonable implementation of
 the file utility would report any errors found each time the magic file
 is read.

 The historical format of the magic file was the same as that specified by
 the rationale in the previous version of this standard for the offset,
 value, and message fields; however, it used less precise type fields than
 the format specified by the current normative text.  The new type field
 values are a superset of the historical ones.

 The following is an example magic file:

       0  short       070707       cpio archive
       0  short       0143561      byte-swapped cpio archive
       0  string      070707       ASCII cpio archive
       0  long        0177555      very old archive
       0  short       0177545      old archive
       0  short       017437       old packed data
       0  string      \037\036     packed data
       0  string      \377\037     compacted data
       0  string      \037\235     compressed data
       >2 byte&0x80   >0           block compressed
       >2 byte&0x1f   x            %d bits
       0  string      \032\001     Compiled Terminfo Entry
       0  short       0433         Curses screen image
       0  short       0434         Curses screen image
       0  string      <ar>         System V Release 1 archive
       0  string      !<arch>\n__.SYMDEF   archive random library
       0  string      !<arch>      archive
       0  string      ARF_BEGARF   PHIGS clear text archive
       0  long        0x137A2950   scalable OpenFont binary
       0  long        0x137A2951   encrypted scalable OpenFont binary
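As a worked illustration, the three "compressed data" entries from the example above can be applied to a few bytes of input. The sketch below makes several assumptions that are not in the draft: the entries are written out by hand as tuples rather than parsed, the mask is applied to the byte read from the file (as in historical practice), messages are joined with a single space, and only the > and x comparisons used by these entries are handled. The name describe is hypothetical.

```python
# Entries copied from the example magic file, already split into
# (offset, type, value, message) fields.
ENTRIES = [
    ("0",  "string",    b"\037\235", "compressed data"),
    (">2", "byte&0x80", ">0",        "block compressed"),
    (">2", "byte&0x1f", "x",         "%d bits"),
]


def describe(data):
    """Apply ENTRIES to data (bytes) and return the joined messages."""
    parts = []
    parent_matched = False
    for offset, type_field, value, message in ENTRIES:
        continuation = offset.startswith(">")
        if continuation and not parent_matched:
            continue  # '>' lines need a successful preceding test
        if not continuation and parent_matched:
            break     # a top-level test already succeeded
        pos = int(offset.lstrip(">"))
        arg = None
        if type_field == "string":
            ok = data[pos:pos + len(value)] == value
        elif pos >= len(data):
            ok = False  # file shorter than the offset: the test fails
        else:
            mask = int(type_field.split("&")[1], 0)
            arg = data[pos] & mask
            ok = True if value == "x" else arg > int(value[1:])
        if not continuation:
            parent_matched = ok
        if ok:
            parts.append(message % arg if "%" in message else message)
    return " ".join(parts)
```

With input beginning with the bytes 037, 0235, 0220 (octal), the first entry matches, the masked byte 0200 is greater than zero, and the low five bits give 16, so the sketch returns "compressed data block compressed 16 bits".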

 END_RATIONALE


