5 Invoking `gperf`

There are many options to gperf. They were added to make the program more convenient for use with real applications. “On-line” help is readily available via the ‘--help’ option. Here is the complete list of options.

5.1 Specifying the Location of the Output File

‘--output-file=file’: Allows you to specify the name of the file to which the output is written.

The results are written to standard output if no output file is specified or if it is ‘-’.

5.2 Options that affect Interpretation of the Input File

These options are also available as declarations in the input file (see section 4.1.1.2 Gperf Declarations).

‘-e keyword-delimiter-list’
‘--delimiters=keyword-delimiter-list’: Allows you to provide a string containing delimiters used to separate keywords from their attributes. The default is ",". This option is essential if you want to use keywords that have embedded commas or newlines. One useful trick is to use -e'TAB', where TAB is the literal tab character.
‘-t’
‘--struct-type’: Allows you to include a struct type declaration for generated code. Any text before a pair of consecutive ‘%%’ is considered part of the type declaration. Keywords and additional fields may follow this, one group of fields per line. A set of examples for generating perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2, Modula 3 and JavaScript reserved words is distributed with this release.
‘--ignore-case’: Consider upper and lower case ASCII characters as equivalent. The string comparison will use a case-insensitive character comparison. Note that locale-dependent case mappings are ignored. This option is therefore not suitable if a properly internationalized or locale-aware case mapping should be used. (For example, in a Turkish locale, the upper case equivalent of the lowercase ASCII letter ‘i’ is the non-ASCII character ‘capital i with dot above’.) For this case, it is better to apply an uppercase or lowercase conversion on the string before passing it to the gperf generated function.
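
The last suggestion in the ‘--ignore-case’ entry, converting the string yourself before the lookup, might be sketched as follows. Both function names are made up for this illustration: lookup_keyword is a hypothetical stand-in for a gperf generated lookup function (a plain linear search, so that the example is self-contained), and lookup_normalized is the wrapper that does the conversion. The sketch uses tolower from <ctype.h>, which follows the current C locale and handles single bytes only; a program that needs a Unicode-aware mapping would substitute its own conversion at that point.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a gperf generated lookup function: a linear
   search over lowercase keywords. Real generated code hashes instead. */
const char *lookup_keyword(const char *str, size_t len)
{
    static const char *const keywords[] = { "begin", "end", "if" };
    size_t i;

    assert(str != NULL);
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && memcmp(keywords[i], str, len) == 0)
            return keywords[i];
    return NULL;
}

/* Lowercase a copy of the input with tolower(), then look the copy up.
   Returns NULL if the input does not fit the fixed-size buffer. */
const char *lookup_normalized(const char *str, size_t len)
{
    char buf[64];
    size_t i;

    assert(str != NULL);
    if (len >= sizeof buf)
        return NULL;
    for (i = 0; i < len; i++)
        buf[i] = (char)tolower((unsigned char)str[i]);
    buf[len] = '\0';
    return lookup_keyword(buf, len);
}
```

In this sketch, lookup_normalized("BEGIN", 5) finds the entry for ‘begin’, whereas passing the same string directly to lookup_keyword does not match.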

5.3 Options to specify the Language for the Output Code

These options are also available as declarations in the input file (see section 4.1.1.2 Gperf Declarations).

‘-L generated-language-name’

‘--language=generated-language-name’

Instructs gperf to generate code in the language specified by the option's argument. Languages handled are currently:

‘KR-C’: Old-style K&R C. This language is understood by old-style C compilers and ANSI C compilers, but ANSI C compilers may flag warnings (or even errors) because of lacking ‘const’.
‘C’: Common C. This language is understood by ANSI C compilers, and also by old-style C compilers, provided that you #define const to empty for compilers which don't know about this keyword.
‘ANSI-C’: ANSI C. This language is understood by ANSI C compilers and C++ compilers.
‘C++’: C++. This language is understood by C++ compilers.

The default is ANSI-C.

‘-a’

This option is supported for compatibility with previous releases of gperf. It does not do anything.

‘-g’

This option is supported for compatibility with previous releases of gperf. It does not do anything.

5.4 Options for fine tuning Details in the Output Code

Most of these options are also available as declarations in the input file (see section 4.1.1.2 Gperf Declarations).

‘-K slot-name’
‘--slot-name=slot-name’: This option is only useful when option ‘-t’ (or, equivalently, the ‘%struct-type’ declaration) has been given. By default, the program assumes the structure component identifier for the keyword is ‘name’. This option allows an arbitrary choice of identifier for this component, although it still must occur as the first field in your supplied struct.
‘-F initializers’
‘--initializer-suffix=initializers’: This option is only useful when option ‘-t’ (or, equivalently, the ‘%struct-type’ declaration) has been given. It lets you specify initializers for the structure members following slot-name in empty hash table entries. The list of initializers should start with a comma. By default, the emitted code will zero-initialize structure members following slot-name.
‘-H hash-function-name’
‘--hash-function-name=hash-function-name’: Allows you to specify the name for the generated hash function. Default name is ‘hash’. This option permits the use of two hash tables in the same file.
‘-N lookup-function-name’
‘--lookup-function-name=lookup-function-name’: Allows you to specify the name for the generated lookup function. Default name is ‘in_word_set’. This option permits multiple generated hash functions to be used in the same application.
‘-Z class-name’
‘--class-name=class-name’: This option is only useful when option ‘-L C++’ (or, equivalently, the ‘%language=C++’ declaration) has been given. It allows you to specify the name of the generated C++ class. Default name is Perfect_Hash.
‘-7’
‘--seven-bit’: This option specifies that all strings that will be passed as arguments to the generated hash function and the generated lookup function will solely consist of 7-bit ASCII characters (bytes in the range 0..127). (Note that the ANSI C functions isalnum and isgraph do not guarantee that a byte is in this range. Only an explicit test like ‘c >= 'A' && c <= 'Z'’ guarantees this.) This was the default in versions of gperf earlier than 2.7; now the default is to support 8-bit and multibyte characters.
‘-l’
‘--compare-lengths’: Compare keyword lengths before trying a string comparison. This option is mandatory for binary comparisons (see section 4.3 Use of NUL bytes). It also might cut down on the number of string comparisons made during the lookup, since keywords with different lengths are never compared via strcmp. However, using ‘-l’ might greatly increase the size of the generated C code if the lookup table range is large (which implies that the switch option ‘-S’ or ‘%switch’ is not enabled), since the length table contains as many elements as there are entries in the lookup table.
‘-c’
‘--compare-strncmp’: Generates C code that uses the strncmp function to perform string comparisons. The default action is to use strcmp.
‘-C’
‘--readonly-tables’: Makes the contents of all generated lookup tables constant, i.e., “readonly”. Many compilers can generate more efficient code for this by putting the tables in readonly memory.
‘-E’
‘--enum’: Define constant values using an enum local to the lookup function rather than with #defines. This also means that different lookup functions can reside in the same file. Thanks to James Clark <jjc@ai.mit.edu>.
‘-I’
‘--includes’: Include the necessary system include file, <string.h>, at the beginning of the code. By default, this is not done; users must include this header file themselves to allow compilation of the code.
‘-G’
‘--global-table’: Generate the static table of keywords as a static global variable, rather than hiding it inside of the lookup function (which is the default behavior).
‘-P’
‘--pic’: Optimize the generated table for inclusion in shared libraries. This reduces the startup time of programs using a shared library containing the generated code. If the option ‘-t’ (or, equivalently, the ‘%struct-type’ declaration) is also given, the first field of the user-defined struct must be of type ‘int’, not ‘char *’, because it will contain offsets into the string pool instead of actual strings. To convert such an offset to a string, you can use the expression ‘stringpool + o’, where o is the offset. The string pool name can be changed through the option ‘--string-pool-name’.
‘-Q string-pool-name’
‘--string-pool-name=string-pool-name’: Allows you to specify the name of the generated string pool created by option ‘-P’. The default name is ‘stringpool’. This option permits the use of two hash tables in the same file, with ‘-P’ and even when the option ‘-G’ (or, equivalently, the ‘%global-table’ declaration) is given.
‘--null-strings’: Use NULL strings instead of empty strings for empty keyword table entries. This reduces the startup time of programs using a shared library containing the generated code (but not as much as option ‘-P’), at the expense of one more test-and-branch instruction at run time.
‘--constants-prefix=prefix’: Allows you to specify a prefix for the constants TOTAL_KEYWORDS, MIN_WORD_LENGTH, MAX_WORD_LENGTH, and so on. This option permits the use of two hash tables in the same file, even when the option ‘-E’ (or, equivalently, the ‘%enum’ declaration) is not given or the option ‘-G’ (or, equivalently, the ‘%global-table’ declaration) is given.
‘-W hash-table-array-name’
‘--word-array-name=hash-table-array-name’: Allows you to specify the name for the generated array containing the hash table. Default name is ‘wordlist’. This option permits the use of two hash tables in the same file, even when the option ‘-G’ (or, equivalently, the ‘%global-table’ declaration) is given.
‘--length-table-name=length-table-array-name’: Allows you to specify the name for the generated array containing the length table. Default name is ‘lengthtable’. This option permits the use of two length tables in the same file, even when the option ‘-G’ (or, equivalently, the ‘%global-table’ declaration) is given.
‘-S total-switch-statements’
‘--switch=total-switch-statements’: Causes the generated C code to use a switch statement scheme, rather than an array lookup table. This can lead to a reduction in both time and space requirements for some input files. The argument to this option determines how many switch statements are generated. A value of 1 generates 1 switch containing all the elements, a value of 2 generates 2 switches, each containing half the elements, etc. This is useful since many C compilers cannot correctly generate code for large switch statements. This option was inspired in part by Keith Bostic's original C program.
‘-T’
‘--omit-struct-type’: Prevents the transfer of the type declaration to the output file. Use this option if the type is already defined elsewhere.
‘-p’: This option is supported for compatibility with previous releases of gperf. It does not do anything.
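
To illustrate the ‘stringpool + o’ expression mentioned under option ‘-P’ above, here is a hand-written imitation of that layout. It is not actual gperf output: the pool contents, the struct command type, the commands table and the command_name helper are all invented for this sketch. The point it shows is that the table stores integer offsets rather than pointers, and that a string is recovered by adding the offset to the pool.

```c
#include <assert.h>
#include <stddef.h>

/* All keyword strings stored back to back in one array, each followed by
   its terminating NUL. Hand-written here; gperf emits its own pool. */
static const char stringpool[] = "open\0close\0read";

/* User-defined struct in the style needed with -P and -t: the first field
   is an int offset into the pool, not a char pointer. */
struct command {
    int name;   /* offset of the keyword within stringpool */
    int code;   /* example attribute, made up for this sketch */
};

static const struct command commands[] = {
    { 0, 1 },    /* "open"  starts at offset 0  */
    { 5, 2 },    /* "close" starts at offset 5  */
    { 11, 3 }    /* "read"  starts at offset 11 */
};

/* Convert the stored offset back into a string: stringpool + o. */
const char *command_name(const struct command *c)
{
    assert(c != NULL);
    return stringpool + c->name;
}
```

Here command_name(&commands[1]) yields the string ‘close’. Because the table itself contains no pointers, nothing in it depends on the address at which the library is loaded.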

5.5 Options for changing the Algorithms employed by `gperf`

‘-k selected-byte-positions’
‘--key-positions=selected-byte-positions’: Allows selection of the byte positions used in the keywords' hash function. The allowable choices range from 1 to 255, inclusive. The positions are separated by commas, e.g., ‘-k 9,4,13,14’; ranges may be used, e.g., ‘-k 2-7’; and positions may occur in any order. Furthermore, the wildcard '*' causes the generated hash function to consider all byte positions in each keyword, whereas '$' instructs the hash function to use the “final byte” of a keyword (this is the only way to use a byte position greater than 255, incidentally). For instance, the option ‘-k 1,2,4,6-10,'$'’ generates a hash function that considers positions 1,2,4,6,7,8,9,10, plus the last byte in each keyword (which may be at a different position for each keyword, obviously). Keywords with length less than the indicated byte positions work properly, since selected byte positions exceeding the keyword length are simply not referenced in the hash function. This option is not normally needed since version 2.8 of gperf; the default byte positions are computed depending on the keyword set, through a search that minimizes the number of byte positions.
‘-D’
‘--duplicates’: Handle keywords whose selected byte sets hash to duplicate values. Duplicate hash values can occur if a set of keywords has the same names, but possesses different attributes, or if the selected byte positions are not well chosen. With the -D option gperf treats all these keywords as part of an equivalence class and generates a perfect hash function with multiple comparisons for duplicate keywords. It is up to you to completely disambiguate the keywords by modifying the generated C code. However, gperf helps you out by organizing the output. Using this option usually means that the generated hash function is no longer perfect. On the other hand, it permits gperf to work on keyword sets that it otherwise could not handle.
‘-m iterations’
‘--multiple-iterations=iterations’: Perform multiple choices of the ‘-i’ and ‘-j’ values, and choose the best results. This increases the running time by a factor of iterations but does a good job minimizing the generated table size.
‘-i initial-value’
‘--initial-asso=initial-value’: Provides an initial value for the associated values array. Default is 0. Increasing the initial value helps inflate the final table size, possibly leading to more time-efficient keyword lookups. Note that this option is not particularly useful when ‘-S’ (or, equivalently, ‘%switch’) is used. Also, ‘-i’ is overridden when the ‘-r’ option is used.
‘-j jump-value’
‘--jump=jump-value’: Affects the “jump value”, i.e., how far to advance the associated byte value upon collisions. Jump-value is rounded up to an odd number; the default is 5. If the jump-value is 0, gperf jumps by random amounts.
‘-n’
‘--no-strlen’: Instructs the generator not to include the length of a keyword when computing its hash value. This may save a few assembly instructions in the generated lookup table.
‘-r’
‘--random’: Utilizes randomness to initialize the associated values table. This frequently generates solutions faster than using deterministic initialization (which starts all associated values at 0). Furthermore, using the randomization option generally increases the size of the table.
‘-s size-multiple’
‘--size-multiple=size-multiple’: Affects the size of the generated hash table. The numeric argument for this option indicates “how many times larger or smaller” the maximum associated value range should be, in relation to the number of keywords. It can be written as an integer, a floating-point number or a fraction. For example, a value of 3 means “allow the maximum associated value to be about 3 times larger than the number of input keywords”. Conversely, a value of 1/3 means “allow the maximum associated value to be about 3 times smaller than the number of input keywords”. Values smaller than 1 are useful for limiting the overall size of the generated hash table, though the option ‘-m’ is better suited to this purpose. If the “generate switch” option ‘-S’ (or, equivalently, ‘%switch’) is not enabled, the maximum associated value influences the static array table size, and a larger table should decrease the time required for an unsuccessful search, at the expense of extra table space. The default value is 1, so the default maximum associated value is about the same size as the number of keywords (for efficiency, the maximum associated value is always rounded up to a power of 2). The actual table size may vary somewhat, since this technique is essentially a heuristic.
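
The way selected byte positions (‘-k’) and the keyword length (‘-n’) enter the hash value can be pictured with the following sketch. It assumes that the hash is the keyword length plus the sum of the associated values of the selected bytes, which is a simplification for illustration and not a copy of the code gperf emits. The names sketch_hash, asso_value, demo_positions and LAST_BYTE are hypothetical, and the associated values used here are arbitrary, whereas gperf searches for values that avoid collisions for the given keyword set.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary associated values for illustration: the low 4 bits of the byte.
   gperf instead computes a table tuned to the keyword set. */
static unsigned int asso_value(unsigned char c)
{
    return c & 0x0Fu;
}

/* Sentinel standing for '$' (the final byte). Real positions are 1..255,
   so 0 is free to use for this purpose in the sketch. */
#define LAST_BYTE 0u

/* Positions corresponding to the option -k 1,3,'$'. */
static const unsigned int demo_positions[] = { 1u, 3u, LAST_BYTE };

/* Sum the associated values of the selected bytes. include_length == 0
   imitates -n, which leaves the keyword length out of the hash value. */
unsigned int sketch_hash(const char *str, size_t len,
                         const unsigned int *positions, size_t npositions,
                         int include_length)
{
    unsigned int h = include_length ? (unsigned int)len : 0u;
    size_t i;

    assert(str != NULL && positions != NULL);
    for (i = 0; i < npositions; i++) {
        unsigned int pos = positions[i];
        if (pos == LAST_BYTE) {
            if (len > 0)
                h += asso_value((unsigned char)str[len - 1]);
        } else if (pos <= len) {
            h += asso_value((unsigned char)str[pos - 1]);
        }
        /* Positions beyond the keyword length are simply not referenced. */
    }
    return h;
}
```

For the two-byte keyword ‘if’, position 3 is skipped because it exceeds the keyword length, so only the first byte, the final byte and (unless the length is excluded) the length contribute to the result.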

5.6 Informative Output

‘-h’
‘--help’: Prints a short summary on the meaning of each program option. Aborts further program execution.
‘-v’
‘--version’: Prints out the current version number.
‘-d’
‘--debug’: Enables the debugging option. This produces verbose diagnostics to “standard error” when gperf is executing. It is useful both for maintaining the program and for determining whether a given set of options is actually speeding up the search for a solution. Some useful information is dumped at the end of the program when the ‘-d’ option is enabled.