A Short Course in Hungarian Notation

In general, Hungarian notation names a variable with a lower-case prefix to identify the class or usage of the variable. There is no such thing as standard HN, so the point here is to create a consistent notation system. In this class, we'll use the following standards.

      c: signed character
     uc: unsigned character
      i: integer
     ui: unsigned integer
     si: short integer
     li: long integer
      n: an integer number where the actual size is irrelevant
      f: float
      d: double
      s: string of characters 
     sz: string of characters, terminated by a null character
      b: an integer or character being used as a boolean value
     by: single byte
     ct: an integer being used as a counter or tally
      p: pointer to a structure or general void pointer
    pfs: file stream pointer
    pfn: pointer to a function
     px: pointer to a variable of class x, e.g. pi, pf, pli 



These prefixes are combined with an identifying name where each significant part begins with a capital letter:
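
As a rough illustration of how a prefix and a capitalized name fit together, here is a small C sketch. The variable names and the function CountVowels are made up for this example; they are assumptions, not part of the course standard.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: tallies the vowels in a null-terminated string.
   Each variable carries a prefix from the list above. */
int CountVowels(const char *szWord)
{
    int ctVowels = 0;        /* ct: integer used as a counter or tally */
    const char *pcLetter;    /* pc: pointer to a character             */

    assert(szWord != NULL);
    for (pcLetter = szWord; *pcLetter != '\0'; pcLetter++) {
        /* b: integer used as a boolean value */
        int bIsVowel = (strchr("aeiouAEIOU", *pcLetter) != NULL);
        if (bIsVowel)
            ctVowels++;
    }
    return ctVowels;
}
```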


Function names (other than main) also follow this pattern of upper/lower case:
As the semester goes along, we may add more standard notations. If you find you need a variable that does not fit in these classes, create a new prefix and use it consistently.

Greg's guide to Hungarian Notation

Hungarian Notation

First, a little disclaimer:

Please note that this guide is based on MY understanding of Hungarian notation, and may not be completely accurate. Yes, I used to work at Microsoft, but I don't have the guide we used there anymore. This document has been done almost entirely from memory and is based on how I've been writing code for the past few years, so it's probably a bastardized version of the form of Hungarian notation used by Microsoft. There's a very brief description of Hungarian notation in Charles Petzold's "Programming Windows", which I consulted while writing this guide. My intent is to go into much more detail than he does, though.

If you DO know Hungarian, and notice anything wrong with this guide, please send me mail. Thanks.

What is Hungarian notation?

Hungarian Notation is a naming convention that (in theory) allows the programmer to determine the type and use of an identifier (variable, function, constant, etc.). It is frequently used in Windows programming, so a quick guide (such as this one) may be useful if you're working with Windows code but don't know the naming scheme. Note that there is really no such thing as "standard" Hungarian notation (at least as far as I've been able to determine). The basic ideas remain the same, however, and the key is to aim for CONSISTENCY.

Variables

Variable names are probably the most common kind of identifier. A variable name in Hungarian notation consists of three parts: the prefix (or constructor), the base type (or tag), and the qualifier. Not all of these elements will be present in all variable names -- the only part that is really needed is the base type.

Base Types (Tags)

The Tag is NOT necessarily one of the types directly provided by the programming language; it may be application-defined (for example, a type dbr might represent a database record structure). Tags are short (usually two or three letters) descriptive reminders about the type of value the variable stores. A tag will usually only be useful to someone who knows the application and the basic types it uses; for example, a tag co could just as easily refer to a coordinate or a color. Within a given application, however, co would always have one specific meaning -- every co would refer to the same type of object, and all references to that type of object would use the tag co.

Common Tags

There are many basic types that turn up in almost any application. The following tags are commonly used for them.

Tag   Description
f     Boolean flag. The qualifier should be used to describe the condition that causes the flag to be set (for example, fError might be used to indicate a variable that is set when an error condition exists, and clear when there is no error). The actual data representation may be a byte, a word, or even a single bit in a bitfield.
ch    A single-byte character.
w     A machine word (16 bits on Win3.1 X86 machines). This is a somewhat ambiguous tag, and it is often better to use a more specific name. (For example, a word that represents a count of characters is better referred to as a cch than as a w. See the prefixes section for an explanation of the cch notation.)
b     A byte (typically 8 bits). See the warnings for w.
l     A long integer (typically 32 bits). See the warnings for w.
u     An unsigned value. Usually more accurately used as a prefix with one of the integer types described above. For example, a uw is an unsigned word.
r     A single-precision real number (float).
d     A double-precision real number (double).
bit   A single bit. Typically this is used to specify bits within other types; it is usually more appropriate to use f.
v     A void. (Rather C-specific, meaning the type is not specified.) This type will probably never be used without the p prefix. This would usually be used for generic subroutines (such as alloc and free) that work with pointers but don't need to refer to the specific type of data referenced by those pointers.
st    A Pascal-type string. That is, the first byte contains the length of the string, and the remainder contains the actual characters in the string.
sz    A null-terminated (C-style) string.
fn    A function. This will almost always have a p prefix, as about the only useful thing you can do with a function (from a variable's perspective) is take the address of it.
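
To make the difference between st and sz concrete, the following C sketch copies a Pascal-type string into a null-terminated one. The function name CopyStToSz and the 256-byte buffer requirement are assumptions made for this example; cch (a count of characters) is explained in the prefixes section.

```c
#include <assert.h>
#include <string.h>

/* Copies the Pascal-type string stSrc into szDest as a C-style string.
   szDest must have room for up to 256 bytes (255 characters plus '\0'). */
void CopyStToSz(char *szDest, const unsigned char *stSrc)
{
    size_t cch;                      /* count of characters in stSrc */

    assert(szDest != NULL && stSrc != NULL);
    cch = stSrc[0];                  /* an st keeps its length in the first byte */
    memcpy(szDest, stSrc + 1, cch);  /* the characters follow the length byte */
    szDest[cch] = '\0';              /* an sz ends with a null character */
}
```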

Prefixes (Constructors)

The base types are not usually sufficient to describe a variable, since variables frequently refer to more complex types. For example, you may have a pointer to a database record, or an array of coordinates, or a count of colors. In Hungarian notation, these extended types are described by the variable's prefix. The complete type of a variable is given by the combination of the prefix(es) and base type. Yes, it is possible to have more than one prefix -- for example, you may have a pointer to an array of database records.

Common Constructors

It is not usually necessary to create a new prefix, although it is certainly possible to do so. The list of standard prefixes should be sufficient for most uses, however.

Constructor  Description
p            A pointer.
lp           A long (far) pointer. (Used on machines with a segmented architecture, such as X86's under DOS or Win3.1).
hp           A huge pointer. (Similar to a far pointer, except that it handles crossing segment boundaries during pointer arithmetic correctly.)
rg           An array. An rgch is an array of characters; a pch could point to a specific element in this array. (The notation comes from viewing an array as a mathematical function -- the input is the index, and the output is the value at that index. So the entire array is essentially the "range" of that function.)
i            An index (into an array). For example, an ich could be used to index into an rgch. I've also seen this used for resource IDs under Windows (which makes sense if you think about it -- a resource ID is an index into a resource table).
c            A count. cch could be the count of characters in the rgch. (As another example, note that the first byte of an st is the cch of that string.)
d            The difference between two instances of a type. For example, given a type x used to represent an X-coordinate on a graph, a dx could contain the difference on the X-axis of two such coordinates.
h            A handle. Handles are commonly used in Windows programming; they represent resources allocated by the system and handed back to the application. On other systems, a "handle" might be a pointer to a pointer, in which case it might be clearer to use a pp (or lplp if appropriate).
mp           A specific type of array, a mapping. This prefix is followed by two types, rather than one. The array represents a mapping function from the first type to the second. For example, mpwErrisz could be a mapping of error codes (wErr) to indexes of message strings (isz).
v            A global variable (personally I prefer g for this).
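
The rg, c, i and p constructors often appear together. The C sketch below shows one way they might combine with the ch tag; the function name IchFindCh is a hypothetical choice for this example, and returning -1 for "not found" is an assumption rather than part of the notation.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index (ich) of the first chTarget in the array rgch,
   which holds cch characters, or -1 if chTarget does not occur. */
int IchFindCh(const char *rgch, int cch, char chTarget)
{
    int ich;                              /* index into rgch */

    assert(rgch != NULL && cch >= 0);
    for (ich = 0; ich < cch; ich++) {
        const char *pch = &rgch[ich];     /* pointer to one element of rgch */
        if (*pch == chTarget)
            return ich;
    }
    return -1;
}
```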

Examples

Tags and constructors are both in lower case, with no separating punctuation, so some ambiguity is possible if you are not careful in choosing your representations. For example, you probably shouldn't use pfn to represent a structure you've defined, as it could be taken as a pointer (p) to a function (fn). (Even if you ARE careful, some ambiguity is still possible. For example, if you have a handle to a pointer (unlikely, but who knows?), you'd want to represent this as hp, which also means huge pointer. Cases like these should be rare, however, and the true type should still be distinguishable from the context of the code.)

Here are some further examples of constructors + tags:

Variable  Description
pch       A pointer to a character.
ich       An index into an array of characters.
rgsz      An array of null-terminated strings (most likely the values stored in the array are actually pointers to the strings, so you could arguably also use rgpsz).
rgrgx     A two-dimensional array of x's. (An array of arrays of x's.)
pisz      A pointer to an index into an array of null-terminated strings. (Or possibly a pointer to a resource ID of a string -- the real meaning should be clear within the context of the code.)
lpcw      Far pointer to a count of words.
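
As a sketch of how the mpwErrisz mapping mentioned earlier could work together with an rgsz, consider the following C fragment. The error codes, the message strings, and the function name SzFromWErr are all invented for this example.

```c
#include <assert.h>

/* Hypothetical error codes; cwErr is the count of codes defined. */
enum { wErrNone, wErrNoMem, wErrNotFound, cwErr };

/* rgsz: an array of null-terminated message strings. */
static const char *rgsz[] = { "No error", "File not found", "Out of memory" };

/* mpwErrisz: maps an error code (wErr) to an index (isz) into rgsz. */
static const int mpwErrisz[cwErr] = { 0, 2, 1 };

/* Returns the message string that belongs to the error code wErr. */
const char *SzFromWErr(int wErr)
{
    int isz;                          /* index into rgsz */

    assert(wErr >= 0 && wErr < cwErr);
    isz = mpwErrisz[wErr];
    return rgsz[isz];
}
```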

Qualifiers

Although the combination of constructors and tags is enough to specify the type of a variable, it isn't sufficient to distinguish the variable from others of the same type. This is where the third (optional) part of a variable name, the qualifier, comes in. A qualifier is a short descriptive word or words (or reasonable facsimile) that describes HOW the variable is used. Some kind of punctuation should be used to distinguish the qualifier from the constructor + tag portion. Typically this is done by making the first letter of the qualifier (or of each qualifier if you choose to use more than one word) upper-case.

The use of many variables will fall into the same basic categories, so there are several standard qualifiers:

Qualifier  Description
First      The first element in a set. This will often be an index or a pointer (for example, pchFirst or iwFirst).
Last       The last element in a set. Both First and Last refer to valid values that are in a given set, and are often paired, such as in this sample C loop:

               for (iw = iwFirst; iw <= iwLast; iw++)
                   {
                   ...
                   }

Min        The first element in a set. Similar to First, but Min always refers to the first actual element in a set, while First may be used to indicate the first element actually dealt with (if you're working with a substring, for example).
Max        The upper limit in a set. This is NOT a valid value; xMax is usually equivalent to xLast + 1. The above example loop could also be written as

               for (iw = iwFirst; iw < iwMax; iw++)
                   {
                   ...
                   }

           I've also seen Lim used to indicate the limit in much the same manner.
Mac        The current upper limit in a set. This is similar to Max, but is used where the upper limit can vary (for example, a variable length structure).
Sav        A temporary saved value; usually used when temporarily modifying variables that you want to restore later.
T          A temporary variable -- one which will be used quickly in a given context and then disposed of or reused.
Src        A source. This is usually paired with Dest in copy/transfer operations.
Dest       A destination. This is usually paired with Src.
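
A short C sketch may help show the First, Last, Max, Src and Dest qualifiers working together. The function name CwCopyRange and the choice of unsigned short for a word are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the words iwFirst..iwLast (inclusive) from rgwSrc to the start of
   rgwDest and returns cw, the count of words copied. */
int CwCopyRange(unsigned short *rgwDest, const unsigned short *rgwSrc,
                int iwFirst, int iwLast)
{
    int iwMax = iwLast + 1;   /* Max is one past Last, so not a valid index */
    int iw;
    int cw = 0;

    assert(rgwDest != NULL && rgwSrc != NULL);
    assert(iwFirst >= 0 && iwFirst <= iwMax);
    for (iw = iwFirst; iw < iwMax; iw++)
        rgwDest[cw++] = rgwSrc[iw];
    return cw;
}
```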

Structures and structure members

Structures are usually by definition their own types, so a given structure usually defines its own tag (for example, the dbr I used earlier in this document).

Structure members should simply be named the same way variables are. Since the context is usually only within the structure itself, name conflicts are less likely, so qualifiers are often not as necessary. If you have multiple instances of a variable type within the structure, you'll still need the qualifiers, of course (for example, if you're creating a structure containing name and address string records, you could name them szName and szAddress). If the language does not support separate contexts for each structure (I think the Microsoft Macro Assembler (MASM) falls into this category, but I haven't worked with it in a few years), then the structure name is appended to the name of the member as a qualifier. (So the examples given above might be named szNameDbr and szAddressDbr if these fields appeared in the dbr structure.)
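
For instance, the dbr structure with its szName and szAddress members might look like the C sketch below. The field sizes, the upper-case typedef name DBR, and the helper InitDbr are assumptions for illustration, not something the notation prescribes.

```c
#include <assert.h>
#include <string.h>

/* dbr: a database record holding a name and an address. */
typedef struct {
    char szName[32];
    char szAddress[64];
} DBR;

/* Fills in the record that pdbr points to, truncating strings that are too long. */
void InitDbr(DBR *pdbr, const char *szName, const char *szAddress)
{
    assert(pdbr != NULL && szName != NULL && szAddress != NULL);
    strncpy(pdbr->szName, szName, sizeof pdbr->szName - 1);
    pdbr->szName[sizeof pdbr->szName - 1] = '\0';
    strncpy(pdbr->szAddress, szAddress, sizeof pdbr->szAddress - 1);
    pdbr->szAddress[sizeof pdbr->szAddress - 1] = '\0';
}
```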

Procedures

The simple rules for naming variables don't always work quite as well for procedures. This is because specifying what the procedure actually does is important, and many procedures won't have a return value. Also, the context for procedures is usually the entire program, so you have more chance for naming conflicts. To handle these problems, a few modifications are made to the rules:
  1. Procedure names are distinguished from variable names by using some punctuation -- for example, function names have the first letter capitalized while variable names begin with lowercase letters.
  2. If the procedure explicitly returns a value (as opposed to implicitly returning one through a variable argument), then the procedure name will begin with the type of value that is returned.
  3. If the procedure is a true function (that is, it operates on its parameters and returns a value with no side-effects), then it is typical to name it XFromYZ..., where X is the type returned and Y, Z, etc., are the types of the parameters. For example, DxFromWnd(hwnd) (or possibly DxFromHwnd(hwnd) if you really want to be specific) could be used for a function that returns the width of a window.
  4. If the procedure has side-effects, then follow the type (if any) with a few words that describe what the procedure does. Each word should be capitalized. For example, FTryMove() could be used to indicate a procedure that checks the validity of a move (in a game, for instance), and returns a boolean value (true/false) to indicate if the move is valid.
  5. If the procedure operates on an object, the type of the object should be appended to the name. For example, InitFoo(pfoo) could indicate a procedure that initializes a structure foo (or more accurately in this case, a structure foo that the procedure is given a pointer to).
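
The rules above can be sketched in C roughly as follows. The WND structure stands in for a real window (the text's example takes a handle, hwnd, while this sketch uses a plain pointer), and the names DxFromPwnd and FTryMoveWnd, along with the decision to apply the move when it is valid, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window record: x coordinates of the left and right edges. */
typedef struct {
    int xLeft;
    int xRight;
} WND;

/* Rule 3: a true function named XFromY. Returns the width (a dx) of the window. */
int DxFromPwnd(const WND *pwnd)
{
    assert(pwnd != NULL);
    return pwnd->xRight - pwnd->xLeft;
}

/* Rules 2, 4 and 5: the leading F says a boolean flag is returned, the words
   after it describe the action, and Wnd names the object operated on.
   Shifts the window by dx if it stays within 0..xMax, and returns nonzero
   only if the move happened. */
int FTryMoveWnd(WND *pwnd, int dx, int xMax)
{
    assert(pwnd != NULL);
    if (pwnd->xLeft + dx < 0 || pwnd->xRight + dx > xMax)
        return 0;
    pwnd->xLeft += dx;
    pwnd->xRight += dx;
    return 1;
}
```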

Macros and constants

Macros are usually handled the same way as procedures. Constants may be handled as variables (such as fTrue and fFalse), although you'll often see constants defined in all upper-case (IWFIRST, for example). Some people will use underscores to separate parts of a constant name if they capitalize them (I_W_FIRST). If I remember correctly, this capitalization is NOT really a part of "proper" Hungarian, but I use it myself to distinguish between constants and variables.
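
A minimal C sketch of these conventions might look like this. The values of IWFIRST and IWMAX and the names FValidIw and WFromRgwIw are hypothetical, chosen only to show the naming.

```c
#include <assert.h>

/* Constants named like variables. */
#define fFalse 0
#define fTrue  1

/* Constants in all upper case (hypothetical values). */
#define IWFIRST 0
#define IWMAX   8

/* A macro named like a procedure: yields a flag saying whether iw is in range. */
#define FValidIw(iw) ((iw) >= IWFIRST && (iw) < IWMAX ? fTrue : fFalse)

/* Returns the word at index iw of rgw, which must hold IWMAX words. */
int WFromRgwIw(const int *rgw, int iw)
{
    assert(FValidIw(iw));
    return rgw[iw];
}
```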

Labels

If you need a label for some reason, it can be considered to be a variation on a procedure -- labels are effectively identifiers specifying a piece of code. Since labels don't take parameters or return a value, no types are specified. EndLoop or OutOfMem are typical examples.
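
As one possible illustration, the C sketch below uses an OutOfMem label for its failure path. The function name SzConcat and its behavior (returning a newly allocated string that the caller frees) are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated concatenation of szFirst and szSecond, or NULL
   if memory runs out. The caller frees the result. */
char *SzConcat(const char *szFirst, const char *szSecond)
{
    size_t cchFirst, cchSecond;
    char *szResult;

    assert(szFirst != NULL && szSecond != NULL);
    cchFirst = strlen(szFirst);
    cchSecond = strlen(szSecond);

    szResult = malloc(cchFirst + cchSecond + 1);
    if (szResult == NULL)
        goto OutOfMem;

    memcpy(szResult, szFirst, cchFirst);
    memcpy(szResult + cchFirst, szSecond, cchSecond + 1);  /* copies the '\0' too */
    return szResult;

OutOfMem:
    return NULL;
}
```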

Congratulations. You've found the one page that some people might consider semi-useful, and even then only a small segment of the software development community. If even that much...

Greg Legowski, gregleg@telerama.lm.com
email -- For when it absolutely has to get lost at the speed of light.


Last modified: Wed Sep 18 10:54:54 1996