SCANL/identifier_name_structure_catalogue


Identifier Naming Structure Catalogue

This README catalogues common source code identifier naming structures, best practices, and semantics derived from research. The goal of this document is to act as a resource for researchers, students, and developers who want to learn what is scientifically known about naming identifiers. We are currently investigating other identifier characteristics that should be included in this document. This is a living document: we will expand it as we discover more patterns and characteristics through our own research and, possibly, that of others. Check back periodically for more information!

This document is broken down into the following sections:

  • Linguistic Terminology used throughout the document.

  • Part-of-speech Tagset used throughout the document.

  • Common Naming Structures, derived by analyzing identifier names and extracting part-of-speech sequences called grammar patterns. This section discusses common identifier naming patterns and their meaning.

  • Linguistic Antipatterns, which are recurring, detrimental practices in the naming, documentation, and/or choice of identifier. In this section we provide the antipattern name, a definition, an example, and several options for resolving the antipattern.

  • Naming Styles, which are practices that dictate how identifiers are lexically formed. The three most common naming styles are camelCase, under_score, and PascalCase, and research has studied how they affect developer comprehension.

Linguistic Terminology

First you should be familiar with some simple linguistic concepts.

Head-noun: The right-most noun in a noun phrase is typically referred to as the head-noun. It is the word that most closely embodies the concept represented by the in-memory entity that the identifier describes.
Noun-adjunct: A noun-adjunct is a noun acting as (i.e., being used as) an adjective. Noun-adjuncts are found in certain types of compound words, which in English are often groups of two or more words separated by a dash. For example, in the word employee-name, 'employee' is a noun-adjunct and 'name' is a noun (more specifically, the head-noun).
Hypernym: A word with a broad meaning that more specific words fall under; a superordinate. For example, color is a hypernym of red. (Definition from Oxford Languages.)
Hyponym: A word of more specific meaning than a general or superordinate term applicable to it. For example, spoon is a hyponym of cutlery. (Definition from Oxford Languages.)

Tagset

The tagset we use is a subset of the Penn Treebank tagset. Each of our annotations is listed below with examples. Further examples and definitions can be found in the paper [1].

Abbreviation  Expanded form                          Examples
N             noun                                   Disneyland, shoe, faucet, mother, bedroom
DT            determiner                             the, this, that, these, those, which
CJ            conjunction                            and, for, nor, but, or, yet, so
P             preposition                            behind, in front of, at, under, beside, above, beneath, despite
NPL           noun plural                            streets, cities, cars, people, lists, items, elements
NM            noun modifier (adjective)              red, cold, hot, scary, beautiful, happy, faster, small
NM            noun modifier (noun-adjunct)           employee (in employeeName), file (in filePath), font (in fontSize), user (in userId)
V             verb                                   run, jump, drive, spin
VM            verb modifier (adverb)                 very, loudly, seriously, impatiently, badly
PR            pronoun                                she, he, her, him, it, we, us, they, them, I, me, you
D             digit                                  1, 2, 10, 4.12, 0xAF
PRE           preamble (e.g., Hungarian notation)    Gimp, GLEW, GL, G, p_, m_, b_
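Tools that work with these annotations often need the tagset in code. The following is a minimal, hypothetical sketch of the tagset as a Java enumeration. The names `PosTag`, `expandedForm`, and `parseSequence` are our own and do not come from [1]. Both NM rows of the table are folded into a single tag, because the tagset itself uses one abbreviation for them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of the tagset above; names are illustrative only.
enum PosTag {
    N("noun"),
    DT("determiner"),
    CJ("conjunction"),
    P("preposition"),
    NPL("noun plural"),
    NM("noun modifier (adjective or noun-adjunct)"),
    V("verb"),
    VM("verb modifier (adverb)"),
    PR("pronoun"),
    D("digit"),
    PRE("preamble");

    final String expandedForm;

    PosTag(String expandedForm) {
        this.expandedForm = expandedForm;
    }

    // Parses a space-separated tag sequence such as "NM NM N".
    // valueOf throws IllegalArgumentException for an unknown abbreviation.
    static List<PosTag> parseSequence(String sequence) {
        List<PosTag> tags = new ArrayList<>();
        for (String abbreviation : sequence.trim().split("\\s+")) {
            tags.add(PosTag.valueOf(abbreviation));
        }
        return tags;
    }

    public static void main(String[] args) {
        for (PosTag tag : parseSequence("NM NM N")) {
            System.out.println(tag + ": " + tag.expandedForm);
        }
    }
}
```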

Common naming patterns and their definition

The grammar patterns below represent different naming structures found in source code; each is expressed as a sequence of part-of-speech tags. All of the patterns we present are empirically derived from a manually tagged sample of 1,335 identifiers. Refer to Newman et al. [1] for more information. The manually tagged dataset is freely available.

Below, we present each pattern along with its definition and examples. We use regular expression syntax: * means "zero or more" of the preceding tag, + means "one or more" of the preceding tag, and (A|B) means either tag A or tag B.
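The notation can be made concrete with a small matcher. The minimal sketch below is a hypothetical helper, not tooling from [1]. It translates a grammar pattern written in this syntax into a `java.util.regex.Pattern` and checks whether a tag sequence matches it. The sketch assumes that tag sequences are written as space-separated abbreviations such as `NM NM N`. Each tag is matched together with a trailing space, so `N` cannot accidentally match the start of `NPL`.

```java
import java.util.regex.Pattern;

// Hypothetical helper; GrammarPatterns, compile, and matches are illustrative names.
class GrammarPatterns {
    // Converts notation such as "NM* (N|NPL)" into a regex over space-terminated
    // tags, e.g. "NM*" becomes "(?:NM )*" and "(N|NPL)" becomes "(?:(N|NPL) )".
    static Pattern compile(String notation) {
        StringBuilder regex = new StringBuilder();
        for (String token : notation.trim().split("\\s+")) {
            String quantifier = "";
            if (token.endsWith("*") || token.endsWith("+")) {
                quantifier = token.substring(token.length() - 1);
                token = token.substring(0, token.length() - 1);
            }
            // Alternations like (N|NPL) are already valid regex syntax.
            regex.append("(?:").append(token).append(" )").append(quantifier);
        }
        return Pattern.compile(regex.toString());
    }

    // Normalizes whitespace and appends the trailing space each tag needs.
    static boolean matches(String notation, String tagSequence) {
        String normalized = tagSequence.trim().replaceAll("\\s+", " ") + " ";
        return compile(notation).matcher(normalized).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("NM* N", "NM NM N"));
        System.out.println(matches("NM* N P NM* (N|NPL)", "NM N P NPL"));
    }
}
```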

Grammar pattern sequence and definition
NM* N Noun Phrase: Zero or more noun-modifiers appear to the left of a head-noun. Noun-modifiers that appear before the head-noun act as a way to specialize our understanding of the head-noun by taking the general concept the head-noun represents and reducing it to a more concise, specific concept. For example, in the identifier 'issueDescription' the head-noun is 'Description', which is the general concept. The noun-adjunct, 'issue', specializes our understanding of the 'Description' by specifying what kind of 'Description' we are talking about.

It is good practice to be careful in the choice, and number, of noun-modifiers to use before the head-noun. A good identifier will include only enough noun-modifiers to concisely define the concept represented by the head-noun.

This is the most common naming pattern for identifiers that are not function names.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
GtkWidget *selection_width_label;                NM NM N
int dynamic_Table_Index;                         NM NM N
ReadBufferOperation *read_Operation;             NM N
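Because the head-noun is the right-most noun in a noun phrase, a tool can find it once an identifier has been split into words and tagged. The following is a minimal, hypothetical sketch that assumes the words and tags are already available. The tagging step itself is out of scope, and `HeadNoun` and `headNoun` are illustrative names. The helper only applies to plain NM* N and NM* NPL phrases. Patterns containing a preposition have more than one noun phrase.

```java
import java.util.List;

// Hypothetical helper for plain noun phrases (NM* N or NM* NPL).
class HeadNoun {
    // Scans from the right for the first noun (N or NPL) and returns its word.
    static String headNoun(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("one tag per word is required");
        }
        for (int i = tags.size() - 1; i >= 0; i--) {
            String tag = tags.get(i);
            if (tag.equals("N") || tag.equals("NPL")) {
                return words.get(i);
            }
        }
        return null; // no noun present, e.g. a V+ identifier such as "sort"
    }

    public static void main(String[] args) {
        // issueDescription: 'issue' is a noun-adjunct (NM), 'Description' the head-noun (N)
        System.out.println(headNoun(List.of("issue", "Description"), List.of("NM", "N")));
    }
}
```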
NM* NPL Plural noun phrase: This is identical to Noun Phrase (NM* N), except the head-noun is plural. The plural is often purposeful in that the head-noun's plurality expresses the multiplicity of the data. That is, these identifiers (when they are not function names) are more likely to have a collection data type [1].

Some naming conventions (e.g., the Java naming standard) generally consider it good practice to match the plurality of the identifier with whether its type represents a singular or collection object.

Identifiers that follow this pattern are usually not function names.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
int training_examples;                           NM NPL
String[] method_Name_Prefixes;                   NM NM NPL
vector<Handle<AbcdAtmVolCurve>> curves;          NPL
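The tendency for plural names to go with collection types suggests a simple consistency check. The minimal sketch below is deliberately naive. It assumes that a name "looks plural" when its last word ends in "s" but not "ss", so irregular plurals such as "people" and singular words such as "status" will fool it. `PluralityCheck` and its methods are hypothetical names.

```java
import java.util.Collection;

// Hypothetical, heuristic plurality check; not a tool from the cited research.
class PluralityCheck {
    // Returns the last word of a camelCase or under_score identifier, lowercased.
    static String lastWord(String identifier) {
        String[] words = identifier.split("_|(?<=[a-z0-9])(?=[A-Z])");
        return words[words.length - 1].toLowerCase();
    }

    // Naive heuristic: a trailing "s" (but not "ss") is taken to mark a plural.
    static boolean looksPlural(String identifier) {
        String last = lastWord(identifier);
        return last.endsWith("s") && !last.endsWith("ss");
    }

    // True when the name's plurality agrees with whether the type holds many values.
    static boolean pluralityMatchesType(String identifier, Class<?> type) {
        boolean isCollection = type.isArray() || Collection.class.isAssignableFrom(type);
        return looksPlural(identifier) == isCollection;
    }

    public static void main(String[] args) {
        System.out.println(pluralityMatchesType("method_Name_Prefixes", String[].class));
        System.out.println(pluralityMatchesType("training_examples", int.class));
    }
}
```

This check would flag `int training_examples` from the table above, which may well be an intentional count. A flag like this is a prompt for review, not a verdict.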
V NM* (N|NPL) Verb Phrase: The addition of a verb to a noun phrase creates a verb phrase. The verb in a verb phrase is an action being applied to (or with) the concept embodied by the noun phrase that follows. In some cases, instead of being an action, the verb is an existential quantifier. In this case, the identifier's data type is probably (interpretable as) Boolean.

These are typically either function identifiers or identifiers with a boolean type.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
bool create_metadata_array();                    V NM N
bool is_First_frame;                             V NM N
int create_Duplicate_Change_Id();                V NM NM N
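The following short Java sketch contrasts the two uses of a verb phrase. In the first, the verb is an action applied to the noun phrase. In the second, the verb acts as an existential check and the member is Boolean. `FrameBuffer` and its members are hypothetical names chosen to echo the examples in the table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class illustrating verb-phrase names; not taken from the dataset.
class FrameBuffer {
    private final List<String> frames = new ArrayList<>();

    // V NM N as an action: the verb 'add' is applied to the noun phrase 'video frame'.
    void addVideoFrame(String frame) {
        frames.add(frame);
    }

    // V NM N as a predicate: 'is' asks a yes/no question, so the return type is boolean.
    boolean isFirstFrame(String frame) {
        return !frames.isEmpty() && frames.get(0).equals(frame);
    }

    public static void main(String[] args) {
        FrameBuffer buffer = new FrameBuffer();
        buffer.addVideoFrame("f0");
        buffer.addVideoFrame("f1");
        System.out.println(buffer.isFirstFrame("f0"));
    }
}
```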
P NM* (N|NPL) Prepositional phrase: A noun phrase or verb phrase with a leading preposition is a prepositional phrase. The preposition in a prepositional phrase typically explains how the entity (or entities) represented by the accompanying noun or verb phrase are related in terms of order, space, time (e.g., on_enter), ownership, causality, or representation (e.g., to_string). In the case of this specific grammar pattern, there is often an unspecified verb to the left of the preposition.

The unspecified verb is usually an action such as GET, CONVERT (e.g., to string), EXECUTE (e.g., on enter), or some other action. Developers understand the implied action through experience or domain knowledge; for example, they understand the implied verb in event-driven functions beginning with the preposition 'on'. There may also be a noun phrase to the left of the preposition; we discuss this case in another grammar pattern below.

This pattern is used in many types of identifiers whether they are function names or otherwise.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
ModelSettings& with_Market_Rate_Accuracy();      P NM NM N
btVector3 from_Local_Aabb_Min;                   P NM NM N
String to_string();                              P N
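The implied verbs described above appear in Java code like the following minimal sketch. `Door`, `onEnter`, and `log` are hypothetical names. `toString` is the standard `java.lang.Object` method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class showing prepositional-phrase names with an implied verb.
class Door {
    private final String label;
    final List<String> log = new ArrayList<>();

    Door(String label) {
        this.label = label;
    }

    // P N: the implied verb is EXECUTE ("[execute] on enter"); 'on' conveys timing.
    void onEnter(String visitor) {
        log.add(visitor + " entered " + label);
    }

    // P N: the implied verb is CONVERT ("[convert] to string").
    @Override
    public String toString() {
        return "Door(" + label + ")";
    }

    public static void main(String[] args) {
        Door door = new Door("front");
        door.onEnter("Ada");
        System.out.println(door + ": " + door.log);
    }
}
```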
NM* N P NM* (N|NPL) Prepositional phrase with leading noun phrase: Sometimes a noun phrase is explicitly present on both the left and right of the preposition. When the left-hand-side noun-phrase is specified, there is an explicit relationship between the left- and right-hand side noun-phrases. This relationship is expressed through the preposition. The preposition helps us understand how the entity (or entities) represented by both noun-phrases are related in terms of order, space, time (e.g., generated_token_on_creation), ownership (e.g., scroll_id_for_node), causality, or representation (e.g., url_from_json, query_timeout_in_milliseconds).

This pattern is used in many types of identifiers whether they are function names or otherwise.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
String generated_Token_On_Creation;              V N P N
long query_Timeout_In_Milliseconds;              NM N P NPL
class Scroll_Id_For_Node;                        NM N P N
HttpUrl url_from_json();                         N P N
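A common use of this pattern is to encode a unit or a source representation in the name. This minimal sketch uses hypothetical names (`QuerySettings`, `durationFromMilliseconds`) and the standard `java.time.Duration` API. The preposition `in` tells the reader that the stored number is a count of milliseconds.

```java
import java.time.Duration;

// Hypothetical class; the preposition relates each value to its unit or source.
class QuerySettings {
    // NM N P NPL: the unit is part of the name, so 5000 is unambiguous.
    long queryTimeoutInMilliseconds = 5000;

    // N P NPL: 'from' names the representation the result is built from.
    static Duration durationFromMilliseconds(long milliseconds) {
        return Duration.ofMillis(milliseconds);
    }

    public static void main(String[] args) {
        QuerySettings settings = new QuerySettings();
        System.out.println(durationFromMilliseconds(settings.queryTimeoutInMilliseconds));
    }
}
```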
V P NM* (N|NPL) Prepositional phrase with leading verb: Same as prepositional phrase pattern but the leading verb, or verb phrase, is specified this time. As before, the preposition helps us understand how the entity (or entities) represented by the verb- and noun-phrases are related in terms of order, space, time, ownership, causality (e.g., destroy_with_parent), or representation (e.g., save_as_quadratic_png, tessellate_to_mesh, convert_to_php_namespace).

The usage of this pattern is similar to when the verb is implicit. There may still be an implicit noun phrase to the right of the verb and to the left of the preposition.

This pattern is used in many types of identifiers whether they are function names or otherwise.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
gboolean destroy_with_parent;                    V P N
string convert_to_php_namespace();               V P NM N
void tessellate_To_Mesh();                       V P N
void save_As_Quadratic_Png();                    V P NM N
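Below is a minimal sketch of a V P NM N function, modeled on `convert_to_php_namespace`. The class `NamespaceConverter` is hypothetical. The sketch assumes the input is a dot-separated Java package name and only swaps the separator for PHP's backslash, ignoring other conversion concerns.

```java
// Hypothetical converter illustrating a verb + preposition + noun phrase name.
class NamespaceConverter {
    // V P NM N: 'convert' states the action, 'to' introduces the target representation.
    static String convertToPhpNamespace(String javaPackage) {
        return javaPackage.replace('.', '\\');
    }

    public static void main(String[] args) {
        System.out.println(convertToPhpNamespace("org.example.util"));
    }
}
```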
V* DT NM* (N|NPL) Noun phrase with leading determiner: The addition of a determiner tells us how much of the population, which is specified by the noun-phrase, is represented, or acted on, by the identifier.

Typically, the determiner will tell us that we are interested in ALL, ANY, ONE, A, THE, SEVERAL, etc., of the population of objects specified by the noun phrase. If there is a leading verb, the verb specifies an action to take on the population or it represents existential quantification (e.g., matchesAnyParentCategories).

This pattern is used in many types of identifiers whether they are function names or otherwise.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
List<int> all_invocation_matchers;               DT NM NPL
String[] all_Open_Indices;                       DT NM NPL
int is_a_empty;                                  V DT N
boolean matches_any_parent_categories();         V DT NM NPL
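The determiner changes what the method must check. In the following minimal sketch, `any` maps naturally onto `Stream.anyMatch` and `all` onto `Stream.allMatch`. `Product` and both method names are hypothetical, modeled on `matches_any_parent_categories`.

```java
import java.util.List;
import java.util.Set;

// Hypothetical class showing how a determiner quantifies over a population.
class Product {
    // NM NPL: a collection, so the head-noun is plural.
    final List<String> parentCategories;

    Product(List<String> parentCategories) {
        this.parentCategories = parentCategories;
    }

    // V DT NM NPL: 'any' means one matching category is enough.
    boolean matchesAnyParentCategories(Set<String> wanted) {
        return parentCategories.stream().anyMatch(wanted::contains);
    }

    // V DT NM NPL: 'all' means every category must match.
    boolean matchesAllParentCategories(Set<String> wanted) {
        return parentCategories.stream().allMatch(wanted::contains);
    }

    public static void main(String[] args) {
        Product book = new Product(List.of("books", "music"));
        System.out.println(book.matchesAnyParentCategories(Set.of("music")));
    }
}
```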
V+ Verb sequence: One or more verbs with no noun phrase. Because these identifiers lack a noun phrase to act upon (in contrast to the Verb Phrase pattern above), a larger proportion of them are likely generic functions like sort (though more data and research are needed). Such functions can act upon many different types of data and behave differently depending on the data being acted upon.

The noun phrase that this action (i.e., the verb) is applied to is implicit; that is, it is not present in the identifier name. Instead, the noun phrase is implied by the program context (e.g., it is represented by a this-pointer) or it is present in the function parameters. In some cases, these are boolean-type variables that may be missing an existential quantifier (e.g., add 'is' before 'parsing' to make it explicit).

These are typically function names or identifiers with a boolean type.

Here are some examples that follow this pattern:

Examples
Identifier name                                  Grammar pattern
void sort();                                     V
void delete();                                   V
void resume();                                   V
bool *parsing;                                   V
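The following minimal sketch shows where the missing noun phrase comes from in V+ names. For `sort` and `resume`, it is the receiver (`this`). For `add`, it is the parameter. `Playlist` and its members are hypothetical names. The flag is written as `isPaused` to show the explicit existential form recommended above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class illustrating verb-only names with an implicit noun phrase.
class Playlist {
    private final List<String> songs = new ArrayList<>();
    private boolean isPaused = true; // explicit 'is', unlike a bare 'parsing' flag

    // V: the noun phrase is supplied by the parameter.
    void add(String song) {
        songs.add(song);
    }

    // V: the noun phrase being sorted is 'this' playlist.
    void sort() {
        Collections.sort(songs);
    }

    // V: resumes 'this' playlist.
    void resume() {
        isPaused = false;
    }

    boolean isPaused() {
        return isPaused;
    }

    List<String> getSongs() {
        return Collections.unmodifiableList(songs);
    }

    public static void main(String[] args) {
        Playlist playlist = new Playlist();
        playlist.add("b");
        playlist.add("a");
        playlist.sort();
        System.out.println(playlist.getSongs());
    }
}
```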

Linguistic Antipatterns

Linguistic Antipatterns (LAs) in software systems are recurring, detrimental practices in the naming, documentation, and/or choice of identifiers in the implementation of an entity, which impair program understanding. They were first discussed by Arnaoudova et al. [2]. They typically take one of two forms. In the first, an identifier name incorrectly describes the behavior of the entity it represents. In the second, an entity's behavior contradicts what its identifier conveys linguistically.

Name Definition and Example
Get more than accessor A getter that performs actions other than returning the corresponding attribute. Example: method getImageData which always returns a new object.
        ImageData getImageData(){
          final Point size = this.getSize();
          this.imageData = new ImageData(size.x, size.y, 8);
          return this.imageData;
        }
      
How to resolve:
  1. The method name should change so that it is not a getter or
  2. the implementation should be corrected to conform to standard get-method behavior
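The two resolutions above can be sketched side by side. To keep the example self-contained, a hypothetical `ImageBuffer` class stands in for SWT's `ImageData`, and `ImagePanel` is an invented owner class. The example is about naming and whether the getter has side effects, not about image handling.

```java
// Hypothetical stand-in for the SWT ImageData type used in the example above.
final class ImageBuffer {
    final int width;
    final int height;

    ImageBuffer(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

// Hypothetical owner class; not the original code.
class ImagePanel {
    private final int width;
    private final int height;
    private ImageBuffer imageData; // null until created

    ImagePanel(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Resolution 1: the verb 'create' tells callers a new object is built on every call.
    ImageBuffer createImageData() {
        this.imageData = new ImageBuffer(width, height);
        return this.imageData;
    }

    // Resolution 2: a conventional getter that only returns the stored attribute.
    ImageBuffer getImageData() {
        return this.imageData;
    }

    public static void main(String[] args) {
        ImagePanel panel = new ImagePanel(640, 480);
        ImageBuffer created = panel.createImageData();
        System.out.println(created == panel.getImageData());
    }
}
```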
Is returns more than a Boolean The name of a method is a predicate suggesting a true/false return value. However, the return type is not Boolean but a more complex type, which allows a wider range of values without documenting them. Example: method isValid with return type int.
        public int isValid(){
            final long currentTime = System.currentTimeMillis();
            if (currentTime <= this.expires) {
              // The delay has not passed yet -
              // assuming source is valid.
              return SourceValidity.VALID;
            }
            // The delay has passed, prepare for the next interval.
            this.expires = currentTime + this.delay;
            return this.delegate.isValid();
        }
      
How to resolve:
  1. The type should be changed to boolean to reflect the function's behavior as a binary predicate.
  2. Consider changing the name such that it does not imply a yes/no question and provides some indication of n-ary return values.
  3. Carefully document the meaning of each value that can be returned. Thoroughly test each value.
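The following minimal sketch shows resolutions 1 and 2 together. A renamed method returns an enumeration that documents every possible value, and a true `boolean` predicate is built on top of it. All names are hypothetical. The delegate is modeled as a `java.util.function.Supplier`, and the current time is passed in rather than read from `System.currentTimeMillis()` so that the behavior can be tested.

```java
import java.util.function.Supplier;

// Hypothetical enumeration documenting every value the check can return.
enum Validity { VALID, INVALID, UNKNOWN }

// Hypothetical simplification of the example above.
class CachedSource {
    private final long delay;
    private final Supplier<Validity> delegate;
    private long expires = 0;

    CachedSource(long delay, Supplier<Validity> delegate) {
        this.delay = delay;
        this.delegate = delegate;
    }

    // Resolution 2: the name no longer poses a yes/no question, and the
    // enum return type documents the n-ary result.
    Validity checkValidity(long currentTime) {
        if (currentTime <= this.expires) {
            // The delay has not passed yet; assume the source is valid.
            return Validity.VALID;
        }
        // The delay has passed; prepare for the next interval.
        this.expires = currentTime + this.delay;
        return this.delegate.get();
    }

    // Resolution 1: a genuine binary predicate with a boolean return type.
    boolean isValid(long currentTime) {
        return checkValidity(currentTime) == Validity.VALID;
    }

    public static void main(String[] args) {
        CachedSource source = new CachedSource(1000, () -> Validity.UNKNOWN);
        System.out.println(source.checkValidity(System.currentTimeMillis()));
    }
}
```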
Set method returns A set method has a return type other than void without proper documentation of the return type/values. Example: method setBreadth has a non-void return type.
      public Dimension setBreadth(final Dimension target, final int source) {
        if (this.orientation == Orientation.VERTICAL) {
          return new Dimension(source, (int) target.getHeight());
        } else {
          return new Dimension((int) target.getWidth(), source);
        }
      }
    
How to resolve:
  1. The word set, when used in this manner, has a specific definition in the programming domain. Consider using a different term, such as change.
  2. Correct the implementation such that it works like a stereotypical set method (i.e., void return, mutates a class attribute)
  3. Carefully document the reasoning behind using set while also returning a value
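The following minimal sketch covers resolutions 1 and 2. For the first, it uses the prefix `with`, the convention `java.time` follows for methods that return a modified copy (e.g., `LocalDate.withYear`). The term `change` suggested above would serve equally well. `Size`, `Orientation`, and `Layout` are hypothetical stand-ins for the example's types.

```java
// Hypothetical stand-ins for the Dimension and Orientation types in the example.
enum Orientation { HORIZONTAL, VERTICAL }

final class Size {
    final int width;
    final int height;

    Size(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

class Layout {
    private final Orientation orientation;
    private Size size = new Size(0, 0);

    Layout(Orientation orientation) {
        this.orientation = orientation;
    }

    // Resolution 1: a non-'set' verb for a method that returns a new value
    // and leaves both the receiver and the argument unchanged.
    Size withBreadth(Size target, int breadth) {
        return orientation == Orientation.VERTICAL
                ? new Size(breadth, target.height)
                : new Size(target.width, breadth);
    }

    // Resolution 2: a stereotypical setter (void return, mutates an attribute).
    void setBreadth(int breadth) {
        size = withBreadth(size, breadth);
    }

    Size getSize() {
        return size;
    }

    public static void main(String[] args) {
        Layout layout = new Layout(Orientation.HORIZONTAL);
        layout.setBreadth(7);
        System.out.println(layout.getSize().width + "x" + layout.getSize().height);
    }
}
```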
Expecting but not getting single instance The name of a method indicates that a single object is returned but the return type is a collection. Example: method getExpansion, which ends with a head-noun that is singular, but returns a List object.
        /**
          * Returns the expansion state for a tree.
          *
          * @return the expansion state for a tree
        */
        public List getExpansion() {
          return this.fExpansion;
        }
      
How to resolve:
  1. Correct the method name so that it is plural (e.g., getExpansions()).
Not implemented condition The comments of a method suggest a conditional behavior that is not implemented in the code. When the implementation is a default one (i.e., the condition is deliberately not handled), this should be documented. Example: method getChildren has a comment indicating that there should be a conditional within its body.
        /**
        * Returns the children of this object. When this object is
        * displayed in a tree, the returned objects will be this
        * element's children. Returns an empty array if this object
        * has no children.
        *
        * @param object The object to get the children for.
        */
        public Object[] getChildren(final Object o) {
          return new Object[0];
        }
      
How to resolve:
  1. Complete implementation of the method
  2. Document (i.e., update the comment) that the method is incomplete and does not implement the behavior indicated in its comment
Validation method does not confirm A validation method (e.g., a name starting with "validate", "check", or "ensure") does not confirm the validation. That is, the method neither provides a return value indicating whether the validation was successful, nor documents how to find out whether it was. Example: method checkCollision returns void even though its name indicates that it performs validation.
        public void checkCollision(final String before,
                                   final String after) {
          final boolean collision = before != null
              && before.equals(this._shortName) || after != null
              && after.equals(this._shortName);
          if (collision) {
            if (this._longName == null) {
              this._longName = this.getLongName();
            }
            this._displayName = this._longName;
          }
        }
      
How to resolve:
  1. Change method to return confirmation (i.e., true or false)
  2. Consider changing the name to avoid implication of validation behavior (i.e., avoid terms like check and is)
  3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular validation behavior
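Here is a minimal sketch of resolution 1. The method reports whether a collision was found, so callers can see the outcome of the check. `NameHolder` and its fields are hypothetical simplifications of the example. In particular, the long name is passed in instead of being computed by `getLongName()`.

```java
// Hypothetical simplification of the checkCollision example.
class NameHolder {
    private final String shortName;
    private final String longName;
    String displayName;

    NameHolder(String shortName, String longName) {
        this.shortName = shortName;
        this.longName = longName;
        this.displayName = shortName;
    }

    // Returns true when a collision was found (and the display name was switched
    // to the long name). String.equals(null) is false, so null arguments are safe.
    boolean checkCollision(String before, String after) {
        boolean collision = shortName.equals(before) || shortName.equals(after);
        if (collision) {
            displayName = longName;
        }
        return collision;
    }

    public static void main(String[] args) {
        NameHolder holder = new NameHolder("x", "pkg.x");
        System.out.println(holder.checkCollision(null, "x") + " " + holder.displayName);
    }
}
```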
Get method does not return The name suggests that the method returns something (e.g., name starts with "get" or "return") but the return type is void. The documentation should explain where the resulting data is stored and how to obtain it. Example: method getMethodBodies has a void return type but its name indicates that it is a getter method.
      protected void getMethodBodies(
        final CompilationUnitDeclaration unit,
        final int place) {
          //[Removed some code for conciseness]
          this.parser.scanner
            .setSourceBuffer(
              unit.compilationResult.compilationUnit
              .getContents());
          if (unit.types != null) {
            for (int i = unit.types.length; --i >= 0;) {
              unit.types[i].parseMethod(this.parser, unit);
            }
          }
      }
    
How to resolve:
  1. Change method to return correct entity.
  2. Consider changing the name to avoid the word get
  3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular getter behavior
Not answered question The name of a method is in the form of a predicate, whereas the return type is not Boolean. Example: method isValid with a void return type.
        public void isValid(final Object[] selection,
                            final StatusInfo res) {
          // only single selection
          if (selection.length == 1
              && selection[0] instanceof IFile) { 
              res.setOK();
          } else {
              res.setError(""); //$NON-NLS-1$
          }
        }
      
How to resolve:
  1. Change the method to return a boolean that answers the question its name asks.
  2. Consider changing the name so that it does not pose a yes/no question (e.g., avoid the word is)
  3. If the previous options are not available then thoroughly document method behavior, consider highlighting irregular predicate behavior
Transform method does not return The name of a method suggests the transformation of an object but there is no return value and it is not clear from the documentation where the result is stored. Example: method javaToNative has a void return type but indicates that it performs a transformation (i.e., type conversion).
        public void javaToNative(final Object object,
                                 final TransferData transferData) {
          final byte[] check =
              LocalSelectionTransfer.TYPE_NAME.getBytes();
          super.javaToNative(check, transferData);
        }
      
How to resolve:
  1. Change method to return correct entity.
  2. If the previous option is not available then thoroughly document method behavior, consider highlighting irregular transformation behavior
Expecting but not getting a collection The name of a method suggests that a collection should be returned, but a single object or nothing is returned. Example: method getStats with a Boolean return type, which makes the plurality of the method name difficult to understand.
        public boolean getStats() {
          return SAXParserBase._stats;
        }
      
How to resolve:
  1. Change the name of the method (and any related identifier names) so that it is singular instead of plural
Method name and return type are opposite The intent of the method suggested by its name is in contradiction with what it returns. Example: method disable with return type ControlEnableState. The words "disable" and "enable" having opposite meanings.
        public static ControlEnableState disable(Control w) {
          return new ControlEnableState(w);
        }
      
How to resolve:
  1. Change method name so that it aligns better with the return type (i.e., change disable to enable)
  2. Change type name to align better with method name (i.e., to ControlDisableState)
Method signature and comment are opposite The documentation of a method contradicts its declaration. Example: the name isNavigateForwardEnabled contradicts its comment, which documents "a back navigation", as "forward" and "back" are antonyms.
        /**
        *	Returns true if this listener has a target for a
        *	back navigation. Only one listener needs to return
        *	true for the back button to be enabled.
        */
        public boolean isNavigateForwardEnabled() {
          boolean enabled = false;
          if (this._isForwardEnabled == 1) { 
            enabled = true;
          } else {
            if (this._isForwardEnabled != 0) {
              enabled = this.navigateForward(false) != null;
            }
          }
          return enabled;
        }
      
How to resolve:
  1. Change the comment to specify that this method is for forward navigation
Says one but contains many The name of an attribute suggests a single instance, while its type suggests that the attribute stores a collection of objects. Example: attribute _target that is of type Vector. It is unclear whether a change affects one or multiple instances in the collection.
        Vector _target;
      
How to resolve:
  1. Change the identifier name to reflect plurality of its type (i.e., _target -> _targets)
Name suggests boolean but type is not The name of an attribute suggests that its value is true or false, but its declaring type is not Boolean. Example: attribute isReached that is of type int[] where the declared type and values are not documented.
        int[] isReached;
      
How to resolve:
  1. Change the name of the identifier to be more descriptive with respect to what kind of array it represents.
  2. Consider removing the word is and using a different term unless the array represents a sequence of appropriate (i.e., boolean-like) values
  3. If appropriate, consider using a boolean array
  4. Carefully document the data represented by the array, including the reasoning for its integer type and whether different integer values have different meanings
Says many but contains one The name of an attribute suggests multiple instances, but its type suggests a single one. Example: attribute stats that is of type Boolean. Documenting such inconsistencies avoids additional comprehension effort to understand the purpose of the attribute.
        private static boolean _stats = true;
      
How to resolve:
  1. Change identifier name to singular instead of plural
Attribute name and type are opposite The name of an attribute is in contradiction with its type as they contain antonyms. Example: attribute start that is of type MAssociationEnd. The use of antonyms can induce wrong assumptions.
        MAssociationEnd start = null;
      
How to resolve:
  1. Change identifier name to align with type name (i.e., change start to end).
Attribute signature and comment are opposite The declaration of an attribute is in contradiction with its documentation. Example: attribute INCLUDE_NAME_DEFAULT whose comment documents an "exclude pattern". Whether the pattern is included or excluded is thus unclear.
        /**
        *	Configuration default exclude pattern,
        *	ie .*\/@href|.*\/@action|frame/@src
        */
        public final static String INCLUDE_NAME_DEFAULT
          = ".*/@href=|.*/@action=|frame/@src=";
      
How to resolve:
  1. Change identifier name to align with comment (i.e., include -> exclude)
  2. Change comment to align with attribute name (i.e., exclude -> include)

Naming Styles

Naming style concerns the lexical structure of an identifier name. The three most common naming styles are camelCase, under_score, and PascalCase. Prior research [3] found that camelCase and under_score do not differ significantly in how they affect developers' comprehension, as long as developers have training or experience with the given style. Notably, the same paper found that camelCase has a slight comprehension advantage for shorter identifier names. This observation is supported by [4] and [5]. The importance of naming style was further emphasized in a study of developer opinions on identifier naming practices [6].

Because there is no data suggesting that one naming style is better than the others overall, the most important thing is for a development project to pick a naming style and use it consistently throughout the code.

camelCase: The first letter of each word in an identifier, except the first word, is capitalized. Example: getFullName()
under_score: An underscore is placed between each pair of adjacent words in the identifier. Example: call_with_default()
PascalCase: The first letter of each word in an identifier, including the first word, is capitalized. Example: NewObject()
kebab-case: A variant of under_score in which a dash (-) separates the words; it is used in languages that allow dashes in identifier names, such as Lisp and Forth. Example: employee-name
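Consistency sometimes requires converting between these styles, for example when wrapping a library written in a different convention. `NamingStyles` is a hypothetical helper that assumes ASCII identifiers. It splits an identifier into words at underscores, dashes, and lowercase-to-uppercase boundaries, then rejoins the words in the target style. Acronyms are not handled specially: an acronym followed by another word, as in `URLParser`, is not split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical converter between the naming styles listed above.
class NamingStyles {
    // Splits camelCase, PascalCase, under_score, or kebab-case into lowercase words.
    static List<String> words(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split("[_-]|(?<=[a-z0-9])(?=[A-Z])")) {
            if (!part.isEmpty()) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    static String toUnderScore(String identifier) {
        return String.join("_", words(identifier));
    }

    static String toKebabCase(String identifier) {
        return String.join("-", words(identifier));
    }

    static String toPascalCase(String identifier) {
        StringBuilder result = new StringBuilder();
        for (String word : words(identifier)) {
            result.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
        }
        return result.toString();
    }

    static String toCamelCase(String identifier) {
        String pascal = toPascalCase(identifier);
        return pascal.isEmpty()
                ? pascal
                : Character.toLowerCase(pascal.charAt(0)) + pascal.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(toUnderScore("getFullName"));
        System.out.println(toCamelCase("call_with_default"));
    }
}
```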

References

  1. Christian D. Newman, Reem S. Alsuhaibani, Michael J. Decker, Anthony Peruma, Dishant Kaushik, Mohamed Wiem Mkaouer, Emily Hill, On the generation, structure, and semantics of grammar patterns in source code identifiers, Journal of Systems and Software, 2020, 110740, ISSN 0164-1212, https://doi.org/10.1016/j.jss.2020.110740. (http://www.sciencedirect.com/science/article/pii/S0164121220301680)

  2. Arnaoudova, V., Di Penta, M. & Antoniol, G. Linguistic antipatterns: what they are and how developers perceive them. Empir Software Eng 21, 104–158 (2016). https://doi.org/10.1007/s10664-014-9350-8

  3. Binkley, D., Davis, M., Lawrie, D. et al. The impact of identifier style on effort and comprehension. Empir Software Eng 18, 219–276 (2013). https://doi.org/10.1007/s10664-012-9201-4

  4. D. Binkley, M. Davis, D. Lawrie and C. Morrell, "To camelcase or under_score," 2009 IEEE 17th International Conference on Program Comprehension, 2009, pp. 158-167, doi: https://doi.org/10.1109/ICPC.2009.5090039.

  5. B. Sharif and J. I. Maletic, "An Eye Tracking Study on camelCase and under_score Identifier Styles," 2010 IEEE 18th International Conference on Program Comprehension, 2010, pp. 196-205, doi: https://doi.org/10.1109/ICPC.2010.41.

  6. R. S. Alsuhaibani, C. D. Newman, M. J. Decker, M. L. Collard and J. I. Maletic, "On the Naming of Methods: A Survey of Professional Developers," 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021, pp. 587-599, doi: https://doi.org/10.1109/ICSE43902.2021.00061.

Acknowledgements

This material is based in part upon work supported by the National Science Foundation under Grant No. 1850412.

Webpage

This page is currently supported by the SCANL lab. If other research labs join this effort, we will list their webpages here as well.

Contribute

If you are interested in correcting something in this document, open an issue! If you would like to add to or otherwise contribute to this document, or if you are interested in our research and want to ask questions, please email: [email protected]