Record (computer science)

In computer science, a record (also called a structure, struct, or compound data) is a basic data structure. Records in a database or spreadsheet are usually called "rows".[1][2][3][4]

A record is a collection of fields, possibly of different data types, typically in a fixed number and sequence.[5] The fields of a record may also be called members, particularly in object-oriented programming; fields may also be called elements, though this risks confusion with the elements of a collection.

For example, a date could be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field. A personnel record might contain a name, a salary, and a rank. A Circle record might contain a center and a radius—in this instance, the center itself might be represented as a point record containing x and y coordinates.

Records are distinguished from arrays by the fact that their number of fields is determined in the definition of the record, and by the fact the records are a heterogenous data type; not all of the fields must contain the same type of data.[6]

A record type is a data type that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed. In type theory, product types (with no field names) are generally preferred due to their simplicity, but proper record types are studied in languages such as System F-sub. Since type-theoretical records may contain first-class function-typed fields in addition to data, they can express many features of object-oriented programming.

Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tapes or hard disks. Records are a fundamental component of most data structures, especially linked data structures. Many computer files are organized as arrays of logical records, often grouped into larger physical records or blocks for efficiency.

The parameters of a function or procedure can often be viewed as the fields of a record variable; and the arguments passed to that function can be viewed as a record value that gets assigned to that variable at the time of the call. Also, in the call stack that is often used to implement procedure calls, each entry is an activation record or call frame, containing the procedure parameters and local variables, the return address, and other internal fields.

An object in object-oriented language is essentially a record that contains procedures specialized to handle that record; and object types are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects, and are known as plain old data structures (PODSs), to contrast with objects that use OO features.

A record can be viewed as the computer analog of a mathematical tuple, although a tuple may or may not be considered a record, and vice versa, depending on conventions and the specific programming language. In the same vein, a record type can be viewed as the computer language analog of the Cartesian product of two or more mathematical sets, or the implementation of an abstract product type in a specific language.

Keys edit

A record may have zero or more keys. A key maps an expression to a value, or a set of values, in the record. A primary key is unique throughout all stored records; only one of this key exists.[7] In other words, no duplicate may exist for any primary key. For example an employee file might contain employee number, name, department, and salary. The employee number will be unique in the organization and would be the primary key. Depending on the storage medium and file organization the employee number might be indexed—that is also stored in a separate file to make lookup faster. The department code is not necessarily unique; it may also be indexed, in which case it would be considered a secondary key, or alternate key.[8] If it is not indexed the entire employee file would have to be scanned to produce a listing of all employees in a specific department. Keys are usually chosen in a way that minimizes the chances of multiple values being feasibly mapped to by one key. For example, the salary field would not normally be considered usable as a key since many employees will likely have the same salary. Indexing is one factor considered when designing a file.

History edit

Journal sheet from 1880 United States Census, showing tabular data with rows of data, each a record corresponding to a single person.

The concept of a record can be traced to various types of tables and ledgers used in accounting since remote times. The modern notion of records in computer science, with fields of well-defined type and size, was already implicit in 19th century mechanical calculators, such as Babbage's Analytical Engine.[9][10]

Hollerith punched card (1895)

The original machine-readable medium used for data (as opposed to control) was punch card used for records in the 1890 United States Census: each punch card was a single record. Compare the journal entry from 1880 and the punch card from 1895. Records were well-established in the first half of the 20th century, when most data processing was done using punched cards. Typically each record of a data file would be recorded in one punched card, with specific columns assigned to specific fields. Generally, a record was the smallest unit that could be read in from external storage (e.g. card reader, tape or disk). The contents of punchcard-style records were originally called "unit records" because since punchcards had pre-determined document lengths.[11] When storage systems became more advanced with the use of hard drives and magnetic tape, variable-length records became the standard. A variable-length record is a record in which the size of the record in bytes is approximately equal to the sum of the sizes of its fields. This was not possible to do before more advanced storage hardware was invented because all of the punchcards had to conform to pre-determined document lengths that the computer could read, since at the time the cards had to be physically fed into a machine.

Most machine language implementations and early assembly languages did not have special syntax for records, but the concept was available (and extensively used) through the use of index registers, indirect addressing, and self-modifying code. Some early computers, such as the IBM 1620, had hardware support for delimiting records and fields, and special instructions for copying such records.

The concept of records and fields was central in some early file sorting and tabulating utilities, such as IBM's Report Program Generator (RPG).

COBOL was the first widespread programming language to support record types,[12] and its record definition facilities were quite sophisticated at the time. The language allows for the definition of nested records with alphanumeric, integer, and fractional fields of arbitrary size and precision, as well as fields that automatically format any value assigned to them (e.g., insertion of currency signs, decimal points, and digit group separators). Each file is associated with a record variable where data is read into or written from. COBOL also provides a MOVE CORRESPONDING statement that assigns corresponding fields of two records according to their names.

The early languages developed for numeric computing, such as FORTRAN (up to FORTRAN IV) and Algol 60, did not have support for record types; but later versions of those languages, such as FORTRAN 77 and Algol 68 did add them. The original Lisp programming language too was lacking records (except for the built-in cons cell), but its S-expressions provided an adequate surrogate. The Pascal programming language was one of the first languages to fully integrate record types with other basic types into a logically consistent type system. The PL/I programming language provided for COBOL-style records. The C programming language initially provided the record concept as a kind of template (struct) that could be laid on top of a memory area, rather than a true record data type. The latter were provided eventually (by the typedef declaration), but the two concepts are still distinct in the language. Most languages designed after Pascal (such as Ada, Modula, and Java), also supported records.

Although records are not often used in their original context anymore (i.e. being used solely for the purpose of containing data), records influenced newer object oriented programming languages and relational database management systems. Since records provided more modularity in the way data was stored and handled, they are better suited at representing complex, real-world concepts than the primitive data types provided by default in languages. This influenced later languages such as C++, Python, JavaScript, and Objective-C which address the same modularity concerns of programmers.[13] Objects in these languages are essentially records with the addition of methods and inheritance, which allow programmers to manipulate the way data behaves instead of only the contents of a record. Many programmers regard records as obsolete now since object-oriented languages have features that far surpass what records are capable of. On the other hand, many programmers argue that the low overhead and ability to use records in assembly language make records still relevant when programming with low levels of abstraction. Today, the most popular languages on the TIOBE index, an indicator of the popularity of programming languages, have been influenced in some way by records due to the fact that they are object oriented.[14] Query languages such as SQL and Object Query Language were also influenced by the concept of records. These languages allow the programmer to store sets of data, which are essentially records, in tables.[15] This data can then be retrieved using a primary key. The tables themselves are also records which may have a foreign key: a key that references data in another table.   

Operations edit

  • Declaration of a new record type, including the position, type, and (possibly) name of each field;
  • Declaration of variables and values as having a given record type;
  • Construction of a record value from given field values and (sometimes) with given field names;
  • Selection of a field of a record with an explicit name;
  • Assignment of a record value to a record variable;
  • Comparison of two records for equality;
  • Computation of a standard hash value for the record.

The selection of a field from a record value yields a value.

Some languages may provide facilities that enumerate all fields of a record, or at least the fields that are references. This facility is needed to implement certain services such as debuggers, garbage collectors, and serialization. It requires some degree of type polymorphism.

In systems with record subtyping, operations on values of record type may also include:

  • Adding a new field to a record, setting the value of the new field.
  • Removing a field from a record.

In such settings, a specific record type implies that a specific set of fields are present, but values of that type may contain additional fields. A record with fields x, y, and z would thus belong to the type of records with fields x and y, as would a record with fields x, y, and r. The rationale is that passing an (x,y,z) record to a function that expects an (x,y) record as argument should work, since that function will find all the fields it requires within the record. Many ways of practically implementing records in programming languages would have trouble with allowing such variability, but the matter is a central characteristic of record types in more theoretical contexts.

Assignment and comparison edit

Most languages allow assignment between records that have exactly the same record type (including same field types and names, in the same order). Depending on the language, however, two record data types defined separately may be regarded as distinct types even if they have exactly the same fields.

Some languages may also allow assignment between records whose fields have different names, matching each field value with the corresponding field variable by their positions within the record; so that, for example, a complex number with fields called real and imag can be assigned to a 2D point record variable with fields X and Y. In this alternative, the two operands are still required to have the same sequence of field types. Some languages may also require that corresponding types have the same size and encoding as well, so that the whole record can be assigned as an uninterpreted bit string. Other languages may be more flexible in this regard, and require only that each value field can be legally assigned to the corresponding variable field; so that, for example, a short integer field can be assigned to a long integer field, or vice versa.

Other languages (such as COBOL) may match fields and values by their names, rather than positions.

These same possibilities apply to the comparison of two record values for equality. Some languages may also allow order comparisons ('<'and '>'), using the lexicographic order based on the comparison of individual fields.[citation needed]

PL/I allows both of the preceding types of assignment, and also allows structure expressions, such as a = a+1; where "a" is a record, or structure in PL/I terminology.

Algol 68's distributive field selection edit

In Algol 68, if Pts was an array of records, each with integer fields X and Y, one could write Y of Pts to obtain an array of integers, consisting of the Y fields of all the elements of Pts. As a result, the statements Y of Pts[3] := 7 and (Y of Pts)[3] := 7 would have the same effect.

Pascal's "with" statement edit

In the Pascal programming language, the command with R do S would execute the command sequence S as if all the fields of record R had been declared as variables. Similarly to entering a different namespace in an object-oriented language like C#, it is no longer necessary to use the record name as a prefix to access the fields. So, instead of writing Pt.X := 5; Pt.Y := Pt.X + 3 one could write with Pt do begin X := 5; Y := X + 3 end.

Representation in memory edit

The representation of records in memory varies depending on the programming languages. Usually the fields are stored in consecutive positions in memory, in the same order as they are declared in the record type. This may result in two or more fields stored into the same word of memory; indeed, this feature is often used in systems programming to access specific bits of a word. On the other hand, most compilers will add padding fields, mostly invisible to the programmer, in order to comply with alignment constraints imposed by the machine—say, that a floating point field must occupy a single word.

Some languages may implement a record as an array of addresses pointing to the fields (and, possibly, to their names and/or types). Objects in object-oriented languages are often implemented in rather complicated ways, especially in languages that allow multiple class inheritance.

Self-defining records edit

A self-defining record is a type of record which contains information to identify the record type and to locate information within the record. It may contain the offsets of elements; the elements can therefore be stored in any order or may be omitted.[16] The information stored in a self-defining record can be interpreted as metadata for the record, which is similar to what one would expect to find in the UNIX metadata regarding a file, containing information such as the record's creation time and the size of the record in bytes. Alternatively, various elements of the record, each including an element identifier, can simply follow one another in any order.

See also edit

References edit

  1. ^ "Computer Science Dictionary Definitions". Computing Students. Retrieved Jan 22, 2018.
  2. ^ Radványi, Tibor (2014). Database Management Systems. Eszterházy Károly College. p. 19. Archived from the original on 2018-09-23. Retrieved 23 September 2018.
  3. ^ Kahate, Atul (2006). Introduction to Database Management Systems. Pearson. p. 3. ISBN 978-81-317-0078-5. Retrieved 23 September 2018.
  4. ^ Connolly, Thomas (2004). Database Solutions: A Step by Step Guide to Building Databases (2nd ed.). Pearson. p. 7. ISBN 978-0-321-17350-8.
  5. ^ Felleisen, Matthias (2001). How To Design Programs. MIT Press. pp. 53, 60. ISBN 978-0262062183.
  6. ^ Pape, Tobias; Kirilichev, Vasily; Bolz, Carl Friedrich; Hirschfeld, Robert (2017-01-13). "Record data structures in racket: usage analysis and optimization". ACM SIGAPP Applied Computing Review. 16 (4): 25–37. doi:10.1145/3040575.3040578. ISSN 1559-6915. S2CID 14306162.
  7. ^ "Add or change a table's primary key in Access". Retrieved 2022-03-01.
  8. ^ "Alternate key - Oracle FAQ". Retrieved 2022-03-01.
  9. ^ Bromley, Allan (October 1998). "Charles Babbage's Analytical Engine, 1838". IEEE Annals of the History of Computing. 20 (4): 29–45. doi:10.1109/85.728228. S2CID 2285332. Retrieved 23 September 2018.
  10. ^ Swade, Doron. "Automatic Computation: Charles Babbage and Computational Method". The Rutherford Journal. Retrieved 23 September 2018.
  11. ^ Edwin D. Reilly; Anthony Ralston; David Hemmendinger, eds. (2003). Encyclopedia of computer science (4th ed.). Chichester, UK: Wiley. ISBN 978-1-84972-160-8. OCLC 436846454.
  12. ^ Sebesta, Robert W. (1996). Concepts of Programming Languages (Third ed.). Addison-Wesley Publishing Company, Inc. p. 218. ISBN 0-8053-7133-8.
  13. ^ Leavens, Gary T.; Weihl, William E. (1990). "Reasoning about object-oriented programs that use subtypes". Proceedings of the European conference on object-oriented programming on Object-oriented programming systems, languages, and applications - OOPSLA/ECOOP '90. New York, New York, USA: ACM Press. pp. 212–223. doi:10.1145/97945.97970. ISBN 0-201-52430-9. S2CID 46526.
  14. ^ "index | TIOBE - The Software Quality Company". Retrieved 2022-03-01.
  15. ^ "What is a Relational Database (RDBMS)?". Oracle. Retrieved February 28, 2022.
  16. ^ Kraimer, Martin R. "EPICS Input / Output Controller (IOC) Application Developer's Guide". Argonne National Laboratory. Retrieved November 25, 2015.