Introduction

A data clump is a code smell in which a group of related data items is repeatedly passed around together. The items are often primitive values that would be better grouped into an object. Data clumps usually arise from poor program structure or from not following object-oriented principles. They can also result from excessive "copy and paste programming".

Description

Code refactoring is the process of improving the internal structure of code without changing its external behavior. An important part of this process is identifying code smells. According to Martin Fowler, "a code smell is a surface indication that usually corresponds to a deeper problem in the system". A data clump is a code smell in which the same two or three data items are passed around together throughout a program, for example a start and an end variable, or the length, breadth and height of an object. Such recurrence leads to duplicated code. It also indicates a missing abstraction, which makes the code harder to understand.
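
To make the start/end case concrete, the following is a minimal sketch of a clump and of the abstraction it hides. The names `ClumpedRanges` and `Range` are hypothetical and chosen only for illustration, and integer bounds are an assumption.

```java
// Hypothetical example: the pair (start, end) travels through every signature.
class ClumpedRanges {
    static boolean overlaps(int startA, int endA, int startB, int endB) {
        return startA <= endB && startB <= endA;
    }

    static int length(int start, int end) {
        return end - start;
    }
}

// The missing abstraction: one object that owns both values and their behaviour.
final class Range {
    private final int start;
    private final int end;

    Range(int start, int end) {
        // The invariant is checked once here instead of in every method.
        if (end < start) {
            throw new IllegalArgumentException("end must not be less than start");
        }
        this.start = start;
        this.end = end;
    }

    boolean overlaps(Range other) {
        return start <= other.end && other.start <= end;
    }

    int length() {
        return end - start;
    }
}
```

With `Range`, a call such as `a.overlaps(b)` replaces a four-argument call in which the arguments could be passed in the wrong order without any compiler error.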

Issues with data clumps and how to identify them

Leaving data clumps in code has several downsides, one of them being poor reusability. Because the values in a clump belong together but are stored and passed separately, a change to one of them may not be reflected everywhere it is used. For example, the dimensions of an object (length, breadth and so on) are needed to define it and to compute properties such as its area and perimeter. If one of the dimensions changes, every place that holds a copy of it has to be updated. If the dimensions were instead packed into an object, a single call to a mutator method would make the change, and it would be visible to every part of the system that shares that object.
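
The dimensions case can be sketched as follows. This is an illustrative sketch only: `Dimensions`, `Shelf` and `ShippingLabel` are hypothetical names, and sharing one mutable object between several users is an assumption made to mirror the mutator-based description above.

```java
// Hypothetical class that groups the three dimensions of a box.
class Dimensions {
    private double length;
    private double breadth;
    private double height;

    Dimensions(double length, double breadth, double height) {
        this.length = length;
        this.breadth = breadth;
        this.height = height;
    }

    // Mutator: the single place where the length is changed.
    void setLength(double length) {
        this.length = length;
    }

    double baseArea() {
        return length * breadth;
    }

    double basePerimeter() {
        return 2 * (length + breadth);
    }

    double volume() {
        return length * breadth * height;
    }
}

// Two hypothetical users of the same Dimensions object.
class Shelf {
    private final Dimensions size;

    Shelf(Dimensions size) {
        this.size = size;
    }

    double footprint() {
        return size.baseArea();
    }
}

class ShippingLabel {
    private final Dimensions size;

    ShippingLabel(Dimensions size) {
        this.size = size;
    }

    double volume() {
        return size.volume();
    }
}
```

If a `Shelf` and a `ShippingLabel` are built from the same `Dimensions` instance, one call to `setLength` changes the result of both `footprint()` and `volume()`, with no second copy of the length to keep in sync.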

In some cases 10 to 15 parameters travel together, and passing them from one method to another produces code that is hard to read, hard to understand and difficult to maintain. A general rule of thumb for identifying a data clump: when a few parameters are repeatedly passed around together, delete one of them and check whether the remaining parameters still make sense. If they do not, the group is a data clump and should be combined into an object.

Refactoring code with data clumps

Extract Class

In this refactoring technique, a large class is broken down into several smaller classes, which helps each class follow the single responsibility principle. Classes that adhere to this principle tend to be more reliable and more tolerant of change. The data clump disappears because the related data items are passed around as a single object of the extracted class.

Java Example

// Before: Person holds data elements that should belong to a phone number class.
class Person {
    private String name;
    private String officeAreaCode;
    private String officeNumber;

    public String getOfficeAreaCode() {
        return officeAreaCode;
    }

    public void setOfficeAreaCode(String arg) {
        officeAreaCode = arg;
    }

    // other getters and setters
}

// After: the phone fields are extracted into their own class.
class TelephoneNumber {
    private String areaCode;
    private String number;

    // getters and setters
}

// Person is linked to TelephoneNumber by holding an object of that class
// and calling its getters and setters through that object.
class Person {
    private String name;
    private TelephoneNumber telephone = new TelephoneNumber();

    public String getAreaCode() {
        return telephone.getAreaCode();
    }

    // other getters and setters
    ...
}
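
The listing above elides most accessors. As a self-contained sketch of how the delegation could look once filled in, the version below can be compiled as written. The accessor names on `Person`, the `officeTelephone` field name and the `format()` helper are assumptions added for illustration and are not part of the original example.

```java
// Extracted class: owns the two values that used to travel together.
class TelephoneNumber {
    private String areaCode;
    private String number;

    String getAreaCode() {
        return areaCode;
    }

    void setAreaCode(String areaCode) {
        this.areaCode = areaCode;
    }

    String getNumber() {
        return number;
    }

    void setNumber(String number) {
        this.number = number;
    }

    // Hypothetical helper: behaviour that now has a natural home.
    String format() {
        return "(" + areaCode + ") " + number;
    }
}

// Person keeps its public accessors but delegates to TelephoneNumber.
class Person {
    private String name;
    private final TelephoneNumber officeTelephone = new TelephoneNumber();

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }

    String getOfficeAreaCode() {
        return officeTelephone.getAreaCode();
    }

    void setOfficeAreaCode(String arg) {
        officeTelephone.setAreaCode(arg);
    }

    String getOfficeNumber() {
        return officeTelephone.getNumber();
    }

    void setOfficeNumber(String arg) {
        officeTelephone.setNumber(arg);
    }

    String getOfficeTelephone() {
        return officeTelephone.format();
    }
}
```

Keeping the delegating accessors on `Person` means existing callers need not change, while new code can work with `TelephoneNumber` directly.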

Parameter Object

Whenever a group of data items recurs in the form of parameters, it is better to pack them into an object. This helps avoid code duplication. As an example, consider an application that creates a person from name and address details passed as separate parameters.

Java Example

public Person createPerson(final String lastname, final String firstname, final String middlename, final String streetAddress, final String city, final String state)
{
    ....
}

// Here, instead of passing a long parameter list, we can create two objects:
// one of type Name and the other of type Address.

public Person createPerson(final Name fullname, final Address fulladdress)
{
    ....
}

// The two classes can be defined as

public final class Name {
    final String lastname;
    final String firstname;
    final String middlename;
    ....
}

public final class Address {
    final String streetAddress;
    final String city;
    final String state;
    ....
}
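
The listing above leaves the class bodies elided. A compilable sketch of the same idea follows. The constructors, the `display()` and `singleLine()` helpers, the fields of `Person` and the `PersonFactory` wrapper are all assumptions introduced for illustration, not part of the original example.

```java
// Parameter object for the three name parts.
final class Name {
    final String lastname;
    final String firstname;
    final String middlename;

    Name(String lastname, String firstname, String middlename) {
        this.lastname = lastname;
        this.firstname = firstname;
        this.middlename = middlename;
    }

    // Hypothetical helper: formatting lives with the data it formats.
    String display() {
        return firstname + " " + middlename + " " + lastname;
    }
}

// Parameter object for the three address parts.
final class Address {
    final String streetAddress;
    final String city;
    final String state;

    Address(String streetAddress, String city, String state) {
        this.streetAddress = streetAddress;
        this.city = city;
        this.state = state;
    }

    // Hypothetical helper.
    String singleLine() {
        return streetAddress + ", " + city + ", " + state;
    }
}

// Hypothetical Person that simply stores the two parameter objects.
final class Person {
    final Name name;
    final Address address;

    Person(Name name, Address address) {
        this.name = name;
        this.address = address;
    }
}

// Hypothetical holder for the factory method from the listing above.
class PersonFactory {
    static Person createPerson(final Name fullname, final Address fulladdress) {
        return new Person(fullname, fulladdress);
    }
}
```

With six `String` parameters, swapping `city` and `state` at a call site compiles without complaint. With two distinct types, passing an `Address` where a `Name` is expected is rejected by the compiler, although the order of the strings inside each constructor still has to be checked by hand.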