The following is actually a commit message that i wrote for a bunch of refactoring i did on our code base (propreitary), to explain to our developers on some of the readability concerns with source code. After writing i realized that this would make up a good blog post because apart from the filenames (which i masked) the message is generic and is applicable elsewhere.

Writing for readability

Most of the refactoring done as part of this commit is an attempt to emphasise on writing code for readability. Readability comes with elegance and simplicity. Elegance and simplicity improve the readability and understanding of the code which improves quality.

Let's look into the things that should be considered while writing code or refactoring code. For learning purposes, i am going to write the list of files that are refactored and different sections in this article you should read to understand the reason behind the changes in the file.

Now go through the changes in the following file,

<filename1>.c#

  • Read Column Width Section
  • Read Double Negation Section
  • Read Progressively discard assumptions Section
  • Read Write self-expressive code

<filename2>.c#

  • Read Partial Classes
  • Read Writing self-expressive code
  • Read Naming Identifiers
  • Read Logging Metadata

filename3>.c#

  • Read Logging Metadata

<filename4>.c#

  • Read Test Content is not Test

Column Width

I see that a line of code goes beyond 80 characters. This is not helpful for readability. Every line of code should be atmost 80 characters (or closer). This is followed not just in code but in text books as well for improving reading ergonomics. The first ever IBM machine accepted punched cards that are 80 characters length. (although there were other that were lesser)

But why does the code span beyond 80 characters. Think of it !

  • Possibly you are too nested into a if /else block or a loop, or both :-), so a simple assignment is beyond 80 characters width of the screen.
  • You are consolidating more than one statement into one. This is very common, and must be avoided. Take for example the following line of code

var configuration = context.HttpContext.RequestServices.GetRequiredService<IConfiguration>();

Particularly with OOP languages, we spend a lot of time on objects IO than computation. I would say almost all the time. So it's very important that we handle assignments and object access. The above line should be split into multiple (or) at least 2 parts. 1) to get the object that gives the functionality 2) invoking the functionality of the object. This way you know whether a failure is with the accessor (or) the functionality. Most importantly readability.

Another example
if (!context.HttpContext.Request.Headers.TryGetValue(configuration["AppSettings:HeaderName"], out var potentialHeaderValue) || !configuration["AppSettings:HeaderValue"].Equals(potentialHeaderValue))

Again the issue here is trying to do everything within the conditional statement. There is nothing we will gain out of it. Even if you split this class into multiple statements, the compiler will figure out how to optimize them. Read this article by Eric Lippert on a gist of different things the compiler does. Also refer this if you are interested. You should be.

Double Negation

if(not(not available) || not something == other)
{
  do something
}

Double negatives are just plain confusing. So kill them on sight.Only use positive conditions. Read this article from Martin Fowler.

Why we have double negatives.

There are a few cases. I will try to list some of them.

  • Your condition depends on another condition that is opposite of what you want to evaluate to .
    For example, let's say you want to do something, if an object doesn't exist in a collection. And you don't have to do anything if the object exists, you possibly end up in a double negative condition. It is perfectly fine to do the following (building on top of the above if statement)
if(available)
{
  break / continue / return;
}
else
{
  do what you want to do `
}

A statement like the above is perfectly fine.

Progressively discard assumptions

This is particularly about nested conditions and writing the whole function or most of it inside the function. This reduces readability and forces the developer to keep something or the other in mind, that would he would not have to remember if conditions were discarded progressively. Example

function someFunction()
{
  if(condition == true)
  {
    //some line of code
    if(some other condition)
    {
        //some more lines of code
    }
  }
}	

In the above program consider discarding conditions instead

function someFunction()
{
  if(condition == false)
     return;
  //some line of code
  if(some other condition)
    return;
  //some more lines of code
}	

This way we don't need to remember (or) account a lot till the end of the function, improves readability and also fits statements into 80 character width because of less indentation.

Write Self Expressive Code

Self expressive code is about naming the identifiers properly and expressing logic into functions that are atomic and clearly articulate the purpose. The idea behind this is to understand the code easily at a glance, against reading it fully to understand the purpose. If we are concerned about the verbosity and it's implications on performance, we don't need to. The functions that are expressed separately are inlined according to the expression and they are optimized before they run in production.
Let's take an example

if(header == null || headers[getconfig(name) == null || headers[getconfig(name)] != expected value))
{
    this.Result  = new UnauthorizedResult();
    return;
}

In the above code, if we read it for few minutes we can understand what it's trying to do. It verifies the incoming header and it's value against the configuration and if they don't match it changes the state of the system to Unauthorised result.

But how about hiding all of that inside something like

if(IsAuthenticationFailure())
{
    this.Result  = new UnauthorizedResult();
    return;
}

private bool IsAuthenticationFailure()
{
    return 
      header == null || 
      headers[getconfig(name) == null || 
      headers[getconfig(name)] != expected value)
}

The only difference here is that if i read the first half of the code the idea will be clear. Someone doesn't really need to understand the intrinsics of it unless it has a bug.

Naming Identifiers

Variable names should not be redundant when read in a sentence. For example
IFileContentGenerator fileContentGenerator = new FileContentGenerator();
This is totally wrong. The golden rule is that we don't repeat when read as a sentence. This can be written as
var generator = new FileContentGenerator();

Another example that follows this is
var csvContent = this.fileContentGenerator.GenerateFileContent(fileContentGeneratorInput);

This is the redundancy i talk about. The file,content, generate is repeated 3 times in this statement that makes it difficult to understand on a high level read. This should instead be
var csv = generator.Generate(input);

This is not just an issue with user code, in .NET Framework there are some irritating methods like these
Directory.CreateDirectory(directoryName);
This is totally redundant, It could have been
Directory.Create(name);

The directoryName and name are parameter names as defined by the method. This is good enough to improve readability and put it in a concise manner. For further read on naming variables.

Partial Classes

Partial classes are useful when we want to separate the core logic and the helper functions associated with the core logic. This again improves readability and a chance to avoid merge conflicts in code. Let's assume that there is a class that has 1000 lines of code and does 6 different things. Don't hesitate to write the class into 6 files using partial classes. That way it gives clarity and a chance to decouple the code.
Particularly with ASP.NET Controllers , we can write all the API's in one file and it's internal logic into one or more separate files. This is not a replacement for single responsibility, but a way to improve the segregation of code.

Logging Metadata

When we log information we want to write the log with all the details, and in pursuit of that we end of making the logging statement too heavy and give it more responsibility. Consider the following statement

this._logger.LogInformation($"Processed the request for a network: {fileContentGeneratorInput.Name} and data type: {fileContentGeneratorInput.Type} and version: {fileContentGeneratorInput.Version}");

To capture context we express it heavily in the log statement and it repeats multiple times, because we need the context in all the places. Instead consider something like the following

logger.LogInformation($"Processed the request for input: {context});

and override the ToString() method for the object

public override string ToString()
{
    return $"{Name} and data type: {Type} and version {Version}";
}

Test content is not test

A lot of code in test will eventually be setup than invoking code under test and assertions. But if it starts to outweigh the test code, we either need to think of our design (or) we are trying to cover a large portion of code under one test. If both of them are not the case then we can safely say that we can move the test data to separate class and use it in test code to source dependency setup.
When writing such a class don't try to apply the DRY principle extensively, and try to write methods that are reused. Keep it repeated and simple so that the test code doesn't have any errors.

Example

fileGeneratorInput = new FileContentGeneratorInput()
{
  NetworkName = "Dummy Network",
  DataType = "Providers",
  Version = "V1"
};

fileContentGenerator.Setup(a => a.GenerateFileContent(It.IsAny<FileContentGeneratorInput>())).Throws(new Exception());
var controller = new FileController(_fileControllerLogger.Object, fileContentGenerator.Object);
Assert.Catch<Exception>(() => controller.Get(fileGeneratorInput));

The first part of it can be instead be written as follows inside a separate class . Here i have named the object tee. But the idea is that it returns the controller object.

var controller = tee.ControllerWithGeneratorThrowingException();
Assert.Catch<Exception>(() => controller.Get(tee.ValidRoster()));

There are lot of other concerns, but these are the ones that i reviewed in the code base i was working. If you find other interesting readability issues in code, post your comments.

Thanks !