Java Incremental Project Improvement

Our Use Case and Tools

EDITED 03/11/2023 Adding PMD suppression explanations.

I have recently decided to try to improve the code base we have been working on for 4 months or so by installing some static code analysis tools. The two main libraries in question are PMD¹ and Checkstyle - old classics of the Java world that still get regular updates despite their age.
Besides those two we also have a more modern testing library called ArchUnit² to add some architecture rules and guidelines to the project.

To nobody’s surprise the result of the initial checks were disastrous. We had more than 2000 PMD error violations and even more checkstyle warnings.

My initial approach was to try to fix and clean as many things I could possible do during free afternoon, but I quickly ended up in an entangled web of refactoring leading nowhere. Frustration quickly settled in and I was forced to call in the day with a very bad mood and no improvement at all.
If any progress was to be made I would have to find a way to break the problem into smaller bite-sized incremental steps.

PMD configuration

My first idea was to disable most of the PMD checks using exclude statements and fix the rest.

<rule ref="category/java/bestpractices.xml">
    <exclude name="GuardLogStatement"/>
    <exclude name="LiteralsFirstInComparisons"/>
    ...
</rule>

<rule ref="category/java/design.xml">
    <exclude name="LawOfDemeter"/>
    <exclude name="DataClass"/>
    <exclude name="ImmutableField"/>
    <exclude name="TooManyFields"/>
    ...
</rule>

This improved a little bit the quality of our code but there was still a glaring issue - new code could still be added that violates all of the excluded rules. We also had no idea how much work needed to be done to activate any of the disabled rules.

Our sprint was closing to an end, and we had a little bit of free time in between retrospective meetings and such.
This is when I had the idea to just enable all PMD checks and simply count the errors we had. We would then go over the list and decide which person in the team would fix which failing rules.

If we do this at the end of each sprint we should in theory be able to progressively activate more and more rules and hence improve quality gradually.

Maven Config

I quickly added some maven configuration for the pmd plugin :

Adding two pmd config files pmd-full.xml with all rules enabled and pmd.xml with our current state.

PMD plugin in maven

 <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-pmd-plugin</artifactId>
     <version>3.21.0</version>
     <configuration>
         <failOnViolation>true</failOnViolation>
         <failurePriority>3</failurePriority>
         <printFailingErrors>true</printFailingErrors>
         <maxAllowedViolations>5</maxAllowedViolations>
         <rulesets>
             <ruleset>${pmd-file}</ruleset>
         </rulesets>
     </configuration>
     <executions>
         <execution>
             <goals>
                 <goal>check</goal>
             </goals>
         </execution>
     </executions>
 </plugin>

And to switch between the two configuration files:

 <profiles>
     <profile>
         <id>pmd-full</id>
         <properties>
             <pmd-file>pmd-full.xml</pmd-file>
         </properties>
     </profile>
 </profiles>

Now running mvn pmd:check -Ppmd-full -pl :project-id will run maven with pmd-full profile that will load the full pmd check and running with no profile will run the default (current) pmd.xml.

Reactivating Rules

The next step was to start activating the excluded rules one-by-one by fixing whatever we can and ignoring the rest with the annotation:

Run pmd for your module
```
 mvn pmd:check -pl :my-module
```
Identify a rule that you wish to reactivate
Fix as much violations as you while time-boxing yourself to avoid endless refactoring.

Suppress the rest with annotations

 @SuppressWarnings("PMD.UnusedLocalVariable")

Hacking a Script

Let’s add a simple shell script that will clean the output of the command and then do sort and count so we get a nice output:

source_code_health.sh

if [ -z "$1" ]; then
  echo "Usage: source_code_health.sh <module to analyse>.
  exit 1
fi

mkdir -p "code_health"

mvn pmd:check -Ppmd-full -pl ":$1" > result.pmd
awk '/\[INFO\] PMD Failure/ {print $0}' result.pmd | sed 's/.*Rule:\([^ ]*\).*/\1/' | sort | uniq -c | tee "code_health/code_health_$1_$(date +'%m-%d-%Y')"

And here is the output result:

➜  my-project git:(feat/code_health) ./source_code_health.sh project-id
AtLeastOneConstructor
AvoidDuplicateLiterals
AvoidFieldNameMatchingMethodName
AvoidFieldNameMatchingTypeName
AvoidLiteralsInIfCondition
CallSuperInConstructor
CollapsibleIfStatements
CommentDefaultAccessModifier
DataClass
ExcessiveParameterList
ExcessivePublicCount
ImmutableField
LawOfDemeter
LiteralsFirstInComparisons
LocalVariableCouldBeFinal
LongVariable
MethodArgumentCouldBeFinal
MissingSerialVersionUID
NullAssignment
OnlyOneReturn
PrematureDeclaration
ShortVariable
SignatureDeclareThrowsException
TooManyFields
TooManyMethods
UnnecessaryCaseChange
UnnecessaryConstructor
UnnecessaryLocalBeforeReturn
UseLocaleWithCaseConversions
UseObjectForClearerAPI
UseUtilityClass

The file is also stored in code_health directory with a date stamp allowing us to commit and track how things evolve.

Next Steps

It would be interesting to add a git-hook and ci/cd step that compares the previous code health with the current.

We have started separating the rule violations into lots and assigning them to developers at the end of a sprint. This will effectively allow us to parallelize the work and measure the progress with the new source_code_health.sh script.

Archunit

The other tool we used to improve code quality was ArchUnit which is basically a unit testing framework for your architecture.

Of course we faced the same issue when we tried implementing our first rules. For example, there were tens of classes that did not respect our newly established rule that no Command class should reside outside of com.project.application package.

So instead of completely scrapping that rule or spending days on refactoring we ended up ignoring the list of offending classes by using the Archunit ImportOption API:

@DisplayName("Command Handlers")
class CommandHandlersArchTest {
  List<String> badClasses = List.of(
      AccountHandler.class.getSimpleName(),
      CompanyHandler.class.getSimpleName(),
      ...
      and many others
      ...
  );

  ImportOption knownBadFiles = (Location location) -> {
    for (String badClass : badClasses) {
      if (location.contains(badClass)) {
        return false;
      }
    }

    return true;
  };

  @Test
  @DisplayName("Should be in the application layer")
  void noCommandsOutsideAppLayer() {
    JavaClasses importedClasses = new ClassFileImporter()
        .withImportOption(knownBadFiles)
        .importPackages("com.project");

    ArchRule rule = classes().that().implement(Command.Handler.class).should()
        .resideInAPackage("com.project.application");

    rule.check(importedClasses);
  }
}

Once again we ended up with a rule that protects any newly added code and that clearly lists the remaining work to be done on the legacy. We can now easily divide it among colleagues and advance in small increments whenever we have free time.

Conclusion

When facing a big legacy codebase that needs improvement we need to divide and conquer the problem by finding a way to improve the code incrementally and measure the difference between each increment.

Do any alternative or better solutions exist for the problems discussed in this article ? Probably yes (PMD incremental checks maybe). It is however worth noting that even with rudimentary scripting knowledge it still relatively easy to hack a shell script and achieve the end goal albeit adding some manual steps.

Notes and References

PMD check plugin: https://docs.pmd-code.org/latest/pmd_userdocs_tools_maven.html ↩
Archunit user guide: https://www.archunit.org/userguide/html/000_Index.html ↩