Feedback on C++ solutions on interview training site AlgoExpert

Interviews for software engineer job position often involve coding interviews testing the basic coding and classical data structure and algorithm knowledge and understanding.

AlgoExport is a commercial site to train for coding interviews (including system design and behavior interviews). It is actually doing a good job at it. I have been using it when preparing for my job switch recently.

The coding aspect of the site consists of 155 coding questions. A list of all questions can be seen here. The unique selling point compared to Leetcode are the solutions and videos explaining the solutions in detail.

The license to get access to the training material costs around 100 USD. So we are not talking about a small hobby project here, but a commercial enterprise. I think this is important when we are talking about what the expectation on quality should be.

The solution always consists of running code and a comment indicating the runtime complexity. The solutions are available in 9 programming languages. And here the problems start. The C++ solutions have quality problems. While the solutions (as far as I can tell) produce the right results, there are two patterns of problems.

The usage does not reflect good usage of C++. For example, you will not find std::unique_ptr and many solutions have memory leaks. That is common over many interview preparation sites and might warrant its own article. While this is annoying, I will let this slide here.

What I want to focus on is that often time complexity stated is plain incorrect. Given that coding interviews often focus so much on the Big O notation, this is quite bad. A solution that is significantly slower than advertised is incorrect.

Often, the reason is accidental copying of vectors and strings. C++ passes objects by value unless otherwise specified, while other languages, e.g. Python and Java, use reference semantics for some variables and value semantics for others. That means if the code doesn’t state it otherwise, C++ will create a new object as a copy of an existing object, e.g. when passing an object as parameter to a function.

// just an example, in practice use std::accumulate
int calculate_sum(std::vector<int> numbers)
{
  int s{};
  for (auto i : numbers) {
    s += i;
  }
  return s;
}

vector<int> numbers{1, 2, 3};
int s = calculate_sum(numbers); // creates a (accidental) copy of numbers

Equivalent code in Java would not copy the vector and just pass the object as reference. In C++, the function should instead be declared as int calculate_sum(std::vector<int> const & numbers) being explicit in that it is passed as reference and explicit in that the vector should not be mutated.

I reviewed every solution and around ten solutions have material problems in that the runtime complexity stated in the solution does not match the code.

Let’s look at an example of the problem:

Note: I will need to quote the solution to some extent. I will shorten the examples to limit the publication of solutions developed by AlgoExpert while quoting just as much as needed to illustrate the point.

The question is about implementing a DFS on a tree structure. The basic algorithm implementation shouldn’t come to the surprise of anybody with a CS degree or equivalent.

The solution contains code that is structurally similar to this:

vector<string> Node::depthFirstSearch(vector<string> * array) {
    // [dirk] solution contains appending an element to array
    array->push_back(some_value);
    // [dirk] solution contains recursive calls in a loop
    for (auto node : [removed]) {
        node->depthFirstSearch(array);
    }
    return *array;
}

The solution time-complexity is stated as O(v + e) as the standard DFS is. I am not sure how much sense it makes to say O(v + e) in a tree when, by definition, e == v - 1, but anyway.

Worse, the runtime is not O(n + e), but O(n * n). Why? We call depthFirstSearch once for every node and add an entry to array (an out parameter) every single time. Each call, we create a copy of the current state of array. You can see the copy operation in line 103 of the assembly output of this godbolt link. On average, the size of array is 1/2 * n leading to an overall runtime of O(n * n).

This can be trivially fixed by changing the return value to void. There is just no reason to create a copy of the array in each and every call. The resulting code is simpler to understand and faster.

The question is about binary search (as the name of the question indicates). Again, a classical, well-known algorithm. The Solution 1 contains a recursive solution with a time complexity states as O(log n). The solution has the same problem with accidental copying.

int binarySearchHelper(vector<int> array, int target, int left, int right) {
    // either a or b in the usual algorithm
    // a)
    return binarySearchHelper(array, target, left, middle-1);
    // b)
    return binarySearchHelper(array, target, middle+1, right);
}

How can this be fixed? a) Changing the first parameter to const &. b) Changing the recursive call to binarySearchHelper(std::move(array), target, l, r) c) Using std::span (C++20).

O(log n) and O(n log n) is quite different in theory and practice.

Example 3: Product Sum

The question uses a std::vector<std::any> and recursive calls. Every entry is either an int or another instance of std::vector<any>.

Let’s shortly mention the fact that a std::variant would be the much preferred way to frame this question. In fact, I had to look up how to use std::any.

The solution contains the following code.

int productSum(vector<any> array) {
    for (auto el : array) {
        if (el.type() == typeid(vector<any>)) {
            productSum(any_cast<vector<any>>(el));
        }
    }
}

The solution is given as O(n) with n being the number int elements. The problem is that we have three extra copies of the vector and all its sub-vectors and elements per recursive call. One in the for loop where for (auto const & el : array) { would be better. In addition, there is an additional copy in the line of the recursive call. The function std::any_cast here returns a copy of the contained object.

This godbolt link contains a variant of the solution actually does have O(n) runtime, compared to the O(n * n) runtime of the original solution. Even getting to this answer required me to explore with different alternatives for some time. I am very skeptical if it is fair to expect this solution (or even this question) of an engineer in a whiteboard interview.

Other examples

Similar problems exist in other solutions, too:

  • Question: “Palindome Check”: Solution 3 where the solution is O(n * n) instead of O(n) due to repeated copies of std::string (assuming a C++11 compliant implementation of std::string).
  • Question “Min Height BST”: All three solution presented have the same problem as the Binary Search solution. The code presented in Solution 1 for example is O(n * n * log n) instead of O(n * log n) as claimed.
  • Question “Simple Cycle Check”: The solution is in O(n * n) instead of O(n).
  • Question “Subarray Sort”: The solution is again O(n * n) instead of O(n) as claimed.
  • Question “Dijkstra’s Algorithm”: The Solution 1 is O(v * v + e * e) instead of O(v * v + e) due to the usual copying instead of using references problem.
  • Question “Shifted Binary Search”: If the solution for the question Binary Search has the problem, it is not surprising the Solution 1 to this question as a similar issue: Again O(n * log n) != O(log n)
  • Question “Search For Range”: If the solution for the question Binary Search has the problem, it is not surprising the Solution 1 to this question as a similar issue: Again O(n * log n) != O(log n)

The solution are correct in that they produce the right result (as far as I can tell). They are incorrect in that the comment about the time complexity is not matching the code. When I interviewed recently, I was often asked to state the runtime and space complexity of the code. These mistakes can easily mislead those who trust these solutions too much.

All these problems were easily fixable by standard C++ practices. Nothing fancy. Just basic C++ practices.

Examples with issues where it does not lead to a regression in asymptotic runtime

Sometimes where are extra copies of data structures, but the resulting code does not lead to a regression in the asymptotic behavior.

Examples for that are:

  • Question “Generate Document”: Solution 1 and 2 have problems, Solution 3 is fine.
  • Question “Move Element to End: Solution 1. The solution contains an unnecessary copy of a vector. However, the solution is already O(n), so it doesn’t make a difference under that metric.
  • Question “Merge Overlapping Intervals”: Solution 1. The solution copies the vector O(n log n) times. The code would be made better by using std::pair and faster by having the lambda passed to std::sort take a const reference to avoid needless copy operations. Again, no real difference in the final result, but painful to read.
  • Question “Task Assignment”. Extra copy of the input vector where a const reference or a std::move be the much better choices. However, it doesn’t make a difference in the asymptotic runtime.
  • Question “Permutations”: Sure, in an exponentiation algorithm another O(n) doesn’t make a difference. However, interestingly the answer to the question named “Powerset”, which is structurally similar, is correct. While there is a structural problem with the C++ solution, some are correct. It appears again that there are multiple authors of the C++ solutions.
  • Question “Sort Stack”: The solution to Sort Stack again as a return value forcing a copy of the input and again the code is not using the return value. There is no reason for the extra copy of the input vector in the code. Changing the return value to void and the solution is faster and clearer. However, the solution is already O(n * n), so no real harm done.
  • Question “Longest Palindromic Substring”: Again extra copy operations that are not needed with more clear, standard C++ code. In this case, for example, instead of copying the input string, std::string_view might be a good solution.
  • Question “Group Anagrams”: This is worth mentioning because it is a different way how you can create copy operations without realizing it. The solution contains code like this.

      for (auto index : indexes) {
          string word = words[index];
      }
    

    The variable words is a vector<string>. So each round of the loop creates a copy of the string. A more normal usage of C++ would be string const & word = words[index];

  • Question “Valid IP Addresses”: Sure, the solution is O(1) because the input string is defined to be of length 12 or less, but that doesn’t make it a great idea to copy the string multiple times. It is O(1) with unnecessarily large constants.
  • Question “Reverse Words in Strings”: Same pattern. Usage of references or std::move or std::string_view would make clearer better code, but the issues present do not have push the solution beyond the O(n).
  • Question “Suffix Trie Construction”. Unnecessary copy operations, but no regression in asymptotic runtime.
  • Question “Same BSTs”: Again the same wasteful copy operations, but no overall regression.
  • Question “Calender Matching”: A lot of copy operations. A few std::move and the flow of ownership would have been much clearer. And the code would be faster.

It is fair to ask if, esp. given that the Big O notation is correctly stated in these solutions, this isn’t premature optimization. My argument would be that we are not talking about changes that are out of the ordinary. We are talking about a couple of basic principles of C++ programming. Premature optimization might be “evil” IMHO if it makes the code less readable, less maintainable. The proposed changes to not make the code worse. Often, they make the code actually clearer AND faster.

In addition, performance is often impacted by 1000 paper cuts. In most programs, none of these extra copies will show up in a profiler, but they are still wasteful. Don’t do wasteful stuff. A good C++ program should not have these problems, even if they do not cause a regression in the complexity estimation. Chandler Carruth’s his 2014 Cppcon talk “Efficiency with Algorithms, Performance with Data Structures” is worth viewing for more information about the topic.

Most cases are caused by the missing usage of references (or std::move or std::string_view/std::span) leading to wasteful copying of large data structures. This is not a single problem for in one question, but given that it is common in many solutions. It is a systematic problem on the site.

It is interesting that it is not the case in all solutions. I think that the site has multiple authors, and some authors were aware of these kinds of issues and others weren’t. Many solutions use references (correctly). References (and even const references) are not some obscure element of the language, but a core element to it.

The most likely explanation is that the solutions of other programming languages were translated into C++ without taking the differences between the languages into account. And this direct translation is the source of the problem. Given that the solutions and videos are the unique selling point of AlgoExpert, I would have expected more.

I did write the support team about these issues in November 2020. At this point, eight months later, the solutions have not been fixed, which is sad.

Let me be clear: AlgoExpert is a good service for coding interview preparation. I am recommending it. The selection of questions is okay. The videos are well-made and the topics well explained. But I am disappointed when statements, comments and code do not match up in such fundamental ways. I hope that the team eventually goes back to their solutions and fixes these issues. Until then, I would recommend to not use their C++ solution.

Leaving Pure Storage

Last week was my last week at Pure Storage.

Seven 1/2 years ago I joined Pure Storage working from a small chamber in the UK waiting for the US visa situation to work out. The overall engineering team consisted of a couple of dozen of engineers at the time.

Since then, the company grow many times over, and I had the opportunity to contribute to different parts of the company and the FlashArray product: From replication and data protection to the file system and since 2020 internal Engineering Productivity. In this time, I worked with exceptional engineers and managers. I learned a lot from them and I grow a lot as an engineer and as a technical leader. I am proud of what we have built in these years: The FlashArray is just an amazing storage system.

However, last Tuesday marked my last day at Pure Storage. Next month, I will join the Engineering Productivity team at Databricks.

What the Databricks team has built over the years is already truly impressive, and I excited to see what comes next. With an engineering team that is growing at the rate Databricks has grown and is growing, the engineering processes and tooling are critical to translate engineering team growth into sustained productivity and innovation. I am excited to join them.

Tip of the Week articles at FlashArray [Follow-up]

This article is a follow-up to the last article about Tip of the Weeks at FlashArray. We are now publishing these tips for more than a year. As part of our regular Engineering Productivity survey, we also got positive feedback.

Topics

  • TotW/27: Error handling at FA. We do not use exceptions. This article gives tips about how to raise errors, handle errors, and propagate errors.
  • TotW/28: Discusses different ways to asynchronously wait for events. It was the part of a bi-weekly series on concurrency that starts with TotW/22.
  • TotW/29: Discusses how to use C++ namespaces at FA
  • TotW/30: Discusses our multi-producer, single-consumer notification API.
  • TotW/31: An article about the explicit bool conversion operator in C++. Initially, we assumed that most articles would be about core C++. To our surprise, these articles were the minority, with most articles focussing on our custom libraries. This is the first core C++ article since TotW/20.
  • TotW/32: Discussed our multi-producer, multi-consumer notification API.
  • TotW/33: Gives tips about marking tests as expected to fail in the pytest test framework.
  • TotW/34: Discusses how to adjust the log levels of our logging API at runtime.
  • TotW/35: Discusses a hybrid data structure instead of a std::vector if only a few elements are expected. Hybrid data structure here means data structures that are optimized for a few elements (to avoid memory allocators and be cache-friendly), but allow growing to large number of elements seamlessly. To learn more about these data structures, I can recommend this talk from Cppcon 2016.
  • TotW/36: Discusses std::tie to write comparison operators in a less error-prone way.
  • TotW/37: Discusses how to write hash functions.
  • TotW/38: discusses a tip on how to write good log messages.
  • TotW/39: Discussed lambda captures in C++.
  • TotW/40: Discusses how to specify which of our memory allocators to use for a specific, important class.
  • TotW/41: Discusses the override specifier in C++. The publishing of this tip was synchronized with enabling the modernize-use-override clang-tidy check.
  • TotW/42: This was quite a basic tip about what constructors are and when to use them as well as what do not do in them.
  • TotW/43: Discusses how to “death tests” in our unit test framework.
  • TotW/44: Discusses how to use type hints in Python 3.
  • TotW/45: Discusses East Const style and West Const style in C++. Our C++ guidelines prefer East Const style.
  • TotW/46: Discusses how to register a class for automatic formatting in log messages.
  • TotW/47: A tip about how to use subprocess in Python.
  • TotW/48: A tip on how to remove tuples from our database. As discussed in this research paper from SIGMOD 2015, our database uses elide functions for tuple removal.
  • TotW/49: What happens if you call std::move. IMHO one of the most used and least understood operations in Modern C++.
  • TotW/50: Explains our ordered, concurrent map data structure we are using.
  • TotW/51: Discusses a specific aspect on how to write to our database.
  • TotW/52: Discusses our in-memory interval set class, which we intensively use.
  • TotW/53: TotW/12 discussed parameterized test. This tip discussed how to write tests parameterized on types using a system based on C++14’s generic lambdas.
  • TotW/54: Discusses an aspect of our smart pointer classes.
  • TotW/55: Discusses our different forms of assertions.
  • TotW/56: Highlights a different way to define a C++ unit tests which makes it easier to test asynchronous code.
  • TotW/57: Python fstrings.
  • TotW/58: Rate-limited in our logging infrastructure.
  • TotW/59: Python raw strings.
  • TotW/60: Enumerations and why enumerations should be preferred to bool parameters
  • TotW/61: A catch-all article about reasons to prefer Modern C++ to what I called C-style C++.

My goal was to reduce the number of articles that I wrote in the second half of the first year of TotW. And I did not fully succeed, as I still contributed most articles. However, four authors now contributing articles and one co-author stepping up and reviewing almost every article and taking over the main authorship on half a dozen.

C++ is still the main topic, with Python being the language used in 6 articles. Java was no longer had a dedicated article, but it was mentioned in TotW/60.

Most tips that I write are growing out of mentoring situations with other engineers or reviews. This keeps them relevant. Another source of inspiration are the large-scale refactoring efforts the team has done in the last year. Often you notice idioms that are outdated, but appear in new code due to code reading. Sometimes we are working on actively removing the old idiom. Sometimes we are improving the documentation. And, sometimes we write a TotW about the idioms and the tradeoffs.

Even after a year, I am still convinced that the TotW format is a great tool for internal knowledge transfer.

Tip of the Week articles at FlashArray

This week marks the half-year anniversary of FlashArray Engineering Tip of the Weeks.

These tips are short mails sharing knowledge about the programming languages we use and our software infrastructure. The idea is blatantly copied from Abseil’s Tip of the Weeks. As they say, imitation is the highest form of flattery.

Pure Storage’s FlashArray Engineering team has grown a lot in recent years. We rely a lot of code reviews for knowledge transfer, but relying on code reviews for knowledge transfer doesn’t scale well after a certain point. So, a year ago we started thinking about how to revamp knowledge transfer practices so that the software engineering organization can strive in its second decade.

Knowledge transfer isn’t getting easier by our style of C++, which differs in style and substance from general practice. From our own concurrency system including different locking primitives over our own smart pointers to our own allocators, guidelines you find about C++ might not always apply at FlashArray.

At the same time, it is getting more important in a time of Work-From-Home because traditional means of knowledge transfer are strained by the situation.

So we started to collect items that tech leads explain again and again and publish them in form of these tips. The tips follow a specific form, which I think are essential for the idea to be successful and not to become just another unread mail in the inbox:

What are the rules?

  • About coding. We would veto everything management-related or even tooling related. To stay relevant and trusted by engineers, we have to stay focused. On the other hand, all coding related tips are on the table. We have tips about C++, Java, and Python. If we have tips are pure C++ and tips covering our own infrastructure or how we apply C++ at Pure.
  • 200 - 500 words. Engineers should be able to read this in a few minutes.
  • Reflect what the consensus is. A TotW is not a place to publish new ideas. It should always reflect what the consensus about what should be done is. Each article is reviewed by other enginers and tech leads. They are good to keep you honest. I once wrote an article about “East Const” and it was vetoed because while I assumed “Prefer East Const” reflects the consensus at Pure, it actually didn’t.
  • Weekly: If you only sent a tip monthly, you cannot cover sufficient ground. If you sent out daily, people would stop reading. It would also become very stressful to produce that amount of articles. Weekly proved to be a great compromise.
  • Relevancy: Each tip needs to teach at least 30% of the audience something new and relevant. A tip shouldn’t only cover what is taught in our new hire training because then it would be boring for most readers. On the other hand, explaining an infrastructure that most engineers will never use in their careers at Pure is also not a great topic. We need to be relevant for the engineers if TotW wants to stay relevant. We cannot force anybody to read them or to apply them in practice.
  • Not a replacement for other forms of knowledge transfer: TotW is not a replacement for our new hire training and certainly not a replacement for documentation.
  • Tone: The tone is important. While our style is pretty down to earth, I am not good at injecting jokes into my writing, we always try to not preach from the pulpit. It is always important to treat engineers with respect. Nearly all guidelines have exceptions. We hardly use the word “Never” and each time we did there was a long discussion before we went with it.

In addition, each TotW gets a number and a permanent link. The hope is that engineers not only read them and apply them but actively quote them in code reviews. For that, we write the tips in simple Markdown and render the text into HTML to email them and into Confluence Wiki Markup language for the permanent link. The numbering scheme (also taken from Abseil) also makes referring to tips in reviews and discussion easier.

Topics

What topics did we cover in the last fix months? Most of them do not make sense outside of Pure, so I replaced the exact titles with a short summary.

  • TotW/0: Introduction into FA Tip of the Week
  • TotW/1: When noexcept matters? As we do not use exceptions, in general, we usually do not use noexcept. However, sometimes noexcept matters for performance, e.g. in move constructors.
  • TotW/2-3: Introduction into our own specialized string types. Due to some arcane technical detail of how our database is implemented, we have our own string types and they have a few pitfalls.
  • TotW/4: Assertions in unit tests.
  • TotW/5-6: Introduction into our error injection system in unit tests.
  • TotW/7: C++ and the auto keyword
  • TotW/8: Mark single-parameter constructors explicit by default
  • TotW/9: A tip on how to write good error messages
  • TotW/10: Introduction into our specialized vector-like data structure
  • TotW/11: A tip concerning w.r.t. to writing medium and large scale tests in Python
  • TotW/12: Parameterized unit tests
  • TotW/13: Tips about a strange detail about our concurrency system. In hindsight, this was the least relevant tip we published.
  • TotW/14: C++17’s return value optimization
  • TotW/15: An article about the principles of physical design and why we use struct more often then class. This is one of the questions that we get a lot from new hires because it looks very odd, but there are actually good reasons behind our approach to physical design.
  • TotW/16: Tips about our approach to configure the system.
  • TotW/17: Tips about primitive and boxed types in Java
  • TotW/18: Introduction into the Abseil’s btree_map. We do not use (yet?) Abseil in general. However, we made the btree_map available and are working on making the Swiss table available, too.
  • TotW/19: Why you never should use using namespace in header files.
  • TotW/20: Tips when and when to not use the Curiously Recurring Template Pattern (CRTP)
  • TotW/21: Further tips about configuration the system focussed on unit tests
  • TotW/22: Tips about high-level principles of highly concurrent systems. This is the first article in a bi-weekly series about concurrency.
  • TotW/23: Tips about when and how to comment code
  • TotW/24: Tips about own mutex implementation
  • TotW/25: Tips about our background job system
  • TotW/26: Tips about our concurrent hash table

We purposefully switch topics and even the difficultly so that even when one article wasn’t interesting to an engineer, there is a good chance that they next one might.

Conclusion

The feedback has been very positive. The tips are actively cited in code reviewers, which was the main goal. The tips have become an integrated piece into our training and knowledge transfer. One thing I learned how difficult it is to acquire more authors. Many senior engineers are active with reviewing articles (often multiple times) and ideas. However, getting people to write articles is hard. Of 26 articles only four have been written by other authors. My goal for the next 26 articles is that I write less than half of them by doubling down on outreach and outright bartering.

I actually enjoy writing from the tips to documentation, but (as you have figured out by now) I am not very good at it. While I live in the US, English is my second language. I wish we could have a technical writer, e.g. our user documentation team, supporting the project.

Summary

It is an approach to knowledge transfer with medium effort, focussed on the engineer and provides results. It doesn’t require tooling changes or workflow changes. It doesn’t require any special infrastructure. A mailing list and a Wiki page would do it.

Software Engineering Books for "NCG" (new college grads)

This blog post describes my top books for NCG (new college grads) starting a career as a serious software engineer.

Let’s say it right away: This is absolutely guided by my Pure Storage experience and targeted for a similar audience. Pure Storage is a Modern C++ engineering company who in the last decade built a high-performance, low-latency storage system optimized for Flash. A software engineer writing frontend code with JavaScript on the frontend might different requirements and should read different books. However, some of the books on my list would apply.

Titus Winters states his definition of Software Engineering in “Software Engineering at Google” as “Software engineering is programming integrated over time” (see also 2017 Cppcon Video). I see my work still more as a craft than an engineering discipline, but there is valuable, often ignored insight here.

The list is inspired by a conversation with a couple of junior engineers and my experiences in the first project I led some years ago where had the opportunity to mentor a number of new hires. All of them were really good at data structures and algorithms. The large emphasis on these topics in the interview process of all major tech companies helped. All were good programmers. All did a number of internships and had projects during their studies. However, at that time, they were not yet software engineers.

They do not have to. I didn’t expect that nor can I expect that. The CS programs have to focus on Computer Science. It is in the name. Computer Science is the cornerstone of systems and infrastructure development. You need to have a solid understanding of data structures and algorithms as well as principles of operating systems and concurrency. But it isn’t enough.

Programming is important because we actually have to deliver software, even if in my experience German universities would like to eradiate that part of the trade. We do not deliver a design document, nor a technical report or proof. We are expected to deliver working software. And we do have to maintain it for years to come.

There are a couple of books that can partly fill the gap left by CS programs:

The first book I want to recommend is “Effective Modern C++”. This is a must-read when writing modern C++ code. At Pure. we do not use exceptions so some recommendations do not apply. Neither do we use unique_ptr/shared_ptr but the principles of smart pointers, move semantics and strong ownership of memory still apply. The vast majority of tips from this book apply. So far so boring and uncontroversial, but the book is still more about programming in C++ then Software Engineering.

The second book on my list was “Effective Java (3rd edition).

Hear me out. I am a Java programmer by trade, so I might be biased, but I strongly recommend this book to C++ programmers. I wrote a blog post about it discussing why I think all C++ software engineers should read that book and how Java and C++ in some are getting very similar.

There is a significant lack in the C++ world on books discussing engineering for the higher-level concepts that “Effective Java” excels at. Effective Modern C++ is soo focussed down in the important low-level trickery of C++. With one chapter about smart pointers alone and one chapter about different reference types alone, that there is no space to discuss more higher-level concepts. But the higher-level concepts are equally important in C++ then they are in Java.

Honestly, no other book changed my perspective on software programming as much the first edition of this book did. I was really at the beginning of the transition from programmer to software engineer. The book focus on class contracts. For me it was an eye-opener. That you should write software that is not only correct in the moment, but still correct when code around it changes when it is used in different scenarios.

The third book on the list of quite old: “Code Complete 2” from 2004. The book is extremely comprehensive and not focused on a single language or domain. It covers everything from requirements, class design, method design, testing, debugging, refactoring, craftsmanship, and other topics. The book has a clear scientific approach to it containing lots of statistics why software construction and clean code matter as well as references to other literature and research. It has the advantage that is not to domain-specific as some of the other books.

Actually, a lot of these books are from the early 2000th. My first thought was this is my own bias because that is the time I grow up learning professional coding. But it appears that these are still the leading books in that area. So much has changed in the 10-20 years, but these books are still useful.

The fourth book on my list is “Clean Code: A Handbook of Agile Software Craftsmanship”. It is quite Java-specific. I would argue it is more Java specific than Effective Java (see above). So, why is it on the list? It overlaps with Code Complete, but is much shorter. There are certainly parts that might work in a Java environment, but not in a storage software environment, but if you keep an eye of for these cases it is good advise.

The next book is “Design Pattern: Elements of Reusable Object-Oriented Software” aka the classical “Gang of Four” book about design pattern. What is a design pattern? It is a “general, reusable solution to a commonly occurring problem in software design”. The book’s code examples actually use C++, although an outdated coding style. If you want to know why there are classes called “visitor” or “factory” this is the book to answer that.

The GoF design pattern and object-oriented programming got a negative reputation in recent years that I do not share. However, object-oriented programming without good guidance leads to a mess that might be worse then the alternative. That is why books like this are so important.

The book “Head First Design Pattern” is related to the previous book. Don’t be confused by all the pictures and the informal writing, it is indeed a really good introduction on the topic. This book proves that you can combine good even fun writing-style with complex content. If you are done with this book, consider going back the GoF book.

Next on my list is “Programming Pearls”. It isn’t a reference at all. It is actually a small book, but it contains essays about algorithm design, debugging, tuning, and solid software engineering. Not anywhere as comprehensive as “Code Complete 2”, but still interesting tips.

A bit different from the other books is “Code Reading: The Open Source Perspective”. Pure’s code base is quite big and any big, new code base can be overwhelming. Code reading certainly is thought in any university program that I have ever seen. The book is solely focussed how to deal with large, unknown code bases and how to understand, to navigate, and to maintain them.

The last book on my list is “Practical VIM”. One issue I see over and over again is that new college grads use VIM, but without any finesse. For example, when to change a file they close VIM and re-open a different file. No buffers, no splitting. I switched from VIM to vscode around a year ago (still using the VIM plugin, which is really good) so I am not saying you should use vim, but it you do knowing your editor will be increasing the productivity of engineers. So, reading this book is recommended for people who use VIM in the first place.

When I learned programming, there were many books for a language or even libraries. These have been replaced by websites and StackOverflow. However, the more general books explaining good code construction and good software engineering principles still contain important material. They are effectively time-less or at least the survived the last decades.