Introduction
We all know that feeling. We have this great idea about a new project, feature, function, piece of code.
What do we want? Write that amazing new code!
When do we want it? Right NOW!
The aim of this post is to give you 3 good reasons to resist that urge and consider other options, be it in your business projects or your private projects, along with an example of how I failed and how I tried to remedy that failure, on a very small scale.
The 3 reasons
1. New code takes time (and money)
Writing new code is an investment. Time and money will be spent on design, implementation and code review. These initial investments are, however, only a minor part of the total cost of writing new code. The code must be well documented and maintained. The code must be integrated with other parts of the system. Last but not least, the code must be tested, and writing tests usually involves writing, well, more new code.
2. New code means new bugs
Even with our best efforts and testing, bugs will be found and will need fixing. Numbers on this seem to vary a lot; Code Complete by Steve McConnell estimates an industry average of 15-50 bugs per 1 000 lines of code.
3. We write what we know
Perhaps the most compelling reason to reconsider and resist the urge to write code is not in the numbers and statistics, but in the simple realization that we usually write new code using our current knowledge.
Pausing for a while and spending time investigating the current best practices and methods for solving the issue we are aiming to solve with our new code may not only save us and our business owners valuable resources, but also increase our knowledge base thanks to that investigation.
Putting it to practice in the R world
So we have this brilliant new idea. Instead of starting to write that shiny new code, we can also start with:
- Google - It is more than likely that someone has already stumbled upon this very same, or a very similar problem. How have they implemented it? What functionality have they used? What are the best practices and approaches to tackling similar issues?
- Stack Overflow and Rseek for R solutions - Can we find solutions to our problem there? Are those solutions good? Can we build upon them?
- Evaluate the options - If we have found any, which of them are the most suitable for us? If stability and maintainability are a major concern, can we find a solution with as few dependencies as possible? If performance is a major concern, are benchmarks available (can we make them)?
- Propose a solution - After this research, do we still need to write the new functionality? If so, how much can we build on existing solutions? Are they easy to integrate?
- Do we care about dependencies? - The R world is special, and one of the reasons for this is CRAN. The number of packages available on CRAN has passed 13 000 and it is very convenient to just reach out and grab one more. This approach, however, has its caveats: every added package is more code that we do not control but still depend on.
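To make the dependency question a bit more concrete, here is a minimal sketch of how one might count what a package pulls in before reaching for it. It relies on tools::package_dependencies from base R and, as an assumption made for this example, only looks at the locally installed library so that no network access is needed. The helper name list_dependencies is hypothetical and "stats" is just a stand-in for whichever package is being considered.

```r
# Hypothetical helper (not from the original post): list what a package
# depends on, directly and recursively, based on the installed library.
list_dependencies <- function(pkg, db = installed.packages()) {
  # By default, the Depends, Imports and LinkingTo fields are considered
  direct <- tools::package_dependencies(pkg, db = db, recursive = FALSE)[[pkg]]
  all_deps <- tools::package_dependencies(pkg, db = db, recursive = TRUE)[[pkg]]
  list(direct = direct, recursive = all_deps)
}

# "stats" ships with R, so this runs on any installation
deps <- list_dependencies("stats")

# Number of direct and recursive dependencies
lengths(deps)
```

A large gap between the direct and the recursive count is usually a hint that one extra package brings a whole tree of others with it.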
A simple example - learning from my own mistakes
How I did it wrong
One of the first RStudio addins I wrote for my own use was to run a script open in RStudio with R --vanilla via a keyboard shortcut and open a file with the script's output in RStudio. If I had to guess, my thought process was likely similar to the following:
- I will write a new function to serve as the addin binding
- I will write a new function to serve as the command executor for both Unix-like systems via system and Windows via shell
- I will write a new function to create the command to be executed by the above
- Maybe some utilities, like the ones converting ~ to a full path, figure out integrating the 4 together, passing arguments, etc.
So, there I was, some time and 92 lines of code and doc later, with a new useful RStudio addin. Oh and yes, there were also 102 lines of test code, fixed a couple of times, too.
How I could have done it better
After a second look a few months later, when actually reviewing this supposedly good functionality, I realized that
- There is a base function called system2, which seems like a much more user-friendly and easy-to-use version of system (and shell), with no real need to write system-specific code and, even though less configurable than system, still perfectly sufficient for my purpose
- I do not actually need to make the command, as extra options can be passed to system2 as arguments, including redirecting output
- Oh, and I definitely do not need a function to convert ~ to a full path, there is path.expand
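The points above can be sketched roughly as follows. This is not the addin's actual code, only a minimal illustration under a few assumptions: the function name run_script_vanilla and its arguments are made up for this example, the script is fed to R via the -f option, and only standard output is redirected to the file.

```r
# Hypothetical helper (names are illustrative): run a script with
# R --vanilla and write its output to a file, using only base R.
run_script_vanilla <- function(script, outfile = tempfile(fileext = ".Rout")) {
  # path.expand replaces a leading ~ with the user's home directory
  script <- path.expand(script)
  # Use the R binary of the currently running installation
  r_binary <- file.path(R.home("bin"), "R")
  # system2 takes the command and its arguments separately and
  # handles the output redirection through the stdout argument
  status <- system2(
    r_binary,
    args = c("--vanilla", "-f", shQuote(script)),
    stdout = outfile
  )
  invisible(list(status = status, outfile = outfile))
}

# Usage: create a tiny throwaway script and run it
demo_script <- tempfile(fileext = ".R")
writeLines('cat("hello-from-script\\n")', demo_script)
res <- run_script_vanilla(demo_script)
```

In an addin, the resulting file could then presumably be opened in the editor with something like rstudioapi::navigateToFile(res$outfile). No platform-specific branching is needed in the sketch, since system2 is meant to behave the same way on Unix-like systems and Windows.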
So after a quick rewrite, we end up with very similar functionality, only we suddenly need 35 lines of code, doc included, and the tests shrink to 10 lines, as there is only 1 function to test instead of 4. That is less than a quarter of the original amount of code to be maintained and bug-fixed, with 0 new dependencies added.
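As a quick back-of-the-envelope check of those numbers, the lines of code before and after can be compared directly. Applying the 15-50 bugs per 1 000 lines estimate quoted earlier to such a tiny codebase is purely an illustrative assumption, not a measurement.

```r
# Lines of code and tests, before and after the rewrite (from the text)
old_lines <- 92 + 102
new_lines <- 35 + 10

# Share of the original code that remains, roughly 0.23
ratio <- new_lines / old_lines

# Illustrative only: industry-average bug density per 1 000 lines
bugs_per_kloc <- c(low = 15, high = 50)
expected_bugs <- function(lines) lines * bugs_per_kloc / 1000

expected_bugs(old_lines)  # about 2.9 to 9.7
expected_bugs(new_lines)  # about 0.7 to 2.3
```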
This was of course a very trivial example. Real-life problems in real-life projects will be much more difficult to solve. However, as complexity scales, the potential amount of time and resources saved will also scale.
Good luck resisting that urge the next time it comes ;-)