Dealing with Data means dealing with Software. Sometimes it makes sense to use off-the-shelf software solutions that others have spent years developing and sometimes it makes sense to build your own. At Eighty20, we like to live by our pragmatic name and there are many instances when we absolutely just do the former. However, there are lots of cases where building our own has definitely been the right decision.
Open Source
When we do build our own, we’ve generally opted for Open Source Software (OSS) solutions, with Python as our preferred programming language and PostgreSQL and MySQL as our preferred transactional database technologies. As the exception that proves the rule, we use the proprietary Vertica database for our analytics workloads. There are a few reasons for this skew towards OSS:
1. We service multiple clients across multiple industries, each with their own preferred technology stacks.
They may operate a Microsoft ecosystem, or Oracle or SAP or Salesforce or some combination.
Given the complexity and deep integration of these systems and technologies, it would be impossible for us to attempt to become experts in all of them.
Many of our clients have vendors who sell them these components and then manage them on their behalf for just this reason.
One of the guiding principles behind much OSS, inherited from the Unix philosophy, is to do one thing, do it well and make sure you’re able to interface with different components that do other things.
By not buying completely into any particular one of our clients’ proprietary tech stacks, we give ourselves the ability to interface with all of them.
That said, we still aim to have a good enough understanding of those tech stacks to make sure that we can interface with them effectively.
2. We don’t want to spend time solving problems that have been solved before.
Most technology problems fall into this category, and some helpful person in the open source community who has encountered a similar problem before has probably published their solution.
So, if we need to query a particular type of database we’ve not encountered before, somebody has probably built a Python library that does this and pushed it to GitHub.
Or, if we need to run our own instance of that obscure database, we can pull an image off Docker Hub and run that, without needing to fight to get all its dependencies installed before it will even start.
Of course, there are risks with just running random code one finds on the internet and so we follow established best practices for ensuring that libraries we use are vetted and safe, using tools such as Snyk.
In a worst-case scenario, we’re able to fork an open source library and modify the code ourselves to ensure that it’s safe to use.
3. There’s one other small matter: OSS is generally free to use.
Some things are definitely worth paying for and it’s very easy to get caught in a trap of spending hours instead of dollars getting something to work. However, we’ve developed some good patterns that give us confidence that we can build solutions efficiently off our base tech stack.
For once-off projects and clients, or even for producing a proof of concept for an existing client, this low cost to just get started certainly has its advantages.
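To illustrate the second point above: most Python database drivers follow the same standard interface (DB-API 2.0, PEP 249), so querying an unfamiliar database tends to look much like querying a familiar one. The following is only a minimal sketch under that assumption. It uses the standard library's sqlite3 module as a stand-in for whatever driver is actually needed, and the helper name fetch_rows, the clients table and its contents are hypothetical.

```python
import sqlite3  # stand-in for any DB-API 2.0 (PEP 249) compliant driver


def fetch_rows(connection, query, parameters=()):
    """Run a query on a DB-API connection and return all rows.

    Hypothetical helper for illustration. Note that the placeholder
    style ("?" here) varies between drivers, so queries may need
    adjusting when switching databases.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query, parameters)
        return cursor.fetchall()
    finally:
        cursor.close()


# Build a throwaway in-memory database with invented sample data.
connection = sqlite3.connect(":memory:")
setup = connection.cursor()
setup.execute("CREATE TABLE clients (name TEXT, industry TEXT)")
setup.executemany(
    "INSERT INTO clients VALUES (?, ?)",
    [("Acme", "retail"), ("Globex", "banking")],
)
setup.close()

rows = fetch_rows(
    connection,
    "SELECT name FROM clients WHERE industry = ?",
    ("retail",),
)
connection.close()
```

Swapping in a different compliant driver would, in principle, mostly change the import and the connect call, although connection arguments and placeholder syntax do differ between drivers.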
Our Swiss Army Knife
The saying goes that Python is the second-best programming language for everything. We’ve certainly found this to be the case.
For any specific problem or domain, there is generally a language better suited to it. Good examples are R for statistical analysis (which we still use when it makes sense to), Go for distributed processing and C or C++ for raw computational speed. However, those other languages either don’t work all that well outside of their specific domain, or they require a significant amount of time, training and experience to use properly without introducing memory leaks or other subtle bugs.
The reason that Python can be used for basically anything is not that it was designed to be an all-purpose generalist language, but rather that it was designed to be easy to write and, perhaps more importantly, easy to read. A famous quote by Robert C. Martin (Uncle Bob) from his book “Clean Code: A Handbook of Agile Software Craftsmanship” goes: “Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. [Therefore,] making it easy to read makes it easier to write.”
Python reads like pseudo-code, an almost plain English description of what the code is attempting to do. Because of this, many people across varied disciplines – without much coding experience but with incredible domain knowledge – have gravitated towards Python. Together with more experienced developers, they have created libraries and frameworks that make working with Python in these domains easy and possibly even joyful.
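As a small illustration of the "reads like pseudo-code" claim, here is a sketch that works out the average spend per customer segment. The data, field names and function name are all invented for the example and do not come from any real project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data, invented purely for illustration.
purchases = [
    {"customer": "Thandi", "segment": "retail", "amount": 120.0},
    {"customer": "Sipho", "segment": "retail", "amount": 80.0},
    {"customer": "Anna", "segment": "wholesale", "amount": 500.0},
]


def average_spend_by_segment(purchases):
    """Return the mean purchase amount for each customer segment."""
    amounts_by_segment = defaultdict(list)
    for purchase in purchases:
        amounts_by_segment[purchase["segment"]].append(purchase["amount"])
    return {
        segment: mean(amounts)
        for segment, amounts in amounts_by_segment.items()
    }


averages = average_spend_by_segment(purchases)
```

Even without much Python experience, a reader can probably follow the intent: group the amounts by segment, then take the mean of each group.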