Back in the 20th century, a media company wishing to restructure its information management capacities would sign a five- or six-figure contract with an IT behemoth like IBM or SAP. It would receive a proprietary piece of software run by a handful of techies trained for this specific tool. The graphical user interface would be reminiscent of the cockpit of a Boeing 747, feature requests would be a distant dream, and maintenance would be tied to a 10-year deal.
Many managers still think that software should be bought this way. For the purposes of datajournalism at least, they couldn’t be further from the truth.
Datajournalism is first and foremost about trying new ways to do journalism. You cannot predict which tool you will need in 24 or even 12 months, so it makes little sense to sign on with a big-name company for the next 5 years. Instead, datajournalists should try out new tools as often as humanly possible.
and just 4 examples of the dozens of tools that sprouted in the past 3 years for the benefit of storytellers all across the web. Lists of tools are regularly updated but their sheer number makes it hard to keep track.
What matters is that you and your team can adapt quickly to a new tool and are not locked in to a specific vendor. A key point to watch out for when choosing a tool is the export formats available. If a tool cannot export the data or its visualization in an open or commonly used format, it should be a no-go, as all your work could be lost were the tool to cease operating at some point.
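As a rough illustration of what an open export looks like in practice, the sketch below writes the same small dataset to CSV and to JSON using only Python's standard library. The sample rows, the field names and the helper functions `to_csv` and `to_json` are hypothetical and chosen purely for illustration; a real tool's export would carry your own data.

```python
import csv
import io
import json

# Hypothetical sample data; the field names and values are made up.
rows = [
    {"city": "Exampleville", "year": 2011, "value": 12.5},
    {"city": "Sampletown", "year": 2011, "value": 7.0},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize a list of dicts to indented JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


csv_text = to_csv(rows)
json_text = to_json(rows)

# Both formats can be read back by practically any other tool.
restored_from_json = json.loads(json_text)
restored_from_csv = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that CSV stores every value as text, so numbers come back as strings when the file is read again, whereas JSON keeps numeric types. Either way, the data stays readable without the tool that produced it.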
One of the main goals of datajournalism is to transform newsrooms into data-driven powerhouses to which customers (end-users or companies) can turn when they need reliable data in any format (text or tables). Now, no one knows exactly what a data-driven organization looks like, but we know that it has to do with retrieving and aggregating data quickly.
One of the key aspects of this approach is to structure the data in a coherent and reusable way. Project PANDA is an open-source “data library” developed in part by Brian Boyer. It serves as a repository for all data within your organization.
Go open source
, for instance). This means that if their servers crash and your data is deleted, you can do nothing but patiently rebuild. Another clause that you will find in most Terms has to do with licensing. Here is how Storify’s ToS puts it:
You grant the Service a royalty-free, non-exclusive, worldwide license to publish, use, distribute, modify and make derivative works from your Content and curated stories.
In other words, the service can re-sell your content to your competitor if it so wishes. This might be something that you want to avoid. Google Maps issued a wake-up call to the community in late 2011 when it abruptly changed its Terms of Service. Google Maps, which used to be free, would now cost as much as $4 per 1,000 requests.
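To get a feel for what such a price change can mean, here is a back-of-the-envelope sketch. It takes the $4 per 1,000 requests quoted above as an upper bound, and it assumes that every request is billed at that rate, ignoring any free allowance the provider may offer. The traffic figure and the function `monthly_cost` are hypothetical.

```python
def monthly_cost(requests_per_day, price_per_thousand=4.0, days=30):
    """Estimate a monthly bill, assuming every request is charged
    at the same per-thousand rate (no free quota, no volume discount)."""
    return requests_per_day * days / 1000 * price_per_thousand


# Hypothetical traffic: a map embedded in a popular story,
# loaded 50,000 times a day.
cost = monthly_cost(50_000)
print(f"Estimated monthly cost: ${cost:,.0f}")
```

Under these assumptions, a service that cost nothing the month before would bill $6,000 for the month.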
If you plan to use a tool on a strategic scale, you should definitely avoid these pitfalls. You can either deal directly with the provider to draft a new contract (the premium plans of most services include such clauses anyway) or install your own instance of an open source tool.
is a good alternative to Google Charts, for instance, as it is entirely open source and can be hosted on your own servers [disclaimer: I’m part of the team developing it]. When it comes to mapping, Open Street Map can be used in place of Google Maps.
Chances are you or a journalist around you will deal with sensitive data at some point in the future. Even if you think that most of what you do is bland and unexciting, you should prepare for the eventuality of someone bringing you a disk full of leaked data.
In such a scenario, what matters is to keep your data off the internet and on your own machines for as long as possible. Several handbooks will teach you how to securely encrypt your communications, like Tactical Tech’s Security in a Box. In terms of data handling, you should work on the unencrypted material only from a computer that is not connected to the internet. To process the data, use OpenOffice Calc, the open source equivalent of Microsoft Excel. Running your own instance of MySQL on your laptop is trivial thanks to MAMP (Mac) or WAMP