Seymour Hersh at cij's summer school 2006.

CAR guide

Investigative journalism requires evidence. Whether accusing powerful institutions of wrongdoing or exposing fraud, corruption or other abuse by corrupt individuals, the investigative reporter must have the goods.

Traditionally, documents and records provided reporters with such evidence, but the time and effort needed to read or even skim thousands of pages limited reporters’ investigations.

In time, though, documents and records were entered into databases, and a wonderful thing happened for investigative journalists. Those who possessed specific computer skills could analyse in minutes electronic records that would have taken days or even weeks to organise and read on paper. Those reporters had a cutting-edge, powerful tool to dig for evidence.

The tool is called computer assisted reporting (CAR). It uses common computer programs – such as Microsoft Excel spreadsheets and Access databases or similar open source software such as Open Office – coupled with the power of a simple web browser.

Reporters can hunt for evidence online or request it on discs from government agencies, load the data into their computers, and analyse complex figures as the basis for their stories. Using such techniques, reporters around the world have exposed corruption, fraud and abuses of power that otherwise might never have seen the light of day.

The cij offers training in CAR at all levels. Here’s how it works:

All the computer assisted reporting classes are streamed by complexity and practical application.

Hands-on

Each hands-on session is open to 25 participants on a first-come, first-served basis, so it’s best to sign up well in advance for the sessions you want. In these courses, you’ll use data from the UK while you learn spreadsheets, database managers and other software for your analysis. By the end of the session, you can return to your news organisation with story ideas and computer-assisted reporting skills that you can apply immediately.

Demos

Many of the newest, more advanced computer-assisted reporting skills are allowing us to find stories we might otherwise miss. See how statistics, mapping, open-source software, social network analysis and web tools can expand your CAR skills. While the demo sessions are not hands-on, you’ll still learn how these tools are expanding the reach of investigative reporting.

Complexity

The CAR classes are offered at two levels, beginner (B) and advanced (A). Instructors will assume participants have the following skills before beginning each session:

Beginner (B)

No computer-assisted reporting skills are needed to begin this course, though participants should be comfortable with Windows, using a mouse, etc. The Excel and Access courses are sequenced, so participants should not take Excel 2 without completing Excel 1, nor Access 2 without completing Access 1.

Advanced (A)

Participants should be familiar with using spreadsheets and database managers and analysing databases of government documents and records to hunt for story ideas. Instructors will assume participants possess effective and efficient online search skills. Completing the beginner courses during the summer school will prepare you for any of the advanced classes.

CAR Workshops

CAR Intro (B), Demo

What’s computer-assisted reporting? How can reporters benefit from gaining CAR skills? Digging in institutional databases can take your investigative skills to a new level.  In this class, participants learn about the tools of CAR and see examples of how it can enhance their reporting. In addition to demonstrations, there will be time to ask questions in this session.

Internet 101 (B), Hands-on

Learn the best practices for the most effective and efficient internet searching. This class will take you through the basics of search engines, directories, social networking sites and internet skills that all reporters use in their hunts for sources, data and stories. If your searching isn’t returning sites and data you need, this is the course to get on the right track.

Beyond the search engine (B), Hands-on

Google is good, but if that’s what you’re relying on for finding documents and data, then you could be missing important information. Learn how to find reporting sources by mining the deep web.

Excel 1: The power of data analysis for stories (B), Hands-on

Data is everywhere – from government computers to websites. This course introduces data analysis using a spreadsheet programme such as Microsoft Excel. Spreadsheets help reporters find investigative story tips in the data. Participants learn basic calculations, rates, ratios and other analytic tools that generate story ideas. Previous course notes are available: Excel 1
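As a rough illustration of the rate calculations this course covers, the Python sketch below works out incidents per 1,000 residents and ranks areas by rate rather than by raw count. The course itself uses a spreadsheet; Python is used here only as a stand-in, and the area names and figures are invented for the example.

```python
# Hypothetical figures, invented for illustration only.
areas = [
    {"area": "Northtown", "incidents": 120, "population": 40000},
    {"area": "Southvale", "incidents": 90, "population": 15000},
    {"area": "Eastbrook", "incidents": 200, "population": 100000},
]


def rate_per_1000(count, population):
    """Return the number of incidents per 1,000 residents."""
    return count / population * 1000


for row in areas:
    row["rate"] = round(rate_per_1000(row["incidents"], row["population"]), 1)

# Rank by rate, highest first: the biggest raw count is not always the worst rate.
ranked = sorted(areas, key=lambda r: r["rate"], reverse=True)
for row in ranked:
    print(f'{row["area"]}: {row["rate"]} per 1,000')
```

In this made-up data the area with the fewest incidents has the highest rate, which is the kind of story tip a raw count would hide.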

Excel 2: Finding patterns in the data (B), Hands-on

The second spreadsheet course covers built-in analytical tools, such as sorting, filtering, chart creation and others, that help reporters quickly find great story tips within databases.

Excel 3: Summarising your data for the big picture (B), Hands-on

To complete your spreadsheet toolkit, learn how to make pivot tables that will summarise trends in your data.
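For readers who want to see what a pivot table does under the bonnet, here is a minimal Python sketch that sums amounts by department (rows) and year (columns). It is not how Excel implements pivot tables, only an equivalent calculation, and the spending records are invented for the example.

```python
from collections import defaultdict

# Hypothetical spending records, invented for illustration only.
records = [
    {"department": "Health", "year": 2008, "amount": 500},
    {"department": "Health", "year": 2009, "amount": 650},
    {"department": "Transport", "year": 2008, "amount": 300},
    {"department": "Transport", "year": 2009, "amount": 280},
    {"department": "Health", "year": 2009, "amount": 50},
]

# Rows are departments, columns are years, cells hold summed amounts.
pivot = defaultdict(lambda: defaultdict(int))
for rec in records:
    pivot[rec["department"]][rec["year"]] += rec["amount"]

years = sorted({rec["year"] for rec in records})
print("department", *years, sep="\t")
for department in sorted(pivot):
    print(department, *(pivot[department][y] for y in years), sep="\t")
```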

Advanced CAR: CAR and statistics (A), Demo

Reporters using sophisticated statistical analysis of public data have broken many big stories. This session introduces statistical analysis for finding story tips from patterns in public data. Using SPSS software, it demonstrates the power behind cross-tabulations and regression analysis, while introducing the concept of statistical significance.

Excel 4: Summarising your data for the big picture – Statistics (B), Hands-on

Statistical analysis that produces good story tips does not have to be done with statistical software. Reporters comfortable with spreadsheets will find that many stats can be done using Excel. This session takes participants through cross-tabulations and regression analysis using a spreadsheet, and shows how reporters find stories with these techniques.
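To give a flavour of what a simple regression calculates, the sketch below fits a straight line through a handful of paired observations using ordinary least squares. The session does this in a spreadsheet; Python is a stand-in here, the numbers and variable meanings are invented, and the sketch makes no attempt to test statistical significance.

```python
# Hypothetical paired observations, invented for illustration only:
# x = a deprivation score for an area, y = school exclusions per 1,000 pupils.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares with one predictor:
# slope = sum of cross-deviations / sum of squared deviations of x.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

A fitted line like this is a lead, not a finding: a slope on its own says nothing about cause, and a real analysis would also check how well the line fits.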

Access 1: Understanding databases (B), Hands-on

Spreadsheets are a great way to get started with CAR. But what happens when that dataset gets a little too big, or your analysis too complex? That’s when it’s time to move to a database manager like Microsoft Access. This class will introduce the basics of working with databases, including basic queries, filtering and sorting. Previous course notes are available: Access 1

Access 2: Digging for the story (B), Hands-on

The second Access course continues by introducing more complex analytical tools and techniques. The session will cover grouping, counting, summing and other aggregate functions.
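The grouping, counting and summing described above can be sketched in a few lines. This example uses Python's built-in sqlite3 module rather than Access, purely so that it runs anywhere; the table name, columns and rows are invented for illustration.

```python
import sqlite3

# An in-memory database stands in for an Access file; all data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (council TEXT, result TEXT, fine INTEGER)")
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [
        ("Ashford", "fail", 200),
        ("Ashford", "pass", 0),
        ("Ashford", "fail", 350),
        ("Barton", "fail", 100),
        ("Barton", "pass", 0),
    ],
)

# Group rows by council, then count the inspections and sum the fines in each group.
summary = conn.execute(
    """
    SELECT council, COUNT(*) AS inspections, SUM(fine) AS total_fines
    FROM inspections
    GROUP BY council
    ORDER BY total_fines DESC
    """
).fetchall()

for council, inspections, total_fines in summary:
    print(council, inspections, total_fines)
conn.close()
```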

Access 3: Joining databases for deeper analysis (B), Hands-on

Basic analytical techniques only go so far when you have multiple datasets to work with. The third class in the database series introduces joins — the real power of relational databases. In this session, you will learn how to take multiple tables of data and stitch them together to find hidden gems that make a great story.
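As a small sketch of what a join does, the example below stitches a table of contracts to a table of political donations on the company name, keeping only companies that appear in both. It uses Python's built-in sqlite3 module instead of Access, and every table, company and figure is invented for illustration.

```python
import sqlite3

# An in-memory database stands in for an Access file; all data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contracts (contract_id INTEGER, company TEXT, value INTEGER);
    CREATE TABLE donations (company TEXT, party TEXT, amount INTEGER);
    INSERT INTO contracts VALUES (1, 'Acme Ltd', 900000), (2, 'Birch plc', 120000);
    INSERT INTO donations VALUES ('Acme Ltd', 'Party X', 50000),
                                 ('Cedar LLP', 'Party Y', 10000);
    """
)

# Inner join: keep only companies found in both tables.
matches = conn.execute(
    """
    SELECT c.company, c.value, d.party, d.amount
    FROM contracts AS c
    JOIN donations AS d ON d.company = c.company
    """
).fetchall()

for company, value, party, amount in matches:
    print(f"{company}: contract worth {value}, donated {amount} to {party}")
conn.close()
```

In real data, names rarely match this neatly ('Acme Ltd' versus 'ACME Limited'), so cleaning the join field is usually the first job.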

Importing Data to Excel (B), Hands-on

The web is flooded with data, but before it can be analysed, it needs to be transferred to a computer. Here, participants will be introduced to different methods of transferring data from web pages and PDF files using Microsoft Excel. Instructors will also demonstrate advanced web scraping.
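To show the idea behind pulling a table off a web page, here is a minimal Python sketch that reads the rows of an HTML table using only the standard library. It is not the Excel method taught in the session; the class name `TableExtractor` and the sample HTML are invented for illustration, and a real page would first have to be downloaded.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page fragment, invented for illustration only.
html = """
<table>
  <tr><th>Council</th><th>Spend</th></tr>
  <tr><td>Ashford</td><td>1200</td></tr>
  <tr><td>Barton</td><td>950</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(html)
parser.close()
rows = parser.rows
for row in rows:
    print(row)
```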

CAR A-Z: Questions and answers (B), Hands-on

So you’ve completed much of the CAR training, but you still have questions and concerns. Can’t remember how to do a pivot table? Wonder how executing a database join will help you discover a story tip to pursue? This session is for you. Using a Q&A format, instructors will review any of the CAR skills taught in the Summer School. Come with questions.

Text mining: Seeing patterns in paragraphs and other unstructured text (A), Hands-on

Learn about tools that range from simple ones allowing basic pattern searches in text, such as full-text indexing in database managers, to more complex approaches just now being explored by reporters.
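A basic pattern search of the kind mentioned above can be sketched with a regular expression. The example below picks out pound amounts from a few short documents; the file names and text are invented for illustration, and the pattern is deliberately simple (it ignores pence and amounts written in words).

```python
import re

# Hypothetical document text, invented for illustration only.
documents = {
    "memo_01.txt": "Payment of £12,500 approved. Second payment of £3,000 pending.",
    "memo_02.txt": "No payments recorded this quarter.",
    "memo_03.txt": "Invoice for £250,000 queried by audit.",
}

# Match pound amounts such as £3,000 or £250,000 (whole pounds only).
amount_pattern = re.compile(r"£\d{1,3}(?:,\d{3})*")

hits = {name: amount_pattern.findall(text) for name, text in documents.items()}

for name, amounts in hits.items():
    if amounts:
        print(name, amounts)
```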

Using CAR during a financial crisis – US Sub-prime data/UK data (B), Demo

It’s one of the biggest stories of the decade, and to investigate who’s behind the financial meltdown, you’ll need some data analysis tools to walk through all the numbers and trends behind the daily stories of bank closings and bloated budgets. Learn about the data and the tools that can take your coverage to a new level.

Using CAR during the financial crisis (A), Hands-on

When financial firms teetered during autumn 2008, much of the blame fell on subprime mortgages, which were used throughout the banking and financial world. Learn how reporters used mortgage databases to find those responsible for the worst lending and how to trace the money through the system. While concentrating on how this is done in the US, the session will also look at public financial data in the UK and Europe.

Mapping for stories 1 (A), Hands-on

You’ve seen how Google Maps can quickly show the geography behind the numbers. Learn how to make a basic interactive map from a list of addresses with help from Google Maps.

Mapping for stories 2 (A), Hands-on

Mapping as a reporting tool is exploding onto new websites. This session will introduce you to geographical information systems that produce statistical maps and other visualizations. Learn how to use ArcView to analyse data geographically to dig deeper into your reporting. See accompanying notes about online mapping and ArcMap.

Social network analysis (A), Demo

Journalists often notice how various groups differ from the rest of the society in terms of sex, age, income level, etc. This course introduces the use of methods that enable us to examine the social structure inside a group and between that group and society.  It is now possible for a reporter to describe who has the most powerful connections in a community and how business boards are connected through interlocking directorships. Course notes are available on social network analysis.
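The interlocking directorships mentioned above can be sketched without any specialist software. This Python example finds pairs of companies that share a director and counts how many boards each director sits on; the company and director names are invented for illustration, and real social network analysis tools offer far richer measures than this simple count.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical board memberships, invented for illustration only.
boards = {
    "Alpha Holdings": {"A. Smith", "B. Jones", "C. Patel"},
    "Beta Energy": {"B. Jones", "D. Khan"},
    "Gamma Bank": {"B. Jones", "C. Patel", "E. Lee"},
}

# Two companies are linked when they share at least one director.
interlocks = {}
for (co1, dirs1), (co2, dirs2) in combinations(sorted(boards.items()), 2):
    shared = dirs1 & dirs2
    if shared:
        interlocks[(co1, co2)] = shared

# Count how many boards each director sits on.
seats = defaultdict(int)
for directors in boards.values():
    for director in directors:
        seats[director] += 1

best_connected = max(seats, key=seats.get)

for pair, shared in interlocks.items():
    print(pair, sorted(shared))
print("Most board seats:", best_connected, seats[best_connected])
```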

Advanced CAR: Text mining to investigate corporate and other documents (A), Demo

Often, what we want to analyze is not in the nice rows and columns of a spreadsheet or database. Instead, it’s information in reports and documents organized by paragraphs, or what is known as unstructured text. See how text mining – a cutting edge technique – helped expose the lies the Bush administration used to justify the Iraq War.  Also learn new tools that will allow you to go deeper into text documents than ever before.

Freedom of Information Act for CAR (B), Demo

Through the CAR training courses participants gain the necessary skills to search, analyse and investigate electronic data, but how do you get that data in the first place? Few public bodies in the UK voluntarily provide such information even though it is collected and stored for the public’s benefit and at public expense. Heather Brooke has made her career getting this data, most notably with MPs’ expenses receipts, but she also tracks down other official datasets from bridge inspections to restaurant hygiene reports. She reveals her tips for getting official data from bureaucracies using all manner of tools from the Freedom of Information Act and EU directives on the reuse of public sector information to environmental law and politicians’ own promises for greater transparency.

Newsgathering online (Murray Dick) (A), Hands-on

Learn how to streamline your newsgathering in a world of information overload. This practical lab will explore various free sources for online news, spanning ‘push’, ‘pull’ and ‘push-pull’ technologies. It will also explore search theory, aggregated news, and personalised syndication in online newsgathering. Automated news discovery via RSS feeds (and their application across active and static sources) will be explored, as well as advanced filtration of syndicated content. This practical lab will introduce those cutting edge technologies (including real-time search) which offer more and more sophisticated means of turning your laptop into an up-to-the-minute, personalised wire service.

Finding people online (Murray Dick) (A), Hands-on

This course introduces various advanced and lesser-known search methods for finding investigative sources. Find whistleblowers and experts in esoteric fields using a number of methods that can both improve accuracy and save time during lengthy investigations. Participants are introduced to the ‘hidden web’ and other subscription, free, and non-indexed sources (including directories and archives) that can help in advanced online search. Various social applications will be demonstrated, and their use in investigative journalism will be explored. Course notes are available: finding people online.

Build a personal research database (Luuk Sengers) (B), Hands-on

Never lose a shred of paper, a telephone number, a note, a good idea – put everything that’s related to your investigation into one simple database. This database works especially well for reporters who complete their investigations in small steps, separately from other tasks, and for projects in which several reporters work together. Participants receive the database software (‘Digital File’) for free.

Practical approach to improving computer and Internet security and privacy, Part II (Wojtek Bogusz), Hands-on

This session explains how to encrypt information on a computer or USB device, create and use a new secure email address, and browse the Internet anonymously. This hands-on training will be based on the Security-in-a-Box project, a toolkit of peer-reviewed, free and open-source software that includes user-friendly guides for improving the security and privacy of stored information and communication. The whole toolkit is available online from www.security.ngoinabox.org. The toolkit, with both the tools and guides, is currently available in French, Spanish, Arabic, Russian and English.

A Non-technical Guide to Automated Web Browsing (Automating website data extraction) (Michael Schrenk), Hands-on

Online research often requires repetitive downloading of web pages. That process, along with extracting information found on websites, is tedious and error-prone. This hands-on session describes a free, easy-to-use tool (iMacros) that allows journalists to automate the process of computer-aided research. You’ll learn techniques to use this screen-scraping tool effectively. You can combine these skills with the lessons taught in the related session ‘Conducting Research Anonymously’ (same tutor, TALKS strand) to create a very powerful set of research tools. See Mike’s notes on automated web browsing and anonymity.

MySQL 1 and 2 (Aron Pilhofer) (A), Hands-on

This is a two-part introduction to the free, open-source database MySQL. Think of it as Microsoft Access on steroids: quite a bit more powerful, despite the fact that it’s completely free. Section one will cover the basics of MySQL: the MySQL database environment, client/server architecture and basic SQL. Section two will cover slightly more advanced topics, such as importing and exporting data and more advanced SQL commands.

Analysing data is the future for journalists
According to Tim Berners-Lee, inventor of the world wide web, “data-driven journalism is the future”. Find out why in this Guardian article.