My Final Year Project was a valuable learning experience for me. One of the things I learnt was how to use various tools and software to facilitate the research and writing process better so that I could focus on the project and not get bogged down by time-wasting procedures, e.g. data management, cleaning up citations, etc. I had been planning to blog about the tools I used for my FYP, so here I am with my first post.
Quick question for you: while in the midst of your research project, how do you consolidate your data, quantitative or otherwise? For many of you, it boils down to needing a program that imitates a table so you can fill in rows and columns. In many cases, people end up using Excel for this. Even until university, Excel appeared excel-lent (heh) for recording data. Many working professionals still use Excel at their jobs for databases. But at the preliminary stage of my FYP data collection, Excel began to frustrate.* I later learnt that Excel, while suited for basic databases, isn’t really purpose-built to be a database; it’s primarily an accounting software with some basic data analysis and reporting built in.
My FYP data collection, briefly speaking, was a literature review. I was reading academic papers, and trying to tag them with issue-based categories pertinent to my project framework, e.g. socioeconomic issue, conservation issue, wildlife trafficking issue, etc.
Here were the main issues I faced:
- When I wanted to ‘tag’ a paper (e.g. under a ‘Benefits’ column’, I basically had to type out a category. If I misspelled the category or decided to change the phrasing for the same tag on a subsequent paper, I had to edit the previous one manually (Ctrl-F and Replace, et cetera). I ended up with multiple versions of the same tag, which I would struggle to sort out when I wanted to come up with a descriptive statistics summary.
- My research database was initially paper-based (i.e. every row was a record for a paper, and I would fill in relevant information about everything else in the columns). I later realised that I also wanted my data to be species-based, i.e. I needed another table with every row being a record for a species, and relevant data filled up in the columns, including the papers which mention that species.
Not sure what I was talking about? Here’s a capybara.
A capybara (Hydrochoerus hydrochaeris) is one of the wild species that was being bred in commercial captivity, so it was included in my project. The capybara was being farmed in four countries and was documented in three papers. The papers were providing different assessments on the conservation impacts of farming the capybara. Do I sort this info according to species, papers, or country? Or should I sort all three categories in three different tables and keep all the relevant information consistent between them as I read more papers? The latter is the best idea but also the hardest to manage manually.
Problem (1) is something that can be averted in Excel if one is extra careful and meticulous. It’s just that Excel is prone to such errors, and data entry requires much more verification and double-checking than it should have. Problem (2) was something that neither Excel nor Google Spreadsheets seem able to do automatically. Something in the back of my mind nagged me that there was probably something out there that could do what I desired, but for months I basically filled up the Papers Table, then manually updated the Species Table with the same information. I didn’t even bother making a Countries table. It was prone to error, and I constantly wished for a better way.
What I didn’t know then was that I had independently ran into problems that data management professionals had been also frustrated with since the 1970s, and had already addressed with the creation of relational database systems. I later found out about SQL and Microsoft Access, but I couldn’t quickly figure out how to use them. It wasn’t intuitive straight after installation, and I didn’t have the time to devote myself to learning a database platform which I wasn’t able to pick up right off the bat.
Enter Airtable. I stumbled across it while hunting for database alternatives, and unlike the rest, I was able to use its basic functions almost immediately.
Airtable is free and accessed through the Internet, which was a big plus for me, since it was hardware-independent and I didn’t have to pay any money to give it a try. In Airtable, you create ‘bases’, which are the equivalent of an Excel file that can contain multiple sheets. These bases, like Google Docs, can be shared with others with varying access privileges e.g. owner, creator, edit only, read only. This opens up the base to collaborative use.
Airtable markets itself as a cross between a spreadsheet and a database, and anyone can see why. On the surface, it resembles Excel and other spreadsheet apps more than database apps like Access; it had rows, columns, and multiple tables in one base. The interface looked even simpler than Excel, and manipulating the cells just as easy; there was no meta view where you had to define the relationships and field types like in Access. But underneath the friendly appearance and ease of editing was a surprisingly versatile relational database.
Unlike Excel, the first row and column are frozen and locked by default, and you can’t change it. That is because the first row is used to define the data types for the respective columns, and the first column is used to identify the record for each row. Familiar to seasoned database managers is the ability to assign different data types to the various columns e.g. short text, numerical values, dates, etc. Some of the more unconventional but useful data types available include checkboxes and file attachments, the latter which can be immensely useful for sharing files between members of a base.
But the feature which got me the most excited was a data type called Linked Records. When selected, it converts words separated by commas into categories, and lists them on another table as the first column. This is the ‘relational’ part of Airtable. If you update the column with a linked record, it also changes the other table automatically. This was the part which solved both problems number 1 and 2 for me. Linking records or creating relationships between tables is supposedly something that you can also do in Microsoft Access and SQL. However, I spent hours trying to figure it out in Access and I still couldn’t replicate this functionality. Also, SQL would require me to learn their language before I could begin using it. Call me stupid, but doing the same thing in Airtable just required a few clicks through a spreadsheet-like interface I was already familiar with. If you need something that does the job without requiring a database management training course to use, Airtable is ridiculously simple and easy to pick up, yet incredibly functional, especially for a project with the scope of an FYP.
Airtable also features a Formula field, which allows you to manipulate data in a column in the same way you would in Excel. It has most of the same functions that Excel users will be familiar with. But unlike Excel, you write the formula in the header row to dictate all the results in the records, rather than copy-pasting/drag-copying the same formula down the column in Excel. There are plenty of functions available and creative ways to construct your database so that it best fulfills your needs; it’s definitely worth your time delving into how to use the formula field and the rollup field once you’ve figured out the basics. For advanced Excel users, these slightly more technical aspects of Airtable allow you to unlock a higher level of potential.
Another useful feature which apes an existing Google Docs function is the ability to create forms that can be shared like a survey to collect data. In this way, you can populate your database directly from online survey questions, instead of transferring the results of handwritten survey forms to a database one-by-one.
A few caveats: Airtable can’t really do data visualisation, but you can export the data as a CSV, then import it into a data visualisation software like Tableau or Excel. There also isn’t really conditional formatting like in Excel (although this function can be emulated somewhat). That being said, Airtable is a relatively new product that is still being improved, so a few of these features may come along at some point. Also, Airtable does do one thing worse than Excel: it’s not really meant for 2-dimensional calculations. That’s what a spreadsheet program like Excel is for.
I hope this basic introduction gave you a quick look at a tool which I found extremely invaluable during my FYP. Without Airtable, my data management and interpretation would have been more protracted and torturous than it should have been. I’m glad Airtable came out before I had to do my FYP, and I hope you also find out how it’s just a much better option for data management than Excel ever was.
If you’re interested in trying out Airtable for your FYP (or any other project), I’ll like to invite you to try it at this referral link, and let me know how it goes!
*Or in my case, Google Spreadsheets, which is pretty much the same thing, except reliably cloud-based. I wanted something I could access anywhere.