SQiRL Home Page

There are three sets of documentation for SQiRL. Pick the one that best suits your needs.

Quick Start Guide
User's Manual
Developer's Manual

Quick Start Guide

Download and unzip the SQiRL.zip file. Click on the "Start SQiRL" icon to begin.

When SQiRL first starts up, you'll see a large central canvas with a thin panel on the left. The dominant feature of the canvas is a doughnut-shaped widget, or ring. The side panel contains a two-level tree structure of attributes and values. In an opinion poll data set, the attributes represent questions on the survey, and the values represent the range of available answers.

At first, the canvas is blank, save for the empty doughnut in the middle. You can drag any attribute from the side panel to the dought to find more information about it. Try dragging the "2004: Who did you vote for?" attribute from the side panel onto the ring. The ring then fills with a graph showing the results to that question.

To see the results to multiple questions, you can drag other attributes over to the ring. Drag Do you have health insurance? and Annual Income to the ring. Now you see the breakdowns for all three questions, among the entire survey population.

While it can be interesting to look at the survey population as a whole, what you probably want is to see how certain segments of the population responded to the questions. To do this, double-click on one of the attributes in the side panel. The possible values for this attribute are revealed. Dragging one of them into the inner circle of the ring limits the population to only those who match that value. As a concrete example, double-click now on the Age attribute. Now, drag 18 to 25 into the ring's inner circle. See how the chart changes! Compared to the population as a whole, those who are between 18 and 25 years of age have significantly higher support for Kerry, significantly lower incomes, and significantly lower health care coverage.

Suppose we want to see how young men answered these questions. Double-click on the attribute Gender, and drag male into the ring's interior. We see that support for Bush (and income levels) increase somewhat, while health insurance coverage stays about the same.

Let's add a third variable into the mix. Does political ideology affect who one votes for? Is it related to income levels? Double-click on the attribute Ideology and drag Liberal into the inner ring. Not surprisingly, support for Kerry goes up significantly. Health insurance coverage changes only a little, but — intriguingly — salaries in the middle range also increase significantly. Before jumping to conclusions, however, notice the two numbers in the bottom-left corner of the canvas. This shows the number of survey respondents who match the sub-population specified in the ring's center. 23 out of 1,212 may or may not be significant, depending on what you're looking for. That's why it's important to exercise caution when adding lots of constraints to the population... when the population becomes too specific, the less statistically significant the results become.

You can remove any constraint from the ring's interior simply by dragging it out of the ring. Dragging it back to the side panel hides the icon, while dragging it to the space outside the ring (but still on the canvas) removes it from the current query but keeps the icon visible. In general, the space outside the ring is just a "sandbox" where you can keep recently-used icons, or icons that you plan to use shortly.

Similarly, you can remove any attribute from the doughnut's circumference by dragging its title either into the space outside the ring, or back to the side panel.

User's Manual

This document assumes that you already have some experience performing basic queries in SQiRL, as described in the Quick Start Guide. Here, we discuss some of the other features available in SQiRL.

Resizing the doughnut

To change the size of the doughnut widget, hover the mouse cursor over either the inner circumference or the outer circumference. The cursor will change to a two-sided arrow, meaning you are ready to resize the doughnut. At this point, you can click-and-drag the edge around to make the doughnut as big or as small as you'd like.

Screen Capture

To save a screenshot of the canvas, go to File → Save as JPEG image... and enter the filename you'd like to save the image to.

Loading and Saving State

Documentation coming soon!

Developer's Manual

Introduction

This document assumes that you are interested in using SQiRL with your own data. If you have other types of questions, please let us know and we will get back to you as soon as possible.

The most obvious kinds of data set with which SQiRL works are opinion poll results, census data, or other demographic surveys. With a little creativity, other types of data may also be used; but for the present discussion we will assume an opinion-based survey.

Let's say your data is an exit poll following a local election. Your exit poll asks four questions, as shown below:

Which candidate did you vote for?
What is your party affiliation?
What is your gender?
How old are you?

A real exit poll would likely have many more questions, but we've intentionally constrained this example to simplify the present discussion. Each of these questions could have a variety of possible answers, as shown below.

Which candidate did you vote for? Possible Answers: Bud Abbott, Lou Costello, Other/Write-in, N/R
What is your party affiliation? Possible Answers: Republican, Democrat, Libertarian, Green, Constitution, Other, N/R
What is your gender? Possible Answers: Female, Male, N/R
How old are you? Possible Answers: open

(N/R means no response, meaning the respondent left a particular question blank.)

Suppose that there are 2000 respondents to the exit poll. Let's take a look at the raw data for the 5 of them.

Candidate	Party	Gender	Age
Abbott	Democrat	Female	29
Costello	Republican	Male	33
Costello	Democrat	Male	58
Abbott	Constitution	N/R	20
Other/Write-In	N/R	Female	41

As you can see, the respondents have a mixture of responses, and some of them declined to answer some of the questions. (This is a fact of life; survey data is rarely 100% complete.)

Preparing the Database

Now, in order to analyze this data in SQiRL, we need to put it into a database. Fortunately, the column names for this table immediately suggest a schema we can use. Let's give our database table a simple name, say, Voters, and four columns: 3 of them (Candidate, Party, Gender) are text strings, and 1 (Age) is an integer. Thus, to find out the number of respondents who voted for Costello, we could issue the following SQL command:

SELECT COUNT(*) FROM Voters WHERE Candidate = "Costello"

As a database table, this is not bad. However, there is a glaring inefficiency in our design that can be fixed with a little more work. Since we are storing the candidates' names as text strings in our table, the database must perform a string-comparison on every row in the table — a time-consuming operation. Integer comparison, on the other hand, is relatively fast. Is there a way we can represent the candidates' names as numbers instead?

Let's say assign each possible response for the candidate question a unique number, say:

1 Abbott
2 Costello
3 Other/Write-In
4 N/R

The Candidate column can now be stored as an integer instead of a text string. This uses less storage space in the database, and also permits faster comparisons. Our table now becomes:

Candidate	Party	Gender	Age
1	Democrat	Female	29
2	Republican	Male	33
2	Democrat	Male	58
1	Constitution	N/R	20
3	N/R	Female	41

Our SQL statement to find the number of votes for Costello changes to:

SELECT COUNT(*) FROM Voters WHERE Candidate = 2

Can we do the same thing for the other text-based columns? Absolutely! Here's one possible mapping:

Candidate
   1 Abbott
   2 Costello
   3 Other/Write-In
   4 N/R
Party
   1 Republican
   2 Democrat
   3 Libertarian
   4 Green
   5 Constitution
   6 Other
   7 N/R   
Gender
   1 Female
   2 Male
   3 N/R

Incorporating these changes, our table becomes:

Candidate	Party	Gender	Age
1	2	1	29
2	1	2	33
2	2	2	58
1	5	3	20
3	7	1	41

This is better! We've removed all non-numeric data from the table, ensuring that our queries perform fast.

Making Continuous Values Discrete

Recall that SQiRL is most useful when an attribute can be broken down to a small number of discrete values. Candidate, Party, and Gender are trivially discretized; Age is not. Thus, we must artificially constrain the Age attribute to fit into a small number of values. The most straightforward way of doing this is to divide Age into a series of discrete intervals, and assign each a numeric ID. For example:

Age
   1 under 25
   2 25--35
   3 36--50
   4 51--65
   5 over 65
   6 N/R

This way, a respondent's age can be neatly categorized into one of 6 ways. Granted, this particular division is somewhat arbritary; other valid mappings are quite possible. A risk in doing any discretization is that we will lose some of the original data. To guard against this, we leave the Age column untouched, and simply create a new column, say Age5, and use it in our queries instead of Age. Using the above mapping, we can change our table thus:

Candidate	Party	Gender	Age	Age5
1	2	1	29	2
2	1	2	33	2
2	2	2	58	4
1	5	3	20	1
3	7	1	41	3

This is great. No further changes to our database table are required in order to use it with SQiRL. However, there is may be one change that might be beneficial, depending on the nature of the data. This is described in the next section.

Aggregating Infrequently-used Values

Recall that we defined 7 possible values for the Party column. Let's say that of the 2000 respondents in our survey, only 9 answered Libertarian, 2 answered Green, and 1 answered Constitution. Depending on the size of your data set, outliers such as these may not be statistically significant. In such cases, it might be beneficial to group all infrequently-occurring for a given category responses into the Other value. So as not to lose any of our original data, let's make a new column called Party3 with the following mapping:

Party3
   1 Republican
   2 Democrat
   3 Other
   4 N/R

Candidate	Party	Party3	Gender	Age	Age5
1	2	2	1	29	2
2	1	1	2	33	2
2	2	2	2	58	4
1	3	3	3	20	1
3	7	4	1	41	3

Preparing the metadata

Converting our data from text to integers improves efficiency for the database, but may make the data harder to read for people. Since SQiRL presents the data both in graphical and text format to the users, it requires some additional information beyond what is in the table as a bridge between the data and the user interface. What SQiRL needs, in fact, is metadata; that is, data about the data.

In addition to the database, SQiRL also requires a metadata file (usually called metadata.txt although alternate filenames may be used). The first 6 lines of the metadata file contain information about the database and table. The first line gives the name of the driver (a Java class) required for your specific database. The second line lists the JDBC protocol used to connect to that database. Both of these values may be found in the documentation from your database vendor. Here are the values for the Apache Derby database, which is conveniently included with the JDK:

DRIVER|org.apache.derby.jdbc.EmbeddedDriver
PROTOCOL|jdbc:derby

As you can see, the fields in the metadata file are pipe-delimited. The third and fourth lines of the metadata file give the database name and table name, respectively, where your data is kept. These will vary depending on how you set up your database. Example:

DB_NAME|pollster
TABLE_NAME|voters

As some database tables are password-protected, the next two lines of the metadata file allow you to specify a username and password with which to access the database. For example:

USER_NAME|poll
PASSWD|poll

Metadata: Column Definitions

The next section of the metadata file identifies the columns in the table: their column names and how they should appear in the GUI. For example, the column name Candidate is fine for internal use, but may not be as clear to a user. Users may prefer to interact with a more descriptive title like 2009 Mayor of Springville. For example:

1|2009 Mayor of Springville|Candidate|c
2|Political Party Affiliation|Party3|c
3|Gender|Gender|t
4|Age|Age5|t

In addition to the column name and description, there are two other fields on each line. The first field is a unique ID number for the column (we'll refer to this later). The last field is a single character, c or t. This field determines the position of that attribute on the canvas when SQiRL starts up. A value of c means the attribute should appear on the circumference of the ring. A value of t means the attribute should appear on the left sidebar (or "tray") instead. Any other value will cause the attribute to float in the space outside of the ring.

After the column definitions, the metadata file will have a single line consisting of three asterisks: ***. This marks the end of the column definitions and the beginning of the value labels section.

Metadata: Value Labels

Recall that, for efficiency, we refactored the table's textual or continous data into discrete integer values. We kept a mental note of which numbers corresponded to the "real" data values; in the value labels section of the metadata, we formally communicate this mapping to SQiRL. The purpose of this section is to provide a concise mapping between the numeric data in the database and the user-friendly descriptions shown to the user.

Let's start with an example, and then explain each part.

1|1|Abbott|Voted for Bud Abbott
1|2|Costello|Voted for Lou Costello
1|3|Other|Voted for another candidate
1|4|No Response|Mayor: No Response

The first field is the column ID, which was specified in the column definition section of the metadata file. The second field is the number corresponding to a specific real-world data value, as discussed above. The third field is a short description of the data value. The fourth field is a longer description of the data value. The short form is used in the side panel and the sectors on the circumference. The longer form is shown in the lozange-shaped icons that are dragged into and away from the ring's inner circle.

On occasion, you may want to use the same label for both the short and the long form. In this case, a shorthand syntax is available. Replace the long form with a single X and SQiRL will use the short form everywhere.

That's it for the metadata file. Once you have your database and table set up, and your metadata file written, the next step is telling SQiRL the name of your metadata file. This is done via the -m command-line option. For example, if your metadata was saved in a file named survey_metadata.txt in the current directory, you would invoke SQiRL like this:

java sqirl.Main -m survey_metadata.txt