Introducing shinyData (v0.1.1)

shinyData is an easy-to-use tool for interactive data analysis, visualization and presentation. It leverages the power of R and its vast collection of packages to let users efficiently perform common data tasks, such as slicing and dicing, aggregation, visualization and more (usually referred to as “business intelligence”). Almost no knowledge of R programming is required to use shinyData.

Current stable version: v0.1.1

Installation

To run the web-based version of shinyData without installing anything, simply go to https://roose.shinyapps.io/shinyData/. To install the package locally, execute the following R code (you can use the same code to get updates as well):

if(!require(devtools)) install.packages("devtools")
devtools::install_github("trestletech/shinyAce@a2268d545e0ea30eb44c9dca517aec1165b06a51")
devtools::install_github("AnalytixWare/ShinySky@15c29bec4e7c5e694625e571656515a8ace7f376")
devtools::install_github("trestletech/shinyTree@522f8a7e28565bec0f634faf5aa1e75da247de44")
devtools::install_github("ebailey78/shinyBS", ref = "shinyBS3")
devtools::install_github("yindeng/shinyData")

Usage

shinyData::shinyData()

This will open your default browser and run shinyData locally on your computer.

To quickly get a flavor of what shinyData can do, simply open one of the sample projects in the dropdown box on the “Project” page.

Feature Overview

Data extraction and manipulation (Tab “Data”)

shinyData supports loading data from a text file. It can auto-detect the presence of a header row and common text delimiters (such as comma or tab), and it can skip any banner lines at the beginning of the file, so it requires minimal input from the user.

The other option is to load data by writing some R code. This gives the user a tremendous amount of flexibility. For example, by using the RODBC library, one can submit a SQL query to a database over an ODBC connection and get the results back as an R data frame, ready to be processed by shinyData. The R code can also include the names of other data sources in the same project (quoted in backticks), which will be evaluated to the corresponding data.table objects. This makes it easy to pre-process a data source (for example, by adding a derived column) or to join two data sources together before creating the desired visualizations. Examples of these customizations will be added to the sample projects over time.
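As a rough sketch of what such R code might look like, the snippet below joins two data sources and adds a derived column. The source names `Store Sales` and `Store Info` and all of their columns are hypothetical, and plain data frames stand in for the data.table objects that shinyData would supply. Inside shinyData the backtick-quoted names would refer to existing data sources rather than being created by hand.

```r
# Hypothetical stand-ins for two data sources already loaded in a project.
`Store Sales` <- data.frame(
  store = c("A", "B", "C"),
  sales = c(120, 80, 200),
  cost  = c(90, 70, 150)
)
`Store Info` <- data.frame(
  store  = c("A", "B", "C"),
  region = c("East", "East", "West")
)

# Join the two sources on their shared key column.
joined <- merge(`Store Sales`, `Store Info`, by = "store")

# Add a derived column computed from existing fields.
joined$margin <- joined$sales - joined$cost

joined
```

Base R's merge() is used here because it works the same way on data frames and data.table objects.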

To load data from an Excel file, refer to this blog post for a good comparison of a variety of different approaches available.

We are working on adding more GUI support for loading data.

After the data is loaded, the user can preview it, customize the data source name and field names, and specify which fields should be considered measures. Setting a field as a measure allows numerical aggregation on that field.

Data aggregation (Tab “Visualize”)

It’s often useful to aggregate numerical data (or measures). For example, one might be interested in the average store sales per region. shinyData integrates common data aggregations with visualization, so you can see the results quickly. When you map a visual element (like X, Y, Color, etc) to a field in the selected data source, you have the option to aggregate the field with a function selected from a dropdown list (or type in any R function that takes a vector and returns a single value). The aggregation is done conditional on all the fields that are mapped to in the current layer but not being aggregated, as well as any fields specified in facet columns or rows. Additionally, the data aggregation in each plot layer is independent of each other, so it is possible to have different levels of granularity in the same plot. Another special type of aggregation is sometimes also applied to a layer when “Stat” under Tab “Type” is not “Identity”. This is useful for creating statistical charts like box plots, for which you need to aggregate the data to get the quartiles. This type of statistical aggregation is done after the previously mentioned aggregations are done.
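To make the grouping rule concrete, here is a minimal sketch in base R of the kind of aggregation described above. It is not shinyData's internal code. The data set and its column names are made up for illustration.

```r
# Hypothetical store-level data.
sales <- data.frame(
  region = c("East", "East", "West", "West", "West"),
  year   = c(2013, 2014, 2013, 2014, 2014),
  amount = c(100, 140, 90, 110, 130)
)

# Y (amount) is aggregated with mean(); X (region) is mapped but not
# aggregated, so the mean is computed within each region.
by_region <- aggregate(amount ~ region, data = sales, FUN = mean)

# Putting year in a facet adds it to the grouping fields, which gives a
# finer level of granularity.
by_region_year <- aggregate(amount ~ region + year, data = sales, FUN = mean)

by_region
by_region_year
```

Any function that takes a vector and returns a single value (median, max, a custom function) could be passed as FUN in the same way.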

Tip: if you need to create a type of aggregation that’s not supported by either of the techniques mentioned above, you can create an intermediate data source with R code under Tab “Data” (see above), do whatever aggregation there, and use that data source for your visualization instead.

When there is any aggregation done in the base “Plot” layer, the aggregated data table is automatically added to the list of data sources, and it will be kept in sync with the changes made to the “Plot” layer. This is useful when a secondary aggregation is desired.

Visualization (Tab “Visualize”)

The following chart types are supported: text (for adding data labels), bar chart, line chart, area chart, scatter plot, path plot, polygon plot, box plot, density plot, and smoother (or trend line with confidence bands). Visual elements can either be mapped to a field or set to a fixed value. You can add as many layers to the plot as you want, and they will be plotted on top of each other in the order shown. To make sure a layer is not hidden behind other layers (i.e., that it is plotted last), click on “Bring to Top” when that layer is selected.

The appearance of the plot is fully customizable. Customizations can be specified at different levels and are inherited through a tree-like structure. For example, “axis.title” inherits from “title”, which in turn inherits from “text”, so fonts set for “text” automatically apply to “title” and “axis.title” unless they are overridden there. More information can be found here.

If you are familiar with the R library ggplot2, you should recognize the semantics right away since the back end of shinyData visualization is exactly ggplot2.

Presentation (Tab “Presentation”)

Thanks to the simplicity and flexibility of R Markdown, users can easily combine the plots to create beautiful reports and presentations. And if you are an R programmer, you can add arbitrary R scripts to include analysis results that are not supported by the shinyData UI. Again, see the sample projects for examples of creating presentations.

Project management (Tab “Project”)

By saving the project to a file, users can pick up where they left off. Users can also merge two projects together by selecting “Merge with existing work” when loading a project file.

Due to the web-based nature of shinyData, currently you cannot save changes to an existing project file. Instead you need to download the project as a new file. We are working on overcoming this inconvenience when shinyData is run locally, so it essentially behaves more like a desktop application.

A few things I learned about shiny (and reactive programming)

One of the things I really like about shiny is that it has excellent documentation: the tutorial, articles and gallery go a long way in helping newcomers as well as intermediate programmers master the structure and features of shiny. Still, there are a few things I found lacking from the documentation that are important to understand, especially if your shiny app is going to be more than just a few lines of R code. I’m sharing them here hoping they can help others avoid some of the detours I took as I learned my way around shiny.

Reactive dependencies are dynamic

What does this mean? It’s probably best illustrated with a somewhat contrived example. Consider the following two observers:

observe({
  if(input$a=="good"){
    print("good")
  } else {    
    print(input$b)
  }
})

observe({
  a <- input$a
  b <- input$b
  if(a=="good"){
    print("good")
  } else {
    print(b)
  }
})

Are they equivalent, since the second is simply a reorganization of the first? The answer is no. The subtle difference is that the second observer always depends on both input$a and input$b (so it re-runs whenever either of them changes), whereas the first observer depends on input$b only when input$a != "good" (so while input$a == "good", it prints “good” only once no matter how many times you change input$b). The reason for this behavior is that dependencies are built as the code runs (i.e., dynamically), not from how the observer (or any other reactive) is initially defined. This may seem like a bug to some people, but it’s actually consistent with how R works in general, and it can be a very useful feature once it’s well understood. I personally used this feature extensively in my R package shinyData to help separate the business logic from the UI.
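One way to check this behavior without a browser is shiny's testServer() (available in shiny 1.5.0 and later), which lets you set inputs by hand. The sketch below repeats the two observers, with the print calls replaced by run counters. The counts environment is a helper added only for this illustration.

```r
library(shiny)

# Plain environment used only to count how often each observer runs.
counts <- new.env()
counts$first <- 0
counts$second <- 0

server <- function(input, output, session) {
  # First observer: reads input$b only when input$a is not "good".
  observe({
    counts$first <- counts$first + 1
    if (input$a == "good") "good" else input$b
  })
  # Second observer: always reads both inputs up front.
  observe({
    a <- input$a
    b <- input$b
    counts$second <- counts$second + 1
    if (a == "good") "good" else b
  })
}

testServer(server, {
  session$setInputs(a = "good", b = 1)  # both observers run once
  session$setInputs(b = 2)              # only the second observer re-runs
  session$setInputs(a = "bad")          # both re-run; the first now reads input$b
  session$setInputs(b = 3)              # both re-run
})

counts$first   # 3
counts$second  # 4
```

The first observer ran one time fewer because the change to input$b made while input$a was "good" did not invalidate it.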

Note: see this discussion post for a user’s question on this topic.

Reactives: order of execution

Here’s my understanding of the order of execution for reactives: when a user changes something in the browser, the browser sends the new input value to the server, which triggers a flush event and invalidates all of the input’s dependents. Then all the reactive endpoints (like observers and outputs) are refreshed, which may also trigger a refresh of the reactive conductors (reactive expressions defined with the reactive function). This part is actually well documented. However, things get trickier when you have calls that update inputs (like updateSelectInput). They do not take effect until all the currently invalidated reactives have finished executing; only then does the server send messages back to the browser for all the update-input calls. If this results in any changes to input values, another cycle of refreshing begins. So if you are not careful, it’s pretty easy to end up with an infinite loop if your update-input calls actually change some input values (not just the choices of a select input, for example).
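The sketch below shows one possible way to keep an update-input call from feeding back into the observer that issues it. It is an illustration under assumptions, not a pattern taken from shinyData. The input ids region and store and the stores lookup list are hypothetical.

```r
library(shiny)

ui <- fluidPage(
  selectInput("region", "Region", choices = c("East", "West")),
  selectInput("store", "Store", choices = NULL)
)

server <- function(input, output, session) {
  # Hypothetical lookup of the stores available in each region.
  stores <- list(East = c("E1", "E2"), West = c("W1", "W2"))

  observe({
    # Dependency: re-run only when the region changes.
    choices <- stores[[input$region]]

    # Read the current selection without depending on it, so the change
    # caused by updateSelectInput() below cannot re-trigger this observer.
    current <- isolate(input$store)
    selected <- if (!is.null(current) && current %in% choices) current else choices[1]

    updateSelectInput(session, "store", choices = choices, selected = selected)
  })
}

app <- shinyApp(ui, server)
# Run interactively with: runApp(app)
```

Because input$store is read inside isolate(), the new value that comes back from the browser starts a second refresh cycle in which this observer has nothing to do, so the cycle ends there.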

After a flush event occurs, you can control which reactive endpoints are refreshed first by setting the “priority” argument (see ?observe). In my opinion this technique should only be used when absolutely necessary, since the order of execution for reactives is in general unpredictable. Additionally, if a reactive is going to be invalidated as a result of an update-input call, it will be refreshed only after all the currently invalidated reactives have finished refreshing, no matter how high you set its priority.
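As a small illustration of the priority argument, the sketch below defines two observers on the same input and records the order in which they run, again using testServer() to drive the inputs. The run_order environment and the input id x are hypothetical names used only here.

```r
library(shiny)

# Plain environment used only to record the order of execution.
run_order <- new.env()
run_order$log <- character(0)

server <- function(input, output, session) {
  # Defined first, with the default priority of 0.
  observe({
    input$x
    run_order$log <- c(run_order$log, "default")
  })
  # Defined second, but with a higher priority.
  observe({
    input$x
    run_order$log <- c(run_order$log, "high")
  }, priority = 10)
}

testServer(server, {
  session$setInputs(x = 1)
})

run_order$log
```

The observer with priority = 10 is logged first even though it was defined second.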

Use of isolate to prevent accidental dependencies

When your app gets more complex, it’s really important to prevent unintended dependencies in your reactives. I found the following coding pattern very helpful in making the dependencies of a reactive explicit:

observe({
  ## part of code this reactive should take dependency on
  ...
  ...
  isolate({
    ## part of code this reactive should NOT take dependency on
    ...
    ...
  })
})
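A concrete instance of this pattern might look like the sketch below, where an observer reacts to a button but only reads a text input. The input ids go and text and the seen environment are hypothetical. testServer() is used to set the inputs by hand.

```r
library(shiny)

# Plain environment used only to record what the observer did.
seen <- new.env()
seen$runs <- 0

server <- function(input, output, session) {
  observe({
    ## dependency: re-run whenever the button value changes
    input$go
    isolate({
      ## no dependency: read the latest text without reacting to it
      seen$runs <- seen$runs + 1
      seen$text <- input$text
    })
  })
}

testServer(server, {
  session$setInputs(go = 1, text = "a")  # first run
  session$setInputs(text = "b")          # no re-run: text is isolated
  session$setInputs(go = 2)              # re-run, picks up "b"
})

seen$runs  # 2
seen$text  # "b"
```

Keeping all the dependencies at the top of the observer makes them easy to spot when reading the code later.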

Conditional panel

The online article on dynamic UI explains that the condition of a conditional panel can use the result of a reactive expression in addition to input values. This provides a tremendous level of flexibility. So you can do things like:

# Partial example
## ui.R
conditionalPanel(condition="output.foo", ...)
## server.R
output$foo <- reactive({
  ...
})

However, what’s missing from the documentation is that in order to make it actually work, you need to add the following line to your server.R:

outputOptions(output, "foo", suspendWhenHidden=FALSE)

This is necessary because you’re unlikely to display the value of output$foo in your UI, and shiny by default suspends all output reactives when they are not displayed.
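Putting the pieces together, a complete single-file version could look like the following sketch. The input id n and the output id hasRows are made up for illustration.

```r
library(shiny)

ui <- fluidPage(
  numericInput("n", "Number of rows", value = 0, min = 0),
  # The JavaScript condition reads the output value directly.
  conditionalPanel(
    condition = "output.hasRows",
    helpText("This panel is visible only while n is greater than 0.")
  )
)

server <- function(input, output, session) {
  # A reactive assigned to an output; its value is sent to the browser.
  output$hasRows <- reactive({
    isTRUE(input$n > 0)
  })
  # hasRows is never rendered in the UI, so keep it from being suspended.
  outputOptions(output, "hasRows", suspendWhenHidden = FALSE)
}

app <- shinyApp(ui, server)
# Run interactively with: runApp(app)
```

Without the outputOptions() call, the panel would stay hidden because output$hasRows would never be computed.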

reactiveValues

reactiveValues is a powerful construct in shiny. Most of the common objects in R (like lists and data frames) have copy-on-modify semantics, which behave somewhat in between reference classes and value classes in languages like C or PHP. Basically, copy-on-modify means that when a list is assigned to another variable, the list is not copied immediately; it is copied when some of its elements are modified through the new variable. This mechanism works well in most usages of R, but it can get quite frustrating if you try any type of object-oriented programming with R, as you would normally expect objects like lists to have reference semantics, i.e., a copy is never made unless you explicitly choose to make one. Fortunately, this is the case with reactiveValues, as can be seen in the following simple example:

library(shiny)
values <- reactiveValues(a = 1)
isolate(values$a)
## [1] 1
v1 <- values
v1$a <- 2
isolate(values$a)
## [1] 2

What this means is that you don’t have to worry about accidentally creating a copy of a reactiveValues object and then modifying the copy, which would leave the dependent reactives out of date.
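For contrast, the same steps with a plain list show the copy-on-modify behavior. This is a minimal base R sketch.

```r
l1 <- list(a = 1)
l2 <- l1      # no copy is made yet; both names point to the same list
l2$a <- 2     # modifying through l2 triggers the copy; l1 is untouched
l1$a          # still 1
l2$a          # 2
```

With reactiveValues the first value would have changed as well, because v1 and values refer to the same object.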

Note that members of reactiveValues can be any valid R objects (they can even be other reactiveValues). How does shiny determine whether a member has changed if the member is a complex object like a list? Below is a simple shiny app I put together for testing this. As you can see for yourself if you run the app, shiny is pretty smart about detecting the change. It looks “deeply” into the object, so both a change in the number of elements and a change in an element’s value are detected.

## ui.R
shinyUI(fluidPage(
  sidebarLayout(
    sidebarPanel(
      numericInput('aa','a$aa',value=1),
      numericInput('bbb','a$bb$bbb',value=1),
      actionButton('addList', "Add to list a"),
      br(), br(),
      numericInput('rr','r$rr',value=1),
      actionButton('addToR', "Add to r")
      ),
    mainPanel(
      h3('values'),
      verbatimTextOutput('a')
      )
    )
))

## server.R
library(shiny)
shinyServer(function(input, output) {
  values <- reactiveValues(a = list(aa=1, bb=list(bbb=1)), r=reactiveValues(rr=1))

  output$a <- renderPrint({
    list(a=values$a, r=reactiveValuesToList(values$r))
  })
  observe({
    input$aa
    isolate({  ## use isolate to avoid dependency on values$a
      values$a$aa <- input$aa  ## no need to use <<- because of reference semantics with reactiveValues
    })
  })
  observe({
    input$bbb
    isolate({
      values$a$bb$bbb <- input$bbb
    })
  })
  observe({
    input$rr
    isolate({
      values$r$rr <- input$rr
    })
  })
  observe({
    if(input$addList>0){  ## not run when being initialized
      isolate({
        values$a[[paste0('aa', length(values$a))]] <- 1
      })
    }
  })
  observe({
    if(input$addToR>0){  ## not run when being initialized
      isolate({
        values$r[[paste0('rr', length(reactiveValuesToList(values$r)))]] <- 1
      })
    }
  })
})

That’s probably enough for this post. If others find it useful, I might put together a follow-on post.

Front-end for ggplot2

ggplot2 is a very powerful R package for data visualization, but mastering it is like learning another language in addition to R. Wouldn’t it be nice if there were a simple GUI for ggplot2? That’s one major motivation behind the R package shinyData.

By leveraging the power of shiny (which is another R package), the majority of the functionality offered by ggplot2 can be captured in a web-based, highly interactive GUI. Below I will walk through a simple example of creating some basic charts with shinyData.

Since shiny is a web-based framework, shinyData can be run locally just as other R packages, but it can also be deployed over the web. Thanks to http://www.shinyapps.io/, shinyData is available to anyone with a web browser by simply clicking here.


Once you’re on the homepage of shinyData either by following the link above or by installing shinyData as a package (see the instructions at https://github.com/yindeng/shinyData), a sample project named mtcars.sData should be visible near the top right corner. Click “Open” to load the project, which will take you to Tab “Visualize” displaying a boxplot (together with the raw data dots) of the mtcars data set.


The second drop-down box on the left shows there are two overlays (or layers in ggplot2) to this chart. The controls below provide specifications for the selected layer. For example, when the “Overlay” layer is selected, the “Mark Type” (corresponding to geom in ggplot2) shows “Boxplot” and “Stat” also shows “Boxplot”. Just to experiment with the interactivity, if you change the “Mark Type” from “Boxplot” to “Bar”, you should see the chart immediately updated to something not quite useful. This is because “Stat” automatically defaulted to “Identity”, which makes the bars cover all the dots. To make the new chart a little more useful, you can change “Stat” to “Count” (corresponding to bin in ggplot2), so now the bars will represent the count of data points in each category.

Let’s change “Mark Type” back to “Boxplot” before we move on (note how “Stat” also defaulted back to “Boxplot”). Now switch to “Plot” layer. “Mark Type” should show “Point” and “Stat” shows “Identity”, which tells you this is the layer that generated the data dots. If you notice the data dots are being covered by the boxes of the boxplots, you can uncover them by simply clicking on the button “Bring to Top”, which changes the order of the layers and plots the current layer last.

Lastly, the “Mapping” tab lets you specify how data are plotted in each layer. One important feature to point out is that, since the “Plot” layer is considered the foundation of the chart, other layers automatically inherit its data mappings unless they are explicitly overridden. In this example, you can see that “X” and “Y” are not mapped to anything in layer “Overlay”, since they are the same as in the “Plot” layer.
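For readers who know ggplot2, the walkthrough above corresponds roughly to the sketch below. The exact fields mapped in the sample project are not stated here, so the mappings to cyl and mpg are an assumption made for illustration.

```r
library(ggplot2)

# Mappings set once on the base plot are inherited by every layer, much
# like the "Plot" layer's mappings are inherited by "Overlay".
base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# Points first, boxplot second: the boxes are drawn over the dots.
p_boxes_on_top <- base + geom_point() + geom_boxplot()

# Rough equivalent of "Bring to Top" on the point layer: add it last.
p_points_on_top <- base + geom_boxplot() + geom_point()

p_points_on_top
```

Neither geom call specifies its own aes(), which mirrors the unmapped “X” and “Y” in the “Overlay” layer.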

This should serve as a quick start to using shinyData as a front-end to ggplot2 to easily make charts. To load your own data, go to the “Data” tab. I will cover the specifics of that and more about shinyData in follow-up posts.