Code Ocean is a cloud platform designed to facilitate research reproducibility and collaboration. Each project is organized in a compute capsule, where code, data, and the computational environment are encapsulated in one place to enhance reproducibility. Code Ocean uses Docker and Git as underlying infrastructure components. Users who are not familiar with these tools can still collaborate with each other by using the essential features through our graphical user interface. For advanced users, the Cloud Workstation Terminal environment is also available with all the Git command-line features.
To create a new capsule, you first need to create an account by signing up here:
Then you should see a window prompting for your email. Please use your academic email address so you get our special academic package of 10 compute hours to start with.
Create a new capsule
After confirming your email and logging in, here is how you get started with creating a new capsule:
- Go to the landing page: codeocean.com
- Choose either “Explore” or “Dashboard”
- Click on the blue “New Capsule” button
- After clicking on the button, you have 2 options to create a new capsule. You can either create a blank capsule, or import an existing public GitHub repo. Let’s create a blank capsule!
Configure the environment
- Here you see the user interface of a compute capsule. The folder structure is in the left section, where Metadata, Environment, Code, Data, and Results each sit in their own folder. The middle section is an editor that allows you to view and edit whatever you select in the left section. The right section shows the results of each execution of the code and the version control history, which we will come back to in a second. By default, when a blank capsule is created, it prompts you to configure its computational environment. We prepared a few starter environments for you; in this tutorial we will use the R environment, which you select by clicking on the R icon next to “By Language”.
- Then select whatever version of R is available (you may see a higher version number).
- Now we can add packages to the R environment using the package managers that are available. Let’s add the library “ggplot2” by clicking on “+Add” next to R (CRAN), because that’s what we will use in the sample code later.
- After typing “ggplot2” in the package field, click on the “check” sign next to the “version” field. If you don’t enter a version number, the latest version of the package is installed.
Add code and data to complete a first run
- Now that the package is added, we will add some sample code. The simplest way is to use the existing template by clicking on “Start with Sample Files” in the left section, just below the “results” folder.
- Now you can see that a number of files have appeared in the folders above, and we are ready for our first Reproducible Run! But pause for a second: you probably noticed that in the right section there are “5 uncommitted changes”, with a default description message about the changes we just made. Code Ocean uses Git to version-control all the files in the code, environment, and metadata folders. Once you click on the green “Commit Changes” button, all the changes are tracked by Git and can be reverted using the Git command line (more on that later). Just like regular Git, committing or not has no effect on the execution of the script, but it’s always a good idea to commit changes once the scripts are proven to work as expected. Let’s click on the blue “Reproducible Run” button and see some results!
- Congratulations! You have just completed your first reproducible run in a compute capsule on Code Ocean. You can see that more files were generated in the right section, and a window popped up at the bottom of the middle section. Let’s look at them one by one, starting with the files in the right section under “Run 9856131”. You can click on each of the 4 files to see its content in the middle section.
The first file is called “buildLog”. If you are familiar with Docker, you will probably recognize what it is doing. Code Ocean uses Docker to build each computational environment to guarantee reproducibility, so the buildLog is the log of what happened when the Docker image for this capsule was built. You can see that ggplot2 was installed in this step.
Next we can look at the 2 plots, fig1.png and fig2.png. These 2 files are the results from the sample code that we used. At this point you may be curious to see what produced them and what the “Reproducible Run” did to generate the plots. The answer is in the file called “run” under the “code” folder in the left section. Let’s click on it to see what it is and what it does.
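The Git tracking mentioned in this section can also be tried outside Code Ocean. Below is a minimal sketch, assuming Git is installed locally. The temporary repository, the file contents, and the demo identity are hypothetical stand-ins, and this is not a description of how Code Ocean itself calls Git. It only illustrates that a committed change can later be undone from the command line.

```shell
#!/usr/bin/env bash
# Sketch only: a throwaway repository stands in for a capsule's Git history.
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
# Repo-local identity so the sketch does not depend on global Git settings.
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: a version of the script that is known to work.
mkdir code
echo 'message <- "first working version"' > code/main.R
git add code/main.R
git commit -q -m "Add working main.R"

# Second commit: an edit we later decide we do not want.
echo 'message <- "experimental edit"' > code/main.R
git commit -q -a -m "Try an experimental edit"

# Undo the most recent commit by adding a new commit that reverses it.
git revert --no-edit HEAD > /dev/null

restored="$(cat code/main.R)"
commit_count="$(git rev-list --count HEAD)"
echo "$restored"
echo "commits: $commit_count"
```

Because `git revert` records the undo as a new commit rather than deleting history, the repository ends up with three commits and the file is back to its first committed content.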
The master script
- The run file is a bash script that is often referred to as the master script. The purpose of the master script is to generate all desired results in an automated way without any human intervention, thus making the results less prone to variations from human input. If you are not familiar with bash scripts, don’t worry. All you need to know about this one is that it calls an R script at the bottom of the file to do the actual work. Let’s click on “main.R” to see what it is.
- The “main.R” file is what generated the plots. It uses the library “ggplot2” and some simulated data to plot 2 graphs, and saves them in the “results” folder. So to recap, the “Reproducible Run” button runs the master script called “run”, which calls another script (it can be anything, but in our case it was main.R) to do the work. If you are used to the command-line environment, it is just like running the R script from the command line!
- The last file in the folder is called “output”. It’s essentially what was printed to the console, and you can see that its content is the same as what popped up at the bottom of the middle section.
- Since we just had a successful run, let’s commit the changes by clicking on the green “Commit Changes” button. Then you will see this record:
- For RStudio lovers, we also have RStudio available in the Cloud Workstation here.
- Clicking on RStudio first installs RStudio, which may take a few minutes. Once it’s done, you will be able to work interactively in RStudio! Please note that since it’s running in the cloud, it consumes your compute quota for as long as it is on, even when you are not doing anything. A “Reproducible Run”, however, only consumes quota while it is running a script.
- Before you quit, please remember to click on “Shut Down Cloud Workstation” so that any edits are synced back to the capsule.
- That’s it! Happy exploring!
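As a closing illustration of the master-script pattern from this section, here is a minimal sketch in plain bash. It assumes nothing about Code Ocean internals: the temporary directory, the `worker.sh` stand-in, the `summary.txt` file, and the relative `../results` path are all hypothetical, chosen so the sketch runs without R installed. In the tutorial's capsule, the role of `worker.sh` is played by main.R.

```shell
#!/usr/bin/env bash
# Sketch only: a temporary directory stands in for a capsule's folders.
set -eu

capsule="$(mktemp -d)"
mkdir -p "$capsule/code" "$capsule/results"

# Hypothetical stand-in for main.R so the sketch runs without R installed.
cat > "$capsule/code/worker.sh" <<'EOF'
#!/usr/bin/env bash
echo "plots would be written here" > ../results/summary.txt
EOF

# The master script: its only job is to call the script that does the work.
cat > "$capsule/code/run" <<'EOF'
#!/usr/bin/env bash
set -ex
# In the tutorial's capsule, this line would call main.R instead.
bash worker.sh "$@"
EOF

# Assumption for this sketch: the master script is executed from the code folder.
(cd "$capsule/code" && bash run)

result="$(cat "$capsule/results/summary.txt")"
echo "$result"
```

Keeping the master script this thin means every step that produces a result is written down in a file, so a rerun does not depend on anyone remembering which commands to type.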