About / Frequently Asked Questions (FAQ)

What is DataShop?
The Pittsburgh Science of Learning Center (PSLC) DataShop is the world's preeminent central repository for data on the interactions between students and educational software and a suite of tools to analyze that data. It provides secure data storage as well as an array of exploratory analysis and visualization tools available through a web-based interface.
How do I access DataShop?
DataShop access is free. You can access DataShop by going to: http://pslcdatashop.org. If you have a Carnegie Mellon email address, log in with WebISO. If you don't have a Carnegie Mellon email address and you've never logged in to DataShop, you need to create a free account. No information we collect will be distributed to third parties. For more information on accessing DataShop, see our help topic on the subject.
What are the capabilities of DataShop?
DataShop can store many types of data associated with online courses and learning-science studies. The analysis and visualization tools are particularly well-suited for click-stream data from interactive learning environments such as intelligent tutoring systems and virtual labs. In addition, you can store related publications, files, presentations, or electronic artifacts.
What can DataShop do for me?
DataShop facilitates data representation, data collection, and exploratory analysis. To support collecting data in a uniform format, we have developed a standard XML logging format and two logging libraries (one in Flash ActionScript, the other in Java) that write this XML. Data can also be imported using a similar tab-delimited format. After you import data or log it to the DataShop database, the DataShop web application can help you start exploratory data analysis with tools for common learning science analyses. You can also export data for further manipulation and analysis in other tools.

Researchers have used DataShop to explore learning issues in a variety of educational domains. These include, but are not limited to, collaborative problem solving in Algebra (Rummel, Spada, & Diziol, 2007), self-explanation in Physics (Hausmann & VanLehn, 2007), the effectiveness of worked examples and polite language in a Stoichiometry tutor (McLaren, Lim, Yaron, & Koedinger, 2007), and the optimization of knowledge component learning in Chinese (Pavlik, Presson, & Koedinger, 2007).

I want to do x. What dataset should I use?
Contact us with some information about your goals and we'll do our best to recommend a dataset. If your goal is to just explore DataShop, see the list of recommended datasets at the top of the dataset list after logging in.
What statistical support is available in DataShop?
The statistical support directly available in the DataShop is limited to statistics on learning curves and knowledge component models. However, you can export the data to a file and use your favorite statistical software package.

DataShop integrates the Learning Factors Analysis (LFA) algorithm (Koedinger, Junker 1999) so that you can view a predicted learning curve. Using combinatorial search and a built-in mathematical model that measures student proficiency, knowledge component difficulty, and knowledge component learning rates, the algorithm quantifies the student learning process for different knowledge components and predicts student performance on each use of a knowledge component. The LFA algorithm has been shown to help identify hidden difficulties that may hinder student learning (Cen, Koedinger, and Junker 2005).
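
As a rough illustration of how a model built from student proficiency, knowledge component difficulty, and knowledge component learning rate can yield a predicted learning curve, here is a minimal Python sketch of an additive logistic model. The parameterization, the function name, the knowledge component name, and all numeric values are assumptions made for illustration; they are not taken from DataShop, and the model DataShop fits may differ in its details.

```python
import math

def predicted_success(proficiency, kc_params, opportunities):
    """Probability of a correct response under an additive logistic model.

    proficiency   -- student proficiency on the log-odds scale
    kc_params     -- {kc_name: (difficulty, learning_rate)} for the KCs on a step
    opportunities -- {kc_name: number of prior practice opportunities}
    """
    logit = proficiency
    for kc, (difficulty, learning_rate) in kc_params.items():
        # Harder KCs lower the log-odds; each prior opportunity raises them.
        logit += -difficulty + learning_rate * opportunities.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-logit))

# Made-up parameters for one hypothetical knowledge component.
kc_params = {"find-area": (0.5, 0.3)}

# Predicted learning curve: error rate at each of the first five opportunities.
curve = [
    1.0 - predicted_success(0.2, kc_params, {"find-area": prior})
    for prior in range(5)
]
for opportunity, error_rate in enumerate(curve, start=1):
    print(f"opportunity {opportunity}: predicted error rate {error_rate:.2f}")
```

With a positive learning rate, the predicted error rate falls with each additional opportunity, which is the shape a predicted learning curve is expected to have.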

What format is the data in? In what format can I get the data?
DataShop accepts data according to the Tutor Message format. Data can come in as XML or tab-delimited text. Once processed, the data is stored in a relational database. Data can be exported to a tab-delimited text file.
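
Because exports are tab-delimited text, they can be read with general-purpose tools. The sketch below uses Python's standard csv module on a tiny in-memory sample. The column names (Student, Step, Outcome) and the sample values are placeholders chosen for illustration, not the actual export columns; check the header row of your own export.

```python
import csv
import io

# A tiny stand-in for an exported file; column names are illustrative only.
SAMPLE_EXPORT = (
    "Student\tStep\tOutcome\n"
    "s1\tstep-a\tCORRECT\n"
    "s1\tstep-b\tINCORRECT\n"
    "s2\tstep-a\tHINT\n"
)

def read_export(handle):
    """Return the rows of a tab-delimited export as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_export(io.StringIO(SAMPLE_EXPORT))

# Group outcomes by student to get a first look at the data.
outcomes_by_student = {}
for row in rows:
    outcomes_by_student.setdefault(row["Student"], []).append(row["Outcome"])
print(outcomes_by_student)
```

For a real file, pass an open file handle instead of the in-memory sample, for example open(path, newline="", encoding="utf-8"), where path is the location of your exported file.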
What kind of data gets logged?
Primarily, DataShop stores data on learner interactions with online course and study materials that include intelligent tutors and virtual labs. We plan to store more types of data (e.g., audio and video data, writing samples) in the future.

Data is collected from the seven PSLC courses (Algebra, Chemistry, Chinese, English, French, Geometry and Physics) and various studies. There are also sources external to the PSLC that contribute to DataShop, such as middle school math data from the Assistment project at WPI.

How do I get my data into DataShop?
The best method for getting your data into DataShop depends on the state of your project and data. The two broad categories of data are: (1) data that is logged to DataShop as it is generated, and (2) data that is logged to a file or local database and is imported manually by the DataShop team. We've documented some common scenarios for both data logging (1) and data import (2), but in addition to reading those, we recommend that you consult with us, as every project is different.
Can I use DataShop data for my own research purpose?
You do not need permission to view or use public data sets; they are freely accessible to any researcher in the world. For private data sets, if you are the PI or have permission from the PI, you may examine the data sets and use them in your own research.

To gain access to private data sets, first create an account (see “How do I access DataShop?” above), then contact us with your name, role, and the name of the dataset(s) that you'd like to access. If you're not the PI for the study, we'll confirm with him or her that you're authorized to view the requested dataset and email you when we've given you access.

If you're not sure what data you need, please contact us and we'll do our best to help.

I ran a LearnLab study. Who has access to my data, and how do I control access?
The principal investigator of a LearnLab study has full control over his/her own data. With a new data set, we allow no one but the PI to access the data. We might not know you're the PI, so please tell us! Then, as permitted by the PI, we can add or remove other users. A user can have view access to the data set, or edit access, which allows the changing of metadata in the Dataset Info area and adding or removing of papers and files.
How do I get or create custom queries, analyses, or reports?
If you have a general feature or change in mind, we encourage you to contact us. In the past, a number of reports and modifications to DataShop have started this way. If the analysis is specific to your project and unlikely to benefit others, however, you might be better off exporting the data from DataShop and performing the analysis in another program such as SPSS, R, or Excel. (For instance, many kinds of reports can be generated from Excel if you know how to use features like Pivot Tables and Auto Filter.) The line between these two categories of analyses isn't always clear, so don't hesitate to start a dialogue with us regarding your needs.
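
If you prefer a script to a spreadsheet for project-specific analyses, a pivot-table-style summary takes only a few lines. The Python sketch below tallies the error rate per knowledge component at each practice opportunity. The row layout (student, knowledge component, whether the first attempt was correct) and the sample values are hypothetical; adapt them to the columns in your own export.

```python
from collections import defaultdict

# Hypothetical exported rows: (student, knowledge component, first attempt correct?)
rows = [
    ("s1", "kc-a", False),
    ("s1", "kc-a", True),
    ("s2", "kc-a", False),
    ("s2", "kc-a", False),
    ("s1", "kc-b", True),
]

def error_rate_by_opportunity(rows):
    """Pivot rows into {kc: [error rate at opportunity 1, 2, ...]}.

    Rows are assumed to be in chronological order for each student.
    """
    seen = defaultdict(int)                # (student, kc) -> opportunities so far
    tallies = defaultdict(lambda: [0, 0])  # (kc, opportunity) -> [errors, total]
    for student, kc, correct in rows:
        seen[(student, kc)] += 1
        cell = tallies[(kc, seen[(student, kc)])]
        cell[0] += 0 if correct else 1
        cell[1] += 1
    table = defaultdict(list)
    for kc, opportunity in sorted(tallies):
        errors, total = tallies[(kc, opportunity)]
        table[kc].append(errors / total)
    return dict(table)

summary = error_rate_by_opportunity(rows)
for kc, rates in summary.items():
    print(kc, rates)
```

Each list in the result is an empirical learning curve for one knowledge component, averaged over the students who reached that opportunity.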
What is the time frame between completing a study and getting data in/from DataShop?
The time frame varies depending primarily on the source of the data. Data from tutors that log directly to the PSLC server is moved into DataShop’s database daily. For this reason, we encourage you to develop tutors using CTAT, which can log data to the PSLC server for you.

Tutors that produce log data but do not log directly to the PSLC server, such as Andes (Physics LearnLab) or the Carnegie Learning Cognitive Tutors (Algebra and Geometry LearnLabs), must go through a collection and conversion process. The length of this process depends on the availability of personnel to collect and anonymize the data, as well as the state of the program needed to run the conversion. Also note that conversion of extremely large datasets can add time.

If you need a dataset urgently, please contact us and put "urgent" in the subject of the email.

What restrictions are there on publishing about the data?
As long as proper IRB rules and guidelines have been followed, and you are either the PI or have permission from the PI of the data, then you may publish the data or an analysis of the data. You must acknowledge the source of the data in your publication. You should say something like: Data used in this research was provided by the Pittsburgh Science of Learning Center DataShop which is funded by the National Science Foundation award No. SBE-0354420. We also appreciate citations of one of the scientific papers published by DataShop. Depending on your agreement with the PI for a private dataset, it may also be appropriate to cite a paper mutually agreed upon.
What is the relationship between DataShop, Cognitive Tutor Authoring Tools (CTAT), and the Open Learning Initiative (OLI)?
The three projects—DataShop, CTAT, and OLI—are often in communication with one another and in some cases build on each other's technology. CTAT is a research project at CMU that creates tools for building intelligent tutors. OLI, also a CMU project, researches and builds open and free online courses. In short, any tutor created with CTAT has logging functionality built-in and can create data in the format DataShop accepts, so we often recommend you use CTAT if you're developing a new intelligent tutor or application. CTAT tutors can log directly to DataShop, decreasing the amount of time between when your students use the tutors and when you can view your data in DataShop. (OLI tutors often log to DataShop as well.)

Many products of OLI development are used by DataShop, most notably the logging database and log server, and course delivery platform. This delivery platform has been re-branded as a PSLC resource and now runs on a PSLC server for researchers who want to implement OLI-style courses but with more flexibility than the OLI process allows. Contact us if you might be interested in using it.

How do I set up a PSLC/OLI course?
We don't yet have any external documentation on this process, so you should contact us if you want to learn more about this option.
I'm testing a CTAT tutor that should be logging but I don't see any log data in DataShop. Why not?
Although troubleshooting depends on lots of specifics, here are some general things to check:
  • Is logging turned off? Can you confirm it's explicitly turned on?
  • Is the log server (the location that should be receiving logs from CTAT) set to the same server as the one you're looking at via DataShop? Note that DataShop runs on two separate servers, QA and production. Each of these servers has to run a log conversion process before data will be available through its respective DataShop web application. On QA, this process runs every two hours during the work day; on production, it runs at 3 a.m. daily.
  • Have you set a dataset name to go along with the logs? If you haven't, your data will fall into an "Unclassified" bucket, making it hard to find your log data.
  • Are you logging to disk? If so, we need to obtain the log files and import them. We might not know about your project or be aware of your schedule, so please ask us about your data.

For troubleshooting logging from CTAT tutors, see a few pages on the CTAT website: Troubleshooting logging from Flash tutors and Logging from Java. Also, don't hesitate to contact the CTAT team.

Where can I get more help?
DataShop documentation is online at http://pslcdatashop.org/help

You can also subscribe to the DataShop users email list or email the DataShop team.