The PSLC DataShop provides two main services to the learning science community:
- a central repository to secure and store research data
- a set of analysis and reporting tools
Researchers can rapidly access standard reports such as learning curves, as well as browse data using the interactive web application. To support other analyses, the DataShop can export data to a tab-delimited format that can then be used in statistical software and other analysis packages.
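As a rough illustration of using an exported file outside DataShop, the sketch below reads tab-delimited text with Python's standard csv module and computes a simple summary. The sample rows and the column names (Student, Problem, Outcome) are invented for this example and are not the official export schema.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited export. The column names are
# illustrative assumptions, not the official DataShop export columns.
SAMPLE_EXPORT = (
    "Student\tProblem\tOutcome\n"
    "s1\tp1\tCORRECT\n"
    "s1\tp2\tINCORRECT\n"
    "s2\tp1\tCORRECT\n"
)


def read_tab_delimited(handle):
    """Return the rows of a tab-delimited file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def fraction_correct(rows):
    """Fraction of rows whose (assumed) Outcome column equals CORRECT."""
    if not rows:
        return 0.0
    correct = sum(1 for row in rows if row["Outcome"] == "CORRECT")
    return correct / len(rows)


rows = read_tab_delimited(io.StringIO(SAMPLE_EXPORT))
print(len(rows), fraction_correct(rows))
```

To read a real export, pass an open file handle instead of the in-memory sample and adjust the column names to match the file's header row.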
Case Studies
Watch a video on how DataShop was used to discover a better knowledge component model of student learning.
Guide to the Tutor Message format
Read the “Guide to the Tutor Message format”, which describes the data format that DataShop accepts.
DataShop News
Monday, 16 April 2012
Automated detectors of affect and disengaged behaviors
Ryan Baker and his team have developed automated detectors of affect and disengaged behaviors that can be applied to DataShop data sets from Cognitive Tutor Algebra and Cognitive Tutor Geometry. They are happy to apply these detectors to DataShop data sets for any researcher who is interested. For more information, please contact Ryan Baker.
Friday, 10 February 2012
DataShop downtime
We are experiencing some unexpected hardware issues with one of our servers, which has caused DataShop to be unavailable. We are hoping to have DataShop back online by 3pm EST today. We will post more information here as it becomes available.
Update-1 on Friday, Feb 10, 2012, 3:00pm EST: We are still experiencing hardware issues with one of our servers. The current estimate is to have DataShop back online by 3pm EST on Monday, February 13, 2012. We apologize for any inconvenience this may have caused.
Friday, 20 January 2012
DataShop 5.2 released - DataShop terms of use, project pages, PIs, and bug fixes
Terms of Use
DataShop now has a terms of use, which you will be asked to agree to the next time you sign in. The terms state in plain language many of the things that we have generally communicated over time; for example, that data is for research purposes.
Project pages and PIs
To make the concept of a "project" clearer, we've created project pages for each project in DataShop, which are linked to from the My/Public/Other Datasets tabs. For now, a project page lists the datasets in that project and the principal investigator's name, but in the future, it could hold much more information (papers and files, for example).
As part of this change, we've also moved the principal investigator (PI) field from the dataset to the project. There is now one PI for each project in DataShop.
Minor changes
KC Models subtab. You will notice a new "KC Models" subtab beneath the main "Learning Curve" tab. This is the same page as the one currently below "Dataset Info"; we've just added a link to it so that you can move between KC Models, Line Graph, and Model Values more easily.
Importing inputs greater than 255 characters. For new data where a value in the "Input" column is greater than 255 characters, DataShop will split the text into multiple "Input" columns, each no greater than 255 characters.
Bug fixes. We've made various bug fixes to the code that determines the Problem View and Problem Start Time columns from raw data.
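The input-splitting rule described above can be sketched as follows. This is a minimal illustration that assumes the text is cut at fixed 255-character boundaries; the release note does not say where DataShop actually places the breaks, and the function name is hypothetical.

```python
MAX_INPUT_LENGTH = 255  # limit stated in the release note


def split_input(text, limit=MAX_INPUT_LENGTH):
    """Split text into consecutive chunks of at most `limit` characters.

    Assumption: chunks are cut at fixed character offsets. The actual
    DataShop import may choose its break points differently.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not text:
        return [text]
    return [text[i:i + limit] for i in range(0, len(text), limit)]


chunks = split_input("x" * 600)
print([len(chunk) for chunk in chunks])  # [255, 255, 90]
```

Joining the chunks in order reproduces the original value, so no text is lost by the split.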
Friday, 14 October 2011
DataShop 5.1 released - Redesigned KC Models page, cross validation, citations, and an important change to the transaction format
Redesigned KC Models page
We redesigned the KC Models page as a table to make it easier to compare models. You can sort the models by any of the statistics in the table using the combobox at the top of the page. By default, KC models are now sorted by AIC instead of BIC (lowest to highest, or best fit with fewest parameters to worst fit with additional parameters) and then by model name. The sort order chosen also affects the order of models in the KC Models combobox seen in the navigation area of various reports in DataShop.
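The default ordering described above (lowest AIC first, ties broken by model name) can be sketched as a two-part sort key. The model names and statistic values below are made up for illustration, and KCModel is a hypothetical type, not part of DataShop.

```python
from typing import NamedTuple


class KCModel(NamedTuple):
    """Hypothetical record for one row of a KC Models table."""
    name: str
    aic: float
    bic: float


# Invented values, chosen so that two models tie on AIC.
models = [
    KCModel("Textbook", 5210.4, 5302.1),
    KCModel("Single-KC", 5890.0, 5901.2),
    KCModel("Default", 5210.4, 5310.7),
]


def sort_models(models, statistic="aic"):
    """Sort ascending by the chosen statistic, then by model name."""
    return sorted(models, key=lambda m: (getattr(m, statistic), m.name))


print([m.name for m in sort_models(models)])         # by AIC, then name
print([m.name for m in sort_models(models, "bic")])  # by BIC, then name
```

Using a tuple as the sort key keeps the tie-breaking rule explicit: models with equal AIC fall back to alphabetical order by name.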

Cross validation
Cross validation statistics for KC models have been expanded to include two new methods of cross validation: student stratified and item stratified. Cross validation is a technique for assessing how well the results of a statistical model (in this case, AFM for a particular KC model) will generalize to an independent dataset from the same tutor. The result is reported as root mean squared error (RMSE). Lower values of RMSE indicate a better fit between the model's predictions and the observed data. More information on these new statistics is available on the Model Values help page.
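For readers unfamiliar with RMSE, the sketch below computes it for a small set of predictions on held-out data. It shows only the error measure; how DataShop forms the student-stratified and item-stratified folds is not covered here, and the numbers are invented.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented example: predicted probability of a correct response versus
# the observed outcome (1 = correct, 0 = incorrect) on held-out steps.
predicted = [0.9, 0.2, 0.6, 0.4]
observed = [1, 0, 1, 0]
print(rmse(predicted, observed))
```

A model whose predictions match the observed outcomes exactly would have an RMSE of 0; larger values mean the predictions are further from the data.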
Tell other researchers how to cite your dataset
We've added two new fields to the Dataset Info page: Acknowledgment for Secondary Analysis and Preferred Citation for Secondary Analysis. Using these fields, you can display a citation and/or acknowledgment that others should use if they publish research based on your dataset. Once filled in, the citation/acknowledgment is shown in two other places: on a new subtab called Citation, and in a text file that is included with each export of the data.
Changes to the transaction format
We've added two columns to the transaction format, Problem View and Problem Start Time. Problem Start Time identifies when a problem is shown to a student. This information was missing from the tab-delimited transaction format even though it was possible to log it using the XML version of the format. It's now possible to export a dataset that has problem-start information and re-import it without losing any data. Similarly, you can now import a new dataset in the tab-delimited format that describes when a problem is shown to a student. Problem View serves a complementary purpose: it counts how many times the student has encountered the current problem. The addition of these columns fixes a long-standing limitation of the tab-delimited transaction format.
You will find updated definitions for Problem View, Problem Start Time, and Step Start Time on the export help page.
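One way to picture the relationship between the two new columns is the sketch below, which derives a view count from problem start times. It assumes that each distinct start time for a given student and problem marks a new view; the authoritative definitions are the ones on the export help page, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def assign_problem_views(transactions):
    """Return a view number for each transaction.

    `transactions` is an iterable of (student, problem, problem_start_time)
    tuples in chronological order. Assumption: every distinct start time
    for a (student, problem) pair counts as one more view of that problem.
    """
    seen_starts = defaultdict(list)
    views = []
    for student, problem, start in transactions:
        starts = seen_starts[(student, problem)]
        if start not in starts:
            starts.append(start)
        views.append(starts.index(start) + 1)
    return views


sample = [
    ("s1", "p1", "10:00"),  # first view of p1 by s1
    ("s1", "p1", "10:00"),  # same view, another transaction
    ("s1", "p2", "10:05"),  # first view of p2 by s1
    ("s1", "p1", "10:20"),  # s1 returns to p1: second view
    ("s2", "p1", "10:00"),  # first view of p1 by s2
]
print(assign_problem_views(sample))  # [1, 1, 1, 2, 1]
```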
Data changes
As part of the changes to the transaction format, we've made changes to how problem view, problem start time, and step start time are calculated. These changes will modify metrics for datasets that were created via the tab-delimited format (or, in rare cases, if logged in the XML format without problem start information). These datasets will change in the following ways:
- For the first step of the problem, the step start time is no longer indeterminate, so instead of seeing a dot (".") for these steps in the student-step table, you'll see a time value. The same applies for the step's duration.
- Because of how problem start times and problem views are calculated, it's possible for the "attempt at step" count (seen in the transaction table) to be different. This difference means that metrics for KC models such as AIC and BIC may be slightly different.
Bug fixes
- Web Services: requesting both custom fields and specific columns for transactions now includes custom fields
- Web Services: updated user agreement to include public web applications as a restricted use
- Web Services: we now support requesting the transaction columns "problem view" and "problem start time"
- Web Services: column shifting is no longer present in the student-step export
Thursday, 21 July 2011
Interested in finding better KC models automatically?
Ken Koedinger and Hui Cheng have developed a method for automatically applying the LFA search algorithm to any dataset with at least a couple of existing KC models. LFA returns new KC models (aka cognitive models) that are usually better than any of the existing models.
This version of LFA model search is based on Hao Cen's LFA method (described in his dissertation), but it only requires some existing KC models to start its search for a better model—it does not require a researcher-generated matrix of difficulty factors by steps in the curriculum (a "P-matrix"). The better the existing KC models, however, the better the LFA model search results are likely to be.
If you are interested in finding better KC models, let us know and we will run the LFA model search on your data and attach the resulting KC models to your dataset. We will offer this service on a trial basis. Note that the LFA search can take days or weeks to run, depending on the size and complexity of the dataset.
A distinguishing feature of the LFA method is its semi-automatic model search process. Cen et al. formulated finding a better cognitive model as a combinatorial search problem. Given an existing KC model (a Q-matrix) and a combination of hypothesized factors or KCs from other existing KC models (all these factors are the P-matrix input), LFA search automatically incorporates those factors into models, and finds new models that researchers may wish to investigate further.