The PSLC DataShop provides two main services to the learning science community:
- a central repository to secure and store research data
- a set of analysis and reporting tools
Researchers can rapidly access standard reports such as learning curves, as well as browse data using the interactive web application. To support other analyses, the DataShop can export data to a tab-delimited format that can then be used in statistical software and other analysis packages.
Case Studies
Watch a video on how DataShop was used to discover a better knowledge component model of student learning. Read more ...
Guide to the Tutor Message format
Read the “Guide to the Tutor Message format”, which describes the data format that DataShop accepts.
Read more ...
DataShop News
Wednesday, 17 February 2010
DataShop 4.0 Released
New DataShop Web Services features, plus more
There seemed to be a lot of interest in DataShop Web Services at the DataShop User Meeting in November. At the time of the meeting, we could only demo what was in development. We're now happy to release the services we previewed. We hope these two new features—Get Transactions and Get Student-Step Records—will make Web Services a useful approach for researchers who want to automate data retrieval and analysis.
Get Transactions
https://pslcdatashop.web.cmu.edu/services/datasets/[id]/[?samples/id]/transactions
- Get a tab-delimited response (can be zipped as well) of transactions for a given dataset or sample and your request parameters.
- If a sample ID is not provided, transactions for the "All Data" sample will be returned.
Get Student-Step Records
https://pslcdatashop.web.cmu.edu/services/datasets/[id]/samples/[?id]/steps
- Get a tab-delimited response (can be zipped as well) of student-step records for a given dataset or sample and your request parameters.
- If a sample ID is not provided, student-step records for the "All Data" sample will be returned.
Learn more about these new services on the Web Services page.
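As a rough illustration of how a client might use the two services, here is a minimal Python sketch. It assumes the bracketed `?` parts of the URL patterns above mark the optional sample segment, so the no-sample forms shown are an assumption. The helper names are hypothetical, authentication is left out entirely (see the Web Services page for how requests are authorized), and the sample response text is invented for the example.

```python
import csv
import io

BASE_URL = "https://pslcdatashop.web.cmu.edu/services"


def transactions_url(dataset_id, sample_id=None):
    """Build a Get Transactions URL (hypothetical helper).

    With no sample ID, the service is described as returning
    the "All Data" sample.
    """
    if sample_id is None:
        # Assumed form when the optional sample segment is omitted.
        return f"{BASE_URL}/datasets/{dataset_id}/transactions"
    return f"{BASE_URL}/datasets/{dataset_id}/samples/{sample_id}/transactions"


def steps_url(dataset_id, sample_id=None):
    """Build a Get Student-Step Records URL (hypothetical helper)."""
    if sample_id is None:
        # Assumed form when the optional sample segment is omitted.
        return f"{BASE_URL}/datasets/{dataset_id}/steps"
    return f"{BASE_URL}/datasets/{dataset_id}/samples/{sample_id}/steps"


def parse_tab_delimited(text):
    """Parse a tab-delimited response body into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


# Invented response text, for illustration only.
sample_response = "Row\tTotal Num Hints\n1\t0\n2\t3\n"
rows = parse_tab_delimited(sample_response)
print(transactions_url(42))
print(steps_url(42, sample_id=7))
print(rows[1]["Total Num Hints"])
```

Keeping URL construction separate from parsing means the same parser can be used on a downloaded (and unzipped) export file as well as on a web-service response.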
We've also released the following tweaks and improvements:
- Project announcements. On the home page that lists the datasets in DataShop, you'll see a small box with the title "Announcements" that shows recent news about the project, with links to the full news posts.
- Learning curve point info "Obs" column. When clicking on points in a learning curve, you can now see the frequency of items going into the breakdown by KCs/Problems/Steps/Students. For example, before you could only tell that data for 13 steps contributed to an aggregate point in the learning curve, and you could see error rate values (for example) for each, but you didn't know how much each step contributed to the aggregate. Now, an "Obs" (Observation) column displays the frequency of each item in the aggregate, so you can tell which step is contributing most to that error rate.
- "#" column header renamed to "Row", "Total # Hints" renamed to "Total Num Hints". In all of the export formats, the "#" symbol, which appeared in the column header of the first column to represent the number of the row, is now the text "Row". In the transaction export format, the column header "Total # Hints" is now "Total Num Hints". We made these changes because the "#" character is a comment character in analysis programs such as R, so directly opening a DataShop export file was problematic.
- The DataShop import file verification tool was also changed to expect a column with the title "Row" instead of "#" and "Total Num Hints" instead of "Total # Hints". If you plan on importing data into DataShop, you will need to make these changes to your file(s).
- Study "Condition" in student-step export. You'll now see a "Condition" column in the student-step rollup. This new column appears as the last column in the table. In the case of a student assigned to multiple conditions (factors in a factorial design), condition names are separated by a comma and space. This differs from the transaction format, which optionally has "Condition Name" and "Condition Type" columns.
- Cached export file status. With the DataShop release in April 2009, we started caching transaction export files, resulting in less wait time and faster downloading of these files. Caching, however, is done on a sample-by-sample basis, and it wasn't clear from the DataShop interface which samples were cached or when they were created. We're now displaying a small table on the transaction export page that shows the cache status of each sample and when that cached file was created. This will tell you which samples can be downloaded most quickly and those that will take longer (but will be cached when you request them). The date and time of the cached file tells you the cutoff for data included in the file, useful if you're running a study that's logging to DataShop. To learn more about the various states of a cached export file, visit our help topic on exporting.
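If you have older import or export files, the two header renames described above can be applied mechanically. The following is a minimal Python sketch, not a DataShop tool: the function names are hypothetical, the "Other Column" header is a placeholder, and it assumes the header is the first line of a tab-delimited file. It also shows one way to split the comma-and-space separated "Condition" cell from the student-step export.

```python
# Old header text mapped to the new header text named in this release.
RENAMED_HEADERS = {"#": "Row", "Total # Hints": "Total Num Hints"}


def migrate_header_line(header_line):
    """Rename old column headers in a tab-delimited header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    return "\t".join(RENAMED_HEADERS.get(name, name) for name in columns)


def split_conditions(cell):
    """Split a student-step "Condition" cell into condition names.

    Multiple conditions are described as separated by a comma and a space.
    An empty cell yields an empty list.
    """
    return cell.split(", ") if cell else []


# "Other Column" is a placeholder header, not a real DataShop column.
old_header = "#\tOther Column\tTotal # Hints\n"
print(migrate_header_line(old_header))
print(split_conditions("Condition A, Condition B"))
```

Only exact header matches are renamed, so a "#" character that appears inside a data value or inside another column name is left untouched.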
Tuesday, 15 December 2009
Book Chapter in Press
Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. (in press) A Data Repository for the EDM community: The PSLC DataShop. To appear in Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC Press.
Pre-prints are available upon request.
Monday, 20 November 2009
DataShop User Meeting
Please join the DataShop Team for our upcoming user meeting, to be held on Monday, 7 December 2009 in Gates Center Room 6115, Carnegie Mellon Campus, Pittsburgh, PA.
Below you will find the finalized schedule of events for the meeting:
| Time | Session |
| --- | --- |
| 9:30 - 10:00 am | Breakfast, Meet and Greet |
| 10:00 - 10:15 am | DataShop Accomplishments to Date (John Stamper) |
| 10:15 - 11:00 am | Improvements since 3.0 Release (Brett Leber) |
| 11:00 - 11:15 am | Break |
| 11:15 am - 12:00 noon | Future of DataShop (John Stamper) |
| 12:00 - 1:30 pm | Lunch and Poster Session |
| 1:30 - 2:15 pm | Advanced Methods to Discover Cognitive Models (Geoff Gordon or John Stamper) |
| 2:15 - 3:00 pm | Detecting Metacognitive States in the Data / Exploring Motivation using Data Mining (Ryan Baker and Ben Shih) |
| 3:00 - 3:15 pm | Break |
| 3:15 - 4:00 pm | Analyzing Patterns in Student Errors (Ken Koedinger) |
| 4:00 pm | Closing Remarks, Survey and Feedback |
Download a copy of the schedule
Friday, 23 October 2009
DataShop v3.6 Released
Performance improvements, a new report for checking a study's logging activity, and the start of DataShop web services
Today, we are rolling out some fairly big changes to DataShop, all requested by researchers. One is an improvement under the hood that will affect how fast DataShop generates the samples you create or modify (and how fast logged data is made available to the web application in general). Another change is a new report to help you (the researcher or programmer) tell if the tutors in your study or course are logging. We call this new page "Logging Activity" as it gives you an overview of all logging activity on the production log server. We've also taken some big first steps for introducing DataShop web services, which enable you to query DataShop and retrieve data programmatically.
Logging Activity
How do you know if your course or study is logging? You might ask us to verify that DataShop is receiving log data from your study site, but we rarely know how much data is "enough". That approach also requires us to be in the loop, which isn't scalable. A better solution would be to get some diagnostics directly from DataShop. You might try your tutor before students use it, verify the data is being received by DataShop, and then monitor data collection as your study or course progresses.
We created a new page for this purpose. It shows you recent logging activity at the logging server end: it displays counts of all recent log messages we received, organized by dataset and student session.
As we're not 100% sure how this page will be used or how its use will affect server performance, we're asking that you first click a button to request access to the report. (We've given many of you access already.) Try it out and tell us what you think.
Web Services
The goal of DataShop web services is to provide a way for researchers with a background in programming to enable their program or web site to retrieve DataShop data and (eventually) insert data back to the central repository. We've created the start of such a service. Right now, the service allows you to authenticate with DataShop and retrieve metadata about datasets and samples in DataShop. Coming next will be the ability to retrieve transaction and step-level data.
The service follows the REST guidelines, which means that requests to web services are done over HTTP using URLs that represent resources.
It's a work in progress, and documentation will be available here in the next few days.
As with Logging Activity, we're asking that you first request access before using web services. Once we grant you access, you'll be able to retrieve access credentials for making web-service requests.
Wednesday, 8 September 2009
DataShop v3.5 Released: Changes to measures of latency
In this release of DataShop, we've made some significant changes to latency curves and how DataShop determines latency.
A "latency curve", as we defined it, is a type of learning curve that graphs a duration of time at each opportunity to learn a knowledge component. When we first implemented latency learning curves in June 2008, we introduced two dependent variables of latency, "Assistance Time" and "Correct Step Time", which are essentially the time it took a student to reach a correct attempt on a step, and the time it took a student to reach a correct attempt when no errors preceded that correct attempt, respectively. Based on some researcher feedback, we learned a few things about these measures: 1) the names of the variables were confusing, and 2) we were measuring latency as the time between two events, regardless of what happened in between the two events (such as a student working on other steps).
To address these issues, we started by making our measures of latency more precise. To do this, we began by calculating a duration for every transaction. This enabled us to determine a step's duration by summing the durations of transactions that were toward that step, ignoring the rest. You can see the results of these changes in the latency curves and in the new "duration" column in the transaction export.
After making these changes, we renamed the variables. "Assistance Time" is now "Step Duration" since it's really the time spent on the step regardless of assistance sought. "Correct Step Time" became "Correct Step Duration", and we added another variable, "Error Step Duration". To simplify things, "Correct Step Duration" is now just the step duration when the first attempt was correct; "Error Step Duration" is just the step duration when the first attempt was an error. We propagated these changes through the learning curves and student-step rollup.
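To make the definitions above concrete, here is a small Python sketch of the calculation as described: sum the durations of the transactions that were toward a step, then report that sum as Correct Step Duration or Error Step Duration depending on the first attempt. It is an illustration, not DataShop's implementation. The `Transaction` fields and the `step_durations` function are hypothetical names, and treating every non-correct first attempt as an error is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Transaction:
    step: str        # the step this transaction was toward
    duration: float  # seconds spent on this transaction
    outcome: str     # "correct", or anything else (treated as an error here)


def step_durations(
    transactions: List[Transaction], step: str
) -> Optional[Dict[str, Optional[float]]]:
    """Compute the three duration measures for one step.

    Transactions toward other steps are ignored, so time a student spent
    elsewhere between two attempts is not counted.
    """
    toward = [t for t in transactions if t.step == step]
    if not toward:
        return None
    total = sum(t.duration for t in toward)
    first_correct = toward[0].outcome == "correct"
    return {
        "Step Duration": total,
        "Correct Step Duration": total if first_correct else None,
        "Error Step Duration": None if first_correct else total,
    }


# Invented data: the student errs on step-1, works on step-2, then returns.
log = [
    Transaction("step-1", 4.0, "incorrect"),
    Transaction("step-2", 10.0, "correct"),
    Transaction("step-1", 6.0, "correct"),
]
print(step_durations(log, "step-1"))
print(step_durations(log, "step-2"))
```

In this invented log, the 10 seconds spent on step-2 do not count toward step-1, which is the difference from measuring latency as the time between two events.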
We hope these changes are useful for you in exploring data, and save you time when doing latency-based analyses outside of DataShop. We encourage you to explore the new learning curves and changes to the various table formats, and tell us what you think.
- History of changes to the student-step rollup and transaction formats
- See an example of how we now calculate Step Duration and Correct Step Duration
Wednesday, 24 June 2009
DataShop v3.4 Released
Ah, another summer, another release. The one you've been waiting for.
- Exporting new samples is faster than ever
- Learning Curve Point Info Details
You can now drill down on the point of a learning curve to find out what problems, steps, students or knowledge components make up that point. Take a look at our 30-second video that demonstrates this new feature:
Exporting by transaction for a brand new, very large sample is no longer slow, as it was when we last reported on it. For example, a sample with 344,530 transactions used to take 2 hours to export; now it takes only 7 minutes! That's 17 times faster.
Tuesday, 19 May 2009
Case Studies and EDM paper
On a new Case Studies page, you'll find stories of DataShop use—what were some research goals and how was DataShop used to approach them? The first, presented as a 7-minute video, illustrates the use of DataShop to perform exploratory analysis of DataShop data, generate a theory for optimizing a cognitive model, and test that theory both visually and statistically within DataShop. We hope these are helpful to others who have used DataShop, are considering using DataShop, or want to learn more about the project.
We also created a publications page and posted our Educational Data Mining 2008 paper there. We think it serves as a good introduction to the project and web application.
Friday, 1 May 2009
New DataShop FAQ
It turned out we had an out-of-date FAQ on learnlab.org and a few other sources of similar information, so we've revised and combined them into this new FAQ. Going forward, we'll keep this one updated with answers to real frequently asked questions.
Thursday, 2 April 2009
DataShop v3.3 Released
We have good news for you! There is a new version of DataShop that does two things faster:
- Creating new samples
- Exporting by transaction
You can now create samples faster than before, and the bugs associated with creating big samples on large datasets have been fixed as well.
Also, we are caching the transaction export files before you ask for them to make the download of these files "infinitely" faster than before. For example, the 'Algebra I 2006-2007 (6 schools)' dataset, which has 5.4 million transactions, has been cached.
But we still have work to do: exporting by transaction for a brand new, very large sample is still slow -- for example, a sample with 344,530 transactions takes 16 minutes to create and 2 hours to export by transaction. For samples larger than this, you should opt to wait a day to retrieve it so that DataShop has time to cache it, or contact us if it's urgent. The fix for this problem is coming in our next release in June.
Monday, 16 February 2009
DataShop v3.2 Released
We released DataShop 3.2 this afternoon, which introduces a faster way to cache the Transaction Export files. We are optimistic that we'll finally be able to cache them all now.
We also fixed bugs, but there are still some known issues.