CofC Logo

Synthesis Courses

DATA 101: Introduction to Data Science

Introduction to knowledge discovery techniques, emphasizing computer-based tools for the analysis of large data sets. Topics include the data science process and inductive data-driven modeling. Students gain hands-on experience with statistical inference and data mining software and complete a project.

DATA 210: Dataset Organization and Management

An introduction to the structure of databases and the management of datasets for information extraction. Concepts include the relational and entity-relationship models, as well as local and distributed storage and access. The course also covers the preparation and management of datasets for analysis, including data cleaning, reorganization, and security.

DATA 495: Capstone

A capstone course in the application of knowledge discovery and data mining tools and techniques to large data repositories or data streams. This project-based course provides a framework in which students gain understanding of and insight into the application of knowledge discovery tools and principles to data within their cognate area. A data science capstone must include a non-trivial use of a full-featured programming language and environment. In addition, it must include at least one of the following:

1. A novel improvement on foundational data science algorithms. For example, one could research new kernel-based methods for neural networks applied to big data.

2. A novel improvement on the foundations of big data organization and management. For example, students could work on extending the MapReduce/Spark computing paradigm.

3. A novel informatics application of state-of-the-art methodology and algorithms. Here the application must be novel; the techniques may already exist, but they must be state-of-the-art.

4. A novel software development project for the field of data science.

Whichever of the above is chosen must be supported by primary literature sources.

To support this course, the Data Science Program maintains high-performance computing infrastructure configured for big data problems and applications. Please contact Dr. Paul Anderson for more information, including login details.