140.697.79
AI Tools for Data Science and Statistics

Location
Internet
Term
Summer Institute
Department
Biostatistics
Credit(s)
1
Academic Year
2026 - 2027
Instruction Method
Online Synchronous (at least one synch session/week)
Start Date
Monday, June 15, 2026
End Date
Wednesday, June 17, 2026
Class Time(s)
M, Tu, W, 9:00 - 10:50am
Auditors Allowed
Yes, with instructor consent
Available to Undergraduate
Yes
Grading Restriction
Letter Grade or Pass/Fail
Course Instructor(s)
Contact Name
Frequency Schedule
One Year Only
Next Offered
Only offered in 2026
Prerequisite
No prerequisites for this course.
Enrollment Restriction
This course is not restricted.
Description
AI tools can accelerate applied work in data science and statistics, but many analysts use them in ad-hoc ways that are inefficient, unreproducible, or unsafe. This course teaches students how to build structured, repeatable AI-assisted workflows for data pipelines, coding, debugging, documentation, and simulation, with attention to privacy, ethics, context management, and cost.
Introduces practical strategies for effective AI-assisted statistical computing. Emphasizes how large language models (LLMs) and agent-based tools can be integrated into analytic workflows to improve efficiency, reproducibility, and rigor. Teaches how to design structured, reliable processes for coding, debugging, documentation, data cleaning, simulation, and pipeline development using AI tools. Includes context management, agent orchestration, LLM selection, and use of Model Context Protocol (MCP) servers. Covers responsible use, including privacy protection, handling of sensitive data, sandboxing, ethics, and cost control. Draws examples from epidemiology, public health, and applied data science. Exercises use R for illustration, but all concepts generalize to any statistical language. Leaves students with a concrete workflow for AI-assisted statistical computing that they can apply immediately.
Learning Objectives
Upon successfully completing this course, students will be able to:
  1. Design and implement structured AI-assisted workflows to support coding, data cleaning, debugging, simulation, and documentation.
  2. Use LLMs and agent-based tools effectively within statistical workflows.
  3. Critically evaluate AI-generated code, outputs, and recommendations for accuracy, appropriateness, and alignment with analytic goals.
  4. Apply principles of responsible AI use, including privacy protection, data safety, and ethical considerations when working with sensitive or regulated data.
  5. Create and manage context files and multi-agent configurations (e.g., MCP servers) to improve model performance.
  6. Identify and mitigate common pitfalls in AI-assisted statistical computing, such as model inaccuracy, hallucinations, and output bloat.
  7. Integrate AI tools into epidemiologic and public health data analysis pipelines to improve efficiency, transparency, and reproducibility.
Methods of Assessment
This course is evaluated as follows:
  • 30% Participation
  • 30% Problem sets
  • 40% Final project
Special Comments

Students will be required to purchase a one-month subscription to an AI agent tool (approximately $20).