How AI Tools Collect and Store Student Data

May 1 / Tiffany Stryck and Haley Boone

On December 28, 2024, a hacker used a single stolen employee password to access PowerSchool, the student information system used by more than 18,000 school organizations across 90 countries. By the time the breach was contained, records belonging to an estimated 62 million students and 9.5 million educators had been exfiltrated. It is the largest breach of children's data in U.S. history.1

PowerSchool is not an AI company. But the breach illustrates the baseline risk every school accepts when student data flows through a third-party platform, and AI tools introduce that risk at a new scale and in new forms. Most school leaders approving AI tools for classroom use have a reasonable understanding of FERPA and general data privacy. Many have not worked through what changes when the tool is powered by a large language model. The difference matters: an AI tool captures what students write and ask, not only their records, and it may use that content to train a model.

What AI Tools Collect That Other EdTech Did Not

Traditional edtech collected structured data: names, grades, attendance records, demographic information. AI-powered tools collect all of that, and they also collect the content of student interactions: the essays students draft, the questions they ask, the problems they work through, and in some tools, behavioral signals like how long a student pauses before answering or which paths through a lesson they take.

The Future of Privacy Forum, a nonprofit that advises schools on data privacy, identified a critical question that most vendor reviews miss: will the AI tool use student inputs to improve the underlying model?2 Many edtech products embed AI through a third-party API and have terms stating that student data won't be used for model training. Others are less specific. The distinction matters in practice: if a student's writing sample is used to train a language model, that information may persist in the model indefinitely, and there is currently no reliable technical mechanism to fully remove it later.

This is not hypothetical. It is a documented gap in how many schools evaluate new tools, and it is the kind of question that does not appear in a standard app vetting checklist designed before generative AI existed.

The Legal Framework and Where It Falls Short

Two federal laws govern most of what schools are required to do. FERPA, the Family Educational Rights and Privacy Act, protects personally identifiable information in student education records and restricts who can access or disclose it. COPPA, the Children's Online Privacy Protection Act, regulates how commercial platforms collect data from children under 13. In January 2025 the FTC finalized amendments to its COPPA Rule that shift the default from opt-out to opt-in consent for advertising-related data use: vendors can no longer assume permission.3

Both laws are necessary and neither is sufficient. FERPA was written in 1974 and does not address model training, behavioral data, or the distinction between a tool that processes student data and one that ingests it into a system that learns from it. COPPA applies only to children under 13, so it does not reach students 13 and older, including nearly all high school students. And beyond these federal frameworks, schools must also track the patchwork of state law: as of 2025, there are more than 130 state student privacy laws across 43 states.4

The legal framework is complex, it was not designed for AI, and it is not keeping pace with how quickly AI tools are entering classrooms. That does not make compliance impossible. It does mean that a district relying only on standard edtech vetting processes, without additional scrutiny specific to AI, is likely leaving questions unanswered.

What School Leaders Should Be Asking

Before approving any AI tool for student use, there are four questions worth putting directly to the vendor in writing.

First: what student data does the tool collect, and where is it stored? The answer should be specific. Vague references to 'usage data' or 'interaction logs' are not sufficient.

Second: will student data or student-generated content be used to train or improve the AI model? If yes, ask how the data is de-identified, who has access to it, and whether it can be deleted on request.

Third: does the vendor have a signed Data Processing Agreement ready to provide? A DPA is the mechanism by which vendors contractually commit to FERPA-compliant data handling. A vendor who hesitates on this question is a vendor worth reconsidering.

Fourth: what is the data retention policy? Students whose records persist in vendor systems long after they've graduated remain exposed to future breaches and misuse. The PowerSchool victims include students who enrolled years before the December 2024 breach. The question of how long data is held is not a formality.
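For districts that track vendor reviews in a spreadsheet or script, the four questions above can be recorded as structured fields so that unanswered or vague responses are easy to spot. The sketch below is one possible way to do that, not a standard: the `VendorReview` fields, the `open_issues` function, and the vendor names are all hypothetical and would need to be adapted to a district's own process.

```python
from dataclasses import dataclass
from typing import List, Optional

# Phrases the article flags as too vague to count as an answer to question 1.
VAGUE_TERMS = {"usage data", "interaction logs"}


@dataclass
class VendorReview:
    """A vendor's written answers to the four questions (hypothetical schema)."""
    vendor: str
    data_collected: List[str]                # Q1: specific data elements
    storage_location: Optional[str]          # Q1: where the data is stored
    trains_on_student_data: Optional[bool]   # Q2: None means no answer given
    dpa_signed: bool                         # Q3: signed DPA on file
    retention_days: Optional[int]            # Q4: None means no stated limit


def open_issues(review: VendorReview) -> List[str]:
    """Return the questions that still need follow-up before approval."""
    issues = []
    vague = any(item.lower() in VAGUE_TERMS for item in review.data_collected)
    if not review.data_collected or vague:
        issues.append("Q1: data collected is missing or described vaguely")
    if not review.storage_location:
        issues.append("Q1: storage location not stated")
    if review.trains_on_student_data is None:
        issues.append("Q2: no answer on model training")
    elif review.trains_on_student_data:
        issues.append("Q2: ask about de-identification, access, and deletion")
    if not review.dpa_signed:
        issues.append("Q3: no signed Data Processing Agreement")
    if review.retention_days is None:
        issues.append("Q4: no retention policy stated")
    return issues


# Example with made-up vendors: one vague response, one complete response.
vague_review = VendorReview("ExampleTutor", ["usage data"], None, None, False, None)
complete_review = VendorReview(
    "ExampleWriter", ["student name", "essay drafts"], "US data center", False, True, 365
)

for issue in open_issues(vague_review):
    print(issue)
```

A review with an empty issue list is not automatically safe to approve; the list only shows which written answers are still missing or need a follow-up conversation.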

The Question Worth Discussing Internally

The school districts that managed the PowerSchool breach best were the ones that already knew what data they had stored with that vendor, in what form, and under what contractual terms. That knowledge did not prevent the breach. It determined how quickly they could respond, how accurately they could notify families, and how clearly they could answer the questions parents and students were asking.
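That kind of knowledge can be kept as a simple vendor data inventory. The sketch below assumes a hypothetical `VendorRecord` structure and `exposure_report` function; the vendor names and data elements are invented for illustration and do not describe any real product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VendorRecord:
    """What the district has stored with one vendor (hypothetical schema)."""
    vendor: str
    data_elements: List[str]          # what data is held
    includes_student_content: bool    # essays, questions, chat transcripts
    dpa_on_file: bool                 # contractual terms are documented


def exposure_report(inventory: List[VendorRecord], vendor: str) -> Optional[Dict]:
    """Summarize what would be at risk if the named vendor were breached.

    Returns None when the vendor is not in the inventory, which means the
    district cannot currently answer the question.
    """
    for record in inventory:
        if record.vendor == vendor:
            return {
                "vendor": record.vendor,
                "data_at_risk": sorted(record.data_elements),
                "includes_student_content": record.includes_student_content,
                "dpa_on_file": record.dpa_on_file,
            }
    return None


# Made-up inventory for illustration only.
inventory = [
    VendorRecord("ExampleSIS", ["names", "grades", "attendance"], False, True),
    VendorRecord("ExampleTutor", ["names", "essay drafts"], True, False),
]

print(exposure_report(inventory, "ExampleTutor"))
print(exposure_report(inventory, "UnknownTool"))  # None: not in the inventory
```

A `None` result is itself useful: it marks a tool in classroom use that the district has not yet documented.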

The same preparedness question applies to every AI tool a district is considering right now. Before the next tool is approved, it is worth asking: if this vendor's systems were compromised tomorrow, what data would be at risk, and would we know?