An **accommodation** is a change in how a test is presented, in how a test is administered, or in how the test taker is allowed to respond. This term generally refers to changes that do not substantially alter what the test measures. The proper use of accommodations does not substantially change academic level or performance criteria. Appropriate accommodations are made to provide equal opportunity to demonstrate knowledge.

An **African American or Black** person has origins in any of the black racial groups of Africa. Terms such as "Haitian" or "Negro" can be used in addition to "Black or African American."

An **American Indian or Alaska Native** person has origins in any of the original peoples of North and South America (including Central America) and maintains tribal affiliation or community attachment.

An **Asian** person has origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent, including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.

An **assessment** is any systematic procedure for obtaining information from tests and other sources that can be used to draw inferences about characteristics of people, objects, or programs.

An **award incentive plan** links all or some of the contract deliverables to performance incentive payments beyond the fixed fee of the contract. There are minimum performance-based requirements that must be specified in order for a contract to be considered as an Award Incentive performance-based contract.

The **base weight** is the inverse of the probability of selection.

A **bridge study** continues an existing methodology concurrent with a new methodology for the purpose of defining the relationship between the new and old estimates.

A **Black or African American** person has origins in any of the black racial groups of Africa. Terms such as "Haitian" or "Negro" can be used in addition to "Black or African American."

The **capture/recapture** technique uses two independent frames to estimate the number of units missed on both frames. The first step is to match frames to provide counts of units on one frame, but not the other; as well as a count of units on both frames. With this information and several basic assumptions, it is possible to estimate the number of units missed on both frames. In practice, the two frames may not be completely independent; in which case, a number of assumptions will be necessary to proceed with this type of estimation.
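
As a rough illustration of the matching arithmetic, the sketch below uses the Lincoln-Petersen estimator, a standard capture/recapture formula that this glossary does not itself name. It assumes the two frames are independent and that matching is error-free; the function name and the counts are hypothetical.

```python
def units_missed_on_both_frames(n_frame_a: int, n_frame_b: int, n_matched: int) -> float:
    """Estimate the units missing from both frames (Lincoln-Petersen, assumed here)."""
    if n_matched <= 0:
        raise ValueError("at least one matched unit is required")
    only_a = n_frame_a - n_matched  # on frame A but not on frame B
    only_b = n_frame_b - n_matched  # on frame B but not on frame A
    # Under independence: missed / only_a == only_b / n_matched.
    return only_a * only_b / n_matched


# Hypothetical counts: 900 units on frame A, 800 on frame B, 720 on both.
missed = units_missed_on_both_frames(900, 800, 720)
estimated_total = (900 + 800 - 720) + missed
```

When the frames are not independent, this simple ratio no longer holds and further assumptions are needed, as the definition notes.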

**Classical test theory** postulates that a test score can be decomposed into two parts, a true score and an error component; that the error component is random with a mean of zero and is uncorrelated with true scores; and that observed scores are linearly related to true scores and error components.

**Clustered samples** are those in which a naturally occurring group is first selected, such as a school or a residential block, and then units are sampled within the selected groups.

**Coarsening disclosure limitation techniques** preserve the individual respondent's data by reducing the level of detail used to report some variables. Examples of this technique include: recoding continuous variables into intervals; recoding categorical data into broader intervals; and top or bottom coding the ends of continuous distributions.
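
Two of the coarsening examples above, top coding and recoding a continuous variable into intervals, can be sketched as follows. This is a minimal illustration only; the cap, interval width, function names, and data values are all hypothetical.

```python
def top_code(value: float, cap: float) -> float:
    """Replace any value above the cap with the cap itself."""
    return min(value, cap)


def recode_to_interval(value: int, width: int) -> str:
    """Report an integer value only as the interval that contains it."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical microdata values.
incomes = [32_000, 58_500, 410_000]
ages = [23, 37, 64]

top_coded_incomes = [top_code(x, 150_000) for x in incomes]
age_bands = [recode_to_interval(a, 10) for a in ages]
```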

**Confidentiality** involves the protection of individually identifiable data from unauthorized disclosures.

**Confidentiality edits** are defined as edits that are applied to microdata for the purpose of protecting data that will be released in tabular form. Confidentiality edits are implemented using perturbation techniques. These techniques are used to alter the responses in the microdata file before tabulations are produced. Thus, all tables are protected in a consistent way. Because the perturbation techniques that are used are designed to preserve the level of detail in the microdata file, confidentiality edits maximize the information that can be provided in tables, without requiring cell suppression or controlled rounding.

A **consistent data series** maintains comparability over time by keeping an item fixed, or by incorporating appropriate adjustment methods in the event an item is changed.

To be recognized as a **Consolidated Metropolitan Statistical Area (CMSA)**, an area must meet the requirements for recognition as an MSA, have a total population of one million or more, and have: (1) separate component areas that can be identified within the entire area by meeting specified statistical criteria, and (2) local opinion that indicates support for the component areas.

**Coverage** refers to the extent to which all elements on a frame list are members of the population, and to which every element in a population appears on the frame list once and only once.

**Coverage error** refers to the discrepancy between statistics calculated on the frame population and the same statistics calculated on the target population. **Undercoverage** errors occur when target population units are missed during frame construction, and **overcoverage** errors occur when units are duplicated or enumerated in error.

A **crosswalk study** delineates how categories from one classification system are related to categories in a second classification system.

A **cross-sectional** sample survey is based on a representative sample of respondents drawn from a population at one point in time.

**Cross-sectional imputations** are based on data from a single time period.

**Cross-wave imputations** are imputations based on data from multiple time periods. For example, a cross-sectional imputation for a time 2 salary could simply be a donor's time 2 salary. Alternatively, a cross-wave imputation could be the change in a donor's salary from time 1 to time 2 multiplied by the time 1 nonrespondent's salary.
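
The salary example can be sketched in code. This sketch assumes that the donor's "change" is expressed as a ratio (time 2 salary divided by time 1 salary), which is one reading of the definition; the function names and salary figures are hypothetical.

```python
def cross_sectional_impute(donor_time2: float) -> float:
    """Cross-sectional imputation: use the donor's time 2 value directly."""
    return donor_time2


def cross_wave_impute(donor_time1: float, donor_time2: float,
                      recipient_time1: float) -> float:
    """Cross-wave imputation: apply the donor's relative change to the
    nonrespondent's own time 1 value (ratio form is an assumption)."""
    if donor_time1 == 0:
        raise ValueError("donor time 1 value must be nonzero")
    change_ratio = donor_time2 / donor_time1
    return change_ratio * recipient_time1


# Hypothetical salaries: the donor went from 40,000 to 42,000;
# the time 2 nonrespondent reported 50,000 at time 1.
imputed_cross_sectional = cross_sectional_impute(42_000.0)
imputed_cross_wave = cross_wave_impute(40_000.0, 42_000.0, 50_000.0)
```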

A **cut score** is a specified point on a score scale such that scores at or above that point are interpreted or acted upon differently from scores below that point.

A **Data Analysis System (DAS)** is an analysis software system that generates tabular estimates and correlation coefficients in a framework that allows external users to analyze individually identifiable data without allowing the user direct access to individual data records. Users are denied access to individual data records because the data are not in a directly readable format. Additional safeguards come through the use of population subsampling and differential weighting from the sample design, as well as confidentiality edits. The degree of editing required is a direct function of the capabilities of the DAS. As an example, a DAS that provides weighted totals (i.e., a direct measure of population size) within cells would require more confidentiality editing than one that does not provide weighted cell totals, because there is a greater risk of disclosure in groups with small population size.

**Data swapping** is a perturbation disclosure limitation technique that results in a confidentiality edit. A simplistic example of data swapping would be to assume a data file has two potential individual identifying variables, for example, sex and age. If a sample case needs disclosure protection, it is paired with another sampled case so that each element of the pair has the same age, but different sexes. The data on these two records are then swapped. After the swapping, anyone thinking they have identified either one of the paired cases gets the data of the other case, so they have not made an accurate match and the data have been protected.
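
The simplistic sex-and-age example can be sketched as below. This is an illustration under stated assumptions rather than a production disclosure-limitation routine: the record layout, field names, and helper functions are hypothetical, and the partner is simply the first record with the same age and a different sex.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]
KEY_VARIABLES = ("sex", "age")  # potential identifying variables (assumed)


def find_partner(records: List[Record], index: int) -> Optional[int]:
    """Return the index of a record with the same age but a different sex."""
    target = records[index]
    for j, candidate in enumerate(records):
        if (j != index and candidate["age"] == target["age"]
                and candidate["sex"] != target["sex"]):
            return j
    return None


def swap_data(records: List[Record], i: int, j: int) -> None:
    """Swap every non-key field between records i and j, in place."""
    a, b = records[i], records[j]
    for field in a:
        if field not in KEY_VARIABLES:
            a[field], b[field] = b[field], a[field]


# Hypothetical file; record 0 needs disclosure protection.
records = [
    {"sex": "F", "age": 34, "income": 51_000, "degree": "BA"},
    {"sex": "M", "age": 34, "income": 38_000, "degree": "MA"},
    {"sex": "F", "age": 52, "income": 64_000, "degree": "PhD"},
]
partner = find_partner(records, 0)
if partner is not None:
    swap_data(records, 0, partner)
```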

**DEFT** is the square root of a design effect.

A **derived score** is a raw score converted by numerical transformation into a new score providing a more meaningful and/or different measure (e.g., conversion of raw scores to percentile ranks, standard scores, or grade equivalence).

The **design effect (DEFF)** is the ratio of the true variance of a statistic (taking the complex sample design into account) to the variance of the statistic for a simple random sample with the same number of cases. Design effects differ for different subgroups and different statistics; no single design effect is universally applicable to any given survey or analysis.
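
A small sketch of the two ratios, DEFF and its square root DEFT, follows. The variance values and function names are hypothetical; in practice the complex-design variance would come from a replication or linearization method.

```python
import math


def design_effect(var_complex: float, var_srs: float) -> float:
    """DEFF: complex-design variance over simple-random-sample variance."""
    if var_srs <= 0:
        raise ValueError("SRS variance must be positive")
    return var_complex / var_srs


def deft(deff: float) -> float:
    """DEFT: square root of the design effect."""
    return math.sqrt(deff)


# Hypothetical variances for one statistic in one subgroup.
deff_value = design_effect(var_complex=0.0009, var_srs=0.0004)
deft_value = deft(deff_value)
```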

**Differential Item Functioning (DIF)** exists when examinees of equal ability differ on an item solely because of their membership in a particular group.

**Disability** is a physical or mental impairment that substantially limits one or more of the major life activities (42 U.S.C. 12102).

**Disclosure risk analysis** is used to determine which records require masking to produce a public-use data file from a restricted-use data file.

**Domain** refers to a defined universe of knowledge, skills, abilities, attitudes, interests, or other human characteristics.

**Dual-frame estimation** uses a dual-frame design to combine two frames in the same survey to offer coverage rates that may exceed those of any single frame. Sometimes the best available list is known to have poor coverage and there are no known supplemental frames to provide sufficient coverage. For example, an area frame could be used as the second frame.

**Editing** is a procedure that uses available information and some assumptions to derive substitute values for inconsistent values in a data file.

**Effect size** refers to the standardized magnitude of the effect or the departure from the null hypothesis. For example, the effect size may be the amount of change over time, or the difference between two population means, divided by the appropriate population standard deviation. Multiple measures of effect size can be used (e.g., standardized differences between means, correlations, and proportions).

The **effective sample size**, as used in the design phase, is the sample size under a simple random sample design that is equivalent to the actual sample under the complex sample design. In the case of complex sample designs, the actual sample size is determined by multiplying the effective sample size by the anticipated design effect.
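
The multiplication described above can be sketched as follows. Rounding up to a whole case is an assumption added here, and the function name and inputs are hypothetical.

```python
import math


def actual_sample_size(effective_n: int, anticipated_deff: float) -> int:
    """Actual sample size = effective sample size x anticipated design effect,
    rounded up to a whole number of cases (rounding rule assumed)."""
    if effective_n <= 0 or anticipated_deff <= 0:
        raise ValueError("inputs must be positive")
    return math.ceil(effective_n * anticipated_deff)


# Hypothetical plan: 400 effective cases with an anticipated design effect of 1.5.
planned_n = actual_sample_size(400, 1.5)
```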

**Equating** of two tests is established when examinees of every ability level and from every population group can be indifferent about which of two tests they take. Not only should they have the same expected mean score on each test, but they should also have the same errors of measurement.

**Estimation** is the process of using sample data to provide a single best value for a parameter (such as a mean, proportion, correlation, or effect size), or to provide a range of values in the form of a confidence interval.

**Fairness** of a test is attained when construct-irrelevant personal characteristics such as race, ethnicity, sex, or disability have no appreciable effect on test results or their interpretation.

In a **field** test, all or some of the survey procedures are tested on a small scale that mirrors the planned full-scale implementation.

A **frame** is a mapping of the universe elements (i.e., sampling units) onto a finite list (e.g., the population of schools on the day of the survey).

The **frame population** is the set of elements that can be enumerated prior to the selection of a survey sample.

A **freshened sample** includes new cases added to a longitudinal sample plus the retained cases from the longitudinal sample used to produce cross-sectional estimates of the population at the time of a subsequent wave of a longitudinal data collection.

The **half-open interval** technique is used to increase coverage. In this technique, new in-scope units between a unit A on the previous frame up to, but not including, unit B (the next unit on the previous frame) are associated with unit A. These new units have the same selection probability as unit A. This process is repeated for every unit on the frame. The new units associated with the actual sample cases are now included in the sample with their respective selection probabilities. For example, in the case of freshening the sample, this technique may be applied to a new list that includes cases that were covered in a previous frame, as well as new in-scope units not included in the previous frame.

A **Hispanic or Latino** person is of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race. The term "Spanish origin" can be used in addition to "Hispanic or Latino."

**Hypothesis testing** draws a conclusion about the tenability of a stated value for a parameter. For example, sample data may be used to test whether an estimated value of a parameter (such as the difference between two population means) is sufficiently different from zero that the null hypothesis, designated H0 (no difference in the population means), can be rejected in favor of the alternative hypothesis, H1 (a difference between the two population means).

**Imputation** is a procedure that uses available information and some assumptions to derive substitute values for missing values in a data file.

An **Individualized Education Plan (IEP)** refers to a written statement for each individual with a disability that is developed, reviewed, and revised in accordance with Title 20 U.S.C. Section 1414(d).

**Individually identifiable data** refers specifically to data from any list, record, response form, completed survey, or aggregation about an individual(s) from which information about particular individuals or their schools/education institutions may be revealed by either direct or indirect means.

**Instrument** refers to an evaluative device that includes tests, scales, and inventories to measure a domain using standardized procedures.

**Item nonresponse** occurs when a respondent fails to respond to one or more relevant item(s) on a survey.

**Item Response Theory (IRT)** postulates that the probability of correct responses to a set of test questions is a function of true proficiency and of one or more parameters specific to each test question.

**Key variables** include survey-specific items for which aggregate estimates are commonly published by NCES. They include, but are not restricted to, variables most commonly used in table row stubs. Key variables also include important analytic composites and other policy-relevant variables that are essential elements of the data collection. They are first defined in the initial planning stage of a survey, but may be added to as the survey and resulting analyses develop. For example, the National Assessment of Educational Progress (NAEP) consistently uses gender, race-ethnicity, urbanicity, region, and school type (public/private) as key reporting variables.

A **Latino or Hispanic** person is of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race. The term "Spanish origin" can be used in addition to "Hispanic or Latino."

**Linkage** results from placing two or more tests on the same scale, so that scores can be used interchangeably.

A **longitudinal** sample survey follows the experiences and outcomes over time of a representative sample of respondents (i.e., a cohort) who are defined based on a shared experience (e.g., shared birth year or grade in school).

**Metadata** contain information about the microdata.

**Metropolitan Statistical Areas (MSAs)** are those areas that: (1) include a city of at least 50,000 population, or (2) include a Census Bureau-defined urbanized area (of at least 50,000 population) with a total metropolitan population of at least 100,000 (75,000 in New England). In addition to the county(ies) containing the main city or urbanized area, an MSA may include additional counties that have strong economic and social ties to the central county(ies) and meet specified requirements of metropolitan character. The ties are determined chiefly by census data on commuting to work. A metropolitan statistical area may contain more than one city with a population of 50,000 and may cross state lines.

The **minimum substantively significant effect (MSSE)** is the smallest effect, that is, the smallest departure from the null hypothesis, considered to be important for the analysis of key variables. The minimum substantively significant effect is determined during the design phase. For example, the planning document should provide the minimum change in key variables or perhaps, the minimum correlation, r, between two variables that the survey should be able to detect for a specified population domain, or subdomain of analytic interest. The MSSE should be based on a broad knowledge of the field, related theories, and supporting literature.

**Multiplicity estimation** is a technique used to adjust selection probabilities when the unit of interest has multiple chances of being selected. For example, in a random digit dialing household survey, households with multiple phone numbers have a probability of being selected more than once. In this case, by identifying the number of distinct telephone numbers in a household, the sampling weights can be adjusted to generate an unbiased household weight.
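
One common form of the telephone adjustment divides the household's weight by its number of distinct telephone numbers. The sketch below assumes each number has the same chance of selection and that the household's overall selection probability is approximately proportional to its count of numbers; the function name and values are hypothetical.

```python
def multiplicity_adjusted_weight(base_weight: float, n_phone_numbers: int) -> float:
    """Divide the weight by the household's number of distinct phone numbers
    (a simple multiplicity adjustment, assumed for illustration)."""
    if n_phone_numbers < 1:
        raise ValueError("a sampled household has at least one phone number")
    return base_weight / n_phone_numbers


# Hypothetical household with a base weight of 1,200 and two distinct numbers.
household_weight = multiplicity_adjusted_weight(1200.0, 2)
```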

A **Native Hawaiian or Other Pacific Islander** person has origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.

**New England County Metropolitan Areas (NECMAs)** are county-based alternatives to the city- and town-based metropolitan areas that are used in New England. The NECMA for an MSA or CMSA includes: (1) the county containing the city named first in that MSA/CMSA title (this county may include the cities named first for other MSAs/CMSAs), and (2) each additional county having at least half its population in the MSA/CMSA(s) whose first-named cities are in the county identified in step 1. NECMAs are not defined for individual PMSAs.

**Noncoverage** involves eligible units of the target population that are missing from the frame population; this includes the problems of incomplete frames and missing units.

**Nonresponse bias** occurs when the observed value deviates from the population parameter due to differences between respondents and nonrespondents. Nonresponse bias is likely to occur as a result of not obtaining 100 percent response from the selected cases.

**Nonsampling error** includes measurement errors due to nonresponse, coverage, interviewers, respondents, instruments, processing, and mode.

An **Other Pacific Islander or Native Hawaiian** person has origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.

**Overall unit nonresponse** reflects a combination of unit nonresponse across two or more levels of data collection, where participation at the second stage of data collection is conditional upon participation in the first stage of data collection.

**Overcoverage** errors occur when units are duplicated or enumerated in error.

**Perturbation** disclosure limitation techniques directly alter the individual respondent's data for some variables, but preserve the level of detail in all variables included in the microdata file. Blanking and imputing for randomly selected records; blurring (e.g., combining multiple records through some averaging process into a single record); adding random noise; and data swapping or switching (e.g., switching the sex variable from a predetermined pair of individuals) are all examples of perturbation techniques.

In a **pilot** test, a laboratory or a very small-scale test of a questionnaire or procedure is conducted.

A **planning document** includes a justification for a study, a description of the survey design and methodology, an analysis plan, a survey evaluation plan, and a cost estimate.

The **potential magnitude of nonresponse bias** can be estimated by taking the product of the nonresponse rate and the difference in values of a characteristic between respondents and nonrespondents.
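
The product described above can be written out directly. The function name and the example values are hypothetical, and in practice the nonrespondent value is usually unknown and must itself be approximated (for example, from frame data or a follow-up study).

```python
def potential_nonresponse_bias(nonresponse_rate: float,
                               respondent_value: float,
                               nonrespondent_value: float) -> float:
    """Nonresponse rate times the respondent-nonrespondent difference."""
    if not 0 <= nonresponse_rate <= 1:
        raise ValueError("nonresponse rate must be between 0 and 1")
    return nonresponse_rate * (respondent_value - nonrespondent_value)


# Hypothetical: 25 percent nonresponse; respondents average 52.0 and
# nonrespondents average 44.0 on some characteristic.
bias = potential_nonresponse_bias(0.25, 52.0, 44.0)
```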

The **power** (1-β) of a test is defined as the probability of rejecting the null hypothesis when a specific alternative hypothesis is assumed. For example, with β = 0.20 for a particular alternative hypothesis, the power is 0.80, which means that 80 percent of the time the test statistic will fall in the rejection region if the parameter has the value specified by the alternative hypothesis.
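
To make the relationship between the Type II error rate and power concrete, the sketch below computes power for a one-sided z test. The choice of a one-sided z test with a known standard error is an assumption made for illustration, and the function name and inputs are hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(effect: float, standard_error: float,
                      alpha: float = 0.05) -> float:
    """Power of an upper-tailed z test against a specific alternative."""
    z = NormalDist()  # standard normal distribution
    critical = z.inv_cdf(1 - alpha)  # boundary of the rejection region
    # Type II error rate: the statistic falls short of the boundary
    # even though the alternative is true.
    type_ii_rate = z.cdf(critical - effect / standard_error)
    return 1 - type_ii_rate


# Hypothetical alternative: a true difference 2.5 standard errors above zero.
power = power_one_sided_z(effect=2.5, standard_error=1.0)
```

With these inputs the power comes out slightly above 0.80; when the effect is zero, the function returns the alpha level itself.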

**Precision** of survey results refers to how closely the results from a sample can reproduce the results that would be obtained from a complete count (i.e., census) conducted using the same techniques. The difference between a sample result and the result from a complete census taken under the same conditions is known as the precision of the sample result.

A survey **pretest** involves experimenting with different components of the questionnaire or survey design or operationalization prior to full-scale implementation. This may involve **pilot** testing, that is, a laboratory or a very small-scale test of a questionnaire or procedure, or a **field** test in which all or some of the survey procedures are tested on a small scale that mirrors the planned full-scale implementation.

A **point estimate** involves using the value of a particular sample statistic to estimate the value for a parameter of interest.

**Primary Metropolitan Statistical Areas (PMSAs)** are the component areas of a CMSA. If no PMSAs are recognized, the entire area is designated an MSA.

The **probability of selection** is the probability that an element will be drawn in a sample. In a simple random selection, this probability is the number drawn in the sample divided by the number of elements on the sampling frame.
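
For simple random selection, the probability of selection and the base weight defined earlier in this glossary (its inverse) can be computed as in this minimal sketch. The function names and the sample and frame counts are hypothetical.

```python
def probability_of_selection(n_sampled: int, n_frame: int) -> float:
    """Simple random selection: number drawn divided by the frame size."""
    if not 0 < n_sampled <= n_frame:
        raise ValueError("need 0 < n_sampled <= n_frame")
    return n_sampled / n_frame


def base_weight(prob: float) -> float:
    """Base weight: the inverse of the probability of selection."""
    if not 0 < prob <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / prob


# Hypothetical design: 200 schools drawn from a frame of 5,000.
p = probability_of_selection(200, 5000)
w = base_weight(p)
```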

A **public-use data file** includes a subset of data that have been coded, aggregated, or otherwise altered to mask individually identifiable information, and thus, is available to all external users. Unique identifiers, geographic detail, and other variables that cannot be suitably altered are not included in public-use data files.

**Public-use edits** are based on an assumption that external users have access to both individual respondent records and secondary data sources that include data which could be used to identify respondents. For this reason, the editing process is relatively extensive. When determining an appropriate masking process, the public-use edit takes into account and guards against matches on common variables from all known files that could be matched to the public-use file.

**Raking** is a method of adjusting sample estimates to known marginal totals from an independent source. For a two-dimensional case, the procedure uses the sample weights to proportionally adjust the weights to one set of marginals. Next, these adjusted weights are proportionally adjusted to the second set of marginals. This two-step adjustment process is repeated a number of times until the adjusted sample weights converge simultaneously to both sets of marginals.
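
The two-dimensional procedure can be sketched as iterative proportional fitting over a table of weighted cell totals. This is a minimal illustration that assumes every cell is positive and that the two sets of marginals share the same grand total; the function name, tolerance, and numbers are hypothetical.

```python
from typing import List


def rake(cell_weights: List[List[float]], row_targets: List[float],
         col_targets: List[float], tol: float = 1e-8,
         max_iter: int = 1000) -> List[List[float]]:
    """Alternately scale rows and columns until both sets of marginals match."""
    w = [row[:] for row in cell_weights]  # work on a copy
    for _ in range(max_iter):
        # Step 1: proportionally adjust each row to its marginal total.
        for i, target in enumerate(row_targets):
            factor = target / sum(w[i])
            w[i] = [x * factor for x in w[i]]
        # Step 2: proportionally adjust each column to its marginal total.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in w)
            for row in w:
                row[j] *= factor
        # Stop once the row totals still hold after the column step.
        gap = max(abs(sum(w[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return w


# Hypothetical weighted cell totals and independent marginal totals.
raked = rake([[10.0, 20.0], [30.0, 40.0]],
             row_targets=[40.0, 60.0], col_targets=[45.0, 55.0])
```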

A **random-digit dial** sample survey randomly selects respondents based on a sample of phone numbers and information obtained using a screener questionnaire.

The **reference year** is the year about which the data were collected.

The **rejection region** is defined by the alternative hypothesis H1 and the α level. If the test statistic is in this region, the null hypothesis is rejected.

**Reliability** is the degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker.

**Replication methods** are approximate variance methods that estimate the variance based on the variability of estimates formed from subsamples of the full sample. The subsamples are generated to properly reflect the variability due to the sample design.
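
As one simple instance of a replication method, the sketch below uses a delete-one jackknife, in which each subsample drops a single case. It deliberately ignores strata, clusters, and weights, so it does not reflect a complex sample design; the function name and data are hypothetical.

```python
from statistics import mean
from typing import Callable, List


def jackknife_variance(values: List[float],
                       estimator: Callable[[List[float]], float] = mean) -> float:
    """Delete-one jackknife variance of an estimator (unstratified sketch)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cases")
    full_estimate = estimator(values)
    # One replicate estimate per subsample, each omitting a single case.
    replicates = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    return (n - 1) / n * sum((r - full_estimate) ** 2 for r in replicates)


# Hypothetical data; for the mean this matches the usual s^2 / n.
variance_of_mean = jackknife_variance([2.0, 4.0, 6.0, 8.0])
```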

**Required response items** include the minimum set of items required for a case to be considered a respondent.

**Response rates** calculated using base weights measure the proportion of the sample frame that is represented by the responding units in each study.

A **restricted-use data file** includes individually identifiable information that is confidential and protected by law. Restricted-use data files are not required to include variables that have undergone coarsening disclosure risk edits.

**Sampling error** is the error associated with nonobservation, that is, the error that occurs because all members of the frame population are not measured. It is the error associated with the variation in samples drawn from the same frame population. The variance equals the square of the sampling error.

**Scaling** refers to the process of assigning a scale score based on the pattern of responses.

**Scoring/rating** is the process of evaluating the quality of the examinee's responses to individual cognitive questions.

**Section 504** of the Rehabilitation Act of 1973, as amended (Title 29 U.S.C. 794 Section 504), prohibits discrimination on the basis of handicap in federally assisted programs and activities.

**Simple comparison** is a test (such as a t test or a z test) of the difference between two means or proportions.

**Simple Random Sampling** (SRS) uses equal probability sampling with no strata or clusters. Most statistical analysis software assumes SRS and independently distributed errors.

**Stage of data collection** includes any stage or step in the sample identification and data collection process in which data are collected from the identified sample unit. This includes information obtained that is required to proceed to the next stage of sample selection or data collection (e.g., school district permission for schools to participate or schools providing lists of teachers for sample selection of teachers).

**Statistical disclosure limitation techniques** are used to prepare microdata files for release; they include perturbation techniques and coarsening techniques.

A **statistical inference** is a decision about one or more unknown or unobserved population parameter(s) based on estimation and/or hypothesis testing.

**Strata** are created by partitioning the frame and are generally defined to include relatively homogeneous units within strata.

**Substitutions** are done using matched pairs, in which the alternate member of the pair does not have an independent probability of selection.

A **supplemental area frame** can be created. This is often done by first generating a frame of geographic units where all the geographic units are represented, providing full geographic coverage. Next, a probability sample of the geographic units is selected. An intensive search procedure is carried out in each selected area. This generates a supplemental area frame for each selected area. Assuming no error in the search process, the supplemental area frame has complete coverage and the cases can be weighted to represent a national estimate. The data from both the main list frame and the supplemental area frame are then combined so that the weighted sample estimates provide complete coverage.

An individual **survey** is driven by one data collection form, such as the Private School Survey or the Academic Library Survey.

A **survey system** is a set of individual surveys that are interrelated components of a data collection, such as the Schools and Staffing Survey or the Integrated Postsecondary Education Data System.

The **survey year** is the year in which the data were collected.

The **tail** of the sampling distribution of the test statistic contains the rejection region for the hypothesis tested, H0.

The **target population** is the finite set of observable or measurable elements (i.e., sampling units) that will be studied.

**Taylor-series linearization** is an approximate variance method in which an estimate is linearized as a first step. The variance of the linearized estimate is then computed using either an exact or approximate variance formula appropriate for the sample design.

**Total nonresponse** reflects a combination of the overall unit nonresponse and item nonresponse for a specific item.

**Type I error** is made when the tested hypothesis, H0, is falsely rejected when in fact it is assumed true. The probability of making a Type I error is denoted by alpha (α). For example, with an alpha level of 0.05, the analyst will conclude that a difference is present in 5 percent of tests where the null hypothesis is true.

**Type II error** is made when the null hypothesis, H0, is not rejected when in fact a specific alternative hypothesis, H1, is assumed true. The probability of making a Type II error is denoted by beta (β). For example, with a beta level of 0.20, the analyst will conclude that no difference is present in 20 percent of all cases in which the specific hypothesized alternative, H1, is true.

**Undercoverage** errors occur when target population units are missed during frame construction.

**Un-duplication** involves the process of deleting units that are erroneously in the frame more than once to correct for overcoverage.

**Unit nonresponse** occurs when a respondent fails to respond to all required response items (i.e., fill out or return a data collection instrument).

A **universe** survey involves the collection of data covering all known units in a population (i.e., a census).

**Validity** is the extent to which a test or set of operations measures what it is supposed to measure. Validity refers to the appropriateness of inferences from test scores or other forms of assessment.

**Variance** is the error associated with nonobservation, that is, the error that occurs because all members of the frame population are not measured. It is the error associated with the variation in samples drawn from the same frame population. The variance equals the square of the sampling error.

A **wave** is a round of data collection in a longitudinal survey (e.g., the base year and each successive follow-up are each waves of data collection).

A **White** person has origins in any of the original peoples of Europe, the Middle East, or North Africa.