Higher Ed in 4k Project

An accessibility analysis of up to 100 web pages from every college and university in the United States.

Look up your institution to see how it fared and request access to a complimentary Pope Tech account. Continue reading to learn about the types and patterns of accessibility issues that were detected.


Introduction

In November 2019, Pope Tech launched the Higher Ed in 4k project by running an accessibility evaluation of every top-level .edu domain in the United States. The analysis is performed using the Pope Tech platform, which is powered by the WAVE testing engine. It only looks at errors that WAVE can detect automatically. (There are almost certainly more accessibility issues on these sites that can only be identified by manual testing.)

The Higher Ed in 4k Project was inspired by the WebAIM Million Project. After the WebAIM Million project launched, one of our takeaways was the opportunity to help make websites more accessible in a significant way. What would happen if we went deeper than the home page, if we focused on one group (Higher Education in the US) and let the institutions rescan their data on demand? What if we gave them free access to automated accessibility testing? The Higher Ed in 4k project is more of a living experiment than a one-time study.

While we will periodically rescan the entire data set, institutions can request access to their full study account to view the full results and rescan on demand. These results are updated each hour on the 4k website. Institutions that weren't included in the initial data set can request and have their data added. The goal is to better measure accessibility in Higher Ed over time and see if we can improve it.

The living sample

The category data and URLs for the initial launch of the 4k project were obtained from the US Department of Education's IPED List, which contains 7,153 institutions. This list was trimmed of any non-.edu domains and any subdomains. For example, if institution.edu was in the sample along with sub.institution.edu, only institution.edu was used. The trimmed list, with tags for state, number of students, private/public status, etc., was then uploaded to Pope Tech and set to crawl up to 100 pages and 4 levels deep, including subdomains. Some websites timed out, and 29 had aggressive robots.txt files blocking all bots, which we respected. The initial sample came to 3,832 higher education institutions, 17,470 websites, and 314,305 pages scanned. This will only grow with time.
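
As a rough illustration of the trimming step, the sketch below keeps only top-level .edu domains and drops everything else. It is not the script used for the project; the function names are hypothetical, and treating a leading "www." as the top-level site is an assumption.

```python
from urllib.parse import urlparse


def top_level_edu_domain(url):
    """Return 'name.edu' if url points at a top-level .edu domain, else None."""
    # Accept bare hosts ("institution.edu") as well as full URLs.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = (parsed.hostname or "").lower().rstrip(".")
    # Assumption: a leading "www." is treated as the top-level site.
    if host.startswith("www."):
        host = host[4:]
    labels = host.split(".")
    if len(labels) == 2 and labels[0] and labels[1] == "edu":
        return host
    return None  # non-.edu domain, or a subdomain such as sub.institution.edu


def trim_institution_list(urls):
    """Keep one entry per top-level .edu domain, preserving input order."""
    seen = set()
    trimmed = []
    for url in urls:
        domain = top_level_edu_domain(url)
        if domain is not None and domain not in seen:
            seen.add(domain)
            trimmed.append(domain)
    return trimmed


print(trim_institution_list([
    "https://www.institution.edu/",
    "http://sub.institution.edu",
    "https://example.com",
    "college.edu",
]))
# prints ['institution.edu', 'college.edu']
```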

The WAVE Engine

The WAVE accessibility engine was used to analyze the rendered pages (i.e., the DOM of each page after scripting and styles were applied). The WAVE engine uses heuristics and logic to detect patterns in web page content that align with end user accessibility issues and Web Content Accessibility Guidelines (WCAG) conformance failures. All automated tools, including WAVE, are limited in their detection of accessibility issues: only around 25% of possible conformance failures are automatically detectable. Absence of detectable errors does not indicate that a site is accessible or compliant. Despite these limitations, the data presented in this project provide a meaningful representation of the state of web accessibility in Higher Education in the US.

Crawling Method

We didn't try to influence where the crawler would go. It simply took the first 100 pages returned that were linked from the institution's homepage within 4 links.

We don't know who visits which pages the most, which page is most important, or which ones are prioritized. This is both practical and intentional: the methodology is meant to reduce human bias. The method and logic applied to crawl these pages were consistent across all institutions. All pages in the sample are publicly available, so we can assume real people, including those with disabilities, visit them at least sometimes. The errors detected, and thus the rankings, reflect potential end user barriers on those pages.
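
The crawl limits (up to 100 pages, all within 4 links of the homepage) can be sketched as a capped breadth-first walk. This is only an illustration: the article does not say the Pope Tech crawler is breadth-first or that the homepage counts toward the 100, so both are assumptions, and the `links` dictionary is a hypothetical stand-in for fetching and parsing real pages.

```python
from collections import deque


def crawl(homepage, links, max_pages=100, max_depth=4):
    """Breadth-first walk over a link graph, capped by page count and depth.

    `links` maps each URL to the list of URLs it links to.
    """
    visited = [homepage]
    seen = {homepage}
    queue = deque([(homepage, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # pages at the depth limit are kept but not expanded
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))
                if len(visited) == max_pages:
                    break
    return visited


# Hypothetical site: "f" is 5 links from the homepage, so it is never reached.
site = {"home": ["a", "b"], "a": ["c"], "c": ["d"], "d": ["e"], "e": ["f"]}
print(crawl("home", site))
# prints ['home', 'a', 'b', 'c', 'd', 'e']
```

Because the walk follows whatever links it meets first, no person chooses which pages are sampled, which is the property the methodology relies on.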

What this isn't

This project is not a condemnation of Higher Ed in the US. Remember that .edu, .us and .gov had the lowest average number of accessibility errors of all common top-level domains (TLDs) in the WebAIM Million project. Interestingly, some errors found by the study were created by members of the Pope Tech team when they worked for universities. We understand the challenges of accessibility in Higher Ed; now let's see if we can improve it.

This also isn't a silver bullet. Automated tools can't detect everything, or even close to everything. Only a human can determine true accessibility. WAVE is a suite of tools to help in this process, not the end goal.

How did we do?

In the initial analysis, 0.078% of institutions had no detectable errors, meaning 99.922% had detectable WCAG violations. At the page level, 93.331% of pages had detectable WCAG violations. In total, 7,464,465 detectable errors were found, or 23.8 errors per page.
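
The headline figures can be cross-checked from the totals reported in this article. The short sketch below uses only those published numbers; the variable names are ours. The per-page result comes out close to the rounded 23.8 quoted above.

```python
# Totals as published in this article.
TOTAL_ERRORS = 7_464_465
PAGES_SCANNED = 314_305
INSTITUTIONS = 3_832
ERROR_FREE_PERCENT = 0.078

errors_per_page = TOTAL_ERRORS / PAGES_SCANNED
pages_per_institution = PAGES_SCANNED / INSTITUTIONS
error_free_institutions = round(INSTITUTIONS * ERROR_FREE_PERCENT / 100)

print(f"errors per page: {errors_per_page:.2f}")
print(f"pages per institution: {pages_per_institution:.1f}")
print(f"institutions with no detectable errors: {error_free_institutions}")
```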

Remember that the WebAIM Million report found 60 errors per page overall and 36 errors per page on .edu domains, so this is encouraging for higher education. It makes sense that we would see fewer errors as we include less complicated pages besides the home pages. This project also includes many more colleges and universities than were included in the WebAIM Million project.

The "average" institution

The average institution in the data set had the following:

  • 82 pages scanned
  • 24 errors per page
  • 27 alerts per page
  • a user would encounter an error on 1 in every 30 page elements

The 5 most common errors

The 5 most common errors made up 91% of all detectable errors. Two of these, Linked Image Missing Alternative Text and Missing Alternative Text, require the same solution. In other words, by fixing just contrast errors, empty links, missing form labels and alternative text errors, we would fix 6,773,773 accessibility errors.

1. Very Low Contrast

  • What it means?
    • Very low contrast between foreground and background colors.
  • Why it matters?
    • Adequate contrast is necessary for all users, especially users with low vision.
  • 4k results:
    • 64% of all errors were low contrast errors
    • 15.4 contrast errors per page
    • lowest institution had 0 contrast errors
    • highest institution had 1,435 contrast errors per page
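
For readers who want to check a color pair by hand, the sketch below computes the contrast ratio as defined in WCAG 2.x (relative luminance of each color, then the ratio of the lighter to the darker). The 4.5:1 figure is the WCAG AA threshold for normal-size text; exactly how WAVE decides what counts as "very low contrast" is not described in this article, so treat the pass/fail comparison as an assumption.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color written as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5  # WCAG AA minimum for normal-size text

for fg in ("#000000", "#767676", "#777777"):
    ratio = contrast_ratio(fg, "#ffffff")
    verdict = "passes" if ratio >= AA_NORMAL_TEXT else "fails"
    print(f"{fg} on white: {ratio:.2f}:1 {verdict} AA")
```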

2. Empty Link

  • What it means?
    • A link contains no text.
  • Why it matters?
    • If a link contains no text, the function or purpose of the link will not be presented to the user. This can introduce confusion for keyboard and screen reader users.
  • 4k results:
    • 14.6% of all errors were Empty link errors
    • 3.6 empty link errors per page
    • lowest institution had 0 empty links
    • highest institution had 341 empty link errors per page
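
A simplified check for links with no text might look like the sketch below, which uses Python's standard `html.parser`. It is not WAVE's implementation: it only considers visible text, `aria-label`, and the `alt` text of images inside the link, and the class and function names are hypothetical. It also flags a link whose only content is an image without alternative text, which WAVE reports as a separate error type.

```python
from html.parser import HTMLParser


class EmptyLinkFinder(HTMLParser):
    """Collect the href of every link that exposes no text."""

    def __init__(self):
        super().__init__()
        self.empty_links = []
        self._href = None  # href of the link currently open, if any
        self._text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"] or ""
            self._text = attrs.get("aria-label") or ""
        elif tag == "img" and self._href is not None:
            # An image inside a link contributes its alt text, if it has any.
            self._text += attrs.get("alt") or ""

    def handle_data(self, data):
        if self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not self._text.strip():
                self.empty_links.append(self._href)
            self._href = None


def find_empty_links(html):
    finder = EmptyLinkFinder()
    finder.feed(html)
    return finder.empty_links


sample = (
    '<a href="/apply">Apply</a>'
    '<a href="/search"><i class="icon-search"></i></a>'
    '<a href="/"><img src="logo.png" alt="Home"></a>'
    '<a href="/news"><img src="news.png"></a>'
)
print(find_empty_links(sample))
# prints ['/search', '/news']
```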

3. Missing Form Label

  • What it means?
    • A form control does not have a corresponding label.
  • Why it matters?
    • If a form control does not have a properly associated text label, the function or purpose of that form control may not be presented to screen reader users. Form labels also provide visible descriptions and larger clickable targets for form controls.
  • 4k results:
    • 4.3% of all errors were missing form label errors
    • 1.1 missing form labels per page
    • lowest institution had 0 missing form labels
    • highest institution had 100 missing form labels per page
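
The idea of a "corresponding label" can be illustrated with a small checker built on Python's standard `html.parser`. This is a sketch under simplifying assumptions, not WAVE's rule set: a control counts as labelled if it sits inside a `<label>`, is referenced by a `<label for="...">`, or carries `aria-label`, `aria-labelledby`, or `title`. The names below are hypothetical.

```python
from html.parser import HTMLParser

CONTROL_TAGS = {"input", "select", "textarea"}
# Input types that do not need a text label of their own.
SKIPPED_INPUT_TYPES = {"hidden", "submit", "reset", "button", "image"}


class FormLabelChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.label_targets = set()  # ids referenced by <label for="...">
        self.controls = []          # (name, id, labelled_in_place)
        self._label_depth = 0       # > 0 while inside a <label> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_depth += 1
            if attrs.get("for"):
                self.label_targets.add(attrs["for"])
        elif tag in CONTROL_TAGS:
            input_type = (attrs.get("type") or "text").lower()
            if tag == "input" and input_type in SKIPPED_INPUT_TYPES:
                return
            labelled = bool(
                self._label_depth
                or attrs.get("aria-label")
                or attrs.get("aria-labelledby")
                or attrs.get("title")
            )
            name = attrs.get("name") or attrs.get("id") or tag
            self.controls.append((name, attrs.get("id"), labelled))

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth:
            self._label_depth -= 1

    def unlabelled(self):
        """Names of controls with no label; call after feeding the whole page."""
        return [
            name
            for name, control_id, labelled in self.controls
            if not labelled and control_id not in self.label_targets
        ]


checker = FormLabelChecker()
checker.feed(
    '<label for="email">Email</label><input id="email" name="email">'
    '<input name="q" type="search">'
    '<label>Name <input name="name"></label>'
    '<select name="state"></select>'
)
print(checker.unlabelled())
# prints ['q', 'state']
```

Resolving `for` references only at the end lets a label appear after the control it names.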

4. Linked Image Missing Alternative Text

  • What it means?
    • An image without alternative text results in an empty link.
  • Why it matters?
    • Images that are the only content within a link must have descriptive alternative text. If an image is within a link that contains no text and that image does not provide alternative text, a screen reader has no content to present to the user regarding the function of the link.
  • 4k results:
    • 4.2% of all errors
    • 1 per page
    • lowest institution had 0 linked images missing alternative text
    • highest institution had 178 per page

5. Missing Alternative Text

  • What it means?
    • Image alternative text is not present.
  • Why it matters?
    • Each image must have an alt attribute. Without alternative text, the content of an image will not be available to screen reader users or when the image is unavailable.
  • 4k results:
    • 3.6% of all errors
    • 0.9 per page
    • lowest institution had 0 images missing alternative text
    • highest institution had 44 per page

State/territory rankings

With this data we can see how each state is doing. Alaska has the fewest detectable accessibility errors, with only 9 per page, followed by Idaho, Montana, Hawaii and Wyoming, which are all below 15 errors per page. While these states tend to have smaller populations, overall state population didn't correlate strongly with the number of errors. Texas and Pennsylvania were right at the average with 24 errors per page, and the bottom 6 states were Vermont, Utah, New Mexico, Florida, South Dakota and Arkansas, with over 30 errors per page each. Arkansas had 44 errors per page.

When we initially ran this, we only looked at the total errors per page reflected above. After we discovered some large outliers, with thousands of errors at a single institution, we changed the ranking to use the median of each institution's average errors per page within a state. For example, this moved Wyoming up to the top and Maine to the bottom.
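
The difference between the two ranking methods can be seen in a small sketch. The data below is invented purely for illustration (the state codes "AA" and "BB" and all counts are hypothetical), and the function is our own, not the project's code. "Pooled" is total errors divided by total pages in a state; "median" is the median of each institution's own errors per page.

```python
from collections import defaultdict
from statistics import median


def state_rankings(institutions):
    """institutions: iterable of (state, pages_scanned, total_errors)."""
    grouped = defaultdict(list)
    for state, pages, errors in institutions:
        grouped[state].append((pages, errors))
    rows = []
    for state, records in grouped.items():
        pooled = sum(e for _, e in records) / sum(p for p, _ in records)
        med = median(e / p for p, e in records)
        rows.append({"state": state, "pooled": pooled, "median": med})
    # Rank by the median, which a single extreme institution cannot dominate.
    return sorted(rows, key=lambda row: row["median"])


# Invented example: one outlier institution in state "AA".
sample = [
    ("AA", 100, 1_000), ("AA", 100, 1_200), ("AA", 100, 200_000),
    ("BB", 100, 2_000), ("BB", 100, 2_500), ("BB", 100, 3_000),
]
for row in state_rankings(sample):
    print(row)
```

In this made-up case "AA" ranks first by median (12 errors per page) even though its pooled figure (674) is far worse than that of "BB" (25), because one institution accounts for nearly all of its errors.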

On our State Rankings page you can see each state's ranking, median, and errors per page. We also show this for just public institutions. These are updated in real time as institutions rescan and (hopefully) improve their web accessibility.

Rankings by tags

For this project, each institution was tagged with IPED List data, combined with a few other data sources including the UCEDD institutions, to allow us to compare institutions. Below are some of the interesting comparisons we found.

Highest Degree Offered

Results by highest degree offered

Type                    | Number of institutions | Errors per page
up to Associates degree | 1,488                  | 24.8
up to Bachelor's degree | 500                    | 25.1
Post Bachelor degrees   | 1,843                  | 23.3

Private vs. public institutions

Results by funding model

Type                   | Number of institutions | Errors per page
Public                 | 1,581                  | 20.21
Private not-for-profit | 1,518                  | 25.3
Private for-profit     | 733                    | 30.1

Public institutions average 20 errors per page and are more accessible than private ones. Private not-for-profit institutions (25 errors per page) are in turn more accessible than private for-profit institutions (30 errors per page). It would make sense that public institutions do better, as they are subject to additional laws beyond the Americans with Disabilities Act, including Section 508.

Student enrollment

Results by number of students enrolled

Type                        | Number of institutions | Errors per page
Enrollment under 1,000      | 1,485                  | 28.3
Enrollment 1,000 - 4,999    | 1,311                  | 24.06
Enrollment 5,000 - 9,999    | 454                    | 19.02
Enrollment 10,000 - 19,999  | 304                    | 18.98
Enrollment 20,000 and above | 208                    | 14.02

There was a clear inverse relationship between student enrollment and accessibility errors per page: the more students, the fewer errors. This could be because larger institutions have more resources and budget. Institutions with over 20,000 students enrolled had only 14 detectable accessibility errors per page.

Land Grant

Results by Land Grant vs non-Land Grant institutions

Type                        | Number of institutions | Errors per page
Land Grant institutions     | 101                    | 15.4
Non-Land Grant institutions | 3,731                  | 24.3

Land Grant universities were much more accessible than non-Land Grant institutions, with only about 15 errors per page.

In a system

Results comparing institutions that are part of a system vs non-system institutions

Type            | Number of institutions | Errors per page
In a system     | 1,322                  | 23.1
Not in a system | 2,510                  | 24.6

Carnegie classifications

Results comparing Carnegie classifications

Type                                      | Number of institutions | Errors per page
Doctoral/Research Universities--Extensive | 143                    | 15.26
Masters Colleges and Universities I       | 454                    | 19.1
Baccalaureate Colleges--Liberal Arts      | 205                    | 20.78
Associates Colleges                       | 1,002                  | 21.52
Baccalaureate Colleges--General           | 275                    | 24.36
Medical schools and medical centers       | 36                     | 27.84
Schools of law                            | 18                     | 30.66

When comparing Carnegie classifications, it is important to understand that not all institutions had a classification in the IPED List; the data only reflects those with a specified classification. Doctoral/Research Universities were the most accessible classification, with 15 errors per page. It is interesting that the two least accessible classifications were medical schools, with 28 errors per page, and schools of law, with over 30 errors per page.

UCEDD vs. non

Results comparing UCEDD vs non-UCEDD institutions

Type      | Number of institutions | Errors per page
UCEDD     | 64                     | 13.8
Non-UCEDD | 3,768                  | 24.3

UCEDD stands for University Centers for Excellence in Developmental Disabilities Education. The vision of the UCEDD program is "a nation in which all Americans, including Americans with disabilities, participate fully in their communities." There is at least one in every US state and territory, each housed inside a host university. The host UCEDD institutions have fewer than 14 errors per page, which is about 10 fewer errors per page than the average institution.

ARIA

WAVE automatically detects the presence of ARIA, and we were able to see this per institution. If you are not sure what Accessible Rich Internet Applications (ARIA) is or want to learn more about it, we recommend the WebAIM article on ARIA.

ARIA in the wild

In our analysis we found an institution that averaged 5,552 ARIA attributes per page, or 533,002 across its 96-page sample. Another institution had more ARIA attributes than page elements, with 4,907 per page. Both were large universities whose admissions pages were included in the sample.

This is impressive and took some real effort to accomplish, but it is a reminder that ARIA doesn't improve accessibility unless done correctly. Without knowing anything else, we can safely assume that having 4,907 ARIA attributes on a university admissions page is not correct. Of these, 1,773 are tabindex attributes with values of 0 or less, and over 1,000 are ARIA popup and ARIA expanded attributes.

These are a conscious decision at the site developer, template creator or CMS creator level. The good news is that with a little education and minimal effort they could simply be removed, significantly improving the accessibility of this website.

Relationships between ARIA and detectable errors

In our analysis we found a slight correlation between increased use of ARIA and detectable errors. When we changed the analysis to take into account page density (the number of elements on a page), this correlation reversed. It would make sense that a more complex page has more need for ARIA and more elements that could have errors. It is also important to understand that in the example above, with 4,907 ARIA attributes per page, an automated tool found fewer detectable errors than average, yet there were many impactful accessibility issues that were not detectable.

As a comparison the WebAIM Million project found that as ARIA increased detectable errors tended to increase as well.

Structure vs overall errors correlation?

We looked at whether pages that used HTML regions, or that lacked an h1 element, were more likely to have detectable errors, but didn't find any correlation. Even so, semantic structure by itself makes a big difference to screen reader users navigating a page.

Interesting and random tidbits

In the evaluation there were 6,964 skip links that didn't have a target, meaning someone went through the effort to add a skip link but then either never tested it or it was broken by a template update.
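
A broken skip link is easy to catch with a simple check. The sketch below, using Python's standard `html.parser`, lists same-page links (`href="#..."`) whose target id does not exist in the page. It is an assumption-laden illustration rather than WAVE's logic: it checks every fragment link, not only skip links, and the class name is hypothetical.

```python
from html.parser import HTMLParser


class FragmentLinkChecker(HTMLParser):
    """Find same-page links that point at an id missing from the page."""

    def __init__(self):
        super().__init__()
        self.fragment_links = []  # fragment ids taken from href="#..."
        self.anchors = set()      # every id (and <a name>) seen in the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.anchors.add(attrs["name"])
            href = attrs.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self.fragment_links.append(href[1:])

    def broken_targets(self):
        """Call after feeding the whole page, since targets follow the links."""
        return [t for t in self.fragment_links if t not in self.anchors]


checker = FragmentLinkChecker()
checker.feed(
    '<a href="#main">Skip to main content</a>'
    '<a href="#nav">Skip to navigation</a>'
    '<nav id="nav"></nav>'
    '<div id="content"></div>'
)
print(checker.broken_targets())
# prints ['main']
```

Running a check like this after each template update would catch the second failure mode described above.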

There were 170 marquee tags still around.

We found 335,183 layout tables and only 53,715 data tables. A table is classified as a data table if it is properly structured with proper heading rows. Realistically, we suspect that there are relatively few true layout tables, but many tables that contain data without heading rows.

We found 655,992 links to PDFs or 2 per page. These may or may not be accessible, but as we know from the 2019 WebAIM screen reader survey, 74% of screen reader users are either Very Likely or Somewhat Likely to encounter significant issues accessing a PDF document.

Conclusion

While there is still significant work to be done to ensure Higher Education websites are accessible to everyone, we are encouraged by how much better Higher Ed results are compared to non-Higher Ed websites. We are hopeful that this project and other endeavors by the Higher Education web accessibility community can help bring more awareness to web accessibility and are optimistic that over time we will see additional improvement.

On this website, any institution can view its rankings in any of the categories of the project, as well as request access to its study account to view its full results page by page and rescan its sample to see its improvement. Since we first presented these results at Accessing Higher Ground in November 2019 and began allowing higher education institutions access to their 4k accounts, we have been very happy to see many institutions making substantial improvements to their websites within even a few weeks.

One of the initial findings from this project is that smaller institutions, with presumably less budget, tend to have more detectable accessibility errors. For that reason we are also offering a free Pope Tech account of up to 250 pages to any institution with a top-level .edu domain (domain.edu). This could be a starting point for high-traffic parts of your website, or potentially complete monitoring for some smaller institutions.

There are countless ways this data can be analyzed and explored. We also see potential for additional studies and further analysis against the ongoing WebAIM Million Project. We are open to feedback on ways to make this more impactful for the Higher Education community. If you have questions about this project or feedback, please contact us.

Look up your institution's results