Common Crawl Foundation

Enterprise
non-profit
Verified
https://commoncrawl.org

AI & ML interests

Crawled data and metadata

Recent Activity

  • malteos updated a Space 1 day ago: commoncrawl/cc-citations
  • greglindahl published a Space 1 day ago: commoncrawl/cc-citations
  • jenglish-cc updated a dataset 4 days ago: commoncrawl/gneissweb-annotation-testing-v1

Team members: malteos, Pedro Ortiz Suarez, Laurie Burchell, Luca, Sebastian Nagel, d, Jason Grey, Hande Celikkanat, Thom Vaughan, Paul Lazar, Greg Lindahl, Ford H, Jen English, Thijs Dalhuijsen
Organization Card

Common Crawl

Welcome to the Common Crawl Foundation's Hugging Face page!

We aim to provide metadata and experimental versions of our latest data products here.

Useful Links

  • Common Crawl's official website
  • Our existing statistics webpages (GitHub repo)
  • AWS infrastructure status page

Datasets

Explore our datasets hosted on Hugging Face:

  • Common Crawl Citations
  • Common Crawl Citations, Annotated
  • Common Crawl Statistics
  • EOT 2024 Host-Level Logs (only available to EOT collaborators)

We look forward to supporting the research and development community with these resources.

Spaces (2)

  • cc-citations: Scientific articles using or citing Common Crawl data (Running, 1 day ago)

Models (0)

None public yet

Datasets (5)

  • commoncrawl/gneissweb-annotation-testing-v1 (Viewer • Updated 4 days ago • 12.2B • 995)
  • commoncrawl/statistics (Viewer • Updated 18 days ago • 600k • 446 • 25)
  • commoncrawl/web-graph-testing-v1 (Updated 22 days ago • 15)
  • commoncrawl/citations (Viewer • Updated Oct 16 • 9.18k • 227 • 1)
  • commoncrawl/eot2024_hostlevel_logs (Viewer • Updated Oct 9, 2024 • 271k • 10 • 1)
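
The dataset repository ids listed above can be turned into links to their Hub pages. The following is a minimal sketch, not an official Common Crawl tool: it assumes the usual `https://huggingface.co/datasets/<owner>/<name>` URL layout, and the names `DATASET_REPO_IDS`, `HUB_BASE_URL`, and `dataset_url` are hypothetical helpers introduced only for illustration.

```python
# Repository ids copied from the dataset listing on this page.
DATASET_REPO_IDS = [
    "commoncrawl/gneissweb-annotation-testing-v1",
    "commoncrawl/statistics",
    "commoncrawl/web-graph-testing-v1",
    "commoncrawl/citations",
    "commoncrawl/eot2024_hostlevel_logs",
]

# Assumption: datasets are served from the public Hugging Face Hub host.
HUB_BASE_URL = "https://huggingface.co"


def dataset_url(repo_id: str) -> str:
    """Return the Hub page URL for a dataset repo id of the form 'owner/name'."""
    owner, sep, name = repo_id.partition("/")
    # Reject ids that are missing either part or contain extra slashes.
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo_id!r}")
    return f"{HUB_BASE_URL}/datasets/{owner}/{name}"


if __name__ == "__main__":
    for repo_id in DATASET_REPO_IDS:
        print(dataset_url(repo_id))
```

Note that the page describes the EOT 2024 host-level logs as available only to EOT collaborators, so that link may not be accessible without the appropriate permissions.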