Impacts of Artificial Intelligence: A Historical Parallel

By now, you've probably heard that a new wave of Artificial Intelligence (AI) is emerging, possessing great power and the potential to change virtually everything. Opinions vary widely, with some predicting a future of either absolute horror or unending utopia. The impact of AI is often likened to the invention of nuclear technology, which gave rise to both atomic bombs and nuclear power plants. I believe that while AI's impact will likely fall between these extremes, the atomic bomb analogy is insightful for another reason.

Scientists use a method called radiocarbon dating to estimate the age of objects containing organic material. It relies on carbon-14, a radioactive isotope of carbon. Living organisms take up carbon-14 from the atmosphere while they are alive. Once they die, the carbon-14 they contain decays at a known rate, with a half-life of about 5,730 years. By measuring how much carbon-14 remains, we can estimate how long ago the organism died. The less carbon-14, the older it is.
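The "less carbon-14, older object" rule follows from exponential decay: after t years, the fraction of carbon-14 remaining is (1/2)^(t / t_half). Below is a minimal Python sketch of that relationship and its inverse. It assumes a half-life of about 5,730 years and ignores the calibration curves and reporting conventions that real labs apply.

```python
import math

# Assumed half-life of carbon-14 in years (the commonly cited ~5,730-year value).
C14_HALF_LIFE_YEARS = 5730.0


def fraction_remaining(age_years: float) -> float:
    """Fraction of the original carbon-14 left after `age_years` of decay."""
    return 0.5 ** (age_years / C14_HALF_LIFE_YEARS)


def estimate_age(fraction: float) -> float:
    """Invert the decay law: years elapsed, given the fraction of C-14 remaining.

    Solves N/N0 = (1/2)^(t / t_half) for t. No calibration is applied.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -C14_HALF_LIFE_YEARS * math.log2(fraction)


if __name__ == "__main__":
    # One half-life: half of the carbon-14 is gone.
    print(estimate_age(0.5))   # 5730.0
    # Two half-lives: a quarter remains.
    print(estimate_age(0.25))  # 11460.0
```

Each additional half-life halves what is left, which is why very old samples retain so little carbon-14 that they become hard to measure.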

However, all of that changed in the mid-20th century. Nuclear weapons were first tested and used in 1945, and the atmospheric testing that followed through the 1950s and early 1960s nearly doubled the amount of carbon-14 in the atmosphere. The half-life of carbon-14 didn't change, but the starting level did. As a result, anything from after that period has to be dated against a different baseline than material from before it. That's how big of a mark it left.

Similarly, the internet, essentially a network of databases and datasets, was almost entirely human-generated until late 2022. That November, ChatGPT was released, demonstrating that machines could produce content often indistinguishable from human writing.

AI-generated content is now ubiquitous, reshaping search results and influencing how articles are written and edited. In the future, we may be able to pinpoint the moment when AI began contributing significantly to our global knowledge base, much as the rise in atmospheric carbon-14 marks the arrival of nuclear weapons.

One intriguing project I've been following is AgentSearch-V1. Available on Hugging Face, this dataset compiles content from reputable sources such as arXiv, Google Books, OpenWebMath, Stack Exchange, Wikipedia, and more. It represents a snapshot of pre-AI-dominated data, curated by humans in mid-2023.

Realizing this, I did what anyone would do: I attempted to download it. All of it. This turned out to be much more complicated than I expected. I wanted to use git so that I could run diffs against future versions. The dataset is just over 2.5TB in total, and it took several failed attempts before I got it right.
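Large files in a Hugging Face git repository are typically tracked with Git LFS. The git history therefore holds small pointer files that record each file's SHA-256 hash and size. One way to confirm that a multi-terabyte clone actually finished is to re-hash the downloaded files against those pointers. Below is a minimal Python sketch. It assumes you already have the pointer text for a file, for example from `git cat-file -p HEAD:<path>`, which prints the stored blob rather than the checked-out file. The function names are illustrative, not part of git or Git LFS.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one "key value" pair per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def sha256_of_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so multi-GB files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(data_file: Path, pointer_text: str) -> bool:
    """True if data_file has the size and SHA-256 recorded in the LFS pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_oid = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first: catches truncated or partial downloads instantly.
    if data_file.stat().st_size != int(fields["size"]):
        return False
    return sha256_of_file(data_file) == expected_oid
```

Checking the size before hashing matters at this scale: a truncated file is rejected without reading terabytes of data.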

So this 2.5TB human-curated dataset, perhaps the last one containing only human-authored content, is one I want to see preserved. I've reached out to the authors and am exploring packaging it as a torrent file to make it easier for others to access.
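Part of why a torrent suits a dataset this size is that the .torrent file records a SHA-1 hash for every fixed-size piece of the data. A client can then verify and re-fetch individual pieces instead of restarting a failed 2.5TB download. Below is a minimal Python sketch of that piece hashing. It follows the original (v1) BitTorrent scheme, where multiple files are treated as one concatenated stream. It omits bencoding, trackers, and everything else a real .torrent needs, and the function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def piece_hashes(files: Iterable[Path], piece_length: int) -> List[bytes]:
    """SHA-1 digest of each piece across the concatenated files (v1-style).

    Pieces may span file boundaries; the final piece may be shorter.
    """
    hashes: List[bytes] = []
    buffer = bytearray()
    for path in files:
        with path.open("rb") as f:
            # Read only as much as is needed to fill the current piece.
            while chunk := f.read(piece_length - len(buffer)):
                buffer.extend(chunk)
                if len(buffer) == piece_length:
                    hashes.append(hashlib.sha1(buffer).digest())
                    buffer.clear()
    if buffer:
        hashes.append(hashlib.sha1(buffer).digest())
    return hashes


def bad_pieces(files: Iterable[Path], piece_length: int, expected: List[bytes]) -> List[int]:
    """Indices of pieces whose hash differs from (or is missing in) `expected`."""
    actual = piece_hashes(files, piece_length)
    n = max(len(actual), len(expected))
    return [
        i for i in range(n)
        if i >= len(actual) or i >= len(expected) or actual[i] != expected[i]
    ]
```

In a real v1 torrent, the concatenated digests (`b"".join(piece_hashes(...))`) form the `pieces` field of the info dictionary. The BitTorrent specification (BEP 3) notes that the piece length is almost always a power of two.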

Of course, it could turn out that AI-generated content is actually better than what came before. Either way, I find this interesting and want to see what others do with it. That's what open source is all about, after all.

Update: You can now download (and seed) the AgentSearch-V1 dataset here:

Archive note: The original download for AgentSearch-V1.torrent is not currently mirrored in this archive.
