{"version":"https://jsonfeed.org/version/1","title":"Aaron Gustafson: Content tagged URLs","description":"The latest 20 posts and links tagged URLs.","home_page_url":"https://www.aaron-gustafson.com","feed_url":"https://www.aaron-gustafson.com/feeds/urls.json","author":{"name":"Aaron Gustafson","url":"https://www.aaron-gustafson.com"},"icon":"https://www.aaron-gustafson.com/i/og-logo.png","favicon":"https://www.aaron-gustafson.com/favicon.png","expired": false,"items":[{"id":"https://www.aaron-gustafson.com/notebook/salvaging-linkrot-with-the-wayback-machine/","title":"✍🏻 Salvaging linkrot with the Wayback Machine","summary":"While making some updates to the site, I did a 404 scan of my link blog and the results were… less than awesome. So I decided to work some Eleventy magic to recover from them.","content_html":"
While making some updates to the site, I did a 404 scan of my link blog and the results were… less than awesome. So I decided to work some Eleventy magic to recover from them.
I make ample use of Eleventy’s global data files, but 404s didn’t feel like something I needed to have as part of the data cascade. Instead, I’m logging them to a YAML file in my ./_cache folder. For simplicity, they get logged like this:
https://path.to/original/page/that-is-404ing/: true
I chose YAML as it’s about as bare-bones as you can get when it comes to file formats and is pretty easy to work with in the context of Eleventy.
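The scan itself happens outside of Eleventy and isn’t shown in this post, but a minimal sketch of that logging step might look like this (assuming Node 18+ for the global fetch; log404s and the list of URLs are hypothetical):
const fs = require('fs');
const yaml = require('js-yaml');

// Check each outbound link and record any 404s in the YAML cache.
async function log404s(urls) {
  const failures = {};
  for (const url of urls) {
    try {
      // A HEAD request is usually enough to learn the status code
      // without downloading the whole page.
      const response = await fetch(url, { method: 'HEAD' });
      if (response.status === 404) {
        failures[url] = true;
      }
    } catch (error) {
      // Treat unreachable hosts as broken links, too.
      failures[url] = true;
    }
  }
  fs.writeFileSync('_cache/404s.yml', yaml.dump(failures));
}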
If you’re not familiar, Eleventy allows you to create directory-level data files that can be used to augment file-level data. I was originally using it to define the layout and permalink front matter variables for all the links using the JSON option, but as a JavaScript file, directory-level data becomes even more powerful.
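That JSON option can only hold static values. Assuming the links live in a folder named links (my assumption; Eleventy derives the data file’s name from the directory), the old version would have been a links/links.json file along these lines:
{
  \"layout\": \"layouts/link.html\",
  \"permalink\": \"/notebook/{{page.filePathStem}}/\"
}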
Setting up your data file is relatively straightforward using module.exports:
module.exports = {
  layout: \"layouts/link.html\",
  permalink: \"/notebook/{{page.filePathStem}}/\",
  eleventyComputed: {
    custom_property: (data) => {
      return some_value_based_on_data;
    }
  }
};
Here I’m defining two static values (layout and permalink) and a computed value (the hypothetical custom_property).
As I mentioned, the 404 logging happens separately and results in updates to _cache/404s.yml. To make use of all this in the Eleventy data file, I need to set up a few things at the top of the file:
const fs = require('fs');
const yaml = require('js-yaml');
// Read the cache as UTF-8 and fall back to an empty object so the
// lookups below don’t fail when the file is empty.
const cached404s = yaml.load(fs.readFileSync('_cache/404s.yml', 'utf8')) || {};
Here I’m bringing in Node’s File System module and JS-YAML, then using them to load the YAML file into memory as cached404s.
Next up is defining a helper function to search cached404s for a match:
function is404ing(url) {
  return ( url in cached404s );
}
This function takes the URL as an argument and returns true or false. Making use of it in the eleventyComputed section is straightforward:
module.exports = {
  layout: \"layouts/link.html\",
  permalink: \"/notebook/{{page.filePathStem}}/\",
  eleventyComputed: {
    is_404: (data) => {
      return is404ing(data.ref_url);
    }
  }
};
In my case, ref_url is the front matter field storing the URL I’m linking to from my link blog, so I pass it to is404ing() and return the result as is_404.
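For context, each link post in that folder carries the target URL in its front matter. A hypothetical example (only ref_url and date matter to the code in this post; the rest is illustrative):
---
title: An interesting article
ref_url: https://example.com/an-interesting-article/
date: 2022-08-31
---

My commentary on the link goes here.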
The next thing I want to do is generate a link that has a good chance of working for my readers. Thankfully, the Wayback Machine has a predictable URL structure for entries, and it’s pretty good about handling redirects to the most temporally-proximate snapshot when you give it a date to work from. Knowing that, I set up another helper function:
function archived(data) {
  let archive_url = 'https://web.archive.org/web/{{DATE}}/{{URL}}';
  // getUTCMonth() is zero-based, so add 1, then zero-pad.
  let month = data.date.getUTCMonth() + 1;
  month = month < 10 ? \"0\" + month : month;
  // getUTCDate() is the day of the month (getDay() would return the
  // day of the week).
  let day = data.date.getUTCDate();
  day = day < 10 ? \"0\" + day : day;
  archive_url = archive_url
    .replace('{{DATE}}', `${data.date.getUTCFullYear()}${month}${day}`)
    .replace('{{URL}}', data.ref_url);
  return archive_url;
}
Note: I know this isn’t the most elegant or efficient code; I wanted to show step-by-step what’s happening here.
This function takes the data object as an argument and composes a URL that points to a snapshot of the given page (data.ref_url) at the time I saved the link (data.date). The data.date value is already a JavaScript Date, so it’s pretty easy to turn it into the format the Wayback Machine expects (YYYYMMDD). In the end, the function returns a URL that looks something like this:
https://web.archive.org/web/20150102/http://andregarzia.com/posts/en/whatsappdoesntunderstandtheweb/
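If you’d rather trade the step-by-step clarity for brevity, the same YYYYMMDD stamp can be pulled from the date’s ISO string; a functionally equivalent sketch:
function archived(data) {
  // toISOString() yields YYYY-MM-DDTHH:MM:SS.sssZ; keep the date
  // portion and drop the dashes to get YYYYMMDD.
  const stamp = data.date.toISOString().slice(0, 10).replace(/-/g, '');
  return `https://web.archive.org/web/${stamp}/${data.ref_url}`;
}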
With that helper in place, I can make use of it within eleventyComputed:
module.exports = {
  layout: \"layouts/link.html\",
  permalink: \"/notebook/{{page.filePathStem}}/\",
  eleventyComputed: {
    is_404: (data) => {
      return is404ing(data.ref_url);
    },
    archived: (data) => {
      return is404ing(data.ref_url) ? archived(data) : false;
    }
  }
};
Now every link in my link blog will have an is_404 value that is true or false, and an archived value that is either a valid Wayback Machine URL (if the page is 404-ing) or false.
I use Nunjucks for most of my site’s templating, but you can make use of these computed properties in any templating language Eleventy supports. Knowing if a linked URL is 404-ing lets me handle broken links in a few different ways in my templates, the most important being a note that points readers to an archived copy. I am only going to share the code for that final bit, as it should give you enough of a sense of how you can use these properties in the other contexts too.
{% if is_404 %}
<p>This link is 404-ing{% if archived %}, but
<a rel=\"bookmark\" href=\"{{ archived }}\">you can view an
archived version on the Wayback Machine</a>{% endif %}.
</p>
{% endif %}
Here you can see I am injecting a short note into the markup when the entry is 404-ing. Within that note, I flag the link’s status; then, when an archive exists, I inject some additional text pointing to the Wayback Machine’s copy of the page. It’s worth noting that I am being overly cautious here and only injecting the link if archived is truthy. This will ensure that the link won’t be shown if something fails in my code or I change how I am implementing the archived property.
Relying on an unverified URL, even one at the Wayback Machine, is risky, but so far this approach seems to be working. If you’ve got a link blog suffering from link rot, you might consider setting up something similar. Hopefully this will help jumpstart that project for you.
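If you want to go a step further than I did, the Internet Archive also offers an Availability API you could query at build time to confirm a snapshot exists before linking to it. A minimal sketch (the function name is mine; again assuming Node 18+ for fetch, and check the API docs for rate limits):
// Ask the Wayback Machine for the snapshot closest to a YYYYMMDD
// timestamp; return its URL, or false if nothing is archived.
async function verifiedArchiveUrl(url, timestamp) {
  const api = `https://archive.org/wayback/available?url=${encodeURIComponent(url)}&timestamp=${timestamp}`;
  const response = await fetch(api);
  const json = await response.json();
  const closest = json.archived_snapshots && json.archived_snapshots.closest;
  return closest && closest.available ? closest.url : false;
}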
","url":"https://www.aaron-gustafson.com/notebook/salvaging-linkrot-with-the-wayback-machine/","tags":["this site","URLs"],"image":"https://www.aaron-gustafson.com/i/posts/2022-08-31/hero.jpg","date_published":"2022-08-31T21:37:31Z"},{"id":"https://www.aaron-gustafson.com/notebook/locking-down-your-github-hosted-domains/","title":"✍🏻 Locking down your GitHub-hosted Domains","summary":"The other day someone claimed a hostname on a domain I own and it took me a while to track down how. Turns out it was via GitHub pages.","content_html":"The other day someone claimed a hostname on a domain I own and it took me a while to track down how. After a lot of digging around, trying to figure out how the hijack was accomplished, it turns out it was via GitHub Pages.
When you set up a custom domain with GitHub Pages, you have to point your domain at GitHub’s servers. There are a bunch of ways to do this, but if you use an A record, you need to be careful with your DNS settings. The site in question had a wildcard hostname (*) A record pointed at GitHub’s servers. At the time I’d set it up, that was the recommendation if you wanted all traffic to go to the same place.
Fast forward a few years and it’s become a known exploit of GitHub Pages: when wildcard hostnames are in play, anyone can add a CNAME file to their repository and claim ownership of a hostname belonging to that domain. GitHub even warns you not to do this anymore, but I hadn’t checked the docs in years. In my particular case, it was an archived domain that I don’t really use anymore, but I wouldn’t have been aware of the DNS hijack if the attacker hadn’t taken the step of claiming the domain on Google’s Webmaster Central.
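For context, the CNAME file at the root of a GitHub Pages repository is just a single line naming the hostname being claimed (example.com stands in for the vulnerable domain):
stolen.example.com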
Thankfully the fix was simple: remove the wildcard A record and point the apex domain at GitHub’s IP addresses.
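As a sketch, the corrected zone looks something like this. The four A records below are the GitHub Pages apex addresses GitHub documents; verify them against the current docs before copying:
; remove the dangerous catch-all record:
;   *   A   <GitHub Pages IP>
; and point only the apex at GitHub Pages:
@   A   185.199.108.153
@   A   185.199.109.153
@   A   185.199.110.153
@   A   185.199.111.153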
If you use GitHub Pages to host sites on any of your own domains, I highly recommend auditing your DNS records to ensure this doesn’t happen to you. You can also use domain verification for GitHub Pages and organizations to further protect yourself.
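One quick audit (my suggestion, not from the original post): query a hostname you know you never configured. If it resolves, a wildcard record is answering for it.
# A made-up hostname should return nothing on a safely-configured
# domain; any answer here means a wildcard record is in play.
# (example.com is a stand-in for your domain.)
dig +short this-host-should-not-exist.example.com A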
","url":"https://www.aaron-gustafson.com/notebook/locking-down-your-github-hosted-domains/","tags":["hazards","URLs","the web"],"image":"https://www.aaron-gustafson.com/i/posts/2022-08-11/hero.jpg","date_published":"2022-08-11T20:15:48Z"},{"id":"https://www.aaron-gustafson.com/notebook/links/ios-11-safari-google-amp-sharing-url-scheme/","title":"🔗 iOS 11 Safari will turn Google AMP links back into regular ones when sharing","content_html":"https://twitter.com/cramforce/status/900478709215281152
This is great to see! I think link[rel=\"canonical\"] is not used often enough. I’d love to see all sharing protocols adopt this approach for things like cross-posts, m-dot sites, and more.
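For anyone unfamiliar, the hint in question is a single element in the page’s head pointing back at the original URL (example.com is hypothetical):
<link rel=\"canonical\" href=\"https://example.com/original-article/\">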
I think we can all agree, link rot is a problem. A 2014 study by Harvard Law School determined that roughly 50% of the URLs referenced in U.S. Supreme Court opinions no longer work. That’s terrifying.
When I was mid-way through writing the Second Edition of Adaptive Web Design, I realized that it was pretty likely some of the links I was referencing might disappear over the years. Little did I know, some of them would disappear while I was writing the book!
The Internet Archive’s Wayback Machine is pretty good, but it doesn’t archive everything, and I often find captured pages end up broken, especially if they rely heavily on JavaScript; images often go missing as well. I wanted to make sure that when you pick up the book a year from now, or even 10 years from now, the links will still work.
I evaluated a few options for creating a permanent archive of each and every link in the book (there are over 200), but then it dawned on me that Perma.cc might be the perfect answer.
Perma.cc was created by the Harvard Library Innovation Lab in reaction to the paper I mentioned earlier. It is a distributed archive of URLs for scholarly and legal documents, supported not only by Harvard but by over 90 (and counting!) libraries distributed all over the world. It’s also open source. Each URL is preserved as a live view, an archived view, and a screen capture taken when the link is added. Archived URLs are kept for a minimum of 2 years, but may be “vested” into the permanent archive by a member organization.
I had contributed some CSS to the project a while back, so I reached out to my contacts to see if they might be interested in vesting all of the links for the book. Turns out they were big fans of the First Edition and enthusiastically offered their support.
Converting all of the links took time (and a lot of double- and triple-checking), but the result is that every article, blog post, and web page that I referenced in the book will remain accessible to you in perpetuity. I think that’s pretty awesome. And, as an added bonus, since Perma.cc creates unique URLs that are relatively short, those of you who read it in print won’t have to re-type the often incredibly lengthy original URLs.
I can’t thank Matt Phillips, Adam Ziegler, Jack Cushman, and everyone else at the Harvard Library Innovation Lab enough for creating Perma.cc and for offering their service to my readers. You all are amazing!
","url":"https://www.aaron-gustafson.com/notebook/avoiding-linkrot-in-print-with-the-help-of-perma-dot-cc/","tags":["the future","the web","URLs","user experience","writing","hazards"],"date_published":"2015-12-02T21:03:35Z"},{"id":"https://www.aaron-gustafson.com/notebook/url-inception/","title":"✍🏻 URL Inception","summary":"The awesomeness of this URL is almost indescribable.","content_html":"It’s like a snake eating its own tail:
Wonder what’s going on? Well, Facebook’s mobile site (m.facebook.com/ClydesOnMain) is using the refsrc GET parameter to track the URL I came from (www.facebook.com/ClydesOnMain#!/ClydesOnMain), a single-page “web app” (hence the hash-bang, #!) that was itself tracking the original referral page (www.facebook.com/ClydesOnMain).
I hope they got all that.
","url":"https://www.aaron-gustafson.com/notebook/url-inception/","tags":["web design","URLs"],"date_published":"2015-05-18T18:38:29Z"}]}