Putting HTML on the web is a lonely activity.
So what is the simplest way to get feedback on your own and your friends' websites?
The answer I came up with is a PWA that I call local.html.
The app is a feed with a built-in crawler that runs entirely in the browser.
You can explore the project here: Live demo /
Source code.
The user seeds the app with URLs. The crawler then follows those URLs recursively,
looking for links that carry a rel=friend attribute.
When the app encounters new content at a URL, it saves the content and
displays it in the feed.
All discovered URLs are crawled periodically to detect new content.
Each user’s graph is local and may differ depending on their seed URLs.
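The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `extractFriendLinks`, `crawl`, `FetchHtml` and `maxPages` are hypothetical, the regex-based parsing is a simplification (in a browser, `DOMParser` would be the more robust choice), and only `<a>` elements are considered.

```typescript
// Hypothetical sketch: all names and the regex-based parsing are assumptions.
type FetchHtml = (url: string) => Promise<string | null>;

// Read one attribute from a raw tag string (quoted or unquoted value).
function getAttr(tag: string, name: string): string | null {
  const re = new RegExp(
    "\\s" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
    "i"
  );
  const m = re.exec(tag);
  return m ? m[1] || m[2] || m[3] || "" : null;
}

// Return absolute URLs of <a> tags whose rel attribute contains "friend".
function extractFriendLinks(html: string, baseUrl: string): string[] {
  const links: string[] = [];
  const anchorTag = /<a\s[^>]*>/gi;
  let match: RegExpExecArray | null;
  while ((match = anchorTag.exec(html)) !== null) {
    const rel = getAttr(match[0], "rel");
    const href = getAttr(match[0], "href");
    if (rel === null || href === null) continue;
    if (rel.toLowerCase().split(/\s+/).indexOf("friend") === -1) continue;
    try {
      links.push(new URL(href, baseUrl).href); // resolve relative links
    } catch {
      // Ignore hrefs that cannot be resolved to a URL.
    }
  }
  return links;
}

// Breadth-first crawl from the seed URLs, following only rel=friend links.
async function crawl(
  seeds: string[],
  fetchHtml: FetchHtml,
  maxPages = 100
): Promise<Map<string, string>> {
  const seen = new Set<string>(seeds);
  const queue = seeds.slice();
  const pages = new Map<string, string>();
  while (queue.length > 0 && pages.size < maxPages) {
    const url = queue.shift() as string;
    const html = await fetchHtml(url);
    if (html === null) continue; // unreachable or blocked target
    pages.set(url, html);
    for (const link of extractFriendLinks(html, url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return pages;
}
```

Because `fetchHtml` is passed in, the same loop works with a direct `fetch`, a proxy, or a stub. The `seen` set keeps cycles in the friend graph from causing repeated visits within one crawl.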
Conceptually, this is the inverse of Webmention. Rather than notifying a
target when you link to it, you declare relationships explicitly with
rel=friend, and readers discover the graph by crawling (pull) rather than
through push notifications.
Crawling the web from the browser is challenging for several reasons, covered in the sections below.
Same-Origin Policy
Cross-origin crawling is constrained by the Same-Origin Policy.
Targets that set permissive CORS headers (for example Access-Control-Allow-Origin: *) can be fetched directly.
For targets that do not, the PWA can be configured to route requests through a proxy.
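One way to express the direct-then-proxy behaviour is sketched below. It assumes a proxy that accepts the target as a `url` query parameter. That URL scheme and the names `proxiedUrl`, `tryFetch` and `fetchWithFallback` are hypothetical, not taken from the project.

```typescript
// Minimal response shape so the sketch does not depend on DOM typings.
interface PageResponse {
  ok: boolean;
  text(): Promise<string>;
}
type Fetcher = (url: string) => Promise<PageResponse>;

// Hypothetical proxy URL scheme: <proxyBase>?url=<encoded target>.
function proxiedUrl(target: string, proxyBase: string): string {
  return proxyBase + "?url=" + encodeURIComponent(target);
}

// Resolve to the body text, or null on a rejected request or non-OK status.
async function tryFetch(fetcher: Fetcher, url: string): Promise<string | null> {
  try {
    const res = await fetcher(url);
    return res.ok ? await res.text() : null;
  } catch {
    // In browsers, a CORS failure surfaces as a rejected promise.
    return null;
  }
}

// Try the target directly; fall back to the proxy only if one is configured.
async function fetchWithFallback(
  target: string,
  fetcher: Fetcher,
  proxyBase: string | null
): Promise<string | null> {
  const direct = await tryFetch(fetcher, target);
  if (direct !== null || proxyBase === null) return direct;
  return tryFetch(fetcher, proxiedUrl(target, proxyBase));
}
```

In a browser, `fetcher` could simply be `(url) => fetch(url)`. This simplified version also retries through the proxy after an HTTP error, which a real implementation might prefer to treat differently.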
JavaScript
<script> elements are stripped from the HTML before being
inserted into the feed.
All <iframe> elements get a sandbox attribute.
HTML still offers many other ways to run JavaScript, such as inline event
handler attributes and javascript: URLs.
Security therefore depends on the PWA being served from an origin that
sets a restrictive Content-Security-Policy header.
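As an illustration of what "restrictive" could mean here, the sketch below assembles a Content-Security-Policy header value. The directives are standard CSP, but the specific policy is an assumption (the project's real header is not given in the text), and `buildCsp` and `feedPolicy` are hypothetical names.

```typescript
type CspDirectives = Record<string, string[]>;

// Serialize directives into a Content-Security-Policy header value.
function buildCsp(directives: CspDirectives): string {
  return Object.keys(directives)
    .map((name) => [name].concat(directives[name]).join(" "))
    .join("; ");
}

// Illustrative policy only; an assumption, not the project's actual header.
const feedPolicy: CspDirectives = {
  // Without 'unsafe-inline', inline scripts, inline event handler
  // attributes and javascript: URLs are all refused.
  "script-src": ["'self'"],
  "object-src": ["'none'"],
  "base-uri": ["'none'"],
};

console.log(buildCsp(feedPolicy));
// prints: script-src 'self'; object-src 'none'; base-uri 'none'
```

With a policy along these lines, script that slips past the HTML stripping step is still refused by the browser, which is why the header does most of the work.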
CSS
Inline <style> blocks are rewritten to display properly
inside a web component shadow root:
body and :root selectors are rewritten to
:host, and viewport-relative units are rewritten to container-relative
units.
Inline style attributes are left unchanged. External stylesheets are stripped.
The CSS that remains is still allowed to load external resources.
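A rough sketch of the two rewrites follows. It is deliberately naive and regex-based; a real implementation would more likely walk a parsed stylesheet. The function name `rewriteCssForShadowRoot` is hypothetical, and only the simple cases (a bare `body` or `:root` selector, and the `vw`, `vh`, `vmin`, `vmax` units) are handled.

```typescript
// Naive sketch: rewrite page-level CSS so it can live inside a shadow root.
function rewriteCssForShadowRoot(css: string): string {
  // body / :root at the start of a selector become :host. Compound
  // selectors such as body.dark are left untouched by this simple pattern.
  const selectorsRewritten = css.replace(
    /(^|[\s,}])(?:body|:root)(?=[\s,{])/g,
    "$1:host"
  );
  // Viewport units map onto container query units: vw -> cqw, vh -> cqh,
  // vmin -> cqmin, vmax -> cqmax. The lookbehind skips digits that are
  // part of an identifier such as a class name.
  return selectorsRewritten.replace(
    /(?<![\w-])(-?\d*\.?\d+)(vw|vh|vmin|vmax)\b/g,
    (_match: string, value: string, unit: string) => value + "cq" + unit.slice(1)
  );
}

const input = ":root{--gap:2vw}\nbody { min-height: 100vh; margin: 0 -1.5vmin }";
console.log(rewriteCssForShadowRoot(input));
// prints:
// :host{--gap:2cqw}
// :host { min-height: 100cqh; margin: 0 -1.5cqmin }
```

Container query units only resolve against the feed item if an ancestor, presumably the host element, is declared as a query container via `container-type`. Otherwise they fall back to the viewport.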
Storage
Resources from third-party sites must be cached to persist in the feed.
Without versioning, an update to a mutable remote resource (e.g. a media file)
would change every feed item that references it, destroying history.
Links to external resources are therefore rewritten to point at versioned snapshots,
so older feed items keep showing the media they were published with.
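The versioning idea can be illustrated with a content-addressed store: each snapshot key combines the resource URL with a hash of its bytes, so a changed upstream file gets a new key instead of overwriting the old one. This is a sketch under assumptions. `SnapshotStore`, the `url@hash` key format and the in-memory `Map` are hypothetical, and the FNV-1a hash is used only to keep the example synchronous (a real implementation would more likely use SHA-256 and persistent browser storage).

```typescript
// 32-bit FNV-1a hash, hex-encoded. Chosen here only for brevity.
function fnv1a(bytes: Uint8Array): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical in-memory stand-in for the PWA's persistent cache.
class SnapshotStore {
  private blobs = new Map<string, Uint8Array>();

  // Store the bytes under a versioned key and return that key.
  // Feed items reference the key rather than the mutable URL.
  snapshot(url: string, bytes: Uint8Array): string {
    const key = url + "@" + fnv1a(bytes);
    if (!this.blobs.has(key)) this.blobs.set(key, bytes.slice());
    return key;
  }

  get(key: string): Uint8Array | undefined {
    return this.blobs.get(key);
  }

  get size(): number {
    return this.blobs.size;
  }
}

const store = new SnapshotStore();
const first = store.snapshot("https://a.example/cat.png", new Uint8Array([1, 2, 3]));
const second = store.snapshot("https://a.example/cat.png", new Uint8Array([1, 2, 4]));
console.log(first !== second); // prints: true
```

Identical bytes fetched again map to the same key, so unchanged resources are stored once no matter how often they are crawled.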
Even with the best of efforts, storage may be evicted unpredictably on some
platforms, so the feed is not guaranteed to persist.
Persistence
The feed reflects the current state of the discovered graph, not a complete
historical timeline.
If a URL is updated multiple times between crawl intervals, intermediate
states are not delivered. I consider this a feature of the web.
Privacy
The PWA does not aggregate or transmit user social graphs. However, target
sites can still observe the requests the PWA makes on a user's behalf.
There is no access control mechanism, so all content in the social network
is publicly accessible.
The PWA will attempt to load external content like images, fonts or iframes
from any origin.
Linked sites may set security headers that affect how their content can be
embedded. Since this is not always an enforceable requirement, the user may
choose to configure the PWA to use a CORS proxy.
Any CORS proxy that the PWA is configured to use is a privacy and security
risk because it terminates TLS and rewrites upstream responses.
Future Work
- Direct posting of updates to a URL from the PWA
- Paging/lazy loading feed items from database
- Access-controlled relationship links
- Visualization of the social graph
- Search/filter feed based on extracted structured metadata (e.g. JSON-LD)