[LONG/PROPOSAL] UUID/GUIDs Within RSS and RDF
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Can I get some feedback on this? I have been trying to stay productive over the
last few days.
This is a proposal to add GUIDs to RSS in a secure and elegant manner.
Feedback appreciated. I wonder whether it is too late to change the RSS 2.0 GUID
mechanism, since it isn't secure! ;)
Sorry for the x-post but I felt this was relevant to all communities.
Permalink to this entry: http://www.peerfear.org/rss/permalink/1031620231.shtml
A number of people have suggested that RDF and RSS need to support GUIDs for
RDF/RSS triples.
RSS 0.94 has support for a GUID. Bill Kearney (Syndic8 creator/developer) has
mentioned that he needs GUIDs for RSS. Reptile needs GUIDs to refer to RSS
items. In short a lot of RDF/RSS software needs support for GUIDs.
I would like to propose a method to support GUIDs within RDF/RSS that is:
- - Secure. We don't have to trust a hostile peer to include the correct GUID.
These GUIDs are based on the SHA1 hash of the base content.
- - No producer obligations. RSS producers are not required to produce GUIDs but
they MAY include them if necessary.
- - Portable. These GUIDs are portable within XML attributes and are supported
within URIs and filenames.
- - Compatible with previous versions of RSS and future (though downlevel) RSS
versions (RSS 2.0).
- - Mutable Statements. Produces immutable base elements within a triple but the
triple as a whole is mutable (if you add additional non-base (optional)
elements).
- - Supports (distributed) reification. Basically just RDF/RSS statements about
other RDF/RSS statements. In practice this lets a third party add descriptions
to RDF that was produced without them.
* Explanation
Why do we need GUIDs when we have URIs?
Are these the same?
<item>
<title>foo</title>
<link>http://www.foo.com</link>
</item>
<item>
<title>foo</title>
<link>http://www.foo.com</link>
</item>
(answer - yes)
How about these:
<item>
<title>foo</title>
<link>http://www.foo.com</link>
</item>
<item>
<title>bar</title>
<link>http://www.foo.com</link>
</item>
(answer - no... they use different titles)
The second set is not identical and using the link as an identifier will fail
and result in unusual behavior when used within RSS aggregators and RDF agents.
One could say "hash the whole item"... that may work. One would need to use the
dsig canonicalization method.
The only problem here is what happens if the user adds metainfo:
<item>
<title>bar</title>
<link>http://www.foo.com</link>
<dc:description>
Break the hash by adding new data.
</dc:description>
</item>
This would break a hash if it were used as a GUID.
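To make the failure concrete, here is a minimal Python sketch that hashes the two serialized items from the examples above. The helper name whole_item_digest is hypothetical, and it hashes the raw string without any canonicalization, which is a simplification of the "hash the whole item" idea.

```python
import hashlib


def whole_item_digest(item_xml: str) -> str:
    """Hex SHA-1 of the complete serialized item (no canonicalization)."""
    return hashlib.sha1(item_xml.encode("utf-8")).hexdigest()


# The item from the example above, before and after metainfo is added.
ITEM = "<item><title>bar</title><link>http://www.foo.com</link></item>"
ITEM_WITH_META = (
    "<item><title>bar</title><link>http://www.foo.com</link>"
    "<dc:description>Break the hash by adding new data.</dc:description>"
    "</item>"
)

before = whole_item_digest(ITEM)
after = whole_item_digest(ITEM_WITH_META)

# The title and link are unchanged, yet the identifier is different.
print(before)
print(after)
print("same GUID:", before == after)
```

An aggregator keyed on this digest would treat the annotated item as a brand new entry.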
This means we need a GLOBALLY unique identifier, generated by the RSS source
when publishing. The only problem is that humans are terrible at managing these
GUIDs with the required amount of entropy:
"Humans are incapable of securely storing high-quality cryptographic keys, and
they have unacceptable speed and accuracy when performing cryptographic
operations. (They are also large, expensive to maintain, difficult to manage,
and they pollute the environment. It is astonishing that these devices continue
to be manufactured and deployed. But they are sufficiently pervasive that we
must design our protocols around their limitations.)" - Kaufman, Perlman, and
Speciner quoted in Anderson's 'Security Engineering'
I am using UUIDs in my peerfear.org feed [1]:
<item record:uuid="1030570664">
One solution for this problem is to use the SHA1 message digest algorithm (hash)
to compute the GUID.
SHA-1( channel/rdf:about + item/rdf:about + item/title +? item/description )
Of course this would need to happen on every RDF schema/vocabulary that exists
(only a problem for vocabulary developers). The channel/rdf:about element is
added to localize the GUID to a specific channel.
This would allow GUID generation within user agents but wouldn't place the
burden of management on the user.
This would require the base elements (title, link, description) of RDF triples
and RSS items to be immutable. If one changed an RSS item at runtime it would
essentially be a new item (as far as aggregators that are using the GUID are
concerned).
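A rough Python sketch of the SHA-1 formula above, as a user agent might compute it. The function name item_guid_digest is hypothetical, the plain string concatenation is my reading of the "+" in the formula, and the hex output is only for illustration. A real scheme would probably want an unambiguous separator between fields.

```python
import hashlib
from typing import Optional


def item_guid_digest(channel_about: str, item_about: str, title: str,
                     description: Optional[str] = None) -> str:
    """SHA-1 over channel/rdf:about + item/rdf:about + title (+ description)."""
    parts = [channel_about, item_about, title]
    if description is not None:
        # The description is optional ("+?" in the formula above).
        parts.append(description)
    # Assumption: fields are concatenated as-is, encoded as UTF-8.
    return hashlib.sha1("".join(parts).encode("utf-8")).hexdigest()


CHANNEL = "http://www.peerfear.org/rss/index.rss"

foo = item_guid_digest(CHANNEL, "http://www.foo.com", "foo")
foo_again = item_guid_digest(CHANNEL, "http://www.foo.com", "foo")
bar = item_guid_digest(CHANNEL, "http://www.foo.com", "bar")

print("repeatable:", foo == foo_again)
print("new title gives a new item:", foo != bar)
```

Two agents that see the same base elements arrive at the same identifier without the producer managing anything.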
The only problem with this proposal is that GUIDs would break if the user
changed the channel rdf:about (channel URL). This would only be a problem if
a peer were to get a handle on old RDF/RSS triples/items without having a handle
on the old RDF channel link. In this situation we would need a mechanism for
including the RDF channel link of the old RSS feed. It is recommended that the
channel be duplicated in whole (including the channel link) when syndicating
cached content.
* Security
Security is very important with this mechanism. UUIDs (128 bit non-colliding
IDs) and automatically generated URIs/URLs MUST NOT be used. That style of
GUID is not secure: an attacking peer could reuse the GUID of another triple and
convince a peer to accept it as the GUID of an invalid item.
The symptoms of an attack could include invalid cached entries, broken
reification, and even harm to the reputation of a producer (for example, a
statement attached to the wrong GUID, such as "Alice (friend) is stupid").
SHA1 hash based GUIDs avoid this problem by placing the burden of GUID
calculation on RDF/RSS aggregators. Hashes can be validated locally by an RSS
aggregator and easily generated for RSS channels (and RDF) that don't support
GUIDs.
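Local validation could look something like the Python sketch below. The names compute_guid and guid_is_valid are hypothetical, and the hex encoding and plain concatenation are simplifying assumptions, not part of the proposal.

```python
import hashlib


def compute_guid(channel_about: str, item_about: str, title: str) -> str:
    """Recompute the GUID locally from the base content (hex, for brevity)."""
    data = (channel_about + item_about + title).encode("utf-8")
    return "urn:sha1:" + hashlib.sha1(data).hexdigest()


def guid_is_valid(claimed: str, channel_about: str, item_about: str,
                  title: str) -> bool:
    """Accept a GUID supplied by a peer only if it matches our own hash."""
    return claimed == compute_guid(channel_about, item_about, title)


CHANNEL = "http://www.peerfear.org/rss/index.rss"

# An honest peer sends the GUID that belongs to the item.
honest = compute_guid(CHANNEL, "http://www.foo.com", "foo")
print(guid_is_valid(honest, CHANNEL, "http://www.foo.com", "foo"))

# A hostile peer attaches that same GUID to different content.
print(guid_is_valid(honest, CHANNEL, "http://www.foo.com", "bar"))
```

The aggregator never has to trust the GUID it was handed, because it can always recompute it.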
* No Producer Obligations
RDF/RSS producers MAY include the GUID within their RSS feed via the rdf:ID
attribute but are not REQUIRED to do so. Non-RDF vocabularies MUST NOT include
GUIDs unless support is explicitly added.
Note that the inclusion of GUIDs within an RSS feed will only bloat the RSS feed
and should rarely be used in practice. There are a few scenarios where RSS
aggregators may choose to use GUIDs explicitly so that they make it clear what
GUID they are using within their own internal database.
* Portable
The GUIDs generated are portable across RSS implementations, can be used within
*any* XML, and can be somewhat easily managed by humans (note that hashes are
long and somewhat bulky).
* Compatible
This method is compatible with all versions of RSS even if they do not support
GUID generation. The only caveat is that if the version of RSS in question does
not support modules the GUID MUST NOT be included within the RSS. It may only be
computed by RSS aggregators.
* Mutable Statements
Statements can be mutable except for the base elements. For example you can add
Dublin Core metainfo, any RSS 1.0 module, additional XML and namespaces,
additional RDF all without breaking the GUID.
The only requirement is that the base elements not be modified. If they are,
this will generate a new GUID and a new triple (as far as a user of the GUID is
concerned).
* Supports (distributed) Reification
Since GUIDs can be calculated for any RDF triple we can support distributed
reification. For example (remember GUIDs are verbose but optional and not
required by producers):
<item rdf:ID="urn:sha1:1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1"
rdf:about="http://www.cnn.com">
<title>CNN</title>
<link>http://www.cnn.com</link>
</item>
<item rdf:about="urn:sha1:1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1">
<dc:description>CNN is a cool website that is 0wned by a big company</dc:description>
</item>
The second triple adds a Dublin Core description to the first entry.
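On the aggregator side, combining such statements could be as simple as the Python sketch below. The function merge_statements and the placeholder GUID are hypothetical. The sketch assumes the GUIDs have already been verified and only shows statements from different sources being folded together.

```python
from collections import defaultdict


def merge_statements(statements):
    """Fold (guid, properties) pairs from any source into one record per GUID."""
    merged = defaultdict(dict)
    for guid, properties in statements:
        merged[guid].update(properties)
    return dict(merged)


# Placeholder identifier, not a real hash.
GUID = "urn:sha1:EXAMPLE-GUID"

statements = [
    # From the original producer.
    (GUID, {"title": "CNN", "link": "http://www.cnn.com"}),
    # From a third party that only knows the GUID.
    (GUID, {"dc:description":
            "CNN is a cool website that is 0wned by a big company"}),
]

merged = merge_statements(statements)
for name, value in merged[GUID].items():
    print(name, "=", value)
```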
* Representation and Calculation
All GUIDs are calculated with the following method:
- - Canonicalizing the RDF triple (RSS item) via XML Canonicalization [2]. This
is required so that small whitespace differences don't break hash verification.
(Note that the RAW data isn't used because we want to include the semantic
(qnames) representation.)
- - Base elements (required and commonly used elements) including rdf:about are
concatenated together.
- - The concatenated base content is now hashed:
sha1(item)
- - The hash is then represented in base32 so that it is portable as a URI and
within an attribute. If we didn't use base32 it would be binary. Base32 was
chosen over base64 so that the hash doesn't include breaking characters such
as / and +.
The GUID is then represented as:
urn:sha1:base32(sha1(canon(rdf:about + BASE_CONTENT)))
Note that all RDF vocabulary and RSS modules are required to specify what is
included as "base" content so that canonicalization can take place. Usually
this is just going to be the required elements and high use optional elements.
For RSS 1.0 this would be title, link, and description, and would support Dublin
core metainfo as mutable additions.
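The whole calculation might be sketched in Python roughly as follows. Several things here are assumptions rather than part of the proposal: the function name item_guid, the un-namespaced element names, and the use of the standard library's canonicalize(), which implements Canonical XML 2.0 and is only a stand-in for the Exclusive Canonicalization referenced in [2].

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Base content for RSS 1.0 as named above; everything else is mutable.
BASE_ELEMENTS = ("title", "link", "description")


def item_guid(item_xml: str) -> str:
    """urn:sha1: GUID over rdf:about plus the base elements of one item."""
    # Stand-in canonicalization (C14N 2.0), stripping layout whitespace.
    canonical = ET.canonicalize(xml_data=item_xml, strip_text=True)
    item = ET.fromstring(canonical)
    parts = [item.get(f"{{{RDF_NS}}}about", "")]
    for name in BASE_ELEMENTS:
        child = item.find(name)
        if child is not None:
            parts.append(child.text or "")
    digest = hashlib.sha1("".join(parts).encode("utf-8")).digest()
    # 20 digest bytes encode to exactly 32 base32 characters, no padding.
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


ITEM = """<item xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      rdf:about="http://www.cnn.com">
  <title>CNN</title>
  <link>http://www.cnn.com</link>
</item>"""

# Same base elements, different layout, plus Dublin Core metainfo.
ITEM_WITH_META = (
    '<item xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'rdf:about="http://www.cnn.com">'
    "<title>CNN</title><link>http://www.cnn.com</link>"
    "<dc:description>CNN is a cool website</dc:description></item>"
)

print(item_guid(ITEM))
print("metainfo keeps the GUID:", item_guid(ITEM) == item_guid(ITEM_WITH_META))
```

Only the listed base elements feed the hash, so the Dublin Core addition leaves the GUID untouched while a changed title would produce a new one.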
Note that aggregators can use their own hashing algorithm internally, but
a HASH URI is required if someone wants to refer to a specific RSS item or RDF
triple in a standardized manner.
The burden is placed upon the GUID producer to use a hashing algorithm that is
supported by the majority of others if GUIDs are included within the feed.
The urn:sha1 hash mechanism MUST be supported by all aggregators that support
GUIDs.
* Feedback
I would like to officially request feedback on this idea. If RDF/RSS Working
Group members think this is a good idea I would like to push this towards
standardization and to develop much better documentation.
1. http://www.peerfear.org/rss/index.rss
2. http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/
- --
Kevin A. Burton ( burton@apache.org, burton@openprivacy.org, burton@peerfear.org )
Location - San Francisco, CA, Cell - 415.595.9965
Jabber - burtonator@jabber.org, Web - http://www.peerfear.org/
GPG fingerprint: 4D20 40A0 C734 307E C7B4 DCAA 0303 3AC5 BD9D 7C4D
IRC - openprojects.net #infoanarchy | #p2p-hackers | #reptile
Evolution has an IQ only slightly greater than 0 which is enough to beat
entropy and create wonderful designs given enough time. -- Ray Kurzweil
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Get my public key at: http://relativity.yi.org/pgpkey.txt
iD8DBQE9fpW5AwM6xb2dfE0RAqDqAKCdKgVTmpLNy7JOrHsY/z48s0VAUgCfQ0PM
Xh35n/yewUp2cIRYQYOeztU=
=/Cci
-----END PGP SIGNATURE-----