Tech Notes

Postel’s Law Has No Exceptions

Nov 19, 2010

[from Aaron Swartz]

As Mark Pilgrim is fond of saying, “There are no exceptions to Postel’s Law.” (Postel’s Law is generally quoted as “be liberal in what you accept and conservative in what you put out” or something to that effect.) The message of the law is that interoperability is the primary concern, and that programs should accept things, even things that are against the spec, if necessary to achieve interoperability.

HTML, as you may know, is a mess. It’s contorted in a hundred different ways with tons of bugs and their work-arounds encrusted into the Web and browsers are expected to make sense of all of it. The XML people saw this and said “we have to fix this”. Their solution was to break Postel’s Law.

With XML you are supposed to die and never look back if the document you come across violates the spec. The idea was that if everything died on invalid feeds, no one would ever write them. This is wrong for three reasons:
1. Even with the rule, there will be invalid documents. Someone will write some code, test it, see that it works and move on. One day the code will be given data that trips one of XML’s exceptions (AT&T is a common example: XML requires that it be written AT&amp;T) and an invalid document will be created.

2. XML apps compete for users. Users want to read these documents, even if they’re broken. Users will switch to apps that read these documents and the rule will be useless, since folks will likely test with those apps. The only way we can keep the rule in effect is by getting everyone who writes an app to act against the wishes of their users, which seems like a bad idea.

3. Essentially the same effect can be achieved by having a validation display (like iCab or Straw’s smiley face that frowns on invalid documents) and an easy-to-use validator.
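
As a minimal sketch of the third point, a reader can try a strict parse first, fall back to a lenient repair, and report whether the document was valid so the interface can show its frown. The function name `read_title`, the `<title>` element, and the ampersand-only repair are assumptions made for illustration; real documents break in many other ways.

```python
import re
import xml.etree.ElementTree as ET

# Matches a bare "&" that does not start an entity such as &amp; or &#38;.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def read_title(document):
    """Return (title, is_valid). is_valid is False if a repair was needed."""
    try:
        root = ET.fromstring(document)
        valid = True
    except ET.ParseError:
        # Liberal in what we accept: escape bare ampersands and retry.
        repaired = BARE_AMP.sub("&amp;", document)
        root = ET.fromstring(repaired)  # still raises if the repair is not enough
        valid = False
    return root.findtext("title"), valid

print(read_title("<item><title>AT&amp;T</title></item>"))  # valid document
print(read_title("<item><title>AT&T</title></item>"))      # invalid, but still readable
```

The user gets the content either way, and the `is_valid` flag is what a validation display would hang on.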

This is not to say that all apps should have to process invalid documents, or that they should work hard to guess what the author meant, or that we should encourage or tolerate invalid documents. We should still try to get rid of invalid documents, but taking things out on the users is the wrong way to do it.

The creators of XML were wrong. Postel’s Law has no exceptions.

Power Laws, Weblogs, and Inequality

Nov 19, 2010

“A persistent theme among people writing about the social aspects of weblogging is to note (and usually lament) the rise of an A-list, a small set of webloggers who account for a majority of the traffic in the weblog world. This complaint follows a common pattern we’ve seen with MUDs, BBSes, and online communities like Echo and the WELL. A new social system starts, and seems delightfully free of the elitism and cliquishness of the existing systems. Then, as the new system grows, problems of scale set in. Not everyone can participate in every conversation. Not everyone gets to be heard. Some core group seems more connected than the rest of us, and so on.

Prior to recent theoretical work on social networks, the usual explanations invoked individual behaviors: some members of the community had sold out, the spirit of the early days was being diluted by the newcomers, et cetera. We now know that these explanations are wrong, or at least beside the point. What matters is this: Diversity plus freedom of choice creates inequality, and the greater the diversity, the more extreme the inequality.

In systems where many people are free to choose between many options, a small subset of the whole will get a disproportionate amount of traffic (or attention, or income), even if no members of the system actively work towards such an outcome. This has nothing to do with moral weakness, selling out, or any other psychological explanation. The very act of choosing, spread widely enough and freely enough, creates a power law distribution.

Power law distributions, the shape that has spawned a number of catch-phrases like the 80/20 Rule and the Winner-Take-All Society, are finally being understood clearly enough to be useful. For much of the last century, investigators have been finding power law distributions in human systems. The economist Vilfredo Pareto observed that wealth follows a ‘predictable imbalance’, with 20% of the population holding 80% of the wealth. The linguist George Zipf observed that word frequency falls in a power law pattern, with a small number of high frequency words (I, of, the), a moderate number of common words (book, cat, cup), and a huge number of low frequency words (peripatetic, hypognathous). Jakob Nielsen observed power law distributions in web site page views, and so on. […] The shape of […] several hundred blogs ranked by number of inbound links, is roughly a power law distribution. Of the 433 listed blogs, the top two sites accounted for fully 5% of the inbound links between them. The top dozen (less than 3% of the total) accounted for 20% of the inbound links, and the top 50 blogs (not quite 12%) accounted for 50% of such links.”
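
Shares like the ones quoted can be computed from any ranked list of link counts. The sketch below is an illustration only: `zipf_weights` generates an idealized power law with an assumed exponent of 1, which is not the data set described in the quotation and is not expected to reproduce its exact percentages.

```python
def zipf_weights(n, exponent=1.0):
    """Idealized power law: the item at rank r gets weight 1 / r**exponent."""
    return [1.0 / rank ** exponent for rank in range(1, n + 1)]

def top_share(values, k):
    """Fraction of the total held by the k largest values."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

links = zipf_weights(433)  # 433 hypothetical blogs, as many as in the quoted list
for k in (2, 12, 50):
    print(f"top {k}: {top_share(links, k):.0%} of inbound links")
```

Swapping in real inbound-link counts for `links` gives the observed shares, and changing `exponent` shows how quickly the head of the curve pulls away from the tail.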

Blogs as social software and social networks

Nov 19, 2010

Dina Mehta says, “My blog is my Social Software and my Social Network”. Many bloggers, including Lilia Efimova, seem to share this feeling. Dina cites the richness of content of a blog compared to a profile page on a networking site as the primary reason. She also points to the various back-end tools that allow her to carry on an extended, public conversation with other bloggers. Lilia sees a balance between the two, appreciating the immediacy of information that a profile on a social networking site provides, but also the depth that comes from reading someone’s blog. She summarizes it by saying: “I’m thinking about YASNs and weblogs in terms of contact management (knowing whom and how you can reach) and relation management (knowing why you do it and why they would react). For networking you need both…”

I, too, appreciate the role of both. I do, though, see some limitations to blogging, particularly for the impending mass of mainstream users:

– While blogging itself is fairly accessible, many of the back-end tools that allow that connectivity are not. Every new blogger will most surely end up pinging weblogs.com and blo.gs, but how many will learn how to properly use Trackback, properly join and ping the Blog Network, or set up different categories in their blog to ping the proper categories at Topic Exchange? The basics of posting and linking are easy enough, but the stuff that enables blogs to be really useful as social networking tools is still somewhat exclusive to the digerati.
– Blog conversations are NOT egalitarian, as has been discussed recently by Joi Ito, Clay Shirky, Marko Ahtisaari, et al. I’m not saying that’s “wrong”, or “bad”, just “inaccessible” for most users, compared to the openness of a discussion forum or mailing list. With blogging, you simply have to work a lot harder to be heard by anybody.
– Reading and writing blogs can be enormously time consuming. I appreciate the rich picture I can get of someone from reading their blog, but do I really want to do that with every single person I meet? It’s completely impractical. As we point out in our book, The 5 Keys to Building Business Relationships Online, the number of people in your network and the average strength of your relationships with them are inversely proportional, constrained by the time one is willing to spend building relationships. Furthermore, different people need different balances between numbers and strength of relationship. Someone selling $15 e-books needs more people at a much lower strength of relationship than someone selling $150,000 enterprise software. Blogs are great to have there to provide the depth when wanted/needed, but they’re not as useful as a purely exploratory tool.

Bottom line: I, too, see the role both can play in building a rich and diverse social network. If you’re primarily interested in deep, long-term relationships, then your blog should be your focal point, and the profiles in the various social networking sites merely additional points of presence to invite people into the richer communication of your blog. On the other hand, if your needs are more towards a broad visibility, not just among the blogging technorati, then less frequent blog posting and more time spent in group discussion in the social networks is a better strategy. If your relationship needs are far more focused, i.e., wanting to meet people, or people in specific roles at specific companies, then business-oriented social networking sites, not blogs, are the place to do that.

They can all be part of the mix, but the portion of your time and energy spent in each platform should be aligned with what your objectives are for social networking.

Idea file – vCite (2003)

Nov 19, 2010

I had talked about this with friends in the past. It is basically the citation format equivalent of a vCard. It should allow simple swapping of citation or citation-like information. It should be built on open standards, perhaps using the Dublin Core as the basic semantics. It might also use something like the Netscape RSS.

vCite must be XML-based. It should be able to hook into a hypertext link (perhaps as an XLink or in some extension of the ANCHOR tag) that allows a vCite block to be “grabbed” by another application. Such applications might include:

– an automatic citation scanner used to compile link directories (useful for collaborative filtering)
– bookmark manager
– enhanced subject gateway or hypertext link display (cursor over and see a structured catalog entry for the link).
– exchange of vCites between a range of applications, for example, sending a catalog URL to a customer PDA much like the vCard allows a user to send a business card.
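
To make the idea concrete, here is a rough sketch of what building and “grabbing” a vCite block might look like. Only the Dublin Core namespace is real; the `<vcite>` element name, the choice of four fields, and the helper names `make_vcite` and `read_vcite` are assumptions for illustration, since no vCite format has been defined.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)

def make_vcite(title, creator, date, identifier):
    """Build a hypothetical <vcite> block whose fields are Dublin Core elements."""
    root = ET.Element("vcite")  # element name is an assumption, not a standard
    for name, value in (("title", title), ("creator", creator),
                        ("date", date), ("identifier", identifier)):
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

def read_vcite(xml_text):
    """Return the Dublin Core fields of a vcite block as a plain dict."""
    root = ET.fromstring(xml_text)
    # Tags look like "{namespace}title"; keep only the local name.
    return {child.tag.split("}", 1)[1]: child.text for child in root}

block = make_vcite("An Example Article", "A. Author", "2003",
                   "http://example.org/article")
print(block)
print(read_vcite(block))
```

A bookmark manager or citation scanner would only need the `read_vcite` half, which is what keeps the burden of accuracy on whoever publishes the block.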

One can imagine a user walking into a store with a PDA and having selected items automatically forward their descriptions for later review and comparison shopping.

Students would be able to grab vCites from library catalogs as a simple transportable and interoperable set of semantics that they could then use for creating bibliographies.

Promoting this development would allow a user to catalog a link or some record and then share it more easily. Since users and automated tools would be the ones disseminating the vCite, the onus is on the originator to ensure a useful, accurate vCite. This helps encourage stronger linkages and fewer transcription and other errors in sharing citation information.

IDEA: CITATION ANALYSIS and NOTETAKING

Citation analysis seems to me underutilized for information retrieval. Why don’t organizations use citation analysis to figure out which important things were referenced in their particular area of interest, and use this information to help determine things like digitization priorities (like making sure the most important and cited articles/texts are available to their users) or to create secondary “survey” products like annual compilations of important stuff?
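
The simplest version of this is just counting. The sketch below assumes the reference lists have already been extracted into a mapping from citing document to cited works; the data and the name `rank_cited_works` are made up for illustration.

```python
from collections import Counter

def rank_cited_works(bibliographies):
    """Count how many documents cite each work, most cited first."""
    counts = Counter()
    for cited in bibliographies.values():
        counts.update(set(cited))  # count each work once per citing document
    return counts.most_common()

# Hypothetical data: citing document -> works it references.
bibliographies = {
    "paper-a": ["work-1", "work-2"],
    "paper-b": ["work-1", "work-3"],
    "paper-c": ["work-1", "work-2", "work-2"],
}
for work, count in rank_cited_works(bibliographies):
    print(work, count)
```

The top of that ranking is a first cut at a digitization priority list or the contents of an annual survey.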

Now if only we had a mechanism to clearly identify which portions of documents people stop at and reference, then we would have something interesting. Too bad electronic highlighting is so crude. It would be useful if we could have highlighting in browsers and have it used for automatic notetaking or summarization. Rather than just saving links (which are terribly unstructured) the objective should be to capture anything and summarize and/or catalog.

Here again is where my idea of vCite would really be helpful.

Citations online. Why have programs like Endnote/Procite on the desktop? Why not make them accessible from anywhere on the web, allow users a variety of outputs (HTML, PDF, .doc, RTF), and let them choose the citation style as well?
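
Separating the stored record from its rendering is what makes this workable. A minimal sketch, assuming a record is a simple dict and using two made-up styles (real styles such as APA or MLA have far more rules); `STYLES` and `format_citation` are hypothetical names.

```python
from html import escape

# Two simplified, made-up styles; they stand in for real citation styles.
STYLES = {
    "author-date": "{author} ({year}). {title}.",
    "title-first": "{title}, by {author}, {year}.",
}

def format_citation(record, style="author-date", output="text"):
    """Render one citation record in the chosen style as plain text or HTML."""
    text = STYLES[style].format(**record)
    if output == "html":
        return f'<p class="citation">{escape(text)}</p>'
    return text

record = {"author": "A. Author", "year": "2003", "title": "An Example"}
print(format_citation(record))
print(format_citation(record, style="title-first", output="html"))
```

Other outputs such as RTF or PDF would be further branches on `output`, all working from the same stored record.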

This would be a useful service and allow notetaking from anywhere.

It would be useful to be able to upload a local file and to download one as well. Import from Procite/Endnote would also be a great feature to have. Work with ISI to develop the product?

One objective is to support scholarly and educational work.
Another is personal subject trees: how to manage links better is the objective here.

Another twist might be to make this an online bookmark manager. Allow users to move their bookmarks to a central repository (saves them in the event of an accidental overwrite) and to make all or portions of their links accessible. If this were matched with online vocabulary tools for creating personal hierarchies, automatic linkchecking, and even metadata cataloging, then I think there is a nice application environment here.

How to make money? It must be a subscription service purchased either by single users or institutions. Market it virally by providing a limited service for users (100 citations/200 links, etc).

Existing citation management tools are lame and have not advanced significantly.

Performance must be great.

It would be useful to allow users to define access for various parts of their personal bookmarks, subject trees,