Personal branding is not my strongest talent. I blame my absence from the web on the delusion I keep telling myself: that web technologies were still in their infancy when I was the age of a whizz-kid, and that my dumbness was born out of frustration with the tools that were at hand. They were not what I wanted them to be, or they simply didn’t exist. They still don’t. Thus I became a print designer: the technology has matured over five centuries, and the printed book has been the single most efficient knowledge container ever devised – at least to date.
I felt comfortable with the visuality of the tools offered by the Black Art. By having a good look at it, I could understand the undisguised mechanics of a printing press. By merely looking at the command line, however, I still get seized with a feeling of vertigo – staring into a black hole, horrifying memories of my childhood and the imminent devastation of a prompting
PageMaker was my tool, then InDesign. I do almost anything with my page layout workhorse. I became fluent with paragraph and object styles, swapping fonts and placeholder color swatches, emulating templates. When I first tried to dive into CSS, I was struck with disbelief to find there was nothing like variables or simple math. I was shocked to learn that the W3C committee didn’t even want to consider the idea. A few years later SASS and LESS finally arrived. The W3C is catching up. Meanwhile my website got stuck on a dev subdomain, with dreadfully messy code under the hood and stained with the visual marks of frustration with the ill-conceived box model.
Some more ranting
So, the web matured, but the tools are still lacking. One cannot wait a lifetime for the deus ex machina to descend from the cloud(s). Stephen Wolfram confessed he was a scientist at heart, first and foremost, but that his tools were frustrating his work as a physicist. So, he tells, he “realized that the only way [he] was really going to get them was to build them [himself].” (BTW, it’s a must-read.) Hence he became an entrepreneur and perhaps one of the most brilliant computer scientists ever. Donald Knuth is a similar case. And just as TeX was a godsend to the sciences (whose full glory we still haven’t seen after three decades, if its power is ever to be unleashed amongst the masses), I expect huge things are yet to come from Wolfram|Alpha. I sometimes muse how fantastic it would be to co-design the future, to collaborate on the structures of thought, turning them into tools and products that are usable by common people.
Separation of Concerns
But what concerns me most is the third layer. The Model. Here we are at a loss; there is no satisfying offer. We need a tool for modelling. I don’t mean something like Doctrine. Not quite an ORM. No, I’m imagining something more universal, something that stands alone and is not a mere commodity for one particular workflow or dev stack.
Collecting and List Making
Those are the tools for tooling. But what about the tools for thought? The apps for everyone? The promise of information technology was about the information, wasn’t it, rather than about the tech? I don’t blame readers, when they pay no attention to my microtypographic scrutiny, while consulting a book I designed. In fact, that’s precisely what typographers should be trying to achieve, be invisible. After all, the purpose of a book is the storage and transmission of information, it’s not about the type in which it’s set. My art should serve the facilitation of knowledge exchange. Typographia ars artium omnium conservatrix. Same case with IT; they’re one of a kind. Hence, IT ought to become invisible too. At least the tech part.
Information is fact collecting. I’ve been collecting things all my life: fossils, stamps, books… And all the information relating to those things. Knowledge. About facts, stories, techniques, materials, people, places, our earth and its history. In my childhood, in summer, I browsed through the volumes of my encyclopaedia, read all the articles that seemed interesting. And there were lots of those. Now I have my daily Wikipedia reads. I flatter myself that I may be called an erudite, a polymath, a walking encyclopaedia. I know things. But that’s not a truly unique quality, it’s only because I’m a collector. I collect information.
Man belongs to a collecting species. People are collecting data all the time. They live from it. Facts about what’s edible and what’s not, about the environment, the forecasts for the crops, about threats, facts that may keep us ahead of the competition. It’s intelligence gathering. We love to gather stuff, tricks, how-to’s, usable things, purposeful data. We collect them and list them up. We’re list makers. But mere lists are blunt and incomprehensible. We want to do things with the individual nodes and bits. Arrange them, rearrange them, divide them into classes, subclasses and divisions, compare, collate and draw conclusions from them. So we make other lists, and lists of lists, and eventually we want to share those lists, share experience, spread knowledge, let our fellow creatures know which collections we have and that we’d happily lend our collectibles, and our knowledge about them. All to better our lives.
But at some point there are so many lists that we must collect them as well. We develop techniques for good practice and we make sample lists, dummies, examples that show which things need to be on a list of a specific kind, templates in fact, and forms. We find ourselves not only collecting the raw data, but we also start thinking about the relations between the entries on a list and between the lists. We look for structures. We are visualising knowledge. To draw insights from it. I suppose that’s the reason why I got obsessed with typography and information design. Shaping thought.
Then the lists grow out of proportion. They’re not simple words anymore, line by line, but the lines get stuffed with words, they get indented; important words stand apart, in italics or bolded, or set in small caps. Look at printed dictionaries. To make things worse, within entries, we add references to other lists, hyperlinks so to speak, and symbolise them with pointing indexes and arrows. The graphics get cluttered. So we start looking for better visualisations, that give us immediate insight. Charts and diagrams. But infographics are a lot of work, so we must be sure the list is complete beforehand. Alas, that’s impossible, as our lists are by definition incessantly incomplete. There’s the problem. We keep collecting data, while we need the insights from it immediately, at all times, updated without any latency.
A few years ago I got enthusiastic about the Open Access movement. I started working on ideas for a digital Republic of Letters. The goal: crack down on the monopoly of academic publishers and their self-righteous, impact-factor-manipulating journal titles. I wanted to contribute to the noble cause of giving back to authors and to the academic community the control over their content. You do know that publishers are charging university libraries big money to access the content their professors wrote. To add insult to injury, the professors too are charged to have their papers published in A-rated “refereed” journals. That’s charging twice for a product one doesn’t even produce! Well, it’s a proven business model and it’s good business. Big business. The problem with activists, I guess, is that they tend to use the same worn tactics over and over again. They count on good will, all while exclaiming with indignation. But you can’t win with that; you must beat the monopolists at their own game, calmly and with determination.
Social Networking for Scholars
So, I considered, if academic publishers are in fact nothing but box movers, what then is their added value? They own a forum, they monitor the moves on it, and sell the metrics. Then what if we could establish an alternative forum and do a better job with the metrics? That was possible! We could use the weaponry of our generation. The web, social networking in cyberspace. I was thinking of an advanced academic blogging platform, where authors would come together, collaborate on ideas and research, with state-of-the-art authoring tools. The tools had to be awesome indeed, real killer apps, socially integrated office software. Although the eye candy has become a bit outdated, I’m still proud of my UI designs for the penultimate word processor. But what really got me obsessed with document markup and semantic structures was my conviction that my version of Word had to be a true XML editor, but without the developer-targeted looks and features of things like oXygen.
Anyway, on this social network scholars, scientists and researchers would read, write and edit their papers, set up open access journals, and publish their discoveries and findings in real time. They would be encouraged to rate each other’s work, set variable parameters for automated rankings, and so on. Facebook-like peer review. I gathered all sorts of info on university rankings, studied bibliometrics. But how could we cure academics of their Stockholm syndrome, liberate them from their bad, paper-based academic publishing habits, and eventually lure them away from the sharks that feast on them? Then I reckoned the long-established, paper-based worldwide social network of scholars and academia could be easily ported to the web by taking advantage of bibliographic metadata. I started researching the market and technology for library management tools and software. I concocted a strategy for harvesting the metadata, connecting the dots, letting the graph emerge. But since I’m a designer first and foremost, I drew a few dozen mockups that packaged my ideas, visualised the required features, the workflow and the widgets to generate revenue. All while racking my brains over a viable business model for such a service, carefully balancing solid commerce with the philanthropic mission. Meanwhile I kept an eye on the competition. The few me-too ventures of corporate academic publishers were not a serious threat. Then Academia.edu popped up. I followed their early days, witnessed how they got on TechCrunch’s radar and then the reports on their first rounds of funding. Now I think they have somewhat lost their momentum and are growing too slowly. Judging from the features they have implemented so far, I still don’t see the long-term vision. I think one could have done a better job. My business plans and .sqs database schemas are gathering dust on my hard drive.
As I was researching bibliographic management systems, OA repositories and harvesters, I realised bibliography had to be a core component of an academic authoring toolkit. And it had the potential of becoming a product in its own right, requiring its own dedicated platform and business strategy. In fact, I had always wanted to have such a solid application myself, for cataloguing my private book collection.
Needless to say, I never even began doing all that tedious work in something like Excel. The readily available book software was not good enough either: the forms were shortsighted, essential fields were lacking. Moreover, if I were to enter a record and do all that typing, wouldn’t it be obvious that I could share the data right away? And, vice versa, why not import a data dump from other book collectors, or subscribe to a bibliographic metadata source? Yep, social cataloguing. Then LibraryThing popped up, and a few competitors. But I needed something better, something not only targeted at book amateurs, but something that could be used by scholars and book historians, too. Something like EndNote, the Flickr way. Something like WorldCat, the IMDb way. Again some mockups, again about a dozen database schemas, somewhere in the purgatory of my hard drive.
Metadata the Right Way. The Case of Bibliography
I still think there is room for a high-end bibliographic social cataloguing platform. Here you have the low-end apps and social websites for common book readers; mass-market publishers and resellers are hitchhiking. Then there are the pricey library management systems, offering more advanced features with far more scientifically sound data structures, but they’re not social at all. Antiquarians and analytic bibliographers, however, catalogue their books somewhat differently again. Authors just need a more convenient way to manage their sources. These apparently very different markets could be integrated. After all, they are all still just recording titles with some metadata. But that is exactly the point: they all need different metadata, and they are all doing it a bit differently. There is no solution for that. Yet. What’s out there falls short in its data model. There you have it: the Model again. Structuring data structures.
MARC 21, Dublin Core: it seems as if the committees that wrote the specs never consulted a book historian, or actually a book maker or a printer, for that matter. Google Books does an even worse job. Go have a look: two volumes of an encyclopaedia are considered to be two different publications, while two editions of a work are presented as two copies of the same publication. You get volume I in the fifth edition, the second volume is missing, and vol. III is that of the editio princeps. All very messy, bad user experience. I know something about bibliography, I fancied; I could build a better database. I still do believe that, but I also know that there are people out there who know much better than I ever will. It’s an intrinsic fallacy to believe one would ever be able to create the ultimate and final data structure for something, single-handedly. We would be better off if the data structures of our websites and databases were to become an organic continuum, alive, created and maintained by a never-ending, collaborative process.
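To make the complaint about volumes and editions concrete, here is a minimal sketch in Python of a model that keeps the work, its editions and their volumes apart. The class and field names are my own illustration, not taken from MARC 21, Dublin Core or any existing catalogue, and a real bibliographic model would need far more than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Volume:
    number: int  # position within its edition, e.g. 1 for "vol. I"


@dataclass
class Edition:
    label: str             # e.g. "editio princeps", "fifth edition"
    expected_volumes: int  # how many volumes this edition was issued in
    volumes: List[Volume] = field(default_factory=list)

    def missing_volumes(self) -> List[int]:
        """Return the volume numbers of this edition that are not held."""
        held = {v.number for v in self.volumes}
        return [n for n in range(1, self.expected_volumes + 1) if n not in held]


@dataclass
class Work:
    title: str
    editions: List[Edition] = field(default_factory=list)


# The messy shelf from the paragraph above: vol. I of the fifth edition,
# vol. III of the first edition, and no second volume at all.
princeps = Edition("editio princeps", expected_volumes=3, volumes=[Volume(3)])
fifth = Edition("fifth edition", expected_volumes=3, volumes=[Volume(1)])
encyclopaedia = Work("An encyclopaedia", editions=[princeps, fifth])
```

Because a volume belongs to an edition and an edition to a work, the gaps can be reported per edition with `fifth.missing_volumes()` and `princeps.missing_volumes()`, instead of the three books being passed off as one complete set.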
Freebase and FluidInfo
I have been following Freebase from its first release. The brilliant architecture of its tuple store simply struck me with awe. Dynamic, user-maintained ontologies! Then Google started slowly smothering Freebase. They bought the startup, and now they have it compete in their private arena with Fusion Tables. Google’s release-early-fail-often strategy might be apt for the Moloch they are, but it delivers poor products and bad user experience. It really is trial and error. Scoop up some ad revenue along the way, spill some more cash, and make a statistical bet that at least one of those random dropkicks will become a cash cow that makes up for the losses of the others. Due to the sheer scale of their user base, Google gets away with it, and it seems time-to-market optimisation prevails over careful deliberation and maturity. I loved the appeal of Google Wave… Well, collateral damage. Meanwhile, Freebase may soon be added to the cemetery of Google+’s ghost town. It clears the stage for something else.
Something very interesting, something coming from a Tim O’Reilly-funded one-man startup. The ideas behind FluidInfo are very similar to what I am dreaming of, but… – I try to be mindful here, for I have deep respect for what they’re building. Still, I might point out some of its conceptual flaws, and question the vision behind its business model. I think it might have something to do with the programming background of its founder. I’m quite sensitive to design, attention to detail, pixel-perfect type. User interfaces are my benchmark: I go for the good looks. I believe the way a workflow is designed says a lot about the way the data is structured underneath. The layout of a form reveals the clutter of the model.
Nevertheless, imagine something with an engine like FluidDB under the polished hood of a Wunderkit-like coachwork…
Open Standards, Ontologies, APIs, and… the User
I guess today’s architects of tomorrow’s web are still too focused on the technical bits and bytes of standards description. They seem to love writing white papers. Developers are a more pragmatic flock, but they follow suit, putting most of their efforts into APIs and protocols. They too are surely doing some very good thinking on ontologies, semantic vocabularies and schema design. But ordinary users don’t read JSON, the browser is their HTTP client, they do not know ontologies, and neither do they have to know how the hundreds of apps on their devices access, store and retrieve the data. Still, users need the data, always and everywhere, and not only do they consume it ubiquitously, ever faster, but at an even more rapidly growing rate they are creating the data themselves. Obviously, they do so almost unconsciously, through the automated one-click processes of their apps or with the features provided by their social networking services. But that’s today. Today, users are still first and foremost consuming data: they subscribe to RSS feeds, Flickr and Twitter streams, consult weather info, stock quotes, all through dedicated apps.
Those are just a few data mines. There are not that many others. The number of different types of data structures may seem large already; it is however all still very manageable. Developers only need to plug into a quite limited set of APIs. The data types to choose from and to build apps with are limited: tweets, status updates, pictures, video and audio, blog posts, maps and geotags, some demographics… You can do manifold wonderful things with those already. To open data evangelists the available datasets are still too few: they encourage public authorities to open up their data and publish the databases in standards-compliant formats. Meanwhile we are collecting data source addresses and URIs, scraping them or converting them into accessible or, better, queryable APIs, and then publishing those, adding them to the list. Really great tools are coming out. Elasticsearch and the like.
I recall the early days of the internet, long before search engines changed the game. I was writing down URLs whenever I saw an interesting one; portals came along. Lists, again. Then lists of websites, now lists of online databases. We should see further. Instead of looking up into the skies, we are gazing at the holes in the soil, at the lacunae in the datasets already made available. We ought to start creating them ourselves. Or, better still, why don’t we figure out which tools might help ordinary people with creating and publishing online collaborative databases?
Think of the valuable energy spent and spilled by the open data community, persuading (e-)governments, subsidised public companies and other Kafkaesque strongholds of a paper-based bureaucracy to publish their databases. Now think, for example, of the thousands of commuters already equipped with devices that can broadcast the exact position, timestamp and direction of the vehicle they’re in. Then imagine some subversive creatives who’d put RFID stickers clandestinely all over the place, and suppose devices would run apps that would pick up a nearby DOI, package it with a geolocation and timestamp, and send it in to a service that would map and stitch the whole thing together. A DIY DOI service. Then we would have live timetables, without even needing to bother public authorities or railroad companies. Imagine what this could do for developing countries, where mobile devices may very likely become mainstream much sooner than governments could set up electronic public services. Now really let loose your imagination, and dream of the manifold use cases guerrilla tagging could be applied to. But even without wireless digital graffiti, enabling end users to register and share bits of data, at all times and everywhere, could once again change our world.
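The stitching step of such a service can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Sighting` record, its field names and the tag identifiers are invented, and a real service would have to deal with duplicates, clock drift and bogus reports.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sighting:
    tag_id: str     # identifier read from a sticker (hypothetical format)
    timestamp: int  # seconds since the epoch, as reported by the device
    lat: float
    lon: float


def stitch(sightings: List[Sighting]) -> Dict[str, List[Sighting]]:
    """Group the reports by tag and put each group in chronological order."""
    tracks: Dict[str, List[Sighting]] = defaultdict(list)
    for s in sightings:
        tracks[s.tag_id].append(s)
    return {tag: sorted(group, key=lambda s: s.timestamp)
            for tag, group in tracks.items()}


# Reports arrive in whatever order the devices happen to send them.
reports = [
    Sighting("tram-7", 1200, 51.05, 3.72),
    Sighting("bus-3", 1100, 51.04, 3.70),
    Sighting("tram-7", 1000, 51.03, 3.71),
]
tracks = stitch(reports)
```

Each entry in `tracks` is the time-ordered trail of one tag, which is the raw material a live timetable would be drawn from.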
Data Modeling for the Masses
Tomorrow will be different. The great majority of tomorrow’s users will still be consuming data, rather than producing it, very much like most surfers read blogs and website content, while only a minority actually creates the content. But it’s not unthinkable that every single user and every single node in our worldwide network, may soon enough become a datasource, that may want to broadcast its datasets. With file sharing services in the cloud, it is already happening, small-scale; with smart hash-tagging and geolocation check-ins, on a somewhat bigger scale. But this personal data is still crude, raw material, barely structured, let alone easily searchable.
Should users remain for ever passive data consumers, sit and wait until someone develops a dedicated app for the creation and management of the specific kind of data they want to collect and share? What about cabinet-makers and book binders, what about the many wonderful artisans who collect data about their materials and techniques? What about historians and philologists struggling with home-brew Excel-based “databases” of archival records? Poets are no developers and they never will be. (Although code might be poetical.) And very likely, app developers do not even know what weird but fantastic data collectors are out there.
When a zoologist is laying out an inventory of submarine invertebrates, will we be putting him beside a consultant, forever? Or will we refer him to some generic XML schema for taxonomic classification? Let him struggle with an XSD editor, only to find out that his carefully crafted schema is useless unless he hires some IT professional after all to convert it into a database? Should he go and look for a UX designer, again, to create the front-end he actually needed? He’s running out of funds and his institution keeps asking why the heck he would not just use the in-house CMS – at best.
Say we want to create a website for amateur cooks: will we pick RecipeML as the data structure for our recipes, or REML, or what else? What if one particular attribute or entity we need for our app is missing from these “standards”? Should we divert and create our own? What if we could instead “fork” one?
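What “forking” a schema could mean is easiest to show with a toy example. The Python sketch below is an assumption on my part, not an existing tool: the `Schema` class is hypothetical, and the field names are invented rather than taken from RecipeML or REML.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Schema:
    name: str
    fields: Dict[str, str]             # field name -> datatype name
    parent: Optional["Schema"] = None  # the schema this one was forked from

    def fork(self, name: str,
             add: Optional[Dict[str, str]] = None,
             drop: Iterable[str] = ()) -> "Schema":
        """Return a new schema derived from this one; the original is untouched."""
        dropped = set(drop)
        merged = {k: v for k, v in self.fields.items() if k not in dropped}
        merged.update(add or {})
        return Schema(name=name, fields=merged, parent=self)


# A made-up base schema standing in for a published recipe "standard".
recipe = Schema("recipe", {"title": "text", "ingredients": "list", "steps": "list"})

# Our cooking site needs one attribute the base schema lacks.
spicy_recipe = recipe.fork("spicy-recipe", add={"spiciness": "integer"})
```

The fork keeps a pointer to its parent, so improvements could in principle travel in both directions, much like changes between forked code repositories.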
“Ordinary” users, I mean the big majority of us using computers, don’t know about object-oriented design or data bindings. But that doesn’t mean we’re ordinary. In fact, we may be highly trained specialists in our field, who just want to get things done instead of first retraining in another discipline. Users don’t know about database layout or schema design, they have never heard of ontologies, let alone that one might expect them to create and maintain some themselves. How can we ever hope to explain to users the subtle differences between the conceptual, the logical and the physical?
GUI vs API
Today’s IDEs and modelling tools are a graphical horror chamber. They seem to implement a one-to-one translation of programming objects into a cluttered canvas of wireframes and trees. Again, they’re about the APIs, and only programmers can understand these messy diagrams and flow charts. Instead of (or rather in addition to) APIs, we need truly visual tools. Tools that show the actual data, that visualise structures, proportion and relation, that reveal logic, causality and deduction. Tools that are designed, from the start, for a hassle-free experience in creating and maintaining highly complex data structures, without the technical complexity ever showing through. Tools that hide the architecture and the models of the implementation from the end user, and breathe with the transparency of live data. We need an application that leverages the knowledge of people who know the semantics of their collections and collectibles better than any information architect or database engineer ever will.
If – making a bit of a shortcut – our focus should be on the user interface (GUI) at least as much as on the programming interface (API), then perhaps we will have to do away with the several competing, self-proclaimed standards that have emerged in recent years amongst the aficionados of the semantic web. RDF, OWL and SPARQL are still academic experiments that lack wide support. Microformats never gained a foothold in the market. If browsers start supporting HTML5 semantic tags and web designers start using them, that’s because of marketing and SEO, not because users could do useful things with them. Perhaps we should build our own “standard”, from scratch. RESTful, of course. JSON or BSON, very likely. XML, in some use cases, undoubtedly, and traditional HTML for standards-compliant, general-purpose solutions. But apart from the output format, the underlying data model could be anything. However it is implemented, it will still have to be stored and managed in a (perhaps proprietary) format that syncs, that may or may not strictly support ACID, but that, foremost, needs to be flexible, and whose goal and ambition should be to allow the structuring, organisation and storage of all kinds of data, right out of the box.
Science, Technology, and the Liberal Arts
I strongly believe that the architectural backbone of such a meta-model for the organisation of knowledge (rather than data) should be built on traditional philosophical logic and linguistics, too, paired with the innovative ideas of computer science. As a joint effort of engineers, scholars of the humanities, graphic designers and data artists. The semantics of “ontology” are very different for computer scientists and philosophers. That’s a pity, and it need not be like that, if only these disciplines could bridge the digital divide that drove them apart. In the days of the hierarchical file system, XML was (co-)devised by a philologist; the age of distributed networks will once again need its Sperberg-McQueens. Some time, I hope to elaborate on these confusing insights, and explain more clearly that we might need an epistemological data infrastructure, rather than a semantic web.
I guess we will need to put some intelligence into the data model, and at least some conditional logic. Which in fact would imply that we are merging the data model with the object model, the data structure with the application logic. Impedance mismatch is not just a technical issue, it’s right at the conceptual core of the way we have been dealing with data management so far. NoSQL should not be a mind-fuck to programmers, it’s the natural thing to do, it’s how the human neural network organises its random access memory.
Suppose users were able to add, remove and rearrange attributes of the entities they collect, on the go, in a non-destructive and at all times backward-compatible fashion. Suppose they could share schemas and templates as easily as they share vCards, iCalendar snippets, .csv files or spreadsheets. Suppose not only the data within the templates were synced, but the schemas as well, and the very forms through which the data is entered and retrieved. Suppose people could collaborate on specific types of information, alone or with others, in small communities or on a massive scale, both on the data entry and on its structure, simultaneously. Then we would have a platform for an economy of knowledge.
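One way to read “non-destructive and backward-compatible” is that a schema change never touches the stored values: every version of the attribute list is kept, and a version is merely a way of looking at the same records. The Python sketch below illustrates that reading; the `Collection` class and its method names are hypothetical, not a description of any existing product.

```python
from typing import Any, Dict, List, Tuple


class Collection:
    """Records plus an append-only history of attribute lists."""

    def __init__(self, attributes: List[str]) -> None:
        self.versions: List[Tuple[str, ...]] = [tuple(attributes)]
        self.records: List[Dict[str, Any]] = []  # stored values are never deleted

    def add_attribute(self, name: str) -> None:
        # A schema change appends a new version; older versions stay readable.
        self.versions.append(self.versions[-1] + (name,))

    def remove_attribute(self, name: str) -> None:
        # Removal only hides the attribute in the new version; the data remains.
        self.versions.append(tuple(a for a in self.versions[-1] if a != name))

    def insert(self, **values: Any) -> None:
        self.records.append(dict(values))

    def view(self, version: int = -1) -> List[Dict[str, Any]]:
        """Show all records through one schema version (latest by default)."""
        attrs = self.versions[version]
        return [{a: r.get(a) for a in attrs} for r in self.records]


fossils = Collection(["name", "period"])
fossils.insert(name="Trilobite", period="Cambrian")
fossils.add_attribute("site")
fossils.insert(name="Ammonite", period="Jurassic", site="Lyme Regis")
fossils.remove_attribute("period")
```

Here `fossils.view()` shows the records with `name` and `site`, while `fossils.view(0)` still shows them with the original `name` and `period`, so anything built against the first version of the schema keeps working.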
Towards an Economy of Knowledge
People could sell their datasets, borrow or lend them, share their costs, benefit from each other’s investments. Collectors could collect, archivists could weed out, data miners could retrieve, visual creatives could mash up infographics and querying widgets. And they could all focus on their speciality. If the system knows the atomic datatypes out of which compound data structures are built up, and if logic is built right into the structures, then general-purpose programs and algorithms could be put to good use on all kinds of data. Users could invent such general-purpose algorithms, design visual templates, give them away or sell them. And still unprogrammable human intelligence could be offered as well, as services very much like Amazon’s Mechanical Turk, but empowered with highly structured atomic data.
If all this might become reality, sooner or later, then perhaps we should already start rethinking the role of apps, too, within such an environment. The App Store is gradually “redefining the web”. But what does that mean, from a business perspective? What does it imply for entrepreneurship? If we’re all going to jump on the bandwagon of the app hype, and give up the form factor of the website as the principal outlet for doing business with data, then perhaps we’re surrendering ourselves to a new platform war, in which we’ll be just the pawns of the big players, iOS, Android and whatever else. Whoever controls the Store will be the sole beneficiary of the financial trades happening inside, while app developers will be continuously confronted with outrageous fees, ever-changing EULAs and terms, and, not least, the decline in the perceived value of software and the devaluation of app prices. Users, on the other hand, get locked into a closed ecosystem where their data is obfuscated in an ungraspable cloud, entangled in several dozen proprietary storage formats. Back to square one, to the days before the internet.
Data, not Apps
Maybe it is all a bit too soon to worry about this. The most interesting things are still to come out of the app candy store. Apps that harvest, scrape and aggregate open data are only just beginning to appear. Impressive mashups and truly useful data magic are being released every month or week. But it’s all risky business, I think. Sure, those apps and services quickly draw the attention of short-term-return-on-investment capitalists. And the startups get tempted by the alluring amounts of easily available seed capital. But is this sustainable entrepreneurship? Can you really grow a business with a cosmetics bubble? What if the core feature gets implemented in the OS? You’re out of business. Most of these wonderful apps are not real products, they’re commodities. The real product here is the device, the OS, the network, the hardware infrastructure. Manufacturers, network providers and energy suppliers are the sole beneficiaries of our present information society. Instead of an economy of knowledge, today’s internet is a giant supermarket of data. Users are nothing but consumers, strolling in the great shopping mall of the web, while app developers are the obedient servants of the internet’s Walmarts: they are the brands that fill the shelves and lure consumers inside.
If open data is our trade, and open knowledge is the mission, then obviously data should be at the forefront of our products and tools. Not apps. Let users provide the data, their data, their shared data. Let them manage their content, collaborate, and publish as they like and give them access at all times, the way they want. A platform for open data management should provide the means for this: storage, synchronisation, access permissions, security, social collaboration, editing, sharing, publishing…
An Open Call
I have some ideas about what such a platform might look like, and I believe a prototype may be built with technologies that come right off the shelf. We’re past the mockups, I’m through with diagrams. It’s time to build. This is an open call: I would love to see people come by and say “Hello, nice to meet you too!”