Some questions about EPUB, WordPress, tools

I have a couple of questions for discussion in this jiscPUB project; please, any and all of you, use the comments!

If you publish EPUBs now, what tools do you use?

I asked jiscPUB team member Liza Daly via email what she uses to make EPUBs, and she said asciidoc.

Asciidoc lets you create documents in your text editor of choice using one of a family of lightweight wiki-style text formatting languages. Unlike wiki formats, though, asciidoc is designed to create richly structured documents, as discussed on this page. This post from an O’Reilly author explains how it can be used to create multiple output files from one source. I’ll do a post on how these tools work with EPUB.

Now, I am interested: who uses what?

  • Anyone else use asciidoc?
  • Are there pandoc users reading this? Bruce D’Arcus, have you made EPUB with it? I tried, but it does not support intra-document links.
  • Are some of you hand-crafting HTML like Mark Pilgrim, then feeding it through something like Calibre?
  • Anyone use their word processor to make HTML and get EPUB from that?

(And just on the off chance, has anyone done a pandoc/markdown to asciidoc converter?)

What’s considered best practice for EPUBs?

I have been making EPUBs by feeding things through various processors. Different tools apply different levels of styling by default.

What’s best practice, in terms of what level of CSS styling to put in and so on? The top hit I got on Google for this was an Adobe page from 2008 that didn’t actually tell me anything useful.

I think that when we’re talking about word processing documents being transformed for the web, what often works best is to have consistent styling for headings and plain paragraphs, but authors do need some control over what goes on in tables, for example. This will require some figuring out for EPUB; I know the team at USQ had problems with large and complex tables in their testing with USQ courseware, mainly using iOS devices.

JISC project people: What do you have to do to get your reports up in JISCPress?

JISCPress is a site where a variety of project output documents can be annotated by the community. It uses the digress.it comment system to allow paragraph-level annotation. It says on the site: ‘We are currently operating JISCPress on a trial basis, with a view to making it a fully fledged JISC service if the trial goes well.’

I wondered if anyone reading this has used it, and what the experience of contributing to it is like. This is relevant both to this project and to potential future explorations of how something like JISCPress might work in an environment where some people are commenting on documents using ebook reader software and some using the plain old web, with some way of aggregating comments from both.

When I called for sample documents for this project, Owen Stephens (@ostephens) sent me a test document; I am still working on making a nice EPUB out of it, fiddling with the tool as I go. He tells me it was ‘converted by hand’ to go on this site, which is not quite like JISCPress but does allow comments.

Anyway, I am wondering:

  • How much effort are people putting into getting JISC project outcome documents on the web?
  • I know there are templates for JISC reports, which seem pretty light and simple, but what about JISC deliverables, like toolkit documents?
  • Assuming most of this kind of output is written in Word or other word processors, would people be interested in a template (and tools) that had:
    • Embedded metadata that could be used by machines to process documents.
    • A way to preview your work quickly and easily to make sure that the final output is going to be OK.
    • Enough styling cues to create good web pages, and maybe ebooks, via automated uploads.

There’s a trade-off here between having something that’s easy for authors to use, like treating the word processor like a typewriter (which is usually more costly in the long run), and getting people to invest in learning tools.

Comments?

Copyright Peter Sefton, 2011-04-12. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

This post was written in OpenOffice.org, using templates and tools provided by the Integrated Content Environment project.

Posted in Uncategorized, Workpackage 3 | 8 Comments

Metadata in word processing monographs

[Reposted with spell checking a few minutes after the original post; I had posted a draft by mistake.]

[And shortly after that added missing bibliography]

Introduction: why worry about metadata?

I have been working on a simple service to take word processing documents (Word and OpenOffice.org, mainly) and create mobile-readable EPUBs from them. One of the issues in this process is metadata: how do we get quality metadata into the EPUB format?

EPUB readers, like music applications, use metadata to provide browse and search access to content.

Illustration 1: Calibre’s metadata-driven management interface

Obviously, for books to be useful to readers, and to store-owners, publishers and repositories, metadata is an issue.

But it’s not just for ebook delivery that this is an issue. A thesis has to be submitted for examination, and sent to an institutional repository, and maybe to a discipline repository or a publisher. And papers are often submitted to multiple sites over their lives: conference management systems, journal management systems, repositories and so on. The current state of scholarship is that every time you make such a submission you have to re-enter metadata. Upload a paper to a conference site, and chances are you will have to enter the author names into a form, even if they are already on the paper. Not to mention that every time you type in a name, you are generating low-quality, string-based, non-linked data. Some of us think there is a slow revolution happening in metadata, using URIs and making links.

So one of the things I would like to consider for this project is how to embed metadata within documents so that the various applications that process them can do all the hard work. And I want to think not just about strings but high-quality linked-data metadata. To discuss this I will work through one of the use cases for the jiscPUB project and look at the life-cycle of a thesis.

Thesis workflow

The aspect of workflow we’re interested in here is that:

  1. If the candidate is lucky the university or supervisor provides them with a template for writing up their thesis.
  2. The candidate writes up the thesis and sends it to their supervisor and possibly other reviewers during this process.
  3. Depending on the quality of the template, there is work to do for submission: generating tables of contents, maybe making PDF files and, probably, in future, making web and mobile-ready versions.
  4. Someone deposits the thesis file into (at least) the repository at the university, maybe also other databases, entering metadata about it who knows how many times.
  5. Also, in future, making sure that the provenance for all claims is available via data that is linked to or bundled with the thesis. (Out of scope for this post, but I will come back to it.)

In this post I am going to look at steps 1-4 above, considering how template design might help in preparing a thesis for mobile delivery. I’ve been thinking for a few years now that the university should not just provide a template but pre-fill as much of it as possible with machine-readable metadata. And note that there’s probably a much more compelling case for machine-readable metadata in articles, which tend to be submitted to more places.

Thesis metadata

The University of Edinburgh, host of this jiscPUB project via EDINA, has a Word template for PhD theses on its wiki. I showed in the last post that if you feed that template, sans any content, through the experimental Word to EPUB converter I’ve been working on, it more or less worked, but without very much metadata (it was also dropping heading numbering, which I have now, sort-of, kinda, fixed).

To add the metadata that should be in the EPUB, you would have to type it in somewhere. Either I could add fields to the conversion service, or you could use something like Calibre, but the thing is, most of the metadata you need is already in the document; it’s just not marked as such. The title page has the title, the author’s name (in AUTHOR style), and the name of the institution, degree and date in the footer.

Illustration 2: Thesis metadata is there, in the text, just not marked as such

So, given that this metadata is all there, it should be possible to mark it in such a way that downstream processing systems can recognise it. One of the best places to start is with the document metadata fields. The Edinburgh template does use document metadata for the title.

Illustration 3: Document metadata in Word 2010

But it could go one step further, and instead of requiring the author to enter the same thing in two places, use a field to show the title on the title page. In Word 2010 the field function is hiding in the Ribbon under Insert, Quick Parts.

Illustration 4: Adding a field so the title entered in the document metadata can be placed on the title page without re-typing.

Now the title is linked to the document properties, and any application, such as a search engine, can extract that metadata. But there is a cost: you have to be able to explain to your authors that they need to set the title in the properties, and how to do it in the different word processing applications they’re using.

The same thing works for the author field as well. That’s OK for theses, but it is less useful for other kinds of scholarly content where there are often many authors. Word 2010 supports multiple authors in its metadata, but its fields don’t: all you can get using a field is a semicolon-separated list of authors, which is not useful for laying out the content. An approach I think is useful for scholarly templates in general is to embed the metadata in-line.
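Reading the document properties back out is straightforward for a machine, which is the point of using them. Here is a minimal Python sketch, assuming a .docx file (Word 2007 and later): the file is a zip archive, and its document properties are stored as Dublin Core elements in docProps/core.xml. The function name is my own, and splitting the author property on semicolons is an assumption based on how Word presents multiple authors; this is not ICE code.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for title and creator in docProps/core.xml.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def read_core_metadata(docx_bytes: bytes) -> dict:
    """Return the title and author list stored in a .docx file's document properties."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    title = root.findtext("dc:title", default="", namespaces=NS).strip()
    creator = root.findtext("dc:creator", default="", namespaces=NS)
    # Assumption: several authors arrive as one semicolon-separated string.
    authors = [name.strip() for name in creator.split(";") if name.strip()]
    return {"title": title, "authors": authors}


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(read_core_metadata(f.read()))
```

Splitting the string recovers the individual names, but they are still just strings, which is the limitation discussed below.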

Some colleagues and I wrote up some of the approaches for embedding inline metadata for the International Journal of Digital Curation [1]. The short version is that the most reliable cross-platform way of adding semantics like metadata in-line to documents is to use styles or, a new technique I have been developing since that work, links. Both styles and links are supported by major word processors, so they tend to survive being loaded into different word processors or different versions of the same word processor. I will give examples of both approaches here.

Styles are fiddly to apply if you are expecting people to manage the process for themselves, but in the case of a template like this one for theses they should be robust enough: thesis candidates are not going to be changing the title page except to fill in their details. Even better, why doesn’t the university do it for the candidate? I’ll come back to this idea. Using tables for metadata, like the one at the top of this document, is also a reliable approach: the metadata can be identified using a style, or just by the text in a cell adjacent to each metadata item.
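The table approach is simple enough to sketch in Python. This assumes the table has already been pulled out of the document as rows of cell text (how that happens depends on the word processor format and is not shown); a label such as ‘Author:’ in one cell identifies the value in the cell next to it. The function and its label normalisation are illustrative, not ICE’s code.

```python
from typing import Dict, List


def table_metadata(rows: List[List[str]]) -> Dict[str, str]:
    """Read label/value pairs from a metadata table.

    The first cell of each row is treated as the label (e.g. "Author:") and the
    adjacent cell as its value; rows with fewer than two cells are skipped.
    """
    metadata: Dict[str, str] = {}
    for row in rows:
        if len(row) < 2:
            continue  # not a label/value row
        # Normalise "Author:" to "author" so labels match regardless of punctuation.
        label = row[0].strip().rstrip(":").strip().lower()
        if label:
            metadata[label] = row[1].strip()
    return metadata
```

For example, the rows `[["Author:", "Name of Author"], ["Date:", "2011-05-01"]]` give `{"author": "Name of Author", "date": "2011-05-01"}`.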

So to demonstrate the use of styles for metadata in the Edinburgh thesis template, I:

  1. Used style p-meta-author instead of AUTHOR so the ICE conversion system would recognise it.

    Illustration 5: Applying the style p-meta-author to the author name in the template. This dialogue box is a bit hard to find; good luck.

  2. Added an inline/character style for the date, i-meta-date.

    Illustration 6: The inline style for the date, i-meta-date. It has no special formatting.

Getting both of these to work required a bit of hacking on ICE itself, as this metadata handling was only partially implemented.

The result is that both author and date are now included in the metadata for the EPUB file.
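The gist of style-based recognition fits in a few lines of Python. This is not ICE’s implementation: it assumes the document has already been flattened into a hypothetical list of (style name, text) pairs, one per paragraph or inline run, and simply looks for the two style names used above.

```python
from typing import Dict, Iterable, List, Tuple

# Style names from the Edinburgh template demo, mapped to metadata fields.
STYLE_TO_FIELD = {
    "p-meta-author": "author",  # paragraph style on the title page
    "i-meta-date": "date",      # inline (character) style with no special formatting
}


def extract_style_metadata(runs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the text of every paragraph or run that carries a known metadata style."""
    metadata: Dict[str, List[str]] = {}
    for style, text in runs:
        field = STYLE_TO_FIELD.get(style)
        if field and text.strip():
            metadata.setdefault(field, []).append(text.strip())
    return metadata


# Hypothetical input: (style name, text) pairs from a filled-in title page.
title_page = [
    ("Title", "Title of Thesis"),
    ("p-meta-author", "Name of Author"),
    ("Normal", "Submitted on "),
    ("i-meta-date", "2011-05-01"),
]
print(extract_style_metadata(title_page))
# {'author': ['Name of Author'], 'date': ['2011-05-01']}
```

If an author swaps in a differently named style (the template’s original AUTHOR, say), nothing is found and no error is raised, which is exactly the fragility of this approach.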

There is a problem with this approach, though: it does not give us very high quality metadata in a linked-data sense. The author name is just a string, which, as we know, is not a good way to uniquely identify an author. More than one person might be identified by a string, and more than one string often identifies an author [2]. It would be much better if we could give the author an HTTP URI, that is, name them using a URL that will be stable and unambiguous whether they are called ‘Name of Author’ or ‘Author, N’, or change their name to ‘Nom de Plume’, which might occur as a string like ‘de Plume, N’ or many other variants.

There’s a big project coming, ORCID, which will aim to give researchers URIs, but a university could easily give each candidate a URI now, and match up with ORCID later.

I have included a demonstration of how to identify a party, the Publisher, using a URI. Here’s a walk-through of a possible technique for including URIs for metadata in a template. Remember, only the template designer has to do this, not the poor candidate. And if we wanted to use this technique for personal names we could automate it and use a university-assigned URI for each candidate:

  1. I chose a URI for the university: http://www.ed.ac.uk/ . Just using that as a link does not amount to metadata though. Instead I,
  2. Visited http://www.ed.ac.uk/ – which redirects to http://www.ed.ac.uk/home
  3. Clicked my Publisherize.me bookmarklet.
  4. Copied the resulting link, which encodes an RDF statement (a triple), and wrapped it around the text in the template.

    http://ontologize.me/?tl_p=http://purl.org/dc/terms/publisher&triplink=http://purl.org/triplink/v/0.1&tl_o=http://www.ed.ac.uk/home

  5. Now, when documents using that template are fed through ICE, including the word-processing-to-EPUB service I have been prototyping, ICE recognises the metadata and extracts it into a data structure so that it can be passed on to Calibre, which makes the EPUB.

    ebook-convert … --title "Title of Thesis" --authors "Author-name" --publisher "The University of Edinburgh (http://www.ed.ac.uk/home)" --pubdate "2011-05-01"

    But wait, there’s more! ICE also embeds the metadata in the HTML it produces, like so (I did edit out some cruft that it should not be producing):

    <span rel="http://purl.org/dc/elements/1.1/publisher" resource="http://www.ed.ac.uk/home">
      <span property="http://xmlns.com/foaf/0.1/name" resource="http://www.ed.ac.uk/home">
        <a href="http://ontologize.me/?tl_p=http://purl.org/dc/terms/publisher&amp;triplink=http://purl.org/triplink/v/0.1&amp;tl_o=http://www.ed.ac.uk/home">The University of Edinburgh</a>
      </span>
    </span>

    This is intended to be compatible with RDFa 1.1, and this approach for embedding metadata in scholarly documents is one of the approaches we’re promoting in the nascent Scholarly HTML movement.
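Any tool that sees a triplink can recover the statement from the URL alone. The Python sketch below reads tl_p as the predicate and tl_o as the object, which is my reading of the example link above; the subject is assumed to be the document itself, so it is not in the URL. The URL is expected as it appears after HTML unescaping (plain `&`, not `&amp;`), and the RDFa it emits is a simplified single span, not the nested markup ICE produces.

```python
from html import escape
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Version marker taken from the example triplink above.
TRIPLINK_VERSION = "http://purl.org/triplink/v/0.1"


def parse_triplink(url: str) -> Optional[Tuple[str, str]]:
    """Return the (predicate, object) pair encoded in a triplink, or None."""
    params = parse_qs(urlsplit(url).query)
    if params.get("triplink") != [TRIPLINK_VERSION]:
        return None  # an ordinary link, not a metadata statement
    predicate = params.get("tl_p", [""])[0]
    obj = params.get("tl_o", [""])[0]
    if not predicate or not obj:
        return None
    return predicate, obj


def to_rdfa(url: str, link_text: str) -> str:
    """Wrap the link text in a simplified RDFa span expressing the same statement."""
    triple = parse_triplink(url)
    if triple is None:
        return escape(link_text)
    predicate, obj = triple
    return (f'<span rel="{escape(predicate)}" resource="{escape(obj)}">'
            f"{escape(link_text)}</span>")
```

Because the statement travels inside an ordinary hyperlink, it survives anywhere links survive, which is why I think this approach is potentially quite robust.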

Summary

In this post I have looked at three ways to embed metadata in a word processing document so that, when people use the template, the metadata that they (or the template designer) enter can be machine-processed from then on.

  1. Using the metadata fields in the document: good for very basic metadata like titles, but limited and not particularly interoperable for other kinds of metadata.
  2. Using styles: flexible but fragile, and requires that each processing system knows about the styles you are using.
  3. Using my proposed way of encoding linked-data metadata statements in links (triplinks), as seen on my demo site: http://ontologize.me. This is potentially quite robust, and could be supported by tool-chains that are much easier to use than the current half-baked infrastructure provided by yours truly.

Here’s a final screenshot showing how the embedded metadata has made its way from the sample template using those three methods to the EPUB metadata, as seen in the Firefox EPUB plugin.

Illustration 7: Metadata from the thesis template demo, in the Firefox EPUB plugin.

All three of these require that software systems know how to find and process the metadata. What we’re trying to achieve over at the Scholarly HTML site (when I get time to add pages on conventions for encoding metadata) is to document common ways of doing this so that tool-builders can create interoperable systems.

To try this out for yourself:

  1. Go here in your browser: http://ec2-50-16-170-243.compute-1.amazonaws.com/api/convert/doc
  2. Either:

The default check-boxes at that service will make you an EPUB. If you don’t have an EPUB reader, you can change the file extension to .zip, open it up and have a look. If you do have a reader, you’ll see something like this:

Illustration 9: Test thesis template in Adobe Digital Editions; note that the title and author have been automatically extracted from the Word document.
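You don’t actually need to rename the file to look inside: an EPUB is a zip archive, so code can open it directly. This Python sketch follows the EPUB container layout, where META-INF/container.xml points at the package (.opf) file that holds the Dublin Core metadata. The function name and the choice of fields are mine, and it ignores edge cases such as multiple rootfiles.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf", "dc": "http://purl.org/dc/elements/1.1/"}


def epub_metadata(epub) -> dict:
    """Return Dublin Core title, creator, publisher and date values from an EPUB.

    `epub` may be a file path or the raw bytes of the .epub (zip) file.
    """
    if isinstance(epub, (bytes, bytearray)):
        epub = io.BytesIO(epub)
    with zipfile.ZipFile(epub) as zf:
        # META-INF/container.xml points at the package (.opf) file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    metadata = opf.find("opf:metadata", OPF_NS)

    def values(tag: str) -> list:
        return [e.text.strip() for e in metadata.findall(f"dc:{tag}", OPF_NS) if e.text]

    return {tag: values(tag) for tag in ("title", "creator", "publisher", "date")}


if __name__ == "__main__":
    import sys

    print(epub_metadata(sys.argv[1]))
```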

Where now?

There’s potential here to test some of this stuff out with the folks who support thesis candidates and their supervisors, or in journal templates.

  • I will keep working on the Edinburgh template to show how we might add to it in ways that increase the utility of the documents it produces, by making it easier to build ebooks. My thinking is to provide demos of what can be done in Word and OpenOffice.org/LibreOffice, both using generic styles and, for people prepared to invest a little more time, using the ICE styles.
  • I’d love to do something with a repository: I’m thinking that it would be great to deposit theses in EPUB format and have the repository provide a web-based reader, along the lines of IbisReader, which Liza Daly and company created. I’m looking at you, Eprints! Eprints already almost supports this: if you upload a zip file, it will stash all the parts for you in a single record. All we would need is something like this little reader my colleagues at USQ made. It would just be a matter of transforming the EPUB TOC into JSON and loading the JavaScript into an Eprints page.
  • There are improvements to be made to ICE: currently the style-based metadata does not produce Scholarly HTML / RDFa output, and it is in a separate part of the code from the link-based metadata; these could be brought together.
  • Is it worth adding Scholarly HTML / RDFa metadata support to Calibre so it can auto-detect metadata in HTML input?
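The ‘EPUB TOC into JSON’ step could look something like this Python sketch, assuming an EPUB 2 book whose table of contents is an NCX file (toc.ncx). The JSON shape (label, src, children) is my own guess at something a JavaScript reader could walk; the USQ reader may well expect something different.

```python
import json
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def nav_points_to_list(parent) -> list:
    """Recursively convert the navPoint children of `parent` into plain dicts."""
    entries = []
    for point in parent.findall("ncx:navPoint", NCX_NS):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        entries.append({
            "label": label.strip(),
            "src": content.get("src") if content is not None else None,
            "children": nav_points_to_list(point),  # nested sections
        })
    return entries


def ncx_to_json(ncx_xml: bytes) -> str:
    """Turn the navMap of an NCX document into a JSON tree of TOC entries."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("ncx:navMap", NCX_NS)
    toc = nav_points_to_list(nav_map) if nav_map is not None else []
    return json.dumps(toc, indent=2)
```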

Longer term I would like to see:

  • A properly resourced end-to-end thesis project looking at how an institution could provide technical resources to candidates and supervisors, from templates to a content, data and annotation management system. I will be showing a demo service for some of this later in the project, but at the moment the demos are just toys; we need some real users and some institutional commitment to trying this stuff out.
  • A journal and conference paper service where authors can write once and then submit to multiple journals. This idea comes from Timo Hannay, whom I met when I was in the UK. He’s worked with Nature, where there is a 95%-ish rejection rate, so a service that could automatically re-work your document and submit it for you would be really useful. It also sounds a bit like the Repository Junction project that Theo Andrew is involved in.

1. Sefton P, Barnes I, Ward R, Downing J. Embedding Metadata and Other Semantics in Word Processing Documents. International Journal of Digital Curation. 2009;4(2). Available at: http://www.ijdc.net/index.php/ijdc/article/view/121. Accessed October 22, 2009.

2. Salo D. Name Authority Control in Institutional Repositories. Cataloging & Classification Quarterly. 2009;47(3-4):249-261. Accessed September 9, 2009.

 

Copyright Peter Sefton, 2011. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

Metadata in word processing monographs

Posted a draft version of this document by mistake. This post has been replaced by a new version http://jiscpub.blogs.edina.ac.uk/2011/04/11/metadata-in-word-processing-monographs-2/

Posted in Uncategorized, Workpackage 3 | Comments Off on Metadata in word processing monographs

EPub for word processing users

Author: Peter Sefton <pt@ptsefton.com>
Date: Time-stamp: <2011-04-05 15:38:38>
Description: First informal report on progress with Workpackage 3. Looks at tools for turning word processing documents such as Microsoft Word documents into EPUB.

Summary

Last week I started on Workpackage 3 of the jiscPUB project. My role on this project is to be the Wordprocessing Tool Expert.

I started by considering the first use case: seeing how a PhD thesis could be converted to EPUB for mobile use. I happen to have a very worthy test file in the form of Danny Kingsley’s Creative Commons licensed thesis from 2008, The effect of scholarly communication practices on engagement with open access: Australian study of three disciplines.

I am working on three kinds of deliverable:

  • Blog reports. This is the first.
  • Demonstration software. I have produced a simple demo that can convert Word documents into EPUB files; you can try it out, assuming you can work out how, or read on for more background about the service, how raw it is, excuses about its somewhat agricultural interface and the fact that it might not work very well yet.
  • Demonstration files showing the results of various processes. I have set up a dropbox.com folder for the project team, where I will be keeping records of the experiments I’m doing with various bits of software. This will be available as a data set at the end of the project. One demo is this post, formatted as an EPUB file using the tool I discuss below. I tried to upload it to WordPress, but it does not meet security requirements; you can grab the post in EPUB format from here. Comments welcome; this is a test process only.

Initial impressions

I started this investigation by assuming that I had a thesis I wanted to publish as an ebook via EPUB, and doing various Google searches for things like ‘How to make an EPUB’. Theo Andrew is going to talk to some real users about this soon, so we’ll find out a lot more about what kinds of assumptions are valid to make about our users, graduate students and academics.

There are some sites which review software, notably Jedisaber’s, which has reviews of many software packages related to EPUB and ebooks in general, and a how-to on making an EPUB from scratch, by hand. I have been trying out various bits of software against the use cases for this project without a great deal of success; I will document this behind the scenes and save input and output documents as part of the data set for this project.

As it says in the plan for this workpackage, the most obvious bit of software to try out is Calibre, which is a mature open source ebook management application available for all the major platforms. Calibre is a very feature-rich application with a somewhat quirky interface, and it doesn’t do the one thing I wanted for my first experiment: it doesn’t convert Microsoft Word .doc files into .epub format. Yes, you can do it by doing a ‘Save as HTML’, but that’s not the ideal process for casual users. Calibre does handle OpenDocument format (.odt) files, though, so I tried that with Danny’s thesis, using OpenOffice.org to save it as .odt after some minor tweaking. I found that:

  • On large files like the thesis it takes around 100-200 minutes to convert the document, in the background, using the GUI on my modest desktop PC.
  • Some graphics are not supported, for example embedded vector drawings. I don’t think there’s a good solution for this that does not involve firing up a word processor to render some things; I will come back to this in a future post on the current (terrible) state of the art in word processor to HTML conversion, and what could be done about it.

(I am wondering if the speed of Calibre is the reason I never heard back from the odt2ebook.com site, which offered free conversions; there was supposed to be an email notification but it has been a few days.)

Demonstration software

After about half a day of investigation, it became clear that there was a big crater in the ebook software landscape: there is no obvious way to make an ebook from a Word document. Yes, there are some word processing packages which will do it, but nothing simple or online. Liza Daly, the ebook expert on this project, confirmed this, so I set out to at least try to fill that gap. My starting point was the ICE software I established at the University of Southern Queensland, where I worked until very recently. Over the years this has evolved into a very capable word processing format converter for the web. It is mainly designed to work with documents in a known input format, using its own set of styles, but it can also deal with generic documents with standard headings. It already had some EPUB export, but only for collections of documents, not for a single word processing file via a simple web service.

The idea was to use ICE to generate HTML, then write code to break it up and store it in a zip file, EPUB style. ICE does a reasonable job of converting the test thesis to HTML by looking for standard heading styles (Heading 1 .. n) and guessing that ‘Quote’ is a style for quotations. There are no widely used standards for word processing styles, so some heuristics are required; the current ICE code does not do a lot of this, but it could be extended.
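Heuristics like these could be extended along the lines of the Python sketch below. It is my own illustration, not ICE code: it maps ‘Heading 1’ to ‘Heading n’ onto <h1> to <h6> (HTML stops at six levels, so deeper levels are clamped) and treats ‘Quote’ and ‘Quotations’ as block quotations; including ‘Quotations’, which I understand to be the OpenOffice.org name, is an assumption.

```python
import re

# Matches "Heading 1", "heading 2", "Heading3" and so on.
HEADING_STYLE = re.compile(r"^heading\s*(\d+)$", re.IGNORECASE)

# Style names guessed to mean block quotations (assumed list; extend as needed).
QUOTE_STYLES = {"quote", "quotations"}


def element_for_style(style_name: str) -> str:
    """Guess an HTML element for a word processing paragraph style name."""
    name = style_name.strip()
    match = HEADING_STYLE.match(name)
    if match:
        level = int(match.group(1))
        # HTML only has h1..h6, so clamp deeper (or zero) outline levels.
        return f"h{min(max(level, 1), 6)}"
    if name.lower() in QUOTE_STYLES:
        return "blockquote"
    return "p"  # anything unrecognised becomes a plain paragraph
```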

What I did was:

  • Looked at building on work already in ICE and other USQ software that makes EPUB files. I got something running, but in the process of developing it, I discovered the command line features in Calibre were probably a better option.
  • Set up a possibly temporary code repository at Google Code and checked in the latest trunk of ICE.
  • Added simple code to the ICE conversion service to call Calibre on the ICE-generated HTML. The command currently looks like this:

    ebook-convert "/tmp/ICE-u6hayU.ice/Kingsley-Formatted PhD 12May09.htm" "/tmp/ICE-u6hayU.ice/Kingsley-Formatted PhD 12May09.epub" --chapter "//*[name()='h1']" --level1-toc "//*[name()='h1']" --level2-toc "//*[name()='h2']" --title "The effect of scholarly communication practices on engagement " --authors "" --publisher "" --pubdate ""
  • Using Word docs there are some problems, like the way the title is truncated above, and other metadata is missing. I’ll come back to metadata in a future post; making sure that works are correctly described and labelled is an important dimension of scholarship. I see metadata as a gateway feature (as in gateway drug) for improving the semantic content of documents in general, and I think once people see the power of embedding metadata semantics in their documents, so they don’t have to retype stuff all the time, they’ll be ready to deal with citations, rhetorical structure and domain-specific semantics.
  • Set up a test/demo server in Amazon’s cloud services. Here’s Danny’s thesis as rendered by the service. The input document was a Word document and the output is an EPUB file. It takes about ten minutes to create.
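For anyone wanting to script the same call, here is a minimal Python sketch that builds the ebook-convert command above as an argument list. The options are exactly those in the command; the helper names are hypothetical and this is not the ICE service’s code. Passing a list to subprocess, rather than a shell string, avoids having to quote the spaces in file names like the thesis’s.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def ebook_convert_args(html_path: str, title: str = "", authors: str = "",
                       publisher: str = "", pubdate: str = "") -> List[str]:
    """Build the ebook-convert argument list for one ICE-generated HTML file."""
    epub_path = str(Path(html_path).with_suffix(".epub"))
    return [
        "ebook-convert", html_path, epub_path,
        # Start a new chapter at each <h1>; build a two-level TOC from <h1>/<h2>.
        "--chapter", "//*[name()='h1']",
        "--level1-toc", "//*[name()='h1']",
        "--level2-toc", "//*[name()='h2']",
        # Metadata options; empty strings are passed through, as in the command above.
        "--title", title,
        "--authors", authors,
        "--publisher", publisher,
        "--pubdate", pubdate,
    ]


def convert(html_path: str, **metadata: str) -> None:
    """Print the equivalent shell command, then run it (needs Calibre on the PATH)."""
    args = ebook_convert_args(html_path, **metadata)
    print(shlex.join(args))
    subprocess.run(args, check=True)
```

Calling `convert("/tmp/example/thesis.htm", title="Title of Thesis")` would print the command and then run Calibre on that (hypothetical) file.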

The test service uses ICE to convert .doc and .odt documents to HTML; you can feed it generic documents using Heading styles and it will do its best. ICE deals with all sorts of images, maths and so on, and it does a better job than most processors because it uses the OpenOffice.org word processor’s rendering engine to create images; most tools can’t deal with some kinds of content because they can’t render them. Then the service calls Calibre’s HTML to EPUB conversion tools. To start with, I coded this to treat <h1> elements in the HTML as the major sections in the document. This contrasts with Calibre’s own approach of generating HTML directly from the .odt file, which means that lots of graphics can’t be rendered at all.
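The ‘treat <h1> elements as major sections’ rule is easy to sketch, and doing so shows one plausible source of the ‘unnamed’ first section listed under the known issues below: whatever comes before the first <h1> (a title page, say) has no heading to name it. This Python sketch works on a hypothetical flattened list of (tag, text) blocks rather than real HTML, and is not Calibre’s or ICE’s code.

```python
from typing import List, Optional, Tuple

Block = Tuple[str, str]  # (tag name, text): a hypothetical flattened view of the HTML


def split_chapters(blocks: List[Block]) -> List[Tuple[Optional[str], List[Block]]]:
    """Group blocks into chapters, starting a new chapter at every h1.

    Content before the first h1 ends up in a chapter whose title is None.
    """
    chapters: List[Tuple[Optional[str], List[Block]]] = []
    title: Optional[str] = None
    current: List[Block] = []
    for tag, text in blocks:
        if tag == "h1":
            # Close the chapter in progress (including untitled leading content).
            if current or title is not None:
                chapters.append((title, current))
            title, current = text, []
        else:
            current.append((tag, text))
    if current or title is not None:
        chapters.append((title, current))
    return chapters
```

With a title-page paragraph before the first heading, the first chapter comes back with a title of None, which a TOC builder then has to label somehow.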

Status:

  • Alpha code only.
  • Might break.
  • Slow (circa 5 minutes for a big document, though, not 100-ish)
  • I will endeavour to keep this up and running for the life of this project.

Later in the project I plan to work with Theo and his focus group users to see whether the ICE approach for styles is viable for thesis production and for the various other use cases. Other questions include:

  1. Should long documents such as theses be managed as multiple parts or single documents?
  2. What’s the demand for being able to integrate other kinds of resources (spreadsheets, images, other data like chemistry) and are there viable ways to embed this stuff in EPUB documents? The question here is whether EPUB is suitable for the ‘research object of the future’ where publications are not just documents, but need to be embedded in or carry-around more of their context.
  3. Should we be thinking about EPUB converters as part of repository deposit processes?

Known issues with the ICE/EPUB generator:

  • Sometimes the first section of the document appears in the TOC as ‘unnamed’. I don’t want to resort to hacks like adding a heading called ‘frontmatter’ to the document, but I might.
  • I have been focussing initial testing on the Firefox addon for EPUB: https://addons.mozilla.org/en-us/firefox/addon/epubreader/; more testing and validation is required (this goes for all the tools I have been looking at).

Where next?

Potential next steps include:

  • Improving and glamorising the test conversion service.
  • Creating a new Calibre plugin to use ICE as an HTML generator; this could be used to add Word support, and potentially improve OpenDocument support.

Lessons learned

Even if you start from good quality HTML, writing an automated tool for creating EPUB is likely to be quite a big job. Calibre already does it, has lots of configuration options and seems to be a good starting point for a conversion tool, though I’m interested in comments about this. A couple of days of my own coding didn’t get me close to what I could do with Calibre in about an hour of learning which command-line options to use (there were several hours spent refining the initial options, though). And it’s not just me: other staff at USQ have spent a fair bit of time getting basic EPUB support working. The one thing that makes me think it might be worth tackling is the glacial speed of Calibre, but that might just be because Calibre is doing all sorts of important things, and a new converter could be just as slow.

Copyright Peter Sefton, 2011. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/> Published by http://ptsefton.com

This post was written in OpenOffice.org, using templates and tools provided by the Integrated Content Environment project and published to WordPress.

Posted in Uncategorized, Workpackage 3 | 7 Comments

Workplan 2: Draft Table of Contents

Technical Landscape Exemplars and Recommendations

Author: Liza Daly <liza@threepress.org>
Date: Time-stamp: <2011-03-25 10:25:49 liza>
Description: Draft of technical landscape paper for JISC

Executive summary

Introduction

  1. Audience for this work
  2. Discussion of focus
  3. Why digital publishing now?

Note

"Why digital publishing now" will include a graph of digital book sales showing hockey-stick growth

Historical perspectives

Structured markup and academic authoring

  1. TeX and LaTeX
  2. SGML and other hypertext

Early digital book marketplaces and failures

  1. Format proliferation
  2. Lack of distribution mechanism
  3. Form-factor issues (displays and size)

Digital book landscape today

Marketplace successes

  1. Introduction of Kindle
  2. Formalization of EPUB/OEB
  3. Digital distribution precedents (other media)
  4. Proliferation of mobile devices

Note

To include sales data, trends, and international growth

Unexpected outcomes

  1. Sharp decline in retail print book business
  2. Migration of authors towards self-publishing or digital-first publishing
  3. Scramble to allocate rights from out-of-print/pre-digital works
  4. Decline in quality of product due to outsourcing and inexperience

Hardware considerations

  1. Screens: electronic ink and high-resolution displays
  2. Form factors: laptops, tablets, phones
  3. Wireless connectivity and purchase

Ebook format technical characteristics

  1. Page-based formats
      1. PDF
      2. Accessibility implications
      3. Device suitability
      4. Authoring tools
      5. Position in the academic marketplace
  2. Reflowable formats
      1. Mobipocket/AZW
      2. EPUB 2
      3. Accessibility implications
      4. Device suitability
      5. Authoring tools
      6. Internationalization and localization

Note

Charts comparing formats (PDF, Mobi, EPUB) appropriate here

  3. Future formats
      1. EPUB 3
      2. HTML5
      3. Scholarly HTML
  4. Academic considerations
      1. MathML and SVG
      2. Monograph content
      3. Stability of works
      4. Citations and linking
      5. Workflow and conversion: costs and tooling

Persistent challenges in digital book distribution

  1. Standardization versus innovation
  2. Design and usability
  3. Use in educational settings
  4. Copyright and law
      1. Google Books
      2. Digital rights management
      3. Territorial rights and ownership

Note

To include data from existing Kindle/ereader pilots in academic settings

Qualities of a successful digital distribution platform

  1. Discovery and purchase
  2. Negotiating rights
  3. Consumption
  4. User-generated content: annotations and sharing
  5. Sustainable outcomes

Note

To include specific projects to monitor that would be of interest to the target audience

Appendices

Glossary

Note

To cover general terminology in digital publishing as well as specifics about devices

Exemplar ereaders

  1. Electronic ink devices
  2. Mobile phones
  3. Desktop computer software
  4. Cloud-based
  5. Tablet computers
  6. Other potential entrants

Acknowledgments

Posted in Workpackage 2

Workpackage 3

Plan for workpackage 3

Use cases

The three main use cases identified in the current plan, and a fourth proposed one:

  • Postgrad serializing PhD (or conference paper etc) for mobile devices
  • Retiring academic publishing their ‘best-of’ research (books)
  • Present the final report as ePub
  • Publish course materials as an eBook (extra use case proposed by Sefton)

Methodology

The process will be to look at the agreed set of use cases in three main ways.

Firstly, to model the use of desktop tools that users might come across by themselves, and evaluate their fitness for purpose for the use cases. The most obvious candidate is Calibre, the free ebook tool. Secondly, to look at an example of how the popular WordPress blogging software could be used. The research question is: would users be able to take existing documents, or create new documents, and make useful books from them? If not, what is a model for an application which would be more useful? This baseline information will be fed into the design process for the final part of the project.

Finally, we will carry out research and experimental development into application software to create ePub books using a different scenario. This will concentrate on the use of word processing software, which is very widely used in the academy, and on creating an application to allow real-time preview of content in an online book environment.

Deliverables

The deliverables will be:

  • A series of research reports in the form of blog posts outlining the tests undertaken and encapsulating the results, defining a model for eBook publishing in the academy, as well as pointing out issues that need to be addressed in future more substantial investigations.
  • Working demonstrator sites.
  • Demonstration files and outcomes, presented as a data set that can be used for further investigation and a benchmark for future projects.

Assumptions

While we will generate ePub books and do simple testing, we expect to pass on our results to others on the project for testing with mobile devices as appropriate.

Task breakdown

Task 1: Explore Calibre (and any other likely candidate software) to establish what is possible with off-the-shelf tools.

Deliverables:
  1. Blog post explaining the process and showing the results
  2. Blog post discussing issues for the landscape study
  3. Any identified shortcomings used as input to task 3

Task 2: Explore the use of WordPress (including JISCPress):
  • Using the Anthologize plugin (out of the box)
  • Using other ePub plugins, such as the one developed by Martin Fenner
  • Considering how the original materials in the use cases might be handled in this environment

Deliverables:
  1. Blog post explaining the process and evaluating the plugins
  2. Blog post exploring issues with WordPress as a platform
  3. WordPress running on a demo machine at ADFI with a demonstrator
  4. The demonstrator used to make the final report available in ePub

Task 3: Build a demonstrator using DropBox + The Fascinator & ICE:
  • Demonstrate the difference between converting ad hoc word processing content and structured content created using a template.
  • Explore the benefits of using tools like DropBox.
  • Show content at different screen sizes to give authors a sense of What You See Is What I Mean (WYSIWIM) publishing, rather than What You See Is What You Get (WYSIWYG).

Deliverables: a running public demonstrator which allows users to:
  1. Put files in DropBox
  2. Manage those files via a web interface, dragging and dropping them into book packages
  3. Add content from an RSS feed
  4. Click to publish as ePub

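The "click to publish as ePub" step ultimately has to produce an EPUB container. As a rough illustration of what that involves, here is a minimal sketch that packages already-prepared XHTML chapters as an EPUB 2 book (OPF 2.0 package plus an NCX table of contents). It is not how The Fascinator or ICE do it; it skips CSS, images and validation, and the function name `build_epub` is hypothetical.

```python
import io
import uuid
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

XHTML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>{body}</body>
</html>"""

OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>{language}</dc:language>
    <dc:identifier id="bookid">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    {manifest}
  </manifest>
  <spine toc="ncx">
    {spine}
  </spine>
</package>"""

NCX_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{book_id}"/></head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
    {nav_points}
  </navMap>
</ncx>"""


def build_epub(title, chapters, language="en"):
    """Return the bytes of a minimal EPUB 2 book.

    chapters: list of (chapter_title, xhtml_body) pairs; each body must
    already be well-formed XHTML, because it is inserted verbatim.
    """
    book_id = "urn:uuid:" + str(uuid.uuid4())
    manifest, spine, nav_points = [], [], []
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry must be first in the archive and stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        for i, (chapter_title, body) in enumerate(chapters, start=1):
            href = f"chapter{i}.xhtml"
            label = escape(chapter_title)
            z.writestr("OEBPS/" + href, XHTML_TEMPLATE.format(title=label, body=body))
            manifest.append(f'<item id="c{i}" href="{href}" media-type="application/xhtml+xml"/>')
            spine.append(f'<itemref idref="c{i}"/>')  # reading order
            nav_points.append(
                f'<navPoint id="n{i}" playOrder="{i}">'
                f"<navLabel><text>{label}</text></navLabel>"
                f'<content src="{href}"/></navPoint>'
            )
        z.writestr("OEBPS/content.opf", OPF_TEMPLATE.format(
            title=escape(title), language=language, book_id=book_id,
            manifest="\n    ".join(manifest), spine="\n    ".join(spine)))
        z.writestr("OEBPS/toc.ncx", NCX_TEMPLATE.format(
            title=escape(title), book_id=book_id,
            nav_points="\n    ".join(nav_points)))
    return buffer.getvalue()
```

The packaging itself is small; the hard part, as the Calibre experience suggests, is producing clean XHTML chapters from word processing documents in the first place.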
Posted in project plan

Introducing the project team

Theo Andrew – Project Manager

Based at the University of Edinburgh, Theo is currently on secondment to EDINA, a JISC national datacentre, working as a project manager. Besides this project he is involved with running the OpenDepot.org service and the Open Access Repository Junction.

Theo will be making sure the project runs to schedule and will provide support for the other team members to be able to carry out their work.

Peter Sefton – Technical expert

Peter is the former Manager of the Software Research and Development Laboratory at the Australian Digital Futures Institute, University of Southern Queensland.

Working as our wordprocessing tool expert, Peter will be developing tools to enable the easy publishing of scholarly material in ePub format for a number of use cases.

Liza Daly – Publishing expert

Liza Daly is the founder of Threepress Consulting which provides strategy and software for publishers, authors and vendors. In 2008, she developed Bookworm, one of the first open source EPUB readers, and in February of 2010 released Ibis Reader™, the first HTML5 ebook platform.

Liza will be working as our publishing technical expert to carry out research into the state of digital monograph publishing in Higher Education in the UK and internationally; and will synthesise the project outcomes into a technical landscape report to make recommendations to JISC about future activities in this area.

Posted in News

We are green for go!

I’m really pleased to announce that the Digital Monograph Landscape Study has finally been approved and is now officially starting. Currently we are assembling the project team and getting started with all the necessary preparations. As we make progress we’ll announce all our major and minor findings on this blog, as well as details of our methodology and related e-book news, so be sure to hook up your RSS reader to our feed. Over the course of the next few blog posts I’ll start by introducing our project team and their roles.

Really looking forwards to getting stuck in with this project!

Posted in News

Project Plan 5b: Revised budget

Project plan 5b) Revised budget – awaiting confirmation

Name of Project: Digital Monograph Technical Landscape Study #jiscPUB

Directly Incurred Staff (1st Feb – 31st July 2011)

Theo Andrew, Project Manager, FTE 0.2: £4343.13
Total Directly Incurred Staff (A): £4343.13

Directly Incurred Non-Staff

User Support: 0
Documentation & e-content: 0
Consultant: Digital Publishing Expert (Liza Daly, Threepress Consulting Inc.): £10000
Authoring User Perspective Experts (5 × UoE Postgraduate): £2500
Device Specifications Expert (tbc): £2500
Promotion, T&S & Conferences:
  ePub workshop (at Science Online London): £3500
  T&S: £2000
Technical and Operations: Equipment & Support:
  Consultant: Wordprocessing Tool Expert (Peter Sefton): £15000
  Equipment: E-book device: £150
Staff Development: 0
Management and Administration: 0
Other (please specify, e.g. Rent, Utilities etc): 0
Total Directly Incurred Non-Staff (B): £35650

Total Directly Incurred Costs (C = A + B): £39993.13
Total Requested From JISC: £39993.13
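The totals above can be checked mechanically. This short sketch recomputes (A), (B) and (C) from the itemised figures, using `Decimal` so the pence are not subject to floating-point rounding; the dictionary keys are just labels copied from the budget, and the zero-cost lines are omitted.

```python
from decimal import Decimal

# Itemised figures from the revised budget (GBP).
staff = {"Theo Andrew, Project Manager (FTE 0.2)": Decimal("4343.13")}
non_staff = {
    "Consultant: Digital Publishing Expert": Decimal("10000"),
    "Authoring User Perspective Experts": Decimal("2500"),
    "Device Specifications Expert": Decimal("2500"),
    "ePub workshop (Science Online London)": Decimal("3500"),
    "T&S": Decimal("2000"),
    "Consultant: Wordprocessing Tool Expert": Decimal("15000"),
    "Equipment: E-book device": Decimal("150"),
}

total_a = sum(staff.values(), Decimal("0"))      # (A)
total_b = sum(non_staff.values(), Decimal("0"))  # (B)
total_c = total_a + total_b                      # (C) = A + B
print(total_a, total_b, total_c)  # 4343.13 35650 39993.13
```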
Posted in project plan

Project Plan 5: Budget

Please note draft status: feedback welcome as comments; final version liable to change.

Here is a summary breakdown of the costs associated with their tangible outputs:

○ 12.5% of budget spent on the project manager creating a sign-post (table of contents) of deliverables.

○ 25% of budget spent on the technical landscape study, delivered in two formats (HTML & ePub).

○ 37.5% of budget spent on two visionary technical exemplars for how the authoring-wordprocessing platform could enable digital monograph publishing.

○ 12.5% of budget spent on author (end user) viewpoint commentary (via blog posts) on how the landscape recommendations, exemplars and prototypes affect the author, and the implications for various author groups.

○ 12.5% of budget spent updating Wikipedia pages on any and all devices applicable to publishing scholarly monographs, including each device's applicability and technical specifications for use with ePub and the other digital formats identified in the landscape study.

Budget Itemisation:

Staff and consultancy:
Project Manager £5,000
Technical Publishing Expert £10,000
Technical Wordprocessing Tool Expert £15,000
Authoring User Perspective Expert £5,000
Device Technical Specifications Expert £5,000
Total Budget £40,000
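The percentage split in the summary appears to follow directly from the itemisation: dividing each line item by the £40,000 total gives 12.5%, 25%, 37.5%, 12.5% and 12.5%. Pairing each percentage with a particular staff line is my reading of the plan, not something it states. A quick check, using exact fractions:

```python
from fractions import Fraction

# Line items from the budget itemisation (GBP).
items = {
    "Project Manager": 5_000,
    "Technical Publishing Expert": 10_000,
    "Technical Wordprocessing Tool Expert": 15_000,
    "Authoring User Perspective Expert": 5_000,
    "Device Technical Specifications Expert": 5_000,
}

total = sum(items.values())
# Exact percentage share of the total for each line item.
shares = {name: Fraction(cost, total) * 100 for name, cost in items.items()}

for name, pct in shares.items():
    print(f"{name}: £{items[name]:,} = {float(pct)}%")
```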
Posted in project plan