<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Digital Monograph Technical Landscape study #jiscPUB</title>
	<atom:link href="http://jiscpub.blogs.edina.ac.uk/feed/" rel="self" type="application/rss+xml" />
	<link>http://jiscpub.blogs.edina.ac.uk</link>
	<description>A technical landscaping pilot on Open Access publishing of scholarly monographs to multiple devices (eBook/ Tablet/ Mobile) in one click</description>
	<lastBuildDate>Tue, 14 Feb 2012 15:53:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>New JISC podcast featuring the #jiscPUB report</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2012/02/14/jisc-feature-jiscpub/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=jisc-feature-jiscpub</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2012/02/14/jisc-feature-jiscpub/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 15:47:25 +0000</pubDate>
		<dc:creator>Nicola Osborne</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[#jiscPUB]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[podcast]]></category>
		<category><![CDATA[report]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=438</guid>
		<description><![CDATA[Today sees the publication of a new JISC blog post, &#8220;How important are open ebook standards to universities?&#8221; and podcast featuring Ben Showers, JISC Programme Manager for Digital Infrastructure, discussing the Digital Monograph Technical Landscape: Exemplars and Recommendations Final Report. We first published &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2012/02/14/jisc-feature-jiscpub/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today sees the publication of a new JISC blog post, &#8220;<em><a href="http://www.jisc.ac.uk/blog/how-important-are-open-ebook-standards-to-universities/#more-1157" target="_blank">How important are open ebook standards to universities?</a></em>&#8221; and <a href="http://www.jisc.ac.uk/blog/how-important-are-open-ebook-standards-to-universities/#more-1157" target="_blank">podcast</a> featuring Ben Showers, JISC Programme Manager for Digital Infrastructure, discussing the <a href="http://jiscpub.blogs.edina.ac.uk/final-report/" target="_blank">Digital Monograph Technical Landscape: Exemplars and Recommendations Final Report</a>.</p>
<p>We first published the report in November 2011 and, as the <a href="http://www.jisc.ac.uk/blog/how-important-are-open-ebook-standards-to-universities/#more-1157" target="_blank">JISC post</a> discusses, we have already seen several major ebook announcements. These include Apple&#8217;s launch of iBooks 2, significant news for the whole education sector since the software allows embedding of multimedia and more <a href="http://transliteracy.com/" target="_blank">transliterate</a> ebook design. It was launched with a number of innovative and visual textbooks, giving a sense of how the ePub format can be creatively exploited.</p>
<p>The Apple news followed a bumper Christmas for ebook reader sales, particularly of the Kindle and iPad. Indeed, according to an estimate (based on a YouGov poll of 2,012 adults) reported in <a href="http://www.pocket-lint.com/news/43689/kindle-and-ipad-dominated-christmas" target="_blank">Pocket-lint</a>:</p>
<blockquote><p>&#8220;A staggering one in every 40 adults in Britain woke up to find an ebook reader under the tree on Christmas morning&#8221;</p></blockquote>
<p>Even the Man Booker jury have reportedly adopted <a href="http://www.guardian.co.uk/books/2011/jan/28/ebook-revolution-accelerates-sales" target="_blank">Kindles</a> to assess the nominations this year.</p>
<p>These developments not only bring ebook readers into the mainstream; they also mean that an increasing number of students and academic staff will be adopting these tools. That makes this an ideal time for universities to focus on how they can better engage with ebooks, whether by supporting their community or by taking a lead in adopting and publishing directly. The timing could not be better to read, or take another look at, our <a href="http://jiscpub.blogs.edina.ac.uk/final-report/">Final Report</a> on ebook publishing and the implications for Higher Education.</p>
<p>As Theo Andrew, Project Manager for this work, says:</p>
<blockquote><p>&#8220;Over the last year or so ebook devices have really grabbed the attention and imagination of the general public. The academic community now has a good opportunity to utilise these technologies to present their work in new transformative ways. This timely report describes the current scene and highlights some of the key challenges that the sector faces with adopting and creating content for consumption on ebook readers. It finishes by making some specific recommendations on what actions are needed for the sector to fully take advantage of the many opportunities that ebooks provide.&#8221;</p></blockquote>
<p>So, do take a look at the <a href="http://www.jisc.ac.uk/blog/how-important-are-open-ebook-standards-to-universities/#more-1157" target="_blank">JISC post and podcast</a>, read the report &#8211; which is available in <a href="http://jiscpub.blogs.edina.ac.uk/2011/12/02/final-post/" target="_blank">various formats</a> of course &#8211; and share your thoughts on the <a href="http://jiscpub.blogs.edina.ac.uk/final-report/" target="_blank">Final Report page</a> or right here. We&#8217;d particularly love to hear your own thoughts on, and experiences of, reading and interacting with higher education related ebooks.</p>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2012/02/14/jisc-feature-jiscpub/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Final Post</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/12/02/final-post/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=final-post</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/12/02/final-post/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 15:59:52 +0000</pubDate>
		<dc:creator>Theo</dc:creator>
				<category><![CDATA[Final Post]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=404</guid>
		<description><![CDATA[This is our final post on the jiscPUB blog which draws together all the key project information and main achievements. Project tag: #jiscPUB Description: The Digital Monograph Technical Landscape study (a.k.a. #jiscPUB) was a six month thinktank set up by &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/12/02/final-post/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>This is our final post on the jiscPUB blog which draws together all the key project information and main achievements.</em></p>
<p><strong>Project tag:</strong> #jiscPUB</p>
<p><strong>Description:</strong> The Digital Monograph Technical Landscape study (a.k.a. #jiscPUB) was a six-month think-tank set up by JISC in the first half of 2011 to explore the potential value that the ePUB specification could bring to the Higher and Further Education sector if adopted more widely in UK universities.</p>
<p><strong>Key deliverables 1: </strong>Exemplars &amp; Recommendations report</p>
<p>The project final report describes the historical perspective on electronic publishing, with details on how digital books are authored, both in a scholarly context and in general ebook production terms, before describing future work that could be actionable and relevant to a scholarly publishing audience, with the goal of providing better tooling for both authors and readers of scholarly works. The report is available in a variety of formats:</p>
<p>i) Online at the <a href="http://jiscpub.blogs.edina.ac.uk/final-report/">Final Report page</a> on this blog.</p>
<p>ii) Common ebook formats &#8211; <a href="http://edina.ac.uk/presentations_publications/jiscpub.epub">epub</a> (usable on most devices), <a href="http://edina.ac.uk/presentations_publications/jiscpub.mobi">mobi</a> (for Kindle users) and <a href="http://edina.ac.uk/presentations_publications/jiscpub.pdf">pdf</a> (for everyone else).</p>
<p><strong>Key deliverables </strong><strong>2: </strong>Tool investigation</p>
<p>The project think-tank team investigated the .epub format, and looked at various tools for creating ebook formats from traditional word processing software (e.g. MS Word or OpenOffice), non-conventional platforms (e.g. blogs) and experimental authoring environments (e.g. &#8216;desktop repositories&#8217;). These findings are published as a series of blog posts:</p>
<ul>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/04/05/epub-for-word-processing-users/">EPub for word processing users</a></li>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/04/11/metadata-in-word-processing-monographs/">Metadata in word processing monographs</a></li>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/04/14/introducing-epub2html-adding-a-plain-html-view-to-an-epub/">Introducing Epub2Html – adding a plain HTML view to an EPUB</a></li>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/05/03/how-to-add-epub-support-to-eprints-8/">How to add EPUB support to EPrints</a></li>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/05/11/anthologize-a-wordpress-based-collection-tool/">Anthologize: a WordPress based collection tool</a></li>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/">Making EPUB from WordPress (and other) web collections</a></li>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/07/15/the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2/">The repository is watching: automated harvesting from replicated filesystems</a></li>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/08/03/template-design-issues-for-wordprocessors-and-possible-future-epub-export/">Template design issues for WordProcessors and (possible future) EPUB export</a></li>
</ul>
<p><strong>Key deliverables </strong><strong>3: </strong>Device Usability Study</p>
<p>Project think-tank members also carried out lightweight usability testing of common devices that could be used in an academic setting. The findings are set out in a series of blog posts on the UKOLN Dev blog:</p>
<ul>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/06/28/project-sunflower/">Project Description</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/06/29/ereaders-to-be-used-for-research/">Devices for Research</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/06/29/the-unboxing-experience/">The Unboxing Experience</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/06/29/kindle-dx-ipad-and-xoom-purchasing-and-installing-content-and-integration/">Purchasing and Installing Content, and Integration</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/07/12/time-to-launch-application-open-a-book-and-flip-page/">Time to Launch Application, Open a Book and Flip Page</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/07/22/project-sunflower-usabliity-study/">Usability Study</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/07/25/project-sunflower-ipad-usability-study/">iPad Usability Study</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/07/28/project-sunflower-kindle-usability-study/">Kindle DX Usability Study</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/07/29/project-sunflower-xoom-usability-study/">XOOM Usability Study</a></li>
<li><a href="http://blogs.ukoln.ac.uk/ukolndev/2011/08/02/project-sunflower-comparison-based-on-neilsens-heuristics/">Evaluation Based on Neilsen’s Heuristics</a></li>
</ul>
<p><strong>Key deliverables 4: </strong>User insights</p>
<p>The project think-tank members also carried out a number of focus groups with Early Career Researchers and Postgraduate Students at the University of Edinburgh. Insights from these groups fed into the other key deliverables. The wider picture of how ebooks and new forms of authorship could fit into emerging humanities research was also considered in a blog post here:</p>
<ul>
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/05/23/a-view-from-academia-on-digital-humanities/">A view from academia on digital humanities research</a></li>
</ul>
<p><strong>Lead Institution</strong>: EDINA &#8211; The University of Edinburgh</p>
<p><strong>Person responsible for documentation:</strong> Theo Andrew</p>
<p><strong>Project partners and roles</strong><strong>:</strong> Project Manager: Theo Andrew (EDINA), Technical Publishing expert &amp; Report Author: Liza Daly (Threepress Consulting Ltd.), Technical Tools expert: Peter Sefton (formerly Australian Digital Futures Institute), Device reader &amp; Usability expert: Emma Tonkin (UKOLN), Usability advisor: Harsh Khatri (University of Bath) and Programme Manager: David F. Flanders (JISC).</p>
<p><strong>Project started:</strong> Feb 2011</p>
<p><strong>Project finished:</strong> July 2011, extended to Dec 2011</p>
<p><strong>Project budget</strong>: £39,993</p>
<p><a href="http://jiscpub.blogs.edina.ac.uk/files/2011/12/jisc.png"><img class="alignleft size-full wp-image-417" src="http://jiscpub.blogs.edina.ac.uk/files/2011/12/jisc.png" alt="" width="53" height="48" /></a><em>The Digital Monograph Technical Landscape study (#jiscPUB) was supported by JISC as part of its Repository Infrastructure Programme.</em></p>
<p style="text-align: center"><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/"><img style="border-width: 0" src="http://i.creativecommons.org/l/by-nc-sa/2.5/scotland/88x31.png" alt="Creative Commons Licence" /></a><br />
<span>This blog</span> hosted by <a href="http://edina.ac.uk">EDINA</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/">Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/12/02/final-post/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Template design issues for WordProcessors and (possible future) EPUB export</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/08/03/template-design-issues-for-wordprocessors-and-possible-future-epub-export/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=template-design-issues-for-wordprocessors-and-possible-future-epub-export</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/08/03/template-design-issues-for-wordprocessors-and-possible-future-epub-export/#comments</comments>
		<pubDate>Wed, 03 Aug 2011 07:09:13 +0000</pubDate>
		<dc:creator>Peter Sefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=309</guid>
		<description><![CDATA[This document is a collection of notes on how to design word processing templates for creating EPUBs – particularly theses. It&#8217;s probably not very interesting as a general read. The intended audience is support and technical staff who are working &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/08/03/template-design-issues-for-wordprocessors-and-possible-future-epub-export/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div>
<p>This document is a collection of notes on how to design word processing templates for creating EPUBs <span class="spCh spChx2013">–</span> particularly theses. It&#8217;s probably not very interesting as a general read. The intended audience is support and technical staff who are working with theses, and preparing for ebook creation projects. It may be of use to projects following from jiscPUB, particularly in the area of thesis management, submission and deposit where theses are required to be published in HTML and/or EPUB. These notes are incomplete <span class="spCh spChx2013">–</span> this is not a full <span class="spCh spChx201c">“</span>word processing for theses<span class="spCh spChx201d">”</span> book and it does not explain how to actually create high-quality EPUB theses from word processing documents, although I produced some <a href="http://jiscpub.blogs.edina.ac.uk/category/workpackage-3/"><span>promising demonstrations of the potential during my work on this project</span></a>.</p>
<p>To make theses produced with a word processor available as EPUB, the word processor, or some other application which can read word processing documents, needs to be able to produce good quality HTML. Given HTML, EPUB can be created even if the word processing package or content management system being used is not capable of exporting EPUB natively. As Liza Daly notes in the final report for this project, good HTML is difficult to achieve from arbitrary word processing documents, which is why it is useful to design a template, documentation and training that help users choose appropriate features in their word processors, such as using defined styles rather than direct formatting.</p>
<p>I was involved in a word-processor based web publishing project at the University of Southern Queensland from 2004 to 2010. The project, the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> (ICE), produced some templates, toolbars for creating documents, and HTML conversion code, all released under an open source license. I refer to that project a lot here, as it dealt with many of the relevant issues in setting up word processors for academic use, including <a href="http://ice.usq.edu.au/packages/user_guide/default.htm"><span>fairly comprehensive documentation about how to do things the right way</span></a>. There is <a href="http://code.google.com/p/integrated-content-environment/"><span>a fork of the project on Google Code</span></a> which I added to during the jiscPUB project.</p>
<p>This document takes a general look at template design and provides some specific examples and advice for two applications, Microsoft Word and OpenOffice.org Writer (including the new LibreOffice fork and the other derivatives). Many of the issues are the same for other word processing packages, but this project didn&#8217;t have the resources to explore all the available options, such as Apple&#8217;s Pages and Google Docs. Another option is the open source <a href="http://www.lyx.org/"><span>LyX</span></a> word processor (or document processor, as the creators call it) which, with some training and template development, may suit some candidates. But note that it would need to be run in a <strong>very</strong> well-supported environment.</p>
<p>In this document I refer to the <a href="https://www.wiki.ed.ac.uk/display/HowTowiki/Thesis+Template+for+Postgraduate+Students"><span>thesis template provided by the University of Edinburgh</span></a> and use it as the basis for some examples.</p>
<h1><a id="id2"><span> </span></a>Templates vs <span class="spCh spChx201c">“</span>document prototype factories<span class="spCh spChx201d">”</span></h1>
<p>Templates are a starting point for creating different genres of document. When properly installed, they allow users to choose something like <code>File / New / From template...</code> and to pick the kind of document they want: a thesis chapter, a paper, a report or a blog post. But they suffer from several usability and maintenance issues in today&#8217;s computing environment.</p>
<ul class="lib">
<li>If you click to open a template, it spawns a new document. In my experience users tend to save these new documents wherever they work normally <span class="spCh spChx2013">–</span> maybe in a shared drive, often on the desktop, and leave the template where they downloaded it. So the most likely place a template will end up living is in the Downloads or Desktop folder <span class="spCh spChx2013">–</span> where it is not subject to version control or management.</li>
<li>In OpenOffice.org the template system is arcane and difficult to navigate <span class="spCh spChx2013">–</span> it is possible to import a template via the user interface but it is complicated.</li>
<li>My advice here is not to attempt to distribute templates unless it is possible to do so via something like a standard institutional desktop. Instead, make blank prototype documents available for download from a content management system or a shared directory, and put in place managed processes, automated if possible, for creating the prototype documents: something along the lines of a &#8216;document prototype factory&#8217;.</li>
</ul>
<p><strong>If you decide to maintain a family of document templates</strong>:</p>
<ul class="lib">
<li>Try to share as much as possible between document prototypes/templates, including style names and, if possible, the same fonts and margins, to reduce maintenance overhead.</li>
<li>Maintain the core styles and common elements from the templates in one place <span class="spCh spChx2013">–</span> a &#8216;master&#8217; template.
<ul class="lib">
<li>When making changes make them in the master template and then import the changes into the other templates/prototypes.</li>
<li>Consider automating production of sets of styles using macros or by producing the raw XML for .docx or .odt files. The ICE system, for example, contains macros that create a complete set of styles on demand using default settings. This means (if the macros work, and I&#8217;m not sure that they do 100%) that a new template can be created by setting the margins, and the font and spacing for a couple of base elements, and having the machine generate all the rest.</li>
</ul>
</li>
</ul>
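<p>As a rough illustration of the &#8216;raw XML&#8217; route, here is a minimal Python sketch that generates a family of heading styles as OpenDocument <code>style:style</code> elements from a couple of base settings. It is not the ICE macro code. The function name, the <code>h1</code> to <code>h5</code> style names, the parent style <code>Heading</code> and the default sizes are all assumptions chosen for the example, and the resulting elements would still need to be placed into the <code>styles.xml</code> part of an .odt package.</p>

```python
import xml.etree.ElementTree as ET

# OpenDocument namespaces for style definitions and formatting properties.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
ET.register_namespace("style", STYLE_NS)
ET.register_namespace("fo", FO_NS)


def q(ns, tag):
    """Build a namespace-qualified name in ElementTree's {namespace}tag form."""
    return "{%s}%s" % (ns, tag)


def build_heading_styles(base_size_pt=12.0, levels=5, step_pt=2.0):
    """Return style:style elements named h1..hN (hypothetical names).

    The deepest level uses base_size_pt; each level above it is step_pt
    larger, so only two numbers need to be maintained by hand.
    """
    styles = []
    for level in range(1, levels + 1):
        style = ET.Element(q(STYLE_NS, "style"), {
            q(STYLE_NS, "name"): "h%d" % level,
            q(STYLE_NS, "family"): "paragraph",
            # Assumed parent style; a real template may use a different one.
            q(STYLE_NS, "parent-style-name"): "Heading",
        })
        size = base_size_pt + step_pt * (levels - level)
        ET.SubElement(style, q(STYLE_NS, "text-properties"), {
            q(FO_NS, "font-size"): "%gpt" % size,
            q(FO_NS, "font-weight"): "bold",
        })
        styles.append(style)
    return styles
```

<p>Calling <code>ET.tostring(style, encoding="unicode")</code> on each returned element gives the XML fragment for that style. Changing the base size regenerates the whole family consistently, which is the maintenance benefit described above.</p>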
<h1><a id="id5"><span> </span></a>Granularity</h1>
<p>One of the fundamental choices to make in designing templates for long documents like theses is whether to manage the document in one long file or to break it up into multiple chapters.</p>
<p>Historically, it was important to work on compound documents for performance reasons. These days, performance is probably not a major problem, with most computers having plenty of RAM, but there are still reasons why compound documents make sense, for example where a resource is to be assembled out of a range of source documents, or other objects. It makes particular sense in collaborative environments, where multiple parties are working on a project and editing different chapters. Theses are not usually meant to be collaborative (although that might be changing) but in the absence of collaboration infrastructure which can manage comments from a supervisor, sending off chapter one to a supervisor to add comments while the candidate works on chapter two allows for simpler management than mailing off the whole thesis for comment, and then having to integrate the two versions.</p>
<p>The major problem with the compound approach arises when it comes time to join the thesis into a single final product for printing.</p>
<p>Microsoft Word has long had a reputation for poor performance in managing master documents. I have not checked this in detail in the latest version, but I would urge any template project to check its performance carefully before relying on it. The simple approach of copying and pasting the chapters into a single document once the thesis is finished, as recommended in the help text in the Edinburgh thesis template, is possibly the most reliable, but it can be time-consuming, and small differences in formatting that have crept into the various chapter documents can cause problems.</p>
<p>The ICE project used compound documents because its focus was course documents which were authored by multiple parties. Our initial experiments with OpenOffice.org master documents assembled by computer program were not a success, so we settled on an approach which automated copying and pasting things together, according to a table-of-contents-like manifest, to produce a final compound file. This avoided all sorts of complexities to do with differences between page layouts and changes to styles, which can occur by accident.</p>
<p>With the rapid rise of ebook readers and a shift away from paper-based publishing, we in the academy should be considering thesis submission as a web-based process, possibly with EPUB as a container format. With thesis projects taking a few years to complete, the time for projects that consider how thesis authoring and submission should work is <strong>now</strong>.</p>
<h2><a id="id6"><span> </span></a>How to set up a master document</h2>
<p>In this section I have some sketchy instructions for setting up compound theses via master documents. Note that these instructions are a starting point only.</p>
<p>In Writer you can turn a long document into a master document with multiple parts <span class="spCh spChx2013">–</span> I put examples of these in the <a href="http://jiscpub.blogs.edina.ac.uk/2011/07/15/the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2/"><span>demonstration system</span></a> for the jiscPUB project.</p>
<ul class="lib">
<li>First, use styles for your headings.</li>
<li>Work out which heading style is being used for Chapter headings. In the Edinburgh template it&#8217;s Heading 1 <span class="spCh spChx2013">–</span> in a typical ICE thesis it would be <code>h1n</code> or <code>Title Chapter</code>. I have used <code>h1</code> in the examples.</li>
</ul>
<p>If you are using Writer:</p>
<ul class="lib">
<li>From the File menu choose <code>Send</code>, then <code>Create Master Document</code>.</li>
<li>In the <code>Template</code> dropdown choose the style that&#8217;s used for chapter headings and provide a file name.<a><span> </span></a><img class="fr2" style="border: 0px;vertical-align: middle" src="http://ptsefton.com/wp-content/uploads/2011/08/m2715afee_407x68.jpeg" alt="graphics4" width="407" height="68" /></li>
<li>Click <code>Save</code>.</li>
<li>The application will create a series of files, one for each block that starts with the chapter style. (Or at least it should <span class="spCh spChx2013">–</span> there seem to be bugs in LibreOffice 3.3.3, and the splitting feature didn&#8217;t work for me.) The resulting master document will contain all the front matter text, with the chapters included as sub-documents. I recommend moving the front matter to a sub-document too:
<ul class="lib">
<li>Select all the front matter and Cut it.</li>
<li>In the navigator in the Master document, right-click on the first chapter (e.g. my-thesis1.odt) and choose Insert New Document.</li>
<li>Paste the front-matter into the new document.</li>
<li>Save the new document as my-thesis0.odt (for example).</li>
</ul>
</li>
<li>To make the chapters usable as stand-alone documents:
<ul class="lib">
<li>Open each document.</li>
<li>In Tools / Outline Numbering, set the Start at number for the first outline level to the chapter number.<a><span> </span></a><img class="fr2" style="border: 0px;vertical-align: middle" src="http://ptsefton.com/wp-content/uploads/2011/08/m298a8618_305x247.jpeg" alt="graphics8" width="305" height="247" /></li>
</ul>
</li>
</ul>
<p>To perform the same trick in Microsoft Word you have to split the document manually.</p>
<ul class="lib">
<li>Copy and paste all the chapters and the front-matter into a series of files.</li>
<li>Create a new, blank document with the correct margins and styles.</li>
<li>Switch to outline view via the status bar, bottom right of the document window.<a><span> </span></a><img class="fr2" style="border: 0px;vertical-align: middle" src="http://ptsefton.com/wp-content/uploads/2011/08/7c2ddd02.png" alt="graphics5" width="106" height="25" /></li>
<li>In the Outlining tab, click Show document (this makes more options appear).<a><span> </span></a><img class="fr2" style="border: 0px;vertical-align: middle" src="http://ptsefton.com/wp-content/uploads/2011/08/26e8ab9a_451x65.jpeg" alt="graphics6" width="451" height="65" /></li>
<li>For each chapter, click Insert, then pick the chapter and click Open.</li>
<li>To make sure that each chapter has the correct number and title when you are managing it individually:
<ul class="lib">
<li>For each chapter, open the file itself.</li>
<li>Copy the text from the chapter heading and enter it as the document title in the File tab under Info, Properties, Title.<a><span> </span></a><img class="fr2" style="border: 0px;vertical-align: middle" src="http://ptsefton.com/wp-content/uploads/2011/08/m78c03bcd_620x356.jpeg" alt="graphics7" width="620" height="356" /></li>
<li>In the chapter heading, right click and choose <code>Numbering</code>, then <code>Set Numbering Value...</code><a><span> </span></a><img class="fr2" style="border: 0px;vertical-align: middle" src="http://ptsefton.com/wp-content/uploads/2011/08/m6a47fae6.gif" alt="graphics10" width="357" height="338" /></li>
<li>Choose the chapter number in Set value to:  and click OK.</li>
</ul>
</li>
</ul>
<p>In either Word or Writer you now have a series of stand-alone documents that you can edit one at a time, and a master document that contains the whole book.</p>
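<p>The splitting logic itself is simple enough to sketch outside the word processor. The Python below is an illustration only, not how Writer implements Create Master Document. It assumes the document has already been reduced to a list of (style name, text) pairs, and the function name and the <code>my-thesis</code> file stem are hypothetical. Everything before the first chapter heading is treated as front matter and becomes part 0, matching the my-thesis0.odt convention used above.</p>

```python
def split_at_chapters(paragraphs, chapter_style="h1", stem="my-thesis"):
    """Group (style, text) paragraphs into front matter plus one part per chapter.

    Returns a dict mapping a file name to the list of paragraphs it would hold.
    Part 0 is the front matter; it is empty if the document starts with a chapter.
    """
    parts = [[]]  # parts[0] collects everything before the first chapter heading
    for style, text in paragraphs:
        if style == chapter_style:
            parts.append([])  # a chapter heading starts a new part
        parts[-1].append((style, text))
    return {"%s%d.odt" % (stem, i): part for i, part in enumerate(parts)}


thesis = [
    ("Title", "My thesis"),
    ("p", "Abstract text"),
    ("h1", "Introduction"),
    ("p", "Opening paragraph"),
    ("h1", "Methods"),
]
parts = split_at_chapters(thesis)
```

<p>Here <code>parts</code> has three entries: the front matter, the Introduction chapter and the Methods chapter. A real implementation would also have to carry over styles, page layout and numbering, which is where the word processors themselves run into trouble.</p>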
<h2><a id="id7"><span> </span></a>Converting the files to HTML</h2>
<p>Converting the thesis to HTML can now be done either chapter by chapter, for example as a series of posts or pages in WordPress, or via the master document, with the usual caveat that word processors tend to make poor HTML. One drawback of the approach I have outlined here is that each of the sub-documents uses the Heading 1 style for its title, so when a sub-document is converted to HTML as a stand-alone document it has a slightly odd structure. Dealing with this kind of document structure is something for a (forthcoming) wish-list of features for a good quality HTML converter <span class="spCh spChx2013">–</span> it should be able to normalise headings in the documents it outputs, and &#8216;do the right thing&#8217; with each document delineated by article tags, containing sections. HTML5 has specific rules about document outlines which allow for re-combining content from multiple fragments.</p>
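<p>To make the idea of normalising headings concrete, here is a minimal Python sketch. It assumes a converter has already extracted each document&#8217;s outline as a list of (heading level, text) pairs; the function name and that representation are assumptions for the example, not part of any existing converter. The outline is shifted so that its shallowest heading lands on a chosen level, and levels are capped at 6, the deepest heading level HTML offers.</p>

```python
def normalise_headings(outline, top_level=1):
    """Shift heading levels so the shallowest heading becomes top_level.

    outline is a list of (level, text) pairs. Relative nesting is preserved,
    and levels are capped at 6 because HTML has no heading deeper than that.
    """
    if not outline:
        return []
    shift = top_level - min(level for level, _ in outline)
    return [(min(6, level + shift), text) for level, text in outline]


chapter = [(1, "Introduction"), (2, "Background"), (3, "Earlier work")]

# Stand-alone page: the chapter title stays at the top level.
standalone = normalise_headings(chapter, top_level=1)

# Combined book: the thesis title takes level 1, so chapters start at level 2.
in_book = normalise_headings(chapter, top_level=2)
```

<p>With this kind of step in the converter, the same chapter file could be published on its own or as part of the whole thesis without editing its heading styles by hand.</p>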
<h1><a id="id8"><span> </span></a>Styles</h1>
<p>Styles are one of the key innovations that make word processing useful for technical and academic content. A style is a named bundle of formatting attributes that can be attached to paragraphs, to spans of text inside a paragraph, and to lists and table structures.</p>
<p>The most basic use of styles (and the only area where there is anything like a cross-application standard approach) is using heading styles to structure a document. Most word processors use <code>Heading 1</code>, <code>Heading 2</code>, and so on, attached to paragraphs, as the standard way to create a document outline. That is where the quasi-standardisation ends, though: there are no widely used standards for the other things we need in academic documents.</p>
<p>A thesis template should have, at a minimum, styles for:</p>
<ul class="lib">
<li>Headings</li>
<li>Metadata</li>
<li>Block-quotes</li>
<li>Examples</li>
<li>Pre-formatted code.</li>
</ul>
<p>For example, the ICE system specifies a set of styles which has been used for several years for producing academic documents at the University of Southern Queensland. The styles are summarised here in a table, updated from <a href="http://www.xml.com/pub/a/2005/01/26/hacking-ooo.html"><span>one which originally appeared in an article at xml.com</span></a>. These style names were chosen to be mostly very short, so they would be easy to see in the interface in both Word and OpenOffice.org, particularly in Word&#8217;s view that shows style names on the left.</p>
<p>&nbsp;</p>
<div class="Table1" style="width: 100%;margin: 0px;padding: 0px;text-align: left">
<table class="Table1" style="border-spacing: 0;width: 100%;border-collapse: collapse;border: 1.0px solid #808080">
<col style="width: 3.242cm"></col>
<col style="width: 4.888cm"></col>
<col style="width: 1.067cm"></col>
<col style="width: 1.067cm"></col>
<col style="width: 1.067cm"></col>
<col style="width: 1.175cm"></col>
<tbody>
<tr>
<td class="Table1_A1" style="vertical-align: top;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: 1.0px solid #808080;padding: 0.049cm" rowspan="2">Family</td>
<td class="Table1_B1" style="vertical-align: top;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: 1.0px solid #808080;padding: 0.049cm" rowspan="2">Type</td>
<td class="Table1_C1" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: 1.0px solid #808080;padding: 0.049cm" colspan="5">Style names</td>
</tr>
<tr>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">1</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">2</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">3</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">4</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">5</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">Paragraph (p)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">p-centre, p-right, p-indent*</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">p</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm"></td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm"></td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm"></td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm"></td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">Heading (h)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm"></td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h1</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h2</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h3</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h4</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">h5</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">Heading (h)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Numbered (n)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h1n</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h2n</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h3n</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">h4n</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">h5n</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">List item (li)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Numbered (n)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li1n</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li2n</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li3n</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li4n</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">li5n</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">List item (li)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Bullet (b)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li1b</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li2b</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li3b</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li4b</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">li5b</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">List item (li)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Uppercase Alpha (A)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li1A</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li2A</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li3A</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li4A</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">li5A</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">List item (li)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Lowercase Alpha (a)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li1a</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li2a</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li3a</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li4a</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">li5a</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">List item (li)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Lowercase Roman (i)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li1i</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li2i</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li3i</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li4i</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">li5i</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">List item (li)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Uppercase Roman (I)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li1I</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li2I</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li3I</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li4I</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">li5I</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">List item (li)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Continuing paragraph (p)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li1p</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li2p</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li3p</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">li4p</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">li5p</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">Blockquote (bq)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm"></td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">bq1</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">bq2</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">bq3</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">bq4</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">bq5</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">Definition List</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Term (dt)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dt1</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dt2</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dt3</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dt4</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">dt5</td>
</tr>
<tr>
<td class="Table1_A3" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: 1.0px solid #808080;border-right: none;border-top: none;padding: 0.049cm">Definition List</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">Description (dd)</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dd1</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dd2</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dd3</td>
<td class="Table1_C2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: none;border-top: none;padding: 0.049cm">dd4</td>
<td class="Table1_G2" style="vertical-align: middle;border-bottom: 1.0px solid #808080;border-left: none;border-right: 1.0px solid #808080;border-top: none;padding: 0.049cm">dd5</td>
</tr>
</tbody>
</table>
</div>
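<p>Because the names in the table follow a regular pattern (family, level, then an optional type letter), software can recover the structure from the name alone. The Python sketch below shows one way that might be done. It is not taken from ICE: the <code>IceStyle</code> tuple and <code>parse_style</code> function are hypothetical names, and the rules about which type letters each family accepts are simply read off the table above.</p>

```python
import re
from typing import NamedTuple, Optional


class IceStyle(NamedTuple):
    family: str           # "p", "h", "li", "bq", "dt" or "dd"
    level: Optional[int]  # 1-5, or None for the un-levelled paragraph styles
    kind: str             # "", a list/heading type letter, or a paragraph variant


LEVELLED = re.compile(r"^(h|li|bq|dt|dd)([1-5])([nbAaiIp]?)$")
PARAGRAPH = re.compile(r"^p(?:-(centre|right|indent))?$")

# Type letters each family accepts, as read from the table (an assumption).
ALLOWED = {
    "h": {"", "n"},
    "li": set("nbAaiIp"),
    "bq": {""},
    "dt": {""},
    "dd": {""},
}


def parse_style(name: str) -> Optional[IceStyle]:
    """Split a style name such as 'li2n' into family, level and type."""
    m = LEVELLED.match(name)
    if m:
        family, level, kind = m.groups()
        if kind in ALLOWED[family]:
            return IceStyle(family, int(level), kind)
        return None
    m = PARAGRAPH.match(name)
    if m:
        return IceStyle("p", None, m.group(1) or "")
    return None  # not a style from the table, e.g. "Heading 1"


print(parse_style("li2n"))
print(parse_style("Heading 1"))
```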
<p>It was not intended that users have to type these or even select them from a drop-down; rather, they would use add-in interfaces which aided them. The first generation is described in the XML.com article I wrote. This used a hierarchical menu system (also keyboard navigable) which was the same in Word and OpenOffice.</p>
<p><a><span> </span></a><img class="fr5" style="border: 0px;vertical-align: middle" src="http://ptsefton.com/wp-content/uploads/2011/08/1685e391_478x243.jpeg" alt="graphics2" width="478" height="243" /></p>
<p>&nbsp;</p>
<p>The second generation of this interface is the ICE toolbar, which uses a set of buttons very like those in most modern editing applications, but which tries to <span class="spCh spChx201c">“</span>Do the right thing<span class="spCh spChx201d">”</span> and apply styles, documented at the ICE site.</p>
<p><a><span> </span></a><img class="fr3" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/08/m1aadae35_529x109.jpeg" alt="graphics9" width="529" height="109" /></p>
<h2><a id="id9"></a>Saving as HTML</h2>
<p>Using styles does not do a lot to help the quality of HTML exported from our two word processors out of the box, but many third-party applications for creating HTML do try to use styles, for example ICE, or the commercial <a href="http://www.systemiksolutions.com/transit.htm"><span>HTML Transit</span></a>.</p>
<p>(I gather that ICE is no longer actively maintained by USQ  <span class="spCh spChx2013">–</span> I&#8217;m using it here as an example of the kinds of interfaces that make it easier for users to apply styles than the defaults that come with their word processing packages <span class="spCh spChx2013">–</span> it is open source, so organisations who wanted to develop templates like the ones used in ICE could adopt part or all of it).</p>
<h2><a id="id11"></a>Heading numbering and document outlines</h2>
<p>One of the key benefits of using heading styles is that they allow for automatic tables of contents and for an outline view of a document.</p>
<p>One issue that needs to be dealt with is document numbering. It is possible to attach numbering to styles so that the headings in a document are numbered. The simplest case is to map style names to numbers <span class="spCh spChx2013">–</span> but there are use-cases where documents have both numbered and non-numbered parts <span class="spCh spChx2013">–</span> and special cases such as appendices which might be sections at the same level as, say, chapters but have different numbers.</p>
<p>ICE (barely) manages to deal with this complexity by using a compound document approach with each chapter or appendix stored in a separate file. The ICE system was designed to be aggressively interoperable between the OpenOffice family and Word, which imposed a major limitation <span class="spCh spChx2013">–</span> OOo Writer can only tie ONE style to each numbering level in the document outline <span class="spCh spChx2013">–</span> with the added complication that recent versions do support  &#8216;outline level&#8217; as a paragraph attribute, although this is not tied to the Outline numbering feature as far as I can tell.</p>
<h2><a id="id12"></a>Lists</h2>
<p>List structures are one of the most difficult things to deal with in word processing, for template design, HTML export and for basic usability.</p>
<ul class="lib">
<li>Word&#8217;s lists have historically been very unstable. There are multiple ways to make lists in Word, including direct formatting, named list outlines, anonymous list outlines and list styles, many of which are almost impossible to access in the new Ribbon interface that Word moved to in version 2007. There have also been many changes to the way Word handles lists and list styles over the years, making this a very complicated topic.</li>
<li>Writer&#8217;s list support is close to unusable.  The Open Document format which is the native file format has lists as a first-class object and has provision for a document to contain hierarchical list structures like those in HTML. The problem is that in a paragraph-based editing environment it is almost impossible for an author to understand the hierarchical structure of their lists <span class="spCh spChx2013">–</span> there are only very small cues in the interface to show you what level of list a particular item is on, for example, and the process of adding an extra paragraph into a list, without a bullet is bizarrely complicated <span class="spCh spChx2013">–</span> it is not a matter of applying formatting or styling, but a structural manipulation which is at odds with the way word processors typically work.</li>
</ul>
<p>Interoperability is a problem <span class="spCh spChx2013">–</span> when transferring documents with or without styles between Word and Writer, lists often break, numbering is destroyed, and indenting changes. Even when using styles, when Writer saves to the <code>.doc</code> format, instead of creating Word styles for lists corresponding to its internal ones it creates new ones. So even saving a Writer document in that format and reloading it back into Writer breaks the document.</p>
<p>Against this background I think it is worth describing the ICE approach to interoperability here as an illustration of the sort of thinking that is needed in a heterogeneous application environment.</p>
<p>In ICE there is a standard set of list style names which is implemented differently in Word and OpenOffice. Both share a set of paragraph styles with the same names: <code>li1b</code> for a first-level bullet list item, <code>li2n</code> for a second-level numbered list item, and so on.</p>
<dl>
<dt>In Word </dt>
<dd>Each paragraph style is tied to a named list outline (not a list style), so the list styles <code>li1b</code>, <code>li2b</code> et al are attached to a single outline called <code>lib</code>. While Word has these named outlines, they are difficult to access reliably <span class="spCh spChx2013">–</span> there is no way to pick one off a list, they only appear in galleries, and if the one you want is not showing you cannot access it. In ICE, these lists are applied entirely by macros, which can repair them when they break. (And they do.)</dd>
<dt>In Writer </dt>
<dd>There is a corresponding list style for each paragraph style, and when a user uses the ICE toolbar or menus to apply a paragraph style, a macro applies the relevant list style at the same time. Writer has long had an option to tie a paragraph style to a list style, but it doesn&#8217;t work reliably.</dd>
</dl>
<p>In both cases, when things go wrong there is a macro that cycles through every paragraph in the document and re-applies each style, including making sure that if a paragraph is in <code>li1b</code> style it is attached to the correct list. In Word, there is also a macro to reset its list formatting, rebuilding each named list outline, as Word has a tendency to do what can only be described as &#8216;go crazy&#8217; and have all the lists in a document change formatting (I have not checked up on this in the latest version, but I have no reason to think that this has been fixed).</p>
<h3><a id="id13"></a>Saving as HTML</h3>
<p>Saving lists as HTML is one of the worst performing areas for word processors. Their algorithms typically do a very poor job. Word 2010 still saves list items as paragraphs with formatting rather than as list structures, and the OpenOffice.org family produces non-standard, often flat-out wrong structures. The ICE approach of a full, consistent set of styles means that ICE can create properly structured output, including correctly nesting block quotes and non-numbered paragraphs inside complex list structures. It does this by using the level numbers in ICE styles to work out what should be nested inside what.</p>
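<p>To make the idea of nesting by level number concrete, here is a minimal Python sketch. It is not ICE&#8217;s code: the <code>nest_lists</code> function is hypothetical, it handles only the bullet (<code>b</code>), numbered (<code>n</code>) and continuing-paragraph (<code>p</code>) styles, and it assumes the input is a flat sequence of (style name, text) pairs for one run of list paragraphs.</p>

```python
import re
from html import escape

ITEM = re.compile(r"^li([1-5])([bnp])$")
LIST_TAG = {"b": "ul", "n": "ol"}


def nest_lists(paragraphs):
    """Turn a flat run of (style, text) pairs into nested HTML lists."""
    out = []
    stack = []  # tags of the open lists, outermost first; each has an open <li>

    def close_to(depth):
        # Close open items and lists until only `depth` lists remain open.
        while len(stack) > depth:
            out.append(f"</li></{stack.pop()}>")

    for style, text in paragraphs:
        m = ITEM.match(style)
        if m is None:
            raise ValueError(f"unsupported list style: {style}")
        level, kind = int(m.group(1)), m.group(2)
        close_to(level)
        if kind == "p":
            # A continuing paragraph belongs inside the open item at its level.
            if len(stack) != level:
                raise ValueError(f"{style} has no list item to continue")
            out.append(f"<p>{escape(text)}</p>")
            continue
        tag = LIST_TAG[kind]
        if len(stack) == level and stack[-1] != tag:
            close_to(level - 1)  # list type changed at this level: start afresh
        if len(stack) == level:
            out.append("</li><li>")  # next sibling in the same list
        else:
            # Open lists down to the required level (skipped levels reuse `tag`).
            while len(stack) < level:
                stack.append(tag)
                out.append(f"<{tag}><li>")
        out.append(escape(text))
    close_to(0)
    return "".join(out)


print(nest_lists([("li1b", "A"), ("li2n", "B"), ("li1p", "C"), ("li1b", "D")]))
```

<p>The point of the sketch is that no guessing is needed: the level digit in each style name says exactly how deep the paragraph sits, so the converter only has to open and close lists to match.</p>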
<p>In a potential new service for converting word processing content to HTML this could be extended to deal not only with a standard set of style names, but to infer structure in other situations as well, indenting being one of the major cues (that seems obvious, but the current algorithms in word processors and in browser-based editors manage to get it wrong <span class="spCh spChx2013">–</span> they produce odd structures that are almost certainly not what any author intended).</p>
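<p>One simple way such a service might use indenting as a cue is sketched below in Python: collect the distinct left indents used by a run of paragraphs and rank them, so the smallest indent becomes level 1, the next level 2, and so on. This is only an illustrative assumption about how inference could work; the <code>levels_from_indents</code> function and its <code>tolerance</code> parameter are hypothetical.</p>

```python
def levels_from_indents(indents, tolerance=0.05):
    """Map each paragraph's left indent (e.g. in cm) to a nesting level.

    Indents that differ by no more than `tolerance` from the start of a
    cluster are treated as the same level.
    """
    distinct = []
    for value in sorted(indents):
        if not distinct or value - distinct[-1] > tolerance:
            distinct.append(value)

    def level(value):
        # Level is the 1-based rank of the nearest distinct indent.
        nearest = min(range(len(distinct)), key=lambda i: abs(distinct[i] - value))
        return nearest + 1

    return [level(value) for value in indents]


print(levels_from_indents([0.0, 0.63, 1.27, 0.64, 0.0]))
```

<p>The resulting level numbers could then be fed into the same kind of nesting logic that works for explicitly levelled style names.</p>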
<h2><a id="id14"></a>Metadata</h2>
<p>I <a href="http://jiscpub.blogs.edina.ac.uk/2011/04/11/metadata-in-word-processing-monographs-2/"><span>looked at metadata in a blog post</span></a>.</p>
<h2><a id="id15"></a>Embedding images</h2>
<p>By default in Word and OpenOffice, if you paste in or create an image or other inline object such as a chart or drawing, it &#8216;floats&#8217; relative to the content. The idea is that objects can be placed on a page. For web and ebook publishing this is not useful, and it leads to lots of frustration. Unless very fine-grained support for image placement is required for print publication, it is usually best to anchor images as characters rather than as floating objects.</p>
<ul class="lib">
<li>Anchor images and objects as characters.
<ul class="lib">
<li>In Writer: Right click on an embedded object and choose <code>Anchor</code>, <code>As Character</code>.</li>
<li>In Word: Right click on the object and choose <code>Wrap Text</code>, <code>In Line with Text</code>.</li>
</ul>
</li>
<li>Use the in-built vector drawing packages for diagramming, but:
<ul class="lib">
<li>Don&#8217;t draw on the document as though it were paper, insert objects that contain drawings.
<ul class="lib">
<li>In Writer: From the <code>Insert</code> menu choose <code>Object</code>, <code>OLE Object</code>, (Name of application) <code>Drawing</code>.</li>
<li>In Word: From the <code>Insert</code> tab choose <code>Shapes</code>, <code>New Drawing Canvas</code>.</li>
</ul>
</li>
</ul>
</li>
<li>Use the inbuilt Maths editors in either platform.</li>
</ul>
<h2><a id="id17"></a>Maths</h2>
<p>Maths support on the web has been a problem, but things are slowly improving. The ideal is to use MathML, which is part of HTML5. Current practice on the web often involves the use of LaTeX as a source for mathematics which is then rendered into HTML via other tools. There are commercial plugins for both Word and Writer that can deal with LaTeX markup.</p>
<p>Word 2007 and 2010 and Writer can export MathML and save MathML inside their file formats, although this does not happen when you save as HTML, so it should be possible to automate production of high quality output to HTML given the resources. As far as I know, nobody has done this yet.</p>
<p>For casual use of maths, the approach I describe below, using the word processor&#8217;s inbuilt Save as HTML to create images of the maths, is probably adequate, but it is far from ideal where mathematics is a key part of the content.</p>
<p>&nbsp;</p>
<h3><a id="id18"></a>Converting images to HTML</h3>
<p>One of the areas where many HTML conversion projects fall down is images. Because office suites have tight integration with drawing and presentation applications, inbuilt maths rendering and so on, it is often very difficult for external code to render as HTML anything but a plain-text document, or one whose images are already in web formats such as JPEG or PNG. The ICE application uses OpenOffice to render inline objects from both Word and Writer documents, and in parallel creates HTML from the XML source files.</p>
<p>&nbsp;</p>
<p class="Illustration" style="width: 696px"><span><a></a><img class="fr6" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/08/m592aad7c.gif" alt="Object1" width="697" height="379" /></span>Illustration 1: Diagram showing how an HTML converter can use the word processor to create web ready images, while still creating HTML from the XML inside its native document format (.docx or .odt)</p>
<p>In a previous project I worked on with some members of the ICE team we simply used the HTML output from Word 2000 and massaged it to be much better quality HTML, discarding the formatting that Word outputs and using the style names (which are output as classes) to generate the HTML.</p>
<p>This is an important area because the various integration features which allow authors to embed charts, vector graphics and so on are one of the main reasons to keep using word processors. If candidates are working on theses that are exclusively text, then using a tool-chain such as AsciiDoc with a wiki text format, or Pandoc, may be worth considering.</p>
<h3><a id="id19"></a>Reference managers</h3>
<p>There is no space here for a full evaluation of reference managers such as EndNote, Zotero or Mendeley, all of which have integration with word processors; for now, candidates must be assisted in choosing an appropriate tool for their discipline and institution. Regarding the future, JISC is investing in this area with support for the Open Bibliography project <span class="spCh spChx2013">–</span> one important dimension of this will be working out how cost and effort across the entire sector can be reduced by simplifying and rationalising the process of citing works. If we have a large scale open bibliography available, then referencing in many disciplines could be as simple as linking to a URI for a resource in that shared bibliography <span class="spCh spChx2013">–</span> with all the details of presenting citations and reference lists handled automatically.</p>
<h2><a id="id20"></a>Tables of contents, figures etc</h2>
<p>Both Word and Writer have extensive automation features for tables of contents and tables of figures etc as demonstrated in the Edinburgh template. It is important to set up examples and encourage candidates to use them. A template should have the required tables of contents for headings, figures etc already in place with examples and instructions on how to insert figures etc so they are numbered. Most of these should probably be discarded in exported HTML and EPUB versions and appropriate native HTML versions prepared automatically by software.</p>
<h1><a id="id21"></a>Summary</h1>
<p>Any new template design process needs to consider all of the above (and more) in multiple cycles, until a stable set of design constraints emerges.</p>
<ul class="lib">
<li>Interoperability requirements. The range of packages you want users to be able to work with imposes constraints on which features can be used. Current trends such as tablet computing and the rise of vertical platforms such as Apple&#8217;s iOS devices need to be given consideration. (On the ICE project, several years ago we decided to support OpenOffice.org Writer and Microsoft Word to ensure cross-platform coverage across Windows, Mac and Linux <span class="spCh spChx2013">–</span> today&#8217;s environment is very different, but during the ICE project our each-way bet paid off when Microsoft dropped support for Visual Basic scripting in the Mac version of Office <span class="spCh spChx2013">–</span> we were able to keep coverage for the style toolbar on that platform by offering Writer to Mac users.)</li>
<li>Whether to support single-file theses or multi-file theses or both. Multiple files will increase the need to provide support, and possibly require the use of external tools, but for modern research theses, the ability to aggregate different things such as data files together is attractive.</li>
<li>A set of styles and/or other guidelines.</li>
<li>How to make HTML and EPUB versions of the content. If an application can produce HTML then that can be converted to EPUB automatically. It is producing HTML of sufficient quality that is a problem.</li>
</ul>
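<p>The last point in the list can be sketched in a few lines. This is only an illustration, assuming Pandoc (mentioned above as an option for text-only theses) is installed and that the word processor has already exported an HTML file. The function names and file names are hypothetical, and the quality of the EPUB depends entirely on the quality of the HTML going in.</p>

```python
import shutil
import subprocess


def html_to_epub_command(html_path, epub_path, title):
    """Build the Pandoc command line that turns one HTML file into an EPUB."""
    return [
        "pandoc",
        html_path,
        "--from", "html",
        "--to", "epub",
        "--metadata", "title=" + title,
        "--output", epub_path,
    ]


def convert(html_path, epub_path, title):
    """Run the conversion, failing early if Pandoc is not on the PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed")
    subprocess.run(html_to_epub_command(html_path, epub_path, title), check=True)


# Example call (hypothetical file names):
# convert("thesis.html", "thesis.epub", "My thesis")
```

<p>Building the command separately from running it makes it easy to check what will be executed before handing real thesis files to it.</p>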
<p>The ICE system I have continually referenced throughout this document was one fully worked example of all of the above considerations. It was not designed for theses, although it was tested on them and found to be adequate. But ICE is several years old, so re-doing this process now would produce a different design. Some of the key insights from the ICE design process include:</p>
<ul class="lib">
<li>Templates need to be immediately useful to their users. That is, people have to be able to see the point of what they are being asked to do or fill in. For theses this is simpler than for some other types of document, because the institution can say to a candidate: <span class="spCh spChx201c">“</span>Use this!<span class="spCh spChx201d">”</span> or, <span class="spCh spChx201c">“</span>Your thesis must meet the formatting criteria we specify, here is a template that helps<span class="spCh spChx201d">”</span>.</li>
<li>Following from the above point, rapid feedback is required <span class="spCh spChx2013">–</span> if the final deliverable is expected to be an ebook, amongst other formats, make sure there is a system in place to show the candidate and their supervisor to</li>
<li>The document authoring system needs to be integrated into institutional processes, so making the authoring system part of the supervisor/candidate conversation, and automating submission will be important.</li>
</ul>
<p>While this document has looked at some design issues for templates, it does not provide a solution to the question it is trying to answer: how to set up an environment for creating EPUB theses from word processing source files. I will produce one final blog post for this project outlining some potential solutions to some of the issues raised in this document as a guide to where JISC might or might not like to invest in future work.</p>
<p class="center">Copyright <span><span>Peter Sefton</span></span>, 2011-07-25. Licensed under <span>Creative Commons Attribution-Share Alike 2.5 Australia</span>. &lt;http://creativecommons.org/licenses/by-sa/2.5/au/&gt;</p>
<p class="center"><span class="Default_20_Paragraph_20_Font"><span><span class="T1"><a></a><img class="fr4" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/08/m40ca94ba.png" alt="graphics1" width="88" height="31" /></span></span></span></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/08/03/template-design-issues-for-wordprocessors-and-possible-future-epub-export/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The repository is watching: automated harvesting from replicated filesystems</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/07/15/the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/07/15/the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2/#comments</comments>
		<pubDate>Fri, 15 Jul 2011 04:47:25 +0000</pubDate>
		<dc:creator>Peter Sefton</dc:creator>
				<category><![CDATA[Workpackage 3]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/2011/07/15/the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2/</guid>
		<description><![CDATA[Managing a thesis Demonstration Installation notes One of the final things I&#8217;m looking at on this jiscPUB project is a demonstration of a new class of tool for managing academic projects &#8211; not just documents. For a while we were &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/07/15/the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc">
<ul>
<li><a href="#id2"><span>Managing a thesis</span></a></li>
<li><a href="#id3"><span>Demonstration</span></a></li>
<li><a href="#id4"><span>Installation notes</span></a></li>
</ul>
</div>
<div>
<p class="P1"><span style="font-style:normal"><span class="T2">One of the final things I&#8217;m looking at on this jiscPUB project is</span></span> <span style="font-style:normal"><span class="T2">a demonstration of a new class of tool for managing academic projects <span class="spCh spChx2013">&#8211;</span> not just documents. For a while we were calling this idea the <span class="spCh spChx201c">&#8220;</span></span></span><a href="http://ptsefton.com/2011/02/16/another-look-at-desktop-repositories.htm"><span style="font-style:normal"><span class="T2">Desktop Repository</span></span></a><span style="font-style:normal"><span class="T2"><span class="spCh spChx201d">&#8221;</span>, the idea being that there would be repository services watching your entire hard disk and exposing all the content in a local website with repository and content management services <span class="spCh spChx2013">&#8211;</span> that&#8217;s possibly a very useful class of application for some academics, but in this project we are looking at a slightly different slant on that idea.</span></span></p>
<p class="P2">The core use case I&#8217;m illustrating here is thesis writing, but the same workflow would be useful across a lot of academic projects, including all the things we&#8217;re focussing on in the jiscPUB project <span class="spCh spChx2013">&#8211;</span> academic users managing their portfolio of work, project reporting and courseware management. This tool is about a lot more than just ebook publishing, but I will look at that aspect of it, of course.</p>
<p class="P2">In this post I will show some screenshots of  The Fascinator repository in action, talk about how you can get involved in trying it out, and finish with some technical notes about installation and setup. I was responsible for leading the team that built this software at the University of Southern Queensland. Development is now being done at the University of Central Queensland and the Queensland Cyber Infrastructure Foundation where Duncan Dickinson and Greg Pendlebury continue work on the <a href="http://code.google.com/p/redbox-mint/"><span>ReDBox research data repository</span></a> which is based on the same platform. </p>
<p class="P2">I know Theo Andrew at Edinburgh is keen to get some people trying this. So this blog post will serve to introduce it and give his team some ideas <span class="spCh spChx2013">&#8211;</span> we&#8217;ll follow up on their experiences if there are useful findings.</p>
<h1><a id="id2" name="id2"><span /></a>Managing a thesis</h1>
<p>The short version of how this thesis story might work is:</p>
<ul class="lib">
<li>
<p>The university supplies the candidate with a dropbox-like shared file system they can use from pretty much any device to access their stuff. But there&#8217;s a twist <span class="spCh spChx2013">&#8211;</span> there is a web-based repository watching the shared folder and exposing everything there to the web.</p>
</li>
<li>
<p>The university helpfully adds into the share a thesis template that&#8217;s ready to go, complete with all the cover page stuff, margins all set, automated tables of contents for sections and tables and figures and the right styles <span class="spCh spChx2013">&#8211;</span> and trains the candidate in the basics of word processing.</p>
</li>
<li>
<p>The candidate works away on their project, keeping all their data, presentations, notes and so on in the Dropbox and filling out the thesis template as they go.</p>
</li>
<li>
<p>The supervisor can drop in on the work in progress and leave comments via an annotation system.</p>
</li>
<li>
<p>At any time, the candidate can grab a group of things, which we call a package, to publish to a blog or deposit to a repository at the click of a button. This includes not just documents, but data files (the ones that are small enough to keep in a replicated file system), images, presentations etc.</p>
</li>
<li>
<p>The final examination process could be handled using the same infrastructure and the university could make its own packages of all the examiners&#8217; reports etc for deposit into a closed repository.</p>
</li>
</ul>
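<p>The twist in the first point, a repository watching the shared folder, can be pictured with a very small sketch. This is not how The Fascinator does it internally; it is just an illustration of the idea, assuming a harvester that periodically takes a snapshot of the folder and compares it with the previous one. The function names are made up for this example.</p>

```python
import os


def snapshot(base_dir):
    """Map each file's path (relative to base_dir) to its modification time."""
    state = {}
    for folder, _dirs, files in os.walk(base_dir):
        for name in files:
            path = os.path.join(folder, name)
            state[os.path.relpath(path, base_dir)] = os.stat(path).st_mtime_ns
    return state


def changes(before, after):
    """Compare two snapshots and report what a harvester would need to act on."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in set(before).intersection(after)
        if before[path] != after[path]
    )
    return {"added": added, "modified": modified, "removed": removed}
```

<p>Anything reported as added or modified would be re-rendered to HTML and re-indexed; anything removed would be dropped from the web view.</p>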
<p>The result is web-based, web-native scholarship where everything is available in HTML, not just PDF or application file formats and there are easy ways to route content to other repositories or publish it in various ways.</p>
<p>Where might ebook dissemination fit into this?</p>
<p>Well, pretty much anywhere in the above that someone wants to either take a digital object &#8216;on the road&#8217; or deposit it in a repository of some kind as a bounded digital thing.</p>
<h1><a id="id3" name="id3"><span /></a>Demonstration</h1>
<p>I have put a copy of Joss Winn&#8217;s MA thesis into the system to show how it works. It is <a href="http://ec2-50-19-86-198.compute-1.amazonaws.com/portal/default/detail/2fcd657c012eb2d4f5f011a37b1d33fb/#22cc9a7fae17e01c1c699c970858ecf0"><span>available in the live system</span></a> (note that this might change if people play around with it). I took an old OpenOffice .sxw file Joss sent me and changed the styles a little bit to use the ICE conventions. I&#8217;m writing up a much more detailed post about templates in general, so stay tuned for a discussion of the pros and cons of various options for choosing style names and conventions, and of whether to manage the document as a single file or multiple chapters.</p>
</p>
<p class="Illustration" style="width:643px"><span><a name="graphics2"><span /></a><img alt="graphics2" class="fr4" height="287" src="http://ptsefton.com/wp-content/uploads/2011/07/1772b368_643x287.jpeg" style="border:0px;vertical-align: top" width="643" /></span>Illustration 1: The author puts their stuff in the local file system, in this case replicated by Dropbox.</p>
</p>
<p class="Illustration" style="width:643px"><span><a name="graphics7"><span /></a><img alt="graphics7" class="fr4" height="323" src="http://ptsefton.com/wp-content/uploads/2011/07/m6338db22_643x323.jpeg" style="border:0px;vertical-align: top" width="643" /></span>Illustration 2: A web-view of Joss Winn&#8217;s thesis. </p>
<p>The interface provides a range of actions.
</p>
<p class="Illustration" style="width:359px"><span><a name="graphics9"><span /></a><img alt="graphics9" class="fr4" height="421" src="http://ptsefton.com/wp-content/uploads/2011/07/m74679952.png" style="border:0px;vertical-align: top" width="359" /></span>Illustration 3: You can do things with content in The Fascinator including blogging and export to zip or (experimental) EPUB</p>
<p>The EPUB export was put together by Ron Ward as a demonstration for the Beyond The PDF effort. At the moment it only works on packages, not individual documents, and it is using some internal Python code to stitch together documents, rather than calling out to Calibre as I did in <a href="http://jiscpub.blogs.edina.ac.uk/2011/04/05/epub-for-word-processing-users/"><span>earlier work on this project</span></a>. The advantage of doing it this way is that you don&#8217;t have Calibre adding extra stuff and reprocessing documents to add CSS <span class="spCh spChx2013">&#8211;</span> but the disadvantage is that a lot of what Calibre does is useful, for example working around known bugs in reader software, even though it does tend to change formatting on you, not always in useful ways. </p>
<p>I put the EPUB into the Dropbox so it is <a href="http://ec2-50-19-86-198.compute-1.amazonaws.com/portal/default/detail/fac25c6c6183ffe6c1b874d02e0fe620/"><span>available in the demo site</span></a> (you need to expand the Attachments box to get the download <span class="spCh spChx2013">&#8211;</span> that&#8217;s not great usability I know). Or you can <a href="http://ec2-50-19-86-198.compute-1.amazonaws.com/portal/default/detail/2fcd657c012eb2d4f5f011a37b1d33fb/#22cc9a7fae17e01c1c699c970858ecf0"><span>go to the package</span></a> and export it yourself. Log in first, using <code>admin</code> as the username and the same for the password.</p>
</p>
<p class="Illustration" style="width:643px"><span><a name="graphics8"><span /></a><img alt="graphics8" class="fr4" height="329" src="http://ptsefton.com/wp-content/uploads/2011/07/m32aa0aef_643x329.jpeg" style="border:0px;vertical-align: top" width="643" /></span>Illustration 4: Joss Winn&#8217;s thesis exported as EPUB.</p>
<p>I looked at a <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/"><span>different way of creating an EPUB book from the same thesis</span></a> a while ago, and the result will be available for <a href="http://ec2-50-19-86-198.compute-1.amazonaws.com/"><span>a while here at the Calibre server I set up</span></a>.</p>
<p class="P2">One of the features of this software is that more than one person can look at the web site <span class="spCh spChx2013">&#8211;</span> and there are extensive opportunities for collaboration.</p>
<p class="P2">
<p class="Illustration" style="width:643px"><a name="graphics5"><span /></a><img alt="graphics5" class="fr6" height="239" src="http://ptsefton.com/wp-content/uploads/2011/07/m15505cf3_643x239.jpeg" style="border:0px;vertical-align: middle" width="643" />Illustration 5: Colleagues and supervisors can leave comments via inline annotation (including annotating pictures and videos)</p>
</p>
<p class="Illustration" style="width:364px"><span><a name="graphics6"><span /></a><img alt="graphics6" class="fr5" height="272" src="http://ptsefton.com/wp-content/uploads/2011/07/m60f1be58.png" style="border:0px;vertical-align: top" width="364" /></span>Illustration 6: Annotations are threaded discussions</p>
</p>
<p class="Illustration" style="width:643px"><span><a name="graphics3"><span /></a><img alt="graphics3" class="fr4" height="350" src="http://ptsefton.com/wp-content/uploads/2011/07/m70dddee9_643x350.jpeg" style="border:0px;vertical-align: top" width="643" /></span>Illustration 7: Images and videos can be annotated too. At USQ we developed a Javascript toolkit called Anotar for this, the idea being you could add annotation services to any web site quickly and easily.</p>
<p>This thesis package only contains documents, but one of the strengths of The Fascinator platform is that it can aggregate all kinds of data, including images, spreadsheets and presentations, and can be extended to deal with any kind of data file via plugins. I have added another package, modestly calling itself <a href="http://ec2-50-19-86-198.compute-1.amazonaws.com/portal/default/detail/35ef5a6c8a43f8946e6480c52b9e8d87/#81ad16c49c0f7b7284edb82872eef547"><span>the research object of the future</span></a>, using some files supplied by Phil Bourne for the Beyond the PDF group. The Fascinator makes web views of all the content <span class="spCh spChx2013">&#8211;</span> and can package it all as a zip file or an EPUB. </p>
</p>
<p class="Illustration" style="width:643px"><span><a name="graphics10"><span /></a><img alt="graphics10" class="fr4" height="355" src="http://ptsefton.com/wp-content/uploads/2011/07/2c8aa7f2_643x355.jpeg" style="border:0px;vertical-align: top" width="643" /></span>Illustration 8: A spreadsheet rendered into HTML and published into an EPUB file (demo quality only)</p>
<p>This includes turning PowerPoint into a flat web page.</p>
</p>
<p class="Illustration" style="width:643px"><span><a name="graphics11"><span /></a><img alt="graphics11" class="fr4" height="397" src="http://ptsefton.com/wp-content/uploads/2011/07/m578316a1_643x397.jpeg" style="border:0px;vertical-align: top" width="643" /></span>Illustration 9: A presentation exported to EPUB along with data and all the other parts of a research object</p>
<h1><a id="id4" name="id4"><span /></a><span style="font-style:normal"><span class="T2">Installation notes</span></span></h1>
<p class="P6"><span style="font-style:normal"><span class="T2">Installing The Fascinator</span></span>&#160; <span style="font-style:normal"><span class="T2">(I did it on Amazon&#8217;s EC2 cloud on Ubuntu 10.04.1 LTS) is straightforward. These are my notes <span class="spCh spChx2013">&#8211;</span> not intended to be a detailed how-to, but possibly enough for experienced programmers/sysadmins to work it out.</span></span></p>
<ul class="lib">
<li>
<p>Check it out.</p>
<pre>sudo svn co https://the-fascinator.googlecode.com/svn/the-fascinator/trunk /opt/fascinator</pre>
</li>
<li>
<p>Install Sun&#8217;s Java</p>
<pre><span class="Source_20_Text">sudo apt-get install python-software-properties</span></pre>
<pre><span class="Source_20_Text">sudo add-apt-repository ppa:sun-java-community-team/sun-java6</span></pre>
<pre><span class="Source_20_Text">sudo apt-get update</span></pre>
<pre><span class="Source_20_Text">sudo apt-get install sun-java6-jdk</span></pre>
<p><a href="http://stackoverflow.com/questions/3747789/how-to-install-the-sun-java-jdk-on-ubuntu-10-10-maverick-meerkat/3997220#3997220"><span class="Source_20_Text">http://stackoverflow.com/questions/3747789/how-to-install-the-sun-java-jdk-on-ubuntu-10-10-maverick-meerkat/3997220#3997220</span></a><span class="Source_20_Text"><span style="background-color:transparent;color:#000000;font-size:10.5pt;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal"><span class="T7"> </span></span></span></p>
</li>
<li>
<p>Install Maven 2.</p>
<pre>sudo apt-get install maven2</pre>
</li>
<li>
<p><a href="http://code.google.com/p/integrated-content-environment/wiki/InstalIICEServiceOnUbuntu"><span>Install ICE</span></a> or point your config at an ICE service. I have <a href="http://ec2-50-19-86-198.compute-1.amazonaws.com/api/convert"><span>one running for the jiscPUB project</span></a> <span class="spCh spChx2013">&#8211;</span> you can point to this by changing the <code>~/.fascinator/system-config.json</code> file.</p>
</li>
<li>
<p>Install Dropbox or your file replication service of choice <span class="spCh spChx2013">&#8211;</span> a little bit of work on a headless server but there are instructions linked from the Dropbox.com site.</p>
</li>
<li>
<p>Make some configuration changes, see below.</p>
</li>
<li>
<p>To run ICE and The Fascinator on their default ports on the same machine, add this to <code>/etc/apache2/apache2.conf</code> (I think the proxy modules I&#8217;m using here are non-standard).</p>
<pre>LoadModule  proxy_module /usr/lib/apache2/modules/mod_proxy.so</pre>
<pre>LoadModule  proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so</pre>
<pre>ProxyRequests Off</pre>
<pre>&lt;Proxy *&gt;</pre>
<pre>Order deny,allow</pre>
<pre>Allow from all</pre>
<pre>&lt;/Proxy&gt;</pre>
<pre>ProxyPass        /api/ http://localhost:8000/api/</pre>
<pre>ProxyPassReverse /api/  http://localhost:8000/api/</pre>
<pre />
<pre>ProxyPass       /portal/ http://localhost:9997/portal/</pre>
<pre>ProxyPassReverse /portal/ http://localhost:9997/portal/</pre>
</li>
<li>
<p>Run it.</p>
<pre>cd /opt/fascinator</pre>
<pre>./tf.sh restart</pre>
</li>
</ul>
<p class="P9">Configuration follows:</p>
<ul class="lib">
<li>
<p><span style="font-style:normal"><span class="T2">To set up the harvester, add this to the empty jobs list in  </span></span><code><span style="font-style:normal"><span class="T2">~/.fascinator/system-config.json</span></span></code></p>
</li>
</ul>
<pre>&quot;jobs&quot; : [
    {
        &quot;name&quot;: &quot;dropbox-public&quot;,
        &quot;type&quot;: &quot;harvest&quot;,
        &quot;configFile&quot;: &quot;${fascinator.home}/harvest/local-files.json&quot;,
        &quot;timing&quot;: &quot;0/30 * * * * ?&quot;
    }
]</pre>
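<p>If you are setting up more than one machine, the jobs entry can be added with a script rather than by hand. The sketch below makes two assumptions that you should check against your own install: that the config file is plain JSON with no comments, and that <code>jobs</code> is a top-level key (in a real <code>system-config.json</code> it may be nested inside another section). The function names are hypothetical.</p>

```python
import json


def add_harvest_job(config, name, config_file, timing):
    """Append a harvest job to config["jobs"] unless one with that name exists."""
    jobs = config.setdefault("jobs", [])
    if any(job.get("name") == name for job in jobs):
        return config
    jobs.append({
        "name": name,
        "type": "harvest",
        "configFile": config_file,
        "timing": timing,
    })
    return config


def update_config_file(path):
    """Read the config file, add the Dropbox harvest job, and write it back."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    add_harvest_job(
        config,
        "dropbox-public",
        "${fascinator.home}/harvest/local-files.json",
        "0/30 * * * * ?",
    )
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=4)
```

<p>The timing value looks like a Quartz-style cron expression with a leading seconds field, in which case <code>0/30</code> would mean the harvest runs every 30 seconds.</p>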
<p class="P2" />
<p>Then change <code>${fascinator.home}/harvest/local-files.json</code> to point at the Dropbox directory:</p>
<pre>"harvester": {</pre>
<pre>        "type": "file-system",</pre>
<pre>        "file-system": {</pre>
<pre>            "targets": [</pre>
<pre>                {</pre>
<pre>                    "baseDir": "${user.home}/Dropbox/",</pre>
<pre>                    "facetDir": "${user.home}/Dropbox/",</pre>
<pre>                    "ignoreFilter": ".svn|.ice|.*|~*|Thumbs.db|.DS_Store",</pre>
<pre>                    "recursive": true,</pre>
<pre>                    "force": false,</pre>
<pre>                    "link": true</pre>
<pre>                }</pre>
<pre>            ],</pre>
<pre>            "caching": "basic",</pre>
<pre>            "cacheId": "default"</pre>
<pre>        }</pre>
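<p>To get a feel for what the <code>ignoreFilter</code> setting is likely to skip, here is a rough sketch. It assumes the value is a pipe-separated list of shell-style patterns matched against each file name; I have not checked that this is exactly how the harvester interprets it, so treat the function as an illustration only.</p>

```python
from fnmatch import fnmatchcase


def is_ignored(file_name, ignore_filter):
    """Return True if file_name matches any pattern in the pipe-separated filter."""
    patterns = [p for p in ignore_filter.split("|") if p]
    return any(fnmatchcase(file_name, pattern) for pattern in patterns)
```

<p>Under that assumption, hidden files, editor lock files starting with <code>~</code> and operating system clutter such as <code>Thumbs.db</code> stay out of the repository, while ordinary documents are harvested.</p>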
<p>To add the EPUB support and the red branding, unzip the skin files in this zip file into the portal/default/ directory: <a href="http://ec2-50-19-86-198.compute-1.amazonaws.com/portal/default/download/551148ce6d80bfc0c9c36914f9df4f91/jiscpub.zip"><span>http://ec2-50-19-86-198.compute-1.amazonaws.com/portal/default/download/551148ce6d80bfc0c9c36914f9df4f91/jiscpub.zip</span></a></p>
<pre>unzip -d /opt/fascinator/portal/src/main/config/portal/default/ jiscpub.zip</pre>
<p class="center">Copyright <span><span>Peter Sefton</span></span>, 2011-07-12. Licensed under <span>Creative Commons Attribution-Share Alike 2.5 </span><span>Australia</span>. &lt;http://creativecommons.org/licenses/by-sa/2.5/au/&gt;</p>
<p class="center"><span class="Default_20_Paragraph_20_Font"><span><span class="T1"><a name="graphics1"><span /></a><img alt="graphics1" class="fr3" height="31" src="http://ptsefton.com/wp-content/uploads/2011/07/m40ca94ba1.png" style="border:0px;vertical-align: top" width="88" /></span></span></span></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/07/15/the-repository-is-watching-automated-harvesting-from-replicated-filesystems-2/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Making EPUB from WordPress (and other) web collections</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=making-epub-from-wordpress-and-other-web-collections</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/#comments</comments>
		<pubDate>Wed, 25 May 2011 02:05:33 +0000</pubDate>
		<dc:creator>Peter Sefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Workpackage 3]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=258</guid>
		<description><![CDATA[Background As part of Workpackage 3 I have been looking at WordPress as a way of creating scholarly monographs. This post carries on from the last couple, but it&#8217;s not really about EPUB or about WordPress, it&#8217;s about interoperability and &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc">  </div>
<div>
<h1><a id="id2"><span /></a>Background</h1>
<p>As part of <a href="http://jiscpub.blogs.edina.ac.uk/2011/03/03/workpackage-3/"><span>Workpackage 3</span></a> I have been <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/10/wordpress/"><span>looking at WordPress</span></a> as a way of creating scholarly monographs. This post carries on from the last couple, but it&#8217;s not really about EPUB or about WordPress, it&#8217;s about interoperability and how tools might work together in a <a href="http://scholarlyhtml.org/"><span>Scholarly HTML</span></a> mode so that people can package and repackage their resources much more reliably and flexibly than they can now.</p>
<p>While exploring WordPress I had a look at the JISC funded <a href="http://knowledgeblog.org/"><span>KnowledgeBlog project</span></a>. The team there has released a plugin for WordPress to show a table of contents made up of all the posts in a particular category. It seemed that with a bit of enhancement this could be a useful component of a production workflow for book-like projects, particularly for project reports and theses (where they are being written online in content management systems <span class="spCh spChx2013">&#8211;</span> maybe not so common now, but likely to become more common) and for course materials.</p>
<p>Recently <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/11/anthologize-a-wordpress-based-collection-tool/"><span>I looked</span></a> at <a href="http://anthologize.org/"><span>Anthologize</span></a>, a WordPress-based way of creating ebooks from HTML resources sourced from around the web (I noted a number of limitations which I am sure will be dealt with sooner or later). Anthologize is using a design pattern that I have seen a couple of times with EPUB, converting the multiple parts of a project to an XML format that already has some tools for rendering and using those tools to generate outputs like PDF or EPUB. Asciidoc does this using the DocBook tool-chain and Anthologize uses TEI tools. I will write more on this design pattern and its implications soon. There is another obvious approach; to leave things in HTML and build books from that, for example using <a href="http://calibre-ebook.com/"><span>Calibre</span></a> which already has ways to build ebooks from HTML sources. This is an approach which could be added to Anthologize very easily, to complement the TEI approach. </p>
<p>So, I have put together a workflow using Calibre to build EPUBs straight from a blog.</p>
<p>Why would you want to do this? Two main reasons. Firstly, to read a report, thesis or course, or an entire blog on a mobile device. Secondly, to be able to deposit a snapshot of same into a repository.</p>
<p>In this post I will talk about some academic works and the tools I used to package them:</p>
<ul class="lib">
<li>
<p><a href="http://tait.josswinn.org/"><span>A thesis by Joss Winn </span></a>who is on the <a href="http://jiscpress.org/"><span>JISCPress project</span></a>.</p>
</li>
<li>
<p>The Calibre <a href="http://calibre-ebook.com/user_manual/cli/ebook-convert.html"><span>ebook-convert</span></a> tool and the Calibre <a href="http://calibre-ebook.com/user_manual/cli/calibre-server.html"><span>server</span></a>. I am running both of these as command line tools but there are versions you can run as desktop applications.</p>
</li>
<li>
<p>Project work from my own website including blog posts from the jiscPUB project workpackage 3. I set up a page <a href="http://ptsefton.com/projects/"><span>with two projects on it</span></a> to show how these can be compiled into a book together.</p>
</li>
<li>
<p>A <a href="http://digitalworlds.wordpress.com/about/"><span>set of open course materials on game design</span></a> by Tony Hirst which I have <a href="http://anthologize.ptsefton.com/digital-worlds"><span>imported into a test blog</span></a>.</p>
</li>
</ul>
<p>The key to this effort is the KnowledgeBlog table of contents plugin <a href="http://wordpress.org/extend/plugins/knowledgeblog-table-of-contents/"><span>ktoc</span></a>, with some enhancements <a href="http://code.google.com/r/ptsefton-schtml/source/browse#hg%2Ftrunk%2Fplugins%2Fkblog-table-of-contents"><span>I have added</span></a> to make it easier to harvest web content into a book. </p>
<p>The <a href="http://ec2-50-19-73-99.compute-1.amazonaws.com/"><span>results are available on a Calibre server I&#8217;m running in the Amazon cloud</span></a> <span class="spCh spChx2013">&#8211;</span> just for the duration of this project. (The server is really intended for local use, the way I am running it behind an Apache reverse proxy it doesn&#8217;t seem very happy <span class="spCh spChx2013">&#8211;</span> you may have to refresh a couple of times until it comes good). This is rough. It is certainly not production quality.</p>
<p><a><span /></a><img alt="graphics1" class="fr3" height="328" src="http://ptsefton.com/wp-content/uploads/2011/05/m4163bf9b_643x328.jpeg" style="border:0px;vertical-align: middle" width="643" /></p>
<p>These books are created using calibre &#8216;recipes&#8217;: <a href="https://bitbucket.org/wwmm/schtml/src/a5e876450421/recipes/"><span>available here</span></a>. You run them like this:</p>
<blockquote class="bq"><p><code>ebook-convert thesis-demo.recipe .epub </code><code>--test</code></p>
</blockquote>
<p>If you are just trying this out, to be kind to site owners <code>--test</code> will  cause it to only fetch a couple of articles per feed.</p>
<p>I added them to the calibre server like this:</p>
<blockquote class="bq"><p><code>calibredb add --library-path=./books</code> <code>thesis-demo.epub </code></p>
</blockquote>
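<p>The two commands above can be chained in a small script if you are rebuilding books regularly. This is only a sketch: the function names are made up, and it writes the output file name out in full rather than using the <code>.epub</code> shorthand shown above.</p>

```python
import subprocess


def build_commands(recipe, library_path):
    """Return the two command lines: recipe to EPUB, then EPUB into the library."""
    if not recipe.endswith(".recipe"):
        raise ValueError("expected a .recipe file")
    epub = recipe[: -len(".recipe")] + ".epub"
    convert = ["ebook-convert", recipe, epub, "--test"]
    add = ["calibredb", "add", "--library-path=" + library_path, epub]
    return convert, add


def build_and_add(recipe, library_path):
    """Run both steps, stopping if either command fails."""
    for command in build_commands(recipe, library_path):
        subprocess.run(command, check=True)


# Example call, using the names from the commands above:
# build_and_add("thesis-demo.recipe", "./books")
```

<p>Remove <code>--test</code> from the first command when you want the whole book rather than a couple of articles per feed.</p>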
<p>The projects page at my site has two TOCs for two different projects.</p>
<blockquote class="bq"><p><code>[ktoc cat="jiscPUB" title="Digital Monograph Technical Landscape study #jiscPUB"  show_authors="false"  orderby="date" toc_author="Peter Sefton"]</code></p>
<p><code>[ktoc cat="ScholarlyHTML" title="Scholarly HTML posts" orderby="date" show_authors="false" toc_author="Peter Sefton" ]</code></p>
</blockquote>
<p>The title is used to create sections in the book. In both cases the posts are displayed in date order, and I am not showing the name of the author on the page because that&#8217;s not needed when it is all me.</p>
<p>The resulting book has a nested table of contents, seen here in Adobe Digital Editions.</p>
</p>
<p class="Illustration" style="width:442px"><span><a><span /></a><img alt="graphics2" class="fr4" height="273" src="http://ptsefton.com/wp-content/uploads/2011/05/mbfd0b2d_442x273.jpeg" style="border:0px;vertical-align: top" width="442" /></span>Illustration 1: A book built from a WordPress page with two table of contents blocks generated from WordPress categories.</p>
<p>Read on for more detail about the process of developing these things and some comments about the problems I encountered working with multiple conflicting WordPress plugins, etc. </p>
<p />
<h1><a id="id3"><span /></a>The Scholarly HTML way to EPUB</h1>
<p class="P1">The first thing I tried in this exploration was writing a recipe to make an EPUB book from a Knowledge Blog, for the <a href="http://ontogenesis.knowledgeblog.org/table-of-contents"><span>Ontogenesis project</span></a>. It is a kind of encyclopaedia of ontology development maintained in a WordPress site with multiple contributors. It worked well, for a demonstration, and did not take long to develop. The <a href="https://bitbucket.org/wwmm/schtml/src/a5e876450421/recipes/ontogenesis.recipe"><span>Ontogenesis recipe is available here</span></a> and the resulting book is available on the <a href="http://ec2-50-19-73-99.compute-1.amazonaws.com/"><span>Calibre server</span></a>.</p>
<p class="P1">But there was a problem.</p>
<p class="P1">The second blog I wanted to try it on was my own, so I installed ktoc, changed the URL in the recipe and ran it. Nothing. The problem is that Ontogenesis and my blog use different WordPress themes, so the structure is different. Recipes have stuff like this in them to locate the parts of a page, such as <code>&lt;p class='details_small'&gt;</code>:</p>
<blockquote class="bq"><p><code>remove_tags_before = dict(name='p', attrs={'class':'details_small'})</code></p>
<p><code>remove_tags_after  = dict(name='div', attrs={'class':'post_content'})</code></p>
</blockquote>
<p class="P1">That&#8217;s for Ontogenesis, different rules are needed for other sites.  You also need code to find the table of contents amongst all the links on a WordPress page, and deal with pages that might have two or more ktoc-generated tables for different sections of a journal, or parts of a project report. </p>
<p class="P1">Anyway, I wrote a different recipe for my site, but as I was doing so I was thinking about how to make this easier. What if:</p>
<ul class="lib">
<li>
<p>The ktoc plugin output a little more information in its list of posts that made it easy to find no matter what WordPress theme was being used.</p>
</li>
<li>
<p>The actual <i>post</i> part of each page (ie not the navigation, or ads) identified itself as such.</p>
</li>
<li>
<p>The same technique could be extended to other websites in general.</p>
</li>
</ul>
<p class="P1">There is already a standard way to do the most important part of this, listing a set of resources that make up an aggregated resource; the <a href="http://www.openarchives.org/ore/1.0/rdfa.html"><span>Object Reuse and Exchange specification, embedded in HTML using RDFa</span></a>. ORE in RDFa. Simple. </p>
<p class="P1">Well no, it&#8217;s not, unfortunately. ORE is complicated and has some very important but hard to grasp abstractions, such as the difference between an Aggregation and a Resource Map. An Aggregation is a collection of resources which has a URI, while a resource map describes the relationship between the aggregation and the resources it aggregates. These things are supposed to have different URIs. Now, for a simple task like making a table of contents of WordPress posts machine-readable so you can throw together a book, these abstractions are not really helpful to developers or consumers. But what if there were a simple recipe/microformat <span class="spCh spChx2013">&#8211;</span> what we call a convention in Scholarly HTML <span class="spCh spChx2013">&#8211;</span> to follow, that was ORE compliant and also simple to implement at both the server and client end?</p>
<p class="P1">What I have been doing over the last couple of days, as I continue this EPUB exploration, is trying to use the ORE spec in a way that will be easy to implement, say in the Digress.it TOC page or in Anthologize, while still being ORE compliant. That discussion is ongoing, and will take place in the Google groups for <a href="http://scholarlyhtml.org/"><span>Scholarly HTML</span></a> and ORE. It is worth pursuing because if we can get it sorted out then with a few very simple additions to the HTML they spit out, <i>any</i> web system can get EPUB export quickly and cheaply by adhering to a narrowly defined profile of ORE <span class="spCh spChx2013">&#8211;</span> subject to the donor service being able to supply reasonable quality HTML. More sophisticated tools that do understand RDFa and ORE will be able to process arbitrary pages that use the Scholarly HTML convention, but developers can choose the simpler convention over a full implementation for some tasks. </p>
<p class="P1">The details may change, as I seek advice from experts, but basically, there are two parts to this.</p>
<p class="P1"><b>Firstly</b> there&#8217;s adding ORE semantics to the ktoc (or any) table of contents. It used to be a plain-old unordered list, with list items in it:</p>
<blockquote class="bq"><p><code>&lt;p&gt;&lt;strong&gt;Articles&lt;/strong&gt;&lt;/p&gt;<br />&lt;ul&gt;<br />&lt;li&gt;&lt;a href="http://ontogenesis.knowledgeblog.org/49"&gt;Automatic<br />maintenance of multiple inheritance ontologies&lt;/a&gt; by Mikel Egana<br /></code><code>Aranguren&lt;/li&gt;<br /></code><code>&lt;li&gt;&lt;a href="http://ontogenesis.knowledgeblog.org/257"&gt;Characterising<br />Representation&lt;/a&gt; by Sean Bechhofer and Robert Stevens&lt;/li&gt;<br />&lt;li&gt;&lt;a href="http://ontogenesis.knowledgeblog.org/1001"&gt;Closing Down<br />the Open World: Covering Axioms and Closure Axioms&lt;/a&gt; by Robert<br />Stevens&lt;/li&gt;<br />&lt;/ul&gt; </code></p>
</blockquote>
<p class="P1">The list items now explicitly say what is being aggregated. The plain old &lt;li&gt; becomes:</p>
<blockquote class="bq"><p><code>&lt;li &#160;rel=&quot;http://www.openarchives.org/ore/terms/aggregates&quot;<br />resource="http://ontogenesis.knowledgeblog.org/49"&gt; </code></p>
</blockquote>
<p class="P1">(The fact that this is an <code>&lt;li&gt;</code> does not matter, it could be any element.)</p>
<p class="P1">And there is a separate URI for the Aggregation and resource map <span class="spCh spChx2013">&#8211;</span> courtesy of different IDs. And the resource map says that it describes the Aggregation <span class="spCh spChx2013">&#8211;</span> as per the ORE spec.</p>
<blockquote class="bq"><p>&lt;div id=&#8221;AggregationScholarlyHTM<code>L"&gt;</code></p>
<p><code>&lt;div rel="http://www.openarchives.org/ore/terms/describes" resource="#AggregationScholarlyHTML" id="ResourceMapScholarlyHTML" </code><code>about="#ResourceMapScholarlyHTML"&gt;</code></p>
</blockquote>
<p class="P1">It is verbose, but nobody will have to type this stuff. What I have tried to do here (and it is a work in progress) is to simplify an existing standard which could be applied in any number of ways and boil it down to a simple convention that&#8217;s easy to implement but that still honours the more complicated specifications in the background. (Experts will realise that I have used an RDFa 1.1 approach here, meaning that current RDFa processors will not understand it; this is so that we don&#8217;t have to deal with namespaces and CURIEs, which complicate processing for non-native tools.)</p>
<p class="P1"><b>Secondly</b><span style="font-weight:normal"><span class="T5"> the plugin wraps a &lt;div&gt; element around the content for every post to label it as being </span></span><a href="http://scholarlyhtml.org/core-specification/"><span style="font-weight:normal"><span class="T5">scholarly HTML</span></span></a><span style="font-weight:normal"><span class="T5">; this is a way of saying that this part of the whole page is the content that makes up the article, thesis chapter or similar. Without a marker like this, finding the content is a real challenge where pages are loaded up with all sorts of navigation, decoration and advertisements; the page structure is different on just about every site, and it can change at the whim of the blog owner if they change themes. </span></span></p>
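<p class="P1">To give a feel for how little a consumer needs to do, here is a minimal Python sketch, using only the standard library, that pulls the aggregated URIs out of a table of contents marked up this way. It is an illustration rather than the code in the recipes: the class and function names are made up for this example, and the sample markup is abbreviated from the snippets above.</p>

```python
from html.parser import HTMLParser

ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


class AggregatesFinder(HTMLParser):
    """Collect the resource URI of every element that says it is aggregated."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The element name is ignored on purpose: it could be an li or anything else.
        if attributes.get("rel") == ORE_AGGREGATES and attributes.get("resource"):
            self.resources.append(attributes["resource"])


def aggregated_resources(page_html):
    """Return the aggregated URIs in document order (hypothetical helper)."""
    finder = AggregatesFinder()
    finder.feed(page_html)
    return finder.resources


# Abbreviated sample of a ktoc-style table of contents with ORE attributes.
sample = """
<div id="AggregationScholarlyHTML">
<ul>
<li rel="http://www.openarchives.org/ore/terms/aggregates"
    resource="http://ontogenesis.knowledgeblog.org/49">
  <a href="http://ontogenesis.knowledgeblog.org/49">Automatic maintenance</a></li>
<li rel="http://www.openarchives.org/ore/terms/aggregates"
    resource="http://ontogenesis.knowledgeblog.org/257">
  <a href="http://ontogenesis.knowledgeblog.org/257">Characterising Representation</a></li>
</ul>
</div>
"""
print(aggregated_resources(sample))
```

<p class="P1">Because the sketch matches only on the <code>rel</code> and <code>resource</code> attributes, it should not care which WordPress theme produced the page.</p>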
<blockquote class="bq"><p><code>&lt;div rel="http://scholarly-html.org/schtml"&gt;</code></p>
</blockquote>
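<p class="P1">A consumer could then find the post without knowing anything about the theme. The following Python sketch (standard library only) keeps just the markup inside the first element carrying that <code>rel</code> value. It assumes reasonably well-formed HTML with properly nested tags; the names <code>PostExtractor</code> and <code>extract_post</code> and the sample page are invented for illustration.</p>

```python
from html.parser import HTMLParser

SCHTML_REL = "http://scholarly-html.org/schtml"
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never have end tags


class PostExtractor(HTMLParser):
    """Keep only the markup inside the first element labelled as Scholarly HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0      # 0 means we are outside the labelled element
        self.done = False   # True once the labelled element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.done and dict(attrs).get("rel") == SCHTML_REL:
            self.depth = 1  # entering the wrapper; the wrapper itself is not kept

    def handle_startendtag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # that was the wrapper's own end tag
        else:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append("&#%s;" % name)


def extract_post(page_html):
    """Return the inner HTML of the labelled element, or '' if there is none."""
    parser = PostExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)


page = (
    '<div id="nav"><a href="/">Home</a></div>'
    '<div rel="http://scholarly-html.org/schtml">'
    '<h1>Title</h1><div class="inner"><p>Body &amp; more</p></div>'
    '</div>'
    '<div id="ads">Buy things</div>'
)
print(extract_post(page))
```

<p class="P1">Counting nesting depth is what lets the post itself contain further <code>&lt;div&gt;</code> elements without the extractor stopping early.</p>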
<h2><a id="id5"><span /></a>Why not define an even simpler format?</h2>
<p class="P1">It would be possible to come up with a simple microformat that had nice human readable class attributes or something to mark the parts of a TOC page. I didn&#8217;t do that because then people will rightly point out that ORE exists and we would end up with a convention that covered a subset of the existing spec, making it harder for tool makers to cover both and less likely that services will interoperate.</p>
<h2><a id="id6"><span /></a>So why not just use general ORE and RDFa?</h2>
<p class="P1">There are several reasons:</p>
<ul class="lib">
<li>
<p>Tool support is extremely limited for client and server side processing of full RDFa, for example in supporting the way namespaces are handled in RDFa using CURIES. (Sam Adams has pointed out that it would be a lot easier to debug my code if I did use CURIES and RDFa 1.0 <span class="spCh spChx2013">&#8211;</span> so I followed his advice, did some search and replacing and checked that the work I am doing here is indeed ORE compliant).</p>
</li>
<li>
<p>The ORE spec is suited only for experienced developers with a lot of patience for complexities like the difference between an aggregation and a resource map.</p>
</li>
<li>
<p>RDFa needs to apply to a whole page, with the correct document type <span class="spCh spChx2013">&#8211;</span> and that&#8217;s not always possible to do when we&#8217;re dealing with systems like WordPress. The convention approach means you can at least produce something that can become proper RDFa if put into the right context.</p>
</li>
</ul>
<h2><a id="id8"><span /></a>Why not use RSS/Atom feeds?</h2>
<p class="P1">Another way to approach this would be to use a feed, in RSS or Atom format. WordPress <a href="http://codex.wordpress.org/WordPress_Feeds"><span>has good support for feeds</span></a> <span class="spCh spChx2013">&#8211;</span> there&#8217;s one for just about everything. So you can look at all the posts on my website:</p>
<blockquote class="bq"><p> <a href="http://ptsefton.com/category/uncategorized/feed/atom"><span>http://ptsefton.com/category/uncategorized/feed/atom</span></a></p>
</blockquote>
<p class="P1">or use <a href="http://blog.ouseful.info/2009/02/02/single-item-rss-feeds-from-wordpress-blogs/"><span>Tony Hirst&#8217;s approach</span></a> to fetch a single post from the jiscPUB blog:</p>
<blockquote class="bq"><p><a href="http://jiscpub.blogs.edina.ac.uk/2011/05/23/a-view-from-academia-on-digital-humanities/feed/?withoutcomments=1"><span>http://jiscpub.blogs.edina.ac.uk/2011/05/23/a-view-from-academia-on-digital-humanities/feed/?withoutcomments=1</span></a></p>
</blockquote>
<p class="P1">The nice thing about this single post technique is that it gives you just the content in a content element <span class="spCh spChx2013">&#8211;</span> so there is no screen scraping involved. The problem is that the site has to be set up to provide full HTML versions of all posts in its feeds or you only get a summary. There&#8217;s a problem with using feeds on categories too, I believe, in that there is an upper limit to how many posts a WordPress site will serve. The site admin can change that to a larger number but then that will affect subscribers to the general purpose feeds as well. They probably don&#8217;t want to see three hundred posts in Google Reader when they sign up to a new blog.</p>
<p class="P1">Given that Atom (the best standardised and most modern feed format) is one of the official serialisation formats for ORE it is probably worth revisiting this question later if someone, such as JISC, decides to invest more in this kind of web-to-ebook-compiling application.</p>
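<p class="P1">For comparison, here is a rough Python sketch of the feed route, assuming an RSS 2.0 feed of the kind WordPress produces, where the full post (if the site provides it) sits in a <code>content:encoded</code> element. The function name and the sample feed are made up for this example; a summary-only feed shows up as a missing body.</p>

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def posts_from_rss(feed_xml):
    """Return (title, link, html) per item; html is None for summary-only items."""
    channel = ET.fromstring(feed_xml).find("channel")
    posts = []
    for item in channel.findall("item"):
        posts.append((
            item.findtext("title"),
            item.findtext("link"),
            item.findtext(CONTENT_NS + "encoded"),  # None when the element is absent
        ))
    return posts


# Invented sample: the first item carries full content, the second only a summary.
sample_feed = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <title>Example blog</title>
  <item>
    <title>First post</title>
    <link>http://example.org/first/</link>
    <description>Summary only</description>
    <content:encoded>&lt;p&gt;Full text of the post.&lt;/p&gt;</content:encoded>
  </item>
  <item>
    <title>Second post</title>
    <link>http://example.org/second/</link>
    <description>Summary only</description>
  </item>
</channel>
</rss>"""

for title, link, html in posts_from_rss(sample_feed):
    print(title, link, "full content" if html else "summary only")
```

<p class="P1">No screen scraping is needed, but the sketch does nothing about the limit on the number of posts a feed will serve.</p>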
<h1><a id="id9"><span /></a>What next?</h1>
<p>There are some obvious things that could be done to further this work:</p>
<ul class="lib">
<li>
<p>Set up a more complete and robust book server which builds and rebuilds books from particular sites and distributes them in some way, using Open Publication Distribution System (<a href="http://opds-spec.org/specs/opds-catalog-1-0-20100830/"><span>OPDS</span></a>) or something like <a href="http://delivereads.com/"><span>this thing that sends stuff to your Kindle</span></a>. </p>
</li>
<li>
<p>Write a &#8216;recipe factory&#8217;. With a little more work the ScholarlyHTML recipe can be got to the point where the only required variable is a single page URL <span class="spCh spChx2013">&#8211;</span> everything else can be harvested from the page or over-ridden by the recipe.</p>
</li>
<li>
<p>Combine the above to make a WordPress plugin that can create EPUBs from collections of in-built content (tricky because of the present calibre dependency but it could be re-coded in PHP).</p>
</li>
<li>
<p>Add the same ScholarlyHTML convention for ORE to other web systems such as the Digress.it plugin and Anthologize. Anthologize is appealing because it allows you to order resources in &#8216;projects&#8217; and nest them into &#8216;parts&#8217; rather than being based on simple queries but at the moment it does not actually have a way to publish a project directly to the web.</p>
</li>
<li>
<p>Explore the same technique in the next phase of WorkPackage 3 when I return to looking at word processing tools and examine how cloud replication services like DropBox might help people to manage book-like projects that consist of multiple parts.</p>
</li>
</ul>
<h1><a id="id11"><span /></a>Postscript: Lessons and things that need fixing or investigating</h1>
<p>I encountered some issues. Some of these are mentioned above but I wanted to list them here as fodder for potential new projects.</p>
<ul class="lib">
<li>
<p>As with Anthologize, if you use the WordPress RSS importer to bring-in content it does not change the links between posts so they point to the new location. Likewise with importing a WordPress export file.</p>
</li>
<li>
<p>The RSS importer applied to the thesis created hundreds of blank categories.</p>
</li>
<li>
<p>I tried to add my ktoc plugin to a Digress.it site, but ran into problems. It uses PHP&#8217;s simplexml parser which chokes on what I am convinced is perfectly valid XML in unpredictable ways. And the default Digress.it configuration expects posts to be formatted in a particular way <span class="spCh spChx2013">&#8211;</span> as a list of top-level paragraphs, rather than with nested divs. I will follow this up with the developers.</p>
</li>
<li>
<p>Calibre does a pretty good job of taking HTML and making it into EPUBs but it does have its issues. I will work through these on the relevant forums as time permits.</p>
<ul class="lib">
<li>
<p>There are some encoding problems with the table of contents in some places. Might be an issue with my coding in the recipes.</p>
</li>
<li>
<p>Unlike other Calibre workflows, such as creating books from raw HTML, <code>ebook-convert</code> adds navigation to each HTML page in the book created by a recipe. This navigation is redundant in an EPUB, but apparently it would require a source code change to get rid of it.</p>
</li>
<li>
<p>It does something complicated to give each book its style information. There are some odd presentation glitches in the samples as a result of Calibre&#8217;s algorithms. This requires more investigation.</p>
</li>
<li>
<p>It doesn&#8217;t find local links between parts of a book (ie links from one post to another which occur a lot in my work and in Tony&#8217;s course), but I have coded around that in the Scholarly HTML recipes.</p>
</li>
</ul>
</li>
</ul>
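<p>The workaround for local links mentioned in the list above can be sketched roughly as follows. This is not the code from the Scholarly HTML recipes, just a minimal Python illustration of the idea: given a mapping from each post&#8217;s web address to the file it ends up in inside the book, links between posts are rewritten to point at the in-book file. The file names, the mapping and the function name are all hypothetical, and only double-quoted <code>href</code> attributes are handled.</p>

```python
import re

HREF = re.compile(r'href="([^"]*)"')


def localise_links(post_html, book_files):
    """Point links at other posts in the same book to their in-book file names."""

    def swap(match):
        url = match.group(1)
        target, _, fragment = url.partition("#")
        # Keys in book_files are post URLs without a trailing slash.
        local = book_files.get(target.rstrip("/"))
        if local is None:
            return match.group(0)  # not part of the book: leave untouched
        return 'href="%s"' % (local + ("#" + fragment if fragment else ""))

    return HREF.sub(swap, post_html)


# Hypothetical mapping from post URL to the file used inside the book.
book_files = {
    "http://example.org/2011/05/first-post": "post_0.html",
    "http://example.org/2011/05/second-post": "post_1.html",
}
html = (
    '<p>See <a href="http://example.org/2011/05/second-post/#results">the results</a>'
    ' and <a href="http://elsewhere.example.com/">this site</a>.</p>'
)
print(localise_links(html, book_files))
```

<p>Links to anything outside the book are left pointing at the web, so the reader can still follow them when online.</p>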
<p>It will be up to Theo Andrew, the project manager, whether any of these next steps or issues get any attention during the rest of this project.</p>
<p class="center">Copyright <span><span>Peter Sefton</span></span>, 2011-05-25. Licensed under <span>Creative Commons Attribution-Share Alike 2.5 Australia</span>. &lt;http://creativecommons.org/licenses/by-sa/2.5/au/&gt;</p>
<p class="center"><span class="Default_20_Paragraph_20_Font"><span><span class="T1"><a><span /></a><img alt="graphics3" class="fr2" height="31" src="http://ptsefton.com/wp-content/uploads/2011/05/m40ca94ba3.png" style="border:0px;vertical-align: top" width="88" /></span></span></span></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>A view from academia on digital humanities research</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/05/23/a-view-from-academia-on-digital-humanities/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-view-from-academia-on-digital-humanities</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/05/23/a-view-from-academia-on-digital-humanities/#comments</comments>
		<pubDate>Mon, 23 May 2011 14:00:21 +0000</pubDate>
		<dc:creator>Theo</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=237</guid>
		<description><![CDATA[This is a guest blog post from Charlotte Hastings which describes an event recently held at the Centre for Research in the Arts, Social Sciences and Humanities (CRASSH) looking at the impact of the digital humanities. Charlotte is a graduate &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/23/a-view-from-academia-on-digital-humanities/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>This is a guest blog post from <a href="http://edinburgh.academia.edu/CharlotteHastings">Charlotte Hastings</a> which describes an event recently held at the Centre for Research in the Arts, Social Sciences and Humanities (CRASSH) looking at the impact of the digital humanities. Charlotte is a graduate student from the <a href="http://www.ed.ac.uk/schools-departments/education">Moray House School of Education</a> at the University of Edinburgh who has been researching gender and the development of education policy in colonial Nigeria.</em></p>
<p>I’m interested in digital publishing following a focus  group organised by the #jiscPUB project into attitudes to ebooks amongst  researchers. I’m really just starting to find out about digital  publishing. As a way to find out more, and to report back to the project  team on current initiatives and thinking I attended the recent Centre  for Research in the Arts, Social Sciences and Humanities (CRASSH)  seminar on digital humanities for early career researchers (ECRs) &amp;  postgrads: <a href="http://www.crassh.cam.ac.uk/events/1610/">The future might be digital</a>.</p>
<p><a href="http://jiscpub.blogs.edina.ac.uk/files/2011/05/main_1610.jpg"><img class="aligncenter size-full wp-image-248" src="http://jiscpub.blogs.edina.ac.uk/files/2011/05/main_1610.jpg" alt="" width="200" height="283" /></a></p>
<p>A full room heard the varied programme, with a wide range of research interests represented, from Music to Law. Digital publishing clearly interests a lot of people. Following a keynote by Prof Claire Warwick, of the digital humanities team at UCL, the day was split in two between professionals and academics, and postgraduates and ECRs sharing their experiences of working on digital projects. The full programme is <a href="http://www.crassh.cam.ac.uk/events/1610/programme/">here</a>.</p>
<h2>A blended future</h2>
<p>Prof Warwick&#8217;s keynote was upbeat and inspiring. She emphasised the opportunities available for academics able to work across fields, and demonstrated the success of her department in achieving this through  projects such as <a href="http://www.qrator.org/">http://www.qrator.org/</a> (a project which uses iPads to enable museum visitors to interact with museum objects and each other).  However, rather than suggesting digital formats would replace hard copy, she suggested a future filled with both.  To support this reading of the different ways people experience reading, she tantalised the audience with evidence from soon-to-be published  research into different brain imaging results when reading electronic and printed texts.</p>
<h2>Demand driving supply</h2>
<p>Less positive (or perhaps representing the cold hard publishing bottom line?) was the representative from Cambridge University Press, Richard Fisher, who argued that the growth of humanities research going on in the UK means there is too much to publish. He suggested publishers could only react to the demand of their customers. Not enough academic e-books available?  That&#8217;s our fault, people! I find the price of ebooks off-putting (rather than the devices themselves). I’m also tired of lugging books up and down the country. As a result I’m hoping prices drop and I can access more electronic resources on the move.</p>
<h2>Embracing the digital</h2>
<p>In contrast to the view of CUP as a major publisher, Dr Jane Winters, head of publications at the <a href="http://www.history.ac.uk/">Institute of Historical Research</a>, drew on research conducted by the IHR into digital publishing in academia to emphasise the importance of taking every opportunity to use digital resources. She stressed, for example, the importance of citing digital tools rather than their paper equivalents, a radical thought to many of us in the room. Dr Winters also emphasised that graduates shouldn’t worry that digital publication of their thesis by their university will harm their later publication prospects: subsequent publishing in academic journals or as printed monographs is not affected.</p>
<h2>Digital projects to note</h2>
<p>The grad students and ECRs spoke about their specific experience on digital projects. The projects outlined were really different. For example, Dr Alexi Baker and Katy Barrett described their work on the <a href="http://www.nmm.ac.uk/blogs/longitude/">Board of Longitude Project</a>. Their work was part of a larger project supported by AHRC grants and the Maritime Museum. In contrast, Marie Leger-St-Jean set up the <a href="http://www.english.cam.ac.uk/pop/">Price One Penny</a> site independently (although it&#8217;s now hosted by Cambridge U) to catalogue early Victorian penny fiction. It&#8217;s an impressive achievement, representing a genuine solution to the problem of disparate sources in her area, and it is now adding donations and recommendations from others as the site becomes better known.</p>
<h2>The rise of the academic blog</h2>
<p>Katy Barrett described the contrasting challenges of the project blog (closely supervised by museum staff) and the freedom to write in her own personal academic blog, concerning the issues raised by her research.  Whereas the project blog was closely controlled by museum staff in order to fit museum priorities, her personal blog could reflect more accurately the shape of her project. However, those bloggers present did raise the importance of caution and brevity in reporting yet-to-be-published research.</p>
<p>The plenary discussions and informal networking sessions led on from these presentations. The wide range of interests in the room meant that there was a real enthusiasm for the subject.  I came away inspired to think again about the use of an academic blog as a way to shape an academic web identity. Prof Warwick spoke of their use by interview committees in evaluating the work of researchers. It was also viewed as a good way to develop writing skills and share your research with an interested community (however small!). Where to start?  Just begin, I was told.   WordPress came recommended as a good tool to use. I&#8217;d read other academic blogs in the past and found them useful. In particular, I’ve followed academics writing about fieldwork in my area, and reflecting on designing and running courses.  I had not thought about them as an ECR or postgraduate tool: but will do so now.</p>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/05/23/a-view-from-academia-on-digital-humanities/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Anthologize: a WordPress based collection tool</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/05/11/anthologize-a-wordpress-based-collection-tool/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=anthologize-a-wordpress-based-collection-tool</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/05/11/anthologize-a-wordpress-based-collection-tool/#comments</comments>
		<pubDate>Wed, 11 May 2011 08:58:52 +0000</pubDate>
		<dc:creator>Peter Sefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=232</guid>
		<description><![CDATA[In this post I&#8217;ll look at Anthologize. Anthologize lets you write or import content into a WordPress instance, organise the &#8216;parts&#8217; of your &#8216;project&#8217; and publish to PDF or EPUB, HTML or into TEI XML format. This is what I &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/11/anthologize-a-wordpress-based-collection-tool/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc"></div>
<div>
<p>In this post I&#8217;ll look at <a href="http://anthologize.org/"><span>Anthologize</span></a>. Anthologize lets you write or import content into a WordPress instance, organise the &#8216;parts&#8217; of your &#8216;project&#8217; and publish to PDF, EPUB, HTML or TEI XML format. This is what I referred to in my last post <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/10/wordpress/"><span>about WordPress</span></a> as an aggregation platform.</p>
<h1><a id="id2"><span> </span></a>Anthologize background and use-cases</h1>
<p>Anthologize was created in an interesting way. It is the (unfinished as yet) outcome of a one-week workshop conducted at the Centre for History and New Media <span class="spCh spChx2013">–</span> the same group that brought us Zotero and Omeka, which is one good reason to take it seriously. They produce very high quality software.</p>
<blockquote class="bq"><p><span class="Strong_20_Emphasis"><span style="background-color: transparent;color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T2">Anthologize</span></span></span><span style="color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T3"> is a project of </span></span><a href="http://www.oneweekonetool.org/"><span style="background-color: transparent;color: #25589a;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal;text-decoration: underline"><span class="T4">One Week | One Tool</span></span></a><span style="color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T3"> a project of the </span></span><a href="http://chnm.gmu.edu/"><span style="background-color: transparent;color: #25589a;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal;text-decoration: underline"><span class="T4">Center for History and New Media</span></span></a><span style="color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T3">, </span></span><a href="http://www.gmu.edu/"><span style="background-color: transparent;color: #25589a;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal;text-decoration: underline"><span class="T4">George Mason University</span></span></a><span style="color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T3">. Funding provided by the National Endowment for the Humanities. <span class="spCh spChxa9">©</span> 2010, Center for History and New Media. 
For more information, contact </span></span><span class="Strong_20_Emphasis"><span style="background-color: transparent;color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T2">infoATanthologizeDOTorg</span></span></span><span style="color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T3">. Follow </span></span><a href="http://www.twitter.com/anthologize"><span style="background-color: transparent;color: #25589a;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal;text-decoration: underline"><span class="T4">@anthologize</span></span></a><span style="color: #333333;font-size: 11pt;font-style: normal;font-variant: normal;font-weight: normal;letter-spacing: normal"><span class="T3">.</span></span><span style="font-size: 11pt"><span class="T5"> </span></span></p></blockquote>
<p class="P1">Anthologize is a WordPress plugin that adds import and organisation features to WordPress. You can author posts and pages as normal, or you can import anything with an RSS/Atom feed. It doesn&#8217;t seem to be possible to publish the imported documents for others to view, but you can edit them locally. This could be useful <span class="spCh spChx2013">–</span> but introduces a whole lot of management issues around provenance and version control. When you import a post from somewhere else the images stay on the other site, so you have a partial copy of the work with references back to a different site. I can see some potential problems with that if other sites go offline or change.</p>
<p class="P1">Let&#8217;s remind ourselves about the use-cases in <a href="http://jiscpub.blogs.edina.ac.uk/2011/03/03/workpackage-3/"><span>workpackage 3</span></a>:</p>
<blockquote class="bq"><p>The three main use cases identified in the current plan, and a fourth proposed one: [numbering added for this post]</p>
<ol class="li-lower-alpha">
<li>Postgrad serializing PhD (or conference paper etc) for mobile devices</li>
<li>Retiring academic publishing their <span class="spCh spChx2018">‘</span>best-of<span class="spCh spChx2019">’</span> research (books)</li>
<li>Present final report as epub</li>
<li>Publish course materials as an eBook (Proposed extra use-case proposed by Sefton)</li>
</ol>
<p><a href="http://jiscpub.blogs.edina.ac.uk/2011/03/03/workpackage-3/"><span>http://jiscpub.blogs.edina.ac.uk/2011/03/03/workpackage-3/</span></a></p></blockquote>
<p class="P1">Many documents like (a) theses or (c) reports are likely to be written as monolithic documents in the first place, so it would be a bit strange to write, say, a report in Word, or LaTeX or asciidoc (which is how I think Liza Daly will go about writing the landscape paper for this project), export that as a bunch of WordPress posts for dissemination, then reprocess back into an Anthologize project, and then to EPUB. There&#8217;s much more to go wrong with that, and more information to be lost, than going straight from the source document to EPUB. It is conceivable that this would be a good tool for thesis by publication, where the publications were available as HTML that could be fed or pasted into WordPress.</p>
<p class="P1">I do see some potential with (d) courseware here <span class="spCh spChx2013">–</span> it seems to me that it might make sense to author course materials in a blog-post like way covering topics one by one. I have put some feelers out for someone who might like to test publishing course materials, without spending too much of this project&#8217;s time as this is not one of the core use cases. If anyone wants to try this or can point me to some suitable open materials somewhere with categories and feeds I can use then I will give it a go.</p>
<p class="P1">There is also some potential with (c), project reports, particularly if anyone takes up the <a href="http://jiscpress.org/"><span>JiscPress</span></a> way of doing things and creates their project outputs directly in WordPress+digress.it. It would also be ideal for compiling stuff that happens on the project blog as a supporting Appendix. So, an EPUB that gathers together, say all the blog posts I have made on WorkPackage 3 or the whole of the jiscPUB blog might make sense. These could be distributed to JISC and stakeholders as EPUB documents to read on the train, or deposited in a repository.</p>
<p class="P1">The retiring academic (b) (or any academic really) might want to make use of Anthologize too <span class="spCh spChx2013">–</span> particularly if they&#8217;ve been publishing online. If not they could paste their works into WordPress as posts, and deal with the HTML conversion issues inherent in that, or try to post from Word to WordPress. The test project I chose was to convert the blog posts I have done for jiscPUB into an EPUB book. That&#8217;s use case (c) more or less.</p>
<p class="P1">&nbsp;</p>
<h1><a id="id3"><span> </span></a>How did the experiment go?</h1>
<p>I have documented the basic process of creating an EPUB using Anthologize below, with lots of screenshots, but here is a summary of the outcomes.</p>
<p>Some things went really well.</p>
<ul class="lib">
<li>Using the control panel at my web host I was able to set up a new WordPress website on my domain, add the Anthologize plugin and make my first EPUB in well under an hour. (But as usual, it takes a lot longer to back-track, investigate, try different options, and read the Google group to see if bugs have been reported and so on.)</li>
<li>The application is easy to install and easy to use <span class="spCh spChx2013">–</span> with some issues I note below.</li>
<li>Importing a feed just works if you search to find out how to do it on a standard WordPress host (although I think there might be issues trying to get large amounts of content if the source does not include everything in the feed).</li>
<li>Creating parts and dragging in content is simple.</li>
<li>Anthologize <em>looks</em> good.</li>
</ul>
<p>The good looks and simple interface are deceptive: lots of functionality I was expecting to be there just wasn&#8217;t <span class="spCh spChx2013">–</span> yet. I have been in contact with the developers and noted my biggest concerns, but here&#8217;s a list of the major issues I see with the product at this stage of its development:</p>
<ul class="lib">
<li>There does not seem to be a way to publish the project (or the imported docs) directly to the web <span class="spCh spChx2013">–</span> rather than export it. Seems like an obvious win to add that. I can see that being really useful with Digress.it for one thing. The other big win there would be if the Table of Contents could have some semantics embedded in it so it could act like an ORE resource map &#8211; meaning that machines would be able to interpret the content. (I will come back to this idea soon with a demo of using Calibre to make an EPUB)</li>
<li>There are no TOC entries for the posts within a &#8216;part&#8217;; that is, if you pull in a lot of WordPress posts, they don&#8217;t get individual entries in the EPUB ToC.</li>
<li>Links, even internal ones like the table of contents links on my posts, all point back to the original post <span class="spCh spChx2013">–</span> this makes packaging stuff up much less useful <span class="spCh spChx2013">–</span> you&#8217;d need to be online, and you lose the context of an intra-linked resource. <a href="http://anthologize.org/trac/ticket/106"><span>This is a known problem</span></a>, and the developers say they are going to fix it.</li>
<li>Potentially a problem is the way Anthologize EPUB export puts all the HTML content for the whole project into one HTML file <span class="spCh spChx2013">–</span> I gather from poking around with Calibre etc that many book readers need their content chunked into multiple files.</li>
<li>There&#8217;s a wizard for exporting your EPUB, and you can enter some metadata and choose some options <span class="spCh spChx2013">–</span> all of which is immediately forgotten by the application, so if you do it again, you have to re-enter all the information.</li>
<li>Epubcheck complains about the test book I made:
<ul class="lib">
<li>It says the mimetype (a simple file that MUST be present in every EPUB) is wrong <span class="spCh spChx2013">–</span> it looks OK to me.</li>
<li>It complains about the XHTML containing stuff from the TEI namespace and a few other things.</li>
</ul>
</li>
<li>Finally, PDF export fails on my blog with a timeout error <span class="spCh spChx2013">–</span> but that&#8217;s not an issue for this investigation.</li>
</ul>
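<p>The mimetype complaint can be narrowed down without Epubcheck. The EPUB container rules want the <code>mimetype</code> entry to be the first file in the zip, stored without compression, with no extra field, and containing exactly <code>application/epub+zip</code> with no trailing newline, so a file that looks OK in a text editor can still fail. Here is a minimal sketch of those checks using Python's standard <code>zipfile</code> module. The function name <code>check_mimetype</code> is made up for illustration, and this is not a diagnosis of which condition the Anthologize export actually trips.</p>

```python
import zipfile

# The exact bytes the EPUB container format expects in the 'mimetype' entry.
EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub_file):
    """Return a list of problems with the 'mimetype' entry (empty if none).

    epub_file may be a path or a file-like object. Hypothetical helper,
    not part of Epubcheck or Anthologize.
    """
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        # The entry has to come first in the archive.
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry in the archive")
            return problems
        first = entries[0]
        # It has to be stored, not deflated.
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed; it must be stored")
        # No extra field is allowed on this entry.
        if first.extra:
            problems.append("mimetype entry has an extra field")
        # The content must match byte for byte (no newline, no BOM).
        if archive.read(first) != EPUB_MIMETYPE:
            problems.append("mimetype content is not exactly application/epub+zip")
    return problems
```

<p>Calling <code>check_mimetype("book.epub")</code> on an exported file (the file name is a placeholder) returns an empty list when all four conditions hold, and otherwise a short description of each one that fails.</p>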
<h1><a id="id5"><span> </span></a>Summary</h1>
<p>For the use case of bundling together a bunch of blog posts (or anything that has a feed) into a curated whole Anthologize is a promising application, but unless your needs are very simple it&#8217;s probably not quite ready for production use. I spent a bit of time looking at it though, as it shows great promise and comes from a good stable.</p>
<p>Here&#8217;s the result I got importing the first handful of posts from my work on this project.</p>
<p class="Illustration" style="width: 643px"><span><a><span> </span></a><img class="fr4" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/05/39b75480_643x345.jpeg" alt="graphics8" width="643" height="345" /></span>Illustration 1: The test book in Adobe Digital Editions &#8211; note some encoding problems bottom right and the lack of depth in the table of contents. There are several posts but no way to navigate to them. Also, clicking on those table of contents links takes you back to the jiscPUB blog, not to the heading.</p>
<h1><a id="id6"><span> </span></a>Walk through</h1>
<p>&nbsp;</p>
<p class="Illustration" style="width: 471px"><span><a><span> </span></a><img class="fr4" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/05/5eb1e9cf.png" alt="graphics1" width="471" height="252" /></span>Illustration 2: Anthologize uses &#8216;projects&#8217;. These are aggregated resources, in many cases they will be books but project seems like a nice media-neutral term.</p>
<p>&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a><span> </span></a><img class="fr4" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/05/m45bc0c57_643x257.jpeg" alt="graphics2" width="643" height="257" /></span>Illustration 3: A new project in a fresh WordPress install <span class="spCh spChx2013">–</span> only two things can be added to it until you write or import some content.</p>
<p>&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a><span> </span></a><img class="fr5" style="border: 0px;vertical-align: baseline;margin-bottom: 0.01171875px" src="http://ptsefton.com/wp-content/uploads/2011/05/m4c92efc2_643x192.jpeg" alt="graphics3" width="643" height="192" /></span>Illustration 4: Importing the feed for workpackage 3 in the jiscPUB project. http://jiscpub.blogs.edina.ac.uk/category/workpackage-3/feed/atom/</p>
<p>&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a><span> </span></a><img class="fr4" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/05/m6c4a7bd6_643x301.jpeg" alt="graphics4" width="643" height="301" /></span>Illustration 5: You can select which things to keep from the feed. Ordering is done later. Remember that imported documents are copies, so there is potential for confusion if you edit them in Anthologize.</p>
<p class="center">&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a><span> </span></a><img class="fr4" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/05/48fc7a8b_643x472.jpeg" alt="graphics5" width="643" height="472" /></span>Illustration 6: Exporting content is via a wizard, easy to use but frustrating because it asks some of the same questions every time you export.</p>
<p class="center">&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a><span> </span></a><img class="fr4" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/05/m38722fbe_643x480.jpeg" alt="graphics6" width="643" height="480" /></span>Illustration 7: Having to retype the export information is a real problem as you can only export one format at a time. Exported material is not stored in the WordPress site, either, it is downloaded, so there is no audit trail of versions.</p>
<p class="center">&nbsp;</p>
<p class="center">Copyright <span><span>Peter Sefton</span></span>, 2011-05-04. Licensed under <span>Creative Commons Attribution-Share Alike 2.5 Australia</span>. &lt;http://creativecommons.org/licenses/by-sa/2.5/au/&gt;</p>
<p class="center"><span class="Default_20_Paragraph_20_Font"><span><span class="T1"><a><span> </span></a><img class="fr6" style="border: 0px;vertical-align: top" src="http://ptsefton.com/wp-content/uploads/2011/05/m40ca94ba2.png" alt="" width="88" height="31" /></span></span></span></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project.</p>
<p class="center">&nbsp;</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/05/11/anthologize-a-wordpress-based-collection-tool/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>WordPress</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/05/10/wordpress/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=wordpress</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/05/10/wordpress/#comments</comments>
		<pubDate>Tue, 10 May 2011 08:41:14 +0000</pubDate>
		<dc:creator>Peter Sefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=227</guid>
		<description><![CDATA[Introduction So far in the jiscPUB project I have been looking at word processing applications and EPUB, as well as how repositories and other web applications might support EPUB document production. One of the tasks in workpackage 3 is to &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/10/wordpress/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc"></div>
<div>
<h1><a id="id2"><span> </span></a>Introduction</h1>
<p>So far in the jiscPUB project I have been looking at word processing applications and EPUB, as well as how repositories and other web applications might support EPUB document production. One of the tasks in <a href="http://jiscpub.blogs.edina.ac.uk/2011/03/03/workpackage-3/"><span>workpackage 3</span></a> is to look at WordPress as an example of an online tool that&#8217;s being used quite a bit in academia for both writing and publishing.</p>
<blockquote class="bq"><p>The three main use cases identified in the current plan, and a fourth proposed one: [numbering added for this post]</p>
<ol class="li-lower-alpha">
<li>Postgrad serializing PhD (or conference paper etc) for mobile devices</li>
<li>Retiring academic publishing their <span class="spCh spChx2018">‘</span>best-of<span class="spCh spChx2019">’</span> research (books)</li>
<li>Present final report as epub</li>
<li>Publish course materials as an eBook (Proposed extra use-case proposed by Sefton)</li>
</ol>
</blockquote>
<ul class="lip">
<li><a href="http://jiscpub.blogs.edina.ac.uk/2011/03/03/workpackage-3/"><span>http://jiscpub.blogs.edina.ac.uk/2011/03/03/workpackage-3/</span></a></li>
</ul>
<p>The next few posts will explore web-based authoring and publishing with a focus on WordPress, and how they relate to packaging content as electronic books.</p>
<p>WordPress can be used in a number of different ways. For this project I am thinking of it as:</p>
<ul class="lib">
<li>A publishing platform.</li>
<li>A collaboration platform.</li>
<li>A content aggregation platform.</li>
<li>An authoring environment where people might write academic content. (I put this last, because I think it&#8217;s the most controversial).</li>
</ul>
<p>All of these overlap, and the same installation of WP might be doing all or none, as might other content management systems being used in academia.</p>
<p>In future posts I&#8217;m going to look at building ebooks via aggregation, using the <a href="http://anthologize.org/"><span>Anthologize</span></a> plugin, look at an alternative way of building EPUB books from lists of WordPress posts using <a href="http://calibre-ebook.com/"><span>Calibre</span></a>, and take a look at <a href="http://blogs.plos.org/mfenner/2011/02/01/epub-wordpress-plugin-released-today/"><span>Martin Fenner&#8217;s EPUB plugin for WordPress</span></a>. In this post I will look at some of the issues around WordPress as used in a couple of projects related to this one, looking particularly at JISC-funded or JISC-friendly work. This is not a survey of how WordPress is being used in academia everywhere <span class="spCh spChx2013">–</span> there&#8217;s no time for that. Please use the comments below if I&#8217;ve missed something that&#8217;s important to this project.</p>
<p>At the moment, I am thinking that the most compelling matches between the use cases for this project and what is being done with WordPress are these:</p>
<ul class="lib">
<li><strong>b: Retiring academic publishing their <span class="spCh spChx2018">‘</span>best-of<span class="spCh spChx2019">’</span> research (books)</strong>: not so much books but using a tool like Anthologize to draw together papers or other documents.</li>
<li><strong>d: Publish course materials as an eBook</strong> (Proposed extra use-case proposed by Sefton): I see great potential for tools like Anthologize as a way of compiling reading packages from web resources and packaging them to take-away on mobile devices, likewise for conference proceedings and programs and other aggregated documents.</li>
</ul>
<p>And possibly, where people are using JiscPress, this use case: <strong>c: Present final report as epub</strong>.</p>
<h1><a id="id3"><span> </span></a>Publishing platform</h1>
<p>A great example of using a blogging platform for scholarship is the <a href="http://knowledgeblog.org/"><span>KnowledgeBlog project</span></a>:</p>
<blockquote class="bq"><p>We are investigating a new, light-weight way of publishing scientific, academic and technical knowledge on the web. Currently, Knowledge Blog is being funded by a <a href="http://www.jisc.ac.uk/"><span>JISC</span></a> <a href="http://knowledgeblog.org/2010/08/02/a-new-grant-for-knowledge-blog/"><span>grant</span></a>.</p></blockquote>
<p>And the sites it has under its wing.</p>
<ul class="lib">
<li><a href="http://ontogenesis.knowledgeblog.org/"><span>Ontogenesis</span></a></li>
<li><a href="http://process.knowledgeblog.org/"><span>Process</span></a></li>
<li><a href="http://taverna.knowledgeblog.org/"><span>Taverna</span></a></li>
</ul>
<p>KnowledgeBlog uses the WordPress platform to publish articles and for article review, and serves as a live example of a new mode of scholarship. It&#8217;s a publisher, but not as we know it.</p>
<p>A new entrant in the WordPress-backed publishing space (and in the authoring space) is <a href="http://annotum.wordpress.com/"><span>Annotum</span></a>, which has not released any code but has very lofty ambitions. I&#8217;ll come back to Annotum below.</p>
<h1><a id="id4"></a>An aggregation platform <span class="spCh spChx2013">–</span> bringing together content from elsewhere.</h1>
<p>I&#8217;ll cover this in my next post, looking at Anthologize, which is a promising but immature tool for pulling together stuff from multiple sources and/or authoring it locally, then grouping it with a customized table of contents and publishing to a variety of media.</p>
<h1><a id="id5"></a>An authoring platform</h1>
<p>It has to be said that WordPress as an editor gets some bad press from time to time. Phillip Lord at KnowledgeBlog <a href="http://process.knowledgeblog.org/3"><span>advises against using it for authoring</span></a>.</p>
<p>WordPress is not an authoring environment</p>
<blockquote class="bq"><p><a id="authoring"></a><a href="http://www.knowledgeblog.org/"><span>http://www.knowledgeblog.org</span></a> is hosted using WordPress. It<span class="spCh spChx2019">’</span>s a very good tool in many ways, but it was intended for and is most suited for use as a publishing tool; most blogs are written by single authors who wish to place their thoughts on the web either for authors or themselves to be able to read. It is not an authoring tool, however. It does not provide a particularly rich environment for editing, and particularly not for collaborative editing. Most people get <a href="http://ontogenesis.knowledgeblog.org/647"><span>tired</span></a> of the wordpress authoring tool very quickly, as it<span class="spCh spChx2019">’</span>s just not suited for serious scientific authoring. Nor does it provide good facilities for collaborative editing; normally, only one person can see a draft post, so you cannot pass this around between several authors.</p>
<p><a href="http://process.knowledgeblog.org/3"><span>http://process.knowledgeblog.org/3</span></a></p></blockquote>
<p>The KnowledgeBlog site encourages people to use their current authoring tools and treat the KnowledgeBlog WordPress platform as a publishing and review system.</p>
<p>Others are more positive about WordPress as an editor. <a href="http://blogs.plos.org/mfenner/"><span>Martin Fenner</span></a>, for example, is a tireless promoter of the practice. And the Digress.it help recommends using WordPress to create content from scratch, the opposite of the advice coming from KnowledgeBlog:</p>
<blockquote class="bq"><p>We recommend using the WordPress editor directly for a number of reasons:</p>
<ul class="lib">
<li>Multiple authors can easily collaborate on a single document;</li>
<li>A complete revision history of the document is maintained with the ability to roll-back to earlier versions;</li>
<li>This method produces a web-ready document, native to WordPress, and avoids the two-stage process of <span class="spCh spChx2018">‘</span>re-publishing<span class="spCh spChx2019">’</span> on your Digress.it site; and</li>
<li>You can easily embed video and other objects.</li>
</ul>
</blockquote>
<p>And then there&#8217;s Annotum. The site <a href="http://annotum.wordpress.com/about/"><span>says</span></a>:</p>
<blockquote class="bq"><p>Annotum will build upon the WordPress platform as a foundation, filling in the gaps by providing the following additional features:</p>
<ul class="lib">
<li>Rich, web-based authoring and editing:
<ul class="lib">
<li><span class="spCh spChx201c">“</span>What you see is what you get<span class="spCh spChx201d">”</span> (WYSIWYG) authoring with rich toolset (equations, figures, tables, citations and references)</li>
<li>coauthoring, comments, version tracking, and revision comparisons</li>
<li>Strict conformance to a subset of the NLM  <a href="http://jats.nlm.nih.gov/publishing"><span>journal article publishing tag set</span></a></li>
</ul>
</li>
</ul>
</blockquote>
<p>And a long list of other features. There is no code to show yet, though.</p>
<h1><a id="id6"></a>Collaboration platform</h1>
<p>Others are seeing WordPress as a place for collaborative authoring and editing. Annotum promises this on a grand scale. For those who would like to get started, Martin Fenner listed some resources <a href="http://blogs.plos.org/mfenner/2010/12/05/blogging-beyond-the-pdf/"><span>late last year</span></a>:</p>
<blockquote class="bq"><p>The <a href="http://wordpress.org/extend/plugins/co-authors-plus/"><span>Co-Authors Plus</span></a> Plugin enables multiple authors per article. Each author can be linked to an author page for displaying biographical info. WordPress could be extended to include additional info such as institution or past publications. Linking the WordPress user account to the unique author identifier <a href="http://www.orcid.org/"><span>ORCID</span></a>, and describing the role of the author in the paper (e.g. conceived and designed the experiments or analyzed the data) would be particularly interesting. Plugins such as <a href="http://www.editflow.org/"><span>Edit Flow</span></a> can extend the workflow by adding custom status messages (e.g. resubmission), reviewer comments, and email notifications.</p>
<p><a href="http://blogs.plos.org/mfenner/2010/12/05/blogging-beyond-the-pdf/"><span>http://blogs.plos.org/mfenner/2010/12/05/blogging-beyond-the-pdf/</span></a><span style="font-size: 11pt"><span class="T37"> </span></span></p></blockquote>
<p>Post-publication collaboration is handled by a WordPress tool that&#8217;s been a hit in the UK, and with JISC. <a href="http://digress.it/"><span>Digress.it</span></a> is a tool for public annotation and discussion of long-form documents. The JISC incarnation is at <a href="http://jiscpress.org/"><span>jiscpress.org</span></a>. Digress.it is related to Commentpress. (They&#8217;re different things, although sometimes confused with each other, at least by me. See them <a href="http://cowriting.trincoll.edu/alternative/"><span>compared here</span></a>.)</p>
<p>For a JiscPress example see <a href="http://mobilereview.jiscpress.org/2010/11/1-iii-principal-findings/"><span>this document, which has a number of comments</span></a>.</p>
<h1><a id="id7"></a>Issues</h1>
<p>Some issues I have observed with WordPress in the past include the problems with its authoring environment, covered above, but also a number of other considerations.</p>
<p>There is the WordPress version of Microsoft&#8217;s <span class="spCh spChx201c">“</span><a href="http://en.wikipedia.org/wiki/DLL_hell"><span>DLL hell</span></a><span class="spCh spChx201d">”</span> &#8211; <span class="spCh spChx201c">“</span>Plugin hell<span class="spCh spChx201d">”</span> &#8211; many WordPress plugins and/or themes interact with each other in unpredictable ways. I found this out first hand, trying to show off some work my team at USQ had done on an annotation system. It worked (with bugs) in a plain WordPress site, but failed completely in Martin Fenner&#8217;s demo site where there are many other plugins installed. I never got to the bottom of that. Plugins also go out of sync with WordPress as it evolves, so a site with lots of plugins can be hard to maintain; this is also the case with systems like Drupal, which have their own enthusiastic following.</p>
<p>Some of the above systems require the content management system to be used in very particular ways <span class="spCh spChx2013">–</span> for example Digress.it treats each document as a new WordPress site and asks you to upload posts in a particular order so that the Table of Contents for the site looks right. There are two issues with this kind of approach. I&#8217;m not saying that people are not already aware of these issues, but noting that they are there:</p>
<ul class="lib">
<li>There&#8217;s sometimes a fair bit of overhead involved in setting things up just so. Sometimes, it would make sense to automate some of the processes. Other times maybe a re-think to reduce complexity might be in order.</li>
<li>There is a risk of creating a new form of the proprietary lock-in we had up until recently (and arguably we still have) with document formats like Microsoft&#8217;s .doc. The documents we create in some of these systems may end up being unusable in other systems. If you author a long document in Digress.it and depend on a particular configuration of WP, on having posts in a certain order, and so on for the document&#8217;s integrity, then it is essential to consider an exit strategy and an archiving strategy (more on that soon <span class="spCh spChx2013">–</span> an EPUB export might be just the ticket).
<p>There are similar issues/risks with stuff like WordPress shortcodes such as <a href="http://knowledgeblog.org/kcite-plugin"><span>KCite</span></a> from KnowledgeBlog. It&#8217;s a great tool for authors, allowing them to cite things in a rational way:</p>
<blockquote class="bq"><p>DOI Example <span class="spCh spChx2013">–</span> [cite source=<span class="spCh spChx2019">’</span>doi<span class="spCh spChx2019">’</span>]10.1021/jf904082b[/cite]</p>
<p>PMID example <span class="spCh spChx2013">–</span> [cite source=<span class="spCh spChx2019">’</span>pubmed<span class="spCh spChx2019">’</span>]17237047[/cite]</p></blockquote>
<p>But it&#8217;s proprietary to a particular processing environment. If one wants to be able to re-use these documents or archive them then it is important to consider which version of the documents in WP to keep. (I&#8217;d argue that in this case best practice would be to transform the above to an RDFa representation in HTML and treat the HTML version as the version of record <span class="spCh spChx2013">–</span> more on this later in the project).</li>
</ul>
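<p>To make the RDFa suggestion concrete, here is a rough sketch of turning the two shortcode examples above into plain HTML links that carry a citation relationship. Everything specific in it is an assumption for illustration: the function names are hypothetical, the resolver URLs are simply the usual public ones for DOIs and PubMed IDs, the choice of the CiTO <code>cito:cites</code> property is mine rather than anything KCite does, and the <code>cito</code> prefix is assumed to be declared on an enclosing element.</p>

```python
import re
import xml.etree.ElementTree as ET

# Quote characters that may surround the source name: straight quotes plus the
# curly ones WordPress tends to substitute.
QUOTE = "['\"\u2018\u2019]"

# Matches e.g. [cite source='doi']10.1021/jf904082b[/cite]
CITE = re.compile(
    r"\[cite source=" + QUOTE + r"(doi|pubmed)" + QUOTE + r"\](.*?)\[/cite\]"
)

# Assumed resolver prefixes; swap in whatever your archive prefers.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "pubmed": "http://www.ncbi.nlm.nih.gov/pubmed/",
}


def cite_to_rdfa(match):
    """Build one anchor element for a matched shortcode (hypothetical helper)."""
    source = match.group(1)
    identifier = match.group(2).strip()
    link = ET.Element(
        "a", {"rel": "cito:cites", "href": RESOLVERS[source] + identifier}
    )
    link.text = identifier
    # Serialising via ElementTree takes care of escaping the attribute values.
    return ET.tostring(link, encoding="unicode")


def replace_shortcodes(text):
    """Replace every recognised cite shortcode in text with an RDFa-style link."""
    return CITE.sub(cite_to_rdfa, text)
```

<p>The point of the design is that the output no longer needs KCite, or WordPress, to be understood: the identifier and the fact that it is a citation are both in the HTML itself, which is what makes the HTML a plausible version of record.</p>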
<p>All this adds up to saying that<strong> WordPress + plugins can be fragile</strong> <span class="spCh spChx2013">–</span> the application itself needs to be updated frequently for security reasons, and so does the operating system underneath, and inevitably stuff breaks. The more complex the plugin-set and the further you stray from straight WordPress the worse the risk. Even on simple sites there can be issues. For example one of the WordPress sites I use regularly currently has a bug with remote publishing via Atompub and XMLRPC. One day it was working, and the next all my attempts to post from the tools I use every day, as per the best practice advice from the KnowledgeBlog people, arrived minus the characters <code>&lt;</code> and <code>&gt;</code> in the document source, both of which are obviously essential to the web.</p>
<p>For those interested in learning more about WordPress for scholarship, there&#8217;s a Google Group called <a href="https://groups.google.com/forum/#!forum/wordpress-for-scientists"><span>WordPress for Scientists</span></a> that is worth joining even if you are not a scientist and a <a href="http://blogs.xartrials.org/"><span>test site that Martin Fenner has set up</span></a> for WordPress plugins.</p>
<p class="center">Copyright <span><span>Peter Sefton</span></span>, 2011-05-09. Licensed under <span>Creative Commons Attribution-Share Alike 2.5 Australia</span>. &lt;http://creativecommons.org/licenses/by-sa/2.5/au/&gt;</p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/05/10/wordpress/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>How to add EPUB support to EPrints</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/05/03/how-to-add-epub-support-to-eprints-8/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=how-to-add-epub-support-to-eprints-8</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/05/03/how-to-add-epub-support-to-eprints-8/#comments</comments>
		<pubDate>Tue, 03 May 2011 05:16:36 +0000</pubDate>
		<dc:creator>Peter Sefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Workpackage 3]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=218</guid>
		<description><![CDATA[In a previous post here on the jiscPUB project I said it would be good for the EPrints repository software to support EPUB uploads. I’d love to do something with a repository – I’m thinking that it would be great to &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/05/03/how-to-add-epub-support-to-eprints-8/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc"></div>
<div>
<p>In a <a href="http://jiscpub.blogs.edina.ac.uk/2011/04/11/metadata-in-word-processing-monographs-2/">previous post here on the jiscPUB project</a> I said it would be good for the EPrints repository software to support EPUB uploads.</p>
<blockquote class="bq"><p>I<span class="spCh spChx2019">’</span>d love to do something with a repository <span class="spCh spChx2013">–</span> I<span class="spCh spChx2019">’</span>m thinking that it would be great to deposit theses in EPUB format <span class="spCh spChx2013">–</span> and the repository could provided a web-based reader, along the lines of <a href="http://ibisreader.com/"><span>IbisReader</span></a>, which Liza Daly and company created. I<span class="spCh spChx2019">’</span>m looking at you, Eprints! Eprints already almost supports this, if you upload a zip file it will stash all the parts for you in a single record. All we would need would be something like this<a href="../little%20reader%20my%20colleagues%20at%20USQ%20made"><span> little reader my colleagues at USQ made</span></a>. It would just be a matter of transforming the EPUB TOC into JSON, and loading the JavaScript into an Eprints page.</p></blockquote>
<p>I called Les Carr&#8217;s attention to the post and he responded:</p>
<blockquote class="bq"><p><a href="https://twitter.com/#!/lescarr"><span>lescarr</span></a> <a href="https://twitter.com/#!/ptsefton"><span>@ptsefton</span></a> just tell us what to do and we&#8217;ll do it.</p></blockquote>
<p>OK. Here goes with my specification for how EPrints could add at least basic support for EPUB.</p>
<h1><a id="id2"><span> </span></a>Putting EPUB into EPrints as-is</h1>
<p class="P2">To explore this, I ran the <a href="http://wiki.eprints.org/w/EPrints_Live_CD_Help"><span>EPrints live CD</span></a> (livecd_v3.1-x.iso) under VirtualBox on Windows 7. This worked well once I gave it a decent amount of memory; at 256 MB it didn&#8217;t manage to boot in several hours. (Note that no repositories were harmed in the making of this post <span class="spCh spChx2013">–</span> I did not change the EPrints code at all.)</p>
<p>The EPUB format is a zip file containing some XHTML payload documents, a manifest, and a table of contents. On one level EPrints already supports this, in that there is support for uploading ZIP files. I tested this using Danny Kingsley&#8217;s thesis (as received, with no massaging or adding metadata apart from tweaking the title in Word) <a href="http://dl.dropbox.com/u/24994372/Formatted_PhD_12May09.epub"><span>converted to EPUB</span></a> via the ICE service <a href="http://ec2-50-16-170-243.compute-1.amazonaws.com/api/convert/doc"><span>I have been working on</span></a>.</p>
<p>The procedure:</p>
<ol class="lin">
<li>Generated an EPUB using ICE.</li>
<li>Changed the file extension to .zip.</li>
<li>Uploaded it into EPrints.</li>
</ol>
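<p>For anyone who wants to repeat step 2 without renaming by hand, here is a minimal Python sketch. It is an illustration only, not part of ICE or EPrints; the function names <code>epub_as_zip</code> and <code>list_parts</code> are made up for this example. It copies the EPUB to a file with a .zip extension and lists the parts that a repository would end up storing once the archive is unpacked.</p>

```python
import shutil
import zipfile
from pathlib import Path


def epub_as_zip(epub_path):
    """Copy an EPUB to a sibling file with a .zip extension and return the new path.

    Hypothetical helper for illustration; the original file is left untouched.
    """
    source = Path(epub_path)
    target = source.with_suffix(".zip")
    shutil.copyfile(source, target)
    # An EPUB is a zip container, so the copy should still be a valid archive.
    if not zipfile.is_zipfile(target):
        raise ValueError(f"{source} is not a zip archive")
    return target


def list_parts(zip_path):
    """Return the member names stored in the archive, in archive order."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

<p>Calling <code>list_parts(epub_as_zip("thesis.epub"))</code> on a local file (the file name here is just a placeholder) gives the same list of files that shows up in the EPrints record.</p>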
<p>The result is an EPrints item with many parts. If you click on any of the HTML files that make up the thesis then they work as web pages <span class="spCh spChx2013">–</span> i.e. the table of contents (if you can find it amongst the many files) links to the other pages. But there is no navigation to tie it all together: each HTML page from the EPUB is a standalone fragment, so you have to keep hitting back.</p>
<p>&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a href="http://jiscpub.blogs.edina.ac.uk/files/2011/05/ma53f402_643x4001.jpg"><img class="alignnone size-full wp-image-221" src="http://jiscpub.blogs.edina.ac.uk/files/2011/05/ma53f402_643x4001.jpg" alt="" width="643" height="400" /></a><br />
</span>Illustration 1: The management interface in EPrints showing all the parts of an EPUB file which has been uploaded and saved as a series of parts in a single record.</p>
<p>&nbsp;</p>
<p>At this point I went off on a side trip, and <a href="http://jiscpub.blogs.edina.ac.uk/2011/04/14/introducing-epub2html-adding-a-plain-html-view-to-an-epub/">wrote this little tool <span class="spCh spChx2013">–</span> to add an HTML view to an EPUB file</a>.</p>
<h1><a id="id3"><span> </span></a>Putting enhanced EPUB into EPrints</h1>
<p>Now, let&#8217;s try that again with <a href="http://dl.dropbox.com/u/24994372/Formatted_PhD_12May09.epub"><span>the version</span></a> where I added an HTML index page to the EPUB using the new demo tool, epub2html. I uploaded the file, clicked around semi-randomly until I figured out how to see all the files listed from the zip, and selected index.html as the &#8216;main&#8217; file. From memory I thought the repository would do that for me, but it didn&#8217;t. Anyway, I ended up with this:</p>
<p>&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a href="http://jiscpub.blogs.edina.ac.uk/files/2011/05/5fc3b428_643x332.jpg"><img class="alignnone size-full wp-image-220" src="http://jiscpub.blogs.edina.ac.uk/files/2011/05/5fc3b428_643x332.jpg" alt="" width="643" height="332" /></a><br />
</span>Illustration 2: The details screen that users see &#8211; clicking on the description takes you to the HTML page I picked as the main file.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a href="http://jiscpub.blogs.edina.ac.uk/files/2011/05/m629e91b8_643x417.jpg"><img class="alignnone size-full wp-image-222" src="http://jiscpub.blogs.edina.ac.uk/files/2011/05/m629e91b8_643x417.jpg" alt="" width="643" height="417" /></a><br />
</span>Illustration 3: A rudimentary ebook reader using an inline frame.</p>
<p>If I click on the link starting with Other, there we have it <span class="spCh spChx2013">–</span> more-or-less working navigation within the limits of this demo-quality software. All I had to do was change the extension from .epub to .zip and select the entry page, and I had a working, navigable document.</p>
<p>The initial version of epub2html used the unsupported epubjs as a web-based reader application <span class="spCh spChx2013">–</span> but Liza Daly suggested I use the more up-to-date Monocle.js library instead. I tried that, but I&#8217;m afraid the amount of setup required is too much for the moment, so what you see here is an HTML page with an inline frame for the content.</p>
<h1><a id="id4"><span> </span></a>What does the repository need to do?</h1>
<p>So what does the EPrints team need to do to support EPUB a bit better?</p>
<ul class="lib">
<li>Add EPUB to the list of recognised files.</li>
<li>Upon recognising an EPUB&#8230;
<ul class="lib">
<li>Use a service like epub2html that can generate an HTML view of the EPUB. I wrote mine in Python and EPrints is written in Perl, but I&#8217;m sure that can be sorted out via a re-write or a web service or something<span class="footnote" style="vertical-align: super"><a id="ftn1-text" class="footnote" href="#ftn1"><span>*</span></a></span>.</li>
<li>Allow the user to download the whole EPUB, or choose to use an online viewer. The viewer could be static HTML, frames (not nice), or some kind of JavaScript-based viewer.</li>
<li>Embed some kind of viewer in the EPrints page itself, or at least provide a back-link in the document viewer to the EPrints page.</li>
</ul>
</li>
</ul>
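<p>The first item, recognising an EPUB, can be sketched in a few lines. This is Python rather than Perl, and it is only a hedged illustration of the check, not EPrints code; the name <code>looks_like_epub</code> is invented here. It relies on two things the EPUB container format requires: a <code>mimetype</code> entry containing <code>application/epub+zip</code>, and a <code>META-INF/container.xml</code> file.</p>

```python
import zipfile

EPUB_MIMETYPE = "application/epub+zip"


def looks_like_epub(path):
    """Return True if the file at `path` has the basic shape of an EPUB container.

    Hypothetical helper: it checks the container only and does not validate
    the manifest, the table of contents, or the XHTML payload.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        if "mimetype" not in names or "META-INF/container.xml" not in names:
            return False
        declared = archive.read("mimetype").decode("ascii", errors="replace").strip()
    return declared == EPUB_MIMETYPE
```

<p>A check like this would let the repository treat a deposit as an EPUB whatever its file extension, so the rename to .zip would no longer be needed.</p>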
<p>Does that make sense, Les?</p>
<p class="center">Copyright <span><span>Peter Sefton</span></span>, 2011-04-15. Licensed under <span>Creative Commons Attribution-Share Alike 2.5 Australia</span>. &lt;http://creativecommons.org/licenses/by-sa/2.5/au/&gt;</p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project.</p>
<hr />
<p>&nbsp;</p>
<div style="font-size: .9em"><span class="footnote-defined"><a id="ftn1" href="#ftn1-text"><span>*</span></a> Maybe there&#8217;s a Python interpreter written in Perl?</span></div>
<p>&nbsp;</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/05/03/how-to-add-epub-support-to-eprints-8/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing Epub2Html &#8211; adding a plain HTML view to an EPUB</title>
		<link>http://jiscpub.blogs.edina.ac.uk/2011/04/14/introducing-epub2html-adding-a-plain-html-view-to-an-epub/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=introducing-epub2html-adding-a-plain-html-view-to-an-epub</link>
		<comments>http://jiscpub.blogs.edina.ac.uk/2011/04/14/introducing-epub2html-adding-a-plain-html-view-to-an-epub/#comments</comments>
		<pubDate>Thu, 14 Apr 2011 06:16:43 +0000</pubDate>
		<dc:creator>Peter Sefton</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Workpackage 3]]></category>
		<category><![CDATA[#jiscPUB]]></category>
		<category><![CDATA[WP3]]></category>

		<guid isPermaLink="false">http://jiscpub.blogs.edina.ac.uk/?p=156</guid>
		<description><![CDATA[EPUB ebook files are useful if you have an application to read them, but not everyone does. We have been discussing this in the Scholarly HTML movement; to some of us &#8230; <a href="http://jiscpub.blogs.edina.ac.uk/2011/04/14/introducing-epub2html-adding-a-plain-html-view-to-an-epub/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc">
<ul>
<li><a href="#id2"><span>Background</span></a></li>
<li><a href="#id4"><span>Demo </span></a></li>
<li><a href="#id5"><span>Trying it out / the future</span></a></li>
</ul>
</div>
<div>
<h1><a name="id2"></a>Background</h1>
<p>EPUB ebook files are useful if you have an application to read them, but not everyone does. We have been discussing this in the <a href="http://scholarlyhtml.org/"><span>Scholarly HTML</span></a> movement; to <a href="http://www.teleread.com/ebooks/beyond-the-pdf-%E2%80%A6-is-epub-by-martin-fenner-workshop-materials/"><span>some of us</span></a> EPUB <a href="http://ptsefton.com/2010/08/13/epub-as-a-way-of-packaging-scholarly-resources.htm"><span>looks like a good general purpose packaging format</span></a> for scholarship. Not just for HTML (if you can make it XHTML, that is) but potentially for other stuff that makes up a research object, such as data files or provenance information. One of the big problems, though, is that the format is still not that widely known; what is a researcher to do when they are given a file ending in <code>.epub</code>? That question remains unresolved at the moment, but in this post I will talk about one small step to making EPUB potentially more useful in the general academic community.</p>
<p>This week, I was looking at the potential for EPUB support in repositories, which I will cover in my next post. An EPUB is full of HTML, but it&#8217;s not something that is necessarily straightforward to display on the web. jiscPUB colleague Liza Daly&#8217;s company has a thing called <a href="http://ibisreader.com/"><span>IbisReader</span></a> that serves EPUB over the web, and also worked on <a href="http://bookworm.oreilly.com/"><span>BookWorm</span></a>, parts of which are also <a href="http://code.google.com/p/threepress/"><span>available as open source</span></a>.</p>
<p>What I wanted was a bit different <span class="spCh spChx2013">–</span> I wanted to be able to add something equivalent to a README file to an EPUB, so that people could read the content and web site or repository managers could do something with it. So, I wrote a small tool, intended as a demonstrator only, which:</p>
<ul class="lib">
<li>Generates a plain HTML table of contents.</li>
<li>Adds an index.html page to the root of an EPUB (this is legit, it gets added to the manifest as well, but not the TOC) with a simple frame-based navigation system so if you can open the EPUB zip, you can browse it.</li>
<li>Bundles in a lightweight JavaScript viewer. Initially I tried the <a href="http://demo.adfi.usq.edu.au/paquete/demo/#module01.htm"><span>Paquete system</span></a> from USQ, but it turned out to have a few more issues than I had hoped. For this first release I have used a bit of Liza&#8217;s code from a couple of years ago, <a href="http://blog.threepress.org/2009/02/09/introducing-epubjs/"><span>epubjs</span></a>, with a couple of modifications. Status? Works for me.</li>
</ul>
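<p>To give a feel for the first of those steps, here is a minimal sketch of turning an EPUB table of contents into a plain HTML list. It is not the epub2html code itself. It assumes an EPUB 2 style NCX file (usually called <code>toc.ncx</code>) whose contents have already been read from the zip, it flattens nested entries into a single list, and the function names <code>toc_entries</code> and <code>toc_as_html</code> are made up for the example.</p>

```python
import xml.etree.ElementTree as ET

# Namespace used by NCX table-of-contents files, in ElementTree's {uri}tag form.
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def toc_entries(ncx_xml):
    """Return (label, href) pairs for every navPoint, in document order."""
    root = ET.fromstring(ncx_xml)
    entries = []
    for point in root.iter(NCX_NS + "navPoint"):
        label = point.find(NCX_NS + "navLabel/" + NCX_NS + "text")
        content = point.find(NCX_NS + "content")
        if label is None or content is None:
            continue  # skip malformed entries rather than fail
        entries.append(((label.text or "").strip(), content.get("src", "")))
    return entries


def toc_as_html(entries):
    """Render the pairs as a flat unordered list of links."""
    ul = ET.Element("ul")
    for label, href in entries:
        li = ET.SubElement(ul, "li")
        link = ET.SubElement(li, "a", {"href": href})
        link.text = label
    return ET.tostring(ul, encoding="unicode")
```

<p>Building the list with ElementTree rather than string concatenation means the labels and links are escaped automatically, which matters when chapter titles contain ampersands or angle brackets.</p>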
<h1><a name="id4"></a>Demo</h1>
<p>So here&#8217;s what it looks like in real life, warts and all.</p>
<p>I used the test file I was <a href="http://jiscpub.blogs.edina.ac.uk/2011/04/11/metadata-in-word-processing-monographs-2/"><span>working on earlier in the week</span></a> with embedded metadata.</p>
<p class="Illustration" style="width: 643px"><span><a name="graphics1"></a><img class="fr3" style="border: 0px;vertical-align: top" src="http://jiscpub.blogs.edina.ac.uk/files/2011/04/2f5de0c5_643x5791.jpeg" alt="graphics1" width="643" height="579" /></span>Illustration 1: Test EPUB from the Edinburgh thesis template, with added metadata, shown in Adobe Digital Editions</p>
<p>I ran the new code:</p>
<blockquote class="bq"><p><code>python epub2html.py  Edinburgh-ThesisSingleSided-plus-inline-metadata.epub</code></p></blockquote>
<p>Which made a new file. (It does make <a href="http://code.google.com/p/epubcheck/"><span>epubcheck</span></a> complain, but that&#8217;s mostly to do with HTML attributes it doesn&#8217;t like, not EPUB structural problems.)</p>
<blockquote class="bq"><p>Edinburgh-ThesisSingleSided-plus-inline-metadata-html.epub</p></blockquote>
<p>Now, if I unzip it there is an index.html, and some JavaScript from epubjs. In Firefox that looks like this.</p>
<p class="center">&nbsp;</p>
<p class="Illustration" style="width: 643px"><span><a name="graphics2"></a><img class="fr3" style="border: 0px;vertical-align: top" src="http://jiscpub.blogs.edina.ac.uk/files/2011/04/m7b4d760f_643x3991.jpeg" alt="graphics2" width="643" height="399" /></span>Illustration 2: HTML view of the EPUB being served from the file system, using epubjs for navigation</p>
<p>But, if the JavaScript is not working, then you can still see the content courtesy of the less than ideal inline frame:</p>
<p class="Illustration" style="width: 643px"><span><a name="graphics3"></a><img class="fr3" style="border: 0px;vertical-align: top" src="http://jiscpub.blogs.edina.ac.uk/files/2011/04/m77e0cca1_643x3921.jpeg" alt="graphics3" width="643" height="392" /></span>Illustration 3: Fall-back to plain HTML with no JavaScript: the index.html file has an inline frame for the EPUB content. Not elegant, but it lets the content be seen.</p>
<h1><a name="id5"></a>Trying it out / the future</h1>
<p>If you want to try this out, or help out, you can get the tool from Google Code.</p>
<blockquote class="bq"><p><code>svn co https://integrated-content-environment.googlecode.com/svn/branches/temp-2011/epub2html</code></p></blockquote>
<p>There are lots of things to do, like adding command-line options for output files, extracting the EPUB+HTML for immediate use (after safety checking it), and choosing whether to bundle the JavaScript in the EPUB or link to it via the web. Does anyone want this? Let us know.</p>
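<p>As a rough idea of what the safety check before extraction might look like, here is a hedged Python sketch. It is not part of epub2html, and <code>safe_extract</code> is a hypothetical name. It refuses to unpack an archive if any member path would land outside the target directory, and only then extracts everything.</p>

```python
import zipfile
from pathlib import Path


def safe_extract(epub_path, target_dir):
    """Extract an EPUB into target_dir, rejecting member paths that escape it.

    Hypothetical helper for illustration; it checks paths only and does not
    inspect the content of the files.
    """
    target = Path(target_dir).resolve()
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            destination = (target / name).resolve()
            # Every member must resolve to the target itself or something below it.
            if destination != target and target not in destination.parents:
                raise ValueError(f"unsafe member path: {name}")
        archive.extractall(target)
    return target
```

<p>Python&#8217;s own zipfile module already tidies up suspicious paths when extracting; the explicit check here rejects such an archive outright instead of quietly rewriting the path.</p>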
<p>One of the things I like about Paquete is that it generates # URLS for the different pages you view, making bookmarking chapters possible like this: <a href="http://demo.adfi.usq.edu.au/paquete/demo/#configuration.htm"><span>http://demo.adfi.usq.edu.au/paquete/demo/#configuration.htm</span></a>. I will explore whether this can be added to epubjs or whether it is worth pressing on with Paquete, which does have some more options like navigation buttons and a tree-widget for the table of contents.</p>
<p>Like I said, I did this as part of the notes I was putting together on how repositories might support EPUB and maybe, finally, start serving real web content rather than exclusively PDF. More on that soon.</p>
<p>This approach might also help us add previews to web services so people can see their content in ereader-mode, something I know David Flanders, the JISC manager on this project, is keen on.</p>
<p>And finally, something like this approach might be part of a tool-chain that could help people break up long documents into parts, package them in EPUB, and upload them to services like <a href="http://digress.it/"><span>http://digress.it</span></a>, which want things broken up into parts.</p>
<p class="center">Copyright <span><span>Peter Sefton</span></span>, 2011-04-14. Licensed under <span>Creative Commons Attribution-Share Alike 2.5 Australia</span>. &lt;http://creativecommons.org/licenses/by-sa/2.5/au/&gt;</p>
<p class="center"><span class="Default_20_Paragraph_20_Font"><span><span class="T1"><img class="fr2" style="border: 0px;vertical-align: top" src="http://jiscpub.blogs.edina.ac.uk/files/2011/04/m40ca94ba5.png" alt="" width="88" height="31" /></span></span></span></p>
<p class="center">This post was written in OpenOffice.org, using templates and tools provided by the <a href="http://ice.usq.edu.au/"><span>Integrated Content Environment</span></a> project.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jiscpub.blogs.edina.ac.uk/2011/04/14/introducing-epub2html-adding-a-plain-html-view-to-an-epub/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

