JPEG2000 and Digitisation: Expert round table

With digital preservation, and in particular the preservation of digital assets created by digitisation, very much a hot topic in the archives and libraries communities recently, clients are asking us more and more frequently which is the "best" image format to use.

Of course the answer is almost always "It depends on your project's goals."

But more specifically, we are finding an increasing number of clients asking about JPEG2000. Some are concerned that they might be unable to access their collections in the future if they don't digitise to JPEG2000, others have a strong trust in tried and tested TIFF files and feel some trepidation about switching to a new format, and yet others aren't completely sure of the advantages JPEG2000 provides over the original JPEG format...

How suitable is JPEG2000 for Digitisation?

So to shed light on JPEG2000 as a format, its potential role in digitisation, and its suitability for digital preservation, we asked four experts for their answers to the following two questions:

1. “Is JPEG2000 a suitable format for preservation and access of archival collections?”

2. “Should archives be digitising their collections to JPEG2000 instead of TIFF in 2015?”

Read on below to hear responses from Dave Thompson (Digital Curator – Wellcome Library), Melissa Terras (Director – UCL Centre for Digital Humanities), Paul Sugden (Senior Digitisation Consultant – TownsWeb Archiving), and Michael Pritchard (Director-General – The Royal Photographic Society).

Dave Thompson - Digital Curator at Wellcome Library

“Is JPEG2000 a suitable format for preservation and access of archival collections?”

When we talk about digitising to JPEG2000 in libraries and archives we are really talking about using JPEG2000 (Part 1). This is the version that can be used royalty free. The JPEG2000 (Part 1) – let’s call it JP2 for short – format is a mature, flexible, and adaptable format for digitisation of any material, not just archives. But of course no format lasts forever, and technology and user requirements evolve over time. From a preservation perspective, JP2 works as a surrogate format because of its flexibility and adaptability.

The ‘traditional’ choice has been TIFF (.tif), a digitisation file format in use for almost as long as digitisation itself. TIFF is a stable, mature format which has an extensive suite of existing tools and best practice models that organisations can follow. The use of TIFF can be a well-trodden path. However, in the past five years JPEG2000 has become increasingly popular with digitisation projects, especially mass digitisation projects.

The JPEG2000 format is a suite of formats, specifications really, that addresses requirements for digital imaging. Its wavelet-based technology means that it can be used in a compressed format which is visually lossless. Like TIFF, JP2 is not displayed natively by most browsers and can only be viewed with the use of browser plugins.

The flexibility of JP2 lies in its composition of a user-determined number of ‘tile’ layers, which exist as individual JPEG images. Each JPEG ‘tile’ exists as a standalone image from the JP2. This can allow for incredibly deep zooming, in which a user can zoom in and enlarge a small portion of an image while retaining a high quality view of that detail. This can be particularly advantageous in archives where a minute detail in a handwritten page may be important.

It would be easy to be swayed by purely technical arguments for the adoption of JP2. But the more ‘social’ requirements of user interaction with the material, of zooming whilst retaining a high level of quality, are perhaps more compelling. We digitise for the benefit of our users and the flexibility of JP2 supports this, whilst at the same time offering technical advantages over other formats.

When the time comes to move on and migrate away from JP2, its ability to be readily converted to another format can be built upon. Though the choice of which digitisation format to use for surrogate masters is of course just one of the planning decisions facing a digitisation project.

“Should archives be digitising their collections to JPEG2000 instead of TIFF in 2015?”

Much as I would like to promote the use of JPEG2000 as a master image format for digitisation in 2015, the issue is not as simple as that. Why digitise? What material to select? What resources are available? What is the user need? What are the drivers? These are decisions that each individual organisation has to take. It is these drivers that might lead to the decision to use TIFF over JPEG2000 (JP2) or vice versa, rather than the other way around.

The use of JP2 for its flexibility offers advantages over TIFF. The ability to easily and programmatically create other dissemination formats from JP2, such as PDF, can support users working remotely with archival collections. Both formats are commonly converted to JPEG for web based dissemination; however, compressed JP2 files can be of sufficiently high quality to be used in printed reproduction.

Should archives be digitising their collections? The answer is unequivocally ‘YES!’ How that is achieved though is more complex and depends on answers to a number of equally complex questions. Questions about technical resources, infrastructure and costs are probably bigger drivers in digitisation projects than master format choice. The decision to use one format over another can be framed in terms of user needs, in which case the lighter, nimbler JP2 undeniably carries some advantages.

Melissa Terras - Director, UCL Centre for Digital Humanities

“Is JPEG2000 a suitable format for preservation and access of archival collections?”

Is JPEG2000 a suitable format to use in digitisation? Well, it’s confusing isn’t it? In the class I teach - an introduction to digitisation for those undertaking MAs in Library studies, Archive studies, Information studies, and Digital Humanities - it is certainly something we discuss.

It’s a good example of how the digitisation community has a raft of guidelines from esteemed bodies to follow, yet no core decision when it comes down to how to adopt new and changing file formats. The industry standard often emerges without having an actual “standard” - and this is something the students - who will often be going into positions within the cultural heritage sector - have to learn: how to find out the actual industry standard, how to keep abreast of changes, and how to find sources to trust and base decisions upon at the time of actually carrying out a project.

So, back to JPEG2000. At time of writing, the Wikipedia page is flagged up as having “multiple issues”, so you can’t trust that, and it points out the tensions around this discussion! Turn to somewhere more trustworthy, such as Jisc Digital, and the latest blog post on the situation highlights recent support for JPEG2000 from Adobe and suggests that camera manufacturers are about to offer it as a capture format… which would be a game changer, as an industry standard becomes much easier to adopt for those trying to figure out how supported and trustworthy a format is. We use various online sources such as this as discussion material in class, and understanding the source and nature of confusion as well as the benefits of the technology is important, whilst highlighting the decisions that have to be made at every step of the digitisation process.

“Should archives be digitising their collections to JPEG2000 instead of TIFF in 2015?”

I think that, with the future of JPEG2000 as a digitisation format seemingly so unclear, it is difficult to say. At the moment, our workflow still uses TIFF, but we are keeping abreast of the discussion, with the hope of more clarity from those who have more heft, and to sense if and when it’s time for us to make the shift. Clarifying issues of using JPEG2000 as a delivery or preservation medium (or both!) would certainly help those in the same position as us: keeping a watchful eye, and trying to understand the ins and outs of both a technical and procedural discussion. It certainly makes for a great class discussion!

Paul Sugden - Senior Digitisation Consultant at TownsWeb Archiving

“Is JPEG2000 a suitable format for preservation and access of archival collections?”

With regard to preservation, the most widely used file formats (TIFF, JPEG and PDF) can vary in type and there are always concerns regarding long term viability when there are commercial vendor tie-ins. Surely then JPEG2000 - a single non-proprietary file format that is capable of storing images, audio, and motion, without any reliance on third party vendors - is the ideal preservation format?

Perhaps. But consider this... most devices do not capture directly to JPEG2000 format, predominantly capturing to RAW or lossless TIFF instead. Converting an image from lossless TIFF to a lossless JPEG2000 file format is relatively painless, thus the assumption would be that the retro conversion from lossless JPEG2000 back to the lossless TIFF format, as originally captured, should be equally painless. However “as originally captured” is not that straightforward.

I attended a conference last year where an expert seemed to demonstrate that it was not possible to convert a JPEG2000 image precisely back to the original lossless TIFF file from which it was created. His example showed that after the retro conversion the file size of the newly created TIFF was different to the original TIFF as captured, and there were also visible differences between the images (albeit minor differences). In my opinion, a file format that does not offer accurate retro-conversion back to precisely the image that was originally captured certainly cannot be seen as a reliable preservation format.
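A claim like this can be checked by comparing decoded pixel data rather than file sizes, since two lossless encodings of the same image will often differ in size. The following is a minimal Python sketch of that idea, not the method used at the conference: zlib at two compression levels stands in for two lossless containers (it is not a JPEG2000 or TIFF codec), and the 64×64 gradient, `pixel_digest` and `round_trip_is_lossless` are hypothetical names and data chosen purely for illustration.

```python
import hashlib
import zlib


def pixel_digest(pixels: bytes) -> str:
    """Return a SHA-256 digest of raw, decoded pixel data."""
    return hashlib.sha256(pixels).hexdigest()


def round_trip_is_lossless(original: bytes, round_tripped: bytes) -> bool:
    """True when the decoded pixels are bit-for-bit identical."""
    return pixel_digest(original) == pixel_digest(round_tripped)


# Hypothetical stand-in for a decoded scan: a 64x64 8-bit greyscale gradient.
width, height = 64, 64
original_pixels = bytes((x + y) % 256 for y in range(height) for x in range(width))

# Assumption: zlib level 0 (stored) and level 9 (maximum compression) play the
# role of two different lossless containers holding the same image.
container_a = zlib.compress(original_pixels, 0)
container_b = zlib.compress(original_pixels, 9)

# The encoded sizes differ, but the decoded pixels can still be compared directly.
sizes_differ = len(container_a) != len(container_b)
lossless = round_trip_is_lossless(original_pixels, zlib.decompress(container_b))

print("file sizes differ:", sizes_differ)
print("pixels identical:", lossless)
```

In a real workflow the two byte strings would come from decoding the original TIFF and the round-tripped TIFF with an imaging library. If the digests match, a difference in file size reflects only the encoding; if they do not, the conversion settings or the software used were not truly lossless.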

But what about access? At present JPEG2000 is not supported by the majority of internet browsers as standard, with most applications of JPEG2000 found online relying on some sort of plug-in 'player' to display the JP2 file. It is also not supported by many mainstream graphics packages, again with those that do support it generally requiring a plug-in. So with software vendors still playing catch-up when it comes to supporting JP2 files, it cannot be seen as a suitable file format for access.

“Should archives be digitising their collections to JPEG2000 instead of TIFF in 2015?”

Currently we only have two clients that request their materials be digitised to JPEG2000 format, and both of these clients additionally request Master TIFF and standard JPEG formats. There can be quite significant cost implications when creating JP2 files, and whilst both of these clients are happy to pay, others see it as an unnecessary expense at this stage. As JP2 files can only be created from TIFFs or JPEGs anyway, the conversion can be carried out at any stage in the future, as and when there is a requirement.

Michael Pritchard - Director-General at The Royal Photographic Society

“Is JPEG2000 a suitable format for preservation and access of archival collections?”

In theory JPEG2000 should be a better file format for preservation and access to archival collections compared to the widely used JPEG compression. It uses a more efficient compression algorithm, is less prone to error and offers better image quality, plus it can be saved with no loss of data. For various reasons the only area where JPEG2000 has gained a hold is in digital video. JPEG2000 has never gained great traction and take-up, and some widely available image editing software does not offer the option to save files as JPEG2000. For these reasons digitising collections to JPEG2000 could be a technical cul-de-sac, despite its advantages (think Betamax compared to VHS).

“Should archives be digitising their collections to JPEG2000 instead of TIFF in 2015?”

In a straight comparison I would suggest continuing to use TIFF for preservation, despite the larger file sizes, and then (if the CMS allows) serving these as JPEGs for user access. To bring in another angle though: PNG is the second most used imaging format on the net – it offers lossless compression, scales well and is less complex and prone to error than TIFF files. Moreover, compared to JPEG2000 it is widely supported.

JPEG2000: A Divisive format

If one thing is clear from the expert points of view above, it is that opinion is definitely divided when it comes to JPEG2000 as a master format for digitisation and digital preservation.

On the one hand several experts are confident about the advanced compression and embedded metadata functionality it offers, but on the other hand there are serious concerns over hardware support and compatibility.

If you would like to find out more about JPEG2000 and its potential for application in archives, we recommend reading Rob Buckley's research report produced for the Wellcome Library. Or if you are planning a digitisation project and would like advice about the best image formats for your goals, please feel free to contact us.
