Decoding Lossless JPEG - DHT

This part explains how to decode the lossless JPEG data in the CR2 format.

To make this easier to follow, we use the file IMG_2026_IFD3_JPEG.jpg, which contains only the lossless JPEG section clipped from the CR2 file.

In the CR2 format, the lossless JPEG uses five markers, including SOI and EOI.

Marker | Symbol | Description |
---|---|---|
0xFFD8 | SOI | Start Of Image |
0xFFC4 | DHT | Define Huffman Tables |
0xFFC3 | SOF3 | Start Of Frame Lossless (sequential) (non-differential, Huffman coding) |
0xFFDA | SOS | Start Of Scan |
0xFFD9 | EOI | End Of Image |

The figure below is a partial dump of the lossless JPEG in the sample.

JPEG uses big-endian byte order.

The 2-byte value (0xFF 0xC4) at address 0x0000 0002 is the DHT marker.

The 2-byte value (0x00 0x42) at address 0x0000 0004 is the segment length (Lh), which includes these two length bytes themselves.

This segment length is 66.
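As a minimal sketch, the marker and length fields can be read with Python's `struct` module. The six bytes below are typed in from the values quoted above (SOI, DHT marker, Lh) rather than read from the sample file, and the variable names are only illustrative.

```python
import struct

# First six bytes of the sample as described in the text:
# SOI (0xFFD8), DHT marker (0xFFC4), segment length Lh (0x0042).
header = bytes([0xFF, 0xD8, 0xFF, 0xC4, 0x00, 0x42])

# ">H" reads one big-endian unsigned 16-bit integer.
(marker,) = struct.unpack_from(">H", header, 2)          # address 0x0002
(segment_length,) = struct.unpack_from(">H", header, 4)  # address 0x0004

print(hex(marker), segment_length)
```

Because Lh counts its own two bytes, the segment ends at address 0x0004 + 66.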

The high 4 bits of the next byte (address 0x0000 0006) are the table class (Tc): 0 means a DC table or lossless table, and 1 means an AC table.
Because this is lossless JPEG, Tc is always 0.

The low 4 bits of the same byte are the Huffman table destination identifier (Th). In the sample this value is 0, so this part of the segment defines Huffman table #0.
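Splitting that byte into its two 4-bit fields might look like the sketch below. The function name `split_tc_th` is made up for this example.

```python
def split_tc_th(byte_value):
    """Split the Tc/Th byte into (table class, table identifier)."""
    tc = byte_value >> 4      # high 4 bits: 0 = DC or lossless table, 1 = AC table
    th = byte_value & 0x0F    # low 4 bits: Huffman table destination identifier
    return tc, th

# In the sample, Tc and Th are both 0, so the byte at 0x0000 0006 is 0x00.
print(split_tc_th(0x00))
```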

The next 16 bytes (from address 0x0000 0007) give the number of Huffman codes of each length i (Li).

Bit length (i) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Number of Huffman codes | 0x00 | 0x01 | 0x04 | 0x02 | 0x03 | 0x01 | 0x01 | 0x01 | 0x01 | 0x01 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 |

That is:

There are 0 Huffman codes of 1-bit length.

There is 1 Huffman code of 2-bit length.

There are 4 Huffman codes of 3-bit length.

And so on.
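The same reading of the 16 count bytes can be sketched in a few lines. The list `counts` is copied from the table above, and the sum of the counts gives the number of value bytes that follow the counts in the segment.

```python
# Li values copied from the table above (bit lengths 1 through 16).
counts = [0x00, 0x01, 0x04, 0x02, 0x03, 0x01, 0x01, 0x01,
          0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

# List every bit length that has at least one code.
for bit_length, n in enumerate(counts, start=1):
    if n:
        print(f"{n} code(s) of length {bit_length}")

# Total number of Huffman codes defined by this table.
total_codes = sum(counts)
print(total_codes)
```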

Now we build a Huffman tree from this data.

There are no 1-bit Huffman codes, so we can skip that length.

The table above shows one Huffman code of 2-bit length.

There are four possible 2-bit codes:

00

01

10

11

From these, we select the one smallest code that has not yet been selected.

In the sample, this gives 00.

The table above shows four Huffman codes of 3-bit length.

There are eight possible 3-bit codes:

000

001

010

011

100

101

110

111

In the same way, we select the four smallest codes that have not yet been selected.

000 and 001 begin with 00, which was already selected as the 2-bit code, so they must be excluded.

Therefore, in the sample, we get 010, 011, 100 and 101.

This can be expressed easily as a tree structure, as in the following figure.

The circle labeled "R" at the top is the root. The white circles are internal nodes, and the yellow circles are leaves.

The depth in the tree corresponds to the bit length. The number on the left side is the count of leaves at that depth, which is the same as the number of Huffman codes of that length.

The following table lists the Huffman codes for the sample.

Code |
---|
00 |
010 |
011 |
100 |
101 |
1100 |
1101 |
11100 |
11101 |
11110 |
111110 |
1111110 |
11111110 |
111111110 |
1111111110 |
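The selection procedure described above (take the smallest unused code at each length, and skip anything that starts with an already selected shorter code) can be sketched with a running counter. The counter is incremented after each code and shifted left by one bit when moving to the next length. The function name `generate_codes` is only illustrative.

```python
def generate_codes(counts):
    """Return Huffman codes as bit strings, given the 16 per-length counts."""
    codes = []
    code = 0
    for bit_length, n in enumerate(counts, start=1):
        for _ in range(n):
            # Zero-pad so leading zeros are kept (e.g. 0 at length 2 -> "00").
            codes.append(format(code, f"0{bit_length}b"))
            code += 1
        # Moving one level deeper in the tree appends a 0 bit.
        code <<= 1
    return codes

counts = [0, 1, 4, 2, 3, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
for c in generate_codes(counts):
    print(c)
```

With the counts from the sample, this prints the same 15 codes as the table above.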

The next 15 bytes (from address 0x0000 0017) are the values associated with each Huffman code (Vi).

These values are assigned to the Huffman codes in order, as follows. This is Huffman table #0.

Code | Value |
---|---|
00 | 0x06 |
010 | 0x04 |
011 | 0x08 |
100 | 0x05 |
101 | 0x07 |
1100 | 0x03 |
1101 | 0x09 |
1110 0 | 0x00 |
1110 1 | 0x0A |
1111 0 | 0x02 |
1111 10 | 0x01 |
1111 110 | 0x0C |
1111 1110 | 0x0B |
1111 1111 0 | 0x0D |
1111 1111 10 | 0x0E |
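As a rough sketch, the code-to-value table can be built by walking the counts and values together, and then used to look up codes in a bit string. The names `build_table` and `decode_symbols` are hypothetical, and the bit string in the example is made up for illustration; it is not taken from the sample's scan data. Real scan data also carries additional bits after each Huffman code, which this sketch does not handle.

```python
def build_table(counts, values):
    """Map each Huffman code (as a bit string) to its value."""
    table = {}
    code = 0
    index = 0
    for bit_length, n in enumerate(counts, start=1):
        for _ in range(n):
            table[format(code, f"0{bit_length}b")] = values[index]
            code += 1
            index += 1
        code <<= 1
    return table

def decode_symbols(bits, table):
    """Decode back-to-back Huffman codes from a string of '0'/'1' characters."""
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        # No code is a prefix of another, so the first match is the code.
        if current in table:
            symbols.append(table[current])
            current = ""
    return symbols

counts = [0, 1, 4, 2, 3, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
values = [0x06, 0x04, 0x08, 0x05, 0x07, 0x03, 0x09, 0x00,
          0x0A, 0x02, 0x01, 0x0C, 0x0B, 0x0D, 0x0E]
table = build_table(counts, values)

# Made-up input: the codes 00, 010, 1100 and 11100 written one after another.
decoded = decode_symbols("00" + "010" + "1100" + "11100", table)
print([hex(v) for v in decoded])
```

A dictionary keyed by bit strings is slow but easy to check against the table above; a real decoder would more likely walk a tree or use a lookup array.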

At address 0x0000 0026, another Tc/Th byte appears, just as at address 0x0000 0006.

Here Th is 1, so the data that follows defines Huffman table #1.

In the sample, the data for table #1 is in fact identical to the data for table #0, so Huffman table #1 is the same as Huffman table #0.
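Putting the pieces together, a whole DHT segment holding several tables could be parsed as in the sketch below. The byte string `data` is reconstructed from the values quoted in this article (it is not read from the actual sample file), and `parse_dht` is a hypothetical helper name. Note that the length works out: 2 length bytes plus two tables of 1 + 16 + 15 bytes each gives 66.

```python
import struct

COUNTS = bytes([0, 1, 4, 2, 3, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
VALUES = bytes([0x06, 0x04, 0x08, 0x05, 0x07, 0x03, 0x09, 0x00,
                0x0A, 0x02, 0x01, 0x0C, 0x0B, 0x0D, 0x0E])

# Reconstructed start of the sample: SOI, DHT marker, Lh, table #0, table #1.
data = (b"\xFF\xD8" + b"\xFF\xC4" + b"\x00\x42"
        + b"\x00" + COUNTS + VALUES
        + b"\x01" + COUNTS + VALUES)

def parse_dht(data, offset):
    """Parse the DHT segment whose marker starts at `offset`."""
    marker, length = struct.unpack_from(">HH", data, offset)
    if marker != 0xFFC4:
        raise ValueError("not a DHT marker")
    pos = offset + 4            # first Tc/Th byte
    end = offset + 2 + length   # Lh counts from the length field itself
    tables = {}
    while pos < end:
        tc, th = data[pos] >> 4, data[pos] & 0x0F
        counts = list(data[pos + 1:pos + 17])
        n = sum(counts)         # number of value bytes that follow
        values = list(data[pos + 17:pos + 17 + n])
        tables[th] = (tc, counts, values)
        pos += 17 + n
    return tables

tables = parse_dht(data, 2)
for th, (tc, counts, values) in tables.items():
    print(th, tc, len(values))
```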
