annotate doc/PBG3 @ 730:f2c3848dabff

PyTouhou: Fix text comparison to be done after encoding to bytes
author Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
date Sun, 08 Dec 2019 20:14:28 +0100
parents 6b2c7af2384c
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
1 The PBG3 format is an archive format used by Touhou 6 (The Embodiment of Scarlet Devil).
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
2
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
3 It is a bitstream composed of a header, a file table, and LZSS-compressed files.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
4
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
5
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
6
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
7 Reading integers
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
8 ----------------
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
9
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
10 Integers in PBG3 files are never signed, they are not byte-aligned, and have a variable size.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
11 Their size is given by two bits: 00 means the number is stored in one byte, 10 means it is stored in three bytes.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
12
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
13 Ex:
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
14 0x0012 is stored as: 0000010010
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
15 0x0112 is stored as: 010000000100010010
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
16
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
17
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
18
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
19 Reading strings
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
20 ---------------
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
21
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
22 Strings are stored as standard NULL-terminated sequences of bytes.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
23 The only catch is they are not byte-aligned.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
24
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
25
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
26
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
27 Header
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
28 ------
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
29
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
30 The header is composed of three fields:
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
31 * magic (string): "PBG3"
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
32 * number of entries (integer)
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
33 * offset of the file table (integer)
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
34
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
35 The size of the header is thus comprised between 52 bits and 100 bits.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
36
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
37
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
38
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
39 File table
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
40 ----------
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
41
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
42 The file table starts at a byte boundary, but as the rest of the file, isn't byte-aligned.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
43 It consists of a sequence of entries.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
44 Each entry is composed of five fields:
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
45 * unknown1 (int) #TODO
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
46 * unknown2 (int) #TODO
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
47 * checksum (int): simple checksum of compressed data
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
48 * size (int): size of uncompressed data
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
49 * name (string): name of the file
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
50
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
51 The checksum is a mere sum of the compressed data.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
52 Files are compressed using the LZSS algorithm, with a dictionary size of 8192 bytes and a minimum matching length of 4 bytes.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
53 The size of the offset component of (offset, length) tuples is 13 bits, whereas the size of the length component is 4 bits.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
54 A file ends with a (0, 0) tuple, that is, 18 zero bits.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
55
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
56 Uncompressing a LZSS-compressed file is quite easy, see lzss.py.
6b2c7af2384c Hello Gensokyo _o/
Thibaut Girka <thib@sitedethib.com>
parents:
diff changeset
57