From dc3e97adc70bb2a6e7f71081a6997f20bbea9eb6 Mon Sep 17 00:00:00 2001 From: Alexander Neumann Date: Sat, 17 Jan 2015 16:57:16 +0100 Subject: [PATCH] Update documentation --- doc/Design.md | 200 +++++++++++++++++++++++++------------------------- 1 file changed, 101 insertions(+), 99 deletions(-) diff --git a/doc/Design.md b/doc/Design.md index df290c2ff..d1456c3c4 100644 --- a/doc/Design.md +++ b/doc/Design.md @@ -48,8 +48,6 @@ The basic layout of a sample restic repository is shown below: ├── keys │ └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7 ├── locks - ├── maps - │ └── 3c0721e5c3f5d2d78a12664b568a1bc992d17b993d41079599f8437ed66192fe ├── snapshots │ └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec ├── tmp @@ -60,8 +58,11 @@ The basic layout of a sample restic repository is shown below: │ │ └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5 │ ├── 95 │ │ └── 95f75feb05a7cc73e328b2efa668b1ea68f65fece55a93bc65aff6cd0bcfeefc - │ └── e0 - │ └── e01150928f7ad24befd6ec15b087de1b9e0f92edabd8e5cabb3317f8b20ad044 + │ ├── b8 + │ │ └── b8138ab08a4722596ac89c917827358da4672eac68e3c03a8115b88dbf4bfb59 + │ ├── e0 + │ │ └── e01150928f7ad24befd6ec15b087de1b9e0f92edabd8e5cabb3317f8b20ad044 + │ [...] 
└── version A repository can be initialized with the `restic init` command, e.g.: @@ -124,8 +125,13 @@ pretty-print the contents of a snapshot file: Enter Password for Repository: { "time": "2015-01-02T18:10:50.895208559+01:00", - "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf", - "map": "3c0721e5c3f5d2d78a12664b568a1bc992d17b993d41079599f8437ed66192fe", + "tree": "", + "tree": { + "id": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf", + "size": 282, + "sid": "b8138ab08a4722596ac89c917827358da4672eac68e3c03a8115b88dbf4bfb59", + "ssize": 330 + }, "dir": "/tmp/testdata", "hostname": "kasimir", "username": "fd0", @@ -134,122 +140,118 @@ pretty-print the contents of a snapshot file: } Here it can be seen that this snapshot represents the contents of the directory -`/tmp/testdata`. - -The two most important fields are `map` and `tree`. - -Maps ----- +`/tmp/testdata`. The most important field is `tree`. All content within a restic repository is referenced according to its SHA-256 hash. Before saving, each file is split into variable sized chunks of data. The SHA-256 hashes of all chunks are saved in an ordered list which then represents -the content of the file. In order to relate these plain text hashes to the -actual encrypted storage hashes (which vary due to random IVs), each snapshot -references a map. +the content of the file. -A map is an encrypted and compressed JSON document which contains a large list -of plain text hashes and associated storage hashes. This list is sorted by the -plain text hash in order to speed up lookups. - -Maps are referenced by their storage ID, which is the SHA-256 hash of the -encrypted file stored in the `maps` directory. 
- -The command `restic cat map` can be used to inspect the content of a map: - - $ restic -r /tmp/restic-repo cat map 3c0721e5c3f5d2d78a12664b568a1bc992d17b993d41079599f8437ed66192fe - Enter Password for Repository: - [ - { - "id": "1424916fc7279d58e3b2d8b533f481981ea5cb0f21a43932f26475e308e9b599", - "size": 287, - "sid": "32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5", - "ssize": 335 - }, - { - "id": "160916dec2e9f4597a2cc3f0787ff6b3726c21e056177292eb85281c9c2afaa0", - "size": 812, - "sid": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c", - "ssize": 860 - }, - [...] - ] +In order to relate these plain text hashes to the actual encrypted storage +hashes (which vary due to random IVs), each object contains a list that maps +all referenced plaintext hashes to storage hashes. In the case of the snapshot +data structure listed above, the list only consists of one entry for the +referenced tree, so the field `tree` consists of such a mapping. Trees and Data -------------- -The second thing a snapshot references is a tree. Trees are referenced by the -SHA-256 hash of the JSON string representation of its contents and are saved in -a subdirectory of the directory `trees`. The sub directory's name is the first -two characters of the filename the tree object is stored in. +A snapshot references a tree by the SHA-256 hash of the JSON string +representation of its contents. Trees are saved in a subdirectory of the +directory `trees`. The sub directory's name is the first two characters of the +filename the tree object is stored in. 
The command `restic cat tree` can be used to inspect the tree referenced above: - $ restic -r /tmp/restic-repo cat tree 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf + $ restic -r /tmp/restic-repo cat tree b8138ab08a4722596ac89c917827358da4672eac68e3c03a8115b88dbf4bfb59 Enter Password for Repository: - [ - { - "name": "testdata", - "type": "dir", - "mode": 493, - "mtime": "2014-12-22T14:47:59.912418701+01:00", - "atime": "2014-12-06T17:49:21.748468803+01:00", - "ctime": "2014-12-22T14:47:59.912418701+01:00", - "uid": 1000, - "gid": 100, - "user": "fd0", - "inode": 409704562, - "content": null, - "subtree": "a8838fdbf2902095fb1b9de8b0e30d2e4e2a91bbc82fb15f98f6f1535b9ccbe6" - } - ] + { + "nodes": [ + { + "name": "testdata", + "type": "dir", + "mode": 493, + "mtime": "2014-12-22T14:47:59.912418701+01:00", + "atime": "2014-12-06T17:49:21.748468803+01:00", + "ctime": "2014-12-22T14:47:59.912418701+01:00", + "uid": 1000, + "gid": 100, + "user": "fd0", + "inode": 409704562, + "content": null, + "subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc" + } + ], + "map": [ + { + "id": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc", + "size": 910, + "sid": "8b238c8811cc362693e91a857460c78d3acf7d9edb2f111048691976803cf16e", + "ssize": 958 + } + ] + } -A tree is a list of entries which contain meta data like a name and timestamps. -When the entry references a directory, the field `subtree` contains the plain -text ID of another tree object. The associated storage ID can be found in the -map object. +A tree contains a list of entries (in the field `nodes`) which contain meta +data like a name and timestamps. When the entry references a directory, the +field `subtree` contains the plain text ID of another tree object. The +associated storage ID can be found in the map object. All referenced plaintext +hashes are mapped to their corresponding storage hashes in the list contained +in the field `map`.
-This can also be inspected by using `restic cat tree`, which automatically -searches all available maps for the storage ID: +When the command `restic cat tree` is used, the storage hash is needed to print +a tree. The tree referenced above can be dumped as follows: - $ restic -r /tmp/restic-repo cat tree a8838fdbf2902095fb1b9de8b0e30d2e4e2a91bbc82fb15f98f6f1535b9ccbe6 + $ restic -r /tmp/restic-repo cat tree 8b238c8811cc362693e91a857460c78d3acf7d9edb2f111048691976803cf16e Enter Password for Repository: - [ - { - "name": "testfile", - "type": "file", - "mode": 420, - "mtime": "2014-12-06T17:50:23.34513538+01:00", - "atime": "2014-12-06T17:50:23.338468713+01:00", - "ctime": "2014-12-06T17:50:23.34513538+01:00", - "uid": 1000, - "gid": 100, - "user": "fd0", - "inode": 416863351, - "size": 1234, - "links": 1, - "content": [ - "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d" - ] - }, - [...] - ] + { + "nodes": [ + { + "name": "testfile", + "type": "file", + "mode": 420, + "mtime": "2014-12-06T17:50:23.34513538+01:00", + "atime": "2014-12-06T17:50:23.338468713+01:00", + "ctime": "2014-12-06T17:50:23.34513538+01:00", + "uid": 1000, + "gid": 100, + "user": "fd0", + "inode": 416863351, + "size": 1234, + "links": 1, + "content": [ + "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d" + ] + }, + [...] + ], + "map": [ + { + "id": "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d", + "size": 1234, + "sid": "00634c46e5f7c055c341acd1201cf8289cabe769f991d6e350f8cd8ce2a52ac3", + "ssize": 1282 + }, + [...] + ] + } -This tree contains a file entry. In contrast to the entry above, the `subtree` -field is not present and the `content` field contains a list with one plain -text SHA-256 hash. The storage ID for this ID can in turn be looked up in the -map. Data chunks stored as encrypted files in a sub directory of the directory -`data`, similar to tree objects. +This tree contains a file entry. 
This time, the `subtree` field is not present +and the `content` field contains a list with one plain text SHA-256 hash. The +storage ID for this ID can in turn be looked up in the map. Data chunks are stored +as encrypted files in a sub directory of the directory `data`, similar to tree +objects. -The command `restic cat data` can be used to lookup, extract and decrypt data, -e.g. for the data mentioned above: +The command `restic cat data` can be used to extract and decrypt data given a +storage hash, e.g. for the data mentioned above: - $ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum + $ restic -r /tmp/restic-repo cat blob 00634c46e5f7c055c341acd1201cf8289cabe769f991d6e350f8cd8ce2a52ac3 | sha256sum Enter Password for Repository: 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d - -As can be seen from the output of the program `sha256sum`, the hash is the -same, so the correct data has been returned. +As can be seen from the output of the program `sha256sum`, the hash matches the +plaintext hash from the map included in the tree above, so the correct data has +been returned. Backups and Deduplication =========================