Recovering lost / deleted OpenDocument file


Post by ^rooker »

A friend of mine accidentally deleted one of her text documents and found it truncated to 0 bytes. Unfortunately, more than a month passed before she noticed the accident, so even the rdiff-backup couldn't help anymore :(

1) Search for zip files:
Why "zip" files?
Because OpenDocument files (ods, odt, odg, ...) are really just a bundle of XML files (plus some other stuff) packed together - technically, they're plain .zip archives.
In my case, I was looking for a text document (originally an .odt).
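
You can verify this on any intact OpenDocument file (the file name below is just an example):

Code: Select all

# An .odt identifies itself as a ZIP-based container; listing it shows the packed XML files.
file example.odt
unzip -l example.odt    # typically lists mimetype, content.xml, styles.xml, META-INF/manifest.xml, ...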

2) Use "scalpel" to find the zips:
Great thing that GNU/Linux systems are equipped with top-notch professional forensic-tools, so my plan was to use "scalpel" to find all traces of ".zip" files on her disk.

2b) Enable the filetype "zip" in scalpel's config:
Edit the file /etc/scalpel/scalpel.conf and search for "zip". Then uncomment the zip line so it looks like this:

Code: Select all

#---------------------------------------------------------------------
# MISCELLANEOUS
#---------------------------------------------------------------------
#
zip y   10000000    PK\x03\x04  \x3c\xac
#
#   java    y   1000000 \xca\xfe\xba\xbe
#
IMPORTANT: Make sure that all other filetypes are commented out.
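
A quick way to double-check that zip is the only active rule (assuming the config path from above):

Code: Select all

# Print every uncommented, non-empty line of the config - only the zip rule should show up.
grep -vE '^\s*(#|$)' /etc/scalpel/scalpel.conf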

3) Let scalpel search the disk:

Code: Select all

scalpel -v -o /path/to/results /dev/sdX
I've used scalpel to search the whole disk (not the partition) - so it's /dev/sda, not /dev/sda1 - but depending on your case (and size of the disk), it might be better/faster to just search a single partition.

In my case it was a 500 GB disk, but it didn't take that long - probably around 2 hours.
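
If you want to be extra careful (or the disk is failing), you can also image the disk first and let scalpel carve the image instead of the live device. A sketch, with device and file names as examples:

Code: Select all

# Make a raw copy of the whole disk first (needs enough free space somewhere else!)
# status=progress requires a reasonably recent GNU dd.
sudo dd if=/dev/sdX of=/path/to/disk.img bs=4M conv=noerror,sync status=progress

# Then carve the image instead of the device
scalpel -v -o /path/to/results /path/to/disk.img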

4) Find the "good" files:
Since scalpel will also carve out files that merely looked like zips but are actually junk, it's necessary to sort out the good ones.
I've used "unzip" to check if the files were complete garbage or not.

In order to check the several thousand files that scalpel carved out (more than 20,000 in my case), I wrote a short bash script that sorts all the zips according to the error code that unzip returned.
The script's a quick-n-dirty hack, but I'm sure you can adapt it to fit your needs (a rough sketch follows below).

What it does is:
a) run "unzip -l" (list) on each zip file,
b) take the return value of that execution (RESULT=$?),
c) and if it's not '9' (= invalid file), copy the zip into a subfolder named after the error code (usually 0, 1 or 2).
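A minimal sketch of such a sorting script - the directory names ("carved", "sorted") are just examples, and the handling of return code 9 follows the description above:

Code: Select all

#!/bin/bash
# Sort carved zip files into subfolders based on the exit code of "unzip -l".
SRC=carved      # where scalpel's carved *.zip files ended up (example path)
DST=sorted      # output folders: sorted/0, sorted/1, sorted/2, ...

for f in "$SRC"/*.zip; do
    unzip -l "$f" > /dev/null 2>&1
    RESULT=$?
    # exit code 9 = unzip couldn't read it as a zip at all -> skip it
    if [ "$RESULT" -ne 9 ]; then
        mkdir -p "$DST/$RESULT"
        cp "$f" "$DST/$RESULT/"
    fi
done
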
5) Have something to identify the file with:
There were several odt files stored on that disk, but I was looking for a certain one that got lost.
IMPORTANT: You need to know a string that appears in the file you're looking for!
(hint: if you know the title or headline of the document, you could use that)

6) Now, find the "right" file:
Unzip all the "valid" zips into subfolders, one per file (you can use my script for that, or the simple loop sketched below), and then use "grep" to search for the string:

Code: Select all

grep -lr "text I am looking for" *
This will print the folder/filename of every file containing the string you've been looking for - in an OpenDocument text, that file is usually "content.xml".
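
If you'd rather roll your own extraction step, a loop like this does the job (again just a sketch; the folder names match the sorting example above):

Code: Select all

#!/bin/bash
# Extract every sorted zip into its own subfolder so that grep -lr can search them.
mkdir -p extracted
for f in sorted/*/*.zip; do
    name=$(basename "$f" .zip)
    mkdir -p "extracted/$name"
    unzip -q -o "$f" -d "extracted/$name"
done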

That's it!
In my case, that narrowed more than 20,000 files down to 2 - and one of them was a match. :)
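
Once you know which carved file is the right one, it should normally be enough to copy it with an .odt extension and open it in OpenOffice/LibreOffice (the file name here is just an example):

Code: Select all

cp sorted/0/00012345.zip recovered.odt
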
Jumping out of an airplane is not a basic instinct. Neither is breathing underwater. But put the two together and you're traveling through space!