Full rewrite with new flexible functionality

Rupus Reinefjord 2021-05-19 21:30:10 +02:00
parent 4418d476f7
commit c130ca7ee1
3 changed files with 320 additions and 250 deletions

.gitignore (6 lines changed)

@@ -1,3 +1,7 @@
pass.csv
__pycache__/
*.egg-info/
build/
dist/
venv/

README.md (115 lines changed)

@@ -11,92 +11,94 @@ You can install it directly from PyPI with pip:

```
$ pass2csv --help
usage: pass2csv [-h] [-b path] [-g executable] [-a] [--encoding encoding] [-o file]
                [-e pattern [pattern ...]] [-f name pattern] [-l name pattern]
                store_path

positional arguments:
  store_path            path to the password-store to export

optional arguments:
  -h, --help            show this help message and exit
  -b path, --base path  path to use as base for grouping passwords
  -g executable, --gpg executable
                        path to the gpg binary you wish to use (default 'gpg')
  -a, --use-agent       ask gpg to use its auth agent
  --encoding encoding   text encoding to use when reading gpg output (default
                        'utf-8')
  -o file, --outfile file
                        file to write exported data to (default stdout)
  -e pattern [pattern ...], --exclude pattern [pattern ...]
                        regexps for lines which should not be exported
  -f name pattern, --get-field name pattern
                        a name and a regexp, the part of the line matching the
                        regexp will be removed and the remaining line will be
                        added to a field with the chosen name. only one match
                        per password, matching stops after the first match
  -l name pattern, --get-line name pattern
                        a name and a regexp for which all lines that match are
                        included in a field with the chosen name
```
### Format

The output format is

    Group(/),Title,Password,[custom fields...],Notes

You may add custom fields with `--get-field` or `--get-line`. You supply
a name for the field and a regexp pattern. The field name is used for
the header of the output CSV and to group multiple patterns for the same
field; you may specify multiple patterns for the same field by
specifying `--get-field` or `--get-line` multiple times with the same
name. Regexps are case-insensitive.
### Examples

* Password entry (`~/.password-store/sites/example/login.gpg`):

  ```
  password123
  ---
  username: user_name
  email user@example.com
  url:example.com
  Some note
  ```

* Command

  ```
  pass2csv ~/.password-store \
      --exclude '^---$' \
      --get-field Username '(username|email):?' \
      --get-field URL 'url:?'
  ```

* Output

  ```
  Group(/),Title,Password,URL,Username,Notes
  sites/example,login,password123,example.com,user_name,"email user@example.com\nSome note"
  ```
### Grouping

The group is relative to the store path, or to `--base` if given.
Given the password `~/.password-store/site/login/password.gpg`:

    $ pass2csv ~/.password-store/site
    # Password will have group "login"

    $ pass2csv ~/.password-store/site --base ~/.password-store
    # Password will have group "site/login"
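The grouping above is a relative-path computation; a minimal sketch with `pathlib` (the `group_for` helper is invented for this example, and `PurePosixPath` is used here to keep it deterministic across platforms):

```python
import pathlib

def group_for(password_file, base):
    """Group is the password file's parent directory, relative to the base."""
    group = pathlib.PurePosixPath(password_file).relative_to(base).parent
    # A password at the top of the base has no group
    return '' if group.name == '' else str(group)

print(group_for('/home/u/.password-store/site/login/password.gpg',
                '/home/u/.password-store/site'))   # login
print(group_for('/home/u/.password-store/site/login/password.gpg',
                '/home/u/.password-store'))        # site/login
```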
## Development

Create a virtual environment:

    python3 -m venv venv
@@ -115,6 +117,7 @@ you can install `pip-tools`. The latter is recommended.

### pip-tools

[pip-tools](https://github.com/jazzband/pip-tools) can keep your virtual
environment in sync with the `requirements.txt` file, as well as
compiling a new `requirements.txt` when adding/removing a dependency in

pass2csv (385 lines changed)

@@ -1,214 +1,277 @@
#!/usr/bin/env python3
import argparse
import csv
import logging
import pathlib
import re
import sys

import gnupg

logging.basicConfig(level=logging.INFO)
def set_meta(entry, path, grouping_base):
    pure_path = pathlib.PurePath(path)
    group = pure_path.relative_to(grouping_base).parent
    if group.name == '':
        group = ''
    entry['group'] = group
    entry['title'] = pure_path.stem
def set_data(entry, data, exclude, get_fields, get_lines):
    lines = data.splitlines()
    tail = lines[1:]
    entry['password'] = lines[0]

    filtered_tail = []
    for line in tail:
        for exclude_pattern in exclude:
            if exclude_pattern.search(line):
                break
        else:
            filtered_tail.append(line)

    matching_indices = set()
    fields = entry.setdefault('fields', {})
    for i, line in enumerate(filtered_tail):
        for (name, pattern) in get_fields:
            if name in fields:
                # multiple patterns with same name, we've already found a match
                continue
            match = pattern.search(line)
            if not match:
                continue
            inverse_match = line[0:match.start()] + line[match.end():]
            value = inverse_match.strip()
            fields[name] = value
            matching_indices.add(i)
            break

    matching_lines = {}
    for i, line in enumerate(filtered_tail):
        for (name, pattern) in get_lines:
            match = pattern.search(line)
            if not match:
                continue
            matches = matching_lines.setdefault(name, [])
            matches.append(line)
            matching_indices.add(i)
            break

    for name, matches in matching_lines.items():
        # join the matched lines, not the dict of field names
        fields[name] = '\n'.join(matches)

    final_tail = []
    for i, line in enumerate(filtered_tail):
        if i not in matching_indices:
            final_tail.append(line)

    entry['notes'] = '\n'.join(final_tail).strip()
def write(file, entries, get_fields, get_lines):
    get_field_names = set(x[0] for x in get_fields)
    get_line_names = set(x[0] for x in get_lines)
    field_names = get_field_names | get_line_names
    header = ["Group(/)", "Title", "Password", *field_names, "Notes"]
    csvw = csv.writer(file)
    logging.info("Writing data to %s", file.name)
    csvw.writerow(header)
    for entry in entries:
        fields = [entry['fields'].get(name) for name in field_names]
        columns = [
            entry['group'], entry['title'], entry['password'],
            *fields,
            entry['notes']
        ]
        csvw.writerow(columns)
def main(store_path, grouping_base, outfile, gpgbinary, use_agent, encoding,
         exclude, get_fields, get_lines):
    entries = []
    failures = []
    path = pathlib.Path(store_path)
    grouping_path = pathlib.Path(grouping_base)
    gpg = gnupg.GPG(gpgbinary=gpgbinary, use_agent=use_agent)
    gpg.encoding = encoding
    for file in path.glob('**/*.gpg'):
        logging.info("Processing %s", file)
        with open(file, 'rb') as fp:
            decrypted = gpg.decrypt_file(fp)
        if not decrypted.ok:
            logging.error("Could not decrypt %s: %s", file, decrypted.status)
            failures.append((file, decrypted.status))
            continue
        entry = {}
        set_meta(entry, file, grouping_path)
        set_data(entry, str(decrypted), exclude, get_fields, get_lines)
        entries.append(entry)

    if failures:
        for file, status in failures:
            logging.warning("Could not decrypt %s: %s", file, status)

    write(outfile, entries, get_fields, get_lines)
def parse_args(args):
    class ExtendAction(argparse.Action):
        # Python 3.8 has 'extend' built in.
        def __call__(self, parser, namespace, values, option_string=None):
            items = getattr(namespace, self.dest) or []
            items.extend(values)
            setattr(namespace, self.dest, items)

    parser = argparse.ArgumentParser()
    parser.add_argument(
        'store_path',
        type=str,
        help="path to the password-store to export",
    )
    parser.add_argument(
        '-b', '--base',
        metavar='path',
        type=str,
        help="path to use as base for grouping passwords",
        dest='base_path'
    )
    parser.add_argument(
        '-g', '--gpg',
        metavar='executable',
        type=str,
        default="gpg",
        help="path to the gpg binary you wish to use (default 'gpg')",
        dest='gpgbinary'
    )
    parser.add_argument(
        '-a', '--use-agent',
        action='store_true',
        help="ask gpg to use its auth agent",
        dest='use_agent'
    )
    parser.add_argument(
        '--encoding',
        metavar='encoding',
        type=str,
        default="utf-8",
        help="text encoding to use when reading gpg output (default 'utf-8')",
        dest='encoding'
    )
    parser.add_argument(
        '-o', '--outfile',
        metavar='file',
        type=argparse.FileType('w'),
        default="-",
        help="file to write exported data to (default stdout)",
        dest='outfile'
    )
    parser.add_argument(
        '-e', '--exclude',
        metavar='pattern',
        action=ExtendAction,
        nargs='+',
        type=str,
        default=[],
        help="regexps for lines which should not be exported",
        dest='exclude'
    )
    parser.add_argument(
        '-f', '--get-field',
        metavar=('name', 'pattern'),
        action='append',
        nargs=2,
        type=str,
        default=[],
        help=(
            "a name and a regexp, the part of the line matching the regexp"
            " will be removed and the remaining line will be added to a field"
            " with the chosen name. only one match per password, matching"
            " stops after the first match"
        ),
        dest='get_fields'
    )
    parser.add_argument(
        '-l', '--get-line',
        metavar=('name', 'pattern'),
        action='append',
        nargs=2,
        type=str,
        default=[],
        help=(
            "a name and a regexp for which all lines that match are included"
            " in a field with the chosen name"
        ),
        dest='get_lines'
    )
    return parser.parse_args(args)
def compile_regexp(pattern):
    try:
        regexp = re.compile(pattern, re.I)
    except re.error as e:
        logging.error(
            "Could not compile pattern '%s', %s at position %s",
            pattern.replace("'", "\\'"), e.msg, e.pos
        )
        return None
    return regexp
if __name__ == '__main__':
    parsed = parse_args(sys.argv[1:])

    failed = False
    exclude_patterns = []
    for pattern in parsed.exclude:
        regexp = compile_regexp(pattern)
        if not regexp:
            failed = True
        exclude_patterns.append(regexp)

    get_fields = []
    for (name, pattern) in parsed.get_fields:
        regexp = compile_regexp(pattern)
        if not regexp:
            failed = True
        get_fields.append((name, regexp))

    get_lines = []
    for (name, pattern) in parsed.get_lines:
        regexp = compile_regexp(pattern)
        if not regexp:
            failed = True
        get_lines.append((name, regexp))

    if failed:
        sys.exit(1)

    if parsed.base_path:
        grouping_base = parsed.base_path
    else:
        grouping_base = parsed.store_path

    kwargs = {
        'store_path': parsed.store_path,
        'grouping_base': grouping_base,
        'gpgbinary': parsed.gpgbinary,
        'use_agent': parsed.use_agent,
        'encoding': parsed.encoding,
        'outfile': parsed.outfile,
        'exclude': exclude_patterns,
        'get_fields': get_fields,
        'get_lines': get_lines
    }
    main(**kwargs)