In my ongoing quest to get .po files sensibly under git version control, I have written this script for use in diff and filter-clean:
msgcat --no-location --no-wrap --sort-output - | msgattrib --no-obsolete - | grep -Ev '^"POT-Creation-Date|^"PO-Revision-Date|^"Last-Translator|^"X-Generator'
Each of the steps accomplishes one crucial function:
- Refactors and unrelated code changes make source location and entry order change incessantly. This msgcat invocation normalises those, so the committed files do not have these changes.
- Obsolete entries double the size of change sets whenever a msgid changes. We have version control, so this msgattrib invocation removes them.
- These fields in the metadata entry of a .po file change every time someone edits it with a dedicated tool. Again, we have version control, so this grep invocation removes them.
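To illustrate the last step in isolation, here is a minimal sketch that runs the same grep over a made-up metadata entry. The header values (project name, dates, translator) are hypothetical and only there for demonstration; the grep pattern is the one from the script.

```shell
# Hypothetical header lines of a .po metadata entry (values made up).
# Inside single quotes, \n stays a literal backslash-n, as it appears in .po files.
header='"Project-Id-Version: demo 1.0\n"
"POT-Creation-Date: 2024-01-01 12:00+0000\n"
"PO-Revision-Date: 2024-01-02 12:00+0000\n"
"Last-Translator: Jane Doe <jane@example.org>\n"
"Language: de\n"
"X-Generator: Poedit 3.4\n"'

# Same grep step as in the script: drop the four volatile fields.
stable=$(printf '%s\n' "$header" | grep -Ev '^"POT-Creation-Date|^"PO-Revision-Date|^"Last-Translator|^"X-Generator')

# Only the fields that do not change on every edit remain.
printf '%s\n' "$stable"
```

In this sample, only the Project-Id-Version and Language lines survive, which is the kind of stable header I want committed.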
Now another hurdle came up. Notice that --no-wrap in the front? It does not work.
Furthermore, the wrapping is not stable: Every odd commit, someone gets a word flipped between lines without human changes.
Way furthermore, the related tool Poedit also has a “Wrap at” setting, and disabling it does not work either. (It delegates to the gettext tools, so no surprise there. But it means there is no way for me to normalise the files ‘just this once’ and be done with it, which is why I use a git filter in the first place.)
Very well, I figured, and tested this invocation to remove the wraps:
sed -z -e 's/"\n"//'
It is meant to concatenate any wrapped lines in ‘msgid’ and ‘msgstr’, such that the tools may wrap them however they please, every day another way, for all I care.
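A minimal sketch of the intended effect, assuming GNU sed (for the -z option) and a made-up entry whose msgstr is wrapped once. The sample text is hypothetical; the sed expression matches a closing quote, a newline, and an opening quote.

```shell
# Hypothetical .po entry with a msgstr wrapped across two lines.
sample='msgid "greeting"
msgstr "Hello "
"world"'

# -z makes sed treat the whole input as one record, so \n can be matched.
# The expression removes the quote-newline-quote sequence that forms the wrap.
joined=$(printf '%s\n' "$sample" | sed -z -e 's/"\n"//')

# The wrapped msgstr is now a single line.
printf '%s\n' "$joined"
```

For this sample the result is the msgid line unchanged, followed by msgstr "Hello world" on a single line.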
- It works stand-alone:
sed -z -e 's/"\n"//' test.po
prints the expected output.
- It works when fed through a pipe from cat:
cat test.po | sed -z -e 's/"\n"//'
prints the expected output.
- When added to the existing script, however, the sed step has no effect.
It would seem that it receives the input line-by-line. I tried putting it in front of grep for no improvement, and I cannot move it in front of the other two, as msgcat and msgattrib have their “I like my wrappings random” issue that sparked the whole ordeal to begin with.
Can I stream the output of grep, or more generally the output of a line-based pipe, so my stream editor can do its thing?
(Note: There are a lot of answers on this site that go “sed cannot do this” – but sed is specifically designed to not care about line breaks, so that sounds like misinformation. From the man page:
Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). […] sed works by making only one pass over the input(s)[.]
emphasis mine
The problem, quite clearly, is that there is more than one input, with the very string I want to replace split between them.)