respace 1.0


Introduction

respacer is part of a larger project of mine that requires reconstructing sentences that have lost their whitespace. For example, given the input "itisiyourking", we would like to produce "it is i your king", the most likely original sentence. This mini-project captures the work done for that purpose.

To achieve this goal, I have drawn on recent experience working with natural language processing tools.

I dare claim that its performance is competitive, but I have little to compare it against. It is fast enough in the context of the aforementioned larger project, so I will most probably not spend more time optimizing it. Note also that I have only ever run this project on macOS, but the code is cross-platform and so are its dependencies.

Technical considerations

Dependencies

respacer depends on libkenlm, which provides the language model analysis facilities. libkenlm itself depends on libboost-system, libboost-thread, libz and libbz2.

Run-time inputs

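In order to use respacer, one must supply two files at run-time, both passed to the respacer constructor:

- a dictionary of words (the sample code below uses aspell_en_expanded);
- a language model readable by libkenlm (the sample code below uses romeo_and_juliet.mmap).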

Thanks

Sample code

This sample code produces an executable that reads a string from standard input and writes the respaced sentence to standard output. It uses two included files, aspell_en_expanded and romeo_and_juliet.mmap, which it expects to find in the current working directory.

#include "respacer.h"
#include <lm/model.hh>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    // Construct a respacer from the two run-time files
    // (paths are relative to the current working directory).
    respacer r("./aspell_en_expanded", "./romeo_and_juliet.mmap");

    // Read one line of unspaced text from standard input.
    string sentence;
    getline(cin, sentence);

    // Respace it and print the resulting words separated by single spaces.
    auto const respaced = r.respace(sentence);
    for(auto i = respaced.begin(); i != respaced.end(); ++i)
    {
        cout << (i == respaced.begin() ? "" : " ") << *i;
    }
    cout << endl;

    return 0;
}

License

(C) Copyright Thierry Seegers 2015. Distributed under the following license:

Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.