Эффективно анализируйте тривиальные файлы с Boost Spirit X3

Question

Эффективно анализируйте тривиальные файлы с Boost Spirit X3

Я новичок в C ++ и Boost Spirit X3. Для моего проекта я анализирую геосоциальный график из двух файлов со следующей структурой с Boost Spirit X3 в график повышения.

У меня есть рабочая реализация. Поскольку у меня нет никакого предыдущего опыта работы с библиотеками, мне интересно, что вы думаете о подходе и посоветуете ли вы использовать другой подход.

В графическом файле есть одна строка для каждого ребра.
При разборе ребер я должен создать узлы графа на тот случай, если узел не был виден раньше. Я использую семантическое действие, которое проверяет каждый раз, когда он встречает идентификатор узла, если этот узел уже находится в графе. Прочитав полную строку, я использую семантическое действие, которое затем добавляет край.

В файле местоположений есть одна строка на известное местоположение узла в данный момент времени. Я сохраняю первое местоположение, известное узлу на графике (используя собственное свойство графика повышения).

У меня есть конкретные вопросы, но я был бы рад получить любые мысли и предложения:

Можно ли использовать вложенные семантические действия, как я делаю для графического файла? Вредит ли это производительности?
Рекомендуется ли сразу анализировать весь файл с помощью Spirit X3, или я должен анализировать каждую строку отдельно с помощью Spirit X3?

График (обозначает ребра на графике)

[user1]     [user2]
0           3

Места

[user]  [check-in time]         [latitude]      [longitude]     [location id]
0       2010-10-19T23:55:27Z    30.2359091167   -97.7951395833      22847

Код разбора Spirit X3

// Parse the gowalla edge file
boost::spirit::istream_iterator file_iterator(edge_file), eof;

x3::phrase_parse(file_iterator, eof,
// Begin grammar
(
*((x3::int_[add_vertex] >> x3::int_[add_vertex])[add_edge])
),
// End grammar
x3::space
);

// Fail if we couldn't parse the whole edges file
if (file_iterator != eof) {
std::cerr << "Couldn't parse whole edges file" << std::endl;
}

// Parse the gowalla location file
file_iterator = boost::spirit::istream_iterator(location_file);

x3::phrase_parse(file_iterator, eof,
// Begin grammar
(
// vertex_id   time of checkin       latitude  longitude             location id
*((x3::int_ >> x3::lexeme[*x3::graph] >> x3::double_ >> x3::double_)[add_location] >> x3::int_ >> x3::eol)
),
// End grammar
x3::blank
);

// Fail if we couldn't parse the whole location file
if (file_iterator != eof) {
std::cerr << "Couldn't parse whole location file" << std::endl;
}

Семантические действия под названием X3

// Lambda function that adds vertex to graph if not already added
auto add_vertex = [&](auto& ctx){
// Return if the vertex is already known
if (vertices.find(x3::_attr(ctx)) != vertices.end())    {
return false;
}

// Otherwise add vertex to graph
auto v = boost::add_vertex(g);

// And add vertex descriptor to map
vertices[x3::_attr(ctx)] = v;
};

// Lambda function that adds edge to graph
auto add_edge = [&](auto& ctx){
// _attr(ctx) returns a boost fusion tuple
auto attr = x3::_attr(ctx);

// Add edge from the vertices returned from context
boost::add_edge(vertices[fusion::at_c<0>(attr)],
vertices[fusion::at_c<1>(attr)], g);
};

// Lambda function that adds locations to vertices in the graph
auto add_location = [&](auto& ctx){
// _attr(ctx) returns a boost fusion tuple
auto attr = x3::_attr(ctx);
auto vertex_id = fusion::at_c<0>(attr);

if (location_already_added.find(vertex_id) != location_already_added.end()) {
// Exit, as we already stored the location for this vertex
return true;
}
location_already_added.insert(vertex_id);

// Test if vertex is in our graph
// We are parsing locations from a different file than the graph,
// so there might be inconsistencies
if (vertices.find(vertex_id) == vertices.end()) {
std::cerr << "Tried to add location to vertex " << vertex_id << ", but this vertex is not in our graph" << std::endl;
return false;
}

auto vertex = vertices[vertex_id];

// Add location to the vertex
g[vertex].latitude = fusion::at_c<2>(attr);
g[vertex].longitude = fusion::at_c<3>(attr);

return true;
};

График повышения

struct vertex_property {
double longitude;
double latitude;
};

// Define our graph
// We use setS to enforce our graph not to become a multigraph
typedef boost::adjacency_list<boost::setS, boost::vecS, boost::undirectedS, vertex_property, edge_property > graph;

3

boost-graph boost-spirit boost-spirit-x3 c++

Решение

Другие решения

Других решений пока нет …

Источник

Accepted Answer

Q. Можно ли использовать вложенные семантические действия, как я делаю для графического файла? Вредит ли это производительности?

Я бы не стал это делать. Вероятно, гораздо проще просто добавить ребра в целом:

x3::parse(file_iterator, eof,
*((x3::int_ >> '\t' >> x3::int_ >> x3::eol)[add_edge])
);

куда add_ege может быть так просто, как:

auto add_edge = [&](auto& ctx){
// Add edge from from context
vertex_decriptor source, target;
auto tup = std::tie(source, target);

fusion::copy(x3::_attr(ctx), tup);

boost::add_edge(map_vertex(source), map_vertex(target), g);
};

Q. Рекомендуется ли сразу анализировать весь файл с помощью Spirit X3, или я должен анализировать каждую строку отдельно с помощью Spirit X3?

Я не думаю, что дух дает какие-либо рекомендации. Я бы сделал весь файл сразу. И я рекомендую использовать отображенные в память файлы, чтобы вы были более эффективными (итерация с произвольным доступом без multi_pass адаптация итератора).

Основные пометки:

вы пытаетесь использовать парсеры с учетом пространства но используя их с istream_iterators. Вы должен не забудьте сбросить skipws флаг на потоке тогда.
vertices карта кажется пустой тратой ресурсов; подумайте, можете ли вы использовать [user] вещь (vertex_id) непосредственно вместо перевода на vertex_descriptor,

Вот очищенная версия, которая анализирует файлы из https://snap.stanford.edu/data/loc-gowalla.html очень хорошо примерно в 19-х (это уже значительно быстрее):

Жить на Колиру

#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/graph/adjacency_list.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <fstream>
#include <iostream>

namespace x3 = boost::spirit::x3;
namespace fusion = boost::fusion;

struct vertex_property {
double longitude;
double latitude;
};

struct edge_property { };

struct Reader {
bool read_edges(std::string fname) {
// Lambda function that adds edge to graph
auto add_edge = [this](auto& ctx){
// Add edge from from context
vertex_decriptor source, target;
auto tup = std::tie(source, target);

fusion::copy(x3::_attr(ctx), tup);

boost::add_edge(this->map_vertex(source), this->map_vertex(target), g);
};

// Parse the gowalla edge file
std::ifstream edge_file(fname);
if (!edge_file) return false;

boost::spirit::istream_iterator file_iterator(edge_file >> std::noskipws), eof;

x3::parse(file_iterator, eof, *((x3::int_ >> '\t' >> x3::int_ >> x3::eol)[add_edge]));

// Fail if we couldn't parse the whole edges file
return (file_iterator == eof);
}

bool read_locations(std::string fname) {
// Lambda function that adds locations to vertices in the graph
auto add_location = [&](auto& ctx){
// _attr(ctx) returns a boost fusion tuple
auto attr = x3::_attr(ctx);
auto vertex_id = fusion::at_c<0>(attr);

if (!location_already_added.insert(vertex_id).second)
return true; // Exit, as we already stored the location for this vertex

// Test if vertex is in our graph
// We are parsing locations from a different file than the graph, so
// there might be inconsistencies
auto mapped = mapped_vertices.find(vertex_id);
if (mapped == mapped_vertices.end()) {
std::cerr << "Tried to add location to vertex " << vertex_id << ", but this vertex is not in our graph" << std::endl;
return false;
}

// Add location to the vertex
auto& props = g[mapped->second];
props.latitude  = fusion::at_c<1>(attr);
props.longitude = fusion::at_c<2>(attr);

return true;
};

// Parse the gowalla location file
std::ifstream location_file(fname);
if (!location_file) return false;

boost::spirit::istream_iterator file_iterator(location_file >> std::noskipws), eof;

x3::parse(file_iterator, eof,
// [vertex_id]   [time of checkin]       [latitude]  [longitude]             [location] id
*((x3::int_ >> '\t' >> x3::omit[*x3::graph] >> '\t' >> x3::double_ >> '\t' >> x3::double_)[add_location] >> '\t' >> x3::int_ >> x3::eol)
);

// Fail if we couldn't parse the whole location file
return (file_iterator == eof);
}

private:
// We use setS to enforce our graph not to become a multigraph
typedef boost::adjacency_list<boost::setS, boost::vecS, boost::undirectedS, vertex_property, edge_property> graph;
using vertex_decriptor = graph::vertex_descriptor;

std::map<int, vertex_decriptor> mapped_vertices;
std::set<int> location_already_added;
graph g;

// Lambda function that adds vertex to graph if not already added
vertex_decriptor map_vertex(int id) {
auto match = mapped_vertices.find(id);

if (match != mapped_vertices.end())
return match->second; // vertex already known
else                      // Otherwise add vertex
return mapped_vertices[id] = boost::add_vertex(g);
};
};

int main() {
Reader reader;
if (!reader.read_edges("loc-gowalla_edges.txt"))
std::cerr << "Couldn't parse whole edges file" << std::endl;

if (!reader.read_locations("loc-gowalla_totalCheckins.txt"))
std::cerr << "Couldn't parse whole location file" << std::endl;
}

Сопоставленные файлы

Для сравнения, замена отображенными в память файлами делает это МНОГО быстрее: завершается за 3 секунды (это более чем в 6 раз быстрее снова):

Жить на Колиру

Пример измененного фрагмента:

    boost::iostreams::mapped_file_source mm(fname);
auto f = mm.begin(), l = mm.end();
x3::parse(f, l, *((x3::int_ >> '\t' >> x3::int_ >> x3::eol)[add_edge]));

Накладные расходы памяти

После профилирования. похоже, что наличие карты / набора не так уж и плохо:

Из того, что я вижу, программа использует 152MiB, из которых только 4.1 отображаются как location_already_added на первый взгляд.

Сокращение использования памяти и времени

Несмотря на это, замена set<int> location_already_added с динамическим набором битов и удаления map<int, vertex_descriptor> еще больше уменьшает использование памяти так же как время выполнения программы.

На этот раз он завершается в до 2 с (еще 33%).

Занимает примерно На 10% меньше памяти по понятным причинам: 138,7 МиБ.

Жить на Колиру

Изменения:

#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/graph/adjacency_list.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/dynamic_bitset.hpp>
#include <fstream>
#include <iostream>

namespace x3 = boost::spirit::x3;
namespace fusion = boost::fusion;

struct vertex_property {
double longitude;
double latitude;
};

struct edge_property { };

struct Reader {
Reader() {
g.m_vertices.reserve(1024);
}

bool read_edges(std::string fname) {
// Lambda function that adds edge to graph
auto add_edge = [this](auto& ctx){
// Add edge from from context
vertex_decriptor source, target;
auto tup = std::tie(source, target);

fusion::copy(x3::_attr(ctx), tup);

boost::add_edge(this->map_vertex(source), this->map_vertex(target), g);
};

// Parse the gowalla edge file
boost::iostreams::mapped_file_source mm(fname);

auto f = mm.begin(), l = mm.end();

x3::parse(f, l, *((x3::int_ >> '\t' >> x3::int_ >> x3::eol)[add_edge]));

// Fail if we couldn't parse the whole edges file
return f == l;
}

bool read_locations(std::string fname) {
boost::dynamic_bitset<> location_already_added(num_vertices(g));

// Lambda function that adds locations to vertices in the graph
auto add_location = [&](auto& ctx){
// _attr(ctx) returns a boost fusion tuple
auto const& attr = x3::_attr(ctx);
auto vertex_id = fusion::at_c<0>(attr);

if (location_already_added.test(vertex_id))
return true; // Exit, as we already stored the location for this vertex
location_already_added.set(vertex_id);

// Test if vertex is in our graph
// We are parsing locations from a different file than the graph, so
// there might be inconsistencies
auto mapped = this->mapped_vertex(vertex_id);
if (graph::null_vertex() == mapped) {
std::cerr << "Tried to add location to vertex " << vertex_id << ", but this vertex is not in our graph" << std::endl;
return false;
}

// Add location to the vertex
auto& props = g[mapped];
props.latitude  = fusion::at_c<1>(attr);
props.longitude = fusion::at_c<2>(attr);

return true;
};

// Parse the gowalla location file
std::ifstream location_file(fname);
if (!location_file) return false;

boost::iostreams::mapped_file_source mm(fname);

auto f = mm.begin(), l = mm.end();

x3::parse(f, l,
// [vertex_id]   [time of checkin]       [latitude]  [longitude]             [location] id
*((x3::int_ >> '\t' >> x3::omit[*x3::graph] >> '\t' >> x3::double_ >> '\t' >> x3::double_)[add_location] >> '\t' >> x3::int_ >> x3::eol)
);

// Fail if we couldn't parse the whole location file
return f == l;
}

typedef boost::adjacency_list<boost::setS, boost::vecS, boost::undirectedS, vertex_property, edge_property> graph;
private:
// We use setS to enforce our graph not to become a multigraph
using vertex_decriptor = graph::vertex_descriptor;

graph g;

#if USE_VERTEX_DESCRIPTOR_MAPPING
std::map<int, vertex_decriptor> mapped_vertices;

vertex_decriptor map_vertex(int id) {
auto match = mapped_vertices.find(id);

if (match != mapped_vertices.end())
return match->second; // vertex already known
else                      // Otherwise add vertex
return mapped_vertices[id] = boost::add_vertex(g);
};

vertex_decriptor mapped_vertex(int id) const {
auto mapped = mapped_vertices.find(id);

return mapped == mapped_vertices.end()
? return graph::null_vertex()
: mapped->second;
}
#else
static vertex_decriptor map_vertex(int id) { return id; }
static vertex_decriptor mapped_vertex(int id) { return id; }
#endif
};

int main() {
Reader reader;
if (!reader.read_edges("loc-gowalla_edges.txt"))
std::cerr << "Couldn't parse whole edges file" << std::endl;

if (!reader.read_locations("loc-gowalla_totalCheckins.txt"))
std::cerr << "Couldn't parse whole location file" << std::endl;
}

6