[skip ci] Update README

ado 2023-08-06 12:26:54 +02:00
parent 6a832ba11a
commit 974257e099

104
README.md

@ -17,11 +17,11 @@
A header only "csv" parser which is fast and versatile with modern C++ api. Requires compiler with C++17 support. [Can also be used to convert strings to specific types.](#The-converter) A header only "csv" parser which is fast and versatile with modern C++ api. Requires compiler with C++17 support. [Can also be used to convert strings to specific types.](#The-converter)
Conversion for floating point values invoked using [fast-float](https://github.com/fastfloat/fast_float) . Conversion for floating point values invoked using [fast-float](https://github.com/fastfloat/fast_float) .\
Function traits taken from [qt-creator](https://code.woboq.org/qt5/qt-creator/src/libs/utils/functiontraits.h.html) . Function traits taken from [qt-creator](https://code.woboq.org/qt5/qt-creator/src/libs/utils/functiontraits.h.html) .\
# Example # Example
Let's say we have a csv file containing students in a given format 'ID,AGE,GRADE' and we want to parse and print all the valid values: Let's say we have a csv file containing students in a given format |Id,Age,Grade| and we want to parse and print all the valid values:
```shell ```shell
$ cat students.csv $ cat students.csv
@ -34,7 +34,7 @@ Bill (Heath) Gates,65,3.3
#include <ss/parser.hpp> #include <ss/parser.hpp>
int main() { int main() {
ss::parser<ss::throw_on_error> p{"students.csv", ","}; ss::parser<ss::throw_on_error> p{"students.csv"};
for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) { for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
std::cout << id << ' ' << age << ' ' << grade << std::endl; std::cout << id << ' ' << age << ' ' << grade << std::endl;
@ -58,9 +58,9 @@ Bill (Heath) Gates 65 3.3
* [Works with quotes, escapes and spacings](#setup) * [Works with quotes, escapes and spacings](#setup)
* [Works with values containing new lines](#multiline) * [Works with values containing new lines](#multiline)
* [Columns and rows can be ignored](#special-types) * [Columns and rows can be ignored](#special-types)
* Works with any type of delimiter * [Works with any type of delimiter](#delimiter)
* Can return whole objects composed of converted values * Can return whole objects composed of converted values
* [Descriptive error handling can be enabled](#error-handling) * [Error handling can be configured](#error-handling)
* [Restrictions can be added for each column](#restrictions) * [Restrictions can be added for each column](#restrictions)
* [Works with `std::optional` and `std::variant`](#special-types) * [Works with `std::optional` and `std::variant`](#special-types)
* Works with **`CRLF`** and **`LF`** * Works with **`CRLF`** and **`LF`**
@ -80,7 +80,7 @@ $ cmake --configure .
$ sudo make install $ sudo make install
``` ```
*Note, this will also install the fast_float library* *Note, this will also install the fast_float library.*\
The library supports [CMake](#Cmake) and [meson](#Meson) build systems The library supports [CMake](#Cmake) and [meson](#Meson) build systems
# Usage # Usage
@ -97,7 +97,7 @@ Bill (Heath) Gates,65,3.3
``` ```
```cpp ```cpp
// ... // ...
ss::parser<ss::throw_on_error> p{"students.csv", ","}; ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
p.use_fields("Id", "Grade"); p.use_fields("Id", "Grade");
for(const auto& [id, grade] : p.iterate<std::string, float>()) { for(const auto& [id, grade] : p.iterate<std::string, float>()) {
@ -118,13 +118,13 @@ ss::parser<ss::ignore_header> p{file_name};
The fields the parser works with can be modified at any given time. The parser can also check if a field is present within the header by using the **`field_exists`** method. The fields the parser works with can be modified at any given time. The parser can also check if a field is present within the header by using the **`field_exists`** method.
```cpp ```cpp
// ... // ...
ss::parser<ss::throw_on_error> p{"students.csv", ","}; ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
p.use_fields("Id", "Grade"); p.use_fields("Id", "Grade");
const auto& [id, grade] = p.get_next<std::string, float>(); const auto& [id, grade] = p.get_next<std::string, float>();
std::cout << id << ' ' << grade << std::endl; std::cout << id << ' ' << grade << std::endl;
if (p.field_exists("Age")) { if (p.field_exists("Id")) {
p.use_fields("Grade", "Id"); p.use_fields("Grade", "Id");
for (const auto& [grade, id] : p.iterate<float, std::string>()) { for (const auto& [grade, id] : p.iterate<float, std::string>()) {
std::cout << grade << ' ' << id << std::endl; std::cout << grade << ' ' << id << std::endl;
@ -135,18 +135,20 @@ The fields with which the parser works with can be modified at any given time. T
```shell ```shell
$ ./a.out $ ./a.out
James Bailey 2.5 James Bailey 2.5
1.9 Brian S. Wolfe 40 40 Brian S. Wolfe
3.3 Bill (Heath) Gates 65 65 Bill (Heath) Gates
``` ```
## Conversions ## Conversions
An alternate loop to the example above would look like: An alternate loop to the example above would look like:
```cpp ```cpp
// ...
ss::parser<ss::throw_on_error> p{"students.csv"};
while (!p.eof()) { while (!p.eof()) {
auto [id, age, grade] = p.get_next<std::string, int, float>(); const auto& [id, age, grade] = p.get_next<std::string, int, float>();
if (p.valid()) {
std::cout << id << ' ' << age << ' ' << grade << std::endl; std::cout << id << ' ' << age << ' ' << grade << std::endl;
} }
} // ...
``` ```
The alternate example will be used to show some of the features of the library. The **`get_next`** method returns a tuple of objects specified inside the template type list. The alternate example will be used to show some of the features of the library. The **`get_next`** method returns a tuple of objects specified inside the template type list.
@ -160,7 +162,7 @@ using student = std::tuple<std::string, int, float>;
// returns std::tuple<std::string, int, float> // returns std::tuple<std::string, int, float>
auto [id, age, grade] = p.get_next<student>(); auto [id, age, grade] = p.get_next<student>();
``` ```
*Note, it does not always return a student tuple since the returned tuple's parameters may be altered as explained below (no void, no restrictions, ...)* *Note, it does not always return the specified tuple since the returned tuple's parameters may be altered as explained below (no void, no restrictions, ...)*
Whole objects can be returned using the **`get_object`** function which takes the tuple, created in a similar way as **`get_next`** does it, and creates an object out of it: Whole objects can be returned using the **`get_object`** function which takes the tuple, created in a similar way as **`get_next`** does it, and creates an object out of it:
```cpp ```cpp
@ -176,13 +178,12 @@ auto student = p.get_object<student, std::string, int, float>();
``` ```
This works with any object if the constructor could be invoked using the template arguments given to **`get_object`**: This works with any object if the constructor could be invoked using the template arguments given to **`get_object`**:
```cpp ```cpp
// returns std::vector<std::string> containing 3 elements // returns std::vector<std::string> containing 2 elements
auto vec = p.get_object<std::vector<std::string>, std::string, std::string, auto vec = p.get_object<std::vector<std::string>, std::string, std::string>();
std::string>();
``` ```
An iteration loop as in the first example which returns objects would look like: An iterator loop as in the first example which returns objects would look like:
```cpp ```cpp
for(const auto& student : p.iterate_object<student, std::string, int, float>()) { for (const student& s : p.iterate_object<student, std::string, int, float>()) {
// ... // ...
} }
``` ```
@ -221,7 +222,14 @@ using my_setup = ss::setup<ss::escape<'\\'>, ss::quote<'"'>>;
ss::parser<my_setup> p2{file_name}; ss::parser<my_setup> p2{file_name};
``` ```
Invalid setups will be met with **`static_asserts`**. Invalid setups will be met with **`static_asserts`**.
*Note, each setup parameter defined comes with a slight performance loss, so use them only if needed.* *Note, most setup parameters defined come with a slight performance loss, so use them only if needed.*
### Delimiter
By default, **`,`** is used as the delimiter; a custom delimiter can be specified as the second constructor parameter.
```cpp
ss::parser p{file_name, "--"};
```
*Note, the delimiter can consist of multiple characters, but the parser is slightly faster when using single character delimiters.*
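
To illustrate what a multi-character delimiter means for a single line, here is a minimal standalone sketch using only the standard library. The `split` helper is hypothetical and is not how the parser is implemented; it only shows that the whole delimiter string, not each of its characters, separates the fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of the library: splits `line` on every
// occurrence of `delim`, which may be longer than one character.
std::vector<std::string> split(std::string_view line, std::string_view delim) {
    assert(!delim.empty()); // an empty delimiter would never advance

    std::vector<std::string> fields;
    std::size_t begin = 0;
    while (true) {
        const std::size_t end = line.find(delim, begin);
        if (end == std::string_view::npos) {
            // no more delimiters, the rest of the line is the last field
            fields.emplace_back(line.substr(begin));
            return fields;
        }
        fields.emplace_back(line.substr(begin, end - begin));
        begin = end + delim.size(); // skip the whole delimiter
    }
}
```

With this sketch, `split("James Bailey--65--2.5", "--")` yields the three fields `James Bailey`, `65` and `2.5`.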
### Empty lines ### Empty lines
Empty lines can be ignored by defining **`ss::ignore_empty`** within the setup parameters: Empty lines can be ignored by defining **`ss::ignore_empty`** within the setup parameters:
@ -303,14 +311,11 @@ ss::parser<ss::escape<'\\'>,
ss::trim<' ', '\t'>, ss::trim<' ', '\t'>,
ss::multiline_restricted<5>> p{file_name}; ss::multiline_restricted<5>> p{file_name};
while(!p.eof()) { for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
auto [id, age, grade] = p.get_next<std::string, int, float>(); if (p.valid()) {
if(!p.valid()) { std::cout << "'" << id << ' ' << age << ' ' << grade << "'\n";
continue;
} }
std::cout << "'" << id << ' ' << age << ' ' << grade << "'" << std::endl;
} }
``` ```
input: input:
``` ```
@ -331,7 +336,7 @@ Gates 65 3.3'
``` ```
## Special types ## Special types
Passing **`void`** makes the parser ignore a column. In the given example, **`void`** could be given as the second template parameter to ignore the second (age) column in the csv; a tuple of only 2 parameters would be returned: Passing **`void`** makes the parser ignore a column. In the initial example, **`void`** could be given as the second template parameter to ignore the second (age) column in the csv; a tuple of only 2 parameters would be returned:
```cpp ```cpp
// returns std::tuple<std::string, float> // returns std::tuple<std::string, float>
auto [id, grade] = p.get_next<std::string, void, float>(); auto [id, grade] = p.get_next<std::string, void, float>();
@ -357,9 +362,9 @@ bool parser::ignore_next();
```cpp ```cpp
// returns std::tuple<std::string, int, std::optional<float>> // returns std::tuple<std::string, int, std::optional<float>>
auto [id, age, grade] = p.get_next<std::string, int, std::optional<float>(); auto [id, age, grade] = p.get_next<std::string, int, std::optional<float>>();
if (grade) { if (grade) {
// do something with grade std::cout << grade.value() << std::endl;
} }
``` ```
Similar to **`std::optional`**, **`std::variant`** could be used to try other conversions if the previous failed _(Note, conversion to std::string will always pass)_: Similar to **`std::optional`**, **`std::variant`** could be used to try other conversions if the previous failed _(Note, conversion to std::string will always pass)_:
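
The idea of trying conversions in order can be sketched with the standard library alone. The helpers below (`to_int`, `to_int_or_string`) are hypothetical and are not the library's converter; they only show why placing **`std::string`** last in a **`std::variant`** acts as a fallback that always succeeds.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <variant>

// Hypothetical helper, not the library's converter: parses the whole of
// `text` as an int, or returns std::nullopt if parsing fails or any
// character is left over.
std::optional<int> to_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Tries the alternatives in order: int first, then std::string, which
// accepts any input and therefore acts as the final fallback.
std::variant<int, std::string> to_int_or_string(std::string_view text) {
    if (const auto value = to_int(text)) {
        return *value;
    }
    return std::string{text};
}
```

Here `to_int_or_string("42")` holds an `int`, while `to_int_or_string("4x")` falls back to holding the original text as a `std::string`.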
@ -407,9 +412,11 @@ struct even {
}; };
``` ```
```cpp ```cpp
// only even numbers will pass // ...
// only even numbers will pass without invoking error handling
// returns std::tuple<std::string, int> // returns std::tuple<std::string, int>
auto [id, age] = p.get_next<std::string, even<int>, void>(); const auto& [id, age] = p.get_next<std::string, even<int>, void>();
// ...
``` ```
## Custom conversions ## Custom conversions
@ -436,6 +443,8 @@ The shape enum will be used in an example below. The **`inline`** is there just
## Error handling ## Error handling
By default, the parser handles errors only using the **`valid`** method which would return **`false`** if the file could not be opened, or if the conversion could not be made (invalid types, invalid number of columns, ...). The **`eof`** method can be used to detect if the end of the file was reached.
Detailed error messages can be accessed via the **`error_msg`** method, and to enable them **`ss::string_error`** needs to be included in the setup. If **`ss::string_error`** is not defined, the **`error_msg`** method will not be defined either. Detailed error messages can be accessed via the **`error_msg`** method, and to enable them **`ss::string_error`** needs to be included in the setup. If **`ss::string_error`** is not defined, the **`error_msg`** method will not be defined either.
```cpp ```cpp
@ -446,11 +455,18 @@ bool parser::eof();
// ... // ...
ss::parser<ss::string_error> parser; ss::parser<ss::string_error> parser;
``` ```
An error can be detected using the **`valid`** method which would return **`false`** if the file could not be opened, or if the conversion could not be made (invalid types, invalid number of columns, ...). The **`eof`** method can be used to detect if the end of the file was reached.
The above two methods are preferable if invalid inputs are expected and allow for fast handling, but the parser can also be forced to throw an exception in case of an invalid input using the **`ss::throw_on_error`** setup option.
```cpp
ss::parser<ss::throw_on_error> parser;
```
*Note, enabling this option will also make the parser throw if the constructor fails.*\
## Substitute conversions ## Substitute conversions
The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario. The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario.\
***Important, substitute conversions do not work when throw_on_error is enabled.***\
Supposing we have a file containing different shapes in given formats: Supposing we have a file containing different shapes in given formats:
* circle RADIUS * circle RADIUS
@ -484,6 +500,11 @@ while (!p.eof()) {
.or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>() .or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>()
.values(); .values();
if (!p.valid()) {
// handle error
continue;
}
if (circle_or_square) { if (circle_or_square) {
auto& [s, x] = circle_or_square.value(); auto& [s, x] = circle_or_square.value();
double area = (s == shape::circle) ? x * x * M_PI : x * x; double area = (s == shape::circle) ? x * x * M_PI : x * x;
@ -527,6 +548,7 @@ Each of those **`composite`** conversions can accept a lambda (or anything calla
// non negative double // non negative double
using udbl = ss::gte<double, 0>; using udbl = ss::gte<double, 0>;
while (!p.eof()) {
p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>( p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>(
[&](const auto& data) { [&](const auto& data) {
const auto& [s, x] = data; const auto& [s, x] = data;
@ -534,24 +556,26 @@ p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>(
shapes.emplace_back(s, area); shapes.emplace_back(s, area);
}) })
.or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>( .or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>(
[&](const shape s, const double a, const double b) { [&](shape s, double a, double b) { shapes.emplace_back(s, a * b); })
shapes.emplace_back(s, a * b);
})
.or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>( .or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>(
[&](auto&& s, auto& a, const double& b, double& c) { [&](auto s, auto a, auto b, auto c) {
double sh = (a + b + c) / 2; double sh = (a + b + c) / 2;
if (sh >= a && sh >= b && sh >= c) { if (sh >= a && sh >= b && sh >= c) {
double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c)); double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
shapes.emplace_back(s, area); shapes.emplace_back(s, area);
} }
})
.on_error([] {
// handle error
}); });
}
``` ```
It is a bit less readable, but it removes the need to check which conversion was invoked. The **`composite`** also has an **`on_error`** method which accepts a lambda which will be invoked if no previous conversions were successful. The lambda can take no arguments or just one argument, an **`std::string`**, in which the error message is stored if **`string_error`** is enabled: It is a bit less readable, but it removes the need to check which conversion was invoked. The **`composite`** also has an **`on_error`** method which accepts a lambda which will be invoked if no previous conversions were successful. The lambda can take no arguments or just one argument, an **`std::string`**, in which the error message is stored if **`string_error`** is enabled:
```cpp ```cpp
p.try_next<int>() p.try_next<int>()
.on_error([](const std::string& e) { /* int conversion failed */ }) .on_error([](const std::string& e) { /* int conversion failed */ })
.or_object<x, double>() .or_object<x, double>()
.on_error([] { /* int and x (all) conversions failed */ }); .on_error([] { /* int and x conversions failed (all previous failed) */ });
``` ```
*See unit tests for more examples.* *See unit tests for more examples.*