[skip ci] Update README

ado 2023-08-06 12:26:54 +02:00
parent 6a832ba11a
commit 974257e099

README.md

@@ -17,11 +17,11 @@
A header-only "csv" parser which is fast and versatile, with a modern C++ API. Requires a compiler with C++17 support. [Can also be used to convert strings to specific types.](#The-converter)
-Conversion for floating point values invoked using [fast-float](https://github.com/fastfloat/fast_float) .
-Function traits taken from [qt-creator](https://code.woboq.org/qt5/qt-creator/src/libs/utils/functiontraits.h.html) .
+Conversion for floating point values invoked using [fast-float](https://github.com/fastfloat/fast_float) .\
+Function traits taken from [qt-creator](https://code.woboq.org/qt5/qt-creator/src/libs/utils/functiontraits.h.html) .\
# Example
-Let's say we have a csv file containing students in a given format 'ID,AGE,GRADE' and we want to parse and print all the valid values:
+Let's say we have a csv file containing students in a given format |Id,Age,Grade| and we want to parse and print all the valid values:
```shell
$ cat students.csv
@@ -34,7 +34,7 @@ Bill (Heath) Gates,65,3.3
#include <ss/parser.hpp>
int main() {
-ss::parser<ss::throw_on_error> p{"students.csv", ","};
+ss::parser<ss::throw_on_error> p{"students.csv"};
for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
std::cout << id << ' ' << age << ' ' << grade << std::endl;
@@ -58,9 +58,9 @@ Bill (Heath) Gates 65 3.3
* [Works with quotes, escapes and spacings](#setup)
* [Works with values containing new lines](#multiline)
* [Columns and rows can be ignored](#special-types)
-* Works with any type of delimiter
+* [Works with any type of delimiter](#delimiter)
* Can return whole objects composed of converted values
-* [Descriptive error handling can be enabled](#error-handling)
+* [Error handling can be configured](#error-handling)
* [Restrictions can be added for each column](#restrictions)
* [Works with `std::optional` and `std::variant`](#special-types)
* Works with **`CRLF`** and **`LF`**
@@ -80,7 +80,7 @@ $ cmake --configure .
$ sudo make install
```
-*Note, this will also install the fast_float library*
+*Note, this will also install the fast_float library.*\
The library supports the [CMake](#Cmake) and [meson](#Meson) build systems.
# Usage
@@ -97,7 +97,7 @@ Bill (Heath) Gates,65,3.3
```
```cpp
// ...
-ss::parser<ss::throw_on_error> p{"students.csv", ","};
+ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
p.use_fields("Id", "Grade");
for(const auto& [id, grade] : p.iterate<std::string, float>()) {
@@ -118,13 +118,13 @@ ss::parser<ss::ignore_header> p{file_name};
The fields the parser works with can be modified at any given time. The parser can also check if a field is present within the header by using the **`field_exists`** method.
```cpp
// ...
-ss::parser<ss::throw_on_error> p{"students.csv", ","};
+ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
p.use_fields("Id", "Grade");
const auto& [id, grade] = p.get_next<std::string, float>();
std::cout << id << ' ' << grade << std::endl;
-if (p.field_exists("Age")) {
+if (p.field_exists("Id")) {
p.use_fields("Grade", "Id");
for (const auto& [grade, id] : p.iterate<float, std::string>()) {
std::cout << grade << ' ' << id << std::endl;
@@ -135,18 +135,20 @@ The fields with which the parser works with can be modified at any given time. T
```shell
$ ./a.out
James Bailey 2.5
-1.9 Brian S. Wolfe 40
-3.3 Bill (Heath) Gates 65
+40 Brian S. Wolfe
+65 Bill (Heath) Gates
```
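Selecting fields by header name can be pictured as a lookup from names to column positions. Below is a minimal standalone sketch under that assumption. `column_indices` and `select_columns` are hypothetical helpers, not ssp functions, and the row is assumed to be already split into fields:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: position of every requested field within the header row,
// or std::nullopt as soon as one of them is missing (compare field_exists).
std::optional<std::vector<std::size_t>> column_indices(
    const std::vector<std::string>& header,
    const std::vector<std::string>& fields) {
    std::vector<std::size_t> indices;
    for (const std::string& field : fields) {
        std::size_t i = 0;
        while (i < header.size() && header[i] != field) {
            ++i;
        }
        if (i == header.size()) {
            return std::nullopt;
        }
        indices.push_back(i);
    }
    return indices;
}

// Hypothetical helper: copy the selected fields of an already split row,
// in the order in which the fields were requested.
std::vector<std::string> select_columns(const std::vector<std::string>& row,
                                        const std::vector<std::size_t>& indices) {
    std::vector<std::string> selected;
    selected.reserve(indices.size());
    for (std::size_t i : indices) {
        selected.push_back(row.at(i));  // throws std::out_of_range on a short row
    }
    return selected;
}
```

With the `Id,Age,Grade` header, requesting `Grade` and `Id` gives the positions 2 and 0. A split row is therefore read back in the order that `use_fields("Grade", "Id")` asks for.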
## Conversions
An alternate loop to the example above would look like:
```cpp
-while(!p.eof()) {
-auto [id, age, grade] = p.get_next<std::string, int, float>();
-if (p.valid()) {
+// ...
+ss::parser<ss::throw_on_error> p{"students.csv"};
+while (!p.eof()) {
+const auto& [id, age, grade] = p.get_next<std::string, int, float>();
std::cout << id << ' ' << age << ' ' << grade << std::endl;
}
-}
+// ...
```
The alternate example will be used to show some of the features of the library. The **`get_next`** method returns a tuple of objects of the types specified in the template type list.
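Conceptually, each field of a row is converted to the type at the same position in the template list, and the results are packed into a **`std::tuple`**. Below is a rough standalone sketch of that idea. It assumes the row is already split and supports only **`std::string`**, **`int`** and **`float`**. `convert` and `to_tuple` are hypothetical names. Unlike the parser, the sketch throws instead of reporting errors according to the setup, and it uses `std::stoi`/`std::stof` instead of fast_float:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical per-field conversion: the whole field has to be consumed, otherwise an
// exception is thrown (std::stoi / std::stof may also throw on their own).
template <typename T>
T convert(const std::string& field) {
    if constexpr (std::is_same_v<T, std::string>) {
        return field;
    } else {
        static_assert(std::is_same_v<T, int> || std::is_same_v<T, float>,
                      "this sketch only handles std::string, int and float");
        std::size_t consumed = 0;
        T value{};
        if constexpr (std::is_same_v<T, int>) {
            value = std::stoi(field, &consumed);
        } else {
            value = std::stof(field, &consumed);
        }
        if (consumed != field.size()) {
            throw std::invalid_argument("unexpected characters in field: " + field);
        }
        return value;
    }
}

// Converts fields[I] to the I-th element type of Tuple; the braced initializer
// evaluates the conversions left to right.
template <typename Tuple, std::size_t... Is>
Tuple to_tuple_impl(const std::vector<std::string>& fields, std::index_sequence<Is...>) {
    return Tuple{convert<std::tuple_element_t<Is, Tuple>>(fields[Is])...};
}

// Hypothetical counterpart of get_next<Ts...>() for a row that is already split.
template <typename... Ts>
std::tuple<Ts...> to_tuple(const std::vector<std::string>& fields) {
    if (fields.size() != sizeof...(Ts)) {
        throw std::invalid_argument("invalid number of columns");
    }
    return to_tuple_impl<std::tuple<Ts...>>(fields, std::index_sequence_for<Ts...>{});
}
```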
@@ -160,7 +162,7 @@ using student = std::tuple<std::string, int, float>;
// returns std::tuple<std::string, int, float>
auto [id, age, grade] = p.get_next<student>();
```
-*Note, it does not always return a student tuple since the returned tuple's parameters may be altered as explained below (no void, no restrictions, ...)*
+*Note, it does not always return the specified tuple since the returned tuple's parameters may be altered as explained below (no void, no restrictions, ...)*
Whole objects can be returned using the **`get_object`** function, which takes the tuple (created the same way **`get_next`** creates it) and constructs an object out of it:
```cpp
@@ -176,13 +178,12 @@ auto student = p.get_object<student, std::string, int, float>();
```
This works with any type whose constructor can be invoked with the template arguments given to **`get_object`**:
```cpp
-// returns std::vector<std::string> containing 3 elements
-auto vec = p.get_object<std::vector<std::string>, std::string, std::string,
-std::string>();
+// returns std::vector<std::string> containing 2 elements
+auto vec = p.get_object<std::vector<std::string>, std::string, std::string>();
```
-An iteration loop as in the first example which returns objects would look like:
+An iterator loop as in the first example which returns objects would look like:
```cpp
-for(const auto& student : p.iterate_object<student, std::string, int, float>()) {
+for (const student& s : p.iterate_object<student, std::string, int, float>()) {
// ...
}
```
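One plausible way to picture **`get_object`** is to brace-initialize the target type from the converted tuple with `std::apply`. This is an assumption, not necessarily what ssp does internally. Brace initialization would also explain why the **`std::vector<std::string>`** example above holds 2 elements, because braces select the `std::initializer_list` constructor. `make_object` and the `student` struct are hypothetical:

```cpp
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical: build T from the tuple's elements with brace initialization, which accepts
// aggregates as well as std::initializer_list constructors such as std::vector's.
template <typename T, typename Tuple>
T make_object(Tuple&& values) {
    return std::apply(
        [](auto&&... xs) { return T{std::forward<decltype(xs)>(xs)...}; },
        std::forward<Tuple>(values));
}

// Hypothetical aggregate for the 'Id,Age,Grade' columns of the students example.
struct student {
    std::string id;
    int age;
    float grade;
};
```

`std::make_from_tuple` is deliberately not used here. It initializes with parentheses, so it would reject the `std::vector<std::string>` case.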
@@ -221,7 +222,14 @@ using my_setup = ss::setup<ss::escape<'\\'>, ss::quote<'"'>>;
ss::parser<my_setup> p2{file_name};
```
Invalid setups will be met with **`static_asserts`**.
-*Note, each setup parameter defined comes with a slight performance loss, so use them only if needed.*
+*Note, most setup parameters defined come with a slight performance loss, so use them only if needed.*
+### Delimiter
+By default, **`,`** is used as the delimiter; a custom delimiter can be specified as the second constructor parameter.
+```cpp
+ss::parser p{file_name, "--"};
+```
+*Note, the delimiter can consist of multiple characters, but the parser is slightly faster when using single-character delimiters.*
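Below is a minimal sketch of splitting on a delimiter that may span several characters. It assumes no quotes, escapes or trimming are involved, and `split` is a hypothetical helper, not ssp's implementation. A single-character delimiter reduces the search to one character comparison per position, which is one plausible reason for the speed difference mentioned in the note:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split 'line' on every occurrence of 'delim', which may be longer
// than one character. Quotes, escapes and trimming are deliberately not handled.
std::vector<std::string> split(std::string_view line, std::string_view delim) {
    if (delim.empty()) {
        return {std::string(line)};  // nothing to split on
    }
    std::vector<std::string> fields;
    std::size_t begin = 0;
    for (;;) {
        const std::size_t end = line.find(delim, begin);
        if (end == std::string_view::npos) {
            fields.emplace_back(line.substr(begin));  // last field
            return fields;
        }
        fields.emplace_back(line.substr(begin, end - begin));
        begin = end + delim.size();  // skip the whole delimiter
    }
}
```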
### Empty lines
Empty lines can be ignored by defining **`ss::ignore_empty`** within the setup parameters:
@@ -303,14 +311,11 @@ ss::parser<ss::escape<'\\'>,
ss::trim<' ', '\t'>,
ss::multiline_restricted<5>> p{file_name};
-while(!p.eof()) {
-auto [id, age, grade] = p.get_next<std::string, int, float>();
-if(!p.valid()) {
-continue;
+for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
+if (p.valid()) {
+std::cout << "'" << id << ' ' << age << ' ' << grade << "'\n";
}
-std::cout << "'" << id << ' ' << age << ' ' << grade << "'" << std::endl;
}
```
input:
```
@@ -331,7 +336,7 @@ Gates 65 3.3'
```
## Special types
-Passing **`void`** makes the parser ignore a column. In the given example **`void`** could be given as the second template parameter to ignore the second (age) column in the csv, and a tuple of only 2 parameters would be returned:
+Passing **`void`** makes the parser ignore a column. In the initial example **`void`** could be given as the second template parameter to ignore the second (age) column in the csv, and a tuple of only 2 parameters would be returned:
```cpp
// returns std::tuple<std::string, float>
auto [id, grade] = p.get_next<std::string, void, float>();
@@ -357,9 +362,9 @@ bool parser::ignore_next();
```cpp
// returns std::tuple<std::string, int, std::optional<float>>
-auto [id, age, grade] = p.get_next<std::string, int, std::optional<float>();
-if(grade) {
-// do something with grade
+auto [id, age, grade] = p.get_next<std::string, int, std::optional<float>>();
+if (grade) {
+std::cout << grade.value() << std::endl;
}
```
Similar to **`std::optional`**, **`std::variant`** can be used to try other conversions if the previous ones fail _(Note, conversion to std::string will always pass)_:
@@ -367,9 +372,9 @@ Similar to **`std::optional`**, **`std::variant`** could be used to try other co
// returns std::tuple<std::string, int, std::variant<float, char>>
auto [id, age, grade] =
p.get_next<std::string, int, std::variant<float, char>>();
-if(std::holds_alternative<float>(grade)) {
+if (std::holds_alternative<float>(grade)) {
// grade set as float
-} else if(std::holds_alternative<char>(grade)) {
+} else if (std::holds_alternative<char>(grade)) {
// grade set as char
}
```
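The fallback order of **`std::variant<float, char>`** can be imitated without the library. The idea is to try the first alternative, keep it if the whole field converts, and otherwise try the next one. This is a standalone sketch with several assumptions. `to_float` and `to_float_or_char` are hypothetical. `std::strtof` stands in for the fast_float based conversion the library uses. `std::nullopt` marks the case where no alternative converts:

```cpp
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical strict float conversion: the whole field must be consumed.
std::optional<float> to_float(const std::string& field) {
    if (field.empty()) {
        return std::nullopt;
    }
    char* end = nullptr;
    errno = 0;
    const float value = std::strtof(field.c_str(), &end);
    if (errno != 0 || end != field.c_str() + field.size()) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical: try float first, then a single char, in the order of std::variant<float, char>.
std::optional<std::variant<float, char>> to_float_or_char(const std::string& field) {
    if (const auto f = to_float(field)) {
        return std::variant<float, char>{std::in_place_type<float>, *f};
    }
    if (field.size() == 1) {
        return std::variant<float, char>{std::in_place_type<char>, field[0]};
    }
    return std::nullopt;  // neither alternative converts
}
```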
@@ -407,9 +412,11 @@ struct even {
};
```
```cpp
-// only even numbers will pass
+// ...
+// only even numbers will pass without invoking error handling
// returns std::tuple<std::string, int>
-auto [id, age] = p.get_next<std::string, even<int>, void>();
+const auto& [id, age] = p.get_next<std::string, even<int>, void>();
+// ...
```
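A restriction can be thought of as a predicate that is checked after a successful conversion, with a failed check reported like a failed conversion. The sketch below is standalone and hypothetical (`even_check`, `restricted_convert`). It does not reproduce the interface ssp expects from restriction types such as `even`:

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical predicate in the spirit of even<int>: accepts only even values.
struct even_check {
    bool operator()(int value) const { return value % 2 == 0; }
};

// Hypothetical: convert the field to int, then apply the restriction. A value that converts
// but fails the restriction is reported exactly like a failed conversion (std::nullopt here).
template <typename Restriction>
std::optional<int> restricted_convert(const std::string& field, Restriction restriction) {
    std::size_t consumed = 0;
    int value = 0;
    try {
        value = std::stoi(field, &consumed);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
    if (consumed != field.size() || !restriction(value)) {
        return std::nullopt;
    }
    return value;
}
```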
## Custom conversions
@@ -436,6 +443,8 @@ The shape enum will be used in an example below. The **`inline`** is there just
## Error handling
+By default, the parser handles errors only through the **`valid`** method, which returns **`false`** if the file could not be opened or if the conversion could not be made (invalid types, invalid number of columns, ...). The **`eof`** method can be used to detect if the end of the file was reached.
Detailed error messages can be accessed via the **`error_msg`** method, and to enable them **`ss::string_error`** needs to be included in the setup. If **`ss::string_error`** is not defined, the **`error_msg`** method will not be defined either.
```cpp
@@ -446,11 +455,18 @@ bool parser::eof();
// ...
ss::parser<ss::string_error> parser;
```
-An error can be detected using the **`valid`** method which would return **`false`** if the file could not be opened, or if the conversion could not be made (invalid types, invalid number of columns, ...). The **`eof`** method can be used to detect if the end of the file was reached.
The above two methods are preferable if invalid inputs are expected, since they allow for fast handling, but the parser can also be forced to throw an exception in case of an invalid input using the **`ss::throw_on_error`** setup option.
```cpp
ss::parser<ss::throw_on_error> parser;
```
+*Note, enabling this option will also make the parser throw if the constructor fails.*\
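The difference between the default status-based handling and **`ss::throw_on_error`** can be illustrated with a compile-time switch. The class below is a hypothetical standalone sketch. `int_reader` is not part of ssp and does not show how the library is implemented. It only mirrors the `valid`/`error_msg` style versus exceptions for a single `int` field:

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical reader showing both error styles behind a compile-time switch:
// ThrowOnError == false records failures for valid() / error_msg(),
// ThrowOnError == true throws std::runtime_error instead.
template <bool ThrowOnError>
class int_reader {
public:
    // Returns the parsed value; when valid() is false the result should be ignored.
    int read(const std::string& field) {
        valid_ = true;
        error_.clear();
        std::size_t consumed = 0;
        int value = 0;
        try {
            value = std::stoi(field, &consumed);
        } catch (const std::exception&) {
            consumed = 0;  // not a number at all
        }
        if (field.empty() || consumed != field.size()) {
            fail("invalid int: '" + field + "'");
        }
        return value;
    }

    bool valid() const { return valid_; }
    const std::string& error_msg() const { return error_; }

private:
    void fail(std::string msg) {
        if constexpr (ThrowOnError) {
            throw std::runtime_error(msg);
        } else {
            valid_ = false;
            error_ = std::move(msg);
        }
    }

    bool valid_ = true;
    std::string error_;
};
```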
## Substitute conversions
-The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario.
+The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario.\
+***Important, substitute conversions do not work when throw_on_error is enabled.***\
Supposing we have a file containing different shapes in given formats:
* circle RADIUS
@@ -479,11 +495,16 @@ while (!p.eof()) {
using udbl = ss::gte<double, 0>;
auto [circle_or_square, rectangle, triangle] =
-p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>()
+p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>()
.or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>()
.or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>()
.values();
+if (!p.valid()) {
+// handle error
+continue;
+}
if (circle_or_square) {
auto& [s, x] = circle_or_square.value();
double area = (s == shape::circle) ? x * x * M_PI : x * x;
@@ -527,31 +548,34 @@ Each of those **`composite`** conversions can accept a lambda (or anything calla
// non negative double
using udbl = ss::gte<double, 0>;
-p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>(
-[&](const auto& data) {
-const auto& [s, x] = data;
-double area = (s == shape::circle) ? x * x * M_PI : x * x;
-shapes.emplace_back(s, area);
-})
-.or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>(
-[&](const shape s, const double a, const double b) {
-shapes.emplace_back(s, a * b);
-})
-.or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>(
-[&](auto&& s, auto& a, const double& b, double& c) {
-double sh = (a + b + c) / 2;
-if (sh >= a && sh >= b && sh >= c) {
-double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
-shapes.emplace_back(s, area);
-}
+while (!p.eof()) {
+p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>(
+[&](const auto& data) {
+const auto& [s, x] = data;
+double area = (s == shape::circle) ? x * x * M_PI : x * x;
+shapes.emplace_back(s, area);
+})
+.or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>(
+[&](shape s, double a, double b) { shapes.emplace_back(s, a * b); })
+.or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>(
+[&](auto s, auto a, auto b, auto c) {
+double sh = (a + b + c) / 2;
+if (sh >= a && sh >= b && sh >= c) {
+double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
+shapes.emplace_back(s, area);
+}
+})
+.on_error([] {
+// handle error
});
+}
```
It is a bit less readable, but it removes the need to check which conversion was invoked. The **`composite`** also has an **`on_error`** method that accepts a lambda, which is invoked if none of the previous conversions succeeded. The lambda can take either no arguments or a single **`std::string`** argument, in which the error message is stored if **`string_error`** is enabled:
```cpp
p.try_next<int>()
.on_error([](const std::string& e) { /* int conversion failed */ })
.or_object<x, double>()
-.on_error([] { /* int and x (all) conversions failed */ });
+.on_error([] { /* int and x conversions failed (all previous failed) */ });
```
*See unit tests for more examples.*
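The per-row fallback of the shapes example can be approximated without the library using `std::istringstream`. The approach is to read the shape name, then as many non-negative numbers as that shape needs, and reject anything else. This is a hypothetical standalone sketch, and `shape_area` is not part of ssp. It matches shape names as plain text instead of converting them to the `shape` enum, and it returns `std::nullopt` where the parser would report an error. It also uses `std::acos(-1.0)` instead of `M_PI`, because `M_PI` is not guaranteed by standard C++. The triangle branch keeps the README's `sh >= a && sh >= b && sh >= c` check before applying Heron's formula:

```cpp
#include <cmath>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: area of one "NAME v1 [v2 [v3]]" row, or std::nullopt if the row matches
// none of the circle / square / rectangle / triangle layouts.
std::optional<double> shape_area(const std::string& row) {
    std::istringstream in{row};
    std::string name;
    in >> name;
    std::vector<double> values;
    for (double x; in >> x;) {
        if (x < 0) {
            return std::nullopt;  // negative values rejected, like ss::gte<double, 0>
        }
        values.push_back(x);
    }
    if (!in.eof()) {
        return std::nullopt;  // stopped on text that is not a number
    }
    const double pi = std::acos(-1.0);
    if (name == "circle" && values.size() == 1) {
        return values[0] * values[0] * pi;
    }
    if (name == "square" && values.size() == 1) {
        return values[0] * values[0];
    }
    if (name == "rectangle" && values.size() == 2) {
        return values[0] * values[1];
    }
    if (name == "triangle" && values.size() == 3) {
        const double a = values[0], b = values[1], c = values[2];
        const double sh = (a + b + c) / 2;  // semi-perimeter
        if (sh >= a && sh >= b && sh >= c) {
            return std::sqrt(sh * (sh - a) * (sh - b) * (sh - c));  // Heron's formula
        }
    }
    return std::nullopt;
}
```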