diff --git a/README.md b/README.md
index 242dc50..f032799 100644
--- a/README.md
+++ b/README.md
@@ -17,11 +17,11 @@ A header only "csv" parser which is fast and versatile with modern C++ api. Requires compiler with C++17 support. [Can also be used to convert strings to specific types.](#The-converter)
-Conversion for floating point values invoked using [fast-float](https://github.com/fastfloat/fast_float) .
-Function traits taken from [qt-creator](https://code.woboq.org/qt5/qt-creator/src/libs/utils/functiontraits.h.html) .
+Conversion for floating point values invoked using [fast-float](https://github.com/fastfloat/fast_float) .\
+Function traits taken from [qt-creator](https://code.woboq.org/qt5/qt-creator/src/libs/utils/functiontraits.h.html) .\
 # Example
-Lets say we have a csv file containing students in a given format 'ID,AGE,GRADE' and we want to parse and print all the valid values:
+Let's say we have a csv file containing students in a given format |Id,Age,Grade| and we want to parse and print all the valid values:
 ```shell
 $ cat students.csv
@@ -34,7 +34,7 @@ Bill (Heath) Gates,65,3.3
 ```
 ```cpp
 #include
 int main() {
-    ss::parser p{"students.csv", ","};
+    ss::parser p{"students.csv"};
     for (const auto& [id, age, grade] : p.iterate()) {
         std::cout << id << ' ' << age << ' ' << grade << std::endl;
@@ -58,9 +58,9 @@ Bill (Heath) Gates 65 3.3
  * [Works with quotes, escapes and spacings](#setup)
  * [Works with values containing new lines](#multiline)
  * [Columns and rows can be ignored](#special-types)
- * Works with any type of delimiter
+ * [Works with any type of delimiter](#delimiter)
  * Can return whole objects composed of converted values
- * [Descriptive error handling can be enabled](#error-handling)
+ * [Error handling can be configured](#error-handling)
  * [Restrictions can be added for each column](#restrictions)
  * [Works with `std::optional` and `std::variant`](#special-types)
  * Works with **`CRLF`** and **`LF`**
@@ -80,7 +80,7 @@ $ cmake --configure .
 $ sudo make install
 ```
-*Note, this will also install the fast_float library*
+*Note, this will also install the fast_float library.*\
 The library supports [CMake](#Cmake) and [meson](#Meson) build systems
 # Usage
@@ -97,7 +97,7 @@ Bill (Heath) Gates,65,3.3
 ```
 ```cpp
     // ...
-    ss::parser p{"students.csv", ","};
+    ss::parser p{"students_with_header.csv"};
     p.use_fields("Id", "Grade");
     for(const auto& [id, grade] : p.iterate()) {
@@ -118,13 +118,13 @@ ss::parser p{file_name};
 The fields with which the parser works with can be modified at any given time. The praser can also check if a field is present within the header by using the **`field_exists`** method.
 ```cpp
     // ...
-    ss::parser p{"students.csv", ","};
+    ss::parser p{"students_with_header.csv"};
     p.use_fields("Id", "Grade");
     const auto& [id, grade] = p.get_next();
     std::cout << id << ' ' << grade << std::endl;
-    if (p.field_exists("Age")) {
+    if (p.field_exists("Id")) {
         p.use_fields("Grade", "Id");
         for (const auto& [grade, id] : p.iterate()) {
             std::cout << grade << ' ' << id << std::endl;
@@ -135,18 +135,20 @@ The fields with which the parser works with can be modified at any given time. T
 ```shell
 $ ./a.out
 James Bailey 2.5
-1.9 Brian S. Wolfe 40
-3.3 Bill (Heath) Gates 65
+40 Brian S. Wolfe
+65 Bill (Heath) Gates
 ```
 ## Conversions
 An alternate loop to the example above would look like:
 ```cpp
-while(!p.eof()) {
-    auto [id, age, grade] = p.get_next();
-    if (p.valid()) {
+    // ...
+    ss::parser p{"students.csv"};
+
+    while (!p.eof()) {
+        const auto& [id, age, grade] = p.get_next();
         std::cout << id << ' ' << age << ' ' << grade << std::endl;
     }
-}
+    // ...
 ```
 The alternate example will be used to show some of the features of the library. The **`get_next`** method returns a tuple of objects specified inside the template type list.
@@ -160,7 +162,7 @@ using student = std::tuple;
 // returns std::tuple
 auto [id, age, grade] = p.get_next();
 ```
-*Note, it does not always return a student tuple since the returned tuples parameters may be altered as explained below (no void, no restrictions, ...)*
+*Note, it does not always return the specified tuple since the returned tuple's parameters may be altered as explained below (no void, no restrictions, ...)*
 Whole objects can be returned using the **`get_object`** function which takes the tuple, created in a similar way as **`get_next`** does it, and creates an object out of it:
 ```cpp
@@ -176,13 +178,12 @@ auto student = p.get_object();
 ```
 This works with any object if the constructor could be invoked using the template arguments given to **`get_object`**:
 ```cpp
-// returns std::vector containing 3 elements
-auto vec = p.get_object, std::string, std::string,
-                        std::string>();
+// returns std::vector containing 2 elements
+auto vec = p.get_object, std::string, std::string>();
 ```
-An iteration loop as in the first example which returns objects would look like:
+An iterator loop as in the first example which returns objects would look like:
 ```cpp
-for(const auto& student : p.iterate_object()) {
+for (const student& s : p.iterate_object()) {
     // ...
 }
 ```
@@ -221,7 +222,14 @@ using my_setup = ss::setup, ss::quote<'"'>>;
 ss::parser p2{file_name};
 ```
 Invalid setups will be met with **`static_asserts`**.
-*Note, each setup parameter defined comes with a slight performance loss, so use them only if needed.*
+*Note, most setup parameters defined come with a slight performance loss, so use them only if needed.*
+
+### Delimiter
+By default, **`,`** is used as the delimiter; a custom delimiter can be specified as the second constructor parameter.
+```cpp
+ss::parser p{file_name, "--"};
+```
+*Note, the delimiter can consist of multiple characters but the parser is slightly faster when using single character delimiters.*
 ### Empty lines
 Empty lines can be ignored by defining **`ss::ignore_empty`** within the setup parameters:
@@ -303,14 +311,11 @@ ss::parser, ss::trim<' ', '\t'>,
            ss::multiline_restricted<5>> p{file_name};
-while(!p.eof()) {
-    auto [id, age, grade] = p.get_next();
-    if(!p.valid()) {
-        continue;
+for (const auto& [id, age, grade] : p.iterate()) {
+    if (p.valid()) {
+        std::cout << "'" << id << ' ' << age << ' ' << grade << "'\n";
     }
-    std::cout << "'" << id << ' ' << age << ' ' << grade << "'" << std::endl;
 }
-
 ```
 input:
 ```
@@ -331,7 +336,7 @@ Gates 65 3.3'
 ```
 ## Special types
-Passing **`void`** makes the parser ignore a column. In the given example **`void`** could be given as the second template parameter to ignore the second (age) column in the csv, a tuple of only 2 parameters would be retuned:
+Passing **`void`** makes the parser ignore a column. In the initial example **`void`** could be given as the second template parameter to ignore the second (age) column in the csv, a tuple of only 2 parameters would be returned:
 ```cpp
 // returns std::tuple
 auto [id, grade] = p.get_next();
@@ -357,9 +362,9 @@ bool parser::ignore_next();
 ```cpp
 // returns std::tuple>
-auto [id, age, grade] = p.get_next();
-if(grade) {
-    // do something with grade
+auto [id, age, grade] = p.get_next>();
+if (grade) {
+    std::cout << grade.value() << std::endl;
 }
 ```
 Similar to **`std::optional`**, **`std::variant`** could be used to try other conversions if the previous failed _(Note, conversion to std::string will always pass)_:
@@ -367,9 +372,9 @@ Similar to **`std::optional`**, **`std::variant`** could be used to try other co
 // returns std::tuple>
 auto [id, age, grade] = p.get_next>();
-if(std::holds_alternative(grade)) {
+if (std::holds_alternative(grade)) {
     // grade set as float
-} else if(std::holds_alternative(grade)) {
+} else if (std::holds_alternative(grade)) {
     // grade set as char
 }
 ```
@@ -407,9 +412,11 @@ struct even {
 };
 ```
 ```cpp
-// only even numbers will pass
+// ...
+// only even numbers will pass without invoking error handling
 // returns std::tuple
-auto [id, age] = p.get_next, void>();
+const auto& [id, age] = p.get_next, void>();
+// ...
 ```
 ## Custom conversions
@@ -436,6 +443,8 @@ The shape enum will be used in an example below. The **`inline`** is there just
 ## Error handling
+By default, the parser handles errors only using the **`valid`** method which would return **`false`** if the file could not be opened, or if the conversion could not be made (invalid types, invalid number of columns, ...). The **`eof`** method can be used to detect if the end of the file was reached.
+
 Detailed error messages can be accessed via the **`error_msg`** method, and to enable them **`ss::string_error`** needs to be included in the setup. If **`ss::string_error`** is not defined, the **`error_msg`** method will not be defined either.
 ```cpp
@@ -446,11 +455,18 @@ bool parser::eof();
 // ...
 ss::parser parser;
 ```
-An error can be detected using the **`valid`** method which would return **`false`** if the file could not be opened, or if the conversion could not be made (invalid types, invalid number of columns, ...). The **`eof`** method can be used to detect if the end of the file was reached.
+
+The above two methods are preferable if invalid inputs are expected and allow for fast handling, but the parser can also be forced to throw an exception in case of an invalid input using the **`ss::throw_on_error`** setup option.
+
+```cpp
+ss::parser parser;
+```
+*Note, enabling this option will also make the parser throw if the constructor fails.*\
 ## Substitute conversions
-The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario.
+The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario.\
+***Important, substitute conversions do not work when throw_on_error is enabled.***\
 Supposing we have a file containing different shapes in given formats:
  * circle RADIUS
@@ -479,11 +495,16 @@ while (!p.eof()) {
     using udbl = ss::gte;
     auto [circle_or_square, rectangle, triangle] =
-        p.try_next, udbl>()
+        p.try_next, udbl>()
             .or_else, udbl, udbl>()
             .or_else, udbl, udbl, udbl>()
             .values();
+    if (!p.valid()) {
+        // handle error
+        continue;
+    }
+
     if (circle_or_square) {
         auto& [s, x] = circle_or_square.value();
         double area = (s == shape::circle) ? x * x * M_PI : x * x;
@@ -527,31 +548,34 @@ Each of those **`composite`** conversions can accept a lambda (or anything calla
 // non negative double
 using udbl = ss::gte;
-p.try_next, udbl>(
-     [&](const auto& data) {
-         const auto& [s, x] = data;
-         double area = (s == shape::circle) ? x * x * M_PI : x * x;
-         shapes.emplace_back(s, area);
-     })
-    .or_else, udbl, udbl>(
-        [&](const shape s, const double a, const double b) {
-            shapes.emplace_back(s, a * b);
-        })
-    .or_else, udbl, udbl, udbl>(
-        [&](auto&& s, auto& a, const double& b, double& c) {
-            double sh = (a + b + c) / 2;
-            if (sh >= a && sh >= b && sh >= c) {
-                double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
-                shapes.emplace_back(s, area);
-            }
+while (!p.eof()) {
+    p.try_next, udbl>(
+         [&](const auto& data) {
+             const auto& [s, x] = data;
+             double area = (s == shape::circle) ? x * x * M_PI : x * x;
+             shapes.emplace_back(s, area);
+         })
+        .or_else, udbl, udbl>(
+            [&](shape s, double a, double b) { shapes.emplace_back(s, a * b); })
+        .or_else, udbl, udbl, udbl>(
+            [&](auto s, auto a, auto b, auto c) {
+                double sh = (a + b + c) / 2;
+                if (sh >= a && sh >= b && sh >= c) {
+                    double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
+                    shapes.emplace_back(s, area);
+                }
+            })
+        .on_error([] {
+            // handle error
         });
+}
 ```
 It is a bit less readable, but it removes the need to check which conversion was invoked. The **`composite`** also has an **`on_error`** method which accepts a lambda which will be invoked if no previous conversions were successful. The lambda can take no arguments or just one argument, an **`std::string`**, in which the error message is stored if **`string_error`** is enabled:
 ```cpp
 p.try_next()
     .on_error([](const std::string& e) { /* int conversion failed */ })
     .or_object()
-    .on_error([] { /* int and x (all) conversions failed */ });
+    .on_error([] { /* int and x conversions failed (all previous failed) */ });
 ```
 *See unit tests for more examples.*