```
   __________ ____
  / ___/ ___// __ \
  \__ \\__ \/ /_/ /
 ___/ /__/ / ____/
/____/____/_/
```
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![coverage](https://coveralls.io/repos/github/red0124/ssp/badge.svg?branch=master)](https://coveralls.io/github/red0124/ssp?branch=master)
[![single-header](https://github.com/red0124/ssp/workflows/single-header-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/single-header.yml)
[![ubuntu-latest-gcc](https://github.com/red0124/ssp/workflows/ubuntu-latest-gcc-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/ubuntu-latest-gcc.yml)
[![ubuntu-latest-clang](https://github.com/red0124/ssp/workflows/ubuntu-latest-clang-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/ubuntu-latest-clang.yml)
[![ubuntu-latest-icc](https://github.com/red0124/ssp/workflows/ubuntu-latest-icc-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/ubuntu-latest-icc.yml)
[![windows-msys2-gcc](https://github.com/red0124/ssp/workflows/win-msys2-gcc-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/win-msys2-gcc.yml)
[![windows-msys2-clang](https://github.com/red0124/ssp/workflows/win-msys2-clang-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/win-msys2-clang.yml)
[![windows-msvc](https://github.com/red0124/ssp/workflows/win-msvc-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/win-msvc.yml)

A header-only CSV parser which is fast and versatile, with a modern C++ API. Requires a compiler with C++17 support. [Can also be used to efficiently convert strings to specific types.](#the-converter)

Floating-point values are converted using [fast-float](https://github.com/fastfloat/fast_float). \
Function traits are taken from *qt-creator*.
# Example
Let's say we have a CSV file containing students in a given format (Id,Age,Grade) and we want to parse and print all the valid values:
```shell
$ cat students.csv
James Bailey,65,2.5
Brian S. Wolfe,40,1.9
Bill (Heath) Gates,65,3.3
```
```cpp
#include <iostream>
#include <ss/parser.hpp>

int main() {
    ss::parser<ss::throw_on_error> p{"students.csv"};

    for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
        std::cout << id << ' ' << age << ' ' << grade << std::endl;
    }

    return 0;
}
```
And if we compile and execute the program we get the following output:
```shell
$ ./a.out
James Bailey 65 2.5
Brian S. Wolfe 40 1.9
Bill (Heath) Gates 65 3.3
```
# Features
* [Works on any type](#custom-conversions)
* Easy to use
* Can work without exceptions
* [Works with headers](#headers)
* [Works with quotes, escapes and spacings](#setup)
* [Works with CSV data stored in buffers](#buffer-mode)
* [Works with values containing new lines](#multiline)
* [Columns and rows can be ignored](#special-types)
* [Works with any type of delimiter](#delimiter)
* Can return whole objects composed of converted values
* [Error handling can be configured](#error-handling)
* [Restrictions can be added for each column](#restrictions)
* [Works with `std::optional` and `std::variant`](#special-types)
* Works with **`CRLF`** and **`LF`**
* [Conversions can be chained if invalid](#substitute-conversions)
* Fast

# Single header
The library can be used with a single header file **`ssp.hpp`**, but it suffers a slight performance loss when converting floating point values since the **`fast_float`** library is not present within the file.

# Installation
```shell
$ git clone https://github.com/red0124/ssp
$ cd ssp
$ cmake --configure .
$ sudo make install
```
*Note, this will also install the fast_float library.*\
The library supports the [CMake](#Cmake) and [meson](#Meson) build systems.

# Usage
## Headers
The parser can be told to use only certain columns by parsing the header. This is done with the **`use_fields`** method, which accepts any number of string-like arguments or a **`std::vector<std::string>`** with the field names. An error is reported if any of the fields is not found within the header or if a field is specified more than once.
```shell
$ cat students_with_header.csv
Id,Age,Grade
James Bailey,65,2.5
Brian S. Wolfe,40,1.9
Bill (Heath) Gates,65,3.3
```
```cpp
// ...
ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
p.use_fields("Id", "Grade");

for (const auto& [id, grade] : p.iterate<std::string, float>()) {
    std::cout << id << ' ' << grade << std::endl;
}
// ...
```
```shell
$ ./a.out
James Bailey 2.5
Brian S. Wolfe 1.9
Bill (Heath) Gates 3.3
```
The header can be ignored using the **`ss::ignore_header`** [setup](#Setup) option or by calling the **`ignore_next`** method after the parser has been constructed.
```cpp
ss::parser<ss::ignore_header> p{file_name};
```
The fields the parser works with can be changed at any time. The parser can also check whether a field is present within the header by using the **`field_exists`** method.
```cpp
// ...
ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
p.use_fields("Id", "Grade");

const auto& [id, grade] = p.get_next<std::string, float>();
std::cout << id << ' ' << grade << std::endl;

if (p.field_exists("Id")) {
    p.use_fields("Grade", "Id");
    for (const auto& [grade, id] : p.iterate<float, std::string>()) {
        std::cout << grade << ' ' << id << std::endl;
    }
}
// ...
```
```shell
$ ./a.out
James Bailey 2.5
1.9 Brian S. Wolfe
3.3 Bill (Heath) Gates
```
## Conversions
An alternate loop to the example above would look like:
```cpp
// ...
ss::parser p{"students.csv"};

while (!p.eof()) {
    const auto& [id, age, grade] = p.get_next<std::string, int, float>();

    if (p.valid()) {
        std::cout << id << ' ' << age << ' ' << grade << std::endl;
    }
}
// ...
```
The alternate example, which does not use exceptions, will be used to show some of the features of the library. The **`get_next`** method returns a tuple of objects specified inside the template type list.

If a conversion cannot be applied, the method returns a tuple of default-constructed objects and the **`valid`** method returns **`false`**. For example, if the third (grade) column in our CSV could not be converted to a float, the conversion would fail.

If **`get_next`** is called with a **`tuple`** as its template parameter, it behaves identically to passing the tuple's element types to **`get_next`** directly:
```cpp
using student = std::tuple<std::string, int, float>;

// returns std::tuple<std::string, int, float>
auto [id, age, grade] = p.get_next<student>();
```
*Note, it does not always return the specified tuple, since the returned tuple's parameters may be altered as explained below (no void, no restrictions, ...)*

Whole objects can be returned using the **`get_object`** function, which builds the tuple in the same way as **`get_next`** and then creates an object out of it:
```cpp
struct student {
    std::string id;
    int age;
    float grade;
};
```
```cpp
// returns student
// (the variable must not be named `student`, or it would hide the type)
auto s = p.get_object<student, std::string, int, float>();
```
This works with any type whose constructor can be invoked with the template arguments given to **`get_object`**:
```cpp
// returns std::vector<std::string> containing 2 elements
auto vec = p.get_object<std::vector<std::string>, std::string, std::string>();
```
An iterator loop as in the first example which returns objects would look like:
```cpp
for (const student& s : p.iterate_object<student, std::string, int, float>()) {
    // ...
}
```
And finally, something I personally like to do: a struct (class) with a **`tied`** method which returns a tuple of references to the members of the struct.
```cpp
struct student {
    std::string id;
    int age;
    float grade;

    auto tied() { return std::tie(id, age, grade); }
};
```
The method can be used to compare the object, serialize it, deserialize it, etc. Now **`get_next`** can accept such a struct and deduce the types to convert the CSV columns to.
```cpp
// returns student
auto s = p.get_next<student>();
```
This works with the iteration loop too.
*Note, the order in which the members of the tied method are returned must match the order of the elements in the CSV*.
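
To illustrate the remark that a **`tied`** method can also be reused for comparisons, here is a minimal sketch that does not depend on the parser at all. The `const` overload of `tied` and the two operators are assumptions added for this example; they are not part of the library.

```cpp
#include <string>
#include <tuple>

struct student {
    std::string id;
    int age;
    float grade;

    // tuple of references to the members, in CSV column order
    auto tied() { return std::tie(id, age, grade); }
    // assumed const overload so that const objects can be compared
    auto tied() const { return std::tie(id, age, grade); }
};

// member-wise equality, delegated to std::tuple
inline bool operator==(const student& lhs, const student& rhs) {
    return lhs.tied() == rhs.tied();
}

// lexicographic ordering: id first, then age, then grade
inline bool operator<(const student& lhs, const student& rhs) {
    return lhs.tied() < rhs.tied();
}
```

With these operators in place, containers of `student` can be sorted or searched without writing the member-by-member comparison by hand.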
## Buffer mode
The parser also works with buffers containing CSV data instead of files. To parse buffer data, construct the parser from the buffer, given as **`const char*`**, and its size. The initial example using a buffer instead of a file would look similar to this:
```cpp
std::string buffer = "James Bailey,65,2.5\nBrian S. Wolfe,40,1.9\n";
ss::parser<ss::throw_on_error> p{buffer.c_str(), buffer.size()};

for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
    std::cout << id << ' ' << age << ' ' << grade << std::endl;
}

return 0;
```
## Setup
By default, many of the features supported by the parser are disabled. They can be enabled within the template parameters of the parser. For example, to enable quoting and escaping the parser would look like:
```cpp
ss::parser<ss::quote<'"'>, ss::escape<'\\'>> p0{file_name};
```
The order of the defined setup parameters is not important:
```cpp
// equivalent to p0
ss::parser<ss::escape<'\\'>, ss::quote<'"'>> p1{file_name};
```
The setup can also be predefined:
```cpp
using my_setup = ss::setup<ss::escape<'\\'>, ss::quote<'"'>>;
// equivalent to p0 and p1
ss::parser<my_setup> p2{file_name};
```
Invalid setups are rejected at compile time by a **`static_assert`**.

*Note, most setup parameters come with a slight performance loss, so use them only if needed.*
### Delimiter
By default, **`,`** is used as the delimiter. A custom delimiter can be specified as the second constructor parameter.
```cpp
ss::parser p{file_name, "--"};
```
*Note, the delimiter can consist of multiple characters, but the parser is slightly faster when using single character delimiters.*

### Empty lines
Empty lines can be ignored by defining **`ss::ignore_empty`** within the setup parameters:
```cpp
ss::parser<ss::ignore_empty> p{file_name};
```
If this setup option is not set then reading an empty line will result in an error (unless only one column is present within the CSV).
### Quoting
Quoting can be enabled by defining **`ss::quote`** within the setup parameters. A single character can be defined as the quoting character, for example to use **`"`** as a quoting character:
```cpp
ss::parser<ss::quote<'"'>> p{file_name};
```
A doubled quote can be used to escape a quote inside a quoted value.
```
"James ""Bailey""" -> 'James "Bailey"'
```
Unterminated quotes result in an error (if multiline is not enabled).
```
"James Bailey,65,2.5 -> error
```
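
The two quoting rules above (a doubled quote stands for one literal quote, and an unterminated quote is an error) can be sketched with the standard library alone. `unquote` is a hypothetical helper written for this illustration; it is not part of ssp and does not claim to reflect how the library implements quoting.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: strips the surrounding quotes from one field and
// collapses every doubled quote into a single one.
// Returns std::nullopt if the field is not properly quoted.
std::optional<std::string> unquote(std::string_view field, char quote = '"') {
    if (field.size() < 2 || field.front() != quote || field.back() != quote) {
        return std::nullopt;  // missing opening or closing quote
    }

    std::string out;
    // iterate over the content between the outer quotes
    for (std::size_t i = 1; i + 1 < field.size(); ++i) {
        if (field[i] != quote) {
            out += field[i];
            continue;
        }
        // a quote inside the content must be followed by a second quote
        if (i + 2 < field.size() && field[i + 1] == quote) {
            out += quote;
            ++i;  // skip the second quote of the pair
        } else {
            return std::nullopt;
        }
    }
    return out;
}
```

Returning `std::optional` keeps the error case explicit, in the same spirit as checking **`valid`** after a conversion.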
### Escaping
Escaping can be enabled by defining **`ss::escape`** within the setup parameters. Multiple characters can be defined as escaping characters. Escaping removes any special meaning from the character that follows the escape character, and anything can be escaped. For example, to use ``\`` as an escaping character:
```cpp
ss::parser<ss::escape<'\\'>> p{file_name};
```
Double escape can be used to escape an escape.
```
James \\Bailey -> 'James \Bailey'
```
Unterminated escapes result in an error.
```
James Bailey,65,2.5\\0 -> error
```
Escaping is most useful when combined with quoting or spacing:
```
"James \"Bailey\"" -> 'James "Bailey"'
```
### Spacing
Spacing can be enabled by defining **`ss::trim`**, **`ss::trim_left`** or **`ss::trim_right`** within the setup parameters. Multiple characters can be defined as spacing characters; for example, to use ``' '`` as a spacing character, **`ss::trim<' '>`** needs to be defined. It removes any spacing characters from both sides of each value. To trim only the right side, **`ss::trim_right`** can be used, and likewise **`ss::trim_left`** to trim only the left side. If **`ss::trim`** is enabled, these lines would produce equivalent output:
```
James Bailey,65,2.5
James Bailey ,65,2.5
James Bailey, 65, 2.5
```
Escaping and quoting can be used to leave the space if needed.
```
" James Bailey " -> ' James Bailey '
" James Bailey " -> ' James Bailey '
\ James Bailey\ -> ' James Bailey '
\ James Bailey\ -> ' James Bailey '
"\ James Bailey\ " -> ' James Bailey '
```
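
As a rough picture of what trimming a single value with a set of spacing characters means, here is a small sketch using only the standard library. `trim_view` is a hypothetical name; the function only illustrates the behaviour described above (without quoting or escaping) and is not the library's implementation.

```cpp
#include <string_view>

// Hypothetical helper: removes every leading and trailing character that
// appears in `spacing` and returns a view into the original data (no copy).
std::string_view trim_view(std::string_view value,
                           std::string_view spacing = " \t") {
    const auto first = value.find_first_not_of(spacing);
    if (first == std::string_view::npos) {
        return {};  // the value consists of spacing characters only
    }
    const auto last = value.find_last_not_of(spacing);
    return value.substr(first, last - first + 1);
}
```

Because the result is a `std::string_view`, it stays valid only as long as the underlying buffer does.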
### Multiline
Multiline can be enabled by defining **`ss::multiline`** within the setup parameters. It makes it possible to have new line characters within rows. The new line character needs to be either escaped or within quotes, so either **`ss::escape`** or **`ss::quote`** needs to be enabled.

There is a specific problem when using multiline: if a row has an unterminated quote, the parser assumes that the new line belongs to the row and keeps treating the input as one line until another quote is found. This is usually fine, but it can cause the whole CSV file to be treated as a single line by mistake. To prevent this, **`ss::multiline_restricted`** can be used, which accepts an unsigned number representing the maximum number of lines allowed in a single multiline row. Examples:

```cpp
ss::parser<ss::multiline, ss::quote<'\"'>, ss::escape<'\\'>> p{file_name};
```
```
"James\n\n\nBailey" -> 'James\n\n\nBailey'
James\\n\\n\\nBailey -> 'James\n\n\nBailey'
"James\n\n\n\n\nBailey" -> 'James\n\n\n\n\nBailey'
```
```cpp
ss::parser<ss::multiline_restricted<4>, ss::quote<'\"'>, ss::escape<'\\'>> p{file_name};
```
```
"James\n\n\nBailey" -> 'James\n\n\nBailey'
James\\n\\n\\nBailey -> 'James\n\n\nBailey'
"James\n\n\n\n\nBailey" -> error
```
### Example
An example with a more complicated setup:
```cpp
ss::parser<ss::escape<'\\'>,
           ss::quote<'"'>,
           ss::trim<' ', '\t'>,
           ss::multiline_restricted<5>> p{file_name};

for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
    if (p.valid()) {
        std::cout << "'" << id << ' ' << age << ' ' << grade << "'\n";
    }
}
```
input:
```
"James Bailey" , 65 , 2.5\t\t\t
\t \t Brian S. Wolfe, "40" , "\1.9"
"\"Nathan Fielder""" , 37 , Really good grades
"Bill
\"Heath""
Gates",65, 3.3
```
output:
```
'James Bailey 65 2.5'
'Brian S. Wolfe 40 1.9'
'Bill
"Heath"
Gates 65 3.3'
```
## Special types
Passing **`void`** makes the parser ignore a column. In the initial example, **`void`** could be given as the second template parameter to ignore the second (age) column in the CSV; a tuple of only 2 elements would be returned:
```cpp
// returns std::tuple<std::string, float>
auto [id, grade] = p.get_next<std::string, void, float>();
```
This works with the other forms of conversion too:
```cpp
using student = std::tuple<std::string, void, float>;

// returns std::tuple<std::string, float>
auto [id, grade] = p.get_next<student>();
```
Values can also be converted to **`std::string_view`**. It is more efficient than converting values to **`std::string`**, but one must be careful with its lifetime.
```cpp
// string_view id stays valid until the next line is read
auto [id, age, grade] = p.get_next<std::string_view, int, float>();
```

To ignore a whole row, **`ignore_next`** can be used. It returns **`false`** if the end of the file has been reached:
```cpp
bool parser::ignore_next();
```
**`std::optional`** can be passed if the conversion should proceed when a column fails to convert, in which case **`std::nullopt`** is returned for that column:
```cpp
// returns std::tuple<std::string, int, std::optional<float>>
auto [id, age, grade] = p.get_next<std::string, int, std::optional<float>>();
if (grade) {
    std::cout << grade.value() << std::endl;
}
```
Similar to **`std::optional`**, **`std::variant`** can be used to try other conversions if the previous ones failed _(Note, conversion to std::string will always pass)_:
```cpp
// returns std::tuple<std::string, int, std::variant<float, char>>
auto [id, age, grade] =
    p.get_next<std::string, int, std::variant<float, char>>();
if (std::holds_alternative<float>(grade)) {
    // grade set as float
} else if (std::holds_alternative<char>(grade)) {
    // grade set as char
}
```
Passing **`char`** and types that are aliases to it such as **`uint8_t`** and **`int8_t`** makes the parser interpret the input data as a single character, in a similar way to how **`std::cin`** does it. To read numeric values into something like **`uint8_t`**, the **`ss::uint8`** and **`ss::int8`** types can be used. These are wrappers around the corresponding char aliases and can be implicitly converted to and from them. When these types are given to the parser, it will try to read the given data and store it in the underlying element, but this time as a numeric value instead of a single character.
```cpp
// returns std::tuple<std::string, ss::uint8, float>
auto [id, age, grade] = p.get_next<std::string, ss::uint8, float>();
uint8_t age_copy = age;
```
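
The difference between reading a byte-sized value as a character and as a number can be shown with the standard library alone. Both functions below are hypothetical helpers written for this illustration; they are not the library's `ss::uint8` implementation.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Character interpretation: the text must be exactly one character long.
std::optional<char> read_as_char(std::string_view text) {
    if (text.size() != 1) {
        return std::nullopt;
    }
    return text.front();
}

// Numeric interpretation: the whole text must be a number in [0, 255].
std::optional<std::uint8_t> read_as_uint8(std::string_view text) {
    unsigned value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last || value > 255) {
        return std::nullopt;  // not a number, trailing garbage, or overflow
    }
    return static_cast<std::uint8_t>(value);
}
```

For the input `7`, the first function yields the character `'7'` while the second yields the number 7; for `65`, only the numeric interpretation succeeds.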
## Restrictions
Custom **`restrictions`** can be used to reject unwanted values during conversion. **`ss::ir`** (in range) and **`ss::ne`** (not empty) are some of those:
```cpp
// ss::ne makes sure that the id is not empty
// ss::ir makes sure that the grade will be in range [0, 10]
// returns std::tuple<std::string, int, float>
auto [id, age, grade] =
    p.get_next<ss::ne<std::string>, int, ss::ir<float, 0, 10>>();
```
If the restrictions are not met, the conversion will fail. Other predefined restrictions are **`ss::ax`** (all except), **`ss::nx`** (none except), **`ss::oor`** (out of range), **`ss::lt`** (less than), ... (see *restrictions.hpp*):
```cpp
// all ints except 10 and 20
ss::ax<int, 10, 20>
// only 10 and 20
ss::nx<int, 10, 20>
// all values except the range [0, 10]
ss::oor<int, 0, 10>
```
To define a restriction, a class/struct needs to be made which has a **`ss_valid`** method which returns a **`bool`** and accepts one object. The type of the conversion will be the same as the type of the passed object within **`ss_valid`** and not the restriction itself. Optionally, an **`error`** method can be made to describe the invalid conversion.
```cpp
template <typename T>
struct even {
    bool ss_valid(const T& value) const {
        return value % 2 == 0;
    }

    // optional
    const char* error() const {
        return "number not even";
    }
};
```
```cpp
// ...
// only even numbers will pass without invoking error handling
// returns std::tuple<std::string, int>
const auto& [id, age] = p.get_next<std::string, even<int>, void>();
// ...
```
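
Following the same interface, a restriction with an extra compile-time parameter might look like the sketch below. `multiple_of` is a hypothetical name chosen for this example, and whether the parser accepts this exact shape is an assumption based on the description above; the struct can be exercised on its own, as the **`ss_valid`** method is an ordinary member function.

```cpp
// Hypothetical restriction: accepts only values divisible by N.
template <typename T, auto N>
struct multiple_of {
    static_assert(N != 0, "N must not be zero");

    // called with the converted value; returning false rejects it
    bool ss_valid(const T& value) const {
        return value % N == 0;
    }

    // optional description of the failed restriction
    const char* error() const {
        return "number is not a multiple of N";
    }
};
```

A quick standalone check such as `multiple_of<int, 5>{}.ss_valid(20)` can be used to test the predicate before plugging it into a conversion.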
## Custom conversions
Custom types can be used when converting values. A specialization of the **`ss::extract`** function needs to be made and you are good to go. A custom conversion for an enum would look like this:
```cpp
enum class shape { circle, square, rectangle, triangle };

template <>
inline bool ss::extract(const char* begin, const char* end, shape& dst) {
    const static std::unordered_map<std::string, shape>
        shapes{{"circle", shape::circle},
               {"square", shape::square},
               {"rectangle", shape::rectangle},
               {"triangle", shape::triangle}};

    if (auto it = shapes.find(std::string(begin, end)); it != shapes.end()) {
        dst = it->second;
        return true;
    }
    return false;
}
```
The shape enum will be used in an example below. The **`inline`** is there just to prevent multiple definition errors. The function returns **`true`** if the conversion was a success, and **`false`** otherwise. The function uses **`const char*`** begin and end for performance reasons.
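
The same begin/end/destination pattern carries over to other types. The sketch below parses a six-digit hexadecimal colour such as `ff8000`. Both `color` and `extract_color` are hypothetical names; the function is written as a free function so that it compiles without the library, and in real use the body would go into a specialization of **`ss::extract`** as shown above.

```cpp
#include <charconv>
#include <cstdint>
#include <system_error>

// Hypothetical custom type for this example.
struct color {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Returns true and fills dst on success, false otherwise.
inline bool extract_color(const char* begin, const char* end, color& dst) {
    if (end - begin != 6) {
        return false;  // expects exactly RRGGBB
    }

    std::uint8_t parts[3] = {};
    for (int i = 0; i < 3; ++i) {
        unsigned value = 0;
        const char* first = begin + 2 * i;
        // base 16, no "0x" prefix; both hex digits must be consumed
        const auto [ptr, ec] = std::from_chars(first, first + 2, value, 16);
        if (ec != std::errc{} || ptr != first + 2) {
            return false;
        }
        parts[i] = static_cast<std::uint8_t>(value);
    }

    dst = color{parts[0], parts[1], parts[2]};
    return true;
}
```

Working directly on the pointer range avoids constructing a temporary `std::string`, which is in line with the performance remark above.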
## Error handling
By default, the parser handles errors only using the **`valid`** method, which returns **`false`** if the file could not be opened or if the conversion could not be made (invalid types, invalid number of columns, ...).\
The **`eof`** method can be used to detect if the end of the file was reached.

Detailed error messages can be accessed via the **`error_msg`** method; to enable them, **`ss::string_error`** needs to be included in the setup. If **`ss::string_error`** is not defined, the **`error_msg`** method will not be defined either.

The line number can be fetched using the **`line`** method, and the cursor position using the **`position`** method.
```cpp
const std::string& parser::error_msg() const;
bool parser::valid() const;
bool parser::eof() const;
size_t parser::line() const;
size_t parser::position() const;
// ...
ss::parser<ss::string_error> parser;
```

The approaches above are preferable if invalid inputs are expected, since they allow for fast handling, but the parser can also be forced to throw an exception on invalid input using the **`ss::throw_on_error`** setup option. Enabling exceptions also makes the **`valid`** method always return **`true`**.
```cpp
ss::parser<ss::throw_on_error> parser;
```
*Note, enabling this option will also make the parser throw if the constructor fails.*
## Substitute conversions
The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical CSV but still CSV-like). A more complicated example would be the best way to demonstrate such a scenario.\
***Important, substitute conversions do not work when throw_on_error is enabled.***

Suppose we have a file containing different shapes in the given formats:
* circle RADIUS
* square SIDE
* rectangle SIDE_A SIDE_B
* triangle SIDE_A SIDE_B SIDE_C
```
rectangle 2 3
circle 10
triangle 3 4 5
...
```
The delimiter is " ", and the number of columns varies depending on which shape it is. We are required to read the file and to store information (shape and area) of the shapes into a data structure in the same order as they are in the file.
```cpp
ss::parser p{"shapes.txt", " "};
if (!p.valid()) {
    exit(EXIT_FAILURE);
}

std::vector<std::pair<shape, double>> shapes;

while (!p.eof()) {
    // non negative double
    using udbl = ss::gte<double, 0>;

    auto [circle_or_square, rectangle, triangle] =
        p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>()
            .or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>()
            .or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>()
            .values();

    if (!p.valid()) {
        // handle error
        continue;
    }

    if (circle_or_square) {
        auto& [s, x] = circle_or_square.value();
        double area = (s == shape::circle) ? x * x * M_PI : x * x;
        shapes.emplace_back(s, area);
    }

    if (rectangle) {
        auto& [s, a, b] = rectangle.value();
        shapes.emplace_back(s, a * b);
    }

    if (triangle) {
        auto& [s, a, b, c] = triangle.value();
        double sh = (a + b + c) / 2;
        if (sh >= a && sh >= b && sh >= c) {
            double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
            shapes.emplace_back(s, area);
        }
    }
}

/* do something with the stored shapes */
/* ... */
```
It is quite hard to make an error this way since most things will be checked at compile time.

The **`try_next`** method works in a similar way as **`get_next`** but returns a **`composite`** which holds a **`tuple`** with an **`optional`** to the **`tuple`** returned by **`get_next`**. This **`composite`** has an **`or_else`** method (looks a bit like **`tl::expected`**) which is able to try additional conversions if the previous failed. **`or_else`** also returns a **`composite`**, but its tuple holds the **`optional`**s to the **`tuple`**s of the previous conversions and an **`optional`** to the **`tuple`** of the new conversion (sounds more complicated than it is).

To fetch the **`tuple`** from the **`composite`**, the **`values`** method is used. The value of the conversion used above would look something like this:
```cpp
std::tuple<
std::optional<std::tuple<shape, double>>,
std::optional<std::tuple<shape, double, double>>,
std::optional<std::tuple<shape, double, double, double>>
>
```
Similar to the way that **`get_next`** has a **`get_object`** alternative, **`try_next`** has a **`try_object`** alternative, and **`or_else`** has an **`or_object`** alternative. All rules that apply to **`get_next`** also apply to **`try_next`**, **`or_else`**, and all the other **`composite`** conversions.

Each of those **`composite`** conversions can accept a lambda (or anything callable) as an argument and invoke it in case of a valid conversion. That lambda itself need not have any arguments, but if it does, it must either accept the whole **`tuple`**/object as one argument or all the elements of the tuple separately. If the lambda returns something that can be interpreted as **`false`**, the conversion will fail and the next conversion will be attempted. Rewriting the whole while loop using lambdas would look like this:
```cpp
// non negative double
using udbl = ss::gte<double, 0>;

while (!p.eof()) {
    p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>(
         [&](const auto& data) {
             const auto& [s, x] = data;
             double area = (s == shape::circle) ? x * x * M_PI : x * x;
             shapes.emplace_back(s, area);
         })
        .or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>(
            [&](shape s, double a, double b) { shapes.emplace_back(s, a * b); })
        .or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>(
            [&](auto s, auto a, auto b, auto c) {
                double sh = (a + b + c) / 2;
                if (sh >= a && sh >= b && sh >= c) {
                    double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
                    shapes.emplace_back(s, area);
                }
            })
        .on_error([] {
            // handle error
        });
}
```
It is a bit less readable, but it removes the need to check which conversion was invoked. The **`composite`** also has an **`on_error`** method which accepts a lambda which will be invoked if no previous conversions were successful. The lambda can take no arguments or just one argument, an **`std::string`**, in which the error message is stored if **`string_error`** is enabled:
```cpp
p.try_next<int>()
    .on_error([](const std::string& e) { /* int conversion failed */ })
    .or_object<x, double>()
    .on_error([] { /* int and x conversions failed (all previous failed) */ });
```
*See unit tests for more examples.*
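The callable rules described above (no arguments, the whole tuple, or the elements separately, with a falsy return value counting as failure) can be pictured with a small standard C++17 sketch. This is not the library's implementation; `call_and_check` and `invoke_if_valid` are hypothetical names introduced only for illustration.

```cpp
#include <functional>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper, not part of ssp: calls `f` and reports success.
// A void return counts as success; any other result is converted to bool.
template <typename F, typename... Args>
bool call_and_check(F&& f, Args&&... args) {
    using result = std::invoke_result_t<F, Args...>;
    if constexpr (std::is_void_v<result>) {
        std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
        return true;
    } else {
        return static_cast<bool>(
            std::invoke(std::forward<F>(f), std::forward<Args>(args)...));
    }
}

// Hypothetical helper, not part of ssp: picks the call form the callable
// supports, trying in order: no arguments, the whole tuple, unpacked elements.
template <typename F, typename... Ts>
bool invoke_if_valid(F&& f, const std::tuple<Ts...>& value) {
    if constexpr (std::is_invocable_v<F>) {
        return call_and_check(std::forward<F>(f));
    } else if constexpr (std::is_invocable_v<F, const std::tuple<Ts...>&>) {
        return call_and_check(std::forward<F>(f), value);
    } else {
        return std::apply(
            [&](const Ts&... elems) {
                return call_and_check(std::forward<F>(f), elems...);
            },
            value);
    }
}
```

In this sketch, a `false` result from `invoke_if_valid` corresponds to the case where the parser would move on to the next conversion in the chain.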
# Rest of the library
First of all, *type_traits.hpp* and *function_traits.hpp* contain many handy traits used in the parser. Most of them operate on tuples of elements and can be reused in other projects.
## The converter
**`ss::parser`** is used to work with files. It has a built-in file reader, but the conversions themselves are done using the **`ss::converter`**.

To convert a string, the **`convert`** method can be used. It accepts a C-string as input and a delimiter, as an **`std::string`**, and returns a **`tuple`** of objects in the same way **`get_next`** does for the parser. A whole object can be returned too using the **`convert_object`** method, again in the same way **`get_object`** does for the parser.
```cpp
ss::converter c;
auto [x, y, z] = c.convert<int, double, char>("10::2.2::3", "::");
if (c.valid()) {
    // do something with x y z
}
auto s = c.convert_object<student, std::string, int, double>("name,20,10", ",");
if (c.valid()) {
    // do something with s
}
```
All setup parameters, special types and restrictions work on the converter too.
Error handling is also identical to error handling of the parser.

The converter can also just split the line. Depending on whether quoting or escaping is enabled, it may modify the line in place rather than create a copy, for performance reasons. It returns an **`std::vector`** of **`std::pair`**s of pointers, begin and end, each pair representing a split segment (column) of the whole string. The vector can then be passed to an overloaded **`convert`** method. This allows the same line to be reused without splitting it again on every conversion.
```cpp
ss::converter c;
auto split_line = c.split("circle 10", " ");
auto [s, r] = c.convert<shape, int>(split_line);
```
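To make the shape of the split result concrete, here is a minimal sketch in plain C++ of splitting a line into begin/end pointer pairs without copying any characters. It assumes no quoting or escaping, and `segment` and `split_view` are hypothetical names, not the library's API.

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Each segment is a [begin, end) range pointing into the original line.
using segment = std::pair<const char*, const char*>;

// Hypothetical stand-in for a split step, not ssp's implementation:
// handles plain delimiters only (no quoting, escaping or trimming).
std::vector<segment> split_view(const char* line, const std::string& delim) {
    std::vector<segment> out;
    const char* begin = line;
    const char* const end = line + std::strlen(line);
    if (delim.empty()) {
        // nothing to split on, the whole line is one segment
        out.emplace_back(begin, end);
        return out;
    }
    while (true) {
        const char* next = std::strstr(begin, delim.c_str());
        if (next == nullptr) {
            // last segment runs to the end of the line
            out.emplace_back(begin, end);
            break;
        }
        out.emplace_back(begin, next);
        begin = next + delim.size();
    }
    return out;
}
```

Because the pairs only point into the original buffer, they stay valid only as long as that buffer does, which is why keeping the split result around is cheap.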
Using the converter is also an easy and fast way to convert single values.
```cpp
ss::converter c;
std::string s;
std::cin >> s;
int num = c.convert<int>(s.c_str());
```
The same setup parameters also apply to the converter, though multiline has no impact on it. Since escaping and quoting can modify the content of the given line, a converter with those setup parameters defined does not have the same convert method: **`the input line cannot be const`**.
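The reason a mutable line is needed can be illustrated with a small sketch of in-place unquoting. It is not taken from the library: `unquote_in_place` is a hypothetical helper, and treating a doubled quote as an escaped quote is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical illustration, not ssp's implementation: removes a pair of
// surrounding quotes and collapses doubled quotes ("" -> ") by shifting
// characters left inside the same buffer. Returns the new length.
std::size_t unquote_in_place(char* field, char quote = '"') {
    const std::size_t len = std::strlen(field);
    if (len < 2 || field[0] != quote || field[len - 1] != quote) {
        return len; // not a quoted field, leave it untouched
    }
    char* write = field;
    for (std::size_t i = 1; i + 1 < len; ++i) {
        if (field[i] == quote && i + 2 < len && field[i + 1] == quote) {
            ++i; // skip the first quote of a doubled pair
        }
        *write++ = field[i];
    }
    *write = '\0';
    return static_cast<std::size_t>(write - field);
}
```

Since the function writes into `field`, it cannot accept a `const char*`, which mirrors the restriction on the input line described above.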
# Using as a project dependency
## CMake
If the repository is cloned within the CMake project, it can be added in the following way:
```cmake
add_subdirectory(ssp)
```
Alternatively, it can be fetched from the repository:
```cmake
include(FetchContent)
FetchContent_Declare(
  ssp
  GIT_REPOSITORY https://github.com/red0124/ssp.git
  GIT_TAG origin/master
  GIT_SHALLOW TRUE)
FetchContent_MakeAvailable(ssp)
```
Either way, once the target is available, you just have to link it to your project:
```cmake
target_link_libraries(project PUBLIC ssp fast_float)
```
## Meson
Create an *ssp.wrap* file in your *subprojects* directory with the following content:
```wrap
[wrap-git]
url = https://github.com/red0124/ssp
revision = origin/master
```
Then simply fetch the dependency and it is ready to be used:
```meson
ssp_dep = dependency('ssp')
```