9 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| ado | ddaa446819 | Update version | 2024-02-28 00:58:14 +01:00 |
| red0124 | 8bad2d72ea | Merge pull request #35 from red0124/feature/csv_buffer (Feature/csv buffer) | 2024-02-28 00:13:20 +01:00 |
| ado | 899a6e6f5e | [skip ci] Update README | 2024-02-28 00:08:04 +01:00 |
| ado | 0d3d8fa83e | [skip ci] Update README | 2024-02-28 00:04:59 +01:00 |
| ado | 7bbe2879cd | [skip ci] Update README | 2024-02-28 00:02:58 +01:00 |
| ado | 063d56fad9 | [skip ci] Update README | 2024-02-28 00:01:37 +01:00 |
| ado | df78865f04 | [skip ci] Update README | 2024-02-27 23:56:13 +01:00 |
| ado | 852481d233 | Fix converter unit tests | 2024-02-27 02:49:50 +01:00 |
| ado | c516a6f826 | Fix extraction tests | 2024-02-26 02:37:30 +01:00 |

5 changed files with 65 additions and 28 deletions

@@ -2,7 +2,7 @@ cmake_minimum_required(VERSION 3.14)
project(
ssp
-VERSION 1.6.2
+VERSION 1.7.0
DESCRIPTION "csv parser"
HOMEPAGE_URL "https://github.com/red0124/ssp"
LANGUAGES CXX

@@ -17,13 +17,13 @@
[![windows-msys2-clang](https://github.com/red0124/ssp/workflows/win-msys2-clang-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/win-msys2-clang.yml)
[![windows-msvc](https://github.com/red0124/ssp/workflows/win-msvc-ci/badge.svg)](https://github.com/red0124/ssp/actions/workflows/win-msvc.yml)
-A header only "csv" parser which is fast and versatile with modern C++ api. Requires compiler with C++17 support. [Can also be used to convert strings to specific types.](#the-converter)
+A header only CSV parser which is fast and versatile with modern C++ API. Requires compiler with C++17 support. [Can also be used to efficiently convert strings to specific types.](#the-converter)
Conversion for floating point values invoked using [fast-float](https://github.com/fastfloat/fast_float) . \
Function traits taken from *qt-creator* .
# Example
-Let's say we have a csv file containing students in a given format \<Id,Age,Grade\> and we want to parse and print all the valid values:
+Let's say we have a CSV file containing students in a given format (Id,Age,Grade) and we want to parse and print all the valid values:
```shell
$ cat students.csv
@@ -58,6 +58,7 @@ Bill (Heath) Gates 65 3.3
* Can work without exceptions
* [Works with headers](#headers)
* [Works with quotes, escapes and spacings](#setup)
+* [Works with CSV data stored in buffers](#buffer-mode)
* [Works with values containing new lines](#multiline)
* [Columns and rows can be ignored](#special-types)
* [Works with any type of delimiter](#delimiter)
@@ -158,7 +159,7 @@ while (!p.eof()) {
The alternate example with exceptions disabled will be used to show some of the features of the library. The **`get_next`** method returns a tuple of objects specified inside the template type list.
-If a conversion could not be applied, the method would return a tuple of default constructed objects, and the **`valid`** method would return **`false`**, for example if the third (grade) column in our csv could not be converted to a float the conversion would fail.
+If a conversion could not be applied, the method would return a tuple of default constructed objects, and the **`valid`** method would return **`false`**, for example if the third (grade) column in our CSV could not be converted to a float the conversion would fail.
If **`get_next`** is called with a **`tuple`** as template parameter it would behave identically to passing the same tuple parameters to **`get_next`**:
```cpp
@@ -202,14 +203,27 @@ struct student {
auto tied() { return std::tie(id, age, grade); }
};
```
-The method can be used to compare the object, serialize it, deserialize it, etc. Now **`get_next`** can accept such a struct and deduce the types to which to convert the csv.
+The method can be used to compare the object, serialize it, deserialize it, etc. Now **`get_next`** can accept such a struct and deduce the types to which to convert the CSV.
```cpp
// returns student
auto s = p.get_next<student>();
```
This works with the iteration loop too.
-*Note, the order in which the members of the tied method are returned must match the order of the elements in the csv*.
+*Note, the order in which the members of the tied method are returned must match the order of the elements in the CSV*.
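As a library-independent illustration of why a `tied` method is useful, the sketch below fills a struct from an already-converted row and compares two objects through their tuples. `from_row`, `same_student` and `tied_demo` are hypothetical helpers written for this example and are not part of ssp; the sketch only assumes that `tied` returns `std::tie` of the members, as in the struct above.

```cpp
#include <cassert>
#include <string>
#include <tuple>

struct student {
    std::string id;
    int age;
    float grade;
    // Exposes the members as a tuple of references, in column order.
    auto tied() { return std::tie(id, age, grade); }
};

// Hypothetical helper (not part of ssp): fills a student from a converted row.
template <typename... Ts>
student from_row(const std::tuple<Ts...>& row) {
    student s{};
    s.tied() = row;  // element-wise assignment, following the order in tied()
    return s;
}

// Hypothetical helper: compares two students through their tuples.
// Taken by value because tied() is a non-const member function.
inline bool same_student(student a, student b) {
    return a.tied() == b.tied();
}

// Small usage example.
inline void tied_demo() {
    student s = from_row(std::make_tuple(std::string{"James Bailey"}, 65, 2.5f));
    assert(s.age == 65);
    assert(same_student(s, s));
}
```

Assignment through `std::tie` is positional, so reordering the members inside `tied` changes which column each member receives. That is why the order has to match the CSV.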
+## Buffer mode
+The parser also works with buffers containing CSV data instead of files. To parse buffer data with the parser simply create the parser by giving it the buffer, as **`const char*`**, and its size. The initial example using a buffer instead of a file would look similar to this:
+```cpp
+std::string buffer = "James Bailey,65,2.5\nBrian S. Wolfe,40,1.9\n";
+ss::parser<ss::throw_on_error> p{buffer.c_str(), buffer.size()};
+for (const auto& [id, age, grade] : p.iterate<std::string, int, float>()) {
+std::cout << id << ' ' << age << ' ' << grade << std::endl;
+}
+return 0;
+```
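To make the pointer-plus-size interface concrete without depending on the library, here is a minimal sketch that walks the rows of an in-memory buffer. `split_rows` is a hypothetical helper and does not reflect how ssp is implemented; it only shows that a `const char*` and a size are enough to view the data without copying it or touching the filesystem.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of ssp): splits a buffer into rows on '\n'.
// The returned views point into the caller's buffer, so the buffer must
// outlive them.
inline std::vector<std::string_view> split_rows(const char* data,
                                                std::size_t size) {
    assert(data != nullptr || size == 0);
    std::string_view rest{data, size};
    std::vector<std::string_view> rows;
    while (!rest.empty()) {
        const std::size_t end = rest.find('\n');
        if (end == std::string_view::npos) {
            rows.push_back(rest);  // last row without a trailing newline
            break;
        }
        rows.push_back(rest.substr(0, end));
        rest.remove_prefix(end + 1);  // skip the row and its newline
    }
    return rows;
}
```

In this sketch nothing is copied, which is the usual reason to prefer a buffer interface when the data is already in memory.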
## Setup
By default, many of the features supported by the parser are disabled. They can be enabled within the template parameters of the parser. For example, to enable quoting and escaping the parser would look like:
```cpp
@@ -241,7 +255,7 @@ Empty lines can be ignored by defining **`ss::ignore_empty`** within the setup p
```cpp
ss::parser<ss::ignore_empty> p{file_name};
```
-If this setup option is not set then reading an empty line will result in an error (unless only one column is present within the csv).
+If this setup option is not set then reading an empty line will result in an error (unless only one column is present within the CSV).
### Quoting
Quoting can be enabled by defining **`ss::quote`** within the setup parameters. A single character can be defined as the quoting character, for example to use **`"`** as a quoting character:
@@ -290,7 +304,7 @@ Escaping and quoting can be used to leave the space if needed.
```
### Multiline
-Multiline can be enabled by defining **`ss::multiline`** within the setup parameters. It enables the possibility to have the new line characters within rows. The new line character needs to be either escaped or within quotes so either **`ss::escape`** or **`ss::quote`** need to be enabled. There is a specific problem when using multiline, for example, if a row had an unterminated quote, the parser would assume it to be a new line within the row, so until another quote is found, it will treat it as one line which is fine usually, but it can cause the whole csv file to be treated as a single line by mistake. To prevent this **`ss::multiline_restricted`** can be used which accepts an unsigned number representing the maximum number of lines which can be allowed as a single multiline. Examples:
+Multiline can be enabled by defining **`ss::multiline`** within the setup parameters. It enables the possibility to have the new line characters within rows. The new line character needs to be either escaped or within quotes so either **`ss::escape`** or **`ss::quote`** need to be enabled. There is a specific problem when using multiline, for example, if a row had an unterminated quote, the parser would assume it to be a new line within the row, so until another quote is found, it will treat it as one line which is fine usually, but it can cause the whole CSV file to be treated as a single line by mistake. To prevent this **`ss::multiline_restricted`** can be used which accepts an unsigned number representing the maximum number of lines which can be allowed as a single multiline. Examples:
```cpp
ss::parser<ss::multiline, ss::quote<'\"'>, ss::escape<'\\'>> p{file_name};
@@ -341,7 +355,7 @@ Gates 65 3.3'
```
## Special types
-Passing **`void`** makes the parser ignore a column. In the initial example **`void`** could be given as the second template parameter to ignore the second (age) column in the csv, a tuple of only 2 parameters would be returned:
+Passing **`void`** makes the parser ignore a column. In the initial example **`void`** could be given as the second template parameter to ignore the second (age) column in the CSV, a tuple of only 2 parameters would be returned:
```cpp
// returns std::tuple<std::string, float>
auto [id, grade] = p.get_next<std::string, void, float>();
@@ -383,6 +397,12 @@ if (std::holds_alternative<float>(grade)) {
// grade set as char
}
```
+Passing **`char`** and types that are aliases to it such as **`uint8_t`** and **`int8_t`** makes the parser interpret the input data as a single character in a similar way to how **`std::cin`** does it. To read numeric values into something like **`uint8_t`** the **`ss::uint8`** and **`ss::int8`** types can be used. These are wrappers around the corresponding char aliases and can be implicitly converted to and from them. When these types are given to the parser, it will try to read the given data and store it in the underlying element, but this time as a numeric value instead of a single character.
+```cpp
+// returns std::tuple<std::string, ss::uint8, float>
+auto [id, age, grade] = p.get_next<std::string, ss::uint8, float>();
+uint8_t age_copy = age;
+```
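The difference between the two interpretations can be reproduced with the standard library alone. In the sketch below, `read_as_char` shows the stream behaviour the paragraph compares against, and `read_as_number` is a hypothetical stand-in for the idea behind **`ss::uint8`**. It is not the library's implementation, only one way to parse the whole field as a range-checked number.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <system_error>

// Stream extraction treats uint8_t as a character type,
// so only the first character of the field is consumed.
inline std::uint8_t read_as_char(const std::string& text) {
    std::istringstream in{text};
    std::uint8_t value = 0;
    in >> value;
    return value;
}

// Hypothetical numeric reading: the whole field must be a number in [0, 255].
inline std::optional<std::uint8_t> read_as_number(std::string_view text) {
    unsigned parsed = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, parsed);
    if (ec != std::errc{} || ptr != last || parsed > 255) {
        return std::nullopt;  // not a number, trailing junk, or out of range
    }
    return static_cast<std::uint8_t>(parsed);
}

// Small usage example: the same field read both ways.
inline void uint8_demo() {
    assert(read_as_char("65") == '6');
    assert(read_as_number("65").value() == 65);
}
```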
## Restrictions
Custom **`restrictions`** can be used to narrow down the conversions of unwanted values. **`ss::ir`** (in range) and **`ss::ne`** (none empty) are some of those:
@@ -454,12 +474,13 @@ The **`eof`** method can be used to detect if the end of the file was reached.
Detailed error messages can be accessed via the **`error_msg`** method, and to enable them **`ss::string_error`** needs to be included in the setup. If **`ss::string_error`** is not defined, the **`error_msg`** method will not be defined either.
The line number can be fetched using the **`line`** method.
+The cursor position can be fetched using the **`position`** method.
```cpp
-const std::string& parser::error_msg();
-bool parser::valid();
-bool parser::eof();
-size_t parser::line();
+const std::string& parser::error_msg() const;
+bool parser::valid() const;
+bool parser::eof() const;
+size_t parser::line() const;
+size_t parser::position() const;
// ...
ss::parser<ss::string_error> parser;
@@ -474,7 +495,7 @@ ss::parser<ss::throw_on_error> parser;
## Substitute conversions
-The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario.\
+The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical CSV but still CSV-like). A more complicated example would be the best way to demonstrate such a scenario.\
***Important, substitute conversions do not work when throw_on_error is enabled.***
Supposing we have a file containing different shapes in given formats:

@@ -6,7 +6,7 @@ project(
'cpp_std=c++17',
'buildtype=debugoptimized',
'wrap_mode=forcefallback'],
-version: '1.6.2',
+version: '1.7.0',
meson_version:'>=0.54.0')
fast_float_dep = dependency('fast_float')

@@ -119,7 +119,7 @@ TEST_CASE_TEMPLATE("converter test valid conversions", T, int, ss::uint8) {
c.convert<void, std::variant<T, double>, double>("junk;5;6.6", ";");
REQUIRE(c.valid());
REQUIRE(std::holds_alternative<T>(std::get<0>(tup)));
-CHECK_EQ(tup, std::make_tuple(std::variant<T, double>{5}, 6.6));
+CHECK_EQ(tup, std::make_tuple(std::variant<T, double>{T(5)}, 6.6));
}
{
auto tup =
@@ -248,7 +248,7 @@ TEST_CASE_TEMPLATE("converter test valid conversions with exceptions", T, int,
c.convert<void, std::variant<T, double>, double>("junk;5;6.6", ";");
REQUIRE(c.valid());
REQUIRE(std::holds_alternative<T>(std::get<0>(tup)));
-CHECK_EQ(tup, std::make_tuple(std::variant<T, double>{5}, 6.6));
+CHECK_EQ(tup, std::make_tuple(std::variant<T, double>{T(5)}, 6.6));
} catch (ss::exception& e) {
FAIL(std::string{e.what()});
}

@@ -2,15 +2,31 @@
#include <algorithm>
#include <ss/extract.hpp>
-template <typename T>
-struct std::numeric_limits<ss::numeric_wrapper<T>>
-: public std::numeric_limits<T> {};
+namespace {
template <typename T>
-struct std::is_signed<ss::numeric_wrapper<T>> : public std::is_signed<T> {};
+struct numeric_limits : public std::numeric_limits<T> {};
template <typename T>
-struct std::is_unsigned<ss::numeric_wrapper<T>> : public std::is_unsigned<T> {};
+struct numeric_limits<ss::numeric_wrapper<T>> : public std::numeric_limits<T> {
+};
+template <typename T>
+struct is_signed : public std::is_signed<T> {};
+template <>
+struct is_signed<ss::int8> : public std::true_type {};
+template <typename T>
+struct is_unsigned : public std::is_unsigned<T> {};
+template <>
+struct is_unsigned<ss::uint8> : public std::true_type {};
+} /* namespace */
+static_assert(is_signed<ss::int8>::value);
+static_assert(is_unsigned<ss::uint8>::value);
TEST_CASE("testing extract functions for floating point values") {
CHECK_FLOATING_CONVERSION(123.456, float);
@@ -38,7 +54,7 @@ TEST_CASE("testing extract functions for floating point values") {
CHECK_EQ(value, type(input)); \
} \
/* check negative too */ \
-if (std::is_signed_v<type>) { \
+if (is_signed<type>::value) { \
std::string s = std::string("-") + #input; \
type value; \
bool valid = ss::extract(s.c_str(), s.c_str() + s.size(), value); \
@@ -89,7 +105,7 @@ TEST_CASE_TEMPLATE(
"extract test functions for numbers with out of range inputs", T, short, us,
int, ui, long, ul, ll, ull, ss::uint8) {
{
-std::string s = std::to_string(std::numeric_limits<T>::max());
+std::string s = std::to_string(numeric_limits<T>::max());
auto t = ss::to_num<T>(s.c_str(), s.c_str() + s.size());
CHECK(t.has_value());
for (auto& i : s) {
@@ -102,14 +118,14 @@ TEST_CASE_TEMPLATE(
CHECK_FALSE(t.has_value());
}
{
-std::string s = std::to_string(std::numeric_limits<T>::min());
+std::string s = std::to_string(numeric_limits<T>::min());
auto t = ss::to_num<T>(s.c_str(), s.c_str() + s.size());
CHECK(t.has_value());
for (auto& i : s) {
-if (std::is_signed_v<T> && i != '9' && i != '.') {
+if (is_signed<T>::value && i != '9' && i != '.') {
i = '9';
break;
-} else if (std::is_unsigned_v<T>) {
+} else if (is_unsigned<T>::value) {
s = "-1";
break;
}