Title: | Read Large Text Files |
---|---|
Description: | Read large text files by splitting them into smaller files. Package 'bigreadr' also provides some convenient wrappers around fread() and fwrite() from package 'data.table'. |
Authors: | Florian Privé [aut, cre] |
Maintainer: | Florian Privé <[email protected]> |
License: | GPL-3 |
Version: | 0.2.5 |
Built: | 2024-11-18 03:54:38 UTC |
Source: | https://github.com/privefl/bigreadr |
Read large text file by splitting lines.
big_fread1(file, every_nlines, .transform = identity, .combine = rbind_df, skip = 0, ..., print_timings = TRUE)
file | Path to file that you want to read. |
every_nlines | Maximum number of lines in new file parts. |
.transform | Function to transform each data frame corresponding to each part of the file. |
.combine | Function to combine results (list of data frames). |
skip | Number of lines to skip at the beginning of the file. |
... | Other arguments to be passed to data.table::fread, except those that big_fread1 already sets itself (such as skip). |
print_timings | Whether to print timings. Default is TRUE. |
A data.frame by default; a data.table when data.table = TRUE.
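As a minimal sketch of reading by blocks of lines (assuming the bigreadr package is installed; the 50-line part size and the filter on Sepal.Length are arbitrary choices for illustration), .transform can shrink each part before the parts are combined:

```r
library(bigreadr)

# Small stand-in for a large file: write iris to a temporary CSV
csv <- fwrite2(iris)

# Read the file in parts of at most 50 lines each.
# Each part is filtered by .transform before the parts are row-bound.
res <- big_fread1(
  csv,
  every_nlines = 50,
  .transform = function(df) df[df$Sepal.Length > 5, ],
  print_timings = FALSE
)

# Compare with filtering the full data frame in memory
nrow(res) == sum(iris$Sepal.Length > 5)
```

Filtering inside .transform keeps only the needed rows of each part in memory, which is the main reason to prefer this over reading the whole file and filtering afterwards.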
Read large text file by splitting columns.
big_fread2(file, nb_parts = NULL, .transform = identity, .combine = cbind_df, skip = 0, select = NULL, progress = FALSE, part_size = 500 * 1024^2, ...)
file | Path to file that you want to read. |
nb_parts | Number of parts in which to split reading (and transforming). Parts refer to blocks of selected columns. Default uses part_size to choose the number of parts. |
.transform | Function to transform each data frame corresponding to each block of selected columns. Default doesn't change anything. |
.combine | Function to combine results (list of data frames). |
skip | Number of lines to skip at the beginning of the file. |
select | Indices of columns to keep (sorted). Default keeps them all. |
progress | Show progress? Default is FALSE. |
part_size | Size of the parts if nb_parts is not supplied. Default is 500 * 1024^2 (500 MB). |
... | Other arguments to be passed to data.table::fread, except those that big_fread2 already sets itself (such as skip and select). |
The outputs of fread2 + .transform, combined with .combine.
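A short sketch of reading by blocks of columns, assuming bigreadr is installed. The choice of columns and of two parts is illustrative only; on a real large file, nb_parts can be left at its default so that part_size decides.

```r
library(bigreadr)

# Small stand-in for a large file
csv <- fwrite2(iris)

# Keep columns 1, 3 and 5 (indices must be sorted) and read them
# in two blocks of columns, which are then column-bound by cbind_df
res <- big_fread2(csv, nb_parts = 2, select = c(1, 3, 5))

dim(res)
names(res)
```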
Merge data frames
cbind_df(list_df)
list_df | A list of multiple data frames with the same observations in the same order. |
One merged data frame.
str(iris)
str(cbind_df(list(iris, iris)))
Read text file(s)
fread2(input, ..., data.table = FALSE, nThread = getOption("bigreadr.nThread"))
input | Path to the file(s) that you want to read from. This can also be a command, some text or a URL. If a vector of inputs is provided, the resulting data frames are appended. |
... | Other arguments to be passed to data.table::fread. |
data.table | Whether to return a data.table instead of a data.frame. Default is FALSE. |
nThread | Number of threads to use. Default uses all threads minus one. |
A data.frame by default; a data.table when data.table = TRUE.
tmp <- fwrite2(iris)
iris2 <- fread2(tmp)
all.equal(iris2, iris)  ## fread doesn't use factors
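The input argument also accepts a vector of paths, in which case the resulting data frames are appended. A minimal sketch, assuming bigreadr and data.table are installed (the split of iris into two files is arbitrary):

```r
library(bigreadr)

# Write two pieces of iris to two temporary files
f1 <- fwrite2(iris[1:100, ])
f2 <- fwrite2(iris[101:150, ])

# Passing both paths reads each file and appends the rows
both <- fread2(c(f1, f2))
dim(both)

# Ask for a data.table instead of the default data.frame
dt <- fread2(f1, data.table = TRUE)
data.table::is.data.table(dt)
```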
Write a data frame to a text file
fwrite2(x, file = tempfile(), ..., quote = FALSE, nThread = getOption("bigreadr.nThread"))
x | Data frame to write. |
file | Path to the file that you want to write to. Default uses tempfile(). |
... | Other arguments to be passed to data.table::fwrite. |
quote | Whether to quote strings (default is FALSE). |
nThread | Number of threads to use. Default uses all threads minus one. |
Input parameter file, invisibly.
tmp <- fwrite2(iris)
iris2 <- fread2(tmp)
all.equal(iris2, iris)  ## fread doesn't use factors
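Extra arguments are forwarded to data.table::fwrite, so the output format can be adjusted. The sketch below assumes bigreadr is installed; the tab separator and the .tsv extension are illustrative choices.

```r
library(bigreadr)

# Write a tab-separated file to an explicit path;
# 'sep' is passed on to data.table::fwrite through '...'
out <- fwrite2(iris, file = tempfile(fileext = ".tsv"), sep = "\t")

# The returned value is the path that was written to
file.exists(out)

# Inspect the header line of the written file
readLines(out, n = 1)
```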
Get the number of lines of a file.
nlines(file)
file | Path of the file. |
The number of lines as one integer.
tmp <- fwrite2(iris)
nlines(tmp)
Merge data frames
rbind_df(list_df)
list_df | A list of multiple data frames with the same variables in the same order. |
One merged data frame with the names of the first input data frame.
str(iris)
str(rbind_df(list(iris, iris)))
Split file every nlines
Get files from splitting.
split_file(file, every_nlines, prefix_out = tempfile(), repeat_header = FALSE)
get_split_files(split_file_out)
file | Path to file that you want to split. |
every_nlines | Maximum number of lines in new file parts. |
prefix_out | Prefix for created files. Default uses tempfile(). |
repeat_header | Whether to repeat the header row in each file. Default is FALSE. |
split_file_out | Output of split_file. |
A list with
name_in | input parameter file |
prefix_out | input parameter prefix_out |
nfiles | number of files (parts) created |
nlines_part | input parameter every_nlines |
nlines_all | total number of lines of file |
Vector of file paths created by split_file.
tmp <- fwrite2(iris)
infos <- split_file(tmp, 100)
str(infos)
get_split_files(infos)
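When repeat_header = TRUE, each part carries the header row and can be read on its own. A minimal sketch, assuming bigreadr is installed (the 60-line part size is arbitrary):

```r
library(bigreadr)

csv <- fwrite2(iris)

# Split into parts, repeating the header row in each part
infos <- split_file(csv, every_nlines = 60, repeat_header = TRUE)
parts <- get_split_files(infos)

# Read every part independently; each one has its own column names
dfs <- lapply(parts, fread2)
sapply(dfs, nrow)

# The data rows of all parts add up to the rows of the original data
sum(sapply(dfs, nrow)) == nrow(iris)

# Remove the temporary parts once they are no longer needed
unlink(parts)
```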