pip install FileSampler
get_line() method returns a string which represents one row.
get_lines() method returns a list of strings which represents multiple rows.
get_random_lines() method returns a list of strings that represents multiple rows.

    from FileSampler import TextSampler

    # raw string, so the backslashes in the Windows path are not read as escapes
    sampler_text = TextSampler(r'c:\file path\text_file.txt')

    # example values; replace with the lines you need
    int_line_number = 10
    list_line_numbers = [1, 5, 10]
    int_number_of_random_lines = 5

    # single line
    string_line = sampler_text.get_a_line(int_line_number)
    print(string_line)

    # multiple lines
    list_lines = sampler_text.get_lines(list_line_numbers)
    for line in list_lines:
        print(line)

    # random lines
    list_random_lines = sampler_text.get_random_lines(int_number_of_random_lines)
    for line in list_random_lines:
        print(line)

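For readers curious how line-level access to a large text file can work without loading the whole file into memory, here is a minimal sketch using only the standard library. It is not FileSampler's implementation: the helper names build_line_index, read_lines and read_random_lines are hypothetical, and the zero-based line numbers are an assumption made for the sketch.

```python
import random


def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    position = 0
    with open(path, 'rb') as handle:
        for raw_line in handle:
            offsets.append(position)
            position += len(raw_line)
    return offsets


def read_lines(path, offsets, line_numbers, encoding='utf-8'):
    """Read the requested zero-based lines by seeking to their offsets."""
    lines = []
    with open(path, 'rb') as handle:
        for number in line_numbers:
            handle.seek(offsets[number])
            # strip the trailing newline so each entry is just the row text
            lines.append(handle.readline().decode(encoding).rstrip('\r\n'))
    return lines


def read_random_lines(path, offsets, count, seed=None):
    """Pick `count` distinct lines at random and read them."""
    chosen = random.Random(seed).sample(range(len(offsets)), count)
    return read_lines(path, offsets, chosen)
```

The index costs one pass over the file, after which any line can be fetched with a single seek.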
m_string_endline_character - the character that marks the end of a line (default is the endline character \n)
m_bool_estimate - if set to True, blank lines in the file will not be read or indexed (default is False)
number_of_lines -> type: int; returns the number of lines in the file
estimate_mode -> type: bool; flag indicating whether the class counted all the line lengths in the file or estimated the line length based on a sample
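To illustrate the difference between counting every line and estimating from a sample, here is a minimal sketch of the estimating approach. It is an assumption about how such an estimate can be made, not a description of what FileSampler does internally; estimate_number_of_lines and sample_size are hypothetical names.

```python
import os


def estimate_number_of_lines(path, sample_size=1000):
    """Estimate the line count from the average length of the first lines."""
    sampled_bytes = 0
    sampled_lines = 0
    with open(path, 'rb') as handle:
        for raw_line in handle:
            sampled_bytes += len(raw_line)
            sampled_lines += 1
            if sampled_lines >= sample_size:
                break
    if sampled_lines == 0:
        return 0  # empty file
    average_length = sampled_bytes / sampled_lines
    # file size divided by the average line length approximates the line count
    return round(os.path.getsize(path) / average_length)
```

The estimate is exact when every line has the same length or when the file is shorter than the sample, and drifts as line lengths vary.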
header: returns the header of the csv file, if there is one, in the form of a tuple of strings
has_header: a boolean flag which is True if a header exists and False otherwise

    from FileSampler import CsvSampler

    sampler_csv = CsvSampler('~/myfile.csv')

    # example values; replace with the lines you need
    int_line_number = 10
    list_line_numbers = [1, 5, 10]
    int_number_of_random_lines = 5

    # single line
    # returns a pandas Series; the index is the header if it exists
    series_line = sampler_csv.get_a_csv_line(int_line_number)

    # multiple lines
    # returns a pandas DataFrame where the columns are the file headers;
    # the loop below prints each line of each column in the DataFrame
    df_lines = sampler_csv.get_csv_lines(list_line_numbers)
    for string_column in df_lines:
        for int_line in range(0, len(df_lines)):
            print(df_lines[string_column].iloc[int_line])

    # random lines
    # returns a pandas DataFrame where the columns are the header if it exists;
    # the loop below prints each full line of the csv file
    df_random_lines = sampler_csv.get_csv_random_lines(int_number_of_random_lines)
    for int_line in range(0, len(df_random_lines)):
        print(df_random_lines.iloc[int_line])

m_bool_ignore_bad_lines - if set to True, lines that do not fit the csv file format will be ignored (default is False)
string_values_delimiter - character used by the csv to separate values within a line (default is ,)
string_quotechar - character used by the csv to surround values that contain the value delimiting character (default is ")
m_bool_has_header - if set to True, the first line of the csv file will be used as the header / column names for the DataFrame (default is True)
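
To make the role of the delimiter, quote character, header and bad-line options concrete, here is a minimal sketch built on Python's standard csv module. It is not FileSampler's code: parse_csv_lines and its parameter names are hypothetical, and treating a "bad line" as a row whose field count differs from the first row is an assumption made for the sketch.

```python
import csv


def parse_csv_lines(lines, delimiter=',', quotechar='"',
                    has_header=True, ignore_bad_lines=False):
    """Parse raw csv lines into a (header, rows) pair."""
    rows = list(csv.reader(lines, delimiter=delimiter, quotechar=quotechar))
    if not rows:
        return None, []
    header = tuple(rows[0]) if has_header else None
    data = rows[1:] if has_header else rows
    expected = len(rows[0])  # field count every row is compared against
    good_rows = []
    for row in data:
        if len(row) == expected:
            good_rows.append(row)
        elif not ignore_bad_lines:
            raise ValueError(f'unexpected number of fields: {row!r}')
    return header, good_rows
```

With delimiter=';' the line a;"x;y" parses into two fields, because the quote character protects the delimiter inside the second value.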