Most significantly, a file-based dataset will, for the most part, contain the same information whether you open it today or a week, a month, or a year from now. 1 While their specific structures and details vary, all of these formats are what I would describe as file-based-that is, they contain (more or less) historical data in static files that can be downloaded from a database, emailed by a colleague, or accessed via file-sharing sites. Often, these formats came with their own file extensions-some of which you may have seen: xls, csv, dbf, and spss are all file formats typically associated with “data” files. For decades, data wrangling was a highly specialized pursuit, leading companies and organizations to create a whole range of distinct (and sometimes proprietary) digital data formats designed to meet their particular needs. Obviously, it’s impossible to begin assessing the quality of a dataset without first reviewing its contents-but this is sometimes easier said than done. But how do we actually accomplish these things in practice? We discussed the need to both “clean” and standardize data, as well as the need to augment it by combining it with other datasets. In Chapter 3, we focused on the many characteristics that contribute to data quality-from the completeness, consistency, and clarity of data integrity to the reliability, validity, and representativeness of data fit. Working with File-Based and Feed-Based Data in Python
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |