
#1 NeoTifa

[MATLAB] FormatData for Mixed-Typed Datasets Using importdata

Posted 08 September 2016 - 03:03 PM

When you call the importdata function on a mixed-type dataset, such as the classic Iris.csv, it returns an nx1 cell array of strings, with each string holding one full row of data, like so:

[
{'sepal_length,sepal_width,petal_length,petal_width,class'},
{'5.1,3.5,1.4,0.2,Iris-setosa'},
{'4.9,3.0,1.4,0.2,Iris-setosa'},
{'4.7,3.2,1.3,0.2,Iris-setosa'},
...
]



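Kinda annoying when you want to index individual attributes. This function splits each row on the commas, so every row becomes a 1xm cell array of strings (the header row is returned separately from the nxm data):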
[
[{'sepal_length'}, {'sepal_width'}, {'petal_length'}, {'petal_width'}, {'class'}],
[{'5.1'}, {'3.5'}, {'1.4'}, {'0.2'}, {'Iris-setosa'}],
...
]
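To make the difference concrete, here's a minimal sketch of the per-row split. The two hard-coded rows are just a stand-in for what importdata is described as returning above, so this runs without the CSV file.

```matlab
% Hard-coded sample rows standing in for the importdata result (assumption)
rawdata = {'sepal_length,sepal_width,petal_length,petal_width,class';
           '5.1,3.5,1.4,0.2,Iris-setosa'};

% Indexing a row gives back the whole comma-joined string, not one field
row = rawdata{2};

% strsplit breaks that string into a 1x5 cell array of strings
fields = strsplit(row, ',');

% Now a single attribute can be picked out by position
disp(fields{5});
```

One thing to keep in mind: strsplit collapses consecutive delimiters by default, so a row with an empty field would come back shorter than the others.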



FormatData
function [ formatted_data, header_labels ] = FormatData( data )
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FormatData - Formats incoming data into usable data arrays
%
% Incoming data was gathered from importdata and contains numbers and
% strings, so it's a column vector of string cells, where each string
% holds the comma-separated attributes of one row.
% Author: Neotifa @ dreamincode.net
%
% args:
% data - col vector with string cells of data
%
% rets:
% formatted_data - a 2D cell array of data of strings
% header_labels - a vector of string cells containing header labels
%
% Assumptions:
% - Data has first row as header labels
% - Data has at least 1 row of data after labels
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    % data is n-by-1, so only the row count is known up front; the column
    % count comes from splitting the header.
    rows = size(data, 1);

    % Grab labels
    header_labels = strsplit(data{1}, ',');
    cols = numel(header_labels);

    % Preallocating space, accounting for header
    formatted_data = cell(rows-1, cols);

    % Converting data (every row must split into exactly cols fields)
    for i = 2:rows
        % i - 1 for formatted_data due to ignoring label data
        formatted_data(i-1, :) = strsplit(data{i}, ',');
    end

    % Redefining data_size with new dims. Next 2 lines can be removed,
    % they're just nice for displaying and debugging.
    data_size = size(formatted_data);
    fprintf('Dataset has %d records and %d attributes.\n', data_size(1), data_size(2));
end



Running this code:
datafile = 'Iris.csv';
rawdata = importdata(datafile);
disp(strcat(datafile, ' loaded successfully.'));
disp('Formatting data...');
data = FormatData(rawdata);



should give this output:
Iris.csv loaded successfully.
Formatting data...
Dataset has 150 records and 5 attributes.
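Once the rows are split, indexing works the way you'd expect. Here's a rough sketch of what that looks like, assuming the first four columns are numeric like they are in Iris. The hard-coded cell array is a made-up stand-in for FormatData's output so the snippet runs on its own, and the variable names are just for illustration.

```matlab
% Made-up sample standing in for the formatted_data returned by FormatData
formatted_data = {'5.1', '3.5', '1.4', '0.2', 'Iris-setosa';
                  '4.9', '3.0', '1.4', '0.2', 'Iris-setosa';
                  '7.0', '3.2', '4.7', '1.4', 'Iris-versicolor'};

% Convert the numeric columns from strings to doubles (3x4 matrix)
features = str2double(formatted_data(:, 1:4));

% Keep the class labels as a cell array of strings
classes = formatted_data(:, 5);

% Logical mask selecting the setosa records
is_setosa = strcmp(classes, 'Iris-setosa');

% Mean sepal length (column 1) over the setosa records only
mean_setosa_sepal = mean(features(is_setosa, 1));
fprintf('Mean setosa sepal length: %.2f\n', mean_setosa_sepal);
```

str2double returns NaN for anything it can't parse, so it's worth checking for NaNs if your numeric columns might contain junk.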



If you use this code, please give credit in the comments or something. Using it uncredited could lead to academic integrity issues.
