Remove duplicate rows in table (2024)

138 views (last 30 days)

Show older comments

DavidL88 on 20 Jan 2021

  • Link

    Direct link to this question

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table

  • Link

    Direct link to this question

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table

Commented: DavidL88 on 28 Jan 2021

Accepted Answer: Adam Danz

Open in MATLAB Online

Hi

I have a table with four columns and roughly 45,000 rows (example below). The first column is the name of statistical test (of which there are several hundred different tests). For every statistical test the values in the 4th column are duplicated (at .25 and 0.5). Can anyone advise how I delete the first of these rows (the first one of the .25 and the first one of the 0.5 rows) for every statistical test?

'Perm t-test equal [250ms,500ms 92, 108]: Avg: 11_right FCL'-1.3490.1850.492

'Perm t-test equal [250ms,500ms 92, 108]: Avg: 11_right FCL'-1.4570.1550.496

'Perm t-test equal [250ms,500ms 92, 108]: Avg: 11_right FCL'-1.5440.1340.500

'Perm t-test equal [500ms,900ms 92, 108]: Avg: 11_right FCL'-1.5440.1290.500

'Perm t-test equal [500ms,900ms 92, 108]: Avg: 11_right FCL'-1.6150.1120.503

'Perm t-test equal [500ms,900ms 92, 108]: Avg: 11_right FCL'-1.6650.1000.507

1 Comment

Show -1 older commentsHide -1 older comments

dpb on 20 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275414

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275414

I don't see the duplication in the sample dataset? (I'm presuming the 0.25 and 0.5 are confidence limits of the test and not values of the statistic as Adam presumed below).

To my eyes anyways, the above data are all for the same test for the first three and then the second set of three; but the fourth column data values are unique other than by happenstance it appears that the last @250ms is same in second and fourth columns as the first @500ms.

Not at all clear what is the result wanted from this dataset, to me, anyways...

Sign in to comment.

Sign in to answer this question.

Accepted Answer

Adam Danz on 20 Jan 2021

  • Link

    Direct link to this answer

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#answer_602570

  • Link

    Direct link to this answer

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#answer_602570

Edited: Adam Danz on 20 Jan 2021

Open in MATLAB Online

Follow the demo.

  • T is a table
  • T.Test contains the test names which can be strings, character vectors, categoricals, or numeric.
  • T.col4 is the name of column 4.

The demo removes the first line where column 4 equals 0.25 or 0.50 for each test. The tests do not have to be in order.

% Create table

rng('default') % for reproducibility

T = table(repelem({'A';'B';'C'},5,1),rand(15,1), rand(15,1), repmat([0;.25;.25;.5;.5],3,1),...

'VariableNames',{'Test','col2','col3','col4'});

T.col4([7,14]) = .33;

disp(T)

Test col2 col3 col4 _____ _______ ________ ____ {'A'} 0.81472 0.14189 0 {'A'} 0.90579 0.42176 0.25 {'A'} 0.12699 0.91574 0.25 {'A'} 0.91338 0.79221 0.5 {'A'} 0.63236 0.95949 0.5 {'B'} 0.09754 0.65574 0 {'B'} 0.2785 0.035712 0.33 {'B'} 0.54688 0.84913 0.25 {'B'} 0.95751 0.93399 0.5 {'B'} 0.96489 0.67874 0.5 {'C'} 0.15761 0.75774 0 {'C'} 0.97059 0.74313 0.25 {'C'} 0.95717 0.39223 0.25 {'C'} 0.48538 0.65548 0.33 {'C'} 0.80028 0.17119 0.5

% For each testtype, identify the first row where col4 is .25 and .50

[testID, testNames] = findgroups(T.Test);

rowNum1 = arrayfun(@(i) {find(testID==i & T.col4==0.25, 2)}, unique(testID));

rowNum2 = arrayfun(@(i) {find(testID==i & T.col4==0.50, 2)}, unique(testID));

rowNums = cell2mat(cellfun(@(c){padarray(c,[2-numel(c),0],NaN,'post')},[rowNum1', rowNum2']));

rmRows = rowNums(2, ~isnan(rowNums(2,:)));

% remove rows from table

T(rmRows, : ) = []

T = 11x4 table

Test col2 col3 col4 _____ _______ ________ ____ {'A'} 0.81472 0.14189 0 {'A'} 0.90579 0.42176 0.25 {'A'} 0.91338 0.79221 0.5 {'B'} 0.09754 0.65574 0 {'B'} 0.2785 0.035712 0.33 {'B'} 0.54688 0.84913 0.25 {'B'} 0.95751 0.93399 0.5 {'C'} 0.15761 0.75774 0 {'C'} 0.97059 0.74313 0.25 {'C'} 0.48538 0.65548 0.33 {'C'} 0.80028 0.17119 0.5

15 Comments

Show 13 older commentsHide 13 older comments

DavidL88 on 20 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275426

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275426

Hi Adam,

Thank you. I adapted this to my dataset and it seems to erase any row with 0.25 or 0.5. Looking through values listed in rowNum1 and 2, both rows in T that match the values 0.25 and 0.5 seem to be identified and are listed.

Adam Danz on 20 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275483

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275483

@DavidL88 My original solution removed the first row that contains .25 or .50 in col4 for each test even if there were not duplicates!

I've updated the answer fix that.

Now the solution only removes the rows if the .25 or .50 is a duplicate within each test.

DavidL88 on 20 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275493

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275493

Open in MATLAB Online

Hi Adam

The effect as you demonstrate is what I'm looking for. I'm not sure why I got a different result. There is a duplicate of all 0.25 and 0.5s. I copy a sample of the table below before running this code. The 0.25 values for this section are in rows 39 and 40 of the table T.

FCL'0.4493778418169440.6530867283179210.242187500000000

FCL' 0.3791172178920760.7055736065983500.246093750000000

FCL'0.4117158947985100.6838290427393150.250000000000000

FCL'0.4117158947985100.6803299175206200.250000000000000

FCL'0.5641012876531560.5738565358660340.253906250000000

FCL'0.7941318306287340.4291427143214200.257812500000000

This is the same section after running the code. In rowNum1 I can see both 39 and 40 listed.

FCL'0.4493778418169440.6530867283179210.242187500000000

FCL'0.3791172178920760.7055736065983500.246093750000000

FCL'0.5641012876531560.5738565358660340.253906250000000

FCL'0.7941318306287340.4291427143214200.257812500000000

This the exact code I ran on my table T. T3 is the last column and T4 is the first column.

% For each testtype, identify the first row where col4 is .25 and .50

[testID, testNames] = findgroups(T.T4);

rowNum1 = arrayfun(@(i) {find(testID==i & T.T3==0.25, 1)}, unique(testID));

rowNum2 = arrayfun(@(i) {find(testID==i & T.T3==0.50, 1)}, unique(testID));

% remove rows from table

T([rowNum1{:}, rowNum2{:}], : ) = [];

Adam Danz on 20 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275503

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275503

Edited: Adam Danz on 20 Jan 2021

Open in MATLAB Online

First, please see my previous comment which I may have added while you were typing your response (I updated my solution).

Second, those trailing 0s are suspicious. Are you sure they are duplicates? For example,

0.25 == 0.2500000000000001

ans = logical

If floating point representation is causing problems like this, you'll need to modify these two lines of my solution

rowNum1 = arrayfun(@(i) {find(testID==i & abs(T.col4-0.25)<0.00001, 2)}, unique(testID));

rowNum2 = arrayfun(@(i) {find(testID==i & abs(T.col4-0.50)<0.00001, 2)}, unique(testID));

% ---> ---> ---> ---> ---> ---> ---> ---> ^^^^^^^^^^^^^^^^^^^^^^^

DavidL88 on 20 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275633

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275633

Open in MATLAB Online

Thanks. I ran that second script (adapted below). It didn't make any change to the table. The duplicate rows remained after running it this time. For rowNum1 and 2 the same vales are there (both are listed as 648x1 cell same as before with first script (with 39 and 40, the first set of duplicates, listed there too).

% For each testtype, identify the first row where col4 is .25 and .50

[testID, testNames] = findgroups(T.T4);

rowNum1 = arrayfun(@(i) {find(testID==i & T.T3==0.25, 2)}, unique(testID));

rowNum2 = arrayfun(@(i) {find(testID==i & T.T3==0.50, 2)}, unique(testID));

rowNums = cell2mat(cellfun(@(c){padarray(c,[2-numel(c),0],NaN,'post')},[rowNum1', rowNum2']));

rmRows = rowNums(2, ~isnan(rowNums(2,:)));

% remove rows from table

T(rmRows, : ) = []

It shouldn't be a floating point as those numbers represent exact time-stamps. I double-checked and confirmed this on one duplicate.

tf = isequal(T.T3(39),T.T3(40))

tf =

logical

1

Adam Danz on 21 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275648

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275648

Edited: Adam Danz on 21 Jan 2021

Open in MATLAB Online

> For rowNum1 and 2 the same vales are there

Impossible. rowNum1 values are based on T.T3==.25; rowNum2 values are based on T.T3==.50; It would therefore be impossible to have the same values in both variables unless the result is an empty array (no matches). Or maybe you meant that they have the same values as the previous version which would only happen if all tests had duplicates for .25 and .50.

> both are listed as 648x1 cell

It's expected that they are cell arrays with the same size.

> It shouldn't be a floating point as those numbers represent exact time-stamps

They aren't integers so it's not debatable whether they are represented by floating point or not. The question is whether their floating point representation is causing a problem with the equality tests. It doesn't matter that T.T3(39) equals T.T3(40). What matters is if those values equal 0.25 or 0.50, exactly.

Example:

4/3

ans = 1.3333

4/3-1

ans = 0.3333

(4/3- 1)*3

ans = 1.0000

(4/3- 1)*3 == 1

ans = logical

I wonder if you're using long format which would also explain the trailing 0s.

Could you attach a mat file containing the table?

DavidL88 on 21 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275658

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1275658

Edited: DavidL88 on 21 Jan 2021

Open in MATLAB Online

Just tried

tf = isequal(T.T3(39),0.25)

tf =

logical

1

Given that rowNum1 and 2 extracted these cells they should be equal to .25 and .5 exactly?

I sorted rowNum1 and got back this. Should it not be 39, 246, 453, etc?

39

40

246

247

453

454

660

661

867

868

1074

1075

rmRows is coming back as []

Can I email you the table rather than uploading it here?

Adam Danz on 21 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1276763

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1276763

The mathwork contact button does not support uploads.

You could upload it and remove it after I confirm that I received it.

Or you could upload it to a cloud service (e.g. Dropbox), send the link, and then remove it from the cloud service.

Or you could create a very similar table with dummy-data and make sure that it has the same problem as your table.

DavidL88 on 21 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1276863

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1276863

Hi Adam,

I figured it out. Sorry my mistake. If you look at the first example you'll see that in the first column of the duplicate rows the names are slight different (where the ms times are). I split this column after the second ms and re-ran your code, using the second new column as the test idenifier, so that the two rows are now identified as the same test. It worked perfectly and removed all the duplicate rows. Thanks for your help!

Adam Danz on 21 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1276993

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1276993

Ahh.... dpb actually pointed that out in a comment 19 hrs ago :)

DavidL88 on 23 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1280992

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1280992

Open in MATLAB Online

Hi Adam,

Is there a way to adapt this code to remove any unique test that does not have a value <0.05 in col3? So if none of the cells below in col3 for the 'A' test have a value that is <0.05 then remove all rows for 'A'?

Test col2 col3 col4

_____ _______ ________ ____

{'A'} 0.81472 0.14189 0

{'A'} 0.90579 0.42176 0.25

{'A'} 0.91338 0.79221 0.5

{'B'} 0.09754 0.65574 0

{'B'} 0.2785 0.035712 0.33

{'B'} 0.54688 0.84913 0.25

{'B'} 0.95751 0.93399 0.5

{'C'} 0.15761 0.75774 0

{'C'} 0.97059 0.74313 0.25

{'C'} 0.48538 0.65548 0.33

Adam Danz on 23 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1281157

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1281157

Yep, change "T.col4==0.25" to T.col4<0.05 and then test for empty cells using cellfun('isempty,___). It will return a logical column vector indicating which test groups do not contain any col4 values less than 0.05.

DavidL88 on 23 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1281292

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1281292

Open in MATLAB Online

Thank you. I tried this.

rowNum3 = arrayfun(@(i) {find(testID==i & TableMain.T2<0.50)}, unique(testID));

rowNums4 = cell2mat(cellfun(@isempty,rowNum3))

% remove rows from table

TableMain(rmRows4, : ) = []

I assume this is the right structure. rowNum3 finds those Tests with a value < 0.05 in T2. rowNums4 identifies those that came back negative and I can use this vector to remove those rows.

I'm getting the below message. I thought this might be due to the dimensions being different in that the previous code expected two matches/duplicates but this code is open to a variable number of matches? I tried removing ", 2" after <0.05 in the first line but same response.

Matrix dimensions must agree.

Error in Untitled4>@(i){find(testID==i&TableMain.T2<0.50)} (line 6)

rowNum3 = arrayfun(@(i) {find(testID==i & TableMain.T2<0.50)}, unique(testID));

Error in Untitled4 (line 6)

rowNum3 = arrayfun(@(i) {find(testID==i & TableMain.T2<0.50)}, unique(testID));

Adam Danz on 23 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1281352

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1281352

Open in MATLAB Online

You were close...

idx below returns a logical vector the same size as testNames indicating which test-names are flagged. Then you have to identify which rows of the table have those test names.

rowNum3 = arrayfun(@(i) {find(testID==i & T.col4<0.50)}, unique(testID));

idx = cellfun(@isempty,rowNum3);

rmIdx = ismember(T.Test, testNames(idx));

T(rmIdx,:) = []

DavidL88 on 28 Jan 2021

Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1292288

  • Link

    Direct link to this comment

    https://www.mathworks.com/matlabcentral/answers/722478-remove-duplicate-rows-in-table#comment_1292288

This worked thanks!

Sign in to comment.

More Answers (0)

Sign in to answer this question.

See Also

Categories

AI, Data Science, and StatisticsStatistics and Machine Learning ToolboxANOVAAnalysis of Variance and Covariance

Find more on Analysis of Variance and Covariance in Help Center and File Exchange

Tags

  • table

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

An Error Occurred

Unable to complete the action because of changes made to the page. Reload the page to see its updated state.


Remove duplicate rows in table (19)

Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

You can also select a web site from the following list

Americas

  • América Latina (Español)
  • Canada (English)
  • United States (English)

Europe

  • Belgium (English)
  • Denmark (English)
  • Deutschland (Deutsch)
  • España (Español)
  • Finland (English)
  • France (Français)
  • Ireland (English)
  • Italia (Italiano)
  • Luxembourg (English)
  • Netherlands (English)
  • Norway (English)
  • Österreich (Deutsch)
  • Portugal (English)
  • Sweden (English)
  • Switzerland
    • Deutsch
    • English
    • Français
  • United Kingdom(English)

Asia Pacific

Contact your local office

Remove duplicate rows in table (2024)

References

Top Articles
Latest Posts
Article information

Author: Rueben Jacobs

Last Updated:

Views: 5302

Rating: 4.7 / 5 (77 voted)

Reviews: 92% of readers found this page helpful

Author information

Name: Rueben Jacobs

Birthday: 1999-03-14

Address: 951 Caterina Walk, Schambergerside, CA 67667-0896

Phone: +6881806848632

Job: Internal Education Planner

Hobby: Candle making, Cabaret, Poi, Gambling, Rock climbing, Wood carving, Computer programming

Introduction: My name is Rueben Jacobs, I am a cooperative, beautiful, kind, comfortable, glamorous, open, magnificent person who loves writing and wants to share my knowledge and understanding with you.