database is locked #2999

Open
aixiangwang opened this issue Jun 25, 2022 · 5 comments

@aixiangwang

When I increase train_clients_per_round, training fails with the error "database is locked". I then tested each client named in the errors individually and found that all of them were accessible.
image

@aixiangwang aixiangwang added the bug Something isn't working label Jun 25, 2022
@zcharles8
Collaborator

Hi @aixiangwang. Can you provide the information requested on the new bug template, including:

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Python package versions (e.g., TensorFlow Federated, TensorFlow)
  • A minimal reproduction of the bug.

This looks like some edge case around the SQL-backed datasets TFF provides, but without the information above I'm not certain what's actually going on.
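For readers unfamiliar with the message: "database is locked" is the text SQLite reports when a connection cannot obtain the lock it needs. The sketch below reproduces that message with Python's standard `sqlite3` module only. It is an illustration under the assumption that the error in this issue comes from SQLite; it is not TFF code and not a diagnosis of this report. The file name, table name, and function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def demo_locked_error() -> str:
    """Return the SQLite error text seen by a connection that cannot get a lock."""
    # Hypothetical throwaway database file, unrelated to any TFF dataset.
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

    # First connection creates a table, then takes and holds an exclusive lock.
    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute("CREATE TABLE examples (x INTEGER)")
    holder.execute("BEGIN EXCLUSIVE")

    # Second connection gives up immediately (timeout=0) instead of waiting.
    other = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    message = ""
    try:
        other.execute("SELECT COUNT(*) FROM examples").fetchone()
    except sqlite3.OperationalError as err:
        message = str(err)
    finally:
        other.close()
        holder.execute("ROLLBACK")  # release the lock before closing
        holder.close()
    return message


if __name__ == "__main__":
    print(demo_locked_error())
```

In plain `sqlite3`, a larger `timeout` makes the second connection wait for the lock instead of failing at once. Whether the SQL-backed datasets in TFF expose a comparable setting is not established in this thread.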

@aixiangwang
Author

Thank you for your reply. I am using TensorFlow 2.8 and TensorFlow Federated 0.21 on Windows 10.
image
image
As you can see, the client named in the error (client_id = 'f3928_39') is among the latest 10 randomly selected clients, and that is where the error is raised. However, client_id = 'f3928_39' had already been selected several times in earlier rounds, so I'm very confused. Looking forward to your reply!

@zcharles8
Collaborator

Could you add the full code that actually causes this bug? Even better, if you can narrow it down to a smaller reproduction of it, that'd be really helpful.

@aixiangwang
Author
aixiangwang commented Aug 1, 2022

The complete code is the simple_fedavg example under /tensorflow_federated/ in the current GitHub project. The only change I made was to increase the train_clients_per_round parameter in emnist_fedavg_main.py from 2 to 10; this parameter sets the number of clients sampled per round.
image

Some of the key code in the example is shown below:
    train_data, test_data = get_emnist_dataset()

    def tff_model_fn():
        """Constructs a fully initialized model for use in federated averaging."""
        keras_model = create_original_fedavg_cnn_model(only_digits=True)
        loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]
        return tff.learning.from_keras_model(
            keras_model,
            loss=loss,
            metrics=metrics,
            input_spec=train_data.element_type_structure)

    iterative_process = simple_fedavg_tff.build_federated_averaging_process(
        tff_model_fn, server_optimizer_fn, client_optimizer_fn)
    server_state = iterative_process.initialize()
    keras_model = create_original_fedavg_cnn_model(only_digits=True)
    for round_num in range(FLAGS.total_rounds):
        sampled_clients = np.random.choice(
            train_data.client_ids,
            size=FLAGS.train_clients_per_round,
            replace=False)
        print(sampled_clients)
        sampled_train_data = [
            train_data.create_tf_dataset_for_client(client)
            for client in sampled_clients
        ]
        server_state, train_metrics = iterative_process.next(
            server_state, sampled_train_data)
        print(f'Round {round_num}')
        print(f'\tTraining metrics: {train_metrics}')
        if round_num % FLAGS.rounds_per_eval == 0:
            server_state.model.assign_weights_to(keras_model)
            accuracy = evaluate(keras_model, test_data)
            print(f'\tValidation accuracy: {accuracy * 100.0:.2f}%')
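A note on the sampling step in the loop above: with `replace=False`, a client cannot appear twice within one round, but each round is sampled independently, so the same client can come up again in later rounds. The sketch below illustrates this with the standard library only. `random.sample` stands in for `np.random.choice(..., replace=False)`, and the client IDs and function name are hypothetical.

```python
import random
from collections import Counter


def sample_rounds(client_ids, clients_per_round, total_rounds, seed=0):
    """Sample clients without replacement within each round, independently per round."""
    rng = random.Random(seed)
    return [rng.sample(client_ids, clients_per_round) for _ in range(total_rounds)]


if __name__ == "__main__":
    # Hypothetical client IDs; the real ones come from train_data.client_ids.
    ids = [f"client_{i}" for i in range(20)]
    rounds = sample_rounds(ids, clients_per_round=10, total_rounds=5)
    counts = Counter(client for sampled in rounds for client in sampled)
    # Clients selected in more than one round.
    print(sorted(c for c, n in counts.items() if n > 1))
```

So a client that fails in a later round may well have been read in earlier rounds, which matches what is described in this thread; this observation alone does not explain the lock.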

As shown in the figure below, the dataset loaded successfully and several rounds completed without error.
image
However, an error occurs in a later round, as shown in the figures below: the database is locked and the next federated round fails.
image
image
Looking forward to your reply!

@zcharles8
Collaborator

@aixiangwang I have not been able to repro this issue, and we have seen no other reports about this.

I suspect this is related to the environment you are executing in. A similar error occurred in #3479; in that case it happened because the user had cached the dataset to a locked directory.
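One way to check for the kind of environment problem mentioned above is to test whether the directory holding the cached dataset can be written to. The following is a minimal sketch using only the standard library; it assumes the dataset is cached somewhere on local disk, and the function name and the path in the usage comment are placeholders, not values taken from TFF.

```python
import os
import tempfile


def directory_is_writable(path: str) -> bool:
    """Return True if a temporary file can be created inside `path`."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating (and automatically deleting) a temp file is a more reliable
        # check than os.access, particularly on Windows.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder path: replace with the directory where the dataset is cached.
    cache_dir = os.path.expanduser("~")
    print(cache_dir, directory_is_writable(cache_dir))
```

If the check fails, pointing the dataset cache at a writable directory would be a reasonable next experiment.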
